git.kernel.dk Git - linux-2.6-block.git/log

drm/xe/display: fix i915_gem_object_is_shmem() wrapper

shmem ensures the memory is cleared on allocation, however here we are
using TTM, which doesn't natively support shmem (other than for swap),
but instead just allocates normal system memory. And we only zero such
memory for userspace allocations. In the case of intel_fbdev we are
missing the memset_io() since the display path incorrectly thinks the
object is shmem based.

Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240205153110.38340-2-matthew.auld@intel.com

drm/xe/irq: allocate all possible msix interrupts

If the platform supports MSIX, the driver needs to allocate all
possible interrupts.

v2:
 - drop msix_cap and use the api return code instead.
 - fix commit message.

v3:
 - pass specific type in irq flags.

Cc: Ohad Sharabi <osharabi@habana.ai>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124075058.2302235-1-dliberman@habana.ai

drm/xe/vm: Avoid reserving zero fences

The function xe_vm_prepare_vma was blindly accepting zero as the
number of fences and forwarded that to drm_exec_prepare_obj.

However, that leads to an out-of-bounds shift in the
dma_resv_reserve_fences() and while one could argue that the
dma_resv code should be robust against that, avoid attempting
to reserve zero fences.

Relevant stack trace:

[773.183188] ------------[ cut here ]------------
[773.183199] UBSAN: shift-out-of-bounds in ../include/linux/log2.h:57:13
[773.183241] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[773.183254] CPU: 2 PID: 1816 Comm: xe_evict Tainted: G U 6.8.0-rc3-xe #1
[773.183256] Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 2014 10/14/2022
[773.183257] Call Trace:
[773.183258] <TASK>
[773.183260] dump_stack_lvl+0xaf/0xd0
[773.183266] dump_stack+0x10/0x20
[773.183283] ubsan_epilogue+0x9/0x40
[773.183286] __ubsan_handle_shift_out_of_bounds+0x10f/0x170
[773.183293] dma_resv_reserve_fences.cold+0x2b/0x48
[773.183295] ? ww_mutex_lock+0x3c/0x110
[773.183301] drm_exec_prepare_obj+0x45/0x60 [drm_exec]
[773.183313] xe_vm_prepare_vma+0x33/0x70 [xe]
[773.183375] xe_vma_destroy_unlocked+0x55/0xa0 [xe]
[773.183427] xe_vm_close_and_put+0x526/0x940 [xe]
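
As an illustration only, the guard can be modelled in userspace C. The
names reserve_fences_model() and prepare_vma_model() are hypothetical
stand-ins, not the driver's functions, and any locking the real code
performs is left out. The sketch only shows the reservation being
skipped when the fence count is zero:

```c
#include <assert.h>

/* Counts how often the (hypothetical) reservation helper is reached. */
static int reserve_calls;

/* Stand-in for a reservation helper that is only defined for n > 0. */
static int reserve_fences_model(unsigned int num_fences)
{
	assert(num_fences > 0);
	reserve_calls++;
	return 0;
}

/* Caller-side guard: with nothing to reserve, do not call the helper. */
static int prepare_vma_model(unsigned int num_fences)
{
	if (!num_fences)
		return 0;
	return reserve_fences_model(num_fences);
}
```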

Fixes: 2714d5093620 ("drm/xe: Convert pagefaulting code to use drm_exec")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240208132115.3132-1-thomas.hellstrom@linux.intel.com

drm/xe: Avoid cryptic message when there's no GuC definition

If there's no GuC firmware entry in the table and the user didn't pass
an override path, the error message is very cryptic: xe will simply try
to continue and then fail when submitting the default context:

xe 0000:00:02.0: [drm:xe_pci_probe [xe]] XE_LUNARLAKE 64b0:0001 dgfx:0 gfx:Xe2_LPG (20.04) media:Xe2_LPM (20.00) display:no dma_m_s:46 tc:1 gscfi:0
...
xe: probe of 0000:00:02.0 failed with error -22

Add an explicit error message and bail out:

xe 0000:00:02.0: [drm:xe_pci_probe [xe]] XE_LUNARLAKE 64b0:0001 dgfx:0 gfx:Xe2_LPG (20.04) media:Xe2_LPM (20.00) display:no dma_m_s:46 tc:1 gscfi:0
xe 0000:00:02.0: [drm] *ERROR* No GuC firmware defined for platform
xe 0000:00:02.0: [drm] *ERROR* GuC init failed with -2
...
xe: probe of 0000:00:02.0 failed with error -2

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240201224724.551130-3-lucas.demarchi@intel.com

drm/xe: Always allow to override firmware

The current logic for firmware selection is a remnant of i915, where
there are 2 backends and several platforms support only 1: execlist or
GuC. The xe driver has only the GuC backend and it simply doesn't work
without it. Allow developers to override the firmware path even if there
isn't a firmware entry in the table yet: this allows developers to more
easily test the very first firmware before adding it there.

The justification above is only true for GuC, however those override
paths should really be viewed as a developer aid. Simply apply the
same logic to all firmwares and allow the override path to be used
for all of them.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240201224724.551130-2-lucas.demarchi@intel.com

drm/xe: Remove TEST_VM_ASYNC_OPS_ERROR

TEST_VM_ASYNC_OPS_ERROR is broken and unused. Remove it for now; it can
be brought back later once it is used, fixed, and properly hidden
behind a Kconfig option. Also fix up the supported flags value.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240206045010.2981051-1-matthew.brost@intel.com

drm/xe/pm: add debug logs for D3cold

Add additional debug logs for PME# capability and
presence of ACPI _PR3 resources. This helps identify
the reason why the card is not capable of D3cold.

No functional changes.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240206055917.2629027-1-riana.tauro@intel.com

drm/xe/hwmon: Refactor xe hwmon

Check latest platform first in xe_hwmon_get_reg.
Move PVC HWMON registers to regs/xe_pcode.h.

Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240201180600.434822-1-karthik.poosa@intel.com

drm/xe/vm: don't ignore error when in_kthread

If GUP fails and we are in_kthread, we can have pinned = 0 and ret = 0.
If that happens we call sg_alloc_append_table_from_pages() with n_pages
= 0, which is not well behaved and can trigger:

kernel BUG at include/linux/scatterlist.h:115!

depending on if the pages array happens to be zeroed or not. Even if we
don't hit that it crashes later when trying to dma_map the returned
table.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202171435.427630-2-matthew.auld@intel.com

drm/xe: Assume large page size if VMA not yet bound

The calculation to determine the max page size of a VMA during a REMAP
operation assumes the VMA has been bound. This assumption is not true
if the VMA is from an earlier operation in an array of binds. If a VMA
has not been bound use the maximum page size which will ensure the
previous / next REMAP operations are not incorrectly skipped.

Fixes: 8f33b4f054fc ("drm/xe: Avoid doing rebinds")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240205231714.2956225-1-matthew.brost@intel.com

drm/xe/query: Use kzalloc for drm_xe_query_engines

Use kzalloc like other routines for better consistency.

v2: Improve the subject(Matt)

Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240131051838.24705-1-nirmoy.das@intel.com

drm/xe/guc: Add support for LNL firmware

First release of GuC firmware for LNL is now available, so start
using it.

v2: Actually use xe directory. Doh! (review feedback from Lucas)

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202200017.2133438-6-John.C.Harrison@Intel.com

drm/xe/guc: Update to GuC firmware 70.19.2

API compatibility version: 1.8.2

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202200017.2133438-5-John.C.Harrison@Intel.com

drm/xe/uc: Include patch version in expectations

Patch level releases can be just as important as major level releases
if they fix a critical bug. So include the patch version in the
expectation check so the user is properly informed if they need to
update.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202200017.2133438-4-John.C.Harrison@Intel.com

drm/xe/display: Fix memleak in display initialization

intel_power_domains_init is called twice in xe_device_probe:

1) intel_power_domains_init()
 xe_display_init_nommio()
 xe_device_probe()

2) intel_power_domains_init()
 intel_display_driver_probe_noirq()
 xe_display_init_noirq()
 xe_device_probe()

One of the calls needs to be removed to avoid allocating
power_domains->power_wells twice.

unreferenced object 0xffff88811150ee00 (size 512):
 comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
 hex dump (first 32 bytes):
 10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................
 ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................
 backtrace:
 [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
 [<ffffffff812c98b2>] __kmalloc+0x52/0x150
 [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
 [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
 [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
 [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
 [<ffffffff817f2187>] local_pci_probe+0x47/0xa0
 [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
 [<ffffffff8192f2a2>] really_probe+0x1a2/0x410
 [<ffffffff8192f598>] __driver_probe_device+0x78/0x160
 [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
 [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
 [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
 [<ffffffff8192e159>] bus_add_driver+0x119/0x220
 [<ffffffff81930d00>] driver_register+0x60/0x120
 [<ffffffffa05e50a0>] 0xffffffffa05e50a0

The call to intel_power_domains_cleanup() needs to stay where it is for
now. The main issue is that while the init is called by the display
side, shared by i915 and xe, the cleanup is called by a non-shared code
path. Fixing that will be done as a separate commit.

Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Signed-off-by: Xiaoming Wang <xiaoming.wang@intel.com>
[ reword commit message and explain why the fini needs to stay
 where it is ]
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202215658.561298-1-lucas.demarchi@intel.com

drm/xe: Map both mem.kernel_bb_pool and usm.bb_pool

For integrated devices we need to map both mem.kernel_bb_pool and
usm.bb_pool to be able to run batches from both pools.

Fixes: a682b6a42d4d ("drm/xe: Support device page faults on integrated platforms")
Tested-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240202033440.2351862-1-matthew.brost@intel.com

drm/xe: circumvent bogus stringop-overflow warning

gcc-13 warns about an array overflow that it sees but that is
prevented by the "asid % NUM_PF_QUEUE" calculation:

drivers/gpu/drm/xe/xe_gt_pagefault.c: In function 'xe_guc_pagefault_handler':
include/linux/fortify-string.h:57:33: error: writing 16 bytes into a region of size 0 [-Werror=stringop-overflow=]
include/linux/fortify-string.h:689:26: note: in expansion of macro '__fortify_memcpy_chk'
 689 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \
 | ^~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/xe/xe_gt_pagefault.c:341:17: note: in expansion of macro 'memcpy'
 341 | memcpy(pf_queue->data + pf_queue->tail, msg, len * sizeof(u32));
 | ^~~~~~
drivers/gpu/drm/xe/xe_gt_types.h:102:25: note: at offset [1144, 265324] into destination object 'tile' of size 8

I found that rewriting the assignment using pointer addition rather than the
equivalent array index calculation prevents the warning, so use that instead.

I sent a bug report against gcc for the false positive warning.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113214
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240103114819.2913937-1-arnd@kernel.org

drm/xe: Pick correct userptr VMA to repin on REMAP op failure

A REMAP op is composed of 3 VMA's - unmap, prev map, and next map. When
op_execute fails with -EAGAIN we need to update the local VMA pointer to
the current op state and then repin the VMA if it is a userptr.

Fixes a failure seen in xe_vm.munmap-style-unbind-userptr-one-partial.

Fixes: b06d47be7c83 ("drm/xe: Port Xe to GPUVA")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240201004849.2219558-3-matthew.brost@intel.com

drm/xe: Take a reference in xe_exec_queue_last_fence_get()

Take a reference in xe_exec_queue_last_fence_get(). Also fix a reference
counting underflow bug in VM bind and unbind.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240201004849.2219558-2-matthew.brost@intel.com

drm/xe: Drop rebind argument from xe_pt_prepare_bind

This is unused, drop it.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240201184844.2317004-1-matthew.brost@intel.com

drm/xe: Fix loop in vm_bind_ioctl_ops_unwind

The logic for the unwind loop is incorrect, resulting in an infinite
loop. Fix the unwind to go from the last operations list to the first.
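
The intended iteration order can be sketched as follows. This is a
standalone model, not the driver code: unwind_one() is a hypothetical
hook that merely records the order in which lists are visited.

```c
#include <assert.h>

#define MAX_LISTS 8

/* Order in which the lists were unwound, recorded for inspection. */
static int unwind_order[MAX_LISTS];
static int unwind_count;

/* Hypothetical per-list undo hook. */
static void unwind_one(int idx)
{
	assert(unwind_count < MAX_LISTS);
	unwind_order[unwind_count++] = idx;
}

/* Walk from the last list to the first; a signed index stops at -1. */
static void unwind_all(int num_lists)
{
	int i;

	for (i = num_lists - 1; i >= 0; --i)
		unwind_one(i);
}
```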

Fixes: 617eebb9c480 ("drm/xe: Fix array of binds")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240201175532.2303168-1-matthew.brost@intel.com

drm/xe/gsc: Add status check during gsc header readout

Before checking if data is present in the message reply check the
status in header and see if it indicates any error.

--v2
- Use drm_err() instead of drm_dbg_kms() [Daniele]

--v3
- Use &xe->drm in drm_err to make it cleaner [Daniele]

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124045248.687023-1-suraj.kandpal@intel.com

drm/xe/vm: Subclass userptr vmas

The construct allocating only parts of the vma structure when
the userptr part is not needed is very fragile. A developer could
add additional fields below the userptr part, and the code could
easily attempt to access the userptr part even if it's not present.

So introduce xe_userptr_vma which subclasses struct xe_vma the
proper way, and accordingly modify a couple of interfaces.
This should also help if adding userptr helpers to drm_gpuvm.
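
The subclassing pattern referred to here is the usual embed-and-
container_of() idiom. A reduced userspace sketch, in which all struct
and field names are simplified assumptions rather than the real
xe_vma / xe_userptr_vma layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Base object, reduced to what the sketch needs. */
struct vma_model {
	int is_userptr;
	uint64_t start, end;
};

/* Subclass: embeds the base by value and adds userptr-only state. */
struct userptr_vma_model {
	struct vma_model vma;
	uint64_t user_address;
};

/* Downcast; only valid for VMAs allocated as userptr_vma_model. */
static struct userptr_vma_model *to_userptr_vma_model(struct vma_model *vma)
{
	assert(vma->is_userptr);
	return container_of(vma, struct userptr_vma_model, vma);
}
```

With this layout a plain VMA simply has no userptr fields to touch,
and new base fields can be added without shifting the subclass part.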

v2:
- Fix documentation of to_userptr_vma() (Matthew Brost)
- Fix allocation and freeing of vmas to more clearly distinguish
between the types.

Closes: https://lore.kernel.org/intel-xe/0c4cc1a7-f409-4597-b110-81f9e45d1ffe@embeddedor.com/T/#u
Fixes: a4cc60a55fd9 ("drm/xe: Only alloc userptr part of xe_vma for userptrs")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240131091628.12318-1-thomas.hellstrom@linux.intel.com

drm/xe: Use LRC prefix rather than CTX prefix in lrc desc defines

The sparc build fails [1] due to CTX_VALID being redefined. Fix this by
using a better naming convention of LRC_VALID as this define is used in
setting bits in the lrc descriptor. To be uniform, change the other
defines to use the LRC prefix too.

[1] https://lore.kernel.org/all/20240123111235.3097079-1-geert@linux-m68k.org/

v2:
- s/LEGACY_64B_CONTEXT/LRC_LEGACY_64B_CONTEXT (Lucas)

Fixes: 0bc519d20ffa ("drm/xe: Remove GEN[0-9]*_ prefixes")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123212638.1605626-1-matthew.brost@intel.com

drm/xe: Make all GuC ABI shift values unsigned

All GuC ABI definitions are unsigned, and not defining the shift values
as unsigned is causing build errors [1].
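
As a generic C illustration of why the signedness matters: shifting a
signed 1 into bit 31 does not fit in a 32-bit int, while the unsigned
form is well defined and usable in constant expressions. The macro
names below are made up for the example and are not GuC ABI names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field definition using an unsigned shift operand. */
#define MSG_ORIGIN_SHIFT 31u
#define MSG_ORIGIN_MASK  (UINT32_C(0x1) << MSG_ORIGIN_SHIFT)

/* The unsigned form is a valid integer constant expression. */
static_assert(MSG_ORIGIN_MASK == UINT32_C(0x80000000), "bit 31 must be set");

/* Set the (hypothetical) origin bit in a message header. */
static uint32_t msg_set_origin(uint32_t header)
{
	return header | MSG_ORIGIN_MASK;
}
```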

[1] https://lore.kernel.org/all/20240123111235.3097079-1-geert@linux-m68k.org/

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240131025424.2087936-1-matthew.brost@intel.com

drm/xe: Convert job timeouts from assert to warning

xe_assert() is intended to be used only for "impossible" situations that
should never be hit (and if they are hit it means there's a driver bug
somewhere); assertions are only compiled into debug builds.

Although we expect jobs submitted by the kernel to be well-behaved and
run without error, timeouts are a legitimate possibility for reasons
beyond our control (bad firmware, flaky hardware, etc.). We should use
a real WARN if we encounter these, even for non-debug builds, to ensure
the issue is being properly highlighted in bug reports and such.

Also give the WARNs more human-readable messages and move them below the
general notice-level message that gets printed for any kind of timeout
to make the errors a bit more understandable.

v2:
- Convert the VM / exec_queue_killed assertion as well. (MattB)

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240130200308.1429134-2-matthew.d.roper@intel.com

drm/xe: Don't use __user error pointers

The error pointer macros are not aware of __user pointers and as a
consequence sparse warns.

Have the copy_mask() function return an integer instead of a __user
pointer.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-5-thomas.hellstrom@linux.intel.com

drm/xe: Annotate mcr_[un]lock()

These functions acquire and release the gt::mcr_lock. Annotate
accordingly.
Fix the corresponding sparse warning.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Fixes: fb1d55efdfcb ("drm/xe: Cleanup OPEN_BRACE style issues")
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-4-thomas.hellstrom@linux.intel.com

drm/xe: drop display/ subdir from include directories

There are very few places that need to include anything from under
display/. Require the display/ prefix in #include directives, and drop
the subdirectory from the header search path.

Sort the include lists while at it.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240122101428.2683468-2-jani.nikula@intel.com

drm/xe: move xe_display.[ch] under display/

All the other display related files are under display/ subdirectory,
also move xe_display.[ch] there.

Sort the build list while at it.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240122101428.2683468-1-jani.nikula@intel.com

drm/xe: Only allow 1 ufence per exec / bind IOCTL

The way exec ufences are coded only 1 ufence per IOCTL will be signaled.
It is possible to fix this but for current use cases 1 ufence per IOCTL
is sufficient. Enforce a limit of 1 ufence per IOCTL (both exec and bind
to be uniform).

v2:
- Add fixes tag (Thomas)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124234413.1640825-1-matthew.brost@intel.com

drm/xe: Add batch buffer addresses to devcoredump

These addresses are needed so that Mesa tools know where in the VM the
batch buffers are, in order to parse them and print human-readable
instructions.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240130135648.30211-2-jose.souza@intel.com

drm/xe: Add functions to convert regular address to canonical address and back

Some instructions require a canonical address, like
MI_BATCH_BUFFER_START (UMDs must call xe_exec with a canonical address
for Xe2+).

So add functions to convert a regular address to a canonical
address and back. The first user of these functions will be added
in the next patch.
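
A minimal userspace sketch of such a conversion, assuming "canonical"
means that the most significant valid address bit (bit va_bits - 1) is
replicated into all higher bits. The function names are hypothetical
and are not the ones added by the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Replicate bit (va_bits - 1) into all bits above it. */
static uint64_t addr_to_canonical(uint64_t addr, unsigned int va_bits)
{
	uint64_t low_mask, msb;

	assert(va_bits >= 1 && va_bits <= 64);
	if (va_bits == 64)
		return addr;

	low_mask = (UINT64_C(1) << va_bits) - 1;
	msb = UINT64_C(1) << (va_bits - 1);
	return (addr & msb) ? (addr | ~low_mask) : (addr & low_mask);
}

/* Drop everything above bit (va_bits - 1) again. */
static uint64_t addr_from_canonical(uint64_t addr, unsigned int va_bits)
{
	assert(va_bits >= 1 && va_bits <= 64);
	if (va_bits == 64)
		return addr;
	return addr & ((UINT64_C(1) << va_bits) - 1);
}
```

With va_bits = 48, for example, 0x0000800000000000 becomes
0xffff800000000000, and converting back yields the original value.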

v3:
- inline removed
- rename highest_address_bit_get() to ppgtt_msb_get()

v4:
- use xe->info.va_bits instead of xe->info.dma_mask_size

BSpec: 47626
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240130135648.30211-1-jose.souza@intel.com

drm/xe: Use function to emit PIPE_CONTROL

This reduces code duplication in xe_ring_ops.

v2:
- fix flags of emit_pipe_imm_ggtt()
- reduce to only one function

v3:
- fix emit_pipe_imm_ggtt() stall_only check

Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240130132249.8615-1-jose.souza@intel.com

drm/xe/guc: Reduce a print from warn to debug

Reduce a print from warn to debug level to avoid an unnecessary warning
message in dmesg: the firmware loading logic already has the right
printk priority level when checking the firmware version.

Fixes: c5a06c9169f3 ("drm/xe/guc: Enable WA 14018913170")
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240125165652.3764711-1-karthik.poosa@intel.com
[ slightly reword debug and commit messages ]
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/xe2: Enable has_usm

When xe2 support started to be added, USM was still not functional. This
has changed, and now USM can be enabled for xe2. Remove FIXME leftover
to allow VM to be created with DRM_XE_VM_CREATE_FLAG_FAULT_MODE.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240129214510.123829-1-lucas.demarchi@intel.com

drm/hwmon: Fix abi doc warnings

This fixes warnings in xe, i915 hwmon docs:

Warning: /sys/devices/.../hwmon/hwmon/curr1_crit is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:35 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:52
Warning: /sys/devices/.../hwmon/hwmon/energy1_input is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:54 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:65
Warning: /sys/devices/.../hwmon/hwmon/in0_input is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:46 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:0
Warning: /sys/devices/.../hwmon/hwmon/power1_crit is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:22 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:39
Warning: /sys/devices/.../hwmon/hwmon/power1_max is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:0 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:8
Warning: /sys/devices/.../hwmon/hwmon/power1_max_interval is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:62 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:30
Warning: /sys/devices/.../hwmon/hwmon/power1_rated_max is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:14 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:22

Use a path containing the driver name to differentiate the documentation
of each entry.

Fixes: fb1b70607f73 ("drm/xe/hwmon: Expose power attributes")
Fixes: 92d44a422d0d ("drm/xe/hwmon: Expose card reactive critical power")
Fixes: fbcdc9d3bf58 ("drm/xe/hwmon: Expose input voltage attribute")
Fixes: 71d0a32524f9 ("drm/xe/hwmon: Expose hwmon energy attribute")
Fixes: 4446fcf220ce ("drm/xe/hwmon: Expose power1_max_interval")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/all/20240125113345.291118ff@canb.auug.org.au/
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240127165040.2348009-1-badal.nilawar@intel.com

drm/xe: Grab mem_access when disabling C6 on skip_guc_pc platforms

If skip_guc_pc is set for a platform, C6 is disabled directly without
acquiring a mem_access reference, triggering an assertion inside
xe_gt_idle_disable_c6.

Fixes: 975e4a3795d4 ("drm/xe: Manually setup C6 when skip_guc_pc is set")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240126220613.865939-2-matthew.d.roper@intel.com

drm/xe: correct the assertion for number of PTEs

While one MI_STORE_DATA_IMM can take no more than 0x1fe qwords,
the size of the pgtable can be 512 entries.
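
The arithmetic behind this, as a small sketch. It assumes one PTE
occupies one qword; the helper names are made up for the example:

```c
#include <assert.h>

#define MAX_QWORDS_PER_STORE 0x1fe	/* per-command limit quoted above */
#define PGTABLE_ENTRIES      512	/* entries in one page table */

/* A full page table does not fit into a single store command. */
static_assert(PGTABLE_ENTRIES > MAX_QWORDS_PER_STORE,
	      "a full page table needs more than one command");

/* Number of store commands needed to write num_ptes entries. */
static unsigned int store_cmds_needed(unsigned int num_ptes)
{
	return (num_ptes + MAX_QWORDS_PER_STORE - 1) / MAX_QWORDS_PER_STORE;
}

/* Size of the next chunk while num_ptes entries remain. */
static unsigned int next_chunk(unsigned int num_ptes)
{
	return num_ptes < MAX_QWORDS_PER_STORE ? num_ptes : MAX_QWORDS_PER_STORE;
}
```

So a request for all 512 entries is legitimate; it is split into a
chunk of 0x1fe (510) entries followed by a chunk of 2.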

Fixes: 43d48379c939 ("drm/xe: correct the calculation of remaining size")
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Tested-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240125065245.1204731-2-fei.yang@intel.com

drm/xe/guc: Flush G2H handler when turning off CTs

Make sure G2H handler is not running when changing the CT state to drop
messages or disabled. This will help prevent races in the code ensuring
that G2H are not being processed after changing the state.

v2:
- s/flush_g2h_handler/stop_g2h_handler (Michal)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo remove the extra line while pushing]
Link: https://patchwork.freedesktop.org/patch/msgid/20240122210156.1517444-4-matthew.brost@intel.com

drm/xe: Move TLB invalidation reset before HW reset

This is a software reset which can be done immediately after stopping
the UC.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240122210156.1517444-3-matthew.brost@intel.com

drm/xe/guc: Add more GuC CT states

The GuC CT has more than just enabled / disabled states; it has 4. The
4 states are not initialized, disabled, stopped, and enabled. Change the
code to reflect this. These states will enable proper return codes from
functions and therefore enable proper error messages.
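
A rough sketch of how four states can map to distinct return codes.
The enum names and the specific errno values are assumptions chosen
for illustration; they are not claimed to match the driver:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical state names modelled on the description above. */
enum ct_state_model {
	CT_STATE_NOT_INITIALIZED = 0,
	CT_STATE_DISABLED,
	CT_STATE_STOPPED,
	CT_STATE_ENABLED,
};

/* Hypothetical check performed before sending a message. */
static int ct_send_check(enum ct_state_model state)
{
	switch (state) {
	case CT_STATE_ENABLED:
		return 0;
	case CT_STATE_STOPPED:
		return -ECANCELED;	/* messages are being dropped */
	case CT_STATE_DISABLED:
		return -ENODEV;		/* channel is off */
	case CT_STATE_NOT_INITIALIZED:
	default:
		/* Using the CT before init is a programming error. */
		assert(!"CT used before initialization");
		return -EINVAL;
	}
}
```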

v2:
- s/XE_GUC_CT_STATE_DROP_MESSAGES/XE_GUC_CT_STATE_STOPPED (Michal)
- Add assert for CT being initialized (Michal)
- Fix kernel for CT state enum (Michal)

v3:
- Kernel doc (Michal)
- s/reiecved/received (Michal)
- assert CT state not initialized in xe_guc_ct_init (Michal)
- add argument xe_guc_ct_set_state to clear g2h (Michal)

v4:
- Drop clear_outstanding_g2h argument (Michal)

v5:
- Move xa_destroy outside of fast lock (CI)

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240122210156.1517444-2-matthew.brost@intel.com

drm/xe: Fix crash in trace_dma_fence_init()

trace_dma_fence_init() uses dma_fence_ops functions
like get_driver_name() and get_timeline_name() to generate trace
information but the Xe KMD implementation of those functions makes
use of xe_hw_fence_ctx that was being set after dma_fence_init().

So just invert the order to fix the crash.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124171830.95774-1-jose.souza@intel.com

drm/xe: don't build debugfs files when CONFIG_DEBUG_FS=n

If we unconditionally build the debugfs files, we'll get both the static
inline stubs from the headers and the real functions for
CONFIG_DEBUG_FS=n. Avoid building the debugfs files with that config.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/r/152521f9-119f-4c61-b467-3e91f4aecb1a@infradead.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124090515.3363901-1-jani.nikula@intel.com

drm/xe: Remove additional spaces in devcoredump HW Engines section

I guess the indentation was meant to keep it visually aligned, but that
would require a lot of spaces and was not followed by other registers,
so let's just drop it.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-7-jose.souza@intel.com

drm/xe: Print registers spread in 2 u32 as u64

This makes it easier to use those registers when copying their values to
a calculator and also makes it easier for tools to parse them.

To avoid padding holes in xe_hw_engine_snapshot, the u64 variables
were moved to the top of xe_hw_engine_snapshot.reg.
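
A small sketch of both ideas, assuming nothing about the real snapshot
layout: reg_pair_to_u64(), format_reg(), and the two structs with their
field names are hypothetical and exist only to illustrate combining two
dwords and grouping the 64-bit members first.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Combine the upper and lower dwords of a register pair. */
uint64_t reg_pair_to_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Print one 64-bit value instead of two separate 32-bit lines. */
int format_reg(char *buf, size_t len, const char *name,
	       uint32_t hi, uint32_t lo)
{
	return snprintf(buf, len, "%s: 0x%016" PRIx64, name,
			reg_pair_to_u64(hi, lo));
}

/* Illustrative layouts only. Mixing u32 and u64 members can leave
 * alignment holes; grouping the u64 members first avoids them. */
struct snap_interleaved {
	uint32_t a;
	uint64_t acthd;
	uint32_t b;
	uint64_t bbaddr;
};

struct snap_u64_first {
	uint64_t acthd;
	uint64_t bbaddr;
	uint32_t a;
	uint32_t b;
};
```

On common 64-bit ABIs the interleaved layout is larger than the grouped
one; the exact sizes depend on the target's alignment rules.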

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-6-jose.souza@intel.com

drm/xe: Print more device information in devcoredump

To properly decode batch buffers, Mesa tools need to know which
platform this is. For now we can do that with the PCI ID, but make it
future proof by also printing the GT GMD version.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-5-jose.souza@intel.com

drm/xe: Stash GMD_ID value in xe_gt

Although we've stored the major and minor versions for graphics/media in
xe_device, it will be simpler to implement the uapi version query if we
also stash the raw register value in the GT itself.

Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-4-jose.souza@intel.com
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>

drm/xe: Nuke xe from xe_devcoredump

xe is never set in xe_devcoredump but if xe_device is needed
devcoredump_to_xe_device() can be used.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-3-jose.souza@intel.com

drm/xe: Change devcoredump functions parameters to xe_sched_job

When devcoredump starts to dump the VM contents, it will be necessary
to know the starting addresses of the batch buffers of the job that hung.

This information is set in xe_sched_job, and xe_sched_job is not easily
accessible from xe_exec_queue, so change the parameter here. The next
patch will append the batch buffer addresses to the devcoredump snapshot
capture.

v3:
- update functions documentation to xe_sched_job

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-2-jose.souza@intel.com

drm/xe: Remove double new lines in devcoredump

Right now devcoredump has a new line between '**** GuC CT ****' and
'H2G CTB (all sizes in DW):' while other sections don't have one.

v2: remove double new line after IPEHR

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Cc: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123204454.246788-1-jose.souza@intel.com

drm/xe: Remove PVC from xe_wa kunit tests

Since the PCI IDs for PVC weren't added to the xe driver, the xe_wa
tests should not try to create a fake PVC device since they can't find
the right PCI ID. Fix bugs when running kunit:

# xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111
Expected ret == 0, but
 ret == -19 (0xffffffffffffffed)
[FAILED] PVC (B0)
# xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111
Expected ret == 0, but
 ret == -19 (0xffffffffffffffed)
[FAILED] PVC (B1)
# xe_wa_gt: ASSERTION FAILED at drivers/gpu/drm/xe/tests/xe_wa_test.c:111
Expected ret == 0, but
 ret == -19 (0xffffffffffffffed)
[FAILED] PVC (C0)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123031242.3548724-1-lucas.demarchi@intel.com

drm/xe: Document nested struct members according to guidelines

Document nested struct members with full names as described in
Documentation/doc-guide/kernel-doc.rst.

For this documentation we allow a column width of 100 to make
it more readable.

This fixes warnings similar to:
drivers/gpu/drm/xe/xe_lrc_types.h:45: warning: Excess struct member 'size' description in 'xe_lrc'

v2:
- Only change the documentation, not the member.

v3:
- Fix the commit message wording.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123153147.27305-1-thomas.hellstrom@linux.intel.com

drm/xe/xe2_lpg: Introduce performance guide changes

Add performance guide changes to Xe2_LPG.

BSpec: 72161
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123050552.2250699-2-shekhar.chauhan@intel.com

drm/xe: Fix typo in vram frequency sysfs documentation

Fix function naming and description for xe_vram_freq_sysfs_init
function.

v2: Add fixes tag (Riana)
Fix review comments (Lucas)

Fixes: 4ae3aeab32d7 ("drm/xe: Add vram frequency sysfs attributes")
Signed-off-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117043215.2598677-1-sujaritha.sundaresan@intel.com

Merge drm/drm-next into drm-xe-next

Sync to v6.8-rc1.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

drm/xe/vm: bugfix in xe_vm_create_ioctl

Fix xe_vm_create_ioctl routine not freeing the vm-id allocated to it
when the function fails.
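
The usual shape of such a fix is error unwinding that releases the id on
every failure path after it was allocated. The following is a userspace
sketch with a toy id table; id_alloc(), id_free(), ids_in_use(), and
vm_create_sketch() are hypothetical names and do not mirror the driver's
actual allocator or ioctl.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_IDS 8

struct id_table {
	bool used[MAX_IDS];
};

int id_alloc(struct id_table *t)
{
	for (int i = 0; i < MAX_IDS; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			return i;
		}
	}
	return -ENOSPC;
}

void id_free(struct id_table *t, int id)
{
	t->used[id] = false;
}

int ids_in_use(const struct id_table *t)
{
	int n = 0;

	for (int i = 0; i < MAX_IDS; i++)
		n += t->used[i];
	return n;
}

/* later_step_fails simulates a failure after the id was allocated,
 * for example copying the result back to userspace. */
int vm_create_sketch(struct id_table *t, bool later_step_fails)
{
	int id = id_alloc(t);
	int err;

	if (id < 0)
		return id;

	if (later_step_fails) {
		err = -EFAULT;
		goto err_free_id;
	}
	return id;

err_free_id:
	id_free(t, id);		/* without this, the id would leak */
	return err;
}
```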

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240122102424.4008095-1-mhaimovski@habana.ai

drm/xe/xe2: Use XE_CACHE_WB pat index

The pat table entry associated with XE_CACHE_WB is coherent whereas
XE_CACHE_NONE is non coherent. Migration expects the coherency
with cpu therefore use the coherent entry XE_CACHE_WB for
buffers not supporting compression. For read/write to flat ccs region
the issue is not related to coherency with cpu. The hardware expects
the pat index associated with GPUVA for indirect access to be
compression enabled hence use XE_CACHE_NONE_COMPRESSION.

v2
- Fix the argument to emit_pte, pass the bool directly. (Thomas)

v3
- Rebase
- Update commit message (Matt)

v4
- Add a Fixes: tag. (Thomas)

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Fixes: 65ef8dbad1db ("drm/xe/xe2: Update emit_pte to use compression enabled PAT index")
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240119041826.1670496-1-himal.prasad.ghimiray@intel.com

Linux 6.8-rc1

Merge tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs

Pull more bcachefs updates from Kent Overstreet:
"Some fixes, some refactoring, some minor features:

   - Assorted prep work for disk space accounting rewrite

   - BTREE_TRIGGER_ATOMIC: after combining our trigger callbacks, this
     makes our trigger context more explicit

   - A few fixes to avoid excessive transaction restarts on
     multithreaded workloads: fstests (in addition to ktest tests) are
     now checking slowpath counters, and that's shaking out a few bugs

   - Assorted tracepoint improvements

   - Starting to break up bcachefs_format.h and move on disk types so
     they're with the code they belong to; this will make room to start
     documenting the on disk format better.

   - A few minor fixes"

* tag 'bcachefs-2024-01-21' of https://evilpiepirate.org/git/bcachefs: (46 commits)
  bcachefs: Improve inode_to_text()
  bcachefs: logged_ops_format.h
  bcachefs: reflink_format.h
  bcachefs; extents_format.h
  bcachefs: ec_format.h
  bcachefs: subvolume_format.h
  bcachefs: snapshot_format.h
  bcachefs: alloc_background_format.h
  bcachefs: xattr_format.h
  bcachefs: dirent_format.h
  bcachefs: inode_format.h
  bcachefs; quota_format.h
  bcachefs: sb-counters_format.h
  bcachefs: counters.c -> sb-counters.c
  bcachefs: comment bch_subvolume
  bcachefs: bch_snapshot::btime
  bcachefs: add missing __GFP_NOWARN
  bcachefs: opts->compression can now also be applied in the background
  bcachefs: Prep work for variable size btree node buffers
  bcachefs: grab s_umount only if snapshotting
  ...

Merge tag 'timers-core-2024-01-21' of git://git./linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
"Updates for time and clocksources:

   - A fix for the idle and iowait time accounting vs CPU hotplug.

     The time is reset on CPU hotplug which makes the accumulated
     systemwide time jump backwards.

   - Assorted fixes and improvements for clocksource/event drivers"

* tag 'timers-core-2024-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug
  clocksource/drivers/ep93xx: Fix error handling during probe
  clocksource/drivers/cadence-ttc: Fix some kernel-doc warnings
  clocksource/drivers/timer-ti-dm: Fix make W=n kerneldoc warnings
  clocksource/timer-riscv: Add riscv_clock_shutdown callback
  dt-bindings: timer: Add StarFive JH8100 clint
  dt-bindings: timer: thead,c900-aclint-mtimer: separate mtime and mtimecmp regs

Merge tag 'powerpc-6.8-2' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Aneesh Kumar:

- Increase default stack size to 32KB for Book3S

Thanks to Michael Ellerman.

* tag 'powerpc-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Increase default stack size to 32KB

bcachefs: Improve inode_to_text()

Add line breaks - inode_to_text() is now much easier to read.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: logged_ops_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: reflink_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs; extents_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: ec_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: subvolume_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: snapshot_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: alloc_background_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: xattr_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: dirent_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: inode_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs; quota_format.h

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: sb-counters_format.h

bcachefs_format.h has gotten too big; let's do some organizing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: counters.c -> sb-counters.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: comment bch_subvolume

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch_snapshot::btime

Add a field to bch_snapshot for creation time; this will be important
when we start exposing the snapshot tree to userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: add missing __GFP_NOWARN

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: opts->compression can now also be applied in the background

The "apply this compression method in the background" paths now use the
compression option if background_compression is not set; this means that
setting or changing the compression option will cause existing data to
be compressed accordingly in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Prep work for variable size btree node buffers

bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we now have significant
memory overhead in mostly empty btree roots.

And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analogous to XFS's
AGIs.

Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: grab s_umount only if snapshotting

When I was testing mongodb over bcachefs with compression,
there was a lockdep warning when snapshotting the mongodb data volume.

$ cat test.sh
prog=bcachefs

$prog subvolume create /mnt/data
$prog subvolume create /mnt/data/snapshots

while true;do
 $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s)
 sleep 1s
done

$ cat /etc/mongodb.conf
systemLog:
 destination: file
 logAppend: true
 path: /mnt/data/mongod.log

storage:
 dbPath: /mnt/data/

lockdep reports:
[ 3437.452330] ======================================================
[ 3437.452750] WARNING: possible circular locking dependency detected
[ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E
[ 3437.453562] ------------------------------------------------------
[ 3437.453981] bcachefs/35533 is trying to acquire lock:
[ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190
[ 3437.454875]
 but task is already holding lock:
[ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.456009]
 which lock already depends on the new lock.

[ 3437.456553]
 the existing dependency chain (in reverse order) is:
[ 3437.457054]
 -> #3 (&type->s_umount_key#48){.+.+}-{3:3}:
[ 3437.457507] down_read+0x3e/0x170
[ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.458206] __x64_sys_ioctl+0x93/0xd0
[ 3437.458498] do_syscall_64+0x42/0xf0
[ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.459155]
 -> #2 (&c->snapshot_create_lock){++++}-{3:3}:
[ 3437.459615] down_read+0x3e/0x170
[ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs]
[ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs]
[ 3437.460686] notify_change+0x1f1/0x4a0
[ 3437.461283] do_truncate+0x7f/0xd0
[ 3437.461555] path_openat+0xa57/0xce0
[ 3437.461836] do_filp_open+0xb4/0x160
[ 3437.462116] do_sys_openat2+0x91/0xc0
[ 3437.462402] __x64_sys_openat+0x53/0xa0
[ 3437.462701] do_syscall_64+0x42/0xf0
[ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.463359]
 -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}:
[ 3437.463843] down_write+0x3b/0xc0
[ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs]
[ 3437.464493] vfs_write+0x21b/0x4c0
[ 3437.464653] ksys_write+0x69/0xf0
[ 3437.464839] do_syscall_64+0x42/0xf0
[ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.465231]
 -> #0 (sb_writers#10){.+.+}-{0:0}:
[ 3437.465471] __lock_acquire+0x1455/0x21b0
[ 3437.465656] lock_acquire+0xc6/0x2b0
[ 3437.465822] mnt_want_write+0x46/0x1a0
[ 3437.465996] filename_create+0x62/0x190
[ 3437.466175] user_path_create+0x2d/0x50
[ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.466617] __x64_sys_ioctl+0x93/0xd0
[ 3437.466791] do_syscall_64+0x42/0xf0
[ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.467180]
 other info that might help us debug this:

[ 3437.467507] Chain exists of:
 sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48

[ 3437.467979] Possible unsafe locking scenario:

[ 3437.468223] CPU0 CPU1
[ 3437.468405] ---- ----
[ 3437.468585] rlock(&type->s_umount_key#48);
[ 3437.468758] lock(&c->snapshot_create_lock);
[ 3437.469030] lock(&type->s_umount_key#48);
[ 3437.469291] rlock(sb_writers#10);
[ 3437.469434]
 *** DEADLOCK ***

[ 3437.469670] 2 locks held by bcachefs/35533:
[ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs]
[ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs]
[ 3437.470744]
 stack backtrace:
[ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85
[ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 3437.471694] Call Trace:
[ 3437.471795] <TASK>
[ 3437.471884] dump_stack_lvl+0x57/0x90
[ 3437.472035] check_noncircular+0x132/0x150
[ 3437.472202] __lock_acquire+0x1455/0x21b0
[ 3437.472369] lock_acquire+0xc6/0x2b0
[ 3437.472518] ? filename_create+0x62/0x190
[ 3437.472683] ? lock_is_held_type+0x97/0x110
[ 3437.472856] mnt_want_write+0x46/0x1a0
[ 3437.473025] ? filename_create+0x62/0x190
[ 3437.473204] filename_create+0x62/0x190
[ 3437.473380] user_path_create+0x2d/0x50
[ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs]
[ 3437.473819] ? lock_acquire+0xc6/0x2b0
[ 3437.474002] ? __fget_files+0x2a/0x190
[ 3437.474195] ? __fget_files+0xbc/0x190
[ 3437.474380] ? lock_release+0xc5/0x270
[ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0
[ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs]
[ 3437.475090] __x64_sys_ioctl+0x93/0xd0
[ 3437.475277] do_syscall_64+0x42/0xf0
[ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 3437.475691] RIP: 0033:0x7f2743c313af
======================================================

In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally
and unlock it at the end of the function. There is a comment
"why do we need this lock?" about the lock, coming from
commit 42d237320e98 ("bcachefs: Snapshot creation, deletion").
The reason is that __bch2_ioctl_subvolume_create() calls
sync_inodes_sb(), which requires s_umount to be held, to write back all
dirty inodes before doing the snapshot work.

Fix it by read locking s_umount for snapshotting only and unlocking
s_umount after sync_inodes_sb().
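
The narrowed locking scope can be sketched with mock primitives. This is
not the bcachefs code: struct mock_sb, the mock_* helpers, and
subvolume_create_sketch() are hypothetical, and the lock is modeled as a
plain counter so the invariant is easy to check.

```c
#include <assert.h>
#include <stdbool.h>

struct mock_sb {
	int s_umount_readers;	/* readers currently holding s_umount */
	int times_locked;	/* how often s_umount was taken */
	int inodes_synced;	/* how often writeback ran */
};

void mock_down_read(struct mock_sb *sb)
{
	sb->s_umount_readers++;
	sb->times_locked++;
}

void mock_up_read(struct mock_sb *sb)
{
	assert(sb->s_umount_readers > 0);
	sb->s_umount_readers--;
}

/* Models sync_inodes_sb(), which expects s_umount to be held. */
void mock_sync_inodes_sb(struct mock_sb *sb)
{
	assert(sb->s_umount_readers > 0);
	sb->inodes_synced++;
}

int subvolume_create_sketch(struct mock_sb *sb, bool snapshot)
{
	if (snapshot) {
		mock_down_read(sb);
		mock_sync_inodes_sb(sb);
		mock_up_read(sb);	/* dropped right after writeback */
	}

	/* Path creation, which takes sb_writers, would run here with
	 * s_umount released, so the two locks no longer nest. */
	assert(sb->s_umount_readers == 0);
	return 0;
}
```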

Signed-off-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exit

bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut.
It should be freed by kvfree not kfree.
Otherwise umount will trigger:

[ 406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008
[ 406.830676 ] #PF: supervisor read access in kernel mode
[ 406.831643 ] #PF: error_code(0x0000) - not-present page
[ 406.832487 ] PGD 0 P4D 0
[ 406.832898 ] Oops: 0000 [#1] PREEMPT SMP PTI
[ 406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G OE 6.7.0-rc7-custom+ #90
[ 406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[ 406.835796 ] RIP: 0010:kfree+0x62/0x140
[ 406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6
[ 406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286
[ 406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4
[ 406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000
[ 406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001
[ 406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80
[ 406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000
[ 406.840451 ] FS: 00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000
[ 406.840851 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0
[ 406.841464 ] Call Trace:
[ 406.841583 ] <TASK>
[ 406.841682 ] ? __die+0x1f/0x70
[ 406.841828 ] ? page_fault_oops+0x159/0x470
[ 406.842014 ] ? fixup_exception+0x22/0x310
[ 406.842198 ] ? exc_page_fault+0x1ed/0x200
[ 406.842382 ] ? asm_exc_page_fault+0x22/0x30
[ 406.842574 ] ? bch2_fs_release+0x54/0x280 [bcachefs]
[ 406.842842 ] ? kfree+0x62/0x140
[ 406.842988 ] ? kfree+0x104/0x140
[ 406.843138 ] bch2_fs_release+0x54/0x280 [bcachefs]
[ 406.843390 ] kobject_put+0xb7/0x170
[ 406.843552 ] deactivate_locked_super+0x2f/0xa0
[ 406.843756 ] cleanup_mnt+0xba/0x150
[ 406.843917 ] task_work_run+0x59/0xa0
[ 406.844083 ] exit_to_user_mode_prepare+0x197/0x1a0
[ 406.844302 ] syscall_exit_to_user_mode+0x16/0x40
[ 406.844510 ] do_syscall_64+0x4e/0xf0
[ 406.844675 ] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 406.844907 ] RIP: 0033:0x7f0a2664e4fb

Signed-off-by: Su Yue <glass.su@suse.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bios must be 512 byte algined

Fixes: 023f9ac9f70f ("bcachefs: Delete dio read alignment check")
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: remove redundant variable tmp

The variable tmp is being assigned a value but it isn't being
read afterwards. The assignment is redundant and so tmp can be
removed.

Cleans up clang scan build warning:
warning: Although the value stored to 'ret' is used in the enclosing
expression, the value is never actually read from 'ret'
[deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve trace_trans_restart_relock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Fix excess transaction restarts in __bchfs_fallocate()

drop_locks_do() should not be used in a fastpath without first trying
the operation in nonblocking mode - the unlock and relock will cause
excessive transaction restarts and potentially livelock with other
threads that are contending for the same locks.
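
The pattern described above, sketched generically: attempt the operation
without blocking, and only fall back to the unlock/relock slow path when
it would block. The types and names (trans_sketch, op_fn,
try_then_drop_locks(), and the two sample operations) are hypothetical;
the slow path is modeled as a counter rather than real locking.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct trans_sketch {
	int slowpath_taken;	/* counts unlock + relock round trips */
};

typedef int (*op_fn)(void *arg, bool nonblocking);

int try_then_drop_locks(struct trans_sketch *trans, op_fn op, void *arg)
{
	/* Fast path: keep locks held and try without blocking. */
	int ret = op(arg, true);

	if (ret != -EAGAIN)
		return ret;

	/* Slow path: models dropping locks, running the blocking
	 * operation, and relocking afterwards. */
	trans->slowpath_taken++;
	return op(arg, false);
}

/* Sample operation that never needs to block. */
int op_ready(void *arg, bool nonblocking)
{
	(void)arg;
	(void)nonblocking;
	return 0;
}

/* Sample operation that only succeeds when allowed to block. */
int op_would_block(void *arg, bool nonblocking)
{
	(void)arg;
	return nonblocking ? -EAGAIN : 0;
}
```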

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: extents_to_bp_state

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bkey_and_val_eq()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Better journal tracepoints

Factor out bch2_journal_bufs_to_text(), and use it in the
journal_entry_full() tracepoint; when we can't get a journal reservation
we need to know the outstanding journal entry sizes to know if the
problem is due to excessive flushing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Print size of superblock with space allocated

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Avoid flushing the journal in the discard path

When issuing discards, we may need to flush the journal if there are too
many buckets that can't be discarded until a journal flush.

But the heuristic was bad; we should be comparing the number of buckets
that need a journal flush against the number of free buckets, not the
number of buckets we saw.
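
A sketch of the corrected comparison. The struct, the function name, and
in particular the threshold (more than half of the free buckets) are
assumptions for illustration; this message does not state the exact
ratio used.

```c
#include <stdbool.h>
#include <stdint.h>

struct discard_stats {
	uint64_t seen;			/* buckets examined in this pass */
	uint64_t need_journal_commit;	/* buckets waiting on a flush */
};

/* Compare against free buckets, not against the buckets seen. */
bool should_flush_journal(const struct discard_stats *s,
			  uint64_t free_buckets)
{
	return s->need_journal_commit * 2 > free_buckets;
}
```

With plenty of free buckets, a few buckets waiting on a journal flush do
not justify flushing, no matter how few buckets the pass happened to see.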

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Improve move_extent tracepoint

Also print out the data_opts, so that we can see what specifically is
being done to an extent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add missing bch2_moving_ctxt_flush_all()

This fixes a bug with rebalance IOs getting stuck with reads completed,
but writes never being issued.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Re-add move_extent_write tracepoint

It appears this was accidentally deleted at some point - also, do a bit
of cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: bch2_kthread_io_clock_wait() no longer sleeps until full amount

Drop the loop in bch2_kthread_io_clock_wait(): this allows the code
that uses it to be woken up for other reasons, and fixes a bug where
rebalance wouldn't wake up when a scan was requested.

This raises the possibility of spurious wakeups, but callers should
always be able to handle that reasonably well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Add .val_to_text() for KEY_TYPE_cookie

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bcachefs: Don't pass memcmp() as a pointer

Some (buggy!) compilers have issues with this.
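
One common way to avoid taking the address of memcmp() is a thin wrapper
with the same signature, since compilers may treat memcmp as a builtin.
The sketch below is generic C, not the bcachefs change itself; cmp_fn,
memcmp_wrapper(), and find_elem() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef int (*cmp_fn)(const void *l, const void *r, size_t size);

/* Ordinary function whose address can be taken safely. */
int memcmp_wrapper(const void *l, const void *r, size_t size)
{
	return memcmp(l, r, size);
}

/* Generic helper that takes a comparison callback: returns the index
 * of the first element equal to key, or -1 if there is none. */
ptrdiff_t find_elem(const void *base, size_t nr, size_t size,
		    const void *key, cmp_fn cmp)
{
	const char *p = base;

	for (size_t i = 0; i < nr; i++)
		if (!cmp(p + i * size, key, size))
			return (ptrdiff_t)i;
	return -1;
}
```

Callers pass memcmp_wrapper where they would otherwise have passed
memcmp directly.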

Fixes: https://github.com/koverstreet/bcachefs/issues/625
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Merge tag 'header_cleanup-2024-01-20' of https://evilpiepirate.org/git/bcachefs

Pull header fix from Kent Overstreet:
"Just one small fixup for the RT build"

* tag 'header_cleanup-2024-01-20' of https://evilpiepirate.org/git/bcachefs:
spinlock: Fix failing build for PREEMPT_RT

bcachefs: Reduce would_deadlock restarts

We don't have to take locks in any particular ordering - we'll make
forward progress just fine - but if we try to stick to an ordering, it
can help to avoid excessive would_deadlock transaction restarts.

This tweaks the reflink path to take extents btree locks in the right
order.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>