linux-block.git
4 months agodrm/xe: Remove uninitialized end var from xe_gt_tlb_invalidation_range()
Nirmoy Das [Mon, 29 Apr 2024 20:30:39 +0000 (22:30 +0200)]
drm/xe: Remove uninitialized end var from xe_gt_tlb_invalidation_range()

This fixes commit c4f18703629d ("drm/xe: Add
xe_gt_tlb_invalidation_range and convert PT layer to use this")
which added the end variable as part of the function param.

v2: Add fixes tag(Matt)

Fixes: c4f18703629d ("drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240429203039.26918-1-nirmoy.das@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe: Merge 16021540221 and 18034896535 WAs
Lucas De Marchi [Sat, 27 Apr 2024 13:53:39 +0000 (06:53 -0700)]
drm/xe: Merge 16021540221 and 18034896535 WAs

In order to detect duplicate implementations for the same workaround,
early in the implementation of RTP it was decided to error out even if
the values set are exactly the same. With the introduction of 18034896535
in commit 74671d23ca18 ("drm/xe/xe2: Add workaround 18034896535"), LNL
stepping with graphics stepping A1 now gives the following error on
module load:

xe 0000:00:02.0: [drm] *ERROR* GT0: [GT OTHER] \
discarding save-restore reg e48c (clear: 00000200, set: 00000200,\
masked: yes, mcr: yes): ret=-22

RTP may be improved in the future, but for now simply join the entries
like done with e.g. "160729762716070303171607186500".

Fixes: 74671d23ca18 ("drm/xe/xe2: Add workaround 18034896535")
Cc: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240427135339.3485559-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/xe2hpg: Add Wa_14021490052
Shekhar Chauhan [Wed, 24 Apr 2024 03:42:47 +0000 (09:12 +0530)]
drm/xe/xe2hpg: Add Wa_14021490052

Add Wa_14021490052 for Xe2HPG 20.01.

Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424034247.1352755-1-shekhar.chauhan@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe: Delete PT update selftest
Matthew Brost [Thu, 25 Apr 2024 04:55:13 +0000 (21:55 -0700)]
drm/xe: Delete PT update selftest

IGTs (e.g. xe_vm) can provide the exact same coverage as the PT update
selftest. The PT update selftest is dependent on internal functions
which can change thus maintaining this test is costly and provide no
extra coverage. Delete this test.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-14-matthew.brost@intel.com
4 months agodrm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this
Matthew Brost [Thu, 25 Apr 2024 04:55:12 +0000 (21:55 -0700)]
drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this

xe_gt_tlb_invalidation_range accepts a start and end address rather than
a VMA. This will enable multiple VMAs to be invalidated in a single
invalidation. Update the PT layer to use this new function.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-13-matthew.brost@intel.com
4 months agodrm/xe: Move ufence add to vm_bind_ioctl_ops_fini
Matthew Brost [Thu, 25 Apr 2024 04:55:11 +0000 (21:55 -0700)]
drm/xe: Move ufence add to vm_bind_ioctl_ops_fini

Rather than adding a ufence to a VMA in the bind function, add the
ufence to all VMAs in the IOCTL that require binds in
vm_bind_ioctl_ops_fini. This help withs the transition to job 1 per VM
bind IOCTL.

v2:
 - Rebase
v3:
 - Fix typo in commit (Oak)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-12-matthew.brost@intel.com
4 months agodrm/xe: Move ufence check to op_lock_and_prep
Matthew Brost [Thu, 25 Apr 2024 04:55:10 +0000 (21:55 -0700)]
drm/xe: Move ufence check to op_lock_and_prep

Rather than checking for an unsignaled ufence ay unbind time, check for
this during the op_lock_and_prep function. This helps with the
transition to job 1 per VM bind IOCTL.

v2:
 - Rebase
v3:
 - Fix typo in commit message (Oak)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-11-matthew.brost@intel.com
4 months agodrm/xe: Add vm_bind_ioctl_ops_fini helper
Matthew Brost [Thu, 25 Apr 2024 04:55:09 +0000 (21:55 -0700)]
drm/xe: Add vm_bind_ioctl_ops_fini helper

Simplify VM bind code by signaling out-fences / destroying VMAs in a
single location. Will help with transition single job for many bind ops.

v2:
 - s/vm_bind_ioctl_ops_install_fences/vm_bind_ioctl_ops_fini (Oak)
 - Set last fence in vm_bind_ioctl_ops_fini (Oak)

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-10-matthew.brost@intel.com
4 months agodrm/xe: Add some members to xe_vma_ops
Matthew Brost [Thu, 25 Apr 2024 04:55:08 +0000 (21:55 -0700)]
drm/xe: Add some members to xe_vma_ops

This will help with moving to single jobs for many bind operations.

v2:
 - Rebase

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-9-matthew.brost@intel.com
4 months agodrm/xe: Use xe_vma_ops to implement page fault rebinds
Matthew Brost [Thu, 25 Apr 2024 04:55:07 +0000 (21:55 -0700)]
drm/xe: Use xe_vma_ops to implement page fault rebinds

In effort to make multiple VMA binds operations atomic (1 job), all
device page tables updates will be implemented via a xe_vma_ops (atomic
unit) interface,

Add xe_vma_rebind function which is implemented using xe_vma_ops
interface. Use xe_vma_rebind in GPU page faults for rebinds rather than
directly called deprecated function in PT layer.

v3:
 - Update commit message (Oak)
v4:
 - Fix tile_mask argument (CI)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-8-matthew.brost@intel.com
4 months agodrm/xe: Simplify VM bind IOCTL error handling and cleanup
Matthew Brost [Thu, 25 Apr 2024 04:55:06 +0000 (21:55 -0700)]
drm/xe: Simplify VM bind IOCTL error handling and cleanup

Clean up everything in VM bind IOCTL in 1 path for both errors and
non-errors. Also move VM bind IOCTL cleanup from ops (also used by
non-IOCTL binds) to the VM bind IOCTL.

v2:
 - Break ops_execute on error (Oak)

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-7-matthew.brost@intel.com
4 months agodrm/xe: Use xe_vma_ops to implement xe_vm_rebind
Matthew Brost [Thu, 25 Apr 2024 04:55:05 +0000 (21:55 -0700)]
drm/xe: Use xe_vma_ops to implement xe_vm_rebind

All page tables updates are moving to a xe_vma_ops interface to
implement 1 job per VM bind IOCTL. Convert xe_vm_rebind to use a
xe_vma_ops based interface.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-6-matthew.brost@intel.com
4 months agodrm/xe: Add struct xe_vma_ops abstraction
Matthew Brost [Thu, 25 Apr 2024 04:55:04 +0000 (21:55 -0700)]
drm/xe: Add struct xe_vma_ops abstraction

Having a structure which encapsulates a list of VMA operations will help
enable 1 job for the entire list.

v2:
 - Rebase

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-5-matthew.brost@intel.com
4 months agodrm/xe: Move migrate to prefetch to op_lock_and_prep function
Matthew Brost [Thu, 25 Apr 2024 04:55:03 +0000 (21:55 -0700)]
drm/xe: Move migrate to prefetch to op_lock_and_prep function

All non-binding operations in VM bind IOCTL should be in the lock and
prepare step rather than the execution step. Move prefetch to conform to
this pattern.

v2:
 - Rebase
 - New function names (Oak)
 - Update stale comment (Oak)

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-4-matthew.brost@intel.com
4 months agodrm/xe: Add ops_execute function which returns a fence
Matthew Brost [Thu, 25 Apr 2024 04:55:02 +0000 (21:55 -0700)]
drm/xe: Add ops_execute function which returns a fence

Add ops_execute function which returns a fence. This will be helpful to
initiate all binds (VM bind IOCTL, rebinds in exec IOCTL, rebinds in
preempt rebind worker, and rebinds in pagefaults) via a gpuva ops list.
Returning a fence is needed in various paths.

v2:
 - Rebase

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-3-matthew.brost@intel.com
4 months agodrm/xe: Lock all gpuva ops during VM bind IOCTL
Matthew Brost [Thu, 25 Apr 2024 04:55:01 +0000 (21:55 -0700)]
drm/xe: Lock all gpuva ops during VM bind IOCTL

Lock all BOs used in gpuva ops and validate all BOs in a single step
during the VM bind IOCTL.

This help with the transition to making all gpuva ops in a VM bind IOCTL
a single atomic job which is required for proper error handling.

v2:
 - Better commit message (Oak)
 - s/op_lock/op_lock_and_prep, few other renames too (Oak)
 - Use DRM_EXEC_IGNORE_DUPLICATES flag in drm_exec_init (local testing)
 - Do not reserve slots in locking step (direction based on series from Thomas)
v3:
 - Validate BO if is immediate set (Oak)

Cc: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-2-matthew.brost@intel.com
4 months agodrm/xe/display: Fix ADL-N detection
Lucas De Marchi [Thu, 25 Apr 2024 18:16:09 +0000 (11:16 -0700)]
drm/xe/display: Fix ADL-N detection

Contrary to i915, in xe ADL-N is kept as a different platform, not a
subplatform of ADL-P. Since the display side doesn't need to
differentiate between P and N, i.e. IS_ALDERLAKE_P_N() is never called,
just fixup the compat header to check for both P and N.

Moving ADL-N to be a subplatform would be more complex as the firmware
loading in xe only handles platforms, not subplatforms, as going forward
the direction is to check on IP version rather than
platforms/subplatforms.

Fix warning when initializing display:

xe 0000:00:02.0: [drm:intel_pch_type [xe]] Found Alder Lake PCH
------------[ cut here ]------------
xe 0000:00:02.0: drm_WARN_ON(!((dev_priv)->info.platform == XE_ALDERLAKE_S) && !((dev_priv)->info.platform == XE_ALDERLAKE_P))

And wrong paths being taken on the display side.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425181610.2704633-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe: Fix spelling mistake "forcebly" -> "forcibly"
Colin Ian King [Fri, 26 Apr 2024 09:49:04 +0000 (10:49 +0100)]
drm/xe: Fix spelling mistake "forcebly" -> "forcibly"

There is a spelling mistake in a drm_dbg message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240426094904.816033-1-colin.i.king@gmail.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/pf: Initialize and update PF services on driver init
Michal Wajdeczko [Thu, 25 Apr 2024 14:39:27 +0000 (16:39 +0200)]
drm/xe/pf: Initialize and update PF services on driver init

The xe_gt_sriov_pf_init_early() and xe_gt_sriov_pf_init_hw() are
ideal places to call per-GT PF service init and update functions.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425143927.2265-2-michal.wajdeczko@intel.com
4 months agodrm/xe/pf: Re-initialize SR-IOV specific HW settings
Michal Wajdeczko [Thu, 25 Apr 2024 14:39:26 +0000 (16:39 +0200)]
drm/xe/pf: Re-initialize SR-IOV specific HW settings

On older platforms (12.00) the PF driver must explicitly unblock
VF's modifications to the GGTT. On newer platforms this capability
is enabled by default.

Bspec: 49908, 53204
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425143927.2265-1-michal.wajdeczko@intel.com
4 months agodrm/xe: Change xe_guc_submit_stop return to void
Himal Prasad Ghimiray [Wed, 24 Apr 2024 04:19:11 +0000 (09:49 +0530)]
drm/xe: Change xe_guc_submit_stop return to void

The function xe_guc_submit_stop consistently returns 0 without an error
state, prompting the caller to verify it, which is redundant.

Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424041911.2184868-1-himal.prasad.ghimiray@intel.com
4 months agodrm/xe: Use xe_bo_lock()/xe_bo_unlock() helpers
Himal Prasad Ghimiray [Wed, 24 Apr 2024 04:39:10 +0000 (10:09 +0530)]
drm/xe: Use xe_bo_lock()/xe_bo_unlock() helpers

There is no change in functionality. Using the helper function
defined within the driver for locking/unlocking the reservation
object.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424043910.2190376-3-himal.prasad.ghimiray@intel.com
4 months agodrm/xe/vm: Use xe_vm_lock()/xe_vm_unlock() helpers
Himal Prasad Ghimiray [Wed, 24 Apr 2024 04:39:09 +0000 (10:09 +0530)]
drm/xe/vm: Use xe_vm_lock()/xe_vm_unlock() helpers

There is no change in functionality. Using the helper function
defined within the driver.

-v2
Use xe_vm_unlock() (Ashutosh/Matt)

-v3
Use xe_vm_unlock() for error label too (Matt)

Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424043910.2190376-2-himal.prasad.ghimiray@intel.com
4 months agodrm/xe: Replace engine references with exec queue in xe_guc_submit.c
Matthew Brost [Thu, 25 Apr 2024 23:25:44 +0000 (16:25 -0700)]
drm/xe: Replace engine references with exec queue in xe_guc_submit.c

Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-6-matthew.brost@intel.com
4 months agodrm/xe: Fix alignment in GuC exec queue state defines
Matthew Brost [Thu, 25 Apr 2024 23:25:43 +0000 (16:25 -0700)]
drm/xe: Fix alignment in GuC exec queue state defines

Normalize the alignment for readability.

v3:
 - Fix typo in commit (Himal)
 - Fix EXEC_QUEUE_STATE_WEDGED too (Himal)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-5-matthew.brost@intel.com
4 months agodrm/xe: s/ENGINE_STATE_KILLED/EXEC_QUEUE_STATE_KILLED
Matthew Brost [Thu, 25 Apr 2024 23:25:42 +0000 (16:25 -0700)]
drm/xe: s/ENGINE_STATE_KILLED/EXEC_QUEUE_STATE_KILLED

Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-4-matthew.brost@intel.com
4 months agodrm/xe: s/ENGINE_STATE_SUSPENDED/EXEC_QUEUE_STATE_SUSPENDED
Matthew Brost [Thu, 25 Apr 2024 23:25:41 +0000 (16:25 -0700)]
drm/xe: s/ENGINE_STATE_SUSPENDED/EXEC_QUEUE_STATE_SUSPENDED

Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-3-matthew.brost@intel.com
4 months agodrm/xe: s/ENGINE_STATE_ENABLED/EXEC_QUEUE_STATE_ENABLED
Matthew Brost [Thu, 25 Apr 2024 23:25:40 +0000 (16:25 -0700)]
drm/xe: s/ENGINE_STATE_ENABLED/EXEC_QUEUE_STATE_ENABLED

Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-2-matthew.brost@intel.com
4 months agodrm/xe: Delete unused GuC submission_state.suspend
Matthew Brost [Thu, 25 Apr 2024 05:47:47 +0000 (22:47 -0700)]
drm/xe: Delete unused GuC submission_state.suspend

GuC submission_state.suspend is unused, delete it.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425054747.1918811-1-matthew.brost@intel.com
4 months agodrm/xe/vm: prevent UAF in rebind_work_func()
Matthew Auld [Tue, 23 Apr 2024 07:47:23 +0000 (08:47 +0100)]
drm/xe/vm: prevent UAF in rebind_work_func()

We flush the rebind worker during the vm close phase, however in places
like preempt_fence_work_func() we seem to queue the rebind worker
without first checking if the vm has already been closed.  The concern
here is the vm being closed with the worker flushed, but then being
rearmed later, which looks like potential uaf, since there is no actual
refcounting to track the queued worker. We can't take the vm->lock here
in preempt_rebind_work_func() to first check if the vm is closed since
that will deadlock, so instead flush the worker again when the vm
refcount reaches zero.

v2:
 - Grabbing vm->lock in the preempt worker creates a deadlock, so
   checking the closed state is tricky. Instead flush the worker when
   the refcount reaches zero. It should be impossible to queue the
   preempt worker without already holding vm ref.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1676
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1591
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1364
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1304
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1249
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423074721.119633-4-matthew.auld@intel.com
4 months agoRevert "drm/xe/vm: drop vm->destroy_work"
Matthew Auld [Tue, 23 Apr 2024 07:47:22 +0000 (08:47 +0100)]
Revert "drm/xe/vm: drop vm->destroy_work"

This reverts commit 5b259c0d1d3caa6efc66c2b856840e68993f814e.

Cleanup here is good, however we need to able to flush a worker during
vm destruction which might involve sleeping, so bring back the worker.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423074721.119633-3-matthew.auld@intel.com
4 months agodrm/xe/preempt_fence: enlarge the fence critical section
Matthew Auld [Thu, 18 Apr 2024 14:46:31 +0000 (15:46 +0100)]
drm/xe/preempt_fence: enlarge the fence critical section

It is really easy to introduce subtle deadlocks in
preempt_fence_work_func() since we operate on single global ordered-wq
for signalling our preempt fences behind the scenes, so even though we
signal a particular fence, everything in the callback should be in the
fence critical section, since blocking in the callback will prevent
other published fences from signalling. If we enlarge the fence critical
section to cover the entire callback, then lockdep should be able to
understand this better, and complain if we grab a sensitive lock like
vm->lock, which is also held when waiting on preempt fences.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240418144630.299531-2-matthew.auld@intel.com
4 months agodrm/xe/guc: Fix typos in VF CFG KLVs descriptions
Michal Wajdeczko [Wed, 24 Apr 2024 14:05:06 +0000 (16:05 +0200)]
drm/xe/guc: Fix typos in VF CFG KLVs descriptions

Apart from the obvious spelling typo, use the correct values for
infinity quantum/timeout settings (it's 0x0 instead of 0xFFFFFFFF).

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140506.2133-1-michal.wajdeczko@intel.com
4 months agodrm/xe/pf: Expose PF service details via debugfs
Michal Wajdeczko [Wed, 24 Apr 2024 17:10:30 +0000 (19:10 +0200)]
drm/xe/pf: Expose PF service details via debugfs

For debug purposes we might want to verify which registers values
PF is sharing with VFs and to view which VF/PF ABI versions were
negotiated by the VFs. Plug the 'print' functions already provided
by the PF service code into our debugfs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424171030.2177-1-michal.wajdeczko@intel.com
4 months agodrm/xe: Check result of drmm_mutex_init()
Michal Wajdeczko [Tue, 9 Apr 2024 15:31:32 +0000 (17:31 +0200)]
drm/xe: Check result of drmm_mutex_init()

Although it's unlikely that drmm_mutex_init() will fail during
driver initialization, however we shouldn't ignore this case.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240409153132.1111-1-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/xe2: Add workaround 14021567978
Tejas Upadhyay [Wed, 10 Apr 2024 06:46:40 +0000 (12:16 +0530)]
drm/xe/xe2: Add workaround 14021567978

Workaround 14021567978 applies to RenderCS xe2

V3:
  - Cover xe2_hpg as its landed upstream now
V2(MattR):
  - Move tuning to wa and apply to xe2

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410064640.1010098-1-tejas.upadhyay@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/guc: Improve GuC doorbell/context ID manager intro message
Michal Wajdeczko [Fri, 19 Apr 2024 15:34:07 +0000 (17:34 +0200)]
drm/xe/guc: Improve GuC doorbell/context ID manager intro message

We can use recently added str_plural() helper.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419153407.402-1-michal.wajdeczko@intel.com
4 months agodrm/xe: Introduce the wedged_mode debugfs
Rodrigo Vivi [Tue, 23 Apr 2024 22:18:17 +0000 (18:18 -0400)]
drm/xe: Introduce the wedged_mode debugfs

So, the wedged mode can be selected per device at runtime,
before the tests or before reproducing the issue.

v2: - s/busted/wedged
    - some locking consistency

v3: - remove mutex
    - toggle guc reset policy on any mode change

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-4-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe: Force wedged state and block GT reset upon any GPU hang
Rodrigo Vivi [Tue, 23 Apr 2024 22:18:16 +0000 (18:18 -0400)]
drm/xe: Force wedged state and block GT reset upon any GPU hang

In many validation situations when debugging GPU Hangs,
it is useful to preserve the GT situation from the moment
that the timeout occurred.

This patch introduces a module parameter that could be used
on situations like this.

If xe.wedged module parameter is set to 2, Xe will be declared
wedged on every single execution timeout (a.k.a. GPU hang) right
after devcoredump snapshot capture and without attempting any
kind of GT reset and blocking entirely any kind of execution.

v2: Really block gt_reset from guc side. (Lucas)
    s/wedged/busted (Lucas)

v3: - s/busted/wedged
    - Really use global_flags (Dafna)
    - More robust timeout handling when wedging it.

v4: A really robust clean exit done by Matt Brost.
    No more kernel warns on unbind.

v5: Simplify error message (Lucas)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Dafna Hirschfeld <dhirschfeld@habana.ai>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Himanshu Somaiya <himanshu.somaiya@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-3-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe: declare wedged upon GuC load failure
Rodrigo Vivi [Tue, 23 Apr 2024 22:18:15 +0000 (18:18 -0400)]
drm/xe: declare wedged upon GuC load failure

Let's block the device upon any GuC load failure.
But let's continue with the probe so guc logs can be read
from the debugfs.

v2: - s/wedged/busted
    - do not block probe or we lose guc_logs in debugfs (Matt)

v3: - s/busted/wedged

v4: Do not change __xe_guc_upload return. (Himal)

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe: Introduce a simple wedged state
Rodrigo Vivi [Tue, 23 Apr 2024 22:18:14 +0000 (18:18 -0400)]
drm/xe: Introduce a simple wedged state

Introduce a very simple 'wedged' state where any attempt
to access the GPU is entirely blocked.

On some critical cases, like on gt_reset failure, we need to
block any other attempt to use the GPU. Otherwise we are at
a risk of reaching cases that would force us to reboot the machine.

So, when this cases are identified we corner and block any GPU
access. No IOCTL and not even another GT reset should be attempted.

The 'wedged' state in Xe is an end state with no way back.
Only a device "re-probe" (unbind + bind) can restore the GPU access.

v2: - s/wedged/busted (Lucas)
    - use unbind+bind instead of module reload (Lucas)
    - added more info on unbind operations and instruction on bug report
    - only print the message once.

v3: - s/busted/wedged (Ashutosh, Tvrtko, Thomas)
    - don't assume user has sudo and tee available (Lucas)

v4: - remove unnecessary cases around ct communication or migration.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe: Add INSTDONE registers to devcoredump
José Roberto de Souza [Wed, 24 Apr 2024 14:03:03 +0000 (07:03 -0700)]
drm/xe: Add INSTDONE registers to devcoredump

This registers contains important information that can help with debug
of GPU hangs.

While at it also fixing the double line jump at the end of engine
registers for CCS engines.

v2:
- print other INSTDONE registers

v3:
- add for_each_geometry/compute_dss()

v4:
- print one slice_common_instdone per glice in DG2+

v5:
- rename registers prefix from DG2 to XEHPG (Zhanjun)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-3-jose.souza@intel.com
4 months agodrm/xe: Add helpers to loop over geometry and compute DSS
José Roberto de Souza [Wed, 24 Apr 2024 14:03:02 +0000 (07:03 -0700)]
drm/xe: Add helpers to loop over geometry and compute DSS

Some DSS can only be available for geometry while others can only be
available for compute.
So here adding helpers to loop only available DSS for given usage.

User of this helper will come in the next patch.

v2:
- drop has_dss()

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-2-jose.souza@intel.com
4 months agodrm/xe: Store xe_hw_engine in xe_hw_engine_snapshot
José Roberto de Souza [Wed, 24 Apr 2024 14:03:01 +0000 (07:03 -0700)]
drm/xe: Store xe_hw_engine in xe_hw_engine_snapshot

A future patch will require gt and xe device structs, so here
replacing class by hwe.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-1-jose.souza@intel.com
4 months agodrm/xe/pf: Clamp maximum execution quantum to 100s
Michal Wajdeczko [Fri, 19 Apr 2024 12:35:43 +0000 (14:35 +0200)]
drm/xe/pf: Clamp maximum execution quantum to 100s

GuC is silently clamping values of the execution quantum and
preemption timeout KLVs to 100s. Perform explicit clamping on the
driver side as later there is no way to read back values used by
the firmware and we shouldn't mislead the user about actual values
being used when we print them in dmesg or debugfs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419123543.270-3-michal.wajdeczko@intel.com
4 months agodrm/xe/guc: Update VF configuration KLVs definitions
Michal Wajdeczko [Fri, 19 Apr 2024 12:35:42 +0000 (14:35 +0200)]
drm/xe/guc: Update VF configuration KLVs definitions

GuC firmware specification says that maximum value for the execution
quantum KLV is 100s and anything exceeding that will be clamped.
The same limitation applies to the preemption timeout KLV.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419123543.270-2-michal.wajdeczko@intel.com
4 months agodrm/xe/pf: Expose SR-IOV policy settings over debugfs
Michal Wajdeczko [Tue, 23 Apr 2024 13:12:44 +0000 (15:12 +0200)]
drm/xe/pf: Expose SR-IOV policy settings over debugfs

We already have functions to configure SR-IOV policies.
Allow to tweak those policy settings over debugfs.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-4-michal.wajdeczko@intel.com
4 months agodrm/xe/pf: Expose SR-IOV VF control commands over debugfs
Michal Wajdeczko [Tue, 23 Apr 2024 13:12:43 +0000 (15:12 +0200)]
drm/xe/pf: Expose SR-IOV VF control commands over debugfs

We already have functions to control the VF.
Allow to control the VF using debugfs.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-3-michal.wajdeczko@intel.com
4 months agodrm/xe/pf: Expose SR-IOV VFs configuration over debugfs
Michal Wajdeczko [Tue, 23 Apr 2024 13:12:42 +0000 (15:12 +0200)]
drm/xe/pf: Expose SR-IOV VFs configuration over debugfs

We already have functions to configure VF resources and to print
actual provisioning details. Expose this functionality in debugfs
to allow experiment with different settings or inspect details in
case of unexpected issues with the provisioning.

As debugfs attributes are per-VF, we use parent d_inode->i_private
to store VFID, similarly how we did for per-GT attributes.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-2-michal.wajdeczko@intel.com
4 months agodrm/xe/kunit: Add PF service tests
Michal Wajdeczko [Tue, 23 Apr 2024 18:04:36 +0000 (20:04 +0200)]
drm/xe/kunit: Add PF service tests

Start with basic tests for VF/PF ABI version negotiation. As we
treat all platforms in the same way, we can run the tests on one
platform. More tests will likely come later.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-6-michal.wajdeczko@intel.com
4 months agodrm/xe/pf: Add SR-IOV GuC Relay PF services
Michal Wajdeczko [Tue, 23 Apr 2024 18:04:35 +0000 (20:04 +0200)]
drm/xe/pf: Add SR-IOV GuC Relay PF services

We already have mechanism that allows a VF driver to communicate
with the PF driver, now add PF side handlers for VF2PF requests
defined in version 1.0 of VF/PF GuC Relay ABI specification.

The VF2PF_HANDSHAKE request must be used by the VF driver to
negotiate the ABI version prior to sending any other request.
We will reset any negotiated version later during FLR.

The outcome of the VF2PF_QUERY_RUNTIME requests depends on actual
platform, for legacy platforms used as SDV is provided as-is, for
latest platforms it is preliminary, and might be changed.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-5-michal.wajdeczko@intel.com
4 months agodrm/xe: Add few more GT register definitions
Michal Wajdeczko [Tue, 23 Apr 2024 18:04:34 +0000 (20:04 +0200)]
drm/xe: Add few more GT register definitions

While we are not using these registers right now, they are part
of some runtime register lists that PF driver share with VFs on
some legacy platforms that we might want to support as SDV.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-4-michal.wajdeczko@intel.com
4 months agodrm/xe: Add helper to calculate adjusted register offset
Michal Wajdeczko [Tue, 23 Apr 2024 18:04:33 +0000 (20:04 +0200)]
drm/xe: Add helper to calculate adjusted register offset

Our MMIO accessing functions automatically adjust addresses for the
media registers based on mmio.adj_limit and mmio.adj_offset logic.
Move it to the separate helper to avoid code duplication and to
allow using it by the upcoming changes to PF driver code.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-3-michal.wajdeczko@intel.com
4 months agodrm/xe/guc: Add GuC Relay ABI version 1.0 definitions
Michal Wajdeczko [Tue, 23 Apr 2024 18:04:32 +0000 (20:04 +0200)]
drm/xe/guc: Add GuC Relay ABI version 1.0 definitions

This initial GuC Relay ABI specification includes messages for ABI
version negotiation and to query values of runtime/fuse registers.

We will start handling those messages on the PF driver soon.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-2-michal.wajdeczko@intel.com
4 months agodrm/xe: Fix unexpected backmerge results
Thomas Hellström [Tue, 23 Apr 2024 12:11:14 +0000 (14:11 +0200)]
drm/xe: Fix unexpected backmerge results

The recent backmerge from drm-next to drm-xe-next brought with it
some silent unexpected results. One code snippet was added twice
and a partial revert had merge errors. Fix that up to
reinstate the affected code as it was before the backmerge.

v2:
- Commit log message rewording (Lucas DeMarchi)

Fixes: 79790b6818e9 ("Merge drm/drm-next into drm-xe-next")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423121114.39325-1-thomas.hellstrom@linux.intel.com
4 months agodrm/xe: make xe_pm_runtime_lockdep_map a static struct
Rodrigo Vivi [Mon, 22 Apr 2024 20:14:54 +0000 (16:14 -0400)]
drm/xe: make xe_pm_runtime_lockdep_map a static struct

Fix the new sparse warning:

>> drivers/gpu/drm/xe/xe_pm.c:72:20: sparse: sparse: symbol
'xe_pm_runtime_lockdep_map' was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404191329.EZzOTzwK-lkp@intel.com/
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240422201454.699089-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe/guc: Fix arguments passed to relay G2H handlers
Michal Wajdeczko [Fri, 19 Apr 2024 15:03:51 +0000 (17:03 +0200)]
drm/xe/guc: Fix arguments passed to relay G2H handlers

By default CT code was passing just payload of the G2H event
message, while Relay code expects full G2H message including
HXG header which contains DATA0 field. Fix that.

Fixes: 26d4481ac23f ("drm/xe/guc: Start handling GuC Relay event messages")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419150351.358-1-michal.wajdeczko@intel.com
4 months agodrm/xe/pf: Fix xe_gt_sriov_pf_config_print_available_ggtt()
Michal Wajdeczko [Fri, 19 Apr 2024 14:10:00 +0000 (16:10 +0200)]
drm/xe/pf: Fix xe_gt_sriov_pf_config_print_available_ggtt()

This function is using internal helper pf_get_spare_ggtt() that
expects PF's master mutex to be locked. Fix that.

Fixes: ac6598aed1b3 ("drm/xe/pf: Add support to configure SR-IOV VFs")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419141000.314-1-michal.wajdeczko@intel.com
4 months agodrm/xe: Kill xe_device_mem_access_{get*,put}
Rodrigo Vivi [Thu, 18 Apr 2024 14:30:49 +0000 (10:30 -0400)]
drm/xe: Kill xe_device_mem_access_{get*,put}

Let's simply convert all the current callers towards direct
xe_pm_runtime access and remove this extra layer of indirection.

No functional change is expected with this patch since
xe_mem_access_get was already using the xe_pm_runtime_get_noresume
at this point.

v2: Convert all the current callers instead of a big refactor
at once.

v3: - Rebased
    - Squashed the GSC/HDCP
    - Added a new case: sriov_pf_policy
    - Improved commit message to highlight that
      there's no functional change in this patch.

Reviewed-by: Matthew Auld <matthew.auld@intel.com> #v2
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240418143049.43231-1-rodrigo.vivi@intel.com
5 months agodrm/xe: Define all possible engines in media IP descriptors
Matt Roper [Wed, 17 Apr 2024 15:26:22 +0000 (08:26 -0700)]
drm/xe: Define all possible engines in media IP descriptors

Rather than trying to identify exactly which engines are available on
each platform in the IP descriptor, just include the list of all media
engines that the IP could theoretically support (i.e., 8 VCS + 4 VECS).
We still rely on the media fuse registers to tell us which specific
engine instances are actually present on a given platform, so there
shouldn't be any functional change.  This will help prevent mistakes
with engine numbering (for example ambiguity about whether the 2nd VCS
engine on a platform with exactly two engines is numbered "VCS1" or
"VCS2") and will also future-proof the code a bit more in case new SKUs
or platform refreshes extend the engine list in the future.

Note that the media fuse register technically has an 8-bit field for
VECS engine presence starting on Xe2.  However there's still no MMIO
register range reserved for VE engines above VECS3, so VE0-VE3 is still
consider the "maximum" VE engine mask that the driver can support for
now.

Bspec: 52614, 52615, 62567
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417152621.3357990-2-matthew.d.roper@intel.com
5 months agodrm/i915: Convert intel_runtime_pm_get_noresume towards raw wakeref
Rodrigo Vivi [Thu, 18 Apr 2024 22:37:56 +0000 (18:37 -0400)]
drm/i915: Convert intel_runtime_pm_get_noresume towards raw wakeref

In the past, the noresume function was used by the GEM code to ensure
wakelocks were held and bump its usage. This is no longer the case
and this function was totally unused until it started to be used again
by display with commit 77e619a82fc3 ("drm/i915/display: convert inner
wakeref get towards get_if_in_use")

However, on the display code, most of the callers are using the
raw wakeref, rather then the wakelock version. What caused a
major regression caught by CI.

Another option to this patch is to go with the original plan and
use the get_if_in_use variant in the display code, what is enough
to fulfil our needs. Then, an extra patch to delete the unused
_noresume variant.

v2: Keep grabbing wakelock but only assert for wakeref. (Imre)

Cc: Imre Deak <imre.deak@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: 77e619a82fc3 ("drm/i915/display: convert inner wakeref get towards get_if_in_use")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10875
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240418223756.68427-1-rodrigo.vivi@intel.com
5 months agodrm/i915/hwmon: Get rid of devm
Ashutosh Dixit [Wed, 17 Apr 2024 14:56:46 +0000 (07:56 -0700)]
drm/i915/hwmon: Get rid of devm

When both hwmon and hwmon drvdata (on which hwmon depends) are device
managed resources, the expectation, on device unbind, is that hwmon will be
released before drvdata. However, in i915 there are two separate code
paths, which both release either drvdata or hwmon and either can be
released before the other. These code paths (for device unbind) are as
follows (see also the bug referenced below):

Call Trace:
release_nodes+0x11/0x70
devres_release_group+0xb2/0x110
component_unbind_all+0x8d/0xa0
component_del+0xa5/0x140
intel_pxp_tee_component_fini+0x29/0x40 [i915]
intel_pxp_fini+0x33/0x80 [i915]
i915_driver_remove+0x4c/0x120 [i915]
i915_pci_remove+0x19/0x30 [i915]
pci_device_remove+0x32/0xa0
device_release_driver_internal+0x19c/0x200
unbind_store+0x9c/0xb0

and

Call Trace:
release_nodes+0x11/0x70
devres_release_all+0x8a/0xc0
device_unbind_cleanup+0x9/0x70
device_release_driver_internal+0x1c1/0x200
unbind_store+0x9c/0xb0

This means that in i915, if use devm, we cannot gurantee that hwmon will
always be released before drvdata. Which means that we have a uaf if hwmon
sysfs is accessed when drvdata has been released but hwmon hasn't.

The only way out of this seems to be do get rid of devm_ and release/free
everything explicitly during device unbind.

v2: Change commit message and other minor code changes
v3: Cleanup from i915_hwmon_register on error (Armin Wolf)
v4: Eliminate potential static analyzer warning (Rodrigo)
    Eliminate fetch_and_zero (Jani)
v5: Restore previous logic for ddat_gt->hwmon_dev error return (Andi)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10366
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417145646.793223-1-ashutosh.dixit@intel.com
5 months agodrm/xe/pm: Capture errors and handle them
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:11 +0000 (23:42 +0530)]
drm/xe/pm: Capture errors and handle them

xe_pm_init may encounter failures for various reasons, such as a failure
in initializing drmm_mutex, or when dealing with a d3cold-capable device
for vram_threshold sysfs creation and setting default threshold.
Presently, all these potential failures are disregarded.

Move d3cold.lock initialization to xe_pm_init_early and cause driver
abort if mutex initialization has failed.

For xe_pm_init failures cleanup the driver and return error code

-v2
Make mutex init cleaner (Lucas)

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-8-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months agodrm/xe/tile: Abort driver load for sysfs creation failure
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:10 +0000 (23:42 +0530)]
drm/xe/tile: Abort driver load for sysfs creation failure

Ensure that the status of all tile associated sysfs entries creation is
relayed to xe_tile_init_noalloc, leading to a driver load abort if any
sysfs creation failures occur.

-v2
Avoid unnecessary warn/error messages. (Lucas)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-7-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months agodrm/xe/gt: Abort driver load for sysfs creation failure
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:09 +0000 (23:42 +0530)]
drm/xe/gt: Abort driver load for sysfs creation failure

Instead of allowing the driver to load with incomplete sysfs entries in
case of sysfs creation failure, we should terminate the driver loading.
This change ensures that the status of all gt associated sysfs entries
creation is relayed to xe_gt_init, leading to a driver load abort if any
sysfs creation failures occur.

-v2
use err_force_wake label instead of new. (Lucas)
Avoid unnecessary warn/error messages. (Lucas)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-6-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months agodrm/xe: Return NULL in case of drmm_add_action_or_reset failure
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:08 +0000 (23:42 +0530)]
drm/xe: Return NULL in case of drmm_add_action_or_reset failure

In case of drmm_add_action_or_reset failure return NULL and no need
to print warning messages as they will be printed implictly.

Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-5-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months agodrm/xe: call free_gsc_pkt only once on action add failure
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:07 +0000 (23:42 +0530)]
drm/xe: call free_gsc_pkt only once on action add failure

The drmm_add_action_or_reset function automatically invokes the
action (free_gsc_pkt) in the event of a failure; therefore, there's no
necessity to call it within the return check.

-v2
Fix commit message. (Lucas)

Fixes: d8b1571312b7 ("drm/xe/huc: HuC authentication via GSC")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-4-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months agodrm/xe: Remove sysfs only once on action add failure
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:06 +0000 (23:42 +0530)]
drm/xe: Remove sysfs only once on action add failure

The drmm_add_action_or_reset function automatically invokes the action
(sysfs removal) in the event of a failure; therefore, there's no
necessity to call it within the return check.

Modify the return type of xe_gt_ccs_mode_sysfs_init to int, allowing the
caller to pass errors up the call chain. Should sysfs creation or
drmm_add_action_or_reset fail, error propagation will prompt a driver
load abort.

-v2
Edit commit message (Nikula/Lucas)
use err_force_wake label instead of new. (Lucas)
Avoid unnecessary warn/error messages. (Lucas)

Fixes: f3bc5bb4d53d ("drm/xe: Allow userspace to configure CCS mode")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-3-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months agodrm/xe: Simplify function return using drmm_add_action_or_reset()
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:05 +0000 (23:42 +0530)]
drm/xe: Simplify function return using drmm_add_action_or_reset()

Instead of assigning the value of drmm_add_action_or_reset() to err and
returning err in case of failure and 0 in case of success, simply return
the result of drmm_add_action_or_reset().

-v2:
cleanup in xe_display too.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-2-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months agodrm/xe/xe2lpg: Extend Wa_14020338487
Gustavo Sousa [Wed, 17 Apr 2024 21:25:01 +0000 (18:25 -0300)]
drm/xe/xe2lpg: Extend Wa_14020338487

Wa_14020338487 also applies to Xe2_LPG. Replicate the existing entry to
one specific for Xe2_LPG.

Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417212501.312346-1-gustavo.sousa@intel.com
5 months agodrm/xe: Add outer runtime_pm protection to xe_live_ktest@xe_dma_buf
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:52 +0000 (16:39 -0400)]
drm/xe: Add outer runtime_pm protection to xe_live_ktest@xe_dma_buf

Any kunit doing any memory access should get their own runtime_pm
outer references since they don't use the standard driver API
entries. In special this dma_buf from the same driver.

Found by pre-merge CI on adding WARN calls for unprotected
inner callers:

<6> [318.639739]     # xe_dma_buf_kunit: running xe_test_dmabuf_import_same_driver
<4> [318.639957] ------------[ cut here ]------------
<4> [318.639967] xe 0000:4d:00.0: Missing outer runtime PM protection
<4> [318.640049] WARNING: CPU: 117 PID: 3832 at drivers/gpu/drm/xe/xe_pm.c:533 xe_pm_runtime_get_noresume+0x48/0x60 [xe]

Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-10-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/xe: Ensure all the inner access are using the _noresume variant
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:51 +0000 (16:39 -0400)]
drm/xe: Ensure all the inner access are using the _noresume variant

At this point mem_access references should be only used as inner
points of the execution and a get with synchronous resume previously
called at an outer point.

So, before killing mem_acces in favor of direct accsess, let's
ensure that we first convert them towards the new _noresume
variant that will WARN us if no inner caller happened.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-9-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/xe: Convert mem_access_if_ongoing to direct xe_pm_runtime_get_if_active
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:50 +0000 (16:39 -0400)]
drm/xe: Convert mem_access_if_ongoing to direct xe_pm_runtime_get_if_active

Now that assert_mem_access is relying directly on the pm_runtime state
instead of the counters, there's no reason why we cannot use
the pm_runtime functions directly.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-8-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/xe: Removing extra mem_access protection from runtime pm
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:49 +0000 (16:39 -0400)]
drm/xe: Removing extra mem_access protection from runtime pm

This is not needed any longer, now that we have all the protection
in place with the runtime pm itself.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-7-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/xe: Convert xe_gem_fault to use direct xe_pm_runtime calls
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:48 +0000 (16:39 -0400)]
drm/xe: Convert xe_gem_fault to use direct xe_pm_runtime calls

The gem page fault is one of the outer bound protections where
we want to ensure that the hardware is in D0 before proceeding
with memory access. Let's convert it towards the xe_pm_runtime
functions directly so we can then convert the mem_access to be
inner protection only and then Kill it for good.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-6-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/xe: Remove useless mem_access during probe
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:47 +0000 (16:39 -0400)]
drm/xe: Remove useless mem_access during probe

xe_pm_init is the very last thing during the xe_pci_probe(),
hence these protections are useless from the point of view
of ensuring that the device is awake.

Let's remove it so we continue towards the goal of killing
xe_device_mem_access.

v2: Adding more cases
v3: Provide a separate fix for xe_tile_init_noalloc return (Matt)
    Adding a new case where display HDCP init calls which
    are also called at display probe time.

Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-5-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/xe: Move lockdep protection from mem_access to xe_pm_runtime
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:46 +0000 (16:39 -0400)]
drm/xe: Move lockdep protection from mem_access to xe_pm_runtime

The mem_access itself is not holding any lock, but attempting
to train lockdep with possible scarring locks happening during
runtime pm. We are going soon to kill the mem_access get and put
helpers in favor of direct xe_pm_runtime calls, so let's just
move this lock around to where it now belongs.

v2: s/lockdep_training/lockdep_prime (Matt Auld)

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-4-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/i915/display: convert inner wakeref get towards get_if_in_use
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:45 +0000 (16:39 -0400)]
drm/i915/display: convert inner wakeref get towards get_if_in_use

This patch brings no functional change. Since at this point of
the code we are already asserting a wakeref was held, it means
that we are with runtime_pm 'in_use' and in practical terms we
are only bumping the pm_runtime usage counter and moving on.

However, xe driver has a lockdep annotation that warned us that
if a sync resume was actually called at this point, we could have
a deadlock because we are inside the power_domains->lock locked
area and the resume would call the irq_reset, which would also
try to get the power_domains->lock.

For this reason, let's convert this call to a safer option and
calm lockdep on.

v2: use _noresume variant instead of get_in_use (Ville, Imre)

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Imre Deak <imre.deak@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-3-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/xe: Introduce intel_runtime_pm_get_noresume at compat-i915-headers for display
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:44 +0000 (16:39 -0400)]
drm/xe: Introduce intel_runtime_pm_get_noresume at compat-i915-headers for display

The i915-display will start using the intel_runtime_pm_noresume.
So we need to add the compat header before it.

Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/xe: Introduce xe_pm_runtime_get_noresume for inner callers
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:43 +0000 (16:39 -0400)]
drm/xe: Introduce xe_pm_runtime_get_noresume for inner callers

Let's ensure that we have an option for inner callers that will
raise WARN if device is not active and not protected by outer callers.

Make this also a void function forcing every caller to unconditionally
put the reference back afterwards.

This will be very important for cases where we want to hold the
reference before scheduling a work in a queue. Then the work job
will be responsible for putting it back.

While at this, already convert a case from mem_access_get_ongoing where
it is not checking for the reference and put it back, what would
cause the underflow.

v2: Fix identation.
v3: Convert equivalent missing put from mem_access towards pm_runtime.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/xe/lnl: Apply GuC Wa_13011645652
Vinay Belgaumkar [Wed, 17 Apr 2024 05:48:02 +0000 (22:48 -0700)]
drm/xe/lnl: Apply GuC Wa_13011645652

Enable WA for a bug that could cause the C6 state machine to hang
during RC6 exit.

v2: Add comment clarifying the WA (John H)
v3: Add more details to the comment (John H)

Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417054802.1766359-1-vinay.belgaumkar@intel.com
5 months agodrm/xe/vm: don't include xe_gt.h
Matthew Auld [Fri, 12 Apr 2024 11:31:47 +0000 (12:31 +0100)]
drm/xe/vm: don't include xe_gt.h

clangd complains here, since nothing in xe_gt.h seems to be needed.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412113144.259426-6-matthew.auld@intel.com
5 months agodrm/xe/vm: drop vm->destroy_work
Matthew Auld [Fri, 12 Apr 2024 11:31:46 +0000 (12:31 +0100)]
drm/xe/vm: drop vm->destroy_work

Now that we no longer grab the usm.lock mutex (which might sleep) it
looks like it should be safe to directly perform xe_vm_free when vm
refcount reaches zero, instead of punting that off to some worker.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412113144.259426-5-matthew.auld@intel.com
5 months agodrm/xe/vm: prevent UAF with asid based lookup
Matthew Auld [Fri, 12 Apr 2024 11:31:45 +0000 (12:31 +0100)]
drm/xe/vm: prevent UAF with asid based lookup

The asid is only erased from the xarray when the vm refcount reaches
zero, however this leads to potential UAF since the xe_vm_get() only
works on a vm with refcount != 0. Since the asid is allocated in the vm
create ioctl, rather erase it when closing the vm, prior to dropping the
potential last ref. This should also work when user closes driver fd
without explicit vm destroy.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1594
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412113144.259426-4-matthew.auld@intel.com
5 months agodrm/xe/stolen: ignore first page for FBC
Matthew Auld [Fri, 12 Apr 2024 15:03:03 +0000 (16:03 +0100)]
drm/xe/stolen: ignore first page for FBC

We have observed underruns on some platforms if the CFB offset is within
the first page of stolen. Just like i915 skip the first page.

v2 (Maarten)
  - Also align the start.

BSpec: 50214
Reported-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412150301.273344-4-matthew.auld@intel.com
5 months agodrm/xe/stolen: lower the default alignment
Matthew Auld [Fri, 12 Apr 2024 15:03:02 +0000 (16:03 +0100)]
drm/xe/stolen: lower the default alignment

No need to be so aggressive here. The upper layers will already apply
the needed alignment, plus some allocations might wish to skip it. Main
issue is that we might want to have start/end bias range which doesn't
match the default alignment which is rejected by the allocator.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412150301.273344-3-matthew.auld@intel.com
5 months agodrm/xe: select X86_PLATFORM_DEVICES when ACPI_WMI is selected
Lu Yao [Mon, 15 Apr 2024 02:52:15 +0000 (10:52 +0800)]
drm/xe: select X86_PLATFORM_DEVICES when ACPI_WMI is selected

ACPI_WMI is a subitem of X86_PLATFORM_DEVICES. And X86_PLATFORM_DEVICES
is not selected in the current Kconfig, and may cause Kconfig warnings:

WARNING: unmet direct dependencies detected for ACPI_WMI
  Depends on [n]: X86_PLATFORM_DEVICES [=n] && ACPI [=y]
  Selected by [m]:
  - DRM_XE [=m] && HAS_IOMEM [=y] && DRM [=m] && PCI [=y] && MMU [=y] &&
    (m && MODULES [=y] || y && KUNIT [=y]=y) && X86 [=y] && ACPI [=y]

Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415025215.15811-1-yaolu@kylinos.cn
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months agodrm/xe/bmg: Some LNL workarounds also apply to BMG
John Harrison [Wed, 10 Apr 2024 00:26:46 +0000 (17:26 -0700)]
drm/xe/bmg: Some LNL workarounds also apply to BMG

Enable a couple of existing workarounds for a new platform.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410002646.3002394-3-John.C.Harrison@Intel.com
5 months agodrm/xe/lnl: Enable more GuC based workarounds
John Harrison [Wed, 10 Apr 2024 00:26:45 +0000 (17:26 -0700)]
drm/xe/lnl: Enable more GuC based workarounds

There are a couple of new workarounds for LNL that are implemented in
the GuC firmware. The KMD needs to enable them explicitly.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410002646.3002394-2-John.C.Harrison@Intel.com
5 months agodrm/xe/pf: Add support to configure SR-IOV VFs
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:37 +0000 (19:39 +0200)]
drm/xe/pf: Add support to configure SR-IOV VFs

To run correctly, each Virtual Function must be provisioned with
some chunk of shared hardware or firmware resources (like GGTT,
device memory, GuC doorbell IDs, GuC context IDs) and scheduling
parameters (execution quantum or preemption timeout).

All resources assigned to VFs must be excluded from the PF driver
use and may require some additional preparation steps (like setup
of the LMTT or update of the GGTT PTE). Those provisioning details
must be then sent to the GuC firmware as most of those details
will be shared later with the VF drivers during their boot.

Add basic functions to provision VFs with all hardware resources
or scheduling parameters. We will use them shortly in upcoming
patches either in manual provisioning over debugfs, exposed to the
advanced users, or automatic provisioning done by PF driver during
VFs enabling.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-7-michal.wajdeczko@intel.com
5 months agodrm/xe/pf: Add SR-IOV PF specific early GT initialization
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:36 +0000 (19:39 +0200)]
drm/xe/pf: Add SR-IOV PF specific early GT initialization

The PF driver must maintain additional GT level data per each VF.
This additional per-VF data will be added in upcoming patches and
will include: provisioning configuration (like GGTT space or LMEM
allocation sizes or scheduling parameters), monitoring thresholds
and counters, and more.

As number of supported VFs varies across platforms use flexible
array where first entry will contain metadata for the PF itself
(if such configuration parameter is applicable for the PF) and
all remaining entries will contain data for potential VFs.

Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-6-michal.wajdeczko@intel.com
5 months agodrm/xe/guc: Add PF2GUC_UPDATE_VF_CFG to ABI
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:35 +0000 (19:39 +0200)]
drm/xe/guc: Add PF2GUC_UPDATE_VF_CFG to ABI

In upcoming patches the PF driver will add support to change VFs
configuration and will need to use PF2GUC_UPDATE_VF_CFG messages.
Add necessary definitions to our GuC firmware ABI header.

Definitions of the GuC VF Configuration KLVs used by this action
are already present in abi/guc_klvs_abi.h

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-5-michal.wajdeczko@intel.com
5 months agodrm/xe: Add xe_ttm_vram_get_avail
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:34 +0000 (19:39 +0200)]
drm/xe: Add xe_ttm_vram_get_avail

The PF driver will need to know size of the remaining available
VRAM to estimate fair VRAM allocations that could be used across
all VFs in automatic VFs provisioning mode. Add helper function
for that. We will use it in upcoming patch.

Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-4-michal.wajdeczko@intel.com
5 months agodrm/xe: Allow to assign GGTT region to the VF
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:33 +0000 (19:39 +0200)]
drm/xe: Allow to assign GGTT region to the VF

VF's drivers can't modify GGTT PTEs except the range explicitly
assigned by the PF driver. To allow hardware enforcement of this
requirement, each GGTT PTE has a field with the VF number that
identifies which VF can modify that particular GGTT PTE entry.

Only PF driver can modify this field and PF driver shall do that
before VF drivers will be loaded. Add function to prepare PTEs.
Since it will be used only by the PF driver, make it available
only for CONFIG_PCI_IOV=y.

Bspec: 45015, 52395
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-3-michal.wajdeczko@intel.com
5 months agodrm/xe: Add helper to format SR-IOV function name
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:32 +0000 (19:39 +0200)]
drm/xe: Add helper to format SR-IOV function name

While the GuC firmware and the Xe driver are using VF identifier
VFID(0) to represent the Physical Function, we should avoid using
"VF0" name and use proper "PF" name in all user facing messages
related to the Physical Function and use "VFn" name only when
referrinf to the true Virtual Function. Add simple helper to get
properly formatted function name based on the function number.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-2-michal.wajdeczko@intel.com
5 months agodrm/xe/gt: Add L3 bank mask to GT topology
Francois Dugast [Wed, 10 Apr 2024 12:37:22 +0000 (12:37 +0000)]
drm/xe/gt: Add L3 bank mask to GT topology

Generate the mask of enabled L3 banks for the GT. It is stored with the
rest of the GT topology in a consistent representation across platforms.
For now the L3 bank mask is just printed in the log for developers to
easily figure out the fusing characteristics of machines that they are
trying to debug issues on. Later it can be used to replace existing code
in the driver that requires the L3 bank count (not mask). Also the mask
can easily be exposed to user space in a new query if needed.

v2: Better naming of variable and function (Matt Roper)

Bspec: 52545, 52546, 62482
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410123723.7-2-francois.dugast@intel.com
5 months agodrm/xe/pf: Add support to configure GuC SR-IOV policies
Michal Wajdeczko [Wed, 10 Apr 2024 17:03:38 +0000 (19:03 +0200)]
drm/xe/pf: Add support to configure GuC SR-IOV policies

There are few knobs inside GuC firmware to control VFs scheduling.
Add basic functions to support their reconfigurations.
We will start using them shortly once we prepare debugfs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410170338.1199-6-michal.wajdeczko@intel.com
5 months agodrm/xe/guc: Add helpers for GuC KLVs
Michal Wajdeczko [Wed, 10 Apr 2024 17:03:37 +0000 (19:03 +0200)]
drm/xe/guc: Add helpers for GuC KLVs

Many of the GuC actions use KLVs to pass additional parameters or
configuration data. Add few helper functions for better reporting
any information related to KLVs.

Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Acked-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410170338.1199-5-michal.wajdeczko@intel.com
5 months agodrm/xe/guc: Add PF2GUC_UPDATE_VGT_POLICY to ABI
Michal Wajdeczko [Wed, 10 Apr 2024 17:03:36 +0000 (19:03 +0200)]
drm/xe/guc: Add PF2GUC_UPDATE_VGT_POLICY to ABI

In upcoming patches the PF driver will add support to change GuC
policies and will need to use PF2GUC_UPDATE_VGT_POLICY messages.
Add necessary definitions to our GuC firmware ABI header.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410170338.1199-4-michal.wajdeczko@intel.com
5 months agodrm/xe/pf: Introduce helper functions for use by PF
Michal Wajdeczko [Wed, 10 Apr 2024 17:03:35 +0000 (19:03 +0200)]
drm/xe/pf: Introduce helper functions for use by PF

PF driver will maintain VF's configuration data mostly on the
GT level, but some internal data is located at the device level.

To allow easy access to that data from the GT level functions, and
to minimize code duplications, introduce set of helper functions
and macros for explicit use by the PF driver.

We will use these helpers in upcoming patches.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410170338.1199-3-michal.wajdeczko@intel.com