linux-block.git
4 months ago drm/xe: Use xe_bo_lock()/xe_bo_unlock() helpers
Himal Prasad Ghimiray [Wed, 24 Apr 2024 04:39:10 +0000 (10:09 +0530)]
drm/xe: Use xe_bo_lock()/xe_bo_unlock() helpers

There is no change in functionality. Use the helper functions
defined within the driver for locking/unlocking the reservation
object.

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424043910.2190376-3-himal.prasad.ghimiray@intel.com
4 months ago drm/xe/vm: Use xe_vm_lock()/xe_vm_unlock() helpers
Himal Prasad Ghimiray [Wed, 24 Apr 2024 04:39:09 +0000 (10:09 +0530)]
drm/xe/vm: Use xe_vm_lock()/xe_vm_unlock() helpers

There is no change in functionality. Use the helper functions
defined within the driver.

-v2
Use xe_vm_unlock() (Ashutosh/Matt)

-v3
Use xe_vm_unlock() for error label too (Matt)

Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424043910.2190376-2-himal.prasad.ghimiray@intel.com
4 months ago drm/xe: Replace engine references with exec queue in xe_guc_submit.c
Matthew Brost [Thu, 25 Apr 2024 23:25:44 +0000 (16:25 -0700)]
drm/xe: Replace engine references with exec queue in xe_guc_submit.c

Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-6-matthew.brost@intel.com
4 months ago drm/xe: Fix alignment in GuC exec queue state defines
Matthew Brost [Thu, 25 Apr 2024 23:25:43 +0000 (16:25 -0700)]
drm/xe: Fix alignment in GuC exec queue state defines

Normalize the alignment for readability.

v3:
 - Fix typo in commit (Himal)
 - Fix EXEC_QUEUE_STATE_WEDGED too (Himal)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-5-matthew.brost@intel.com
4 months ago drm/xe: s/ENGINE_STATE_KILLED/EXEC_QUEUE_STATE_KILLED
Matthew Brost [Thu, 25 Apr 2024 23:25:42 +0000 (16:25 -0700)]
drm/xe: s/ENGINE_STATE_KILLED/EXEC_QUEUE_STATE_KILLED

Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-4-matthew.brost@intel.com
4 months ago drm/xe: s/ENGINE_STATE_SUSPENDED/EXEC_QUEUE_STATE_SUSPENDED
Matthew Brost [Thu, 25 Apr 2024 23:25:41 +0000 (16:25 -0700)]
drm/xe: s/ENGINE_STATE_SUSPENDED/EXEC_QUEUE_STATE_SUSPENDED

Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-3-matthew.brost@intel.com
4 months ago drm/xe: s/ENGINE_STATE_ENABLED/EXEC_QUEUE_STATE_ENABLED
Matthew Brost [Thu, 25 Apr 2024 23:25:40 +0000 (16:25 -0700)]
drm/xe: s/ENGINE_STATE_ENABLED/EXEC_QUEUE_STATE_ENABLED

Exec queue has replaced engine nomenclature.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425232544.1935578-2-matthew.brost@intel.com
4 months ago drm/xe: Delete unused GuC submission_state.suspend
Matthew Brost [Thu, 25 Apr 2024 05:47:47 +0000 (22:47 -0700)]
drm/xe: Delete unused GuC submission_state.suspend

GuC submission_state.suspend is unused, delete it.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425054747.1918811-1-matthew.brost@intel.com
4 months ago drm/xe/vm: prevent UAF in rebind_work_func()
Matthew Auld [Tue, 23 Apr 2024 07:47:23 +0000 (08:47 +0100)]
drm/xe/vm: prevent UAF in rebind_work_func()

We flush the rebind worker during the vm close phase, however in places
like preempt_fence_work_func() we seem to queue the rebind worker
without first checking if the vm has already been closed.  The concern
here is the vm being closed with the worker flushed, but then being
rearmed later, which looks like a potential UAF, since there is no actual
refcounting to track the queued worker. We can't take the vm->lock here
in preempt_rebind_work_func() to first check if the vm is closed since
that will deadlock, so instead flush the worker again when the vm
refcount reaches zero.

v2:
 - Grabbing vm->lock in the preempt worker creates a deadlock, so
   checking the closed state is tricky. Instead flush the worker when
   the refcount reaches zero. It should be impossible to queue the
   preempt worker without already holding vm ref.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1676
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1591
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1364
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1304
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1249
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423074721.119633-4-matthew.auld@intel.com
4 months ago Revert "drm/xe/vm: drop vm->destroy_work"
Matthew Auld [Tue, 23 Apr 2024 07:47:22 +0000 (08:47 +0100)]
Revert "drm/xe/vm: drop vm->destroy_work"

This reverts commit 5b259c0d1d3caa6efc66c2b856840e68993f814e.

Cleanup here is good, however we need to be able to flush a worker during
vm destruction which might involve sleeping, so bring back the worker.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423074721.119633-3-matthew.auld@intel.com
4 months ago drm/xe/preempt_fence: enlarge the fence critical section
Matthew Auld [Thu, 18 Apr 2024 14:46:31 +0000 (15:46 +0100)]
drm/xe/preempt_fence: enlarge the fence critical section

It is really easy to introduce subtle deadlocks in
preempt_fence_work_func() since we operate on a single global ordered-wq
for signalling our preempt fences behind the scenes, so even though we
signal a particular fence, everything in the callback should be in the
fence critical section, since blocking in the callback will prevent
other published fences from signalling. If we enlarge the fence critical
section to cover the entire callback, then lockdep should be able to
understand this better, and complain if we grab a sensitive lock like
vm->lock, which is also held when waiting on preempt fences.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240418144630.299531-2-matthew.auld@intel.com
4 months ago drm/xe/guc: Fix typos in VF CFG KLVs descriptions
Michal Wajdeczko [Wed, 24 Apr 2024 14:05:06 +0000 (16:05 +0200)]
drm/xe/guc: Fix typos in VF CFG KLVs descriptions

Apart from the obvious spelling typo, use the correct values for
infinity quantum/timeout settings (it's 0x0 instead of 0xFFFFFFFF).

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140506.2133-1-michal.wajdeczko@intel.com
4 months ago drm/xe/pf: Expose PF service details via debugfs
Michal Wajdeczko [Wed, 24 Apr 2024 17:10:30 +0000 (19:10 +0200)]
drm/xe/pf: Expose PF service details via debugfs

For debug purposes we might want to verify which register values the
PF is sharing with VFs and to view which VF/PF ABI versions were
negotiated by the VFs. Plug the 'print' functions already provided
by the PF service code into our debugfs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424171030.2177-1-michal.wajdeczko@intel.com
4 months ago drm/xe: Check result of drmm_mutex_init()
Michal Wajdeczko [Tue, 9 Apr 2024 15:31:32 +0000 (17:31 +0200)]
drm/xe: Check result of drmm_mutex_init()

Although it's unlikely that drmm_mutex_init() will fail during
driver initialization, we shouldn't ignore this case.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240409153132.1111-1-michal.wajdeczko@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months ago drm/xe/xe2: Add workaround 14021567978
Tejas Upadhyay [Wed, 10 Apr 2024 06:46:40 +0000 (12:16 +0530)]
drm/xe/xe2: Add workaround 14021567978

Workaround 14021567978 applies to RenderCS on xe2.

V3:
  - Cover xe2_hpg as it has landed upstream now
V2(MattR):
  - Move tuning to wa and apply to xe2

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410064640.1010098-1-tejas.upadhyay@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months ago drm/xe/guc: Improve GuC doorbell/context ID manager intro message
Michal Wajdeczko [Fri, 19 Apr 2024 15:34:07 +0000 (17:34 +0200)]
drm/xe/guc: Improve GuC doorbell/context ID manager intro message

We can use the recently added str_plural() helper.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419153407.402-1-michal.wajdeczko@intel.com
4 months ago drm/xe: Introduce the wedged_mode debugfs
Rodrigo Vivi [Tue, 23 Apr 2024 22:18:17 +0000 (18:18 -0400)]
drm/xe: Introduce the wedged_mode debugfs

So, the wedged mode can be selected per device at runtime,
before the tests or before reproducing the issue.

v2: - s/busted/wedged
    - some locking consistency

v3: - remove mutex
    - toggle guc reset policy on any mode change

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-4-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months ago drm/xe: Force wedged state and block GT reset upon any GPU hang
Rodrigo Vivi [Tue, 23 Apr 2024 22:18:16 +0000 (18:18 -0400)]
drm/xe: Force wedged state and block GT reset upon any GPU hang

In many validation situations when debugging GPU Hangs,
it is useful to preserve the GT situation from the moment
that the timeout occurred.

This patch introduces a module parameter that can be used
in situations like this.

If xe.wedged module parameter is set to 2, Xe will be declared
wedged on every single execution timeout (a.k.a. GPU hang) right
after devcoredump snapshot capture and without attempting any
kind of GT reset and blocking entirely any kind of execution.

v2: Really block gt_reset from guc side. (Lucas)
    s/wedged/busted (Lucas)

v3: - s/busted/wedged
    - Really use global_flags (Dafna)
    - More robust timeout handling when wedging it.

v4: A really robust clean exit done by Matt Brost.
    No more kernel warns on unbind.

v5: Simplify error message (Lucas)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Dafna Hirschfeld <dhirschfeld@habana.ai>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Himanshu Somaiya <himanshu.somaiya@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-3-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months ago drm/xe: declare wedged upon GuC load failure
Rodrigo Vivi [Tue, 23 Apr 2024 22:18:15 +0000 (18:18 -0400)]
drm/xe: declare wedged upon GuC load failure

Let's block the device upon any GuC load failure.
But let's continue with the probe so guc logs can be read
from the debugfs.

v2: - s/wedged/busted
    - do not block probe or we lose guc_logs in debugfs (Matt)

v3: - s/busted/wedged

v4: Do not change __xe_guc_upload return. (Himal)

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months ago drm/xe: Introduce a simple wedged state
Rodrigo Vivi [Tue, 23 Apr 2024 22:18:14 +0000 (18:18 -0400)]
drm/xe: Introduce a simple wedged state

Introduce a very simple 'wedged' state where any attempt
to access the GPU is entirely blocked.

In some critical cases, such as a gt_reset failure, we need to
block any other attempt to use the GPU. Otherwise we are at
risk of reaching cases that would force us to reboot the machine.

So, when these cases are identified we corner and block any GPU
access. No IOCTL and not even another GT reset should be attempted.

The 'wedged' state in Xe is an end state with no way back.
Only a device "re-probe" (unbind + bind) can restore the GPU access.

v2: - s/wedged/busted (Lucas)
    - use unbind+bind instead of module reload (Lucas)
    - added more info on unbind operations and instruction on bug report
    - only print the message once.

v3: - s/busted/wedged (Ashutosh, Tvrtko, Thomas)
    - don't assume user has sudo and tee available (Lucas)

v4: - remove unnecessary cases around ct communication or migration.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
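The "end state with no way back" behaviour described in the wedged-state commit above can be sketched in plain C. This is a minimal userspace illustration, not the driver code: the struct, the function names, and the choice of -ECANCELED as the error code are all assumptions made for the example.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device; only the wedged flag is modelled. */
struct example_device {
	atomic_bool wedged;
};

void example_device_init(struct example_device *dev)
{
	atomic_init(&dev->wedged, false);
}

/* One-way transition: there is deliberately no function that clears the flag. */
void example_declare_wedged(struct example_device *dev)
{
	atomic_store(&dev->wedged, true);
}

/* Every entry point checks the flag first and refuses to touch the GPU. */
int example_ioctl(struct example_device *dev)
{
	if (atomic_load(&dev->wedged))
		return -ECANCELED; /* error code is an assumption */
	return 0; /* the real work would happen here */
}
```

In this sketch the only way to get a usable device again is to create a fresh one with example_device_init(), which mirrors the unbind + bind re-probe mentioned in the commit message.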
4 months ago drm/xe: Add INSTDONE registers to devcoredump
José Roberto de Souza [Wed, 24 Apr 2024 14:03:03 +0000 (07:03 -0700)]
drm/xe: Add INSTDONE registers to devcoredump

These registers contain important information that can help with debugging
GPU hangs.

While at it, also fix the double line break at the end of engine
registers for CCS engines.

v2:
- print other INSTDONE registers

v3:
- add for_each_geometry/compute_dss()

v4:
- print one slice_common_instdone per glice in DG2+

v5:
- rename registers prefix from DG2 to XEHPG (Zhanjun)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-3-jose.souza@intel.com
4 months ago drm/xe: Add helpers to loop over geometry and compute DSS
José Roberto de Souza [Wed, 24 Apr 2024 14:03:02 +0000 (07:03 -0700)]
drm/xe: Add helpers to loop over geometry and compute DSS

Some DSS are only available for geometry while others are only
available for compute.
So add helpers to loop over only the DSS available for a given usage.

The user of these helpers will come in the next patch.

v2:
- drop has_dss()

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-2-jose.souza@intel.com
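As a rough illustration of the idea in the DSS helper commit above, iterating over only the DSS available for one usage amounts to walking the set bits of that usage's mask. The sketch below assumes a 32-bit mask per usage and uses a callback instead of the kernel's for_each macros; all names are hypothetical and not taken from the driver.

```c
#include <stdint.h>

/* Callback invoked once per available DSS (hypothetical signature). */
typedef void (*example_dss_fn)(unsigned int dss, void *data);

/*
 * Visit every DSS whose bit is set in dss_mask, in ascending order.
 * The caller passes the geometry mask or the compute mask, depending
 * on which usage it cares about.
 */
void example_for_each_dss(uint32_t dss_mask, example_dss_fn fn, void *data)
{
	for (unsigned int dss = 0; dss < 32; dss++) {
		if (dss_mask & (UINT32_C(1) << dss))
			fn(dss, data);
	}
}

/* Example callback: count how many DSS were visited. */
void example_count_dss(unsigned int dss, void *data)
{
	unsigned int *count = data;

	(void)dss;
	(*count)++;
}
```

Calling example_for_each_dss() once with a geometry mask and once with a compute mask keeps the two usages separate without any per-DSS "is this one valid" check in the caller.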
4 months ago drm/xe: Store xe_hw_engine in xe_hw_engine_snapshot
José Roberto de Souza [Wed, 24 Apr 2024 14:03:01 +0000 (07:03 -0700)]
drm/xe: Store xe_hw_engine in xe_hw_engine_snapshot

A future patch will require gt and xe device structs, so
replace class with hwe here.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Zhanjun Dong <zhanjun.dong@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424140319.61651-1-jose.souza@intel.com
4 months ago drm/xe/pf: Clamp maximum execution quantum to 100s
Michal Wajdeczko [Fri, 19 Apr 2024 12:35:43 +0000 (14:35 +0200)]
drm/xe/pf: Clamp maximum execution quantum to 100s

GuC is silently clamping values of the execution quantum and
preemption timeout KLVs to 100s. Perform explicit clamping on the
driver side as later there is no way to read back values used by
the firmware and we shouldn't mislead the user about actual values
being used when we print them in dmesg or debugfs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419123543.270-3-michal.wajdeczko@intel.com
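The explicit driver-side clamping described in the commit above could look roughly like the following. This is a sketch under stated assumptions: the value is taken to be in milliseconds (so 100s becomes 100000), and the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* 100 s expressed in milliseconds; the unit is an assumption of this sketch. */
#define EXAMPLE_MAX_EXEC_QUANTUM_MS UINT32_C(100000)

/*
 * Clamp the requested value before sending it to the firmware, so the
 * value kept (and later printed) by the driver matches what is really used.
 */
uint32_t example_clamp_exec_quantum(uint32_t requested_ms)
{
	if (requested_ms > EXAMPLE_MAX_EXEC_QUANTUM_MS)
		return EXAMPLE_MAX_EXEC_QUANTUM_MS;
	return requested_ms;
}
```

Storing the clamped value rather than the requested one is the point of the change: the firmware offers no way to read the effective value back later.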
4 months ago drm/xe/guc: Update VF configuration KLVs definitions
Michal Wajdeczko [Fri, 19 Apr 2024 12:35:42 +0000 (14:35 +0200)]
drm/xe/guc: Update VF configuration KLVs definitions

GuC firmware specification says that the maximum value for the execution
quantum KLV is 100s and anything exceeding that will be clamped.
The same limitation applies to the preemption timeout KLV.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419123543.270-2-michal.wajdeczko@intel.com
4 months ago drm/xe/pf: Expose SR-IOV policy settings over debugfs
Michal Wajdeczko [Tue, 23 Apr 2024 13:12:44 +0000 (15:12 +0200)]
drm/xe/pf: Expose SR-IOV policy settings over debugfs

We already have functions to configure SR-IOV policies.
Allow tweaking those policy settings over debugfs.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-4-michal.wajdeczko@intel.com
4 months ago drm/xe/pf: Expose SR-IOV VF control commands over debugfs
Michal Wajdeczko [Tue, 23 Apr 2024 13:12:43 +0000 (15:12 +0200)]
drm/xe/pf: Expose SR-IOV VF control commands over debugfs

We already have functions to control the VF.
Allow controlling the VF using debugfs.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-3-michal.wajdeczko@intel.com
4 months ago drm/xe/pf: Expose SR-IOV VFs configuration over debugfs
Michal Wajdeczko [Tue, 23 Apr 2024 13:12:42 +0000 (15:12 +0200)]
drm/xe/pf: Expose SR-IOV VFs configuration over debugfs

We already have functions to configure VF resources and to print
actual provisioning details. Expose this functionality in debugfs
to allow experimenting with different settings or inspecting details in
case of unexpected issues with the provisioning.

As debugfs attributes are per-VF, we use parent d_inode->i_private
to store VFID, similarly to how we did for per-GT attributes.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423131244.2045-2-michal.wajdeczko@intel.com
4 months ago drm/xe/kunit: Add PF service tests
Michal Wajdeczko [Tue, 23 Apr 2024 18:04:36 +0000 (20:04 +0200)]
drm/xe/kunit: Add PF service tests

Start with basic tests for VF/PF ABI version negotiation. As we
treat all platforms in the same way, we can run the tests on one
platform. More tests will likely come later.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-6-michal.wajdeczko@intel.com
4 months ago drm/xe/pf: Add SR-IOV GuC Relay PF services
Michal Wajdeczko [Tue, 23 Apr 2024 18:04:35 +0000 (20:04 +0200)]
drm/xe/pf: Add SR-IOV GuC Relay PF services

We already have a mechanism that allows a VF driver to communicate
with the PF driver, now add PF side handlers for VF2PF requests
defined in version 1.0 of VF/PF GuC Relay ABI specification.

The VF2PF_HANDSHAKE request must be used by the VF driver to
negotiate the ABI version prior to sending any other request.
We will reset any negotiated version later during FLR.

The outcome of the VF2PF_QUERY_RUNTIME requests depends on the actual
platform: for legacy platforms used as SDV it is provided as-is, for
the latest platforms it is preliminary and might be changed.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-5-michal.wajdeczko@intel.com
4 months ago drm/xe: Add few more GT register definitions
Michal Wajdeczko [Tue, 23 Apr 2024 18:04:34 +0000 (20:04 +0200)]
drm/xe: Add few more GT register definitions

While we are not using these registers right now, they are part
of some runtime register lists that the PF driver shares with VFs on
some legacy platforms that we might want to support as SDV.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-4-michal.wajdeczko@intel.com
4 months ago drm/xe: Add helper to calculate adjusted register offset
Michal Wajdeczko [Tue, 23 Apr 2024 18:04:33 +0000 (20:04 +0200)]
drm/xe: Add helper to calculate adjusted register offset

Our MMIO accessing functions automatically adjust addresses for the
media registers based on mmio.adj_limit and mmio.adj_offset logic.
Move it to a separate helper to avoid code duplication and to
allow the upcoming changes to PF driver code to use it.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-3-michal.wajdeczko@intel.com
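A helper of the kind described in the commit above might look like the sketch below. Only the field names adj_limit and adj_offset come from the commit message; the struct, the function name, and the exact rule (addresses below the limit get the offset added) are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-GT MMIO adjustment settings. */
struct example_mmio {
	uint32_t adj_limit;  /* addresses below this limit are adjusted */
	uint32_t adj_offset; /* offset added to an adjusted address */
};

/*
 * Return the register address to actually access. Keeping this in one
 * helper means every accessor applies the same adjustment rule.
 */
uint32_t example_adjusted_reg_addr(const struct example_mmio *mmio,
				   uint32_t addr)
{
	if (addr < mmio->adj_limit)
		addr += mmio->adj_offset;
	return addr;
}
```

With both fields left at zero the helper is a no-op, so the same accessors can serve GTs that need no adjustment.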
4 months ago drm/xe/guc: Add GuC Relay ABI version 1.0 definitions
Michal Wajdeczko [Tue, 23 Apr 2024 18:04:32 +0000 (20:04 +0200)]
drm/xe/guc: Add GuC Relay ABI version 1.0 definitions

This initial GuC Relay ABI specification includes messages for ABI
version negotiation and to query values of runtime/fuse registers.

We will start handling those messages on the PF driver soon.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-2-michal.wajdeczko@intel.com
4 months ago drm/xe: Fix unexpected backmerge results
Thomas Hellström [Tue, 23 Apr 2024 12:11:14 +0000 (14:11 +0200)]
drm/xe: Fix unexpected backmerge results

The recent backmerge from drm-next to drm-xe-next brought with it
some silent unexpected results. One code snippet was added twice
and a partial revert had merge errors. Fix that up to
reinstate the affected code as it was before the backmerge.

v2:
- Commit log message rewording (Lucas DeMarchi)

Fixes: 79790b6818e9 ("Merge drm/drm-next into drm-xe-next")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423121114.39325-1-thomas.hellstrom@linux.intel.com
4 months ago drm/xe: make xe_pm_runtime_lockdep_map a static struct
Rodrigo Vivi [Mon, 22 Apr 2024 20:14:54 +0000 (16:14 -0400)]
drm/xe: make xe_pm_runtime_lockdep_map a static struct

Fix the new sparse warning:

>> drivers/gpu/drm/xe/xe_pm.c:72:20: sparse: sparse: symbol
'xe_pm_runtime_lockdep_map' was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404191329.EZzOTzwK-lkp@intel.com/
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240422201454.699089-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months ago drm/xe/guc: Fix arguments passed to relay G2H handlers
Michal Wajdeczko [Fri, 19 Apr 2024 15:03:51 +0000 (17:03 +0200)]
drm/xe/guc: Fix arguments passed to relay G2H handlers

By default the CT code was passing just the payload of the G2H event
message, while the Relay code expects the full G2H message including
the HXG header, which contains the DATA0 field. Fix that.

Fixes: 26d4481ac23f ("drm/xe/guc: Start handling GuC Relay event messages")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419150351.358-1-michal.wajdeczko@intel.com
4 months ago drm/xe/pf: Fix xe_gt_sriov_pf_config_print_available_ggtt()
Michal Wajdeczko [Fri, 19 Apr 2024 14:10:00 +0000 (16:10 +0200)]
drm/xe/pf: Fix xe_gt_sriov_pf_config_print_available_ggtt()

This function is using the internal helper pf_get_spare_ggtt() that
expects PF's master mutex to be locked. Fix that.

Fixes: ac6598aed1b3 ("drm/xe/pf: Add support to configure SR-IOV VFs")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419141000.314-1-michal.wajdeczko@intel.com
4 months ago drm/xe: Kill xe_device_mem_access_{get*,put}
Rodrigo Vivi [Thu, 18 Apr 2024 14:30:49 +0000 (10:30 -0400)]
drm/xe: Kill xe_device_mem_access_{get*,put}

Let's simply convert all the current callers towards direct
xe_pm_runtime access and remove this extra layer of indirection.

No functional change is expected with this patch since
xe_mem_access_get was already using the xe_pm_runtime_get_noresume
at this point.

v2: Convert all the current callers instead of a big refactor
at once.

v3: - Rebased
    - Squashed the GSC/HDCP
    - Added a new case: sriov_pf_policy
    - Improved commit message to highlight that
      there's no functional change in this patch.

Reviewed-by: Matthew Auld <matthew.auld@intel.com> #v2
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240418143049.43231-1-rodrigo.vivi@intel.com
5 months ago drm/xe: Define all possible engines in media IP descriptors
Matt Roper [Wed, 17 Apr 2024 15:26:22 +0000 (08:26 -0700)]
drm/xe: Define all possible engines in media IP descriptors

Rather than trying to identify exactly which engines are available on
each platform in the IP descriptor, just include the list of all media
engines that the IP could theoretically support (i.e., 8 VCS + 4 VECS).
We still rely on the media fuse registers to tell us which specific
engine instances are actually present on a given platform, so there
shouldn't be any functional change.  This will help prevent mistakes
with engine numbering (for example ambiguity about whether the 2nd VCS
engine on a platform with exactly two engines is numbered "VCS1" or
"VCS2") and will also future-proof the code a bit more in case new SKUs
or platform refreshes extend the engine list in the future.

Note that the media fuse register technically has an 8-bit field for
VECS engine presence starting on Xe2.  However there's still no MMIO
register range reserved for VE engines above VECS3, so VE0-VE3 is still
considered the "maximum" VE engine mask that the driver can support for
now.

Bspec: 52614, 52615, 62567
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417152621.3357990-2-matthew.d.roper@intel.com
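The approach in the commit above (describe every engine the IP could have, then let the fuses decide which are present) reduces to a mask intersection. The sketch below assumes a made-up bit layout with VCS instances in bits 0-7 and VECS instances in bits 8-11; the macro and function names are hypothetical and do not come from the driver.

```c
#include <stdint.h>

#define EXAMPLE_MAX_VCS  8 /* the IP could theoretically have 8 VCS */
#define EXAMPLE_MAX_VECS 4 /* and 4 VECS */

/* Hypothetical layout: VCS0..7 in bits 0..7, VECS0..3 in bits 8..11. */
#define EXAMPLE_VCS_MASK  ((UINT32_C(1) << EXAMPLE_MAX_VCS) - 1)
#define EXAMPLE_VECS_MASK \
	(((UINT32_C(1) << EXAMPLE_MAX_VECS) - 1) << EXAMPLE_MAX_VCS)

/* The IP descriptor lists everything the IP could support. */
#define EXAMPLE_MEDIA_ENGINES (EXAMPLE_VCS_MASK | EXAMPLE_VECS_MASK)

/*
 * The engines actually usable are those the descriptor allows and the
 * fuse readout reports as present, using the same bit layout.
 */
uint32_t example_present_media_engines(uint32_t fused_present)
{
	return EXAMPLE_MEDIA_ENGINES & fused_present;
}
```

Because each instance keeps a fixed bit position, a platform with exactly two VCS engines reports them by instance number (for example bits 0 and 2), which avoids the numbering ambiguity the commit message mentions.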
5 months ago drm/i915: Convert intel_runtime_pm_get_noresume towards raw wakeref
Rodrigo Vivi [Thu, 18 Apr 2024 22:37:56 +0000 (18:37 -0400)]
drm/i915: Convert intel_runtime_pm_get_noresume towards raw wakeref

In the past, the noresume function was used by the GEM code to ensure
wakelocks were held and bump its usage. This is no longer the case
and this function was totally unused until it started to be used again
by display with commit 77e619a82fc3 ("drm/i915/display: convert inner
wakeref get towards get_if_in_use")

However, in the display code, most of the callers are using the
raw wakeref, rather than the wakelock version, which caused a
major regression caught by CI.

An alternative to this patch is to go with the original plan and
use the get_if_in_use variant in the display code, which is enough
to fulfil our needs, followed by an extra patch to delete the unused
_noresume variant.

v2: Keep grabbing wakelock but only assert for wakeref. (Imre)

Cc: Imre Deak <imre.deak@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: 77e619a82fc3 ("drm/i915/display: convert inner wakeref get towards get_if_in_use")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10875
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240418223756.68427-1-rodrigo.vivi@intel.com
5 months ago drm/i915/hwmon: Get rid of devm
Ashutosh Dixit [Wed, 17 Apr 2024 14:56:46 +0000 (07:56 -0700)]
drm/i915/hwmon: Get rid of devm

When both hwmon and hwmon drvdata (on which hwmon depends) are device
managed resources, the expectation, on device unbind, is that hwmon will be
released before drvdata. However, in i915 there are two separate code
paths, which both release either drvdata or hwmon and either can be
released before the other. These code paths (for device unbind) are as
follows (see also the bug referenced below):

Call Trace:
release_nodes+0x11/0x70
devres_release_group+0xb2/0x110
component_unbind_all+0x8d/0xa0
component_del+0xa5/0x140
intel_pxp_tee_component_fini+0x29/0x40 [i915]
intel_pxp_fini+0x33/0x80 [i915]
i915_driver_remove+0x4c/0x120 [i915]
i915_pci_remove+0x19/0x30 [i915]
pci_device_remove+0x32/0xa0
device_release_driver_internal+0x19c/0x200
unbind_store+0x9c/0xb0

and

Call Trace:
release_nodes+0x11/0x70
devres_release_all+0x8a/0xc0
device_unbind_cleanup+0x9/0x70
device_release_driver_internal+0x1c1/0x200
unbind_store+0x9c/0xb0

This means that in i915, if we use devm, we cannot guarantee that hwmon will
always be released before drvdata. This in turn means that we have a UAF if
hwmon sysfs is accessed when drvdata has been released but hwmon hasn't.

The only way out of this seems to be to get rid of devm_ and release/free
everything explicitly during device unbind.

v2: Change commit message and other minor code changes
v3: Cleanup from i915_hwmon_register on error (Armin Wolf)
v4: Eliminate potential static analyzer warning (Rodrigo)
    Eliminate fetch_and_zero (Jani)
v5: Restore previous logic for ddat_gt->hwmon_dev error return (Andi)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10366
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417145646.793223-1-ashutosh.dixit@intel.com
5 months ago drm/xe/pm: Capture errors and handle them
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:11 +0000 (23:42 +0530)]
drm/xe/pm: Capture errors and handle them

xe_pm_init may encounter failures for various reasons, such as a failure
in initializing drmm_mutex, or when dealing with a d3cold-capable device
for vram_threshold sysfs creation and setting default threshold.
Presently, all these potential failures are disregarded.

Move d3cold.lock initialization to xe_pm_init_early and cause driver
abort if mutex initialization has failed.

For xe_pm_init failures, clean up the driver and return the error code.

-v2
Make mutex init cleaner (Lucas)

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-8-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months ago drm/xe/tile: Abort driver load for sysfs creation failure
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:10 +0000 (23:42 +0530)]
drm/xe/tile: Abort driver load for sysfs creation failure

Ensure that the status of all tile associated sysfs entries creation is
relayed to xe_tile_init_noalloc, leading to a driver load abort if any
sysfs creation failures occur.

-v2
Avoid unnecessary warn/error messages. (Lucas)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-7-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months ago drm/xe/gt: Abort driver load for sysfs creation failure
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:09 +0000 (23:42 +0530)]
drm/xe/gt: Abort driver load for sysfs creation failure

Instead of allowing the driver to load with incomplete sysfs entries in
case of sysfs creation failure, we should terminate the driver loading.
This change ensures that the status of all gt associated sysfs entries
creation is relayed to xe_gt_init, leading to a driver load abort if any
sysfs creation failures occur.

-v2
use err_force_wake label instead of new. (Lucas)
Avoid unnecessary warn/error messages. (Lucas)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-6-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months ago drm/xe: Return NULL in case of drmm_add_action_or_reset failure
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:08 +0000 (23:42 +0530)]
drm/xe: Return NULL in case of drmm_add_action_or_reset failure

Return NULL if drmm_add_action_or_reset fails. There is no need to print
warning messages, as they will be printed implicitly.

Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-5-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months ago drm/xe: call free_gsc_pkt only once on action add failure
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:07 +0000 (23:42 +0530)]
drm/xe: call free_gsc_pkt only once on action add failure

The drmm_add_action_or_reset function automatically invokes the
action (free_gsc_pkt) in the event of a failure; therefore, there's no
necessity to call it within the return check.
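
The behaviour described above can be illustrated with a small userspace
sketch. This is not the driver code: add_action_or_reset(), free_pkt() and
struct pkt are hypothetical stand-ins that only mimic the "run the action
immediately when registration fails" semantics, and the 'fail' flag is an
assumption used to simulate a registration failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cleanup-action type, loosely modelled on a managed release action. */
typedef void (*action_fn)(void *data);

struct pkt {
	int free_count;	/* how many times the cleanup action ran */
};

/* Stand-in for the real cleanup: it only counts invocations. */
void free_pkt(void *data)
{
	struct pkt *p = data;

	assert(p != NULL);
	p->free_count++;
}

/*
 * Hypothetical model of an "add action or reset" helper: if the action
 * cannot be registered, it is run right away and an error is returned.
 * 'fail' simulates a registration failure (an assumption of this sketch).
 */
int add_action_or_reset(action_fn fn, void *data, bool fail)
{
	if (fail) {
		fn(data);	/* the helper itself performs the cleanup */
		return -ENOMEM;
	}
	return 0;		/* the action would run later, at release time */
}

/* Caller: return the helper's result and never call the action again. */
int pkt_init(struct pkt *p, bool fail)
{
	p->free_count = 0;
	return add_action_or_reset(free_pkt, p, fail);
}
```

On the simulated failure path the cleanup runs exactly once, inside the
helper, so an extra call in the caller's error check would release the
packet a second time.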

-v2
Fix commit message. (Lucas)

Fixes: d8b1571312b7 ("drm/xe/huc: HuC authentication via GSC")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-4-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months ago drm/xe: Remove sysfs only once on action add failure
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:06 +0000 (23:42 +0530)]
drm/xe: Remove sysfs only once on action add failure

The drmm_add_action_or_reset function automatically invokes the action
(sysfs removal) in the event of a failure; therefore, there's no
necessity to call it within the return check.

Modify the return type of xe_gt_ccs_mode_sysfs_init to int, allowing the
caller to pass errors up the call chain. Should sysfs creation or
drmm_add_action_or_reset fail, error propagation will prompt a driver
load abort.

-v2
Edit commit message (Nikula/Lucas)
use err_force_wake label instead of new. (Lucas)
Avoid unnecessary warn/error messages. (Lucas)

Fixes: f3bc5bb4d53d ("drm/xe: Allow userspace to configure CCS mode")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-3-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months ago drm/xe: Simplify function return using drmm_add_action_or_reset()
Himal Prasad Ghimiray [Fri, 12 Apr 2024 18:12:05 +0000 (23:42 +0530)]
drm/xe: Simplify function return using drmm_add_action_or_reset()

Instead of assigning the value of drmm_add_action_or_reset() to err and
returning err in case of failure and 0 in case of success, simply return
the result of drmm_add_action_or_reset().

-v2:
cleanup in xe_display too.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412181211.1155732-2-himal.prasad.ghimiray@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months ago drm/xe/xe2lpg: Extend Wa_14020338487
Gustavo Sousa [Wed, 17 Apr 2024 21:25:01 +0000 (18:25 -0300)]
drm/xe/xe2lpg: Extend Wa_14020338487

Wa_14020338487 also applies to Xe2_LPG. Replicate the existing entry to
one specific for Xe2_LPG.

Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417212501.312346-1-gustavo.sousa@intel.com
5 months ago drm/xe: Add outer runtime_pm protection to xe_live_ktest@xe_dma_buf
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:52 +0000 (16:39 -0400)]
drm/xe: Add outer runtime_pm protection to xe_live_ktest@xe_dma_buf

Any kunit test doing any memory access should take its own runtime_pm
outer reference, since kunit tests don't use the standard driver API
entries. This applies in particular to this dma_buf import from the
same driver.

Found by pre-merge CI on adding WARN calls for unprotected
inner callers:

<6> [318.639739]     # xe_dma_buf_kunit: running xe_test_dmabuf_import_same_driver
<4> [318.639957] ------------[ cut here ]------------
<4> [318.639967] xe 0000:4d:00.0: Missing outer runtime PM protection
<4> [318.640049] WARNING: CPU: 117 PID: 3832 at drivers/gpu/drm/xe/xe_pm.c:533 xe_pm_runtime_get_noresume+0x48/0x60 [xe]

Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-10-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months ago drm/xe: Ensure all the inner access are using the _noresume variant
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:51 +0000 (16:39 -0400)]
drm/xe: Ensure all the inner access are using the _noresume variant

At this point, mem_access references should only be used at inner
points of the execution, with a get with synchronous resume already
called at an outer point.

So, before killing mem_access in favor of direct access, let's
ensure that we first convert them towards the new _noresume
variant, which will WARN us if no outer caller happened.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-9-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months ago drm/xe: Convert mem_access_if_ongoing to direct xe_pm_runtime_get_if_active
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:50 +0000 (16:39 -0400)]
drm/xe: Convert mem_access_if_ongoing to direct xe_pm_runtime_get_if_active

Now that assert_mem_access is relying directly on the pm_runtime state
instead of the counters, there's no reason why we cannot use
the pm_runtime functions directly.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-8-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months ago drm/xe: Removing extra mem_access protection from runtime pm
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:49 +0000 (16:39 -0400)]
drm/xe: Removing extra mem_access protection from runtime pm

This is not needed any longer, now that we have all the protection
in place with the runtime pm itself.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-7-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months ago drm/xe: Convert xe_gem_fault to use direct xe_pm_runtime calls
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:48 +0000 (16:39 -0400)]
drm/xe: Convert xe_gem_fault to use direct xe_pm_runtime calls

The gem page fault is one of the outer bound protections where
we want to ensure that the hardware is in D0 before proceeding
with memory access. Let's convert it towards the xe_pm_runtime
functions directly so we can then convert the mem_access to be
inner protection only and then kill it for good.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-6-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months ago drm/xe: Remove useless mem_access during probe
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:47 +0000 (16:39 -0400)]
drm/xe: Remove useless mem_access during probe

xe_pm_init is the very last thing during the xe_pci_probe(),
hence these protections are useless from the point of view
of ensuring that the device is awake.

Let's remove it so we continue towards the goal of killing
xe_device_mem_access.

v2: Adding more cases
v3: Provide a separate fix for xe_tile_init_noalloc return (Matt)
    Adding a new case for the display HDCP init calls, which
    are also made at display probe time.

Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-5-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months ago drm/xe: Move lockdep protection from mem_access to xe_pm_runtime
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:46 +0000 (16:39 -0400)]
drm/xe: Move lockdep protection from mem_access to xe_pm_runtime

The mem_access itself is not holding any lock, but attempting
to train lockdep with the possibly scary locks taken during
runtime pm. We are soon going to kill the mem_access get and put
helpers in favor of direct xe_pm_runtime calls, so let's just
move this lock around to where it now belongs.

v2: s/lockdep_training/lockdep_prime (Matt Auld)

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-4-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months ago drm/i915/display: convert inner wakeref get towards get_if_in_use
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:45 +0000 (16:39 -0400)]
drm/i915/display: convert inner wakeref get towards get_if_in_use

This patch brings no functional change. Since at this point of
the code we are already asserting that a wakeref is held, runtime_pm
is 'in_use' and in practical terms we are only bumping the
pm_runtime usage counter and moving on.

However, the xe driver has a lockdep annotation that warned us that
if a sync resume was actually called at this point, we could have
a deadlock because we are inside the power_domains->lock locked
area and the resume would call the irq_reset, which would also
try to get the power_domains->lock.

For this reason, let's convert this call to a safer option and
calm lockdep down.

v2: use _noresume variant instead of get_in_use (Ville, Imre)

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Imre Deak <imre.deak@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-3-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months ago drm/xe: Introduce intel_runtime_pm_get_noresume at compat-i915-headers for display
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:44 +0000 (16:39 -0400)]
drm/xe: Introduce intel_runtime_pm_get_noresume at compat-i915-headers for display

The i915 display code will start using intel_runtime_pm_get_noresume,
so we need to add it to the compat header first.

Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months ago drm/xe: Introduce xe_pm_runtime_get_noresume for inner callers
Rodrigo Vivi [Wed, 17 Apr 2024 20:39:43 +0000 (16:39 -0400)]
drm/xe: Introduce xe_pm_runtime_get_noresume for inner callers

Let's ensure that we have an option for inner callers that will
raise a WARN if the device is not active and not protected by outer callers.

Make this also a void function, forcing every caller to unconditionally
put the reference back afterwards.
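
A minimal userspace sketch of these semantics follows. It is a model under
stated assumptions, not the xe implementation: struct dev_pm,
pm_get_noresume() and pm_put() are hypothetical names, and the WARN is
reduced to a counter plus a message on stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified runtime PM state; not the kernel's struct. */
struct dev_pm {
	bool active;	/* device is resumed */
	int usage;	/* usage counter held by callers */
	int warnings;	/* number of "missing outer protection" warnings */
};

/*
 * Inner-caller get: never resumes the device. It warns when no outer
 * caller already holds the device active, and takes a reference in
 * every case, so the caller must always put it back.
 */
void pm_get_noresume(struct dev_pm *pm)
{
	if (!pm->active || pm->usage == 0) {
		fprintf(stderr, "Missing outer runtime PM protection\n");
		pm->warnings++;
	}
	pm->usage++;
}

/* Drop a reference taken by a get. */
void pm_put(struct dev_pm *pm)
{
	assert(pm->usage > 0);
	pm->usage--;
}
```

Because the get returns nothing, there is no "did I get the reference?"
branch for the caller to forget: every get is paired with exactly one put,
including the case where the put happens later in a queued work item.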

This will be very important for cases where we want to hold the
reference before scheduling a work in a queue. Then the work job
will be responsible for putting it back.

While at it, convert a case of mem_access_get_ongoing where the
reference is put back without checking whether it was taken, which
would cause an underflow.

v2: Fix indentation.
v3: Convert equivalent missing put from mem_access towards pm_runtime.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417203952.25503-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months ago drm/xe/lnl: Apply GuC Wa_13011645652
Vinay Belgaumkar [Wed, 17 Apr 2024 05:48:02 +0000 (22:48 -0700)]
drm/xe/lnl: Apply GuC Wa_13011645652

Enable WA for a bug that could cause the C6 state machine to hang
during RC6 exit.

v2: Add comment clarifying the WA (John H)
v3: Add more details to the comment (John H)

Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417054802.1766359-1-vinay.belgaumkar@intel.com
5 months ago drm/xe/vm: don't include xe_gt.h
Matthew Auld [Fri, 12 Apr 2024 11:31:47 +0000 (12:31 +0100)]
drm/xe/vm: don't include xe_gt.h

clangd complains here, since nothing in xe_gt.h seems to be needed.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412113144.259426-6-matthew.auld@intel.com
5 months ago drm/xe/vm: drop vm->destroy_work
Matthew Auld [Fri, 12 Apr 2024 11:31:46 +0000 (12:31 +0100)]
drm/xe/vm: drop vm->destroy_work

Now that we no longer grab the usm.lock mutex (which might sleep) it
looks like it should be safe to directly perform xe_vm_free when vm
refcount reaches zero, instead of punting that off to some worker.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412113144.259426-5-matthew.auld@intel.com
5 months ago drm/xe/vm: prevent UAF with asid based lookup
Matthew Auld [Fri, 12 Apr 2024 11:31:45 +0000 (12:31 +0100)]
drm/xe/vm: prevent UAF with asid based lookup

The asid is only erased from the xarray when the vm refcount reaches
zero. However, this leads to a potential UAF, since xe_vm_get() only
works on a vm with refcount != 0. Since the asid is allocated in the vm
create ioctl, erase it instead when closing the vm, prior to dropping the
potential last ref. This should also work when the user closes the driver
fd without an explicit vm destroy.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1594
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412113144.259426-4-matthew.auld@intel.com
5 months ago drm/xe/stolen: ignore first page for FBC
Matthew Auld [Fri, 12 Apr 2024 15:03:03 +0000 (16:03 +0100)]
drm/xe/stolen: ignore first page for FBC

We have observed underruns on some platforms if the CFB offset is within
the first page of stolen. Just like i915, skip the first page.

v2 (Maarten)
  - Also align the start.

BSpec: 50214
Reported-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412150301.273344-4-matthew.auld@intel.com
5 months ago drm/xe/stolen: lower the default alignment
Matthew Auld [Fri, 12 Apr 2024 15:03:02 +0000 (16:03 +0100)]
drm/xe/stolen: lower the default alignment

No need to be so aggressive here. The upper layers will already apply
the needed alignment, plus some allocations might wish to skip it. The
main issue is that we might want a start/end bias range that doesn't
match the default alignment, which the allocator rejects.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412150301.273344-3-matthew.auld@intel.com
5 months ago drm/xe: select X86_PLATFORM_DEVICES when ACPI_WMI is selected
Lu Yao [Mon, 15 Apr 2024 02:52:15 +0000 (10:52 +0800)]
drm/xe: select X86_PLATFORM_DEVICES when ACPI_WMI is selected

ACPI_WMI is a subitem of X86_PLATFORM_DEVICES, but X86_PLATFORM_DEVICES
is not selected in the current Kconfig, which may cause Kconfig warnings:

WARNING: unmet direct dependencies detected for ACPI_WMI
  Depends on [n]: X86_PLATFORM_DEVICES [=n] && ACPI [=y]
  Selected by [m]:
  - DRM_XE [=m] && HAS_IOMEM [=y] && DRM [=m] && PCI [=y] && MMU [=y] &&
    (m && MODULES [=y] || y && KUNIT [=y]=y) && X86 [=y] && ACPI [=y]

Signed-off-by: Lu Yao <yaolu@kylinos.cn>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415025215.15811-1-yaolu@kylinos.cn
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months ago drm/xe/bmg: Some LNL workarounds also apply to BMG
John Harrison [Wed, 10 Apr 2024 00:26:46 +0000 (17:26 -0700)]
drm/xe/bmg: Some LNL workarounds also apply to BMG

Enable a couple of existing workarounds for a new platform.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410002646.3002394-3-John.C.Harrison@Intel.com
5 months ago drm/xe/lnl: Enable more GuC based workarounds
John Harrison [Wed, 10 Apr 2024 00:26:45 +0000 (17:26 -0700)]
drm/xe/lnl: Enable more GuC based workarounds

There are a couple of new workarounds for LNL that are implemented in
the GuC firmware. The KMD needs to enable them explicitly.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410002646.3002394-2-John.C.Harrison@Intel.com
5 months ago drm/xe/pf: Add support to configure SR-IOV VFs
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:37 +0000 (19:39 +0200)]
drm/xe/pf: Add support to configure SR-IOV VFs

To run correctly, each Virtual Function must be provisioned with
some chunk of shared hardware or firmware resources (like GGTT,
device memory, GuC doorbell IDs, GuC context IDs) and scheduling
parameters (execution quantum or preemption timeout).

All resources assigned to VFs must be excluded from the PF driver
use and may require some additional preparation steps (like setup
of the LMTT or update of the GGTT PTE). Those provisioning details
must be then sent to the GuC firmware as most of those details
will be shared later with the VF drivers during their boot.

Add basic functions to provision VFs with all hardware resources
or scheduling parameters. We will use them shortly in upcoming
patches either in manual provisioning over debugfs, exposed to the
advanced users, or automatic provisioning done by PF driver during
VFs enabling.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-7-michal.wajdeczko@intel.com
5 months ago drm/xe/pf: Add SR-IOV PF specific early GT initialization
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:36 +0000 (19:39 +0200)]
drm/xe/pf: Add SR-IOV PF specific early GT initialization

The PF driver must maintain additional GT level data per each VF.
This additional per-VF data will be added in upcoming patches and
will include: provisioning configuration (like GGTT space or LMEM
allocation sizes or scheduling parameters), monitoring thresholds
and counters, and more.

As the number of supported VFs varies across platforms, use a flexible
array where the first entry will contain metadata for the PF itself
(if such a configuration parameter is applicable for the PF) and
all remaining entries will contain data for potential VFs.
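
The layout described above can be sketched in plain C with a flexible
array member. All names here (struct fn_metadata, struct pf_data,
pf_data_alloc(), pf_data_get()) and the single ggtt_size field are
hypothetical and only illustrate the indexing convention: slot 0 for the
PF, slots 1..total_vfs for the VFs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-function metadata; the real fields are not listed here. */
struct fn_metadata {
	unsigned int ggtt_size;
};

/* Hypothetical container: entry 0 is the PF, entries 1..total_vfs are VFs. */
struct pf_data {
	unsigned int total_vfs;
	struct fn_metadata fns[];	/* flexible array member */
};

/* Allocate zeroed storage for the PF plus total_vfs VFs. */
struct pf_data *pf_data_alloc(unsigned int total_vfs)
{
	struct pf_data *pf;

	/* one extra slot for the PF itself */
	pf = calloc(1, sizeof(*pf) + (1 + (size_t)total_vfs) * sizeof(pf->fns[0]));
	if (pf)
		pf->total_vfs = total_vfs;
	return pf;
}

/* Look up the entry for function number vfid (0 means the PF). */
struct fn_metadata *pf_data_get(struct pf_data *pf, unsigned int vfid)
{
	assert(vfid <= pf->total_vfs);
	return &pf->fns[vfid];
}
```

Sizing the array at allocation time avoids a compile-time maximum, and
using the function number directly as the index keeps PF and VF lookups
on the same code path.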

Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-6-michal.wajdeczko@intel.com
5 months ago drm/xe/guc: Add PF2GUC_UPDATE_VF_CFG to ABI
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:35 +0000 (19:39 +0200)]
drm/xe/guc: Add PF2GUC_UPDATE_VF_CFG to ABI

In upcoming patches the PF driver will add support to change VFs
configuration and will need to use PF2GUC_UPDATE_VF_CFG messages.
Add necessary definitions to our GuC firmware ABI header.

Definitions of the GuC VF Configuration KLVs used by this action
are already present in abi/guc_klvs_abi.h

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-5-michal.wajdeczko@intel.com
5 months ago drm/xe: Add xe_ttm_vram_get_avail
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:34 +0000 (19:39 +0200)]
drm/xe: Add xe_ttm_vram_get_avail

The PF driver will need to know the size of the remaining available
VRAM to estimate fair VRAM allocations that could be used across
all VFs in automatic VF provisioning mode. Add a helper function
for that. We will use it in an upcoming patch.

Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-4-michal.wajdeczko@intel.com
5 months ago drm/xe: Allow to assign GGTT region to the VF
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:33 +0000 (19:39 +0200)]
drm/xe: Allow to assign GGTT region to the VF

VF drivers can't modify GGTT PTEs except in the range explicitly
assigned by the PF driver. To allow hardware enforcement of this
requirement, each GGTT PTE has a field with the VF number that
identifies which VF can modify that particular GGTT PTE.

Only the PF driver can modify this field, and it shall do so
before the VF drivers are loaded. Add a function to prepare the PTEs.
Since it will be used only by the PF driver, make it available
only for CONFIG_PCI_IOV=y.

Bspec: 45015, 52395
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-3-michal.wajdeczko@intel.com
5 months ago drm/xe: Add helper to format SR-IOV function name
Michal Wajdeczko [Mon, 15 Apr 2024 17:39:32 +0000 (19:39 +0200)]
drm/xe: Add helper to format SR-IOV function name

While the GuC firmware and the Xe driver are using the VF identifier
VFID(0) to represent the Physical Function, we should avoid using the
"VF0" name and use the proper "PF" name in all user facing messages
related to the Physical Function, and use the "VFn" name only when
referring to a true Virtual Function. Add a simple helper to get a
properly formatted function name based on the function number.
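
A minimal sketch of such a helper, written as standalone C for
illustration. The name sriov_function_name() and its signature are
assumptions of this sketch, not necessarily what the patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "PF" for function number 0 and "VF<n>" for
 * any other number. Returns buf so it can be passed straight to a printf().
 */
const char *sriov_function_name(unsigned int n, char *buf, size_t size)
{
	assert(buf != NULL && size > 0);

	if (n)
		snprintf(buf, size, "VF%u", n);
	else
		snprintf(buf, size, "PF");
	return buf;
}
```

Returning the caller's buffer lets a log statement format the name inline
without a separate temporary string per message.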

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-2-michal.wajdeczko@intel.com
5 months ago drm/xe/gt: Add L3 bank mask to GT topology
Francois Dugast [Wed, 10 Apr 2024 12:37:22 +0000 (12:37 +0000)]
drm/xe/gt: Add L3 bank mask to GT topology

Generate the mask of enabled L3 banks for the GT. It is stored with the
rest of the GT topology in a consistent representation across platforms.
For now the L3 bank mask is just printed in the log for developers to
easily figure out the fusing characteristics of machines that they are
trying to debug issues on. Later it can be used to replace existing code
in the driver that requires the L3 bank count (not mask). Also the mask
can easily be exposed to user space in a new query if needed.
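
As an illustration of why a mask can replace code that only needs the
count, the sketch below derives both a per-bank test and the bank count
from one mask. The 64-bit mask width and the names l3_bank_enabled() and
l3_bank_count() are assumptions made for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the given bank is enabled in the mask (64-bit width is an assumption). */
bool l3_bank_enabled(uint64_t mask, unsigned int bank)
{
	assert(bank < 64);
	return (mask >> bank) & 1;
}

/* Number of enabled banks: clear the lowest set bit until none remain. */
unsigned int l3_bank_count(uint64_t mask)
{
	unsigned int count = 0;

	while (mask) {
		mask &= mask - 1;
		count++;
	}
	return count;
}
```

The count is always recoverable from the mask, while the reverse is not
true, which is what makes the mask the more general representation.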

v2: Better naming of variable and function (Matt Roper)

Bspec: 52545, 52546, 62482
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410123723.7-2-francois.dugast@intel.com
5 months ago drm/xe/pf: Add support to configure GuC SR-IOV policies
Michal Wajdeczko [Wed, 10 Apr 2024 17:03:38 +0000 (19:03 +0200)]
drm/xe/pf: Add support to configure GuC SR-IOV policies

There are a few knobs inside the GuC firmware to control VF scheduling.
Add basic functions to support their reconfiguration.
We will start using them shortly once we prepare debugfs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410170338.1199-6-michal.wajdeczko@intel.com
5 months ago drm/xe/guc: Add helpers for GuC KLVs
Michal Wajdeczko [Wed, 10 Apr 2024 17:03:37 +0000 (19:03 +0200)]
drm/xe/guc: Add helpers for GuC KLVs

Many of the GuC actions use KLVs to pass additional parameters or
configuration data. Add a few helper functions for better reporting
of any information related to KLVs.

Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Acked-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410170338.1199-5-michal.wajdeczko@intel.com
5 months ago drm/xe/guc: Add PF2GUC_UPDATE_VGT_POLICY to ABI
Michal Wajdeczko [Wed, 10 Apr 2024 17:03:36 +0000 (19:03 +0200)]
drm/xe/guc: Add PF2GUC_UPDATE_VGT_POLICY to ABI

In upcoming patches the PF driver will add support to change GuC
policies and will need to use PF2GUC_UPDATE_VGT_POLICY messages.
Add necessary definitions to our GuC firmware ABI header.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410170338.1199-4-michal.wajdeczko@intel.com
5 months ago drm/xe/pf: Introduce helper functions for use by PF
Michal Wajdeczko [Wed, 10 Apr 2024 17:03:35 +0000 (19:03 +0200)]
drm/xe/pf: Introduce helper functions for use by PF

The PF driver will maintain VF configuration data mostly on the
GT level, but some internal data is located at the device level.

To allow easy access to that data from the GT level functions, and
to minimize code duplication, introduce a set of helper functions
and macros for explicit use by the PF driver.

We will use these helpers in upcoming patches.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410170338.1199-3-michal.wajdeczko@intel.com
5 months ago drm/xe/pf: Introduce mutex to protect VFs configurations
Michal Wajdeczko [Wed, 10 Apr 2024 17:03:34 +0000 (19:03 +0200)]
drm/xe/pf: Introduce mutex to protect VFs configurations

The PF driver will maintain configurations and resources for every VF,
and this data could span multiple tiles and/or GTs. Prepare a mutex
to protect data that we will add in upcoming patches.

Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410170338.1199-2-michal.wajdeczko@intel.com
5 months ago Merge drm/drm-next into drm-xe-next
Thomas Hellström [Fri, 12 Apr 2024 13:14:25 +0000 (15:14 +0200)]
Merge drm/drm-next into drm-xe-next

Backmerging drm-next in order to get up-to-date and in particular
to access commit 9ca5facd0400f610f3f7f71aeb7fc0b949a48c67.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
5 months ago drm/xe: Use hmm_range_fault to populate user pages
Oak Zeng [Fri, 12 Apr 2024 09:52:37 +0000 (15:22 +0530)]
drm/xe: Use hmm_range_fault to populate user pages

This is an effort to unify hmmptr (aka system allocator)
and userptr code. hmm_range_fault is used to populate
a virtual address range for both hmmptr and userptr,
instead of hmmptr using hmm_range_fault and userptr
using get_user_pages_fast.

This also aligns with the AMD GPU driver's behavior. In the
long term, we plan to move some common helpers in this
area to the drm layer so they can be re-used by different
vendors.

-v1
use the function with parameter to confirm whether lock is
acquired by the caller or needs to be acquired in hmm_range_fault.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412095237.1048599-3-himal.prasad.ghimiray@intel.com
5 months agodrm/xe: Introduce helper to populate userptr
Oak Zeng [Fri, 12 Apr 2024 09:52:36 +0000 (15:22 +0530)]
drm/xe: Introduce helper to populate userptr

Introduce a helper function xe_userptr_populate_range to populate
a userptr range. This function calls hmm_range_fault to read
CPU page tables and populate all pfns/pages of this virtual address
range. For system memory pages, dma-mapping is performed
to get a dma-address which can be used later for GPU to access pages.

v1: Address review comments:
    separate a npage_in_range function (Matt)
    reparameterize function xe_userptr_populate_range (Matt)
    move mmu_interval_read_begin() call into while loop (Thomas)
    s/mark_range_accessed/xe_mark_range_accessed (Thomas)
    use set_page_dirty_lock (vs set_page_dirty) (Thomas)
    move a few checking in xe_vma_userptr_pin_pages to hmm.c (Matt)
v2: Remove device private page support. Only support system
    pages for now. use dma-map-sg rather than dma-map-page (Matt/Thomas)
v3: Address review comments:
    Squash patch "drm/xe: Introduce a helper to free sg table" to current
    patch (Matt)
    start and end addresses are already page aligned (Matt)
    Do mmap_read_lock and mmap_read_unlock for hmm_range_fault in case of
    non system allocator call. (Matt)
    Drop kthread_use_mm and kthread_unuse_mm. (Matt)
    No need of kernel-doc for static functions. (Matt)
    Modify function names. (Matt)
    Free sgtable in case of dma_map_sgtable failure. (Matt)
    Modify loop for hmm_range_fault. (Matt)
v4: Remove the dummy function for xe_hmm_userptr_populate_range
    since CONFIG_HMM_MIRROR is needed. (Matt)
    Change variable names start/end to userptr_start/userptr_end. (Matt)
v5: Remove device private page support info from commit message. Since
    the patch doesn't support device page handling. (Thomas)

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412095237.1048599-2-himal.prasad.ghimiray@intel.com
5 months agodrm/xe: Fix bo leak in intel_fb_bo_framebuffer_init
Maarten Lankhorst [Thu, 4 Apr 2024 09:03:02 +0000 (11:03 +0200)]
drm/xe: Fix bo leak in intel_fb_bo_framebuffer_init

Unreference the bo in the error path to prevent leaking a bo ref.

Return 0 on success to clarify the success path.
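
The pattern behind this fix can be sketched in plain userspace C. This is
only an illustration: fake_bo, fake_bo_get(), fake_bo_put() and
fake_fb_init() are hypothetical stand-ins, not the xe or display code. The
init function takes a reference, drops it again on the error path, and
returns 0 explicitly on success.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted buffer object. */
struct fake_bo {
	int refcount;
};

static void fake_bo_get(struct fake_bo *bo)
{
	bo->refcount++;
}

static void fake_bo_put(struct fake_bo *bo)
{
	bo->refcount--;
}

/*
 * Take a reference for the framebuffer, then run a validation step that
 * may fail. On failure the reference is dropped again before returning.
 */
static int fake_fb_init(struct fake_bo *bo, bool validation_ok)
{
	int ret;

	fake_bo_get(bo);

	if (!validation_ok) {
		ret = -EINVAL;
		goto err_put;
	}

	return 0;	/* explicit success path */

err_put:
	fake_bo_put(bo);	/* without this, the reference would leak */
	return ret;
}
```

With a bo that starts at refcount 1, a failed fake_fb_init() leaves the
count at 1, while a successful call leaves it at 2.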

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404090302.68422-1-maarten.lankhorst@linux.intel.com
5 months agodrm/xe: Remove devcoredump during driver release
José Roberto de Souza [Tue, 9 Apr 2024 20:02:06 +0000 (13:02 -0700)]
drm/xe: Remove devcoredump during driver release

This will remove devcoredump from file system and free its resources
during driver unload.

This fixes driver unload after a GPU hang has happened; otherwise it
would report that the Xe KMD is still in use and would leave the
kernel in a state where the Xe KMD can't be unloaded without a reboot.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Acked-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240409200206.108452-2-jose.souza@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodevcoredump: Add dev_coredump_put()
José Roberto de Souza [Tue, 9 Apr 2024 20:02:05 +0000 (13:02 -0700)]
devcoredump: Add dev_coredump_put()

It is useful for modules that do not want to keep the coredump available
after they are unloaded.
Otherwise, the coredump would only be removed after DEVCD_TIMEOUT
seconds.

v2:
- dev_coredump_put() documentation updated (Mukesh)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240409200206.108452-1-jose.souza@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agoMerge tag 'drm-misc-next-2024-04-10' of https://gitlab.freedesktop.org/drm/misc/kerne...
Dave Airlie [Thu, 11 Apr 2024 03:36:00 +0000 (13:36 +1000)]
Merge tag 'drm-misc-next-2024-04-10' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.10:

Cross-subsystem Changes:
- Add Tomi as Xilinx maintainer.
- Add sound bindings to DT.

Core Changes:
- Make DP helper depend on KMS helper.

Driver Changes:
- Assorted small fixes to bridge/dw-hdmi, bridge/cdns-mhdp8456, xlnx,
  omap, tilcdc, bridge/imx8mp-hdmi-pvi.
- Add debugfs entries to qaic.
- Add conservative fallback to panel eDP.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/dc690de5-17da-4af6-93a9-8078c99245fd@linux.intel.com
5 months agodrm/xe: Add xe_guc_ads.c to uses_generated_oob
Nathan Chancellor [Wed, 10 Apr 2024 18:16:11 +0000 (11:16 -0700)]
drm/xe: Add xe_guc_ads.c to uses_generated_oob

A recent change added a use of xe_wa_oob.h without adding the file that
uses it to uses_generated_oob, which means xe_wa_oob.h does not get
properly generated before attempting to build the object file:

    LINK     resolve_btfids
    CC [M]  drivers/gpu/drm/xe/xe_guc_ads.o
  drivers/gpu/drm/xe/xe_guc_ads.c:10:10: fatal error: generated/xe_wa_oob.h: No such file or directory
     10 | #include <generated/xe_wa_oob.h>
        |          ^~~~~~~~~~~~~~~~~~~~~~~

After adding '$(obj)/xe_guc_ads.o' to uses_generated_oob, xe_wa_oob.h is
always generated before building the file, resulting in no errors:

    LINK     resolve_btfids
    HOSTCC  drivers/gpu/drm/xe/xe_gen_wa_oob
    GEN     xe_wa_oob.c xe_wa_oob.h
    CC [M]  drivers/gpu/drm/xe/xe_guc_ads.o

Fixes: c151ff5c9053 ("drm/xe/lnl: Enable GuC Wa_14019882105")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410-drm-xe-fix-xe_guc_ads-using-xe_wa_oob-v1-1-441f2d8e5d83@kernel.org
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
5 months agodrm/xe/guc: Prefer GT oriented asserts in CTB code
Michal Wajdeczko [Thu, 4 Apr 2024 19:36:46 +0000 (21:36 +0200)]
drm/xe/guc: Prefer GT oriented asserts in CTB code

GuC CTB is related to the GT, so best to use xe_gt_assert().

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404193647.759-2-michal.wajdeczko@intel.com
5 months agodrm/xe/guc: Prefer GT oriented logs in GuC CTB code
Michal Wajdeczko [Thu, 4 Apr 2024 19:36:45 +0000 (21:36 +0200)]
drm/xe/guc: Prefer GT oriented logs in GuC CTB code

A platform can have more than one GuC, so we should use GT-oriented
logs to refer to specific GuC.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404193647.759-1-michal.wajdeczko@intel.com
5 months agodrm/xe: re-order lmem init check and wait for initialization to complete
Riana Tauro [Wed, 10 Apr 2024 08:50:05 +0000 (14:20 +0530)]
drm/xe: re-order lmem init check and wait for initialization to complete

Lmem init check should be done only after pcode initialization
status is complete. Move lmem init check after pcode status
check. Also wait for a short while after pcode status check
to allow completion of the task.

Failing to do so can lead to aborting the module load,
leaving the system unusable. Wait until the lmem initialization
is complete within a timeout (60s) or till the user aborts.
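
A bounded wait of this kind can be sketched in userspace C as follows. It
is a minimal model, not the driver code: fake_hw, fake_lmem_ready() and
fake_wait_lmem_ready() are hypothetical names, time is counted in abstract
steps instead of real sleeps, and the "user aborts" case (a pending signal)
is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hardware model: reports "initialized" after a few polls. */
struct fake_hw {
	int polls_until_ready;
};

static bool fake_lmem_ready(struct fake_hw *hw)
{
	if (hw->polls_until_ready > 0) {
		hw->polls_until_ready--;
		return false;
	}
	return true;
}

/*
 * Poll until initialization completes or the timeout budget is used up.
 * A real driver would sleep between polls and compare against a
 * wall-clock deadline; here each failed poll just adds step_ms.
 * Returns 0 on success, -ETIMEDOUT if the budget is exhausted.
 */
static int fake_wait_lmem_ready(struct fake_hw *hw, unsigned int timeout_ms,
				unsigned int step_ms)
{
	unsigned int waited_ms = 0;

	assert(step_ms > 0);	/* a zero step would never reach the timeout */

	while (!fake_lmem_ready(hw)) {
		if (waited_ms >= timeout_ms)
			return -ETIMEDOUT;
		waited_ms += step_ms;	/* stands in for a sleep of step_ms */
	}

	return 0;
}
```

The caller would run this only after the pcode status check has passed,
which is the ordering this patch establishes.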

v2: use bool as return type
    re-order the code comment (Rodrigo)
    add comment for deferring probe (Himal)

v3: rebase

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410085005.1126343-3-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/xe: check pcode init status only on root gt of root tile
Riana Tauro [Wed, 10 Apr 2024 08:50:04 +0000 (14:20 +0530)]
drm/xe: check pcode init status only on root gt of root tile

The root tile indicates the pcode initialization is complete
when all tiles have completed their initialization.
So the mailbox can be polled only on the root tile.
Check pcode init status only on root tile and move it to
device probe early as root tile is initialized there.
Also make similar changes in resume paths.

v2: add lock/unlocked version of pcode_mailbox_rw
    to allow pcode init to be called in device
    early probe (Rodrigo)

v3: add code description about using root tile
    change function names to xe_pcode_probe_early
    and xe_pcode_init (Rodrigo)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410085005.1126343-2-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 months agodrm/bridge: imx8mp-hdmi-pvi: Convert to platform remove callback returning void
Uwe Kleine-König [Mon, 4 Mar 2024 09:05:56 +0000 (10:05 +0100)]
drm/bridge: imx8mp-hdmi-pvi: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
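
The difference between the two callbacks can be illustrated with a
simplified userspace model. Everything here is hypothetical (fake_driver,
fake_core_unbind(), fake_pvi_remove() and fake_pvi_remove_new() are not
the platform bus code); the sketch only shows that an error code returned
from the int variant cannot stop the unbind.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct platform_driver. */
struct fake_driver {
	int (*remove)(void *dev);	/* legacy callback: returns int */
	void (*remove_new)(void *dev);	/* transitional callback: returns void */
};

/*
 * Simplified model of the core's unbind step. The int returned by the
 * legacy callback cannot stop the unbind; it is only counted here (the
 * real core emits a warning), so returning an error undoes nothing.
 */
static int fake_core_unbind(const struct fake_driver *drv, void *dev)
{
	int ignored_errors = 0;

	if (drv->remove_new)
		drv->remove_new(dev);
	else if (drv->remove && drv->remove(dev) != 0)
		ignored_errors++;	/* warn and carry on */

	return ignored_errors;
}

/* Before the conversion: always returns zero. */
static int fake_pvi_remove(void *dev)
{
	*(int *)dev = 0;	/* release the (mock) resource */
	return 0;
}

/* After the conversion: same body, no return value. */
static void fake_pvi_remove_new(void *dev)
{
	*(int *)dev = 0;
}
```

Both variants release the mock resource; the void one simply removes the
misleading return value.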

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Robert Foss <rfoss@kernel.org>
Signed-off-by: Robert Foss <rfoss@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240304090555.716327-2-u.kleine-koenig@pengutronix.de
5 months agodrm: tilcdc: don't use devm_pinctrl_get_select_default() in probe
Wolfram Sang [Fri, 22 Sep 2023 07:37:13 +0000 (09:37 +0200)]
drm: tilcdc: don't use devm_pinctrl_get_select_default() in probe

Since commit ab78029ecc34 ("drivers/pinctrl: grab default handles from
device core"), we can rely on device core for setting the default pins.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230922073714.6164-1-wsa+renesas@sang-engineering.com
5 months agodrm/omap: dmm_tiler: drop driver owner assignment
Krzysztof Kozlowski [Sat, 30 Mar 2024 20:28:04 +0000 (21:28 +0100)]
drm/omap: dmm_tiler: drop driver owner assignment

Core in platform_driver_register() already sets the .owner, so driver
does not need to.  Whatever is set here will be anyway overwritten by
main driver calling platform_driver_register().

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240330202804.83936-1-krzysztof.kozlowski@linaro.org
5 months agodrm: xlnx: db: fix a memory leak in probe
Dan Carpenter [Thu, 4 Apr 2024 07:32:07 +0000 (10:32 +0300)]
drm: xlnx: db: fix a memory leak in probe

Free "dp" before returning.

Fixes: be318d01a903 ("drm: xlnx: dp: Reset DisplayPort IP")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/86def134-9537-4939-912e-3a424e3a75b6@moroto.mountain
5 months agoMAINTAINERS: Add myself as maintainer for Xilinx DRM drivers
Tomi Valkeinen [Wed, 27 Mar 2024 13:03:33 +0000 (15:03 +0200)]
MAINTAINERS: Add myself as maintainer for Xilinx DRM drivers

Add myself as a co-maintainer for Xilinx DRM drivers to help Laurent.

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240327-xilinx-maintainer-v1-1-c5fdc115f448@ideasonboard.com
5 months agodrm/xe: Add SR-IOV info attribute to debugfs
Michal Wajdeczko [Thu, 4 Apr 2024 15:44:31 +0000 (17:44 +0200)]
drm/xe: Add SR-IOV info attribute to debugfs

As SR-IOV support varies between platforms and the driver can run
in different SR-IOV modes, add debugfs file with these details.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404154431.583-4-michal.wajdeczko@intel.com
5 months agodrm/xe: Add proper detection of the SR-IOV PF mode
Michal Wajdeczko [Thu, 4 Apr 2024 15:44:30 +0000 (17:44 +0200)]
drm/xe: Add proper detection of the SR-IOV PF mode

SR-IOV PF mode detection is based on the PCI capability as reported by
the PCI dev_is_pf() function and additionally on the 'max_vfs' module
parameter, which can also be used to disable PF capability even
if SR-IOV PF capability is reported by the hardware.
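
The decision described above reduces to a small predicate. The following
userspace sketch uses hypothetical names (fake_sriov_mode,
fake_detect_mode()) and ignores VF mode entirely; it only models how the
hardware capability and the module parameter combine.

```c
#include <stdbool.h>

enum fake_sriov_mode {
	FAKE_SRIOV_MODE_NONE,	/* no SR-IOV PF functionality */
	FAKE_SRIOV_MODE_PF,	/* Physical Function */
};

/*
 * PF mode needs both the hardware capability and a non-zero max_vfs.
 * max_vfs == 0 disables PF mode even when the capability is reported.
 */
static enum fake_sriov_mode fake_detect_mode(bool hw_is_pf, unsigned int max_vfs)
{
	if (hw_is_pf && max_vfs > 0)
		return FAKE_SRIOV_MODE_PF;

	return FAKE_SRIOV_MODE_NONE;
}
```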

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404154431.583-3-michal.wajdeczko@intel.com
5 months agodrm/xe: Add max_vfs module parameter
Michal Wajdeczko [Thu, 4 Apr 2024 15:44:29 +0000 (17:44 +0200)]
drm/xe: Add max_vfs module parameter

We want to have an option to limit the number of VFs that the
PF driver will be able to manage.  With this limit set to zero we
will also have a way to completely disable the PF functionality.

Since we currently don't support SR-IOV on any platform, we start
with this limit set to zero by default.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404154431.583-2-michal.wajdeczko@intel.com