linux-block.git
Raag Jadav [Fri, 31 Jan 2025 05:45:02 +0000 (11:15 +0530)]
drm/xe/hwmon: expose package and vram temperature

Add hwmon support for temp2_input and temp3_input attributes, which will
expose package and vram temperature in millidegree Celsius. With this in
place, we can monitor the temperatures using the lm-sensors tool.
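Both attributes follow the standard hwmon sysfs convention of reporting an integer in millidegrees Celsius, so a reader divides by 1000. Here is a minimal userspace sketch. It assumes the attribute text has already been read from a path such as /sys/class/hwmon/hwmonN/temp2_input (the hwmonN index varies per system). hwmon_parse_temp() is a hypothetical helper, not part of lm-sensors.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the text of an hwmon tempN_input attribute
 * (integer millidegrees Celsius, e.g. "45000\n") into degrees Celsius.
 * Returns 0 on success or -EINVAL if the text is not a plain integer.
 */
int hwmon_parse_temp(const char *text, double *celsius)
{
	char *end;
	long millideg;

	errno = 0;
	millideg = strtol(text, &end, 10);
	if (errno || end == text || (*end != '\0' && *end != '\n'))
		return -EINVAL;

	*celsius = millideg / 1000.0;
	return 0;
}
```

For example, a package temperature reading of "45000" maps to 45.0 degrees Celsius.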

v2: Reuse existing channels (Badal, Karthik)

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131054502.1528555-1-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Daniele Ceraolo Spurio [Mon, 3 Feb 2025 23:48:57 +0000 (15:48 -0800)]
drm/xe/pxp: Fail the load if PXP fails to initialize

The PXP implementation mimics the i915 approach of allowing the load
to continue even if PXP init has failed. On Xe, however, we're taking a
harder stance on boot errors and only allowing the load to complete if
everything is working, so update the code to fail if anything goes wrong
during PXP init.

While at it, update the return code for the PXP-not-supported case to be
0 instead of -EOPNOTSUPP, to follow the convention for functions called by
xe_device_probe, where any non-zero value means failure.
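The probe convention described above can be modeled in a few lines. This is a simplified sketch under stated assumptions: struct fake_dev, pxp_init_sketch() and probe_sketch() are hypothetical stand-ins, not the driver's real functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model used only for this sketch. */
struct fake_dev {
	bool pxp_supported;
	int pxp_hw_init_result;	/* 0 on success, negative errno on failure */
};

/*
 * "Not supported" is not an error, so it returns 0; any real
 * initialization failure is propagated to the caller.
 */
int pxp_init_sketch(struct fake_dev *dev)
{
	if (!dev->pxp_supported)
		return 0;	/* previously -EOPNOTSUPP */

	return dev->pxp_hw_init_result;
}

/* Every probe step's non-zero return is treated as fatal. */
int probe_sketch(struct fake_dev *dev)
{
	int err = pxp_init_sketch(dev);

	if (err)
		return err;	/* fail the load instead of continuing */
	return 0;
}
```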

Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250203234857.1419637-1-daniele.ceraolospurio@intel.com
Michal Wajdeczko [Fri, 31 Jan 2025 18:25:02 +0000 (19:25 +0100)]
drm/xe/vf: Don't try to trigger a full GT reset if VF

VFs don't have access to the GDRST(0x941c) register that the driver
uses to reset a GT. Attempting to trigger a reset, either using debugfs:

 $ cat /sys/kernel/debug/dri/0000:00:02.1/gt0/force_reset

or due to a hang condition detected by the driver, leads to:

 [ ] xe 0000:00:02.1: [drm] GT0: trying reset from force_reset [xe]
 [ ] xe 0000:00:02.1: [drm] GT0: reset queued
 [ ] xe 0000:00:02.1: [drm] GT0: reset started
 [ ] ------------[ cut here ]------------
 [ ] xe 0000:00:02.1: [drm] GT0: VF is trying to write 0x1 to an inaccessible register 0x941c+0x0
 [ ] WARNING: CPU: 3 PID: 3069 at drivers/gpu/drm/xe/xe_gt_sriov_vf.c:996 xe_gt_sriov_vf_write32+0xc6/0x580 [xe]
 [ ] RIP: 0010:xe_gt_sriov_vf_write32+0xc6/0x580 [xe]
 [ ] Call Trace:
 [ ]  <TASK>
 [ ]  ? show_regs+0x6c/0x80
 [ ]  ? __warn+0x93/0x1c0
 [ ]  ? xe_gt_sriov_vf_write32+0xc6/0x580 [xe]
 [ ]  ? report_bug+0x182/0x1b0
 [ ]  ? handle_bug+0x6e/0xb0
 [ ]  ? exc_invalid_op+0x18/0x80
 [ ]  ? asm_exc_invalid_op+0x1b/0x20
 [ ]  ? xe_gt_sriov_vf_write32+0xc6/0x580 [xe]
 [ ]  ? xe_gt_sriov_vf_write32+0xc6/0x580 [xe]
 [ ]  ? xe_gt_tlb_invalidation_reset+0xef/0x110 [xe]
 [ ]  ? __mutex_unlock_slowpath+0x41/0x2e0
 [ ]  xe_mmio_write32+0x64/0x150 [xe]
 [ ]  do_gt_reset+0x2f/0xa0 [xe]
 [ ]  gt_reset_worker+0x14e/0x1e0 [xe]
 [ ]  process_one_work+0x21c/0x740
 [ ]  worker_thread+0x1db/0x3c0

Fix that by sending H2G VF_RESET(0x5507) action instead.
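The fix selects a different reset mechanism depending on whether the driver runs as a VF. Here is a minimal sketch. The register offset and action ID come from the message above, while do_gt_reset_sketch() and struct reset_trace are hypothetical and only record which path would be taken.

```c
#include <stdbool.h>
#include <stdint.h>

#define GDRST_REG	0x941c	/* GT reset register, not accessible to VFs */
#define H2G_VF_RESET	0x5507	/* GuC action a VF sends instead */

/* Hypothetical record of what the reset path did. */
struct reset_trace {
	uint32_t mmio_reg;	/* register written, 0 if none */
	uint32_t h2g_action;	/* H2G action sent, 0 if none */
};

void do_gt_reset_sketch(bool is_vf, struct reset_trace *t)
{
	t->mmio_reg = 0;
	t->h2g_action = 0;

	if (is_vf) {
		/* No GDRST access: ask GuC to reset this VF instead. */
		t->h2g_action = H2G_VF_RESET;
		return;
	}

	/* Native/PF: trigger the full GT reset through MMIO. */
	t->mmio_reg = GDRST_REG;
}
```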

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4078
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131182502.852-1-michal.wajdeczko@intel.com
Michal Wajdeczko [Fri, 31 Jan 2025 15:37:13 +0000 (16:37 +0100)]
drm/xe/relay: Don't use GFP_KERNEL for new transactions

VFs use a relay transaction during the resume/reset flow, and the use
of the GFP_KERNEL flag there may conflict with memory reclaim:

     -> #0 (fs_reclaim){+.+.}-{0:0}:
 [ ]        __lock_acquire+0x1874/0x2bc0
 [ ]        lock_acquire+0xd2/0x310
 [ ]        fs_reclaim_acquire+0xc5/0x100
 [ ]        mempool_alloc_noprof+0x5c/0x1b0
 [ ]        __relay_get_transaction+0xdc/0xa10 [xe]
 [ ]        relay_send_to+0x251/0xe50 [xe]
 [ ]        xe_guc_relay_send_to_pf+0x79/0x3a0 [xe]
 [ ]        xe_gt_sriov_vf_connect+0x90/0x4d0 [xe]
 [ ]        xe_uc_init_hw+0x157/0x3b0 [xe]
 [ ]        do_gt_restart+0x1ae/0x650 [xe]
 [ ]        xe_gt_resume+0xb6/0x120 [xe]
 [ ]        xe_pm_runtime_resume+0x15b/0x370 [xe]
 [ ]        xe_pci_runtime_resume+0x73/0x90 [xe]
 [ ]        pci_pm_runtime_resume+0xa0/0x100
 [ ]        __rpm_callback+0x4d/0x170
 [ ]        rpm_callback+0x64/0x70
 [ ]        rpm_resume+0x594/0x790
 [ ]        __pm_runtime_resume+0x4e/0x90
 [ ]        xe_pm_runtime_get_ioctl+0x9c/0x160 [xe]

Since we have a preallocated pool of relay transactions, which
should cover all our normal relay use cases, we may use the
GFP_NOWAIT flag when allocating new outgoing transactions.
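The trace above shows the relay code allocating through mempool_alloc_noprof(). Below is a simplified userspace analogue of mempool semantics with a non-blocking mask: try the backing allocator without entering reclaim, fall back to the preallocated reserve, and return NULL on exhaustion instead of sleeping. The pool layout and all names are hypothetical.

```c
#include <stddef.h>

#define TXN_POOL_SIZE 4

struct txn { int id; };

/* Hypothetical pool: a small preallocated reserve plus a backing allocator. */
struct txn_pool {
	struct txn reserve[TXN_POOL_SIZE];
	struct txn *free_list[TXN_POOL_SIZE];
	int nr_free;
	/* Non-blocking allocator; may return NULL, like a GFP_NOWAIT attempt. */
	struct txn *(*try_alloc)(void);
};

void txn_pool_init(struct txn_pool *p, struct txn *(*try_alloc)(void))
{
	for (int i = 0; i < TXN_POOL_SIZE; i++)
		p->free_list[i] = &p->reserve[i];
	p->nr_free = TXN_POOL_SIZE;
	p->try_alloc = try_alloc;
}

/* Never sleeps: backing allocator first, then the reserve, else NULL. */
struct txn *txn_get_nowait(struct txn_pool *p)
{
	struct txn *t = p->try_alloc();

	if (t)
		return t;
	if (p->nr_free > 0)
		return p->free_list[--p->nr_free];
	return NULL;	/* caller must handle failure instead of waiting */
}

/* Backing allocator that always fails, simulating memory pressure. */
struct txn *txn_alloc_fail(void)
{
	return NULL;
}
```

Even under full memory pressure, the reserve still serves the first TXN_POOL_SIZE requests. This is why a non-blocking mask is acceptable for the normal relay use cases.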

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Tested-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Reviewed-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131153713.808-1-michal.wajdeczko@intel.com
Sai Teja Pottumuttu [Thu, 30 Jan 2025 08:58:04 +0000 (14:28 +0530)]
drm/xe: Refactor max_remote_tiles

max_remote_tiles is more related to the platform than the GT IP. Thus,
move it from the graphics descriptor to the platform descriptor. Note
that the FIXME is no longer required, so it can be dropped.

v2: Rebase
v3: Change the position of comment (MattR)

Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250130085804.4136497-3-sai.teja.pottumuttu@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Sai Teja Pottumuttu [Thu, 30 Jan 2025 08:58:03 +0000 (14:28 +0530)]
drm/xe: Refactor dma_mask_size

dma_mask_size is more related to the platform than the GT IP. Thus,
move it to the platform descriptors.
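With dma_mask_size stored per platform, turning it into a mask uses the same arithmetic as the kernel's DMA_BIT_MASK(n). Here is a sketch with a hypothetical, trimmed-down descriptor. struct platform_desc_sketch is not the driver's real type, and the bit widths in the usage note are only examples.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down platform descriptor. */
struct platform_desc_sketch {
	const char *name;
	uint8_t dma_mask_size;	/* number of addressable DMA address bits */
};

/* Same value DMA_BIT_MASK(n) produces: the low n bits set. */
uint64_t dma_mask_from_desc(const struct platform_desc_sketch *desc)
{
	unsigned int n = desc->dma_mask_size;

	return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}
```

For instance, dma_mask_size = 46 yields 0x3fffffffffff, and 39 yields 0x7fffffffff.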

v2:
 - Rebase

Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250130085804.4136497-2-sai.teja.pottumuttu@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:37 +0000 (09:41 -0800)]
drm/xe/pxp: Enable PXP for MTL and LNL

Now that all the pieces are in place, we can turn the feature on.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-14-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:36 +0000 (09:41 -0800)]
drm/xe/pxp: Add PXP debugfs support

This patch introduces 2 PXP debugfs entries:

- info: prints the current PXP status and key instance
- terminate: simulates a termination interrupt

The first one is useful for debug, while the second one can be used for
testing the termination flow.

v2: move the info prints inside the lock (John)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-13-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:35 +0000 (09:41 -0800)]
drm/xe/pxp: add PXP PM support

The HW suspend flow kills all PXP HWDRM sessions, so we need to mark all
the queues and BOs as invalid and do a full termination when PXP is next
used.
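The deferred-termination approach can be modeled as a flag that suspend sets and the next PXP use consumes. This is a minimal sketch with hypothetical names. Bumping a key instance on suspend is one assumed way of marking queues and BOs invalid, not necessarily what the driver does.

```c
#include <stdbool.h>

/* Hypothetical PXP state, for illustration only. */
struct pxp_sketch {
	unsigned int key_instance;	/* assumed invalidation mechanism */
	bool needs_termination;	/* set on suspend, consumed on next use */
	unsigned int terminations;	/* full terminations performed */
};

/* Suspend kills all HWDRM sessions; only record that fact here. */
void pxp_pm_suspend_sketch(struct pxp_sketch *pxp)
{
	pxp->key_instance++;	/* queues/BOs tied to the old key are now invalid */
	pxp->needs_termination = true;
}

/* The next PXP use performs the deferred full termination first. */
void pxp_use_sketch(struct pxp_sketch *pxp)
{
	if (pxp->needs_termination) {
		pxp->terminations++;	/* full termination would run here */
		pxp->needs_termination = false;
	}
	/* ... then start the session as usual ... */
}
```

Suspend only updates state and never submits anything. This is what lets the same function serve every type of suspend, as noted in v3.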

v2: rebase
v3: rebase on new status flow, defer termination to next PXP use as it
makes things much easier and allows us to use the same function for all
types of suspend.
v4: fix the documentation of the suspend function (John)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-12-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:34 +0000 (09:41 -0800)]
drm/xe/pxp/uapi: Add API to mark a BO as using PXP

The driver needs to know if a BO is encrypted with PXP to enable the
display decryption at flip time.
Furthermore, we want to keep track of the status of the encryption and
reject any operation that involves a BO that is encrypted using an old
key. There are two points in time where such checks can kick in:

1 - at VM bind time, all operations except for unmapping will be
    rejected if the key used to encrypt the BO is no longer valid. This
    check is opt-in via a new VM_BIND flag, to avoid a scenario where a
    malicious app purposely shares an invalid BO with a non-PXP aware
    app (such as a compositor). If the VM_BIND failed, the
    compositor would be unable to display anything at all. Allowing the
    bind to go through means that output still works, it just displays
    garbage data within the bounds of the illegal BO.

2 - at job submission time, if the queue is marked as using PXP, all
    objects bound to the VM will be checked and the submission will be
    rejected if any of them was encrypted with a key that is no longer
    valid.

Note that there is no risk of leaking the encrypted data if a user does
not opt in to those checks; the only consequence is that the user will
not realize that the encryption key has changed and that the data is no
longer valid.
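Both checks reduce to comparing the key instance a BO was encrypted with against the current one. Here is a sketch under stated assumptions: struct bo_sketch and the two check functions are hypothetical, and -ENOEXEC is only a placeholder error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: a BO remembers the key instance it was encrypted with. */
struct bo_sketch {
	bool uses_pxp;
	unsigned int key_instance;
};

/* Valid if unencrypted, or encrypted with the current key. */
static bool bo_key_valid(const struct bo_sketch *bo, unsigned int cur_key)
{
	return !bo->uses_pxp || bo->key_instance == cur_key;
}

/* Check 1, VM bind: opt-in only, and unmapping is always allowed. */
int vm_bind_check_sketch(const struct bo_sketch *bo, unsigned int cur_key,
			 bool check_requested, bool is_unmap)
{
	if (!check_requested || is_unmap)
		return 0;
	return bo_key_valid(bo, cur_key) ? 0 : -ENOEXEC;	/* placeholder */
}

/* Check 2, submission on a PXP queue: every BO bound to the VM. */
int exec_check_sketch(const struct bo_sketch *bos, int n, unsigned int cur_key)
{
	for (int i = 0; i < n; i++)
		if (!bo_key_valid(&bos[i], cur_key))
			return -ENOEXEC;	/* placeholder */
	return 0;
}
```

A non-PXP-aware compositor that does not request the bind-time check can still bind a stale BO, which matches the rationale given for making that check opt-in.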

v2: Better comments and descriptions (John), rebase

v3: Properly return the result of key_assign up the stack, do not use
xe_bo in display headers (Jani)

v4: improve key_instance variable documentation (John)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-11-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:33 +0000 (09:41 -0800)]
drm/xe/pxp/uapi: Add a query for PXP status

PXP prerequisites (SW proxy and HuC auth via GSC) are completed
asynchronously from driver load, which means that userspace can start
submitting before we're ready to start a PXP session. Therefore, we need
a query that userspace can use not only to check whether PXP is supported
but also to wait until the prerequisites are done.
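From userspace, such a query is meant to be polled until the asynchronous prerequisites complete. This sketch assumes a hypothetical query callback that returns 0 while not ready, 1 when ready, and a negative errno on error. The real uAPI structure and status values are not reproduced here.

```c
#include <errno.h>

/* Hypothetical status values; not the real uAPI encoding. */
enum pxp_status_sketch { PXP_NOT_READY = 0, PXP_READY = 1 };

/*
 * Poll until the prerequisites are done. `query` stands in for the real
 * uAPI query: it returns a status value, or a negative errno (e.g. when
 * PXP is not supported), which is passed straight back to the caller.
 */
int wait_for_pxp_ready(int (*query)(void *ctx), void *ctx, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		int status = query(ctx);

		if (status < 0)
			return status;
		if (status == PXP_READY)
			return 0;
		/* a real client would sleep or back off before retrying */
	}
	return -ETIMEDOUT;
}

/* Example query that reports ready after *ctx not-ready answers. */
int countdown_query(void *ctx)
{
	int *remaining = ctx;

	return (*remaining)-- > 0 ? PXP_NOT_READY : PXP_READY;
}
```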

v2: Improve doc, do not report TYPE_NONE as supported (José)
v3: Better comments, remove unneeded copy_from_user (John)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-10-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:32 +0000 (09:41 -0800)]
drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues

Userspace is required to mark a queue as using PXP to guarantee that the
PXP instructions will work. In addition to managing the PXP sessions,
when a PXP queue is created the driver will set the relevant bits in
its context control register.

On submission of a valid PXP queue, the driver will validate all
encrypted objects mapped to the VM to ensure they were encrypted with
the current key.

v2: Remove pxp_types include outside of PXP code (Jani), better comments
and code cleanup (John)

v3: split the internal PXP management to a separate patch for ease of
review. re-order ioctl checks to always return -EINVAL if parameters are
invalid, rebase on msix changes.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-9-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:31 +0000 (09:41 -0800)]
drm/xe/pxp: Add PXP queue tracking and session start

We expect every queue that uses PXP to be marked as doing so, to allow
the driver to correctly manage the encryption status. The API for doing
this from userspace is coming in the next patch, while this patch
implements the management side of things. When a PXP queue is created,
the driver will do the following:

- Start the default PXP session if it is not already running;
- Assign a runtime PM (rpm) reference to the queue, to be held for its
  lifetime (this is required because PXP HWDRM sessions are killed by
  the HW suspend flow).

Since PXP start and termination can race each other, this patch also
introduces locking and a state machine to keep track of the pending
operations. Note that since we'll need to take the lock from the
suspend/resume paths as well, we can't do submissions while holding it,
which means we need a slightly more complicated state machine to keep
track of intermediate steps.
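The constraint that no submission may happen while holding the lock leads to a claim/submit/complete pattern. This is a minimal sketch with hypothetical state names. The lock itself is omitted, and comments mark where it would be held.

```c
#include <stdbool.h>

/* Hypothetical session states, tracked under the PXP lock. */
enum pxp_state {
	PXP_NEEDS_TERMINATION,
	PXP_TERMINATION_IN_PROGRESS,
	PXP_READY_TO_START,
	PXP_START_IN_PROGRESS,
	PXP_ACTIVE,
};

/*
 * Called with the lock held. Returns true if the caller should drop the
 * lock and submit the start command itself; the intermediate state tells
 * concurrent callers to wait instead of submitting a second time.
 */
bool pxp_claim_start(enum pxp_state *state)
{
	if (*state != PXP_READY_TO_START)
		return false;	/* already active, or another op is pending */
	*state = PXP_START_IN_PROGRESS;
	return true;
}

/* Called with the lock re-taken after the (unlocked) submission. */
void pxp_start_done(enum pxp_state *state, bool ok)
{
	if (*state == PXP_START_IN_PROGRESS)
		*state = ok ? PXP_ACTIVE : PXP_READY_TO_START;
	/* if a termination raced in meanwhile, leave its state untouched */
}
```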

v4: new patch in the series, split from the following interface patch to
keep review manageable. Lock and status rework to not do submissions
under lock.

v5: Improve comments and error logs (John)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-8-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:30 +0000 (09:41 -0800)]
drm/xe/pxp: Add GSC session initialization support

A session is initialized (i.e. started) by sending a message to the GSC.
The initialization will be triggered when a user opts in to using PXP;
the interface for that is coming in a follow-up patch in the series.

v2: clean up error messages, use new ARB define (John)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-7-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:29 +0000 (09:41 -0800)]
drm/xe/pxp: Handle the PXP termination interrupt

When something happens to the session, the HW generates a termination
interrupt. In reply to this, the driver is required to submit an inline
session termination via the VCS, trigger the global termination and
notify the GSC FW that the session is now invalid.
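The required response consists of three steps, taken in the order listed above. This sketch only records that order. The enum and function names are hypothetical, and each record() call marks where the real work would be submitted.

```c
/* Hypothetical identifiers for the three steps. */
enum pxp_term_step {
	STEP_VCS_INLINE_TERMINATION,	/* inline termination via the VCS */
	STEP_GLOBAL_TERMINATION,	/* trigger the global termination */
	STEP_GSC_INVALIDATE_SESSION,	/* tell GSC FW the session is invalid */
};

struct term_log {
	enum pxp_term_step steps[3];
	int n;
};

static void record(struct term_log *tl, enum pxp_term_step s)
{
	if (tl->n < 3)
		tl->steps[tl->n++] = s;
}

/* Sketch of the work done in response to the termination interrupt. */
void pxp_handle_termination_sketch(struct term_log *tl)
{
	record(tl, STEP_VCS_INLINE_TERMINATION);
	record(tl, STEP_GLOBAL_TERMINATION);
	record(tl, STEP_GSC_INVALIDATE_SESSION);
}
```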

v2: rename ARB define to make it cleaner to move it to uapi (John)
v3: fix parameter name in documentation

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-6-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:28 +0000 (09:41 -0800)]
drm/xe/pxp: Add GSC session invalidation support

After a session is terminated, we need to inform the GSC so that it can
clean up its side of the allocation. This is done by sending an
invalidation command with the session ID.
The invalidation will be triggered in response to a termination
interrupt, whose handling is coming in the next patch in the series.

v2: Better comment and error messages (John)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-5-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:27 +0000 (09:41 -0800)]
drm/xe/pxp: Add VCS inline termination support

The key termination is done with a specific submission to the VCS
engine. This flow will be triggered in response to a termination
interrupt, whose handling is coming in a follow-up patch in the series.

v2: clean up defines and command emission code. (John)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-4-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:26 +0000 (09:41 -0800)]
drm/xe/pxp: Allocate PXP execution resources

PXP requires submissions to the HW for the following operations:

1) Key invalidation, done via the VCS engine
2) Communication with the GSC FW for session management, done via the
   GSCCS.

Key invalidation submissions are serialized (only 1 termination can be
serviced at a given time) and done via GGTT, so we can allocate a simple
BO and a kernel queue for it.

Submissions for session management are tied to a PXP client (identified
by a unique host_session_id); from the GSC POV this is a user-accessible
construct, so all related submissions must be done via PPGTT. The driver
does not currently support PPGTT submission from within the kernel, so
to add this support, the following changes have been included:

- a new type of kernel-owned VM (marked as GSC), required to ensure we
  don't use fault mode on the engine and to mark the different lock
  usage with lockdep.
- a new function to map a BO into a VM from within the kernel.

v2: improve comments and function name, remove unneeded include (John)
v3: fix variable/function names in documentation

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-3-daniele.ceraolospurio@intel.com
Daniele Ceraolo Spurio [Wed, 29 Jan 2025 17:41:25 +0000 (09:41 -0800)]
drm/xe/pxp: Initialize PXP structure and KCR reg

As the first step towards adding PXP support, hook in the PXP init
function, allocate the PXP structure and initialize the KCR register to
allow PXP HWDRM sessions.

v2: remove unneeded includes, free PXP memory on error (John)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-2-daniele.ceraolospurio@intel.com
Lucas De Marchi [Fri, 31 Jan 2025 22:39:08 +0000 (14:39 -0800)]
drm/xe: Remove xe_dummy_exit()

Since commit 014125c64d09 ("drm/xe: Support 'nomodeset' kernel
command-line option") the dummy exit is no longer needed, as the
caller checks for a NULL pointer. Drop it.

Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131223908.4147195-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Riana Tauro [Fri, 31 Jan 2025 08:05:27 +0000 (13:35 +0530)]
drm/xe: Skip survivability mode for VF

For VFs, follow the normal probe flow and do not enter survivability
mode on pcode init failure.

Fixes: 5e940312a2ac ("drm/xe: Add functions and sysfs for boot survivability")
Suggested-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250131080527.2256475-1-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Maarten Lankhorst [Tue, 21 Jan 2025 14:28:50 +0000 (15:28 +0100)]
drm/xe/display: Use a single early init call for display

Now that interrupts are disabled for xe_display_init_noaccel,
both xe_display_init_noirq and xe_display_init_noaccel run in the same
context.

This means that we can get rid of the 3 different init calls. Without
interrupts, nothing touches the display up to this point.
Unify those 3 early display calls into a single xe_display_init_early();
this makes the init sequence cleaner and the display code less tangled
during init.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250121142850.4960-3-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Maarten Lankhorst [Tue, 21 Jan 2025 14:28:49 +0000 (15:28 +0100)]
drm/xe: Defer irq init until after xe_display_init_noaccel

As stated in the previous commit, we have to defer interrupt
initialization until after xe_display_init_noaccel, as using memirqs
would require an allocation.

A full solution will of course require the memirq allocation to be moved,
but this first part focuses only on the changes required for display.

Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250121142850.4960-2-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Maarten Lankhorst [Tue, 21 Jan 2025 14:28:48 +0000 (15:28 +0100)]
drm/xe/display: Add intel_plane_initial_vblank_wait

We're changing the driver to have no interrupts during early init for
Xe, so we poll the PIPE_FRMSTMSMP counter instead.

Interrupts cannot be enabled during FB readout because memirqs require
an allocation, which could overwrite the FB we want to read out.

While it might be possible to do the same in i915 and run it without
interrupts, the platforms i915 supports had a less clear distinction
between display and graphics. For this reason, I chose to touch only Xe
for now.
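Without interrupts, waiting for a vblank becomes a polling loop: sample the pipe frame timestamp counter named above, then poll until its value changes. This is a sketch assuming a hypothetical read_reg callback that stands in for the MMIO read. The fake register model exists only to exercise the loop.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Wait for the next frame by polling: a change in the frame timestamp
 * means a new frame started, i.e. a vblank has passed.
 */
int poll_vblank_sketch(uint32_t (*read_reg)(void *ctx), void *ctx,
		       int max_polls)
{
	uint32_t start = read_reg(ctx);

	for (int i = 0; i < max_polls; i++) {
		if (read_reg(ctx) != start)
			return 0;
		/* a real driver would delay briefly between reads */
	}
	return -ETIMEDOUT;
}

/* Fake register: its value advances once every `period` reads. */
struct fake_ts {
	uint32_t reads;
	uint32_t period;
};

uint32_t fake_ts_read(void *ctx)
{
	struct fake_ts *ts = ctx;

	return ts->reads++ / ts->period;
}
```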

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250121142850.4960-1-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Lucas De Marchi [Thu, 30 Jan 2025 22:35:52 +0000 (14:35 -0800)]
Merge drm/drm-next into drm-xe-next

Backmerge drm-next to get the common APIs and refactors as well as
getting the display changes from i915 in xe so the probe order can be
improved.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Jakub Kolakowski [Tue, 28 Jan 2025 11:03:00 +0000 (11:03 +0000)]
drm/xe/pf: Add runtime registers for graphics gen >= 30

Add the missing runtime registers for graphics versions 3000 or higher.
This is required for Xe3, which additionally has the
MIRROR_L3BANK_ENABLE register.

Signed-off-by: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Suggested-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Adam Miszczak <adam.miszczak@linux.intel.com>
Cc: Jakub Kolakowski <jakub1.kolakowski@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Cc: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piorkowski <piotr.piorkowski@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Tested-by: Marcin Bernatowicz <marcin.bernatowicz@linux.intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128110300.2840596-2-jakub1.kolakowski@intel.com
Gustavo Sousa [Wed, 15 Jan 2025 14:08:04 +0000 (11:08 -0300)]
drm/xe: Fix sort order of .o lists in Makefile

The Makefile for xe asks us to keep the lists of object files sorted:

  # Please keep these build lists sorted!

Reshuffle the lists into the correct sort order. That was done by
filtering each unsorted list through 'LC_ALL=C sort'.
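`LC_ALL=C sort` orders lines by raw byte value. In C, strcmp() compares bytes as unsigned char, so qsort() with a strcmp-based comparator reproduces the same order. sort_obj_list() is a hypothetical helper for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: byte-wise order, matching `LC_ALL=C sort`. */
static int cmp_c_locale(const void *a, const void *b)
{
	const char *const *sa = a;
	const char *const *sb = b;

	return strcmp(*sa, *sb);
}

void sort_obj_list(const char **names, size_t n)
{
	qsort(names, n, sizeof(*names), cmp_c_locale);
}
```

In this order, a name like xe_bo.o sorts before xe_bo_evict.o, because '.' (0x2e) is below '_' (0x5f).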

Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250115140812.20799-1-gustavo.sousa@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Michal Wajdeczko [Wed, 29 Jan 2025 19:59:47 +0000 (20:59 +0100)]
drm/xe/pf: Reset GuC VF config when unprovisioning critical resource

GuC firmware counts received VF configuration KLVs and may start
validation of the complete VF config even if some resources were
unprovisioned in the meantime, leading to unexpected errors like:

 $ echo 1 | sudo tee /sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/contexts_quota
 $ echo 0 | sudo tee /sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/contexts_quota
 $ echo 1 | sudo tee /sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/doorbells_quota
 $ echo 0 | sudo tee /sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/doorbells_quota
 $ echo 1 | sudo tee /sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/ggtt_quota
 tee: '/sys/kernel/debug/dri/0000:00:02.0/gt0/vf1/ggtt_quota': Input/output error

To mitigate this problem, trigger an explicit VF config reset after
unprovisioning any of the critical resources (GGTT, context or
doorbell IDs) that GuC is monitoring.
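The rule can be expressed as a small predicate. Here is a sketch with a hypothetical resource enum. Only GGTT, contexts and doorbells are treated as critical, as listed above, and exec quantum stands in for any non-critical setting.

```c
#include <stdbool.h>

/* Hypothetical resource kinds a PF can (un)provision for a VF. */
enum vf_res { VF_RES_GGTT, VF_RES_CTXS, VF_RES_DBS, VF_RES_EXEC_QUANTUM };

/* The resources GuC monitors when validating a VF config. */
static bool vf_res_is_critical(enum vf_res res)
{
	return res == VF_RES_GGTT || res == VF_RES_CTXS || res == VF_RES_DBS;
}

/*
 * Returns true when the PF should follow this provisioning change with
 * an explicit VF config reset, i.e. a critical resource went to zero.
 */
bool vf_needs_config_reset(enum vf_res res, unsigned long long new_amount)
{
	return new_amount == 0 && vf_res_is_critical(res);
}
```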

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129195947.764-3-michal.wajdeczko@intel.com
Michal Wajdeczko [Wed, 29 Jan 2025 19:59:46 +0000 (20:59 +0100)]
drm/xe/pf: Don't send BEGIN_ID if VF has no context/doorbells

It turned out that GuC validates the VF configuration immediately
after receiving "some" set of configuration KLVs, and complains
if any resource that GuC considers critical is left unprovisioned,
even though the PF should still be allowed to make late VF config
adjustments, since the VF has not yet been started.

This issue was discovered after we decided to asynchronously
re-send the configuration KLVs after GT reset/resume, since by then
fair VF auto-provisioning may already have allocated some of the
resources, which was a prerequisite for sending those config KLVs:

 # fair GGTT provisioning
 [] xe 0000:00:02.0: [drm] GT0: PF: pushed VF1 config with 2 KLVs:
 [] xe 0000:00:02.0: [drm] GT0: { key 0x0001 : 64b value 0x176a000 } # ggtt_start
 [] xe 0000:00:02.0: [drm] GT0: { key 0x0002 : 64b value 0xfd696000 } # ggtt_size
 [] xe 0000:00:02.0: [drm] GT0: PF: VF1 provisioned with 4251541504 (3.96 GiB) GGTT
 # re-provisioning worker
 [] xe 0000:00:02.0: [drm] *ERROR* GT0: H2G request 0x5503 failed: error 0x60 hint 0x0
 [] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 14 config KLVs (-EIO)
 [] xe 0000:00:02.0: [drm] GT0: { key 0x0001 : 64b value 0x176a000 } # ggtt_start
 [] xe 0000:00:02.0: [drm] GT0: { key 0x0002 : 64b value 0xfd696000 } # ggtt_size
 [] xe 0000:00:02.0: [drm] GT0: { key 0x8a0b : 32b value 0 } # begin_ctx_id
 [] xe 0000:00:02.0: [drm] GT0: { key 0x0004 : 32b value 0 } # num_contexts
 [] xe 0000:00:02.0: [drm] GT0: { key 0x8a0a : 32b value 0 } # begin_db_id
 [] xe 0000:00:02.0: [drm] GT0: { key 0x0006 : 32b value 0 } # num_doorbells
 [] xe 0000:00:02.0: [drm] GT0: { key 0x8a01 : 32b value 0 } # exec_quantum
 [] xe 0000:00:02.0: [drm] GT0: { key 0x8a02 : 32b value 0 } # preempt_timeout
 [] xe 0000:00:02.0: [drm] GT0: { key 0x8a03 : 32b value 0 } # cat_error_count
 [] xe 0000:00:02.0: [drm] GT0: { key 0x8a04 : 32b value 0 } # engine_reset_count
 [] xe 0000:00:02.0: [drm] GT0: { key 0x8a05 : 32b value 0 } # page_fault_count
 [] xe 0000:00:02.0: [drm] GT0: { key 0x8a06 : 32b value 0 } # guc_time_us
 [] xe 0000:00:02.0: [drm] GT0: { key 0x8a07 : 32b value 0 } # irq_time_us
 [] xe 0000:00:02.0: [drm] GT0: { key 0x8a08 : 32b value 0 } # doorbell_time_us
 [] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-EIO)

To avoid such errors, stop sending the BEGIN_CONTEXT/DOORBELL_ID KLVs
if no GuC context/doorbell IDs were provisioned to the VF.
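Here is a sketch of the encoding rule, using the key values printed in the log above. The KLV header layout (key in bits 31:16, value length in dwords in bits 15:0) and all helper names are assumptions for illustration.

```c
#include <stdint.h>

/* Key values as printed in the log above. */
#define KLV_NUM_CONTEXTS	0x0004u
#define KLV_NUM_DOORBELLS	0x0006u
#define KLV_BEGIN_DB_ID		0x8a0au
#define KLV_BEGIN_CTX_ID	0x8a0bu

/* Assumed header layout: key in bits 31:16, length (dwords) in 15:0. */
static int put_klv32(uint32_t *buf, int n, uint16_t key, uint32_t value)
{
	buf[n++] = ((uint32_t)key << 16) | 1u;
	buf[n++] = value;
	return n;
}

/*
 * Encode the context/doorbell KLVs, omitting BEGIN_*_ID when nothing was
 * provisioned. Returns the number of dwords written (buf needs >= 8).
 */
int encode_ctx_db_klvs(uint32_t *buf, uint32_t begin_ctx, uint32_t num_ctxs,
		       uint32_t begin_db, uint32_t num_dbs)
{
	int n = 0;

	if (num_ctxs)
		n = put_klv32(buf, n, KLV_BEGIN_CTX_ID, begin_ctx);
	n = put_klv32(buf, n, KLV_NUM_CONTEXTS, num_ctxs);
	if (num_dbs)
		n = put_klv32(buf, n, KLV_BEGIN_DB_ID, begin_db);
	n = put_klv32(buf, n, KLV_NUM_DOORBELLS, num_dbs);
	return n;
}
```

When nothing is provisioned, only the NUM_CONTEXTS and NUM_DOORBELLS KLVs (4 dwords) are emitted.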

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4176
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129195947.764-2-michal.wajdeczko@intel.com
Francois Dugast [Wed, 29 Jan 2025 17:52:41 +0000 (18:52 +0100)]
drm/xe/gt_pagefault: Print engine class string

The engine class index which is printed here is an internal representation
for debugging. It is _not_ an index based on DRM_XE_ENGINE_CLASS_* values
provided in the uAPI. Add the string representation of the engine class to
the output in order to limit possible confusion by users when analyzing the
logs.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129175241.338043-1-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Dave Airlie [Thu, 30 Jan 2025 04:31:38 +0000 (14:31 +1000)]
Merge tag 'amd-drm-fixes-6.14-2025-01-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-fixes-6.14-2025-01-29:

amdgpu:
- GC 12 fix
- Aldebaran fix
- DCN 3.5 fix
- Freesync fix

amdkfd:
- Per queue reset fix
- MES fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQQgO5Idg2tXNTSZAr293/aFa7yZ2AUCZ5qcgwAKCRC93/aFa7yZ
# 2GHEAP4qGRwRRm/XzGsT7t4IC6l1ALia3IycCpm8BusDpLIVlAD9HSSpKswHtNou
# Zjz7N/t791BIeS/cz36ICNqYCmgQ2wY=
# =1Q5i
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 30 Jan 2025 07:24:19 AEST
# gpg:                using EDDSA key 203B921D836B5735349902BDBDDFF6856BBC99D8
# gpg: Can't check signature: No public key
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129213037.3966625-1-alexander.deucher@amd.com
Lucas De Marchi [Tue, 28 Jan 2025 15:42:42 +0000 (07:42 -0800)]
drm/xe/guc: Fix size_t print format

Use %zx format to print size_t to remove the following warning when
building for i386:

>> drivers/gpu/drm/xe/xe_guc_ct.c:1727:43: warning: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
    1727 |                         drm_printf(p, "[CTB].length: 0x%lx\n", snapshot->ctb_size);
         |                                                        ~~~     ^~~~~~~~~~~~~~~~~~
         |                                                        %zx
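The portable rule is to use the z length modifier for size_t, because its underlying type differs between ABIs: unsigned int on i386, as the warning shows, versus unsigned long on x86-64. Here is a userspace sketch that uses snprintf() in place of drm_printf(). format_ctb_length() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Mirrors the fixed format string. %zx matches size_t whatever its
 * underlying type is; %lx only matches when size_t is unsigned long.
 */
int format_ctb_length(char *buf, size_t len, size_t ctb_size)
{
	return snprintf(buf, len, "[CTB].length: 0x%zx\n", ctb_size);
}
```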

Cc: José Roberto de Souza <jose.souza@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501281627.H6nj184e-lkp@intel.com/
Fixes: cb1f868ca137 ("drm/xe: Make GUC binaries dump consistent with other binaries in devcoredump")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128154242.3371687-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agoRevert "drm/xe/lnl: Enable GuC SLPC DCC task"
Rodrigo Vivi [Tue, 28 Jan 2025 22:32:48 +0000 (17:32 -0500)]
Revert "drm/xe/lnl: Enable GuC SLPC DCC task"

This reverts commit 50554bf3e56dd0c78ef1eedb685d0ab36c9c9987.

DCC in LNL should be disabled. It was a mistake to decide
to go against GuC platform defaults in this case and this
could lead to regressions in some TDP limited scenarios
instead of helping.

Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128223248.660748-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe/ptl: Update the PTL pci id table
Matt Atwood [Tue, 28 Jan 2025 17:51:02 +0000 (09:51 -0800)]
drm/xe/ptl: Update the PTL pci id table

Update to the current Bspec table.

Bspec: 72574

Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128175102.45797-1-matthew.s.atwood@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe/bmg: Add new PCI IDs
Shekhar Chauhan [Tue, 28 Jan 2025 16:20:15 +0000 (21:50 +0530)]
drm/xe/bmg: Add new PCI IDs

Add 3 new PCI IDs for BMG.

v2: Fix typo -> Replace '.' with ','

Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Clint Taylor <Clinton.A.Taylor@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128162015.3288675-1-shekhar.chauhan@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/amd/display: restore invalid MSA timing check for freesync
Melissa Wen [Tue, 28 Jan 2025 00:41:10 +0000 (21:41 -0300)]
drm/amd/display: restore invalid MSA timing check for freesync

This restores the original behavior that gets the min/max freq from EDID and
only sets the DP/eDP connector as freesync capable if "sink device is capable
of rendering incoming video stream without MSA timing parameters", i.e.,
`allow_invalid_MSA_timing_params` is true. The condition was mistakenly
removed by 0159f88a99c9 ("drm/amd/display: remove redundant freesync
parser for DP").

CC: Mario Limonciello <mario.limonciello@amd.com>
CC: Alex Hung <alex.hung@amd.com>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3915
Fixes: 0159f88a99c9 ("drm/amd/display: remove redundant freesync parser for DP")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
4 months agodrm/amdkfd: only flush the validate MES contex
Prike Liang [Tue, 14 Jan 2025 03:20:17 +0000 (11:20 +0800)]
drm/amdkfd: only flush the validate MES contex

The following page fault was observed during the KFD process release.
In this particular error case, the HIP test (./MemcpyPerformance -h)
does not require the queue. As a result, the process_context_addr was
not assigned when the KFD process was released, ultimately leading to
this page fault during the execution of the function
kfd_process_dequeue_from_all_devices().

[345962.294891] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:153 vmid:0 pasid:0)
[345962.295333] amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000000000000000 from client 10
[345962.295775] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000B33
[345962.296097] amdgpu 0000:03:00.0: amdgpu:     Faulty UTCL2 client ID: CPC (0x5)
[345962.296394] amdgpu 0000:03:00.0: amdgpu:     MORE_FAULTS: 0x1
[345962.296633] amdgpu 0000:03:00.0: amdgpu:     WALKER_ERROR: 0x1
[345962.296876] amdgpu 0000:03:00.0: amdgpu:     PERMISSION_FAULTS: 0x3
[345962.297135] amdgpu 0000:03:00.0: amdgpu:     MAPPING_ERROR: 0x1
[345962.297377] amdgpu 0000:03:00.0: amdgpu:     RW: 0x0
[345962.297682] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:169 vmid:0 pasid:0)

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
4 months agodrm/amd/display: Correct register address in dcn35
loanchen [Wed, 15 Jan 2025 09:43:29 +0000 (17:43 +0800)]
drm/amd/display: Correct register address in dcn35

[Why]
The offset address of mmCLK5_spll_field_8 was incorrect for dcn35,
which caused SSC not to be enabled.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Lo-An Chen <lo-an.chen@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
4 months agodrm/amd/pm: Mark MM activity as unsupported
Lijo Lazar [Wed, 22 Jan 2025 03:42:41 +0000 (09:12 +0530)]
drm/amd/pm: Mark MM activity as unsupported

Aldebaran doesn't support querying MM activity percentage. Keep the
field as 0xFFs to mark it as unsupported.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
4 months agodrm/amd/amdgpu: change the config of cgcg on gfx12
Kenneth Feng [Mon, 20 Jan 2025 07:33:03 +0000 (15:33 +0800)]
drm/amd/amdgpu: change the config of cgcg on gfx12

Change the config of cgcg on gfx12.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.12.x
4 months agodrm/amdkfd: Block per-queue reset when halt_if_hws_hang=1
Jay Cornwall [Thu, 16 Jan 2025 20:36:39 +0000 (14:36 -0600)]
drm/amdkfd: Block per-queue reset when halt_if_hws_hang=1

The purpose of halt_if_hws_hang is to preserve GPU state for driver
debugging when queue preemption fails. Issuing per-queue reset may
kill wavefronts which caused the preemption failure.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.12.x
4 months agodrm/xe: Initialize mei-gsc and vsec in survivability mode
Riana Tauro [Tue, 28 Jan 2025 09:56:32 +0000 (15:26 +0530)]
drm/xe: Initialize mei-gsc and vsec in survivability mode

Initialize mei-gsc in survivability mode and disable HECI
interrupts. Also initialize vsec in survivability mode.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-4-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe: Enable Boot Survivability mode
Riana Tauro [Tue, 28 Jan 2025 09:56:31 +0000 (15:26 +0530)]
drm/xe: Enable Boot Survivability mode

Enable boot survivability mode if pcode initialization fails and
if boot status indicates a failure. In this mode, the drm card is not
exposed and the driver probe returns success after loading the bare
minimum needed to allow firmware to be flashed via mei.

v2: abstract survivability mode variable
    add BMG check inside function (Jani, Rodrigo)

v3: return -EBUSY during system suspend (Anshuman)
    check survivability mode in pci probe only
    on error

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-3-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe: Add functions and sysfs for boot survivability
Riana Tauro [Tue, 28 Jan 2025 09:56:30 +0000 (15:26 +0530)]
drm/xe: Add functions and sysfs for boot survivability

Boot Survivability is a software-based workflow for recovering a system
in a failed boot state. Here, system recoverability is concerned with
recovering the firmware responsible for boot.

This is implemented by loading the driver with the bare minimum (no drm card)
to allow the firmware to be flashed through mei-gsc and to collect telemetry.
The driver's probe flow is modified such that it enters survivability mode
when pcode initialization is incomplete and boot status denotes a failure.
In this mode, the drm card is not exposed, and the presence of the
survivability_mode entry in PCI sysfs is used to indicate survivability
mode and to provide additional information required for debugging.

This patch adds initialization functions and exposes admin-readable
sysfs entries.

The new sysfs will have the below layout

/sys/bus/.../bdf
                   ├── survivability_mode

v2: reorder headers
    fix doc
    remove survivability info and use mode to display information
    use separate function for logging survivability information
    for critical error (Rodrigo)

v3: use for loop
    use dev logs instead of drm
    use helper function for aux history (Rodrigo)
    remove unnecessary error check of greater than max_scratch
    as we are reading only 3 bits

v4: fix checkpatch warnings
    fix space (Rodrigo)
    rename register

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Acked-by: Ashwin Kumar Kulkarni <ashwin.kumar.kulkarni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250128095632.1294722-2-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe: Make GUC binaries dump consistent with other binaries in devcoredump
José Roberto de Souza [Thu, 23 Jan 2025 20:22:04 +0000 (12:22 -0800)]
drm/xe: Make GUC binaries dump consistent with other binaries in devcoredump

All other (hwsp, hwctx and vmas) binaries follow this format:
[name].length: 0x1000
[name].data: xxxxxxx
[name].error: errno

The error entry is only there in case, for some reason, the binary
could not be captured.

So the GuC binaries should follow the same pattern.
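A rough sketch of that line-oriented layout, assuming the payload has already been encoded into a string; the function name, the "LOG" name in the usage below, and printing the error as a plain negative number are illustrative assumptions, not the driver's code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch: emit one binary in the
 * [name].length / [name].data / [name].error layout.
 * 'encoded' stands in for the already-encoded payload; when the
 * capture failed (err != 0) only the error line is written.
 */
static int dump_binary(char *buf, size_t len, const char *name,
		       size_t size, const char *encoded, int err)
{
	if (err)
		return snprintf(buf, len, "[%s].error: %d\n", name, err);

	return snprintf(buf, len, "[%s].length: 0x%zx\n[%s].data: %s\n",
			name, size, name, encoded);
}
```

For example, dump_binary(buf, sizeof(buf), "LOG", 0x1000, "abc", 0) writes "[LOG].length: 0x1000" and "[LOG].data: abc" as two lines.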

v2:
- renamed GUC binary to LOG

Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250123202307.95103-3-jose.souza@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe: Fix and re-enable xe_print_blob_ascii85()
Lucas De Marchi [Thu, 23 Jan 2025 20:22:03 +0000 (12:22 -0800)]
drm/xe: Fix and re-enable xe_print_blob_ascii85()

Commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool") partially reverted some changes to work around breakage
caused to mesa tools. However, in doing so it also broke fetching the
GuC log via debugfs since xe_print_blob_ascii85() simply bails out.

The fix is to avoid the extra newlines: the devcoredump interface is
line-oriented and adding random newlines in the middle breaks it. If a
tool is able to parse it by looking at the data and checking for chars
that are out of the ascii85 space, it can still do so. A format change
that breaks the line-oriented output on devcoredump however needs better
coordination with existing tools.
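To illustrate what "no extra newlines" means for a line-oriented consumer, here is a hedged userspace sketch of an ASCII85 encoder that writes the whole blob as a single line with exactly one trailing newline. It follows the classic Adobe-style encoding ('z' for an all-zero group, a trailing partial group of n bytes emits n + 1 characters); it is not the kernel's xe_print_blob_ascii85(), which works on 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode 'len' bytes of 'in' as ASCII85 into 'out' on a single line,
 * terminated by one '\n' and a NUL. 'out' must hold at least
 * 5 * ((len + 3) / 4) + 2 bytes. Returns characters written,
 * excluding the NUL.
 */
static size_t ascii85_line(const uint8_t *in, size_t len, char *out)
{
	size_t o = 0;

	for (size_t i = 0; i < len; i += 4) {
		size_t n = len - i < 4 ? len - i : 4;
		uint32_t v = 0;
		char enc[5];

		/* big-endian group, zero-padded if partial */
		for (size_t k = 0; k < 4; k++)
			v = (v << 8) | (k < n ? in[i + k] : 0);

		if (v == 0 && n == 4) {
			out[o++] = 'z';	/* shorthand for a full zero group */
			continue;
		}
		/* five base-85 digits, most significant first */
		for (int k = 4; k >= 0; k--) {
			enc[k] = (char)('!' + v % 85);
			v /= 85;
		}
		for (size_t k = 0; k < n + 1; k++)
			out[o++] = enc[k];
	}
	out[o++] = '\n';	/* the only newline: one record per line */
	out[o] = '\0';
	return o;
}
```

The input "Man " encodes to "9jqo^" followed by the single newline.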

v2: Add suffix description comment
v3: Reword explanation of xe_print_blob_ascii85() calling drm_puts()
    in a loop

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Julia Filipchuk <julia.filipchuk@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: stable@vger.kernel.org
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Fixes: ec1455ce7e35 ("drm/xe/devcoredump: Add ASCII85 dump helper function")
Link: https://patchwork.freedesktop.org/patch/msgid/20250123202307.95103-2-jose.souza@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/devcoredump: Move exec queue snapshot to Contexts section
Lucas De Marchi [Thu, 23 Jan 2025 05:11:11 +0000 (21:11 -0800)]
drm/xe/devcoredump: Move exec queue snapshot to Contexts section

Having the exec queue snapshot inside a "GuC CT" section was always
wrong.  Commit c28fd6c358db ("drm/xe/devcoredump: Improve section
headings and add tile info") tried to fix that bug, but with that also
broke the mesa tool that parses the devcoredump, hence it was reverted
in commit 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa
debug tool").

With the mesa tool also fixed, this can propagate as a fix on both the
kernel and userspace sides to avoid unnecessary headaches for a debug
feature.

Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Julia Filipchuk <julia.filipchuk@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: stable@vger.kernel.org
Fixes: 70fb86a85dc9 ("drm/xe: Revert some changes that break a mesa debug tool")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250123051112.1938193-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe: Upgrade complaint about missing slice info
John Harrison [Sat, 18 Jan 2025 00:54:03 +0000 (16:54 -0800)]
drm/xe: Upgrade complaint about missing slice info

The steering code needs to know slice/subslice counts and this
information should be retrieved from the hwconfig table. However,
earlier platforms don't have it, hence the KMD has a fallback path.
Newer platforms really should have the entries and if they are missing
that is a bug that needs to be fixed in the table.

So update the complaint to be an error on newer platforms and remove
it completely for older ones that we know are bad (but are not POR for
the Xe driver anyway). Also, re-word the message a little to make it
clearer what the issue is.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250118005403.2960807-1-John.C.Harrison@Intel.com
4 months agodrm/xe/pf: Move VFs reprovisioning to worker
Michal Wajdeczko [Sat, 25 Jan 2025 21:55:05 +0000 (22:55 +0100)]
drm/xe/pf: Move VFs reprovisioning to worker

Since the GuC is reset during GT reset, we need to re-send the
entire SR-IOV provisioning configuration to the GuC. But since
this whole configuration is protected by the PF master mutex and
we can't avoid making allocations under this mutex (like during
LMEM provisioning), we can't do this reprovisioning from gt-reset
path if we want to be reclaim-safe. Move VFs reprovisioning to an
async worker that we will start from the gt-reset path.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250125215505.720-1-michal.wajdeczko@intel.com
4 months agodrm/xe/pf: Use GuC Buffer Cache during policy provisioning
Michal Wajdeczko [Fri, 24 Jan 2025 18:52:47 +0000 (19:52 +0100)]
drm/xe/pf: Use GuC Buffer Cache during policy provisioning

Start using the GuC buffer cache for the SRIOV policy configuration
actions. This is a required step before we can declare the SRIOV
PF as reclaim-safe.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124185247.676-1-michal.wajdeczko@intel.com
4 months agodrm/xe/pmu: Add GT C6 events
Vinay Belgaumkar [Fri, 24 Jan 2025 05:04:11 +0000 (21:04 -0800)]
drm/xe/pmu: Add GT C6 events

Provide a PMU interface for GT C6 residency counters. The interface is
similar to the one available for i915, but gt is passed in the config
when creating the event.

Sample usage and output:

$ perf list | grep gt-c6
  xe_0000_00_02.0/gt-c6-residency/                   [Kernel PMU event]

$ tail /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency*
==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency <==
event=0x01

==> /sys/bus/event_source/devices/xe_0000_00_02.0/events/gt-c6-residency.unit <==
ms

$ perf stat -e xe_0000_00_02.0/gt-c6-residency,gt=0/ -I1000
#           time             counts unit events
     1.001196056              1,001 ms   xe_0000_00_02.0/gt-c6-residency,gt=0/
     2.005216219              1,003 ms   xe_0000_00_02.0/gt-c6-residency,gt=0/

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/pmu: Add attribute skeleton
Lucas De Marchi [Fri, 24 Jan 2025 05:04:10 +0000 (21:04 -0800)]
drm/xe/pmu: Add attribute skeleton

Add the generic support for defining new attributes. This only adds
the macros and common infra for the event counters, but no counters
yet. Those are going to be added in follow-up changes.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/pmu: Get/put runtime pm on event init
Lucas De Marchi [Fri, 24 Jan 2025 05:04:09 +0000 (21:04 -0800)]
drm/xe/pmu: Get/put runtime pm on event init

When the event is created, make sure runtime pm is taken and later put:
in order to read an event counter the GPU needs to remain accessible, and
doing a get/put during perf's read is not possible since it's holding a
raw_spinlock.

Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/pmu: Extract xe_pmu_event_update()
Lucas De Marchi [Fri, 24 Jan 2025 05:04:08 +0000 (21:04 -0800)]
drm/xe/pmu: Extract xe_pmu_event_update()

Like other pmu drivers, keep the update separate from the read so it can
be called from other methods (like stop()) without side effects.
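A hedged userspace sketch of the usual split: the update step folds the delta since the last raw reading into the event count, so read() and stop() can both call it. Kernel PMU drivers commonly do this with local64_t and a cmpxchg loop on the previous count; this sketch uses C11 atomics instead, and the struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical event state; the raw counter is sampled by the caller. */
struct pmu_event {
	_Atomic uint64_t prev;	/* last raw value folded into count */
	_Atomic uint64_t count;	/* accumulated value reported to perf */
};

/*
 * Fold the delta between 'now' and the previous raw value into the
 * count. The compare-exchange claims the interval [prev, now]
 * atomically, so a read() racing with stop() does not add the same
 * delta twice.
 */
static void pmu_event_update(struct pmu_event *ev, uint64_t now)
{
	uint64_t prev = atomic_load(&ev->prev);

	while (!atomic_compare_exchange_weak(&ev->prev, &prev, now))
		;	/* 'prev' now holds the latest value; retry */

	atomic_fetch_add(&ev->count, now - prev);
}
```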

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/pmu: Assert max gt
Lucas De Marchi [Fri, 24 Jan 2025 05:04:07 +0000 (21:04 -0800)]
drm/xe/pmu: Assert max gt

XE_PMU_MAX_GT needs to be used due to a circular dependency, but we
should make sure it doesn't go out of sync with XE_PMU_MAX_GT. Add a
compile check for that.
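A compile-time check of this kind can be sketched with C11 _Static_assert (in-kernel code would typically use static_assert() or BUILD_BUG_ON()). Both macro names and values below are placeholders; the sketch does not claim which constant the real check compares against.

```c
/* Placeholder values for illustration only. */
#define MAX_GT_PER_TILE 2	/* the "source of truth" constant */
#define PMU_MAX_GT 2		/* duplicated to avoid a circular include */

/* The build fails if the duplicate ever drifts from the original. */
_Static_assert(PMU_MAX_GT == MAX_GT_PER_TILE,
	       "PMU_MAX_GT out of sync with MAX_GT_PER_TILE");
```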

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/pmu: Enable PMU interface
Vinay Belgaumkar [Fri, 24 Jan 2025 05:04:06 +0000 (21:04 -0800)]
drm/xe/pmu: Enable PMU interface

Basic PMU enabling patch. Set up the basic framework
for adding events.

Based on previous versions by Bommu Krishnaiah, Aravind Iddamsetty and
Riana Tauro, using i915 and rapl as reference implementations.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124050411.2189060-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agoMerge tag 'drm-misc-next-fixes-2025-01-24' of https://gitlab.freedesktop.org/drm...
Simona Vetter [Fri, 24 Jan 2025 16:06:06 +0000 (17:06 +0100)]
Merge tag 'drm-misc-next-fixes-2025-01-24' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next-fixes for v6.14-rc1:
- Fix a serious regression from commit e4b5ccd392b9 ("drm/v3d: Ensure
  job pointer is set to NULL after job completion")
- dmem cgroup Kconfig fix (acked by Tejun)
- virtio: uaf in dma_buf free path
- xlnx: kerneldoc

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0d4a18f4-222c-4767-9169-e6350ce8fea5@linux.intel.com
4 months agoMerge tag 'amd-drm-next-6.14-2025-01-24' of https://gitlab.freedesktop.org/agd5f...
Simona Vetter [Fri, 24 Jan 2025 16:01:41 +0000 (17:01 +0100)]
Merge tag 'amd-drm-next-6.14-2025-01-24' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.14-2025-01-24:

amdgpu:
- Documentation fixes
- SMU 13.x fixes
- SR-IOV fix
- Display fix
- PCIe calculation fix
- MES 12 fix
- HUBP fix
- Cursor fix
- Enforce isolation fixes
- GFX 12 fix
- Use drm scheduler API helper rather than open coding it
- Mark some debugging parameters as unsafe
- PSP 14.x fix
- Add cleaner shader support for gfx12
- Add subvp debugging flag
- SDMA 4.4.x fix
- Clarify some kernel log messages
- clang fix
- PCIe lane reporting fix
- Documentation fix

amdkfd:
- Mark some debugging parameters as unsafe
- Fix partial migration handling
- Trap handler updates

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124152153.3861868-1-alexander.deucher@amd.com
4 months agodrm/amd/display: Optimize cursor position updates
Aric Cyr [Tue, 10 Dec 2024 23:38:15 +0000 (18:38 -0500)]
drm/amd/display: Optimize cursor position updates

[why]
Updating the cursor enablement register can be a slow operation and accumulates
when high polling rate cursors cause frequent updates asynchronously to the
cursor position.

[how]
Since the cursor enable bit is cached there is no need to update the
enablement register if there is no change to it.  This removes the
read-modify-write from the cursor position programming path in HUBP and
DPP, leaving only the register writes.
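The caching idea can be sketched as follows, assuming a simulated slow register write; all names are hypothetical. The position registers would still be written on every call (omitted here), while the enable register is only touched when the cached bit actually changes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a slow MMIO write; counts calls. */
static unsigned int reg_writes;
static void write_cursor_enable(bool en) { (void)en; reg_writes++; }

struct cursor_cache {
	bool enable;	/* last value written to the enable register */
};

/* Skip the enable register entirely when nothing changed. */
static void set_cursor_position(struct cursor_cache *c, bool enable)
{
	if (c->enable != enable) {
		write_cursor_enable(enable);
		c->enable = enable;
	}
}
```

With a high polling rate mouse, repeated position updates with the cursor already enabled then cost no enable-register traffic at all.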

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sung Lee <sung.lee@amd.com>
Signed-off-by: Aric Cyr <Aric.Cyr@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Add hubp cache reset when powergating
Aric Cyr [Thu, 9 Jan 2025 20:03:48 +0000 (15:03 -0500)]
drm/amd/display: Add hubp cache reset when powergating

[Why]
When HUBP is power gated, the SW state can get out of sync with the
hardware state, causing the cursor to not be programmed correctly.

[How]
Similar to DPP, add a HUBP reset function which is called wherever
HUBP is initialized or powergated.  This function will clear the cursor
position and attribute cache allowing for proper programming when the
HUBP is brought back up.
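Why the reset matters can be shown with a small sketch (all names hypothetical): a cache that skips "redundant" writes goes stale once power gating wipes the registers, so the reset marks the cache invalid and forces the next programming through.

```c
#include <stdbool.h>

/* Hypothetical cursor cache; valid == false means "hardware unknown". */
struct hubp_cursor_cache {
	bool valid;
	unsigned int pos;	/* last cursor position programmed */
};

static unsigned int hw_writes;	/* counts simulated register writes */

static void program_cursor(struct hubp_cursor_cache *c, unsigned int pos)
{
	if (c->valid && c->pos == pos)
		return;		/* cache says hardware already matches */
	hw_writes++;		/* stand-in for the register write */
	c->pos = pos;
	c->valid = true;
}

/* Call on init and power gating: the registers lose their contents. */
static void hubp_reset_cache(struct hubp_cursor_cache *c)
{
	c->valid = false;
}
```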

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sung Lee <sung.lee@amd.com>
Signed-off-by: Aric Cyr <Aric.Cyr@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/amdgpu: Enable scratch data dump for mes 12
Shaoyun Liu [Tue, 14 Jan 2025 16:57:41 +0000 (11:57 -0500)]
drm/amd/amdgpu: Enable scratch data dump for mes 12

MES internally checks the CP_MES_MSCRATCH_LO/HI registers to set the scratch
data location during ucode start, so the driver needs to start the MES
pipes one by one, with a different setting for each pipe.

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd: Clarify kdoc for amdgpu.gttsize
Mario Limonciello [Thu, 16 Jan 2025 21:47:11 +0000 (15:47 -0600)]
drm/amd: Clarify kdoc for amdgpu.gttsize

Effectively amdgpu.gttsize gets set to ~1/2 of RAM, but that's controlled
by what the TTM page limit is set to.  Clarify the kdoc.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculation
Srinivasan Shanmugam [Mon, 20 Jan 2025 12:27:04 +0000 (17:57 +0530)]
drm/amd/amdgpu: Prevent null pointer dereference in GPU bandwidth calculation

If the parent is NULL, adev->pdev is used to retrieve the PCIe speed and
width, ensuring that the function can still determine these
capabilities from the device itself.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:6193 amdgpu_device_gpu_bandwidth()
error: we previously assumed 'parent' could be null (see line 6180)

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
    6170 static void amdgpu_device_gpu_bandwidth(struct amdgpu_device *adev,
    6171                                         enum pci_bus_speed *speed,
    6172                                         enum pcie_link_width *width)
    6173 {
    6174         struct pci_dev *parent = adev->pdev;
    6175
    6176         if (!speed || !width)
    6177                 return;
    6178
    6179         parent = pci_upstream_bridge(parent);
    6180         if (parent && parent->vendor == PCI_VENDOR_ID_ATI) {
                     ^^^^^^
If parent is NULL

    6181                 /* use the upstream/downstream switches internal to dGPU */
    6182                 *speed = pcie_get_speed_cap(parent);
    6183                 *width = pcie_get_width_cap(parent);
    6184                 while ((parent = pci_upstream_bridge(parent))) {
    6185                         if (parent->vendor == PCI_VENDOR_ID_ATI) {
    6186                                 /* use the upstream/downstream switches internal to dGPU */
    6187                                 *speed = pcie_get_speed_cap(parent);
    6188                                 *width = pcie_get_width_cap(parent);
    6189                         }
    6190                 }
    6191         } else {
    6192                 /* use the device itself */
--> 6193                 *speed = pcie_get_speed_cap(parent);
                                                     ^^^^^^ Then we are toasted here.

    6194                 *width = pcie_get_width_cap(parent);
    6195         }
    6196 }
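A simplified sketch of the fixed control flow, assuming a toy struct in place of struct pci_dev and plain fields in place of pcie_get_speed_cap()/pcie_get_width_cap(). The point is that the fallback branch reads from the device itself rather than from the possibly NULL parent. 0x1002 is the ATI/AMD PCI vendor ID.

```c
#include <stddef.h>

/* Toy stand-in for struct pci_dev (hypothetical). */
struct dev {
	struct dev *upstream;	/* NULL when there is no upstream bridge */
	unsigned int vendor;
	int speed, width;
};

#define VENDOR_ATI 0x1002u

static void gpu_bandwidth(const struct dev *self, int *speed, int *width)
{
	const struct dev *p = self->upstream;

	if (!speed || !width)
		return;

	if (p && p->vendor == VENDOR_ATI) {
		/* switches internal to the dGPU: keep the last ATI one */
		*speed = p->speed;
		*width = p->width;
		while ((p = p->upstream)) {
			if (p->vendor == VENDOR_ATI) {
				*speed = p->speed;
				*width = p->width;
			}
		}
	} else {
		/* use the device itself, never the possibly NULL parent */
		*speed = self->speed;
		*width = self->width;
	}
}
```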

Fixes: 757e8b951ce2 ("drm/amdgpu: cache gpu pcie link width")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Fix error pointers in amdgpu_dm_crtc_mem_type_changed
Srinivasan Shanmugam [Wed, 15 Jan 2025 16:59:06 +0000 (22:29 +0530)]
drm/amd/display: Fix error pointers in amdgpu_dm_crtc_mem_type_changed

The function amdgpu_dm_crtc_mem_type_changed was dereferencing pointers
returned by drm_atomic_get_plane_state without checking for errors. This
could lead to undefined behavior if the function returns an error pointer.

This commit adds checks using IS_ERR to ensure that new_plane_state and
old_plane_state are valid before dereferencing them.

Fixes the below:

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:11486 amdgpu_dm_crtc_mem_type_changed()
error: 'new_plane_state' dereferencing possible ERR_PTR()

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c
    11475 static bool amdgpu_dm_crtc_mem_type_changed(struct drm_device *dev,
    11476                                             struct drm_atomic_state *state,
    11477                                             struct drm_crtc_state *crtc_state)
    11478 {
    11479         struct drm_plane *plane;
    11480         struct drm_plane_state *new_plane_state, *old_plane_state;
    11481
    11482         drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) {
    11483                 new_plane_state = drm_atomic_get_plane_state(state, plane);
    11484                 old_plane_state = drm_atomic_get_plane_state(state, plane);
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^ These functions can fail.

    11485
--> 11486                 if (old_plane_state->fb && new_plane_state->fb &&
    11487                     get_mem_type(old_plane_state->fb) != get_mem_type(new_plane_state->fb))
    11488                         return true;
    11489         }
    11490
    11491         return false;
    11492 }
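The error-pointer convention behind this fix can be sketched in userspace. ERR_PTR()/IS_ERR()/PTR_ERR() below are re-creations of the kernel helpers (errors live in the top MAX_ERRNO addresses); the plane_state struct is a toy, and returning false on error is an assumption of this sketch, since the message above does not say what the fixed function returns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace re-creations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Toy plane state (hypothetical). */
struct plane_state {
	int fb_mem_type;
	bool has_fb;
};

/* Check for error pointers before dereferencing either state. */
static bool mem_type_changed(const struct plane_state *old_s,
			     const struct plane_state *new_s)
{
	if (IS_ERR(old_s) || IS_ERR(new_s))
		return false;	/* assumed: treat failure as "no change" */

	return old_s->has_fb && new_s->has_fb &&
	       old_s->fb_mem_type != new_s->fb_mem_type;
}
```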

Fixes: 4caacd1671b7 ("drm/amd/display: Do not elevate mem_type change to full update")
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: fix ring timeout issue in gfx10 sr-iov environment
Lin.Cao [Tue, 14 Jan 2025 09:42:01 +0000 (17:42 +0800)]
drm/amdgpu: fix ring timeout issue in gfx10 sr-iov environment

Commit 26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in
amdgpu_job_prepare") sets job->vm to NULL if there is no fence. This
causes the switch buffer emission to be skipped whenever job->vm is NULL.

Checking job rather than vm solves this problem.

Fixes: 26c95e838e63 ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare")
Signed-off-by: Lin.Cao <lincao12@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/pm: Fix smu v13.0.6 caps initialization
Lijo Lazar [Fri, 17 Jan 2025 14:42:42 +0000 (20:12 +0530)]
drm/amd/pm: Fix smu v13.0.6 caps initialization

Fix the initialization and usage of SMU v13.0.6 capability values. Use
caps_set/clear functions to set/clear capability.

Also, fix the SET_UCLK_MAX capability on APUs; it is supported on APUs.

Fixes: e9b86b841baf ("drm/amd/pm: Add capability flags for SMU v13.0.6")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/pm: Refactor SMU 13.0.6 SDMA reset firmware version checks
Jesse.zhang@amd.com [Sat, 18 Jan 2025 09:38:22 +0000 (17:38 +0800)]
drm/amd/pm: Refactor SMU 13.0.6 SDMA reset firmware version checks

This patch refactors the firmware version checks in `smu_v13_0_6_reset_sdma`
to support multiple SMU programs with different firmware version thresholds.

V2: return -EOPNOTSUPP for unsupported pmfw
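One way to support several SMU programs with different thresholds is a small lookup table, sketched below. The program IDs and minimum versions are placeholders, not real PMFW thresholds, and the function name is hypothetical; unsupported or unknown firmware returns -EOPNOTSUPP as the V2 note describes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: minimum PMFW version per SMU program. */
struct fw_min {
	uint32_t program;
	uint32_t min_version;
};

static const struct fw_min sdma_reset_min_fw[] = {
	{ 0, 0x00010000 },	/* placeholder values only */
	{ 7, 0x00030000 },
};

static int sdma_reset_supported(uint32_t program, uint32_t fw_version)
{
	size_t n = sizeof(sdma_reset_min_fw) / sizeof(sdma_reset_min_fw[0]);

	for (size_t i = 0; i < n; i++)
		if (sdma_reset_min_fw[i].program == program)
			return fw_version >= sdma_reset_min_fw[i].min_version ?
			       0 : -EOPNOTSUPP;

	return -EOPNOTSUPP;	/* unknown program: treat as unsupported */
}
```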

Suggested-by: Lazar Lijo <Lijo.Lazar@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agorevert "drm/amdgpu/pm: add definition PPSMC_MSG_ResetSDMA2"
Jesse.zhang@amd.com [Mon, 13 Jan 2025 01:44:56 +0000 (09:44 +0800)]
revert "drm/amdgpu/pm: add definition PPSMC_MSG_ResetSDMA2"

pmfw now unifies PPSMC_MSG_ResetSDMA definitions for different devices.
PPSMC_MSG_ResetSDMA2 is not needed.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agorevert "drm/amdgpu/pm: Implement SDMA queue reset for different asic"
Jesse.zhang@amd.com [Sat, 18 Jan 2025 09:05:25 +0000 (17:05 +0800)]
revert "drm/amdgpu/pm: Implement SDMA queue reset for different asic"

pmfw unified PPSMC_MSG_ResetSDMA definitions for different devices.
PPSMC_MSG_ResetSDMA2 is not needed.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/pm: Add capability flags for SMU v13.0.6
Lijo Lazar [Thu, 16 Jan 2025 12:26:12 +0000 (17:56 +0530)]
drm/amd/pm: Add capability flags for SMU v13.0.6

Add capability flags for SMU v13.0.6 variants. Initialize the flags
based on firmware support. As there are multiple IP versions maintained,
it is more manageable with one-time initialization of caps flags based
on IP version and firmware feature support.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: fix SUBVP DC_DEBUG_MASK documentation
Alex Deucher [Fri, 17 Jan 2025 19:16:11 +0000 (14:16 -0500)]
drm/amd/display: fix SUBVP DC_DEBUG_MASK documentation

This needs to be kerneldoc formatted.

Fixes: 5349658fa4a1 ("drm/amd: Add debug option to disable subvp")
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
4 months agodrm/amd/display: fix CEC DC_DEBUG_MASK documentation
Alex Deucher [Tue, 14 Jan 2025 19:31:55 +0000 (14:31 -0500)]
drm/amd/display: fix CEC DC_DEBUG_MASK documentation

This needs to be kerneldoc formatted.

Fixes: 7594874227e1 ("drm/amd/display: add CEC notifier to amdgpu driver")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Kun Liu <Kun.Liu2@amd.com>
4 months agodrm/amdgpu: fix the PCIe lanes reporting in the INFO IOCTL
Alex Deucher [Mon, 6 Jan 2025 17:19:11 +0000 (12:19 -0500)]
drm/amdgpu: fix the PCIe lanes reporting in the INFO IOCTL

Combine the platform and GPU caps like we do for PCIe Gen.
This aligns properly with expectations and documentation
for the interface.
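A minimal sketch of combining GPU and platform link-width capabilities, assuming each side is described as a bitmask of supported widths and that "combining" means taking the widest width both sides support. The bit layout (`SKETCH_W_X1` through `SKETCH_W_X16`) and `sketch_max_common_lanes()` are hypothetical, not the amdgpu definitions.

```c
#include <stdint.h>

/* Hypothetical width-support bits, one per lane count. */
#define SKETCH_W_X1  (1u << 0)
#define SKETCH_W_X2  (1u << 1)
#define SKETCH_W_X4  (1u << 2)
#define SKETCH_W_X8  (1u << 3)
#define SKETCH_W_X16 (1u << 4)

/* Widest lane count supported by BOTH the GPU and the platform, or 0. */
static unsigned int sketch_max_common_lanes(uint32_t gpu_caps, uint32_t platform_caps)
{
	static const unsigned int lanes[] = { 16, 8, 4, 2, 1 };
	uint32_t common = gpu_caps & platform_caps;
	int bit;

	/* Scan from the widest (bit 4, x16) down to the narrowest (bit 0, x1). */
	for (bit = 4; bit >= 0; bit--)
		if (common & (1u << bit))
			return lanes[4 - bit];
	return 0;
}
```

For an x16-capable GPU in an x8 slot this reports 8, which is what the link can actually use.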

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: cache gpu pcie link width
Alex Deucher [Mon, 6 Jan 2025 16:55:05 +0000 (11:55 -0500)]
drm/amdgpu: cache gpu pcie link width

Get the PCIe link width of the device itself (or its
integrated upstream bridge) and cache that.

v2: fix typo

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3820
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: mark static functions noinline_for_stack
Tzung-Bi Shih [Thu, 9 Jan 2025 05:35:04 +0000 (05:35 +0000)]
drm/amd/display: mark static functions noinline_for_stack

When compiling allmodconfig (CONFIG_WERROR=y) with clang-19, the
following errors are seen:

.../display/dc/dml2/display_mode_core.c:6268:13: warning: stack frame size (3128) exceeds limit (3072) in 'dml_prefetch_check' [-Wframe-larger-than]
.../display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c:7236:13: warning: stack frame size (3256) exceeds limit (3072) in 'dml_core_mode_support' [-Wframe-larger-than]

Mark static functions called by dml_prefetch_check() and
dml_core_mode_support() noinline_for_stack to keep them from being
inlined into huge functions that exceed the frame size limit.
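A minimal userspace sketch of the annotation. In the kernel, noinline_for_stack is defined as plain noinline; the separate name documents why the function must stay out of line. The macro below is a local stand-in, and `helper_sum()` and `caller()` are hypothetical. Because -Wframe-larger-than is checked per function, a helper that stays out of line keeps its large locals in its own frame instead of adding them to the caller's frame.

```c
/* Local stand-in for the kernel macro (which expands to noinline). */
#define noinline_for_stack __attribute__((__noinline__))

/* Hypothetical helper with a 256-int scratch buffer kept in its own frame. */
static noinline_for_stack int helper_sum(int seed)
{
	int scratch[256];
	int i, sum = 0;

	for (i = 0; i < 256; i++)
		scratch[i] = seed + i;
	for (i = 0; i < 256; i++)
		sum += scratch[i];
	return sum;
}

/* The caller's frame stays small: each helper call gets its own frame. */
static int caller(int seed)
{
	return helper_sum(seed) + helper_sum(seed + 1);
}
```

Here caller(0) returns 32640 + 32896 = 65536.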

A way to reproduce:
$ git checkout next-20250107
$ mkdir build_dir
$ export PATH=/tmp/llvm-19.1.6-x86_64/bin:$PATH
$ make LLVM=1 O=build_dir allmodconfig
$ make LLVM=1 O=build_dir drivers/gpu/drm/ -j

How the static functions to mark were chosen:
[0] Unset CONFIG_WERROR in build_dir/.config
so that display_mode_core.o builds without errors.

[1] Get a function list called by dml_prefetch_check().
$ sed -n '6268,6711p' drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c \
  | sed -n -r 's/.*\W(\w+)\(.*/\1/p' | sort -u >/tmp/syms

[2] Get the non-inline function list.
Objdump won't show the symbols if they are inline functions.

$ make LLVM=1 O=build_dir drivers/gpu/drm/ -j
$ objdump -d build_dir/.../display_mode_core.o | \
  ./scripts/checkstack.pl x86_64 0 | \
  grep -f /tmp/syms | cut -d' ' -f2- >/tmp/orig

[3] Get the full function list.
Append "-fno-inline" to `CFLAGS_.../display_mode_core.o` in
drivers/gpu/drm/amd/display/dc/dml2/Makefile.

$ make LLVM=1 O=build_dir drivers/gpu/drm/ -j
$ objdump -d build_dir/.../display_mode_core.o | \
  ./scripts/checkstack.pl x86_64 0 | \
  grep -f /tmp/syms | cut -d' ' -f2- >/tmp/noinline

[4] Get the inline function list.
If a symbol appears only in /tmp/noinline and not in /tmp/orig, it is a
good candidate to mark noinline.

$ diff /tmp/orig /tmp/noinline

Chosen functions and their stack sizes:
CalculateBandwidthAvailableForImmediateFlip [display_mode_core.o]:144
CalculateExtraLatency [display_mode_core.o]:176
CalculateTWait [display_mode_core.o]:64
CalculateVActiveBandwithSupport [display_mode_core.o]:112
set_calculate_prefetch_schedule_params [display_mode_core.o]:48

CheckGlobalPrefetchAdmissibility [dml2_core_dcn4_calcs.o]:544
calculate_bandwidth_available [dml2_core_dcn4_calcs.o]:320
calculate_vactive_det_fill_latency [dml2_core_dcn4_calcs.o]:272
CalculateDCFCLKDeepSleep [dml2_core_dcn4_calcs.o]:208
CalculateODMMode [dml2_core_dcn4_calcs.o]:208
CalculateOutputLink [dml2_core_dcn4_calcs.o]:176

Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdkfd: Clear MODE.VSKIP in gfx9 trap handler
Jay Cornwall [Wed, 8 Jan 2025 03:59:06 +0000 (21:59 -0600)]
drm/amdkfd: Clear MODE.VSKIP in gfx9 trap handler

If a user shader issues S_SETVSKIP, this state will persist when
executing the trap handler, causing vector instructions to be
skipped.

VSKIP state is already saved/restored through the MODE register.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: Refine ip detection log message
Lijo Lazar [Tue, 17 Dec 2024 04:49:49 +0000 (10:19 +0530)]
drm/amdgpu: Refine ip detection log message

'add ip block' causes confusion if the blocks are disabled later with
ip_block_mask. Instead change to 'detected' and also add device context.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: Add handler for SDMA context empty
Lijo Lazar [Wed, 1 Jan 2025 04:31:50 +0000 (10:01 +0530)]
drm/amdgpu: Add handler for SDMA context empty

Context empty interrupt is enabled for SDMA 4.4.2. Add a handler for
context empty interrupt so that it is disposed of fast, and not
propagated to KFD layer.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd: Add debug option to disable subvp
Aurabindo Pillai [Mon, 13 Jan 2025 22:05:16 +0000 (17:05 -0500)]
drm/amd: Add debug option to disable subvp

Some monitors flicker when subvp is enabled, which may be related to
an uncommon timing they use. Add a debug option to disable subvp to
help isolate such issues for debugging.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdkfd: Sync trap handler binary with source
Jay Cornwall [Wed, 8 Jan 2025 03:26:32 +0000 (21:26 -0600)]
drm/amdkfd: Sync trap handler binary with source

Source and binary have become mismatched during branch activity.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdkfd: Fix partial migrate issue
Emily Deng [Tue, 31 Dec 2024 08:13:14 +0000 (16:13 +0800)]
drm/amdkfd: Fix partial migrate issue

For a partial migration from RAM to VRAM, migrate->cpages is not equal
to migrate->npages, so migrate->npages should be used to walk every page
that needs migrating, whether or not it can be copied.

Also, only the pages that can be migrated should be set in
migrate->dst[i]; otherwise migrate_vma_pages() will migrate the wrong
pages based on migrate->dst[i].

v2:
Add mpages to break the loop earlier.

v3:
Use MIGRATE_PFN_MIGRATE to identify whether a page can be migrated.

v4:
Correct the error part.
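The loop described above can be sketched in userspace as follows. `SKETCH_PFN_MIGRATE` is a local stand-in for the kernel's MIGRATE_PFN_MIGRATE, and its bit value, `sketch_fill_dst()` and the `dst_base` numbering are assumptions for illustration only. The loop walks all npages entries, fills dst[i] only for migratable entries, and stops early once cpages of them (the v2 "mpages" idea) have been handled.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for MIGRATE_PFN_MIGRATE; the bit value is an assumption. */
#define SKETCH_PFN_MIGRATE (1UL << 1)

/* Returns the number of destination entries filled (mpages). */
static size_t sketch_fill_dst(const unsigned long *src, unsigned long *dst,
			      size_t npages, size_t cpages, unsigned long dst_base)
{
	size_t i, mpages = 0;

	/* Entries that cannot be migrated must keep an empty destination. */
	memset(dst, 0, npages * sizeof(*dst));
	for (i = 0; i < npages && mpages < cpages; i++) {
		if (!(src[i] & SKETCH_PFN_MIGRATE))
			continue;
		dst[i] = dst_base + i; /* hypothetical destination page */
		mpages++;
	}
	return mpages;
}
```

With src = {migratable, not, migratable, not} and cpages = 2, a loop bounded by cpages would only look at src[0] and src[1] and miss src[2]; bounding it by npages finds both migratable pages.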

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agoMerge tag 'drm-misc-fixes-2025-01-24' of https://gitlab.freedesktop.org/drm/misc...
Simona Vetter [Fri, 24 Jan 2025 10:26:41 +0000 (11:26 +0100)]
Merge tag 'drm-misc-fixes-2025-01-24' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

Short summary of fixes pull:

bochs:
- Fix double-free on driver removal

client:
- Improve support for tile-based modes
- Fix fbdev Kconfig select rules

xlnx:
- zynqmp_dp: Add locking to DP-bridge enable helper

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250124082932.GA13715@linux.fritz.box
4 months agodrm: zynqmp_dp: Unlock on error in zynqmp_dp_bridge_atomic_enable()
Dan Carpenter [Mon, 11 Nov 2024 09:06:10 +0000 (12:06 +0300)]
drm: zynqmp_dp: Unlock on error in zynqmp_dp_bridge_atomic_enable()

We added some locking to this function, but accidentally forgot to unlock
if zynqmp_dp_mode_configure() failed.  Use a guard lock to fix it.
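The kernel's scope-based guard() helpers (in <linux/cleanup.h>) are built on the GCC/Clang cleanup attribute, which runs a function when a variable goes out of scope. The userspace sketch below imitates that mechanism with a hypothetical lock type and a hypothetical `SKETCH_GUARD()` macro, not the kernel API. The early error return still releases the lock.

```c
#include <stdbool.h>

/* Hypothetical lock that only records whether it is held. */
struct sketch_lock { bool held; };

static void sketch_lock_acquire(struct sketch_lock *l) { l->held = true; }

/* Cleanup handlers receive a pointer to the guarded variable. */
static void sketch_lock_release(struct sketch_lock **l) { (*l)->held = false; }

/* Acquire now; release automatically when the guard variable leaves scope. */
#define SKETCH_GUARD(lockp)                                          \
	struct sketch_lock *sketch_guard_                            \
		__attribute__((cleanup(sketch_lock_release))) =      \
		(sketch_lock_acquire(lockp), (lockp))

/* Hypothetical enable path with an early error return. */
static int sketch_enable(struct sketch_lock *l, bool configure_fails)
{
	SKETCH_GUARD(l);

	if (configure_fails)
		return -1; /* lock is still released on this path */
	return 0;
}
```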

Fixes: a7d5eeaa57d7 ("drm: zynqmp_dp: Add locking")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/b4042bd9-c943-4738-a2e1-8647259137c6@stanley.mountain
4 months agodrm/xe/oa: Set stream->pollin in xe_oa_buffer_check_unlocked
Ashutosh Dixit [Wed, 15 Jan 2025 22:20:29 +0000 (14:20 -0800)]
drm/xe/oa: Set stream->pollin in xe_oa_buffer_check_unlocked

We rely on stream->pollin to decide whether or not to block during
poll/read calls. However, currently there are blocking read code paths
which don't even set stream->pollin. The best place to consistently set
stream->pollin for all code paths is therefore to set it in
xe_oa_buffer_check_unlocked.

Fixes: e936f885f1e9 ("drm/xe/oa/uapi: Expose OA stream fd")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250115222029.3002103-1-ashutosh.dixit@intel.com
4 months agodrm/v3d: Assign job pointer to NULL before signaling the fence
Maíra Canal [Thu, 23 Jan 2025 01:24:03 +0000 (22:24 -0300)]
drm/v3d: Assign job pointer to NULL before signaling the fence

In commit e4b5ccd392b9 ("drm/v3d: Ensure job pointer is set to NULL
after job completion"), we introduced a change to assign the job pointer
to NULL after completing a job, indicating job completion.

However, this approach created a race condition between the DRM
scheduler workqueue and the IRQ execution thread. As soon as the fence is
signaled in the IRQ execution thread, a new job starts to be executed.
This results in a race condition where the IRQ execution thread sets the
job pointer to NULL at the same time as the `run_job()` function assigns
a new job to the pointer.

This race condition can lead to a NULL pointer dereference if the IRQ
execution thread sets the job pointer to NULL after `run_job()` assigns
it to the new job. When the new job completes and the GPU emits an
interrupt, `v3d_irq()` is triggered, potentially causing a crash.

[  466.310099] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c0
[  466.318928] Mem abort info:
[  466.321723]   ESR = 0x0000000096000005
[  466.325479]   EC = 0x25: DABT (current EL), IL = 32 bits
[  466.330807]   SET = 0, FnV = 0
[  466.333864]   EA = 0, S1PTW = 0
[  466.337010]   FSC = 0x05: level 1 translation fault
[  466.341900] Data abort info:
[  466.344783]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[  466.350285]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  466.355350]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  466.360677] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000089772000
[  466.367140] [00000000000000c0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  466.375875] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[  466.382163] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device algif_hash algif_skcipher af_alg bnep binfmt_misc vc4 snd_soc_hdmi_codec drm_display_helper cec brcmfmac_wcc spidev rpivid_hevc(C) drm_client_lib brcmfmac hci_uart drm_dma_helper pisp_be btbcm brcmutil snd_soc_core aes_ce_blk v4l2_mem2mem bluetooth aes_ce_cipher snd_compress videobuf2_dma_contig ghash_ce cfg80211 gf128mul snd_pcm_dmaengine videobuf2_memops ecdh_generic sha2_ce ecc videobuf2_v4l2 snd_pcm v3d sha256_arm64 rfkill videodev snd_timer sha1_ce libaes gpu_sched snd videobuf2_common sha1_generic drm_shmem_helper mc rp1_pio drm_kms_helper raspberrypi_hwmon spi_bcm2835 gpio_keys i2c_brcmstb rp1 raspberrypi_gpiomem rp1_mailbox rp1_adc nvmem_rmem uio_pdrv_genirq uio i2c_dev drm ledtrig_pattern drm_panel_orientation_quirks backlight fuse dm_mod ip_tables x_tables ipv6
[  466.458429] CPU: 0 UID: 1000 PID: 2008 Comm: chromium Tainted: G         C         6.13.0-v8+ #18
[  466.467336] Tainted: [C]=CRAP
[  466.470306] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  466.476157] pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  466.483143] pc : v3d_irq+0x118/0x2e0 [v3d]
[  466.487258] lr : __handle_irq_event_percpu+0x60/0x228
[  466.492327] sp : ffffffc080003ea0
[  466.495646] x29: ffffffc080003ea0 x28: ffffff80c0c94200 x27: 0000000000000000
[  466.502807] x26: ffffffd08dd81d7b x25: ffffff80c0c94200 x24: ffffff8003bdc200
[  466.509969] x23: 0000000000000001 x22: 00000000000000a7 x21: 0000000000000000
[  466.517130] x20: ffffff8041bb0000 x19: 0000000000000001 x18: 0000000000000000
[  466.524291] x17: ffffffafadfb0000 x16: ffffffc080000000 x15: 0000000000000000
[  466.531452] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  466.538613] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffffd08c527eb0
[  466.545777] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[  466.552941] x5 : ffffffd08c4100d0 x4 : ffffffafadfb0000 x3 : ffffffc080003f70
[  466.560102] x2 : ffffffc0829e8058 x1 : 0000000000000001 x0 : 0000000000000000
[  466.567263] Call trace:
[  466.569711]  v3d_irq+0x118/0x2e0 [v3d] (P)
[  466.573826]  __handle_irq_event_percpu+0x60/0x228
[  466.578546]  handle_irq_event+0x54/0xb8
[  466.582391]  handle_fasteoi_irq+0xac/0x240
[  466.586498]  generic_handle_domain_irq+0x34/0x58
[  466.591128]  gic_handle_irq+0x48/0xd8
[  466.594798]  call_on_irq_stack+0x24/0x58
[  466.598730]  do_interrupt_handler+0x88/0x98
[  466.602923]  el0_interrupt+0x44/0xc0
[  466.606508]  __el0_irq_handler_common+0x18/0x28
[  466.611050]  el0t_64_irq_handler+0x10/0x20
[  466.615156]  el0t_64_irq+0x198/0x1a0
[  466.618740] Code: 52800035 3607faf3 f9442e80 52800021 (f9406018)
[  466.624853] ---[ end trace 0000000000000000 ]---
[  466.629483] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[  466.636384] SMP: stopping secondary CPUs
[  466.640320] Kernel Offset: 0x100c400000 from 0xffffffc080000000
[  466.646259] PHYS_OFFSET: 0x0
[  466.649141] CPU features: 0x100,00000170,00901250,0200720b
[  466.654644] Memory Limit: none
[  466.657706] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

Fix the crash by assigning the job pointer to NULL before signaling the
fence. This ensures that the job pointer is cleared before any new job
starts execution, preventing the race condition and the NULL pointer
dereference crash.
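The ordering can be illustrated with a deterministic, sequential model. It assumes hypothetical names (`struct sketch_queue`, `sketch_signal_fence()`, `sketch_irq_buggy()`, `sketch_irq_fixed()`) and models "signaling the fence lets the scheduler run the next job" as a direct call. The real race is between the IRQ thread and the scheduler workqueue, so this only shows why the order matters, not how v3d implements it.

```c
#include <stddef.h>

struct sketch_job { int id; };

/* Hypothetical per-queue state: the job currently on the hardware. */
struct sketch_queue {
	struct sketch_job *active_job;
	struct sketch_job *next_job; /* what the scheduler runs next */
};

/* Stand-in for run_job(): the scheduler installs the next job. */
static void sketch_run_next(struct sketch_queue *q)
{
	q->active_job = q->next_job;
}

/* Signaling may let the next job start immediately; modelled as a call. */
static void sketch_signal_fence(struct sketch_queue *q)
{
	sketch_run_next(q);
}

/* Buggy order: clearing after signaling can wipe out the new job. */
static void sketch_irq_buggy(struct sketch_queue *q)
{
	sketch_signal_fence(q);
	q->active_job = NULL;
}

/* Fixed order: clear the pointer first, then signal. */
static void sketch_irq_fixed(struct sketch_queue *q)
{
	q->active_job = NULL;
	sketch_signal_fence(q);
}
```

After the buggy order the queue has no active job even though job b is running, which is the NULL pointer the next interrupt would dereference.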

Cc: stable@vger.kernel.org
Fixes: e4b5ccd392b9 ("drm/v3d: Ensure job pointer is set to NULL after job completion")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Phil Elwell <phil@raspberrypi.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250123012403.20447-1-mcanal@igalia.com
4 months agoMerge remote-tracking branch 'drm/drm-next' into drm-misc-next-fixes
Maarten Lankhorst [Thu, 23 Jan 2025 16:58:12 +0000 (17:58 +0100)]
Merge remote-tracking branch 'drm/drm-next' into drm-misc-next-fixes

A regression was caused by commit e4b5ccd392b9 ("drm/v3d: Ensure job pointer is set to NULL
after job completion"), but this commit is not yet in next-fixes,
fast-forward it.

Try #2, first one didn't have v6.13 in it.

Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
4 months agoMerge v6.13 into drm-next
Simona Vetter [Thu, 23 Jan 2025 13:39:49 +0000 (14:39 +0100)]
Merge v6.13 into drm-next

A regression was caused by commit e4b5ccd392b9 ("drm/v3d: Ensure job
pointer is set to NULL after job completion"), but this commit is not
yet in next-fixes, fast-forward it.

Note that this recreates Linus merge in 96c84703f1cf ("Merge tag
'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel")
because I didn't want to backmerge a random point in the merge window.

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
4 months agodrm/bochs: Do not put DRM device in PCI remove callback
Thomas Zimmermann [Fri, 3 Jan 2025 09:55:45 +0000 (10:55 +0100)]
drm/bochs: Do not put DRM device in PCI remove callback

Removing the bochs PCI device should mark the DRM device as unplugged
without removing it. Hence remove the respective call to drm_dev_put()
from bochs_pci_remove().

Fixes a double unref in devm_drm_dev_init_release(). An example error
message is shown below:

[   32.958338] BUG: KASAN: use-after-free in drm_dev_put.part.0+0x1b/0x90
[   32.958850] Write of size 4 at addr ffff888152134004 by task (udev-worker)/591
[   32.959574] CPU: 3 UID: 0 PID: 591 Comm: (udev-worker) Tainted: G            E      6.13.0-rc2-1-default+ #3417
[   32.960316] Tainted: [E]=UNSIGNED_MODULE
[   32.960637] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-2-gc13ff2cd-prebuilt.qemu.org 04/01/2014
[   32.961429] Call Trace:
[   32.961433]  <TASK>
[   32.961439]  dump_stack_lvl+0x68/0x90
[   32.961452]  print_address_description.constprop.0+0x88/0x330
[   32.961461]  ? preempt_count_sub+0x14/0xc0
[   32.961473]  print_report+0xe2/0x1d0
[   32.961479]  ? srso_alias_return_thunk+0x5/0xfbef5
[   32.963725]  ? __virt_addr_valid+0x143/0x320
[   32.964077]  ? srso_alias_return_thunk+0x5/0xfbef5
[   32.964463]  ? drm_dev_put.part.0+0x1b/0x90
[   32.964817]  kasan_report+0xce/0x1a0
[   32.965123]  ? drm_dev_put.part.0+0x1b/0x90
[   32.965474]  kasan_check_range+0xff/0x1c0
[   32.965806]  drm_dev_put.part.0+0x1b/0x90
[   32.966138]  release_nodes+0x84/0xc0
[   32.966447]  devres_release_all+0xd2/0x110
[   32.966788]  ? __pfx_devres_release_all+0x10/0x10
[   32.967177]  ? preempt_count_sub+0x14/0xc0
[   32.967523]  device_unbind_cleanup+0x16/0xc0
[   32.967886]  really_probe+0x1b7/0x570
[   32.968207]  __driver_probe_device+0xca/0x1b0
[   32.968568]  driver_probe_device+0x4a/0xf0
[   32.968907]  __driver_attach+0x10b/0x290
[   32.969239]  ? __pfx___driver_attach+0x10/0x10
[   32.969598]  bus_for_each_dev+0xc0/0x110
[   32.969923]  ? __pfx_bus_for_each_dev+0x10/0x10
[   32.970291]  ? bus_add_driver+0x17a/0x2b0
[   32.970622]  ? srso_alias_return_thunk+0x5/0xfbef5
[   32.971011]  bus_add_driver+0x19a/0x2b0
[   32.971335]  driver_register+0xd8/0x160
[   32.971671]  ? __pfx_bochs_pci_driver_init+0x10/0x10 [bochs]
[   32.972130]  do_one_initcall+0xba/0x390
[...]

After unplugging the DRM device, clients will close their references.
Closing the final reference will also release the DRM device.
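A toy refcount model of the lifetime described above. It assumes a hypothetical `struct sketch_dev` with one managed (devres) reference plus one open client reference; it is not the DRM core code. The remove callback only marks the device unplugged; the managed release and the last client close each drop one reference, and the device is freed exactly once.

```c
#include <stdbool.h>

/* Hypothetical refcounted device, loosely modelled on a DRM device. */
struct sketch_dev {
	int refcount;
	bool unplugged;
	bool freed;
};

/* Drop one reference; the last one releases the device. */
static void sketch_put(struct sketch_dev *d)
{
	if (--d->refcount == 0)
		d->freed = true;
}

/* Remove callback after the fix: mark unplugged, do not drop a reference. */
static void sketch_remove(struct sketch_dev *d)
{
	d->unplugged = true;
}
```

If sketch_remove() also called sketch_put(), the managed release would free the device while a client still held it, and that client's close would then operate on freed memory, matching the use-after-free in the KASAN report.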

Reported-by: Dr. David Alan Gilbert <dave@treblig.org>
Closes: https://lore.kernel.org/lkml/Z18dbfDAiFadsSdg@gallifrey/
Fixes: 04826f588682 ("drm/bochs: Allocate DRM device in struct bochs_device")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: virtualization@lists.linux.dev
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20250103095615.231162-1-tzimmermann@suse.de
4 months agodrm/xe/ptl: Apply Wa_13011645652
Vinay Belgaumkar [Thu, 16 Jan 2025 18:46:59 +0000 (10:46 -0800)]
drm/xe/ptl: Apply Wa_13011645652

Extend Wa_13011645652 to PTL.

Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250116184659.384874-1-vinay.belgaumkar@intel.com
4 months agodrm: select DRM_KMS_HELPER from DRM_GEM_SHMEM_HELPER
Arnd Bergmann [Wed, 22 Jan 2025 09:02:03 +0000 (10:02 +0100)]
drm: select DRM_KMS_HELPER from DRM_GEM_SHMEM_HELPER

In the combination of DRM_KMS_HELPER=m, DRM_GEM_SHMEM_HELPER=y, DRM_FBDEV_EMULATION=y,
the shmem code fails to link against the KMS helpers:

x86_64-linux-ld: vmlinux.o: in function `drm_fbdev_shmem_driver_fbdev_probe':
(.text+0xeec601): undefined reference to `drm_fb_helper_alloc_info'
x86_64-linux-ld: (.text+0xeec633): undefined reference to `drm_fb_helper_fill_info'
x86_64-linux-ld: vmlinux.o: in function `drm_fbdev_shmem_get_page':
drm_fbdev_shmem.c:(.text+0xeec7d2): undefined reference to `drm_gem_fb_get_obj'
x86_64-linux-ld: vmlinux.o: in function `drm_fbdev_shmem_fb_mmap':
drm_fbdev_shmem.c:(.text+0xeec9f6): undefined reference to `drm_gem_fb_get_obj'
x86_64-linux-ld: vmlinux.o: in function `drm_fbdev_shmem_defio_imageblit':
(.rodata+0x5b2288): undefined reference to `drm_fb_helper_check_var'
x86_64-linux-ld: (.rodata+0x5b2290): undefined reference to `drm_fb_helper_set_par'

This can happen for a number of device drivers that select DRM_GEM_SHMEM_HELPER
without also selecting DRM_KMS_HELPER. To work around this, add another select
that forces DRM_KMS_HELPER to be built-in rather than a loadable module, but
only if FBDEV emulation is also enabled. DRM_TTM_HELPER and DRM_GEM_DMA_HELPER
look like they have the same problem in theory even if there is no possible
configuration that shows it. For consistency, do the same change to those.

Closes: https://lore.kernel.org/all/20250121-greedy-flounder-of-abundance-4d2ee8-mkl@pengutronix.de
Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250122090211.3161186-1-arnd@kernel.org
4 months agoMAINTAINERS: Also exclude xe for drm-misc
Lucas De Marchi [Fri, 17 Jan 2025 16:45:29 +0000 (08:45 -0800)]
MAINTAINERS: Also exclude xe for drm-misc

When the xe driver was added, it didn't extend the exclude entries for
drm-misc, as done in commit 5a44d50f0072 ("MAINTAINERS: Update drm-misc
entry to match all drivers"). Exclude it as is done for i915 and other
drivers with dedicated maintainers.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250117164529.393503-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/guc: Fix sizeof(32) typo
Michal Wajdeczko [Tue, 21 Jan 2025 09:48:32 +0000 (10:48 +0100)]
drm/xe/guc: Fix sizeof(32) typo

A small typo leads to the following static code checker warning:

drivers/gpu/drm/xe/xe_guc_buf.c:81 xe_guc_buf_reserve()
warn: sizeof(NUMBER)?
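Why the checker flags this: the literal 32 has type int, so sizeof(32) evaluates to sizeof(int), not 32 and not the size of a 32-bit dword. On typical Linux ABIs int is 32 bits wide, so the two values happen to agree, but the expression does not say what was meant. The variable names below are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* sizeof applied to an int literal: the size of int, not the number 32. */
static const size_t typo_size = sizeof(32);

/* What was meant: the size of one 32-bit dword. */
static const size_t dword_size = sizeof(uint32_t);
```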

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/intel-xe/0d5bcbf1-79f9-4a10-a221-ddbaec9f6122@stanley.mountain/
Fixes: 696bfdf273ea ("drm/xe/guc: Introduce the GuC Buffer Cache")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250121094832.588-1-michal.wajdeczko@intel.com
4 months agodrm/xe/pf: Fix migration initialization
Michal Wajdeczko [Mon, 20 Jan 2025 23:24:43 +0000 (00:24 +0100)]
drm/xe/pf: Fix migration initialization

The migration support only needs to be initialized once, but it
was incorrectly called from the xe_gt_sriov_pf_init_hw(), which
is part of the reset flow and may be called multiple times.

Fixes: d86e3737c7ab ("drm/xe/pf: Add functions to save and restore VF GuC state")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250120232443.544-1-michal.wajdeczko@intel.com
4 months agodrm/xe/oa: Preserve oa_ctrl unused bits
Ashutosh Dixit [Fri, 17 Jan 2025 03:21:55 +0000 (19:21 -0800)]
drm/xe/oa: Preserve oa_ctrl unused bits

UMDs are interested in setting unused bits of the oa_ctrl register "out of
band" for certain experiments. To facilitate this, don't clobber previous
oa_ctrl unused bits, i.e. read-modify-write the values rather than simply write them.
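The read-modify-write pattern in a minimal sketch: only the bits covered by the mask are replaced, and all other bits, including ones userspace set out of band, survive. `sketch_rmw()` and the example register values are hypothetical; the driver uses its own MMIO helpers.

```c
#include <stdint.h>

/* Replace only the bits in mask; keep every other bit of the old value. */
static uint32_t sketch_rmw(uint32_t old, uint32_t mask, uint32_t val)
{
	return (old & ~mask) | (val & mask);
}
```

With an old value of 0xF0000000 (upper bits set out of band), a mask of 0x0000FFFF and a new value of 0x1234, the result is 0xF0001234, whereas a plain write would leave 0x00001234.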

Fixes: e936f885f1e9 ("drm/xe/oa/uapi: Expose OA stream fd")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250117032155.3048063-1-ashutosh.dixit@intel.com
4 months agodrm/xe: Move suballocator init to after display init
Maarten Lankhorst [Tue, 10 Dec 2024 08:31:03 +0000 (09:31 +0100)]
drm/xe: Move suballocator init to after display init

No allocations should be done before we have had a chance to preserve
the display fb.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241210083111.230484-4-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
4 months agodrm/xe/uapi: Fix documentation indentation
Rodrigo Vivi [Fri, 17 Jan 2025 19:38:27 +0000 (14:38 -0500)]
drm/xe/uapi: Fix documentation indentation

Fix these issues:

Documentation/gpu/driver-uapi:29: include/uapi/drm/xe_drm.h:817: WARNING:
Bullet list ends without a blank line; unexpected unindent.
Documentation/gpu/driver-uapi:29: include/uapi/drm/xe_drm.h:835: WARNING:
Definition list ends without a blank line; unexpected unindent.

Fixes: 75d37750a753 ("drm/xe/mmap: Add mmap support for PCI memory barrier")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/intel-xe/20250117164023.3fdc00b9@canb.auug.org.au/
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250117193827.91779-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe: Do not attempt to bootstrap VF in execlists mode
Maarten Lankhorst [Tue, 10 Dec 2024 08:31:11 +0000 (09:31 +0100)]
drm/xe: Do not attempt to bootstrap VF in execlists mode

It was mentioned in a review that there is a possibility of choosing
to load the module with VF in execlists mode.

Of course this doesn't work, so just bomb out as hard as possible.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241210083111.230484-12-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
4 months agodrm/client: Handle tiled displays better
Maarten Lankhorst [Thu, 16 Jan 2025 14:28:25 +0000 (15:28 +0100)]
drm/client: Handle tiled displays better

When testing on my tiled display, initially the tiled display is
detected correctly:
[90376.523692] xe 0000:67:00.0: [drm:drm_client_firmware_config.isra.0 [drm]] fallback: Not all outputs enabled
[90376.523713] xe 0000:67:00.0: [drm:drm_client_firmware_config.isra.0 [drm]] Enabled: 0, detected: 2
...
[90376.523967] xe 0000:67:00.0: [drm:drm_client_modeset_probe [drm]] [CRTC:82:pipe A] desired mode 1920x2160 set (1920,0)
[90376.524020] xe 0000:67:00.0: [drm:drm_client_modeset_probe [drm]] [CRTC:134:pipe B] desired mode 1920x2160 set (0,0)

But then, when modes have been set:
[90379.729525] xe 0000:67:00.0: [drm:drm_client_firmware_config.isra.0 [drm]] [CONNECTOR:287:DP-4] on [CRTC:82:pipe A]: 1920x2160
[90379.729640] xe 0000:67:00.0: [drm:drm_client_firmware_config.isra.0 [drm]] [CONNECTOR:289:DP-5] on [CRTC:134:pipe B]: 1920x2160
...
[90379.730036] xe 0000:67:00.0: [drm:drm_client_modeset_probe [drm]] [CRTC:82:pipe A] desired mode 1920x2160 set (0,0)
[90379.730124] xe 0000:67:00.0: [drm:drm_client_modeset_probe [drm]] [CRTC:134:pipe B] desired mode 1920x2160 set (0,0)

Call drm_client_get_tile_offsets() in drm_client_firmware_config() as
well, to ensure that the offset is set correctly.

This has to be done as a separate pass, as the tile order may not be
equal to the drm connector order.
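A minimal sketch of a separate offset pass, assuming a hypothetical `struct sketch_tile` that carries each connector's horizontal tile index and mode width. It is not drm_client_get_tile_offsets(); it only shows why the offset must come from tile positions rather than from array (connector) order. The example lists the right-hand tile first and still assigns it x = 1920, like the first log above.

```c
#include <stddef.h>

/* Hypothetical description of one connector in a tiled display. */
struct sketch_tile {
	int h_loc; /* horizontal tile index reported by the monitor */
	int width; /* mode width of this tile */
	int x;     /* computed x offset into the framebuffer */
};

/* x offset = sum of the widths of all tiles to the left, in any array order. */
static void sketch_tile_offsets(struct sketch_tile *t, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		t[i].x = 0;
		for (j = 0; j < n; j++)
			if (t[j].h_loc < t[i].h_loc)
				t[i].x += t[j].width;
	}
}
```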

Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250116142825.3933-2-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Cc: <stable@vger.kernel.org>
4 months agodrm/modeset: Handle tiled displays in pan_display_atomic.
Maarten Lankhorst [Thu, 16 Jan 2025 14:28:24 +0000 (15:28 +0100)]
drm/modeset: Handle tiled displays in pan_display_atomic.

Tiled displays have a different x/y offset to begin with. Instead of
attempting to remember this, just apply a delta instead.

This fixes the first tile being duplicated on other tiles when vt
switching.
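A minimal sketch of applying a pan as a delta, assuming a hypothetical `struct sketch_modeset` holding each CRTC's scanout origin. Adding the change in pan offset keeps every tile's own base offset, while assigning the absolute pan offset to every CRTC would move all tiles to the same origin and show the first tile everywhere.

```c
/* Hypothetical per-CRTC scanout origin within the shared framebuffer. */
struct sketch_modeset { int x, y; };

/* Shift every CRTC by the change in pan offset, preserving tile bases. */
static void sketch_pan(struct sketch_modeset *sets, int n,
		       int old_xoff, int old_yoff, int new_xoff, int new_yoff)
{
	int i, dx = new_xoff - old_xoff, dy = new_yoff - old_yoff;

	for (i = 0; i < n; i++) {
		sets[i].x += dx;
		sets[i].y += dy;
	}
}
```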

Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250116142825.3933-1-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Cc: <stable@vger.kernel.org>
4 months agoLinux 6.13 v6.13
Linus Torvalds [Sun, 19 Jan 2025 23:51:45 +0000 (15:51 -0800)]
Linux 6.13