linux-2.6-block.git
4 months agodrm/xe: skip error capture when exec queue is killed
Tejas Upadhyay [Tue, 30 Apr 2024 13:12:29 +0000 (18:42 +0530)]
drm/xe: skip error capture when exec queue is killed

When user closes exec queue soon after job submission,
we are generating error coredump. Instead check if
exec queue is killed during job timeout then skip
error coredump capture.

V2:
  - Just skip error capture - MattB

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240430131229.2228809-1-tejas.upadhyay@intel.com
4 months agodrm/xe/vm_doc: Fix some typos
Francois Dugast [Mon, 6 May 2024 20:29:50 +0000 (22:29 +0200)]
drm/xe/vm_doc: Fix some typos

Fix some typos and add / remove / change a few words to improve
readability and prevent some ambiguities.

Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240506202950.109750-1-francois.dugast@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe: Fix xe_mocs.h
Michal Wajdeczko [Mon, 6 May 2024 20:52:54 +0000 (22:52 +0200)]
drm/xe: Fix xe_mocs.h

We don't need to include <linux/types.h>.
We don't use struct xe_exec_queue here.
We should sort forward declarations.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240506205254.2659-1-michal.wajdeczko@intel.com
4 months agodrm/xe: Use ordered WQ for G2H handler
Matthew Brost [Mon, 6 May 2024 03:47:58 +0000 (20:47 -0700)]
drm/xe: Use ordered WQ for G2H handler

System work queues are shared, use a dedicated work queue for G2H
processing to avoid G2H processing getting block behind system tasks.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240506034758.3697397-1-matthew.brost@intel.com
4 months agodrm/xe/mocs: Add debugfs node to dump mocs
Janga Rahul Kumar [Fri, 3 May 2024 19:39:02 +0000 (01:09 +0530)]
drm/xe/mocs: Add debugfs node to dump mocs

This is useful to check mocs configuration. Tests/Tools can use
this debugfs entry to get mocs info.

v2: Address review comments. Change debugfs output style similar
to pat debugfs. (Lucas De Marchi)

v3: rebase.

v4: Address review comments. Use function pointer inside ops
struct. Update Test-with links. Remove usage of flags wherever
not required. (Lucas De Marchi)

v5: Address review comments. Move register defines. Modify mocs
info struct to avoid holes. (Luca De Marchi)

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Janga Rahul Kumar <janga.rahul.kumar@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240503193902.2056202-3-janga.rahul.kumar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe: Relocate regs_are_mcr function
Janga Rahul Kumar [Fri, 3 May 2024 19:39:01 +0000 (01:09 +0530)]
drm/xe: Relocate regs_are_mcr function

Relocate regs_are_mcr funciton to a higher position in the file
for improved visibility.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Janga Rahul Kumar <janga.rahul.kumar@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240503193902.2056202-2-janga.rahul.kumar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe: Refactor default device atomic settings
Nirmoy Das [Tue, 30 Apr 2024 16:25:29 +0000 (18:25 +0200)]
drm/xe: Refactor default device atomic settings

The default behavior of device atomics depends on the
VM type and buffer allocation types. Device atomics are
expected to function with all types of allocations for
traditional applications/APIs. Additionally, in compute/SVM
API scenarios with fault mode or LR mode VMs, device atomics
must work with single-region allocations. In all other cases
device atomics should be disabled by default also on platforms
where we know device atomics doesn't on work on particular
allocations types.

v3: fault mode requires LR mode so only check for LR mode
    to determine compute API(Jose).
    Handle SMEM+LMEM BO's migration to LMEM where device
    atomics is expected to work. (Brian).
v2: Fix platform checks to correct atomics behaviour on PVC.

Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240430162529.21588-6-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
4 months agodrm/xe: Add function to check if BO has single placement
Nirmoy Das [Tue, 30 Apr 2024 16:25:28 +0000 (18:25 +0200)]
drm/xe: Add function to check if BO has single placement

A new helper function xe_bo_has_single_placement() to check
if a BO has single placement.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240430162529.21588-5-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
4 months agodrm/xe: Introduce has_device_atomics_on_smem device info
Nirmoy Das [Tue, 30 Apr 2024 16:25:27 +0000 (18:25 +0200)]
drm/xe: Introduce has_device_atomics_on_smem device info

Add has_device_atomics_on_smem to specify that a device
supports device atomics on system memory. Currently XE2
supports this so set this for XE2.

v2: Set has_device_atomics_on_smem for all platform but
    PVC.

Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240430162529.21588-4-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
4 months agodrm/xe: Move vm bind bo validation to a helper function
Nirmoy Das [Tue, 30 Apr 2024 16:25:26 +0000 (18:25 +0200)]
drm/xe: Move vm bind bo validation to a helper function

Move vm bind bo validation to a helper function to make the
xe_vm_bind_ioctl() more readable.

v2: Capture ret value of xe_vm_bind_ioctl_validate_bo(Matt B).
    Remove redundant coh_mode param.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240430162529.21588-3-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
4 months agodrm/xe: Introduce has_atomic_enable_pte_bit device info
Nirmoy Das [Tue, 30 Apr 2024 16:25:25 +0000 (18:25 +0200)]
drm/xe: Introduce has_atomic_enable_pte_bit device info

Add has_atomic_enable_pte_bit to specify that a device
has PTE_AE bit in its PTE feild. Currently XE2 and PVC
supports this so set this for those two.

This will help consolidate setting atomic access bit in PTE
logic which is spread between multiple files.

Reviewed-by: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240430162529.21588-2-nirmoy.das@intel.com
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
4 months agodrm/xe: Demote CCS_MODE info to debug only
Rodrigo Vivi [Fri, 3 May 2024 19:03:31 +0000 (15:03 -0400)]
drm/xe: Demote CCS_MODE info to debug only

This information is printed in any gt_reset, which actually
occurs in any runtime resume, what can be so verbose in
production build. Let's demote it to debug only.

Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240503190331.6690-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe/debugfs: Get a runtime_pm reference when setting wedged mode
Francois Dugast [Fri, 3 May 2024 08:24:50 +0000 (10:24 +0200)]
drm/xe/debugfs: Get a runtime_pm reference when setting wedged mode

This function is another entry point where it must be ensured that
the device resumes before operating on the GuC, so grab a runtime_pm
reference. This fixes inner xe_pm_runtime_get_noresume calls which
were previously failing.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240503082450.268335-1-francois.dugast@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 months agodrm/xe/rtp: Prefer helper macros from xe_args.h
Michal Wajdeczko [Thu, 2 May 2024 22:33:13 +0000 (00:33 +0200)]
drm/xe/rtp: Prefer helper macros from xe_args.h

Some custom implementation can be replaced with generic macros
from the linux/args.h or xe_args.h.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502223313.2527-4-michal.wajdeczko@intel.com
4 months agodrm/xe/kunit: Add simple tests for new xe_args macros
Michal Wajdeczko [Thu, 2 May 2024 22:33:12 +0000 (00:33 +0200)]
drm/xe/kunit: Add simple tests for new xe_args macros

We want to make sure that helper macros are working as expected.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502223313.2527-3-michal.wajdeczko@intel.com
4 months agodrm/xe: Add helpers for manipulating macro arguments
Michal Wajdeczko [Thu, 2 May 2024 22:33:11 +0000 (00:33 +0200)]
drm/xe: Add helpers for manipulating macro arguments

Define generic helpers that will replace private definitions used
by the RTP code and will allow reuse by the new code.

Put them in new xe_args.h file (instead of infamous xe_macros.h)
as once we find more potential users outside of the Xe driver we
may want to move all of these macros as-is to linux/args.h.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502223313.2527-2-michal.wajdeczko@intel.com
4 months agodrm/xe: Perform dma_map when moving system buffer objects to TT
Thomas Hellström [Thu, 2 May 2024 18:32:51 +0000 (20:32 +0200)]
drm/xe: Perform dma_map when moving system buffer objects to TT

Currently we dma_map on ttm_tt population and dma_unmap when
the pages are released in ttm_tt unpopulate.

Strictly, the dma_map is not needed until the bo is moved to the
XE_PL_TT placement, so perform the dma_mapping on such moves
instead, and remove the dma_mappig when moving to XE_PL_SYSTEM.

This is desired for the upcoming shrinker series where shrinking
of a ttm_tt might fail. That would lead to an odd construct where
we first dma_unmap, then shrink and if shrinking fails dma_map
again. If dma_mapping instead is performed on move like this,
shrinking does not need to care at all about dma mapping.

Finally, where a ttm_tt is destroyed while bound to a different
memory type than XE_PL_SYSTEM, we keep the dma_unmap in
unpopulate().

v2:
- Don't accidently unmap the dma-buf's sgtable.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502183251.10170-1-thomas.hellstrom@linux.intel.com
4 months agodrm/xe/gt: Fix assert in L3 bank mask generation
Francois Dugast [Thu, 2 May 2024 12:43:10 +0000 (14:43 +0200)]
drm/xe/gt: Fix assert in L3 bank mask generation

What needs to be asserted is that the pattern fits in the number
of bits provided by the user in patternbits, otherwise it would
be truncated when replicated according to the mask, which is
likely not the intended use of this function.
The pattern argument is a bitmap so use find_last_bit() instead
of fls(). The bit position starts at index 0 so remove "or equal"
from the comparison. XE_MAX_L3_BANK_MASK_BITS would be the
returned value if the pattern is 0, which can be the case on some
platforms.

v2: Check the result does not overflow the array (Lucas De Marchi)

v3: Use __fls() for long and handle mask == 0  (Lucas De Marchi)

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240502124311.159695-1-francois.dugast@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agoMerge drm/drm-next into drm-xe-next
Thomas Hellström [Thu, 2 May 2024 07:46:12 +0000 (09:46 +0200)]
Merge drm/drm-next into drm-xe-next

Avoid falling too far behind drm-next.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
4 months agodrm/xe/gsc: define GSCCS for LNL
Daniele Ceraolo Spurio [Fri, 19 Apr 2024 18:34:12 +0000 (11:34 -0700)]
drm/xe/gsc: define GSCCS for LNL

LNL has 1 GSCCS, same as MTL. Note that the GSCCS will be disabled until
we have a GSC FW defined, but having it in the list of engine is a
requirement to add such definition.

v2: rebase

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419183412.1605782-2-daniele.ceraolospurio@intel.com
4 months agodrm/xe/gsc: Turn off GSCCS interrupts when disabling the engine
Daniele Ceraolo Spurio [Fri, 19 Apr 2024 18:34:11 +0000 (11:34 -0700)]
drm/xe/gsc: Turn off GSCCS interrupts when disabling the engine

Starting on LNL, there is a new GSCCS interrupt that is triggered when
the GSC engine reset fails. If the HW is in a bad state, this interrupt
might end up being triggered even if we're not using the engine, which
will lead to a warning because we'll see it as unexpected. Since there
is no point in handling the interrupt in this scenario, we can just
make sure the interrupts are off when we disable the engine.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240419183412.1605782-1-daniele.ceraolospurio@intel.com
4 months agodrm/xe: Remove uninitialized end var from xe_gt_tlb_invalidation_range()
Nirmoy Das [Mon, 29 Apr 2024 20:30:39 +0000 (22:30 +0200)]
drm/xe: Remove uninitialized end var from xe_gt_tlb_invalidation_range()

This fixes commit c4f18703629d ("drm/xe: Add
xe_gt_tlb_invalidation_range and convert PT layer to use this")
which added the end variable as part of the function param.

v2: Add fixes tag(Matt)

Fixes: c4f18703629d ("drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240429203039.26918-1-nirmoy.das@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agoMerge tag 'amd-drm-next-6.10-2024-04-26' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Tue, 30 Apr 2024 04:42:54 +0000 (14:42 +1000)]
Merge tag 'amd-drm-next-6.10-2024-04-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.10-2024-04-26:

amdgpu:
- Misc code cleanups and refactors
- Support setting reset method at runtime
- Report OD status
- SMU 14.0.1 fixes
- SDMA 4.4.2 fixes
- VPE fixes
- MES fixes
- Update BO eviction priorities
- UMSCH fixes
- Reset fixes
- Freesync fixes
- GFXIP 9.4.3 fixes
- SDMA 5.2 fixes
- MES UAF fix
- RAS updates
- Devcoredump updates for dumping IP state
- DSC fixes
- JPEG fix
- Fix VRAM memory accounting
- VCN 5.0 fixes
- MES fixes
- UMC 12.0 updates
- Modify contiguous flags handling
- Initial support for mapping kernel queues via MES

amdkfd:
- Fix rescheduling of restore worker
- VRAM accounting for SVM migrations
- mGPU fix
- Enable SQ watchpoint for gfx10

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240426221245.1613332-1-alexander.deucher@amd.com
4 months agoMerge tag 'drm-intel-gt-next-2024-04-26' of https://anongit.freedesktop.org/git/drm...
Dave Airlie [Tue, 30 Apr 2024 04:20:31 +0000 (14:20 +1000)]
Merge tag 'drm-intel-gt-next-2024-04-26' of https://anongit.freedesktop.org/git/drm/drm-intel into drm-next

UAPI Changes:

- drm/i915/guc: Use context hints for GT frequency

    Allow user to provide a low latency context hint. When set, KMD
    sends a hint to GuC which results in special handling for this
    context. SLPC will ramp the GT frequency aggressively every time
    it switches to this context. The down freq threshold will also be
    lower so GuC will ramp down the GT freq for this context more slowly.
    We also disable waitboost for this context as that will interfere with
    the strategy.

    We need to enable the use of SLPC Compute strategy during init, but
    it will apply only to contexts that set this bit during context
    creation.

    Userland can check whether this feature is supported using a new param-
    I915_PARAM_HAS_CONTEXT_FREQ_HINT. This flag is true for all guc submission
    enabled platforms as they use SLPC for frequency management.

    The Mesa usage model for this flag is here -
    https://gitlab.freedesktop.org/sushmave/mesa/-/commits/compute_hint

- drm/i915/gt: Enable only one CCS for compute workload

    Enable only one CCS engine by default with all the compute sices
    allocated to it.

    While generating the list of UABI engines to be exposed to the
    user, exclude any additional CCS engines beyond the first
    instance

    ***

    NOTE: This W/A will make all DG2 SKUs appear like single CCS SKUs by
    default to mitigate a hardware bug. All the EUs will still remain
    usable, and all the userspace drivers have been confirmed to be able
    to dynamically detect the change in number of CCS engines and adjust.

    For the smaller percent of applications that get perf benefit from
    letting the userspace driver dispatch across all 4 CCS engines we will
    be introducing a sysfs control as a later patch to choose 4 CCS each
    with 25% EUs (or 50% if 2 CCS).

    NOTE: A regression has been reported at

    https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10895

    However Andi has been triaging the issue and we're closing in a fix
    to the gap in the W/A implementation:

    https://lists.freedesktop.org/archives/intel-gfx/2024-April/348747.html

Driver Changes:

- Add new and fix to existing workarounds: Wa_14018575942 (MTL),
  Wa_16019325821 (Gen12.70), Wa_14019159160 (MTL), Wa_16015675438,
  Wa_14020495402 (Gen12.70) (Tejas, John, Lucas)
- Fix UAF on destroy against retire race and remove two earlier
  partial fixes (Janusz)
- Limit the reserved VM space to only the platforms that need it (Andi)
- Reset queue_priority_hint on parking for execlist platforms (Chris)
- Fix gt reset with GuC submission is disabled (Nirmoy)
- Correct capture of EIR register on hang (John)

- Remove usage of the deprecated ida_simple_xx() API
- Refactor confusing __intel_gt_reset() (Nirmoy)
- Fix the fix for GuC reset lock confusion (John)
- Simplify/extend platform check for Wa_14018913170 (John)
- Replace dev_priv with i915 (Andi)
- Add and use gt_to_guc() wrapper (Andi)
- Remove bogus null check (Rodrigo, Dan)

. Selftest improvements (Janusz, Nirmoy, Daniele)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZitVBTvZmityDi7D@jlahtine-mobl.ger.corp.intel.com
4 months agoMerge v6.9-rc6 into drm-next
Daniel Vetter [Mon, 29 Apr 2024 18:22:39 +0000 (20:22 +0200)]
Merge v6.9-rc6 into drm-next

Thomas needs the defio fixes, Maíra needs the vkms fixes and Joonas
has some fun with i915-gem conflicts.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
4 months agodrm/xe: Merge 16021540221 and 18034896535 WAs
Lucas De Marchi [Sat, 27 Apr 2024 13:53:39 +0000 (06:53 -0700)]
drm/xe: Merge 16021540221 and 18034896535 WAs

In order to detect duplicate implementations for the same workaround,
early in the implementation of RTP it was decided to error out even if
the values set are exactly the same. With the introduction of 18034896535
in commit 74671d23ca18 ("drm/xe/xe2: Add workaround 18034896535"), LNL
stepping with graphics stepping A1 now gives the following error on
module load:

xe 0000:00:02.0: [drm] *ERROR* GT0: [GT OTHER] \
discarding save-restore reg e48c (clear: 00000200, set: 00000200,\
masked: yes, mcr: yes): ret=-22

RTP may be improved in the future, but for now simply join the entries
like done with e.g. "160729762716070303171607186500".

Fixes: 74671d23ca18 ("drm/xe/xe2: Add workaround 18034896535")
Cc: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240427135339.3485559-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agodrm/xe/xe2hpg: Add Wa_14021490052
Shekhar Chauhan [Wed, 24 Apr 2024 03:42:47 +0000 (09:12 +0530)]
drm/xe/xe2hpg: Add Wa_14021490052

Add Wa_14021490052 for Xe2HPG 20.01.

Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240424034247.1352755-1-shekhar.chauhan@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
4 months agoLinux 6.9-rc6 v6.9-rc6
Linus Torvalds [Sun, 28 Apr 2024 20:47:24 +0000 (13:47 -0700)]
Linux 6.9-rc6

4 months agoMerge tag 'sched-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 28 Apr 2024 19:11:26 +0000 (12:11 -0700)]
Merge tag 'sched-urgent-2024-04-28' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:

 - Fix EEVDF corner cases

 - Fix two nohz_full= related bugs that can cause boot crashes
   and warnings

* tag 'sched-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
  sched/isolation: Prevent boot crash when the boot CPU is nohz_full
  sched/eevdf: Prevent vlag from going out of bounds in reweight_eevdf()
  sched/eevdf: Fix miscalculation in reweight_entity() when se is not curr
  sched/eevdf: Always update V if se->on_rq when reweighting

4 months agoMerge tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 28 Apr 2024 18:58:16 +0000 (11:58 -0700)]
Merge tag 'x86-urgent-2024-04-28' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

 - Make the CPU_MITIGATIONS=n interaction with conflicting
   mitigation-enabling boot parameters a bit saner.

 - Re-enable CPU mitigations by default on non-x86

 - Fix TDX shared bit propagation on mprotect()

 - Fix potential show_regs() system hang when PKE initialization
   is not fully finished yet.

 - Add the 0x10-0x1f model IDs to the Zen5 range

 - Harden #VC instruction emulation some more

* tag 'x86-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: Ignore "mitigations" kernel parameter if CPU_MITIGATIONS=n
  cpu: Re-enable CPU mitigations by default for !X86 architectures
  x86/tdx: Preserve shared bit on mprotect()
  x86/cpu: Fix check for RDPKRU in __show_regs()
  x86/CPU/AMD: Add models 0x10-0x1f to the Zen5 range
  x86/sev: Check for MWAITX and MONITORX opcodes in the #VC handler

4 months agoMerge tag 'irq-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 28 Apr 2024 18:51:13 +0000 (11:51 -0700)]
Merge tag 'irq-urgent-2024-04-28' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
 "Fix a double free bug in the init error path of the GICv3 irqchip
  driver"

* tag 'irq-urgent-2024-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3-its: Prevent double free on error

4 months agosched/isolation: Fix boot crash when maxcpus < first housekeeping CPU
Oleg Nesterov [Sat, 13 Apr 2024 14:17:46 +0000 (16:17 +0200)]
sched/isolation: Fix boot crash when maxcpus < first housekeeping CPU

housekeeping_setup() checks cpumask_intersects(present, online) to ensure
that the kernel will have at least one housekeeping CPU after smp_init(),
but this doesn't work if the maxcpus= kernel parameter limits the number of
processors available after bootup.

For example, a kernel with "maxcpus=2 nohz_full=0-2" parameters crashes at
boot time on a virtual machine with 4 CPUs.

Change housekeeping_setup() to use cpumask_first_and() and check that the
returned CPU number is valid and less than setup_max_cpus.

Another corner case is "nohz_full=0" on a machine with a single CPU or with
the maxcpus=1 kernel argument. In this case non_housekeeping_mask is empty
and tick_nohz_full_setup() makes no sense. And indeed, the kernel hits the
WARN_ON(tick_nohz_full_running) in tick_sched_do_timer().

And how should the kernel interpret the "nohz_full=" parameter? It should
be silently ignored, but currently cpulist_parse() happily returns the
empty cpumask and this leads to the same problem.

Change housekeeping_setup() to check cpumask_empty(non_housekeeping_mask)
and do nothing in this case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240413141746.GA10008@redhat.com
4 months agosched/isolation: Prevent boot crash when the boot CPU is nohz_full
Oleg Nesterov [Thu, 11 Apr 2024 14:39:05 +0000 (16:39 +0200)]
sched/isolation: Prevent boot crash when the boot CPU is nohz_full

Documentation/timers/no_hz.rst states that the "nohz_full=" mask must not
include the boot CPU, which is no longer true after:

  08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full").

However after:

  aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")

the kernel will crash at boot time in this case; housekeeping_any_cpu()
returns an invalid CPU number until smp_init() brings the first
housekeeping CPU up.

Change housekeeping_any_cpu() to check the result of cpumask_any_and() and
return smp_processor_id() in this case.

This is just the simple and backportable workaround which fixes the
symptom, but smp_processor_id() at boot time should be safe at least for
type == HK_TYPE_TIMER, this more or less matches the tick_do_timer_boot_cpu
logic.

There is no worry about cpu_down(); tick_nohz_cpu_down() will not allow to
offline tick_do_timer_cpu (the 1st online housekeeping CPU).

Fixes: aae17ebb53cd ("workqueue: Avoid using isolated cpus' timers on queue_delayed_work")
Reported-by: Chris von Recklinghausen <crecklin@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20240411143905.GA19288@redhat.com
Closes: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/

4 months agoMerge tag 'rust-fixes-6.9' of https://github.com/Rust-for-Linux/linux
Linus Torvalds [Sat, 27 Apr 2024 19:11:55 +0000 (12:11 -0700)]
Merge tag 'rust-fixes-6.9' of https://github.com/Rust-for-Linux/linux

Pull Rust fixes from Miguel Ojeda:

 - Soundness: make internal functions generated by the 'module!' macro
   inaccessible, do not implement 'Zeroable' for 'Infallible' and
   require 'Send' for the 'Module' trait.

 - Build: avoid errors with "empty" files and workaround 'rustdoc' ICE.

 - Kconfig: depend on '!CFI_CLANG' and avoid selecting 'CONSTRUCTORS'.

 - Code docs: remove non-existing key from 'module!' macro example.

 - Docs: trivial rendering fix in arch table.

* tag 'rust-fixes-6.9' of https://github.com/Rust-for-Linux/linux:
  rust: remove `params` from `module` macro example
  kbuild: rust: force `alloc` extern to allow "empty" Rust files
  kbuild: rust: remove unneeded `@rustc_cfg` to avoid ICE
  rust: kernel: require `Send` for `Module` implementations
  rust: phy: implement `Send` for `Registration`
  rust: make mutually exclusive with CFI_CLANG
  rust: macros: fix soundness issue in `module!` macro
  rust: init: remove impl Zeroable for Infallible
  docs: rust: fix improper rendering in Arch Support page
  rust: don't select CONSTRUCTORS

4 months agoMerge tag 'riscv-for-linus-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 27 Apr 2024 19:02:55 +0000 (12:02 -0700)]
Merge tag 'riscv-for-linus-6.9-rc6' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix for TASK_SIZE on rv64/NOMMU, to reflect the lack of user/kernel
   separation

 - A fix to avoid loading rv64/NOMMU kernel past the start of RAM

 - A fix for RISCV_HWPROBE_EXT_ZVFHMIN on ilp32 to avoid signed integer
   overflow in the bitmask

 - The sud_test kselftest has been fixed to properly swizzle the syscall
   number into the return register, which are not the same on RISC-V

 - A fix for a build warning in the perf tools on rv32

 - A fix for the CBO selftests, to avoid non-constants leaking into the
   inline asm

 - A pair of fixes for T-Head PBMT errata probing, which has been
   renamed MAE by the vendor

* tag 'riscv-for-linus-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: selftests: cbo: Ensure asm operands match constraints, take 2
  perf riscv: Fix the warning due to the incompatible type
  riscv: T-Head: Test availability bit before enabling MAE errata
  riscv: thead: Rename T-Head PBMT to MAE
  selftests: sud_test: return correct emulated syscall value on RISC-V
  riscv: hwprobe: fix invalid sign extension for RISCV_HWPROBE_EXT_ZVFHMIN
  riscv: Fix loading 64-bit NOMMU kernels past the start of RAM
  riscv: Fix TASK_SIZE on 64-bit NOMMU

4 months agoMerge tag '6.9-rc5-cifs-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 27 Apr 2024 18:35:40 +0000 (11:35 -0700)]
Merge tag '6.9-rc5-cifs-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Three smb3 client fixes, all also for stable:

   - two small locking fixes spotted by Coverity

   - FILE_ALL_INFO and network_open_info packing fix"

* tag '6.9-rc5-cifs-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix lock ordering potential deadlock in cifs_sync_mid_result
  smb3: missing lock when picking channel
  smb: client: Fix struct_group() usage in __packed structs

4 months agoMerge tag 'i2c-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 27 Apr 2024 18:24:53 +0000 (11:24 -0700)]
Merge tag 'i2c-for-6.9-rc6' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "Fix a race condition in the at24 eeprom handler, a NULL pointer
  exception in the I2C core for controllers only using target modes,
  drop a MAINTAINERS entry, and fix an incorrect DT binding for at24"

* tag 'i2c-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: smbus: fix NULL function pointer dereference
  MAINTAINERS: Drop entry for PCA9541 bus master selector
  eeprom: at24: fix memory corruption race condition
  dt-bindings: eeprom: at24: Fix ST M24C64-D compatible schema

4 months agoprofiling: Remove create_prof_cpu_mask().
Tetsuo Handa [Sat, 27 Apr 2024 06:27:58 +0000 (15:27 +0900)]
profiling: Remove create_prof_cpu_mask().

create_prof_cpu_mask() is no longer used after commit 1f44a225777e ("s390:
convert interrupt handling to use generic hardirq").

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoMerge tag 'soundwire-6.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 27 Apr 2024 18:14:32 +0000 (11:14 -0700)]
Merge tag 'soundwire-6.9-fixes' of git://git./linux/kernel/git/vkoul/soundwire

Pull soundwire fix from Vinod Koul:

 - Single AMD driver fix for wake interrupt handling in clockstop mode

* tag 'soundwire-6.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
  soundwire: amd: fix for wake interrupt handling for clockstop mode

4 months agoMerge tag 'dmaengine-fix-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul...
Linus Torvalds [Sat, 27 Apr 2024 18:07:35 +0000 (11:07 -0700)]
Merge tag 'dmaengine-fix-6.9' of git://git./linux/kernel/git/vkoul/dmaengine

Pull dmaengine fixes from Vinod Koul:

 - Revert pl330 issue_pending waits until WFP state due to regression
   reported in Bluetooth loading

 - Xilinx driver fixes for synchronization, buffer offsets, locking and
   kdoc

 - idxd fixes for spinlock and preventing the migration of the perf
   context to an invalid target

 - idma driver fix for interrupt handling when powered off

 - Tegra driver residual calculation fix

 - Owl driver register access fix

* tag 'dmaengine-fix-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
  dmaengine: idxd: Fix oops during rmmod on single-CPU platforms
  dmaengine: xilinx: xdma: Clarify kdoc in XDMA driver
  dmaengine: xilinx: xdma: Fix synchronization issue
  dmaengine: xilinx: xdma: Fix wrong offsets in the buffers addresses in dma descriptor
  dma: xilinx_dpdma: Fix locking
  dmaengine: idxd: Convert spinlock to mutex to lock evl workqueue
  idma64: Don't try to serve interrupts when device is powered off
  dmaengine: tegra186: Fix residual calculation
  dmaengine: owl: fix register access functions
  dmaengine: Revert "dmaengine: pl330: issue_pending waits until WFP state"

4 months agoMerge tag 'phy-fixes-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Linus Torvalds [Sat, 27 Apr 2024 18:01:12 +0000 (11:01 -0700)]
Merge tag 'phy-fixes-6.9' of git://git./linux/kernel/git/phy/linux-phy

Pull phy fixes from Vinod Koul:

 - static checker (array size, bounds) fix for marvel driver

 - Rockchip rk3588 pcie fixes for bifurcation and mux

 - Qualcomm qmp-compbo fix for VCO, register base and regulator name for
   m31 driver

 - charger det crash fix for ti driver

* tag 'phy-fixes-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: ti: tusb1210: Resolve charger-det crash if charger psy is unregistered
  phy: qcom: qmp-combo: fix VCO div offset on v5_5nm and v6
  phy: phy-rockchip-samsung-hdptx: Select CONFIG_RATIONAL
  phy: qcom: m31: match requested regulator name with dt schema
  phy: qcom: qmp-combo: Fix register base for QSERDES_DP_PHY_MODE
  phy: qcom: qmp-combo: Fix VCO div offset on v3
  phy: rockchip: naneng-combphy: Fix mux on rk3588
  phy: rockchip-snps-pcie3: fix clearing PHP_GRF_PCIESEL_CON bits
  phy: rockchip-snps-pcie3: fix bifurcation on rk3588
  phy: freescale: imx8m-pcie: fix pcie link-up instability
  phy: marvell: a3700-comphy: Fix hardcoded array size
  phy: marvell: a3700-comphy: Fix out of bounds read

4 months agoi2c: smbus: fix NULL function pointer dereference
Wolfram Sang [Fri, 26 Apr 2024 06:44:08 +0000 (08:44 +0200)]
i2c: smbus: fix NULL function pointer dereference

Baruch reported an OOPS when using the designware controller as target
only. Target-only modes break the assumption of one transfer function
always being available. Fix this by always checking the pointer in
__i2c_transfer.

Reported-by: Baruch Siach <baruch@tkos.co.il>
Closes: https://lore.kernel.org/r/4269631780e5ba789cf1ae391eec1b959def7d99.1712761976.git.baruch@tkos.co.il
Fixes: 4b1acc43331d ("i2c: core changes for slave support")
[wsa: dropped the simplification in core-smbus to avoid theoretical regressions]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Baruch Siach <baruch@tkos.co.il>
4 months agoMerge tag 'soc-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Fri, 26 Apr 2024 21:39:45 +0000 (14:39 -0700)]
Merge tag 'soc-fixes-6.9-2' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "There are a lot of minor DT fixes for Mediatek, Rockchip, Qualcomm and
  Microchip and NXP, addressing both build-time warnings and bugs found
  during runtime testing.

  Most of these changes are machine specific fixups, but there are a few
  notable regressions that affect an entire SoC:

   - The Qualcomm MSI support that was improved for 6.9 ended up being
     wrong on some chips and now gets fixed.

   - The i.MX8MP camera interface broke due to a typo and gets updated
     again.

  The main driver fix is also for Qualcomm platforms, rewriting an
  interface in the QSEECOM firmware support that could lead to crashing
  the kernel from a trusted application.

  The only other code changes are minor fixes for Mediatek SoC drivers"

* tag 'soc-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (50 commits)
  ARM: dts: imx6ull-tarragon: fix USB over-current polarity
  soc: mediatek: mtk-socinfo: depends on CONFIG_SOC_BUS
  soc: mediatek: mtk-svs: Append "-thermal" to thermal zone names
  arm64: dts: imx8mp: Fix assigned-clocks for second CSI2
  ARM: dts: microchip: at91-sama7g54_curiosity: Replace regulator-suspend-voltage with the valid property
  ARM: dts: microchip: at91-sama7g5ek: Replace regulator-suspend-voltage with the valid property
  arm64: dts: rockchip: Fix USB interface compatible string on kobol-helios64
  arm64: dts: qcom: sc8180x: Fix ss_phy_irq for secondary USB controller
  arm64: dts: qcom: sm8650: Fix the msi-map entries
  arm64: dts: qcom: sm8550: Fix the msi-map entries
  arm64: dts: qcom: sm8450: Fix the msi-map entries
  arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
  arm64: dts: qcom: x1e80100: Fix the compatible for cluster idle states
  arm64: dts: qcom: Fix type of "wdog" IRQs for remoteprocs
  arm64: dts: rockchip: regulator for sd needs to be always on for BPI-R2Pro
  dt-bindings: rockchip: grf: Add missing type to 'pcie-phy' node
  arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 2
  arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 1
  arm64: dts: rockchip: drop redundant pcie-reset-suspend in Scarlet Dumo
  arm64: dts: rockchip: mark system power controller and fix typo on orangepi-5-plus
  ...

4 months agodrm/amd/display: Add some HDCP registers DCN35 list
Rodrigo Siqueira [Wed, 10 Apr 2024 18:45:47 +0000 (12:45 -0600)]
drm/amd/display: Add some HDCP registers DCN35 list

Add some missing HDCP registers to be used in DCN35.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu/mes11: update ADD_QUEUE interface
Jack Xiao [Wed, 24 Apr 2024 08:41:04 +0000 (16:41 +0800)]
drm/amdgpu/mes11: update ADD_QUEUE interface

Update ADD_QUEUE interface for mes11 to support
mes mapping legacy queue.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: fix the warning about the expression (int)size - len
Jesse Zhang [Thu, 25 Apr 2024 07:16:40 +0000 (15:16 +0800)]
drm/amdgpu: fix the warning about the expression (int)size - len

Converting size from size_t to int may overflow.
v2: keep reverse xmas tree order (Christian)

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu/mes: add mes mapping legacy queue support
Jack Xiao [Fri, 1 Mar 2024 09:33:01 +0000 (17:33 +0800)]
drm/amdgpu/mes: add mes mapping legacy queue support

Add mes mapping legacy queue framework support.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: fix uninitialized scalar variable warning
Tim Huang [Tue, 23 Apr 2024 06:06:28 +0000 (14:06 +0800)]
drm/amdgpu: fix uninitialized scalar variable warning

Clear warning that uses uninitialized value fw_size.

Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Code style adjustments
Rodrigo Siqueira [Wed, 10 Apr 2024 18:54:40 +0000 (12:54 -0600)]
drm/amd/display: Code style adjustments

This commit address some small code style issues in DC.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Adjust registers sequence in the DIO list
Rodrigo Siqueira [Wed, 10 Apr 2024 17:20:44 +0000 (11:20 -0600)]
drm/amd/display: Adjust registers sequence in the DIO list

This commit reorganizes the order in which some control registers are
presented to make it easier to identify the operations based on the
hardware doc.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Clean up code in DC
Rodrigo Siqueira [Tue, 9 Apr 2024 20:34:59 +0000 (14:34 -0600)]
drm/amd/display: Clean up code in DC

This commit removes some unnecessary code and makes the required
adjustments to replace other parts of the code with a short option.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: skip ip dump if devcoredump flag is set
Sunil Khatri [Thu, 25 Apr 2024 10:15:56 +0000 (15:45 +0530)]
drm/amdgpu: skip ip dump if devcoredump flag is set

Do not dump the ip registers during driver reload
in passthrough environment.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: Modify the contiguous flags behaviour
Arunpravin Paneer Selvam [Thu, 25 Apr 2024 15:35:55 +0000 (21:05 +0530)]
drm/amdgpu: Modify the contiguous flags behaviour

Now we have two flags for contiguous VRAM buffer allocation.
If the application request for AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
it would set the ttm place TTM_PL_FLAG_CONTIGUOUS flag in the
buffer's placement function.

This patch will change the default behaviour of the two flags.

When we set AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS
- This means contiguous is not mandatory.
- we will try to allocate the contiguous buffer. Say if the
  allocation fails, we fallback to allocate the individual pages.

When we setTTM_PL_FLAG_CONTIGUOUS
- This means contiguous allocation is mandatory.
- we are setting this in amdgpu_bo_pin_restricted() before bo validation
  and check this flag in the vram manager file.
- if this is set, we should allocate the buffer pages contiguously.
  the allocation fails, we return -ENOSPC.

v2:
  - keep the mem_flags and bo->flags check as is(Christian)
  - place the TTM_PL_FLAG_CONTIGUOUS flag setting into the
    amdgpu_bo_pin_restricted function placement range iteration
    loop(Christian)
  - rename find_pages with amdgpu_vram_mgr_calculate_pages_per_block
    (Christian)
  - Keep the kernel BO allocation as is(Christain)
  - If BO pin vram allocation failed, we need to return -ENOSPC as
    RDMA cannot work with scattered VRAM pages(Philip)

v3(Christian):
  - keep contiguous flag handling outside of pages_per_block
    calculation
  - remove the hacky implementation in contiguous flag error
    handling code

v4(Christian):
  - use any variable and return value for non-contiguous
    fallback

v5: rebase to amd-staging-drm-next branch

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Fix uninitialized variables in DC
Alex Hung [Tue, 16 Apr 2024 04:16:08 +0000 (22:16 -0600)]
drm/amd/display: Fix uninitialized variables in DC

This fixes 29 UNINIT issues reported by Coverity.

Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Fix uninitialized variables in DC
Alex Hung [Tue, 16 Apr 2024 01:05:34 +0000 (19:05 -0600)]
drm/amd/display: Fix uninitialized variables in DC

This fixes 49 UNINIT issues reported by Coverity.

Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Fix uninitialized variables in DM
Alex Hung [Tue, 16 Apr 2024 01:02:56 +0000 (19:02 -0600)]
drm/amd/display: Fix uninitialized variables in DM

This fixes 11 UNINIT issues reported by Coverity.

Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Remove redundant include file
Alex Hung [Mon, 15 Apr 2024 19:47:37 +0000 (13:47 -0600)]
drm/amd/display: Remove redundant include file

This fixes 1 PW.INCLUDE_RECURSION reported by Coverity.

"./drivers/gpu/drm/amd/amdgpu/../display/dc/dc_types.h"
includes itself: dc_types.h -> dal_types.h -> dc_types.h

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: ASSERT when failing to find index by plane/stream id
Alex Hung [Mon, 22 Apr 2024 18:02:17 +0000 (12:02 -0600)]
drm/amd/display: ASSERT when failing to find index by plane/stream id

[WHY]
find_disp_cfg_idx_by_plane_id and find_disp_cfg_idx_by_stream_id returns
an array index and they return -1 when not found; however, -1 is not a
valid index number.

[HOW]
When this happens, call ASSERT(), and return a positive number (which is
fewer than callers' array size) instead.

This fixes 4 OVERRUN and 2 NEGATIVE_RETURNS issues reported by Coverity.

Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Do not return negative stream id for array
Alex Hung [Mon, 22 Apr 2024 19:43:14 +0000 (13:43 -0600)]
drm/amd/display: Do not return negative stream id for array

[WHY]
resource_stream_to_stream_idx returns an array index and it return -1
when not found; however, -1 is not a valid array index number.

[HOW]
When this happens, call ASSERT(), and return a zero instead.

This fixes an OVERRUN and an NEGATIVE_RETURNS issues reported by Coverity.

Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Fix overlapping copy within dml_core_mode_programming
Hersen Wu [Tue, 23 Apr 2024 14:57:37 +0000 (10:57 -0400)]
drm/amd/display: Fix overlapping copy within dml_core_mode_programming

[WHY]
&mode_lib->mp.Watermark and &locals->Watermark are
the same address. memcpy may lead to unexpected behavior.

[HOW]
memmove should be used.

Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Skip finding free audio for unknown engine_id
Alex Hung [Mon, 22 Apr 2024 19:52:27 +0000 (13:52 -0600)]
drm/amd/display: Skip finding free audio for unknown engine_id

[WHY]
ENGINE_ID_UNKNOWN = -1 and can not be used as an array index. Plus, it
also means it is uninitialized and does not need free audio.

[HOW]
Skip and return NULL.

This fixes 2 OVERRUN issues reported by Coverity.

Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Check pipe offset before setting vblank
Alex Hung [Tue, 23 Apr 2024 00:07:17 +0000 (18:07 -0600)]
drm/amd/display: Check pipe offset before setting vblank

pipe_ctx has a size of MAX_PIPES so checking its index before accessing
the array.

This fixes an OVERRUN issue reported by Coverity.

Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdkfd: Enable SQ watchpoint for gfx10
Lancelot SIX [Wed, 3 Apr 2024 09:21:24 +0000 (10:21 +0100)]
drm/amdkfd: Enable SQ watchpoint for gfx10

There are new control registers introduced in gfx10 used to configure
hardware watchpoints triggered by SMEM instructions:
SQ_WATCH{0,1,2,3}_{CNTL_ADDR_HI,ADDR_L}.

Those registers work in a similar way as the TCP_WATCH* registers
currently used for gfx9 and above.

This patch adds support to program the SQ_WATCH registers for gfx10.

The SQ_WATCH?_CNTL.MASK field has one bit more than
TCP_WATCH?_CNTL.MASK, so SQ watchpoints can have a finer granularity
than TCP_WATCH watchpoints.  In this patch, we keep the capabilities
advertised to the debugger unchanged
(HSA_DBG_WATCH_ADDR_MASK_*_BIT_GFX10) as this reflects what both TCP and
SQ watchpoints can do and both watchpoints are configured together.

Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Check index msg_id before read or write
Alex Hung [Thu, 18 Apr 2024 19:27:43 +0000 (13:27 -0600)]
drm/amd/display: Check index msg_id before read or write

[WHAT]
msg_id is used as an array index and it cannot be a negative value, and
therefore cannot be equal to MOD_HDCP_MESSAGE_ID_INVALID (-1).

[HOW]
Check whether msg_id is valid before reading and setting.

This fixes 4 OVERRUN issues reported by Coverity.

Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: Fix buffer size in gfx_v9_4_3_init_ cp_compute_microcode() and rlc_microc...
Srinivasan Shanmugam [Thu, 25 Apr 2024 05:52:32 +0000 (11:22 +0530)]
drm/amdgpu: Fix buffer size in gfx_v9_4_3_init_ cp_compute_microcode() and rlc_microcode()

The function gfx_v9_4_3_init_microcode in gfx_v9_4_3.c was generating
about potential truncation of output when using the snprintf function.
The issue was due to the size of the buffer 'ucode_prefix' being too
small to accommodate the maximum possible length of the string being
written into it.

The string being written is "amdgpu/%s_mec.bin" or "amdgpu/%s_rlc.bin",
where %s is replaced by the value of 'chip_name'. The length of this
string without the %s is 16 characters. The warning message indicated
that 'chip_name' could be up to 29 characters long, resulting in a total
of 45 characters, which exceeds the buffer size of 30 characters.

To resolve this issue, the size of the 'ucode_prefix' buffer has been
reduced from 30 to 15. This ensures that the maximum possible length of
the string being written into the buffer will not exceed its size, thus
preventing potential buffer overflow and truncation issues.

Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c: In function ‘gfx_v9_4_3_early_init’:
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
  379 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name);
      |                                                    ^~
......
  439 |         r = gfx_v9_4_3_init_rlc_microcode(adev, ucode_prefix);
      |                                                 ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:379:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30
  379 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:52: warning: ‘%s’ directive output may be truncated writing up to 29 bytes into a region of size 23 [-Wformat-truncation=]
  413 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name);
      |                                                    ^~
......
  443 |         r = gfx_v9_4_3_init_cp_compute_microcode(adev, ucode_prefix);
      |                                                        ~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:413:9: note: ‘snprintf’ output between 16 and 45 bytes into a destination of size 30
  413 |         snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec.bin", chip_name);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 86301129698b ("drm/amdgpu: split gc v9_4_3 functionality from gc v9_0")
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Add NULL pointer check for kzalloc
Hersen Wu [Mon, 22 Apr 2024 16:27:34 +0000 (12:27 -0400)]
drm/amd/display: Add NULL pointer check for kzalloc

[Why & How]
Check return pointer of kzalloc before using it.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Handle Y carry-over in VCP X.Y calculation
George Shen [Thu, 16 Sep 2021 23:55:39 +0000 (19:55 -0400)]
drm/amd/display: Handle Y carry-over in VCP X.Y calculation

Theoretically rare corner case where ceil(Y) results in rounding up to
an integer. If this happens, the 1 should be carried over to the X
value.

CC: stable@vger.kernel.org
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: Fix ras mode2 reset failure in ras aca mode
YiPeng Chai [Wed, 24 Apr 2024 02:21:30 +0000 (10:21 +0800)]
drm/amdgpu: Fix ras mode2 reset failure in ras aca mode

Fix ras mode2 reset failure in ras aca mode.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: fix double free err_addr pointer warnings
Bob Zhou [Tue, 23 Apr 2024 05:32:39 +0000 (13:32 +0800)]
drm/amdgpu: fix double free err_addr pointer warnings

In amdgpu_umc_bad_page_polling_timeout, the amdgpu_umc_handle_bad_pages
will be run many times so that double free err_addr in some special case.
So set the err_addr to NULL to avoid the warnings.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: initialize the last_jump_jiffies in atom_exec_context
Jesse Zhang [Thu, 25 Apr 2024 02:04:08 +0000 (10:04 +0800)]
drm/amdgpu: initialize the last_jump_jiffies in atom_exec_context

The parameter "last_jump_jiffies" should be initialized
before being used in the function atom_op_jump.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: add check before free wb entry
Jesse Zhang [Wed, 24 Apr 2024 04:59:22 +0000 (12:59 +0800)]
drm/amdgpu: add check before free wb entry

Check if ring is not a mes queue before freeing the wb entry,
because we only allocate a wb entry when it's not a mes queue.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: add return result for amdgpu_i2c_{get/put}_byte
Bob Zhou [Wed, 24 Apr 2024 07:24:05 +0000 (15:24 +0800)]
drm/amdgpu: add return result for amdgpu_i2c_{get/put}_byte

After amdgpu_i2c_get_byte fail, amdgpu_i2c_put_byte shouldn't be
conducted to put wrong value.
So return and check the i2c transfer result.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: add error handle to avoid out-of-bounds
Bob Zhou [Tue, 23 Apr 2024 08:58:11 +0000 (16:58 +0800)]
drm/amdgpu: add error handle to avoid out-of-bounds

if the sdma_v4_0_irq_id_to_seq return -EINVAL, the process should
be stop to avoid out-of-bounds read, so directly return -EINVAL.

Signed-off-by: Bob Zhou <bob.zhou@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: Initialize timestamp for some legacy SOCs
Ma Jun [Mon, 22 Apr 2024 02:07:51 +0000 (10:07 +0800)]
drm/amdgpu: Initialize timestamp for some legacy SOCs

Initialize the interrupt timestamp for some legacy SOCs
to fix the coverity issue "Uninitialized scalar variable"

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: Use new interface to reserve bad page
YiPeng Chai [Mon, 8 Apr 2024 06:48:50 +0000 (14:48 +0800)]
drm/amdgpu: Use new interface to reserve bad page

Use new interface to reserve bad page.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: Fix address translation defect
YiPeng Chai [Wed, 3 Apr 2024 06:11:41 +0000 (14:11 +0800)]
drm/amdgpu: Fix address translation defect

retired_page is page frame and should be expanded
to the full address when querying status.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdkfd: Enforce queue BO's adev
Harish Kasiviswanathan [Mon, 22 Apr 2024 21:03:46 +0000 (17:03 -0400)]
drm/amdkfd: Enforce queue BO's adev

Queue buffer, though it is in system memory, has to be created using the
correct amdgpu device. Enforce this as the BO needs to mapped to the
GART for MES Hardware scheduler to access it.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Increase SAT_UPDATE_PENDING timeout
Dmytro Laktyushkin [Thu, 24 Jun 2021 14:50:45 +0000 (10:50 -0400)]
drm/amd/display: Increase SAT_UPDATE_PENDING timeout

Headless dp 2.0 will take longer to update.

Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Add some missing HDMI registers for DCN3x
Rodrigo Siqueira [Tue, 9 Apr 2024 21:25:23 +0000 (15:25 -0600)]
drm/amd/display: Add some missing HDMI registers for DCN3x

This commit add some missing HDMI control registers to DCN3x.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: support ACA logging ecc errors
YiPeng Chai [Fri, 22 Mar 2024 07:03:07 +0000 (15:03 +0800)]
drm/amdgpu: support ACA logging ecc errors

support ACA logging ecc errors.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: add poison consumption handler
YiPeng Chai [Mon, 11 Mar 2024 09:52:37 +0000 (17:52 +0800)]
drm/amdgpu: add poison consumption handler

Add poison consumption handler.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: prepare to handle pasid poison consumption
YiPeng Chai [Mon, 22 Apr 2024 09:40:03 +0000 (17:40 +0800)]
drm/amdgpu: prepare to handle pasid poison consumption

Prepare to handle pasid poison consumption.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: retire bad pages for umc v12_0
YiPeng Chai [Mon, 18 Mar 2024 06:24:13 +0000 (14:24 +0800)]
drm/amdgpu: retire bad pages for umc v12_0

Retire bad pages for umc v12_0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: add condition check for amdgpu_umc_fill_error_record
YiPeng Chai [Fri, 22 Mar 2024 04:06:01 +0000 (12:06 +0800)]
drm/amdgpu: add condition check for amdgpu_umc_fill_error_record

Add condition check for amdgpu_umc_fill_error_record.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: Add delay work to retire bad pages
YiPeng Chai [Mon, 22 Apr 2024 09:38:54 +0000 (17:38 +0800)]
drm/amdgpu: Add delay work to retire bad pages

Add delay work to retire bad pages.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: umc v12_0 logs ecc errors
YiPeng Chai [Mon, 18 Mar 2024 03:48:07 +0000 (11:48 +0800)]
drm/amdgpu: umc v12_0 logs ecc errors

1. umc v12_0 logs ecc errors.
2. Reserve newly detected ecc error pages.
3. Add tag for bad pages, so that they can
   be retired later.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: umc v12_0 converts error address
YiPeng Chai [Mon, 11 Mar 2024 09:29:26 +0000 (17:29 +0800)]
drm/amdgpu: umc v12_0 converts error address

Umc v12_0 converts error address.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: add interface to update umc v12_0 ecc status
YiPeng Chai [Tue, 19 Mar 2024 02:14:16 +0000 (10:14 +0800)]
drm/amdgpu: add interface to update umc v12_0 ecc status

Add interface to update umc v12_0 ecc status.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: add poison creation handler
YiPeng Chai [Mon, 11 Mar 2024 09:02:40 +0000 (17:02 +0800)]
drm/amdgpu: add poison creation handler

Add poison creation handler.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: prepare for logging ecc errors
YiPeng Chai [Mon, 22 Apr 2024 09:37:36 +0000 (17:37 +0800)]
drm/amdgpu: prepare for logging ecc errors

Prepare for logging ecc errors.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: add message fifo to handle RAS poison events
YiPeng Chai [Tue, 19 Mar 2024 01:57:58 +0000 (09:57 +0800)]
drm/amdgpu: add message fifo to handle RAS poison events

Add message fifo to handle RAS poison events.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc
Jesse Zhang [Wed, 24 Apr 2024 09:10:46 +0000 (17:10 +0800)]
drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc

Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x03000001.
V2: To really improve the handling we would actually
   need to have a separate value of 0xffffffff.(Christian)

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Add TMDS DC balancer control
Rodrigo Siqueira [Tue, 9 Apr 2024 21:14:49 +0000 (15:14 -0600)]
drm/amd/display: Add TMDS DC balancer control

Add TMDS balancer control to the list of available encoder registers for
DCN 30.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Remove unnecessary NULL check in dcn20_set_input_transfer_func
Srinivasan Shanmugam [Tue, 23 Apr 2024 03:14:40 +0000 (08:44 +0530)]
drm/amd/display: Remove unnecessary NULL check in dcn20_set_input_transfer_func

This commit removes an unnecessary NULL check in the
`dcn20_set_input_transfer_func` function in the `dcn20_hwseq.c` file.
The variable `tf` is assigned the address of
`plane_state->in_transfer_func` unconditionally, so it can never be
`NULL`. Therefore, the check `if (tf == NULL)` is unnecessary and has
been removed.

The plane_state->in_transfer_func itself cannot be NULL because it's a
structure, not a pointer. When we do tf =
&plane_state->in_transfer_func;, we're getting the address of that
structure, which will always be valid as long as plane_state itself is
not NULL.

we've already checked if plane_state is NULL with the line if (dpp_base
== NULL || plane_state == NULL) return false;. So, if the code execution
gets to the point where tf = &plane_state->in_transfer_func; is called,
plane_state is guaranteed to be not NULL, and therefore tf will also not
be NULL.

drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn20/dcn20_hwseq.c
    1094 bool dcn20_set_input_transfer_func(struct dc *dc,
    1095                                 struct pipe_ctx *pipe_ctx,
    1096                                 const struct dc_plane_state *plane_state)
    1097 {
    1098         struct dce_hwseq *hws = dc->hwseq;
    1099         struct dpp *dpp_base = pipe_ctx->plane_res.dpp;
    1100         const struct dc_transfer_func *tf = NULL;
                                                ^^^^^^^^^ This assignment is not necessary now.

    1101         bool result = true;
    1102         bool use_degamma_ram = false;
    1103
    1104         if (dpp_base == NULL || plane_state == NULL)
    1105                 return false;
    1106
    1107         hws->funcs.set_shaper_3dlut(pipe_ctx, plane_state);
    1108         hws->funcs.set_blend_lut(pipe_ctx, plane_state);
    1109
    1110         tf = &plane_state->in_transfer_func;
                 ^^^^^
Before there was an if statement but now tf is assigned unconditionally

    1111
--> 1112         if (tf == NULL) {
                 ^^^^^^^^^^^^^^^^^
so these conditions are impossible.

    1113                 dpp_base->funcs->dpp_set_degamma(dpp_base,
    1114                                 IPP_DEGAMMA_MODE_BYPASS);
    1115                 return true;
    1116         }
    1117
    1118         if (tf->type == TF_TYPE_HWPWL || tf->type == TF_TYPE_DISTRIBUTED_POINTS)
    1119                 use_degamma_ram = true;
    1120
    1121         if (use_degamma_ram == true) {
    1122                 if (tf->type == TF_TYPE_HWPWL)
    1123                         dpp_base->funcs->dpp_program_degamma_pwl(dpp_base,

Fixes the below Smatch static checker warning:
drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn20/dcn20_hwseq.c:1112 dcn20_set_input_transfer_func() warn: address of 'plane_state->in_transfer_func' is non-NULL

Fixes: 285a7054bf81 ("drm/amd/display: Remove plane and stream pointers from dc scratch")
Cc: Wenjing Liu <wenjing.liu@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Alvin Lee <alvin.lee2@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Qingqing Zhuo <Qingqing.Zhuo@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu/mes11: Use a separate fence per transaction
Alex Deucher [Fri, 19 Apr 2024 03:30:46 +0000 (23:30 -0400)]
drm/amdgpu/mes11: Use a separate fence per transaction

We can't use a shared fence location because each transaction
should be considered independently.

Reviewed-by: Shaoyun.liu <shaoyunl@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Add missing dwb registers
Rodrigo Siqueira [Tue, 9 Apr 2024 21:11:20 +0000 (15:11 -0600)]
drm/amd/display: Add missing dwb registers

DCN3.0 supports some specific DWB debug registers that are not exposed
yet. This commit just adds the missing registers.

Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: use mpcc_count to log MPC state
Melissa Wen [Fri, 12 Apr 2024 16:39:09 +0000 (13:39 -0300)]
drm/amd/display: use mpcc_count to log MPC state

According to [1]:
```
DTN only logs 'pipe_count' instances of MPCC. However in some cases
there are different number of MPCC than DPP (pipe_count).
```

As DTN log still relies on pipe_count to print mpcc state, switch to
mpcc_count in all occurrences.

[1] https://lore.kernel.org/amd-gfx/20240328195047.2843715-39-Roman.Li@amd.com/

Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: add a spinlock to wb allocation
Alex Deucher [Fri, 19 Apr 2024 22:03:32 +0000 (18:03 -0400)]
drm/amdgpu: add a spinlock to wb allocation

As we use wb slots more dynamically, we need to lock
access to avoid racing on allocation or free.

Reviewed-by: Shaoyun.liu <shaoyunl@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amdgpu: update fw_share for VCN5
Sonny Jiang [Tue, 23 Apr 2024 16:52:25 +0000 (12:52 -0400)]
drm/amdgpu: update fw_share for VCN5

kmd_fw_shared changed in VCN5

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
4 months agodrm/amd/display: Remove duplicated function signature from dcn3.01 DCCG
David Tadokoro [Thu, 22 Feb 2024 14:19:00 +0000 (11:19 -0300)]
drm/amd/display: Remove duplicated function signature from dcn3.01 DCCG

In the header file dc/dcn301/dcn301_dccg.h, the function dccg301_create
is declared twice, so remove duplication.

Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: David Tadokoro <davidbtadokoro@usp.br>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>