linux-block.git
3 months agodrm/amd/display: Increase halt timeout for DMCUB to 1s
Nicholas Kazlauskas [Thu, 13 Feb 2025 22:40:29 +0000 (17:40 -0500)]
drm/amd/display: Increase halt timeout for DMCUB to 1s

[Why]
If we soft reset before halt finishes and there are outstanding
memory transactions then the memory interface may produce unexpected
results, such as out of order transactions when the firmware next runs.

These can manifest as random or unexpected load/store violations.

[How]
Increase the timeout before soft reset to ensure the DMCUB has quiesced.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Remove unused header
Krunoslav Kovac [Fri, 14 Feb 2025 00:14:59 +0000 (19:14 -0500)]
drm/amd/display: Remove unused header

[Why]
Removes unused header

Reviewed-by: Samson Tam <samson.tam@amd.com>
Signed-off-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: handle max_downscale_src_width fail check
Yihan Zhu [Wed, 12 Feb 2025 20:17:56 +0000 (15:17 -0500)]
drm/amd/display: handle max_downscale_src_width fail check

[WHY]
If max_downscale_src_width check fails, we exit early from TAP calculation and left a NULL
value to the scaling data structure to cause the zero divide in the DML validation.

[HOW]
Call set default TAP calculation before early exit in get_optimal_number_of_taps due to
max downscale limit exceed.

Reviewed-by: Samson Tam <samson.tam@amd.com>
Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Update FIXED_VS Link Rate Toggle Workaround Usage
Michael Strauss [Fri, 24 Jan 2025 20:02:27 +0000 (15:02 -0500)]
drm/amd/display: Update FIXED_VS Link Rate Toggle Workaround Usage

[WHY]
Previously the 128b/132b LTTPR support DPCD field was used to decide if
FIXED_VS training sequence required a rate toggle before initiating LT.

When running DP2.1 4.9.x.x compliance tests, emulated LTTPRs can report
no-128b/132b support which is then forwarded by the FIXED_VS retimer.
As a result this test exposes the rate toggle again, erroneously causing
failures as certain compliance sinks don't expect this behaviour.

[HOW]
Add new DPCD register defines/reads to read LTTPR IEEE OUI and device ID.

Decide whether to perform the rate toggle based on the LTTPR's IEEE OUI
which guarantees that we only perform the toggle on affected retimers.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: fix dcn4x init failed
Charlene Liu [Thu, 13 Feb 2025 17:37:10 +0000 (12:37 -0500)]
drm/amd/display: fix dcn4x init failed

[why]
failed due to cmdtable not created.
switch atombios cmdtable as default.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Temporarily disable hostvm on DCN31
Aurabindo Pillai [Mon, 20 Jan 2025 20:27:23 +0000 (15:27 -0500)]
drm/amd/display: Temporarily disable hostvm on DCN31

With HostVM enabled, DCN31 fails to pass validation for 3x4k60. Some Linux
userspace does not downgrade one of the monitors to 4k30, and the result
is that the monitor does not light up. Disable it until the bandwidth
calculation failure is resolved.

Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: ACPI Re-timer Programming
Rafal Ostrowski [Wed, 12 Feb 2025 07:08:07 +0000 (08:08 +0100)]
drm/amd/display: ACPI Re-timer Programming

[Why]
We must implement an ACPI re-timer programming interface and notify
ACPI driver whenever a PHY transition is about to take place.

Because some trace lengths on certain platforms are very long,
then a re-timer may need to be programmed whenever a PHY transition
takes place. The implementation of this re-timer programming interface
will notify ACPI driver that PHY transition is taking place and it
will trigger the re-timer as needed.

First we need to gather retimer information from ACPI interface.

Then, in the PRE case, the re-timer interface needs to be called before we call
transmitter ENABLE.
In the POST case, it has to be called after we call transmitter DISABLE.

[How]
Implemented ACPI retimer programming interface.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Rafal Ostrowski <rostrows@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Refactor DCN4x and related code
Patel, Swapnil [Sun, 9 Feb 2025 16:42:23 +0000 (11:42 -0500)]
drm/amd/display: Refactor DCN4x and related code

[why & how]
Refactor existing code related to DCN4x for better code sharing with
other modules.

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Swapnil Patel <Swapnil.Patel@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: add a quirk to enable eDP0 on DP1
Yilin Chen [Fri, 7 Feb 2025 20:26:19 +0000 (15:26 -0500)]
drm/amd/display: add a quirk to enable eDP0 on DP1

[why]
some board designs have eDP0 connected to DP1, need a way to enable
support_edp0_on_dp1 flag, otherwise edp related features cannot work

[how]
do a dmi check during dm initialization to identify systems that
require support_edp0_on_dp1. Optimize quirk table with callback
functions to set quirk entries, retrieve_dmi_info can set quirks
according to quirk entries

Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Yilin Chen <Yilin.Chen@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: replace dio encoder access
Peichen Huang [Fri, 17 Jan 2025 02:48:11 +0000 (10:48 +0800)]
drm/amd/display: replace dio encoder access

[WHY]
replace dio encoder access to work with new dio encoder
assignment.

[HOW}
1. before validation, access dio encoder by get_temp_dio_link_enc()
2. after validation, access dio encoder through pipe_ctx->link_res

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Add SPL namespace
Navid Assadian [Mon, 20 Jan 2025 17:35:07 +0000 (12:35 -0500)]
drm/amd/display: Add SPL namespace

[Why]
In order to avoid component conflicts, spl namespace is needed.

[How]
Adding SPL namespace to the public API os that each user of SPL can have
their own namespace.

Signed-off-by: Navid Assadian <Navid.Assadian@amd.com>
Reviewed-by: Samson Tam <Samson.Tam@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Fix unit test failure
Samson Tam [Tue, 7 Jan 2025 19:17:15 +0000 (14:17 -0500)]
drm/amd/display: Fix unit test failure

[Why]
Some of unit tests use large scaling ratio such that when we
 calculate optimal number of taps, max_taps is negative.
 Then in recent change, we changed max_taps to uint instead
 of int so now max_taps wraps and is positive.  This change
 changed the behaviour from returning back false to return
 true and breaks unit test check

[How]
Add check to prevent max_taps from wrapping and set to 0
 instead

Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: fix check for identity ratio
Samson Tam [Tue, 21 Jan 2025 16:01:47 +0000 (11:01 -0500)]
drm/amd/display: fix check for identity ratio

[Why]
IDENTITY_RATIO check uses 2 bits for integer, which only allows
 checking downscale ratios up to 3.  But we support up to 6x
 downscale

[How]
Update IDENTITY_RATIO to check 3 bits for integer
Add ASSERT to catch if we downscale more than 6x

Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Jun Lei <jun.lei@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Fix mismatch type comparison
Assadian, Navid [Thu, 19 Dec 2024 22:19:09 +0000 (17:19 -0500)]
drm/amd/display: Fix mismatch type comparison

The mismatch type comparison/assignment may cause data loss. Since the
values are always non-negative, it is safe to use unsigned variables to
resolve the mismatch.

Signed-off-by: Navid Assadian <navid.assadian@amd.com>
Reviewed-by: Joshua Aberback <joshua.aberback@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Add opp recout adjustment
Navid Assadian [Mon, 20 Jan 2025 17:35:23 +0000 (12:35 -0500)]
drm/amd/display: Add opp recout adjustment

[Why]
For subsampled YUV output formats, more pixels can get fetched and be
used for scaling.

[How]
Add the adjustment to the calculated recout, so the viewport covers the
corresponding pixels on the source plane.

Signed-off-by: Navid Assadian <Navid.Assadian@amd.com>
Reviewed-by: Samson Tam <Samson.Tam@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Fix mismatch type comparison in custom_float
Samson Tam [Tue, 7 Jan 2025 19:16:04 +0000 (14:16 -0500)]
drm/amd/display: Fix mismatch type comparison in custom_float

[Why & How]
Passing uint into uchar function param.  Pass uint instead

Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Apply DCN35 DML2 state policy for DCN36 too
Nicholas Kazlauskas [Fri, 24 Jan 2025 14:59:37 +0000 (09:59 -0500)]
drm/amd/display: Apply DCN35 DML2 state policy for DCN36 too

[Why]
DCN36 should inherit the same policy as DCN35 for DML2.

[How]
Add it to the list of checks in translation helper.

Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: update incorrect cursor buffer size
Alex Hung [Tue, 11 Feb 2025 20:43:48 +0000 (13:43 -0700)]
drm/amd/display: update incorrect cursor buffer size

[WHAT & HOW]
Fix the incorrect value of the cursor_buffer_size.

Signed-off-by: Alex Hung <alex.hung@amd.com>
Reviewed-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Disable PSR-SU on eDP panels
Tom Chung [Thu, 6 Feb 2025 03:31:23 +0000 (11:31 +0800)]
drm/amd/display: Disable PSR-SU on eDP panels

[Why]
PSR-SU may cause some glitching randomly on several panels.

[How]
Temporarily disable the PSR-SU and fallback to PSR1 for
all eDP panels.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3388
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Revert "Disable PSR-SU on some OLED panel"
Tom Chung [Thu, 6 Feb 2025 03:30:17 +0000 (11:30 +0800)]
drm/amd/display: Revert "Disable PSR-SU on some OLED panel"

This reverts commit c31b41f1cb32450d8ac176eef9bda979760040e7.

We planning to disable the PSR-SU and fallback to PSR1 for
all eDP panels not only for specific eDP panel temporarily.

Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Fix spelling mistake "oustanding" -> "outstanding"
Colin Ian King [Mon, 17 Feb 2025 09:53:25 +0000 (09:53 +0000)]
drm/amd/display: Fix spelling mistake "oustanding" -> "outstanding"

There is a spelling mistake in max_oustanding_when_urgent_expected,
fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agoMAINTAINERS: Update AMDGPU DML maintainers info
Aurabindo Pillai [Fri, 21 Feb 2025 19:19:12 +0000 (14:19 -0500)]
MAINTAINERS: Update AMDGPU DML maintainers info

Chaitanya is no longer with AMD, and the responsibility has been
taken over by Austin.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: restore edid reading from a given i2c adapter
Melissa Wen [Sat, 15 Feb 2025 21:15:47 +0000 (18:15 -0300)]
drm/amd/display: restore edid reading from a given i2c adapter

When switching to drm_edid, we slightly changed how to get edid by
removing the possibility of getting them from dc_link when in aux
transaction mode. As MST doesn't initialize the connector with
`drm_connector_init_with_ddc()`, restore the original behavior to avoid
functional changes.

v2:
- Fix build warning of unchecked dereference (kernel test bot)

CC: Alex Hung <alex.hung@amd.com>
CC: Mario Limonciello <mario.limonciello@amd.com>
CC: Roman Li <Roman.Li@amd.com>
CC: Aurabindo Pillai <Aurabindo.Pillai@amd.com>
Fixes: 48edb2a4256e ("drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid")
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Remove unused nbif_v6_3_1_sriov_funcs
Dr. David Alan Gilbert [Wed, 19 Feb 2025 21:23:18 +0000 (21:23 +0000)]
drm/amdgpu: Remove unused nbif_v6_3_1_sriov_funcs

The nbif_v6_3_1_sriov_funcs instance of amdgpu_nbio_funcs was added in
commit 894c6d3522d1 ("drm/amdgpu: Add nbif v6_3_1 ip block support")
but has remained unused.

Alex has confirmed it wasn't needed.

Remove it, together with the four unused stub functions:
  nbif_v6_3_1_sriov_ih_doorbell_range
  nbif_v6_3_1_sriov_gc_doorbell_init
  nbif_v6_3_1_sriov_vcn_doorbell_range
  nbif_v6_3_1_sriov_sdma_doorbell_range

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agomailmap: Add entry for Rodrigo Siqueira
Rodrigo Siqueira [Wed, 19 Feb 2025 18:46:20 +0000 (11:46 -0700)]
mailmap: Add entry for Rodrigo Siqueira

Map all of my previously used email addresses to my @igalia.com address.

Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Add ring reset callback for JPEG5_0_1
Sathishkumar S [Tue, 18 Feb 2025 18:06:48 +0000 (23:36 +0530)]
drm/amdgpu: Add ring reset callback for JPEG5_0_1

Add ring reset function callback for JPEG5_0_1 to
recover from job timeouts without a full gpu reset.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agoMAINTAINERS: Change my role from Maintainer to Reviewer
Rodrigo Siqueira [Wed, 19 Feb 2025 18:46:19 +0000 (11:46 -0700)]
MAINTAINERS: Change my role from Maintainer to Reviewer

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Log after a successful ring reset
André Almeida [Thu, 20 Feb 2025 16:27:49 +0000 (13:27 -0300)]
drm/amdgpu: Log after a successful ring reset

When a ring reset happens, the kernel log shows only "amdgpu: Starting
<ring name> ring reset", but when it finishes nothing appears in the
log. Explicitly write in the log that the reset has finished correctly.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Log the creation of a coredump file
André Almeida [Thu, 20 Feb 2025 16:27:48 +0000 (13:27 -0300)]
drm/amdgpu: Log the creation of a coredump file

After a GPU reset happens, the driver creates a coredump file. However,
the user might not be aware of it. Log the file creation the user can
find more information about the device and add the file to bug reports.
This is similar to what the xe driver does.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/mes: keep enforce isolation up to date
Alex Deucher [Fri, 14 Feb 2025 17:32:30 +0000 (12:32 -0500)]
drm/amdgpu/mes: keep enforce isolation up to date

Re-send the mes message on resume to make sure the
mes state is up to date.

Fixes: 8521e3c5f058 ("drm/amd/amdgpu: limit single process inside MES")
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/pm: Use separate metrics table for smu_v13_0_12
Asad Kamal [Wed, 12 Feb 2025 08:34:03 +0000 (16:34 +0800)]
drm/amd/pm: Use separate metrics table for smu_v13_0_12

Use separate metrics table for smu_v13_0_12 and fetch metrics data using
that.

v2: Fix jpeg busy indexing (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Add core reset registers for JPEG5_0_1
Sathishkumar S [Tue, 18 Feb 2025 18:05:42 +0000 (23:35 +0530)]
drm/amdgpu: Add core reset registers for JPEG5_0_1

Add core reset control register definitions and align
all prior register definitions to end at 100 column
length for uniformity.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Per-instance init func for JPEG5_0_1
Sathishkumar S [Tue, 18 Feb 2025 17:56:37 +0000 (23:26 +0530)]
drm/amdgpu: Per-instance init func for JPEG5_0_1

Add helper functions to handle per-instance and per-core
initialization and deinitialization in JPEG5_0_1.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: fix an indent issue in DML21
Aurabindo Pillai [Fri, 21 Feb 2025 14:45:12 +0000 (09:45 -0500)]
drm/amd/display: fix an indent issue in DML21

Remove extraneous tab and newline in dml2_core_dcn4.c that was
reported by the bot

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202502211920.txUfwtSj-lkp@intel.com/
Fixes: 70839da6360 ("drm/amd/display: Add new DCN401 sources")
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agoMAINTAINERS: update amdgpu maintainers list
Alex Deucher [Tue, 11 Feb 2025 20:38:20 +0000 (15:38 -0500)]
MAINTAINERS: update amdgpu maintainers list

Xinhui's email is no longer valid.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: disable BAR resize on Dell G5 SE
Alex Deucher [Mon, 17 Feb 2025 15:55:05 +0000 (10:55 -0500)]
drm/amdgpu: disable BAR resize on Dell G5 SE

There was a quirk added to add a workaround for a Sapphire
RX 5600 XT Pulse that didn't allow BAR resizing.  However,
the quirk caused a regression with runtime pm on Dell laptops
using those chips, rather than narrowing the scope of the
resizing quirk, add a quirk to prevent amdgpu from resizing
the BAR on those Dell platforms unless runtime pm is disabled.

v2: update commit message, add runpm check

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707
Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/pm: Fetch fru product info for smu_v13_0_12
Asad Kamal [Wed, 12 Feb 2025 08:00:41 +0000 (16:00 +0800)]
drm/amd/pm: Fetch fru product info for smu_v13_0_12

Fetch fru product info for smu_v13_0_12 from static metrics table

v2: Field by field copy for fru info(Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/pm: Fetch static metrics table
Asad Kamal [Mon, 10 Feb 2025 16:03:18 +0000 (00:03 +0800)]
drm/amd/pm: Fetch static metrics table

Fetch clock frequency table from static metrics table for
smu_v13_0_12

v2: Move PPTable definition, remove unnecessary checks for getting
static metrics table(Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/pm: Add GetStaticMetricTable message
Asad Kamal [Mon, 10 Feb 2025 16:17:37 +0000 (00:17 +0800)]
drm/amd/pm: Add GetStaticMetricTable message

Add GetStaticMetricTable message for smu_v13_0_12

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/pm: Update pmfw headers for smu_v13_0_12
Asad Kamal [Mon, 10 Feb 2025 07:56:51 +0000 (15:56 +0800)]
drm/amd/pm: Update pmfw headers for smu_v13_0_12

Update pmfw headers for smu_v13_0_12 new messages & metrics table.
Static metrics table for frequency added, Separate metrics table
for smu_v13_0_12 added.

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Update amdgpu_job_timedout to check if the ring is guilty
Jesse.zhang@amd.com [Fri, 21 Feb 2025 02:26:52 +0000 (10:26 +0800)]
drm/amdgpu: Update amdgpu_job_timedout to check if the ring is guilty

This patch updates the `amdgpu_job_timedout` function to check if
the ring is actually guilty of causing the timeout. If not, it
skips error handling and fence completion.

v2: move the is_guilty check down into the queue reset area (Alex)
v3: need to call is_guilty before reset (Alex)
v4: squash in is_guilty logic fixes (Alex)

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/pm: add support for checking SDMA reset capability
Jesse.zhang@amd.com [Fri, 21 Feb 2025 06:02:05 +0000 (14:02 +0800)]
drm/amd/pm: add support for checking SDMA reset capability

This patch introduces a new function to check if the SMU supports resetting the SDMA engine.
This capability check ensures that the driver does not attempt to reset the SDMA engine
on hardware that does not support it.

The following changes are included:
- New function `amdgpu_dpm_reset_sdma_is_supported` to check SDMA reset
  support at the AMDGPU driver level.
- New function `smu_reset_sdma_is_supported` to check SDMA reset support
  at the SMU level.
- Implementation of `smu_v13_0_6_reset_sdma_is_supported` for the specific
  SMU version v13.0.6.
- Updated `smu_v13_0_6_reset_sdma` to use the new capability check before
  attempting to reset the SDMA engine.

v2: change smu_reset_sdma_is_supported type to bool (Tim)

Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Add reset function pointer for SDMA v4.4.2 page ring
Jesse.zhang@amd.com [Thu, 13 Feb 2025 05:28:51 +0000 (13:28 +0800)]
drm/amdgpu: Add reset function pointer for SDMA v4.4.2 page ring

This patch adds a reset function pointer to the SDMA v4.4.2 page ring
functionality. The new function pointer `reset` is set to
`sdma_v4_4_2_reset_queue`, which is responsible for resetting the SDMA queue.

Changes:
- Add `reset` function pointer to `sdma_v4_4_2_page_ring_funcs`.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Improve SDMA reset logic with guilty queue tracking
Jesse.zhang@amd.com [Thu, 20 Feb 2025 06:43:59 +0000 (14:43 +0800)]
drm/amdgpu: Improve SDMA reset logic with guilty queue tracking

This patch includes the remaining improvements to the SDMA reset logic:
- Added `gfx_guilty` and `page_guilty` flags to track guilty queues.
- Updated the reset and resume functions to handle the guilty state.
- Cached the `rptr` before reset.

v2:
   1.replace the caller with a guilty bool.
   If the queue is the guilty one, set the rptr and wptr  to the saved wptr value,
   else, set the rptr and wptr to the saved rptr value. (Alex)
   2. cache the rptr before the reset. (Alex)

v3: Keeping intermediate variables like u64 rwptr simplifies resotre rptr/wptr.(Lijo)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/sdma: Introduce is_guilty callbacks for sdma GFX and PAGE rings
Jesse.zhang@amd.com [Thu, 13 Feb 2025 02:51:38 +0000 (10:51 +0800)]
drm/amdgpu/sdma: Introduce is_guilty callbacks for sdma GFX and PAGE rings

This patch introduces the `is_guilty` callbacks for the GFX and PAGE rings.
These callbacks check if a ring is guilty of causing a timeout or error.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Introduce cached_rptr and is_guilty callback in amdgpu_ring
Jesse.zhang@amd.com [Thu, 13 Feb 2025 02:30:07 +0000 (10:30 +0800)]
drm/amdgpu: Introduce cached_rptr and is_guilty callback in amdgpu_ring

This patch introduces the following changes:
- Add `cached_rptr` to the `amdgpu_ring` structure to store the read pointer before a reset.
- Add `is_guilty` callback to the `amdgpu_ring_funcs` structure to check if a ring is guilty of causing a timeout.

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Introduce conditional user queue suspension for SDMA resets
Jesse.zhang@amd.com [Thu, 20 Feb 2025 06:25:47 +0000 (14:25 +0800)]
drm/amdgpu: Introduce conditional user queue suspension for SDMA resets

- Modify the `amdgpu_sdma_reset_engine` function to accept a `suspend_user_queues` parameter.
- This parameter allows the function to conditionally suspend and resume user queues during SDMA resets.
- Ensure that user queues are suspended only when necessary to avoid unnecessary overhead and potential deadlocks.
- Restart the scheduler's work queue for the GFX and page rings after the reset to allow new tasks to be submitted.

This change improves synchronization between the KGD and the KFD during SDMA resets,
ensuring proper handling of user queues and avoiding race conditions.

V2: replace the ring_lock with the existed the scheduler
    locks for the queues (ring->sched) on the sdma engine.(Alex)

v3: call drm_sched_wqueue_stop() rather than job_list_lock.
    If a GPU ring reset was already initiated for one ring at amdgpu_job_timedout,
    skip resetting that ring and call drm_sched_wqueue_stop()
    for the other rings (Alex)

   replace  the common lock (sdma_reset_lock) with DQM lock to
   to resolve reset races between the two driver sections during KFD eviction.(Jon)

   Rename the caller to Reset_src and
   Change AMDGPU_RESET_SRC_SDMA_KGD/KFD to AMDGPU_RESET_SRC_SDMA_HWS/RING (Jon)

v4: restart the wqueue if the reset was successful,
    or fall back to a full adapter reset. (Alex)

   move definition of reset source to enumeration AMDGPU_RESET_SRCS, and
   check reset src in amdgpu_sdma_reset_instance (Jon)

v5: Call amdgpu_amdkfd_suspend/resume at the start/end of reset function respectively under !SRC_HWS
    conditions only (Jon)

v6: replace the paramter src with a bool suspend_user_queues,
    remove the paramter src in pre/post func. (Jon)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Acked-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Remove redundant logic in GC v9.4.3
Lijo Lazar [Mon, 17 Feb 2025 05:11:56 +0000 (10:41 +0530)]
drm/amdgpu: Remove redundant logic in GC v9.4.3

GFXOFF check is not needed for GC v9.4.3. Also, save/restore list is
available by default.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Do not poweroff UVDJ in JPEG4_0_3
Sathishkumar S [Wed, 19 Feb 2025 04:29:03 +0000 (09:59 +0530)]
drm/amdgpu: Do not poweroff UVDJ in JPEG4_0_3

Update power gate setting to not poweroff UVDJ in JPEG4_0_3.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agoDocumentation/gpu: Add acronyms for some firmware components
Rodrigo Siqueira [Wed, 19 Feb 2025 18:53:45 +0000 (11:53 -0700)]
Documentation/gpu: Add acronyms for some firmware components

Users can check the file "/sys/kernel/debug/dri/0/amdgpu_firmware_info"
to get information on the firmware loaded in the system. This file has
multiple acronyms that are not documented in the glossary. This commit
introduces some missing acronyms to the AMD glossary documentation. The
meaning of each acronym in this commit was extracted from code
documentation available in the following files:

- drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
- drivers/gpu/drm/amd/include/amd_shared.h

Changes since v1:
- Expand acronym meanings based on Alex Deucher suggestions.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/sdma: Refactor SDMA reset functionality and add callback support
Jesse.zhang@amd.com [Thu, 23 Jan 2025 08:21:47 +0000 (16:21 +0800)]
drm/amdgpu/sdma: Refactor SDMA reset functionality and add callback support

This patch refactors the SDMA reset functionality in the `sdma_v4_4_2` driver
to improve modularity and support shared usage between AMDGPU and KFD. The
changes include:

1. **Refactored SDMA Reset Logic**:
   - Split the `sdma_v4_4_2_reset_queue` function into two separate functions:
     - `sdma_v4_4_2_stop_queue`: Stops the SDMA queue before reset.
     - `sdma_v4_4_2_restore_queue`: Restores the SDMA queue after reset.
   - These functions are now used as callbacks for the shared reset mechanism.

2. **Added Callback Support**:
   - Introduced a new structure `sdma_v4_4_2_reset_funcs` to hold the stop and
     restore callbacks.
   - Added `sdma_v4_4_2_set_reset_funcs` to register these callbacks with the
     shared reset mechanism using `amdgpu_set_on_reset_callbacks`.

3. **Fixed Reset Queue Function**:
   - Modified `sdma_v4_4_2_reset_queue` to use the shared `amdgpu_sdma_reset_queue`
     function, ensuring consistency across the driver.

This patch ensures that SDMA reset functionality is more modular, reusable, and
aligned with the shared reset mechanism between AMDGPU and KFD.

v2: Renamed sdma_v4_4_2_set_reset_funcs to sdma_v4_4_2_set_engine_reset_funcs.
    Renamed sdma_v4_4_2_reset_funcs to sdma_v4_4_2_engine_reset_funcs.(Alex)

Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/kfd: Add shared SDMA reset functionality with callback support
Jesse.zhang@amd.com [Tue, 21 Jan 2025 01:18:44 +0000 (09:18 +0800)]
drm/amdgpu/kfd: Add shared SDMA reset functionality with callback support

This patch introduces shared SDMA reset functionality between AMDGPU and KFD.
The implementation includes the following key changes:

1. Added `amdgpu_sdma_reset_queue`:
   - Resets a specific SDMA queue by instance ID.
   - Invokes registered pre-reset and post-reset callbacks to allow KFD and AMDGPU
     to save/restore their state during the reset process.

2. Added `amdgpu_set_on_reset_callbacks`:
   - Allows KFD and AMDGPU to register callback functions for pre-reset and
     post-reset operations.
   - Callbacks are stored in a global linked list and invoked in the correct order
     during SDMA reset.

This patch ensures that both AMDGPU and KFD can handle SDMA reset events
gracefully, with proper state saving and restoration. It also provides a flexible
callback mechanism for future extensions.

v2: fix CamelCase and put the SDMA helper into amdgpu_sdma.c (Alex)

v3: rename the `amdgpu_register_on_reset_callbacks` function to
      `amdgpu_sdma_register_on_reset_callbacks`
    move global reset_callback_list to struct amdgpu_sdma (Alex)

v4: Update the reset callback function description and
   rename the reset function to amdgpu_sdma_reset_engine (Alex)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: correct the name of mes_pipe structure
Likun Gao [Wed, 19 Feb 2025 07:14:32 +0000 (15:14 +0800)]
drm/amdgpu: correct the name of mes_pipe structure

Correct the structure name admgpu_mes_pipe to amdgpu_mes_pipe.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdkfd: Preserve cp_hqd_pq_control on update_mqd
David Yat Sin [Wed, 19 Feb 2025 22:34:38 +0000 (17:34 -0500)]
drm/amdkfd: Preserve cp_hqd_pq_control on update_mqd

When userspace applications call AMDKFD_IOC_UPDATE_QUEUE. Preserve
bitfields that do not need to be modified as they contain flags to
track queue states that are used by CP FW.

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agoamdgpu/pm/legacy: fix suspend/resume issues
chr[] [Wed, 12 Feb 2025 15:51:38 +0000 (16:51 +0100)]
amdgpu/pm/legacy: fix suspend/resume issues

resume and irq handler happily races in set_power_state()

* amdgpu_legacy_dpm_compute_clocks() needs lock
* protect irq work handler
* fix dpm_enabled usage

v2: fix clang build, integrate Lijo's comments (Alex)

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2524
Fixes: 3712e7a49459 ("drm/amd/pm: unified lock protections in amdgpu_dpm.c")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> # on Oland PRO
Signed-off-by: chr[] <chris@rudorff.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: update the handle ptr in is_idle
Sunil Khatri [Wed, 19 Feb 2025 18:10:52 +0000 (23:40 +0530)]
drm/amdgpu: update the handle ptr in is_idle

Update the *handle to amdgpu_ip_block ptr for all
functions pointers of is_idle.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: remove all KFD fences from the BO on release
Christian König [Wed, 29 Jan 2025 15:28:49 +0000 (16:28 +0100)]
drm/amdgpu: remove all KFD fences from the BO on release

Remove all KFD BOs from the private dma_resv object.

This prevents the KFD from being evict unecessarily when an exported BO
is released.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-and-tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: update the handle ptr in get_clockgating_state
Sunil Khatri [Wed, 19 Feb 2025 15:29:02 +0000 (20:59 +0530)]
drm/amdgpu: update the handle ptr in get_clockgating_state

Update the *handle to amdgpu_ip_block ptr for all
functions pointers of get_clockgating_state.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Add clear DCC and Tiling callback for DCE
Rodrigo Siqueira [Tue, 4 Feb 2025 15:59:35 +0000 (08:59 -0700)]
drm/amd/display: Add clear DCC and Tiling callback for DCE

Introduce the DCC and Tiling reset callback to all DCE versions that can
call it.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdkfd: Fix error handling for missing PASID in 'kfd_process_device_init_vm'
Srinivasan Shanmugam [Mon, 17 Feb 2025 06:37:22 +0000 (12:07 +0530)]
drm/amdkfd: Fix error handling for missing PASID in 'kfd_process_device_init_vm'

In the kfd_process_device_init_vm function, a valid error code is now
returned when the associated Process Address Space ID (PASID) is not
present.

If the address space virtual memory (avm) does not have an associated
PASID, the function sets the ret variable to -EINVAL before proceeding
to the error handling section. This ensures that the calling function,
such as kfd_ioctl_acquire_vm, can appropriately handle the error,
thereby preventing any issues during virtual memory initialization.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:1694 kfd_process_device_init_vm()
warn: missing error code 'ret'

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c
    1647 int kfd_process_device_init_vm(struct kfd_process_device *pdd,
    1648                                struct file *drm_file)
    1649 {
    ...
    1690
    1691         if (unlikely(!avm->pasid)) {
    1692                 dev_warn(pdd->dev->adev->dev, "WARN: vm %p has no pasid associated",
    1693                                  avm);
--> 1694                 goto err_get_pasid;

ret = -EINVAL?

    1695         }

Fixes: 8544374c0f82 ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver")
Reported by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Xiaogang Chen <xiaogang.chen@amd.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Remove redundant check of adev
Xiang Liu [Wed, 19 Feb 2025 04:27:27 +0000 (12:27 +0800)]
drm/amdgpu: Remove redundant check of adev

There is no need to check adev for sure.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Check aca enabled inside cper init/fini func
Xiang Liu [Wed, 19 Feb 2025 04:21:59 +0000 (12:21 +0800)]
drm/amdgpu: Check aca enabled inside cper init/fini func

Move code about checking aca enabled to the cper init/fini function
to make code clean.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Use firmware supported NPS modes
Lijo Lazar [Wed, 27 Nov 2024 06:16:06 +0000 (11:46 +0530)]
drm/amdgpu: Use firmware supported NPS modes

If firmware supported NPS modes are available through CAP register, use
those values for supported NPS modes.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/pm: Fetch current power limit from PMFW
Lijo Lazar [Tue, 18 Feb 2025 12:13:01 +0000 (17:43 +0530)]
drm/amd/pm: Fetch current power limit from PMFW

On SMU v13.0.12, always query the firmware to get the current power
limit as it could be updated through other means also.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Add ring reset callback for JPEG4_0_3
Sathishkumar S [Wed, 29 Jan 2025 13:31:25 +0000 (19:01 +0530)]
drm/amdgpu: Add ring reset callback for JPEG4_0_3

Add ring reset function callback for JPEG4_0_3 to
recover from job timeouts without a full gpu reset.

V2:
 - sched->ready flag shouldn't be modified by HW backend (Christian)

V3:
 - Dont modifying sched/job-submission state from HW backend (Christian)
 - Implement per-core reset sequence

V4:
 -  Dont create reset_mask sysfs and return -EOPNOTSUPP on VFs (Lijo)

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Add JPEG4_0_3 core reset control reg
Sathishkumar S [Wed, 12 Feb 2025 04:51:40 +0000 (10:21 +0530)]
drm/amdgpu: Add JPEG4_0_3 core reset control reg

Add core reset control registers for JPEG4_0_3

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority...
Srinivasan Shanmugam [Fri, 14 Feb 2025 08:52:17 +0000 (14:22 +0530)]
drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV

RLCG Register Access is a way for virtual functions to safely access GPU
registers in a virtualized environment., including TLB flushes and
register reads. When multiple threads or VFs try to access the same
registers simultaneously, it can lead to race conditions. By using the
RLCG interface, the driver can serialize access to the registers. This
means that only one thread can access the registers at a time,
preventing conflicts and ensuring that operations are performed
correctly. Additionally, when a low-priority task holds a mutex that a
high-priority task needs, ie., If a thread holding a spinlock tries to
acquire a mutex, it can lead to priority inversion. register access in
amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.

The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.

The [ BUG: Invalid wait context ] indicates that a thread is trying to
acquire a mutex while it is in a context that does not allow it to sleep
(like holding a spinlock).

Fixes the below:

[  253.013423] =============================
[  253.013434] [ BUG: Invalid wait context ]
[  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G     U     OE
[  253.013464] -----------------------------
[  253.013475] kworker/0:1/10 is trying to lock:
[  253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.013815] other info that might help us debug this:
[  253.013827] context-{4:4}
[  253.013835] 3 locks held by kworker/0:1/10:
[  253.013847]  #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[  253.013877]  #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[  253.013905]  #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[  253.014154] stack backtrace:
[  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G     U     OE      6.12.0-amdstaging-drm-next-lol-050225 #14
[  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[  253.014224] Workqueue: events work_for_cpu_fn
[  253.014241] Call Trace:
[  253.014250]  <TASK>
[  253.014260]  dump_stack_lvl+0x9b/0xf0
[  253.014275]  dump_stack+0x10/0x20
[  253.014287]  __lock_acquire+0xa47/0x2810
[  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014321]  lock_acquire+0xd1/0x300
[  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014562]  ? __lock_acquire+0xa6b/0x2810
[  253.014578]  __mutex_lock+0x85/0xe20
[  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014782]  ? sched_clock_noinstr+0x9/0x10
[  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014808]  ? local_clock_noinstr+0xe/0xc0
[  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.015029]  mutex_lock_nested+0x1b/0x30
[  253.015044]  ? mutex_lock_nested+0x1b/0x30
[  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
[  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[  253.017057]  amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
[  253.017493]  local_pci_probe+0x4b/0xb0
[  253.017746]  work_for_cpu_fn+0x1a/0x30
[  253.017995]  process_one_work+0x21e/0x680
[  253.018248]  worker_thread+0x190/0x330
[  253.018500]  ? __pfx_worker_thread+0x10/0x10
[  253.018746]  kthread+0xe7/0x120
[  253.018988]  ? __pfx_kthread+0x10/0x10
[  253.019231]  ret_from_fork+0x3c/0x60
[  253.019468]  ? __pfx_kthread+0x10/0x10
[  253.019701]  ret_from_fork_asm+0x1a/0x30
[  253.019939]  </TASK>

v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian).

Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface")
Cc: lin cao <lin.cao@amd.com>
Cc: Jingwen Chen <Jingwen.Chen2@amd.com>
Cc: Victor Skvortsov <victor.skvortsov@amd.com>
Cc: Zhigang Luo <zhigang.luo@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: 3.2.321
Taimur Hassan [Mon, 10 Feb 2025 00:10:14 +0000 (19:10 -0500)]
drm/amd/display: 3.2.321

Summary:

* Add support for disconnected eDP streams
* Add log for MALL entry on DCN32x
* Add DCC/Tiling reset helper for DCN and DCE
* Guard against setting dispclk low when active
* Other minor fixes

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Add support for disconnected eDP streams
Harry VanZyllDeJong [Fri, 7 Feb 2025 18:46:53 +0000 (13:46 -0500)]
drm/amd/display: Add support for disconnected eDP streams

[Why]
eDP may not be connected to the GPU on driver start causing
fail enumeration.

[How]
Move the virtual signal type check before the eDP connector
signal check.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Harry VanZyllDeJong <hvanzyll@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: dpia should avoid encoder used by dp2
Peichen Huang [Tue, 4 Feb 2025 08:00:40 +0000 (16:00 +0800)]
drm/amd/display: dpia should avoid encoder used by dp2

[WHY]
In current HPO DP2 implementation, driver would enable/disable DIG
encoder when configuring HPO DP2. Therefore, usb4 dp tunnelling should
not use the DIG encoder if the corresponded phy is used by a HPO DP2
stream.

[HOW]
A DP2 stream is treated as a dig stream.

Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Guard against setting dispclk low when active
Nicholas Kazlauskas [Mon, 3 Feb 2025 14:49:58 +0000 (09:49 -0500)]
drm/amd/display: Guard against setting dispclk low when active

[Why]
We should never apply a minimum dispclk value while in prepare_bandwidth
or while displays are active. This is always an optimization for when
all displays are disabled.

[How]
Defer dispclk optimization until safe_to_lower = true and display_count
reaches 0.

Since 0 has a special value in this logic (ie. no dispclk required)
we also need adjust the logic that clamps it for the actual request
to PMFW.

Reviewed-by: Gabe Teeger <gabe.teeger@amd.com>
Reviewed-by: Leo Chen <leo.chen@amd.com>
Reviewed-by: Syed Hassan <syed.hassan@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Fix BT2020 YCbCr limited/full range input
Ilya Bakoulin [Wed, 29 Jan 2025 19:46:27 +0000 (14:46 -0500)]
drm/amd/display: Fix BT2020 YCbCr limited/full range input

[Why]
BT2020 YCbCr input is not handled properly when full range
quantization is used and limited range is not supported at all.

[How]
- Add enums for BT2020 YCbCr limited/full range
- Add limited range CSC matrix

Reviewed-by: Krunoslav Kovac <krunoslav.kovac@amd.com>
Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Robert Mader <robert.mader@collabora.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Add log for MALL entry on DCN32x
Aurabindo Pillai [Fri, 24 Jan 2025 21:10:57 +0000 (16:10 -0500)]
drm/amd/display: Add log for MALL entry on DCN32x

[Why&How]
Add a dyndbg log entry to check whether the driver requested scanout
from MALL cache to PMFW via DMCUB

Reviewed-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Add total_num_dpps_required field to informative structure
Oleh Kuzhylnyi [Tue, 4 Feb 2025 18:33:00 +0000 (19:33 +0100)]
drm/amd/display: Add total_num_dpps_required field to informative structure

[Why]
The informative structure needs to be extended by the total number of DPPs
required per each active plane.
The new informative field is going to be used as a statistical indicator.

[How]
The dml2_core_calcs_get_informative() routine must count a total number of DPPs.

Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Oleh Kuzhylnyi <okuzhyln@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Read LTTPR ALPM caps during link cap retrieval
George Shen [Tue, 4 Feb 2025 19:34:02 +0000 (14:34 -0500)]
drm/amd/display: Read LTTPR ALPM caps during link cap retrieval

[Why]
The latest DP spec requires the DP TX to read DPCD F0000h through F0009h
when detecting LTTPR capabilities for the first time.

[How]
Update LTTPR cap retrieval to read up to F0009h (two more bytes than the
previous F0007h), and store the LTTPR ALPM capabilities.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Print seamless boot message in mark_seamless_boot_stream
Alex Hung [Tue, 4 Feb 2025 18:51:29 +0000 (11:51 -0700)]
drm/amd/display: Print seamless boot message in mark_seamless_boot_stream

[WHAT & HOW]
Add a message so users know the stream will be used for seamless boot.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Add clear DCC and Tiling callback for DCN
Rodrigo Siqueira [Tue, 4 Feb 2025 15:31:33 +0000 (08:31 -0700)]
drm/amd/display: Add clear DCC and Tiling callback for DCN

Introduce the DCC and Tiling reset callback to all DCN versions that can
call it.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Rename panic function
Rodrigo Siqueira [Tue, 4 Feb 2025 15:34:58 +0000 (08:34 -0700)]
drm/amd/display: Rename panic function

Rename dc_plane_force_update_for_panic to
dc_plane_force_dcc_and_tiling_disable to describe the function operation
in the name. Also, this function might be used in other contexts, and a
more generic name can be helpful for this purpose.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Add DCC/Tiling reset helper for DCN and DCE
Rodrigo Siqueira [Tue, 4 Feb 2025 15:07:21 +0000 (08:07 -0700)]
drm/amd/display: Add DCC/Tiling reset helper for DCN and DCE

This commit introduces a function helper for resetting DCN/DCE DCC and
tiling. Those functions are generic for their respective DCN/DCE, so
they were added to the oldest version of each architecture.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agoRevert "drm/amd/display: Request HW cursor on DCN3.2 with SubVP"
Leo Zeng [Fri, 31 Jan 2025 16:46:52 +0000 (11:46 -0500)]
Revert "drm/amd/display: Request HW cursor on DCN3.2 with SubVP"

This reverts commit 13437c91606c9232c747475e202fe3827cd53264.

Reason to revert: idle power regression found in testing.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Leo Zeng <Leo.Zeng@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Don't treat wb connector as physical in create_validate_stream_for_sink
Harry Wentland [Fri, 31 Jan 2025 16:57:49 +0000 (11:57 -0500)]
drm/amd/display: Don't treat wb connector as physical in create_validate_stream_for_sink

Don't try to operate on a drm_wb_connector as an amdgpu_dm_connector.
While dereferencing aconnector->base will "work" it's wrong and
might lead to unknown bad things. Just... don't.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Exit idle optimizations before accessing PHY
Ovidiu Bunea [Mon, 3 Feb 2025 20:43:32 +0000 (15:43 -0500)]
drm/amd/display: Exit idle optimizations before accessing PHY

[why & how]
By default, DCN HW is in idle optimized state which does not allow access
to PHY registers. If BIOS powers up the DCN, it is fine because they will
power up everything. Only exit idle optimized state when not taking control
from VBIOS.

Fixes: be704e5ef4bd ("Revert "drm/amd/display: Exit idle optimizations before attempt to access PHY"")
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Generate bad page threshold cper records
Xiang Liu [Tue, 11 Feb 2025 11:45:52 +0000 (19:45 +0800)]
drm/amdgpu: Generate bad page threshold cper records

Generate CPER record when bad page threshold exceed and
commit to CPER ring.

v2: return -ENOMEM instead of false
v2: check return value of fill section function

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Commit CPER entry
Xiang Liu [Wed, 12 Feb 2025 12:17:11 +0000 (20:17 +0800)]
drm/amdgpu: Commit CPER entry

Commit the CPER entry to the ring buffer.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: add mutex lock for cper ring
Tao Zhou [Mon, 10 Feb 2025 07:28:37 +0000 (15:28 +0800)]
drm/amdgpu: add mutex lock for cper ring

Avoid the confliction between read and write of ring buffer.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/pm: Limit jpeg rings as per max for jpeg_v_4_0_3
Asad Kamal [Fri, 14 Feb 2025 05:54:01 +0000 (13:54 +0800)]
drm/amd/pm: Limit jpeg rings as per max for jpeg_v_4_0_3

Since pmfw supports for smuv13_0_6 is limited to 8 jpeg rings per instance,
which is the max for jpeg_v_4_0_3. Limit it to same to avoid out
of bound access.

Fixes: 568199a5c7a9 ("drm/amd/pm: Limit to 8 jpeg rings per instance")
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: add data write function for CPER ring
Tao Zhou [Wed, 22 Jan 2025 09:08:11 +0000 (17:08 +0800)]
drm/amdgpu: add data write function for CPER ring

Old CPER data will be overwritten if ring buffer is full, and read
pointer always points to CPER header.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: read CPER ring via debugfs
Tao Zhou [Wed, 22 Jan 2025 08:57:53 +0000 (16:57 +0800)]
drm/amdgpu: read CPER ring via debugfs

We read CPER data from read pointer to write pointer without changing
the pointers.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: add RAS CPER ring buffer
Tao Zhou [Wed, 22 Jan 2025 08:55:51 +0000 (16:55 +0800)]
drm/amdgpu: add RAS CPER ring buffer

And initialize it, this is a pure software ring to store RAS CPER data.

v2: change ring size to 0x100000
v2: update the initialization of count_dw of cper ring, it's dword
variable
v3: skip VM inv eng for cper
v3: init/fini when aca enabled

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Get timestamp from system time
Xiang Liu [Tue, 11 Feb 2025 03:39:06 +0000 (11:39 +0800)]
drm/amdgpu: Get timestamp from system time

Get system local time and encode it to timestamp for CPER.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/mes12: allocate hw_resource_1 buffer once
Alex Deucher [Fri, 14 Feb 2025 15:18:21 +0000 (10:18 -0500)]
drm/amdgpu/mes12: allocate hw_resource_1 buffer once

Allocate the buffer at sw init time so we don't alloc
and free it for every suspend/resume or reset cycle.

Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu/mes11: allocate hw_resource_1 buffer once
Alex Deucher [Fri, 14 Feb 2025 15:11:17 +0000 (10:11 -0500)]
drm/amdgpu/mes11: allocate hw_resource_1 buffer once

Allocate the buffer at sw init time so we don't alloc
and free it for every suspend/resume or reset cycle.

Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/display: Reapply 2fde4fdddc1f
Nathan Chancellor [Wed, 24 Jul 2024 15:49:35 +0000 (08:49 -0700)]
drm/amd/display: Reapply 2fde4fdddc1f

Commit 2563391e57b5 ("drm/amd/display: DML2.1 resynchronization") blew
away the compiler warning fix from commit 2fde4fdddc1f
("drm/amd/display: Avoid -Wenum-float-conversion in
add_margin_and_round_to_dfs_grainularity()"), causing the warning to
reappear.

  drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_dpmm/dml2_dpmm_dcn4.c:183:58: error: arithmetic between enumeration type 'enum dentist_divider_range' and floating-point type 'double' [-Werror,-Wenum-float-conversion]
    183 |         divider = (unsigned int)(DFS_DIVIDER_RANGE_SCALE_FACTOR * (vco_freq_khz / clock_khz));
        |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
  1 error generated.

Apply the fix again to resolve the warning.

Fixes: 1b30456150e5 ("drm/amd/display: DML21 Reintegration")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Generate cper records
Hawking Zhang [Tue, 11 Feb 2025 11:54:05 +0000 (19:54 +0800)]
drm/amdgpu: Generate cper records

Encode the error information in CPER format and commit
to the cper ring

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <keivnyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdkfd: Fix user queue validation on Gfx7/8
Philip Yang [Wed, 29 Jan 2025 17:37:30 +0000 (12:37 -0500)]
drm/amdkfd: Fix user queue validation on Gfx7/8

To workaround queue full h/w issue on Gfx7/8, when application create
AQL queue, the ring buffer bo allocate size is queue_size/2 and
map queue_size ring buffer to GPU in 2 pieces using 2 attachments, each
attachment map size is queue_size/2, with same ring_bo backing memory.

For Gfx7/8, user queue buffer validation should use queue_size/2 to
verify ring_bo allocation and mapping size.

Fixes: 68e599db7a54 ("drm/amdkfd: Validate user queue buffers")
Suggested-by: Tomáš Trnka <trnka@scm.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Introduce funcs for generating cper record
Hawking Zhang [Sun, 26 Jan 2025 09:15:48 +0000 (17:15 +0800)]
drm/amdgpu: Introduce funcs for generating cper record

Introduce new functions that are used to generate
cper ue or ce records.

v2: return -ENOMEM instead of false
v2: check return value of fill section function

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Yang Wang <keivnyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Include ACA error type in aca bank
Hawking Zhang [Sun, 26 Jan 2025 08:32:57 +0000 (16:32 +0800)]
drm/amdgpu: Include ACA error type in aca bank

ACA error types managed by driver a direct 1:1
correspondence with those managed by firmware.

To address this, for each ACA bank, include
both the ACA error type and the ACA SMU type.

This addition is useful for creating CPER records.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <keivnyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Optimize the enablement of GECC
Candice Li [Tue, 11 Feb 2025 01:58:24 +0000 (09:58 +0800)]
drm/amdgpu: Optimize the enablement of GECC

Enable GECC only when the default memory ECC mode or
the module parameter amdgpu_ras_enable is activated.

v2: Add kernel message to remind users explicitly set
    amdgpu_ras_enable=1 before driver loading to enable GECC
    and set amdgpu_ras_enable=0 to disable GECC when GECC is
    currently enabled if needed.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amdgpu: Introduce funcs for populating CPER
Hawking Zhang [Fri, 24 Jan 2025 15:37:33 +0000 (23:37 +0800)]
drm/amdgpu: Introduce funcs for populating CPER

Introduce utility functions designed to assist
in populating CPER records.

v2: call cper_init/fini in device_ip_init/fini.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 months agodrm/amd/include: Add amd cper header
Hawking Zhang [Fri, 24 Jan 2025 15:31:10 +0000 (23:31 +0800)]
drm/amd/include: Add amd cper header

AMD is using Common Platform Error Record (CPER) format
to report all gpu hardware errors.

v2: add program attribute

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>