linux-block.git
19 months agoMerge tag 'amd-drm-next-6.3-2023-01-27' of https://gitlab.freedesktop.org/agd5f/linux...
Dave Airlie [Mon, 30 Jan 2023 05:37:55 +0000 (15:37 +1000)]
Merge tag 'amd-drm-next-6.3-2023-01-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.3-2023-01-27:

amdgpu:
- GC11 fixes
- SMU13 fixes
- Freesync fixes
- DP MST fixes
- DP MST code rework and cleanup
- AV1 fixes for VCN4
- DCN 3.2.x fixes
- PSR fixes
- DML optimizations
- DC link code rework

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127225917.2419162-1-alexander.deucher@amd.com
19 months agoMerge tag 'drm/tegra/for-6.3-rc1' of https://gitlab.freedesktop.org/drm/tegra into...
Dave Airlie [Mon, 30 Jan 2023 04:17:06 +0000 (14:17 +1000)]
Merge tag 'drm/tegra/for-6.3-rc1' of https://gitlab.freedesktop.org/drm/tegra into drm-next

drm/tegra: Changes for v6.3-rc1

This set of changes includes a rework of the custom syncpoint interrupt
code to take better advantage of existing DRM/KMS infrastructure.

Various cleanups and fixes are also included.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127170119.495943-1-thierry.reding@gmail.com
19 months agoMerge tag 'drm-next-20230127' of git://git.kernel.org/pub/scm/linux/kernel/git/pincha...
Dave Airlie [Mon, 30 Jan 2023 03:49:46 +0000 (13:49 +1000)]
Merge tag 'drm-next-20230127' of git://git./linux/kernel/git/pinchartl/linux into drm-next

Renesas R-Car DU fixes and improvements

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y9QCw3SkHm6k1bwJ@pendragon.ideasonboard.com
19 months agoMerge tag 'drm-intel-next-2023-01-27' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Mon, 30 Jan 2023 03:35:56 +0000 (13:35 +1000)]
Merge tag 'drm-intel-next-2023-01-27' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

drm/i915 feature pull #2 v6.3:

Features and functionality:
- Enable HF-EEODB by switching HDMI, DP and LVDS to use struct drm_edid (Jani)
- Start using unversioned DMC firmware paths for new platforms (Gustavo)

Refactoring and cleanups:
- ELD refactor: Stop using hardware buffer, precompute ELD, and wire up ELD in
  the state checker (Ville)
- Use generics for debugfs device parameters (Jani)
- DSB refactoring and fixes (Ville)
- Header refactoring, add new intel_display_limits.h (Jani)
- Split out GMCH code to a new file (Jani)
- Split out vblank code to a new file (Jani)
- i915_drv.h and struct drm_i915_private cleanups (Jani)
- Simplify FBC and DRRS debug attributes (Deepak R Varma)
- Remove some single-use macros (Rodrigo)

Fixes:
- Fix scaler limits for display versions 12 and 13 (Luca)
- Fix plane source size check for zero height (Drew Davenport)
- Implement PSR2 selective fetch workaround (Jouni)
- Expand a PSR workaround to more platforms and pipes (Jouni)
- Expand an HDMI infoframe workaround to all MTL steppings (Jouni)
- Enable PIPEDMC whenever its corresponding pipe is enabled (Imre)

Merges:
- Backmerge drm-next (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87tu0c44gv.fsf@intel.com
19 months agoMerge tag 'drm-habanalabs-next-2023-01-26' of https://git.kernel.org/pub/scm/linux...
Dave Airlie [Mon, 30 Jan 2023 02:43:10 +0000 (12:43 +1000)]
Merge tag 'drm-habanalabs-next-2023-01-26' of https://git./linux/kernel/git/ogabbay/linux into drm-next

This tag contains habanalabs driver and accel changes for v6.3:

- Moved the driver to the accel subsystem. For now, only the files were
  moved (including the uapi file, which was also renamed). This doesn't
  include registering with the accel subsystem, which will probably happen
  only in the next kernel version.

- In case of a decoder error (AXI error) in Gaudi2, we can now find the
  exact IP that initiated the erroneous transaction and print its details
  to ease debugging.

- Add more trace events. We can now trace MMIO transactions and
  communication with the preboot firmware.

- Add Gaudi2 support for abrupt reset performed by the firmware. So far
  this was supported only for Gaudi1.

- Add uAPI to flush memory transactions (to the device memory). This is
  needed by the communications library when doing p2p with a host NIC
  that accesses our HBM directly through the PCI BAR.

- Add uAPI to pass through a request from user-space to the firmware and get
  the result back to user-space. This spares the driver from adding a new
  packet (in the communication channel with the firmware) for every new
  request type.

- Remove the option to export dma-buf by memory allocation handle in our uAPI.
  This was planned for Gaudi2 but was never used. Instead, we will export
  by memory address (same as Gaudi1). In addition, we added the option to
  specify an offset to the address. This is needed in Gaudi2 because there
  the user allocates the entire HBM in one allocation, but would like to
  export only a small part of it.

- Multiple bug fixes, refactors and small optimizations.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Oded Gabbay <ogabbay@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230126213317.GA1520525@ogabbay-vm-u20.habana-labs.com
19 months agoMerge tag 'drm-misc-next-2023-01-26' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Mon, 30 Jan 2023 01:26:08 +0000 (11:26 +1000)]
Merge tag 'drm-misc-next-2023-01-26' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for v6.3:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:

 * fbdev-helper: Streamline code in generic fbdev and its helpers

 * TTM: Fixes plus their reverts

Driver Changes:

 * accel/ivpu: Typo fixes

 * i915: TTM-related fixes

 * nouveau: Remove unused return value from disable helper

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/Y9I2nOzHxTxPeTjg@linux-uq9g
19 months agoamdgpu: fix build on non-DCN platforms.
Dave Airlie [Fri, 27 Jan 2023 02:15:13 +0000 (12:15 +1000)]
amdgpu: fix build on non-DCN platforms.

This fixes the build here locally on my 32-bit arm build.

Signed-off-by: Dave Airlie <airlied@redhat.com>
(cherry picked from commit f439a959dcfb6b39d6fd4b85ca1110a1d1de1587)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
19 months agodrm/tegra: nvdec: Use tegra_dev_iommu_get_stream_id()
Thierry Reding [Thu, 17 Nov 2022 12:34:23 +0000 (13:34 +0100)]
drm/tegra: nvdec: Use tegra_dev_iommu_get_stream_id()

Use the newly implemented tegra_dev_iommu_get_stream_id() helper to
encapsulate and centralize the IOMMU stream ID access.

Signed-off-by: Thierry Reding <treding@nvidia.com>
19 months agodrm/tegra: vic: Use tegra_dev_iommu_get_stream_id()
Thierry Reding [Thu, 17 Nov 2022 12:35:20 +0000 (13:35 +0100)]
drm/tegra: vic: Use tegra_dev_iommu_get_stream_id()

Use the newly implemented tegra_dev_iommu_get_stream_id() helper to
encapsulate and centralize the IOMMU stream ID access.

Signed-off-by: Thierry Reding <treding@nvidia.com>
19 months agodrm/tegra: Use tegra_dev_iommu_get_stream_id()
Thierry Reding [Thu, 17 Nov 2022 12:34:50 +0000 (13:34 +0100)]
drm/tegra: Use tegra_dev_iommu_get_stream_id()

Use the newly implemented tegra_dev_iommu_get_stream_id() helper to
encapsulate and centralize the IOMMU stream ID access.

Signed-off-by: Thierry Reding <treding@nvidia.com>
19 months agogpu: host1x: Use tegra_dev_iommu_get_stream_id()
Thierry Reding [Thu, 17 Nov 2022 12:36:19 +0000 (13:36 +0100)]
gpu: host1x: Use tegra_dev_iommu_get_stream_id()

Use the newly implemented tegra_dev_iommu_get_stream_id() helper to
encapsulate and centralize the IOMMU stream ID access.

Signed-off-by: Thierry Reding <treding@nvidia.com>
19 months agodrm/tegra: Remove #ifdef guards for PM related functions
Paul Cercueil [Tue, 29 Nov 2022 19:19:36 +0000 (19:19 +0000)]
drm/tegra: Remove #ifdef guards for PM related functions

Use the RUNTIME_PM_OPS() and pm_ptr() macros to handle the
.runtime_suspend/.runtime_resume callbacks.

These macros allow the suspend and resume functions to be automatically
dropped by the compiler when CONFIG_PM is disabled, without having
to use #ifdef guards.

This has the advantage of always compiling these functions in,
independently of any Kconfig option. Thanks to that, bugs and other
regressions are subsequently easier to catch.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thierry Reding <treding@nvidia.com>
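The pm_ptr() approach described above can be imitated in plain userspace C to show why no #ifdef is needed. This is a minimal sketch, not the kernel's definitions: `MY_IS_ENABLED_PM` stands in for `IS_ENABLED(CONFIG_PM)`, and `my_pm_ptr()` and the `my_*` callbacks are hypothetical. Because the condition is a compile-time constant, the callbacks are always parsed and type-checked, yet the optimizer is free to discard them when nothing live references them.

```c
#include <stddef.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_PM); 0 models a build
 * without power management support. */
#define MY_IS_ENABLED_PM 0

/* Sketch of the pm_ptr() idea: yield the pointer only when PM is enabled,
 * otherwise NULL. No preprocessor conditional hides the callbacks. */
#define my_pm_ptr(ptr) (MY_IS_ENABLED_PM ? (ptr) : NULL)

struct my_pm_ops {
	int (*runtime_suspend)(void *dev);
	int (*runtime_resume)(void *dev);
};

/* Always compiled, so a typo here breaks every build, not just PM builds. */
static int my_runtime_suspend(void *dev) { (void)dev; return 0; }
static int my_runtime_resume(void *dev) { (void)dev; return 0; }

static const struct my_pm_ops my_ops = {
	.runtime_suspend = my_runtime_suspend,
	.runtime_resume = my_runtime_resume,
};

/* What the driver would register; NULL when MY_IS_ENABLED_PM is 0. */
static const struct my_pm_ops *my_driver_pm = my_pm_ptr(&my_ops);
```

Flipping `MY_IS_ENABLED_PM` to 1 makes `my_driver_pm` point at `my_ops`; either way the callbacks are compiled, which is the regression-catching benefit the commit describes.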
19 months agodrm/tegra: Remove redundant null checks before kfree
Yushan Zhou [Tue, 29 Nov 2022 09:45:46 +0000 (17:45 +0800)]
drm/tegra: Remove redundant null checks before kfree

Fix the following coccicheck warning:
./drivers/gpu/drm/tegra/submit.c:689:2-7: WARNING:
NULL check before some freeing functions is not needed.

Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
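The coccicheck warning rests on the fact that kfree(NULL) is a no-op, just as free(NULL) is in standard C. A userspace sketch using free(); the struct and the `release_buffer()` helper are hypothetical:

```c
#include <stdlib.h>

struct job {
	char *buf;
};

/* Redundant form flagged by the check:
 *     if (j->buf)
 *         free(j->buf);
 * free(NULL) is defined to do nothing, so the guard can simply go. */
static void release_buffer(struct job *j)
{
	free(j->buf);
	j->buf = NULL; /* avoid a dangling pointer on repeated calls */
}
```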
19 months agogpu: host1x: External timeout/cancellation for fences
Mikko Perttunen [Thu, 19 Jan 2023 13:09:21 +0000 (15:09 +0200)]
gpu: host1x: External timeout/cancellation for fences

Currently all fences have a 30-second timeout to ensure they are
cleaned up if the fence never completes otherwise. However, this
one-size-fits-all solution doesn't fit every case, such as syncpoint
waiting, where we want to allow timeouts longer than 30 seconds.
As such, we want to give the caller control over fence cancellation
(and maybe eventually get rid of the internal timeout altogether).

Here we add this cancellation mechanism by essentially adding a
function for entering the timeout path by function call, and changing
the syncpoint wait function to use it.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
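One way to picture caller-controlled cancellation is a fence whose timeout path is an ordinary function that the waiter calls once it decides to give up. The sketch below is hypothetical (names, states and the use of -ETIMEDOUT are assumptions, not the host1x API) and single-threaded; a real implementation needs locking or atomics so that signaling from interrupt context and cancellation cannot race.

```c
#include <errno.h>
#include <stdbool.h>

enum my_fence_state { MY_FENCE_PENDING, MY_FENCE_SIGNALED, MY_FENCE_CANCELLED };

struct my_fence {
	enum my_fence_state state;
	int error; /* 0 when signaled normally, negative errno when cancelled */
};

/* Completion path, e.g. reached from the syncpoint interrupt. */
static bool my_fence_signal(struct my_fence *f)
{
	if (f->state != MY_FENCE_PENDING)
		return false; /* already completed or cancelled */
	f->state = MY_FENCE_SIGNALED;
	f->error = 0;
	return true;
}

/* Timeout path entered by function call: the waiter, not a fixed internal
 * timer, decides when the fence should be abandoned. */
static bool my_fence_cancel(struct my_fence *f, int error)
{
	if (f->state != MY_FENCE_PENDING)
		return false; /* too late, the fence already completed */
	f->state = MY_FENCE_CANCELLED;
	f->error = error;
	return true;
}
```

Whichever of signal or cancel runs first wins, so a syncpoint wait can choose any timeout and then call `my_fence_cancel(f, -ETIMEDOUT)` itself.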
19 months agogpu: host1x: Rewrite syncpoint interrupt handling
Mikko Perttunen [Thu, 19 Jan 2023 13:09:20 +0000 (15:09 +0200)]
gpu: host1x: Rewrite syncpoint interrupt handling

Move from the old, complex intr handling code to a new implementation
based on dma_fences. While there is a fair bit of churn to get there,
the new implementation is much simpler and likely faster as well due
to allowing signaling directly from interrupt context.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
19 months agogpu: host1x: Implement job tracking using DMA fences
Mikko Perttunen [Thu, 19 Jan 2023 13:09:19 +0000 (15:09 +0200)]
gpu: host1x: Implement job tracking using DMA fences

In anticipation of removal of the intr API, implement job tracking
using DMA fences instead. The two main parts of this are making
cdma_update schedule the work, since fence completion can now be
called from interrupt context, and some added complexity in ensuring
the callback is not running when we free the fence.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
19 months agogpu: host1x: Implement syncpoint wait using DMA fences
Mikko Perttunen [Thu, 19 Jan 2023 13:09:18 +0000 (15:09 +0200)]
gpu: host1x: Implement syncpoint wait using DMA fences

In anticipation of removal of the intr API, move host1x_syncpt_wait
to use DMA fences instead. As of this patch, this means that waits
have a 30 second maximum timeout because of the implicit timeout
we have with fences, but that will be lifted in a follow-up patch.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
19 months agodrm/tegra: firewall: Check for is_addr_reg existence in IMM check
Mikko Perttunen [Thu, 19 Jan 2023 13:39:01 +0000 (15:39 +0200)]
drm/tegra: firewall: Check for is_addr_reg existence in IMM check

In the IMM opcode check, don't call is_addr_reg if it's not set.

Fixes: 8cc95f3fd35e ("drm/tegra: Add job firewall")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
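The fix amounts to treating `is_addr_reg` as an optional callback. A minimal sketch with a hypothetical ops table and example callback; treating a missing callback as "not an address register" is this sketch's assumption:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client ops table; is_addr_reg may be left unset. */
struct my_client_ops {
	bool (*is_addr_reg)(unsigned int offset);
};

/* Hypothetical example callback: offset 0x2b holds an address. */
static bool my_example_is_addr_reg(unsigned int offset)
{
	return offset == 0x2b;
}

/* Only call the callback when it exists; otherwise a NULL function
 * pointer would be dereferenced. */
static bool my_imm_targets_addr_reg(const struct my_client_ops *ops,
				    unsigned int offset)
{
	return ops->is_addr_reg != NULL && ops->is_addr_reg(offset);
}
```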
19 months agogpu: host1x: Don't skip assigning syncpoints to channels
Mikko Perttunen [Thu, 19 Jan 2023 13:39:00 +0000 (15:39 +0200)]
gpu: host1x: Don't skip assigning syncpoints to channels

The code to write the syncpoint channel assignment register
incorrectly skips the write if hypervisor registers are not available.

The register, however, is within the guest aperture so remove the
check and assign syncpoints properly even on virtualized systems.

Fixes: c3f52220f276 ("gpu: host1x: Enable Tegra186 syncpoint protection")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
19 months agogpu: host1x: Fix mask for syncpoint increment register
Mikko Perttunen [Thu, 19 Jan 2023 13:38:59 +0000 (15:38 +0200)]
gpu: host1x: Fix mask for syncpoint increment register

On Tegra186+, the syncpoint ID has 10 bits of space. To allow
using more than 256 syncpoints, fix the mask.

Fixes: 9abdd497cd0a ("gpu: host1x: Tegra234 device data and headers")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
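The arithmetic behind the fix: an 8-bit mask (0xff) can express 2^8 = 256 IDs, while a 10-bit field needs 0x3ff for 2^10 = 1024 IDs, so with the old mask ID 300 silently aliases to 300 & 0xff = 44. The sketch assumes, for illustration only, that the ID sits in the low bits of the register value; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define MY_SYNCPT_ID_MASK_8BIT  0xffu  /* old mask: IDs 0..255 only */
#define MY_SYNCPT_ID_MASK_10BIT 0x3ffu /* 10 bits: IDs 0..1023 */

/* Encode a syncpoint ID into the (assumed) low-bit field of the
 * increment register value. */
static uint32_t my_encode_incr(uint32_t id, uint32_t mask)
{
	return id & mask;
}
```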
19 months agoMAINTAINERS: Update Tegra DRM tree
Thierry Reding [Thu, 26 Jan 2023 11:41:17 +0000 (12:41 +0100)]
MAINTAINERS: Update Tegra DRM tree

The Tegra DRM tree moved to freedesktop.org's gitlab a few releases ago,
so update the MAINTAINERS entry accordingly.

Signed-off-by: Thierry Reding <treding@nvidia.com>
19 months agodrm/i915/mtl: Apply Wa_14013475917 for all MTL steppings
Jouni Högander [Tue, 24 Jan 2023 10:26:36 +0000 (12:26 +0200)]
drm/i915/mtl: Apply Wa_14013475917 for all MTL steppings

Wa_14013475917 has to be applied for all MTL steppings.

Bspec: 66624

Cc: Mika Kahola <mika.kahola@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230124102636.2567292-3-jouni.hogander@intel.com
19 months agodrm/i915/psr: Implement Wa_14014971492
Jouni Högander [Tue, 24 Jan 2023 10:26:35 +0000 (12:26 +0200)]
drm/i915/psr: Implement Wa_14014971492

Implement Wa_14014971492 and apply it for affected platforms.

Bspec: 52890, 54369, 55378, 66624

v2: Adjust platforms where applied

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230124102636.2567292-2-jouni.hogander@intel.com
19 months agodrm/i915/panel: move panel fixed EDID to struct intel_panel
Jani Nikula [Wed, 25 Jan 2023 11:10:52 +0000 (13:10 +0200)]
drm/i915/panel: move panel fixed EDID to struct intel_panel

It's a bit confusing to have two cached EDIDs in struct intel_connector
with slightly different purposes. Make the distinction a bit clearer by
moving the EDID cached for eDP and LVDS panels at connector init time to
struct intel_panel, and name it fixed_edid. That's what it is, a fixed
EDID for the panels.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/328350ef918638928a8286cdbab3107c8258332d.1674643465.git.jani.nikula@intel.com
19 months agodrm/i915/opregion: convert intel_opregion_get_edid() to struct drm_edid
Jani Nikula [Wed, 25 Jan 2023 11:10:51 +0000 (13:10 +0200)]
drm/i915/opregion: convert intel_opregion_get_edid() to struct drm_edid

Simplify validation and use by converting to drm_edid.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6abb01f1e97d54a3c11bec24377f035df412b492.1674643465.git.jani.nikula@intel.com
19 months agodrm/i915/bios: convert intel_bios_init_panel() to drm_edid
Jani Nikula [Wed, 25 Jan 2023 11:10:50 +0000 (13:10 +0200)]
drm/i915/bios: convert intel_bios_init_panel() to drm_edid

Try to use struct drm_edid where possible, even if having to fall back
to looking into struct edid down low via drm_edid_raw().

v2: Rebase

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/897807d62f74f690a173ecd405e25c6ccdd63b98.1674643465.git.jani.nikula@intel.com
19 months agodrm/i915/edid: convert DP, HDMI and LVDS to drm_edid
Jani Nikula [Wed, 25 Jan 2023 11:10:49 +0000 (13:10 +0200)]
drm/i915/edid: convert DP, HDMI and LVDS to drm_edid

Convert all the connectors that use cached connector edid and
detect_edid to drm_edid.

Since drm_get_edid() calls drm_connector_update_edid_property() while
drm_edid_read*() do not, we need to call drm_edid_connector_update()
separately, in part due to the EDID caching behaviour in HDMI and
DP. Especially DP depends on the details parsed from EDID. (The big
behavioural change conflating EDID reading with parsing and property
update was done in commit 5186421cbfe2 ("drm: Introduce epoch counter to
drm_connector"))

v6: Rebase on drm_edid_connector_add_modes()

v5: Fix potential uninitialized var use (kernel test robot <lkp@intel.com>)

v4: Call drm_edid_connector_update() after reading HDMI/DP EDID

v3: Don't leak vga switcheroo EDID in LVDS init (Ville)

v2: Don't leak opregion fallback EDID (Ville)

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/eabb4de932841b38b34cc2818ea9fbf7c10224fd.1674643465.git.jani.nikula@intel.com
19 months agohabanalabs: Fix list of /sys/class/habanalabs/hl<n>/status
Bagas Sanjaya [Fri, 20 Jan 2023 12:35:33 +0000 (19:35 +0700)]
habanalabs: Fix list of /sys/class/habanalabs/hl<n>/status

Stephen Rothwell reported htmldocs warnings when merging the accel tree:

Documentation/ABI/testing/sysfs-driver-habanalabs:201: ERROR: Unexpected indentation.
Documentation/ABI/testing/sysfs-driver-habanalabs:201: WARNING: Block quote ends without a blank line; unexpected unindent.
Documentation/ABI/testing/sysfs-driver-habanalabs:201: ERROR: Unexpected indentation.
Documentation/ABI/testing/sysfs-driver-habanalabs:201: WARNING: Block quote ends without a blank line; unexpected unindent.

Fix these by fixing alignment of list of card status returned by
/sys/class/habanalabs/hl<n>/status.

Link: https://lore.kernel.org/linux-next/20230120130634.61c3e857@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agoDocumentation: accel: escape wildcard in special file path
Bagas Sanjaya [Fri, 20 Jan 2023 12:35:32 +0000 (19:35 +0700)]
Documentation: accel: escape wildcard in special file path

Stephen Rothwell reported an htmldocs warning when merging the accel tree:

Documentation/accel/introduction.rst:72: WARNING: Inline emphasis start-string without end-string.

Sphinx confuses the file wildcards with inline emphasis (italics), hence
the warning.

Fix the warning by escaping wildcards.

Link: https://lore.kernel.org/linux-next/20230120132116.21de1104@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agodocs: accel: Fix debugfs path
Jeffrey Hugo [Thu, 19 Jan 2023 16:26:08 +0000 (09:26 -0700)]
docs: accel: Fix debugfs path

The device specific directory in debugfs does not have "accel".  For
example, the documentation says device 0 should have a debugfs entry as
/sys/kernel/debug/accel/accel0/ but in reality the entry is
/sys/kernel/debug/accel/0/

Fix the documentation to match the implementation.

Fixes: 8c5577a5ccc6 ("doc: add documentation for accel subsystem")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: find decode error root cause
Koby Elbaz [Sun, 15 Jan 2023 10:38:53 +0000 (12:38 +0200)]
habanalabs/gaudi2: find decode error root cause

When a decode error happens, we often don't know the exact root
cause (the erroneous address that was accessed) and the exact engine
that created the erroneous transaction.

To find out, we need to go over all the relevant register blocks
in the ASIC. Once we find the relevant engine, we print its details
and the offending address.

This helps tremendously when debugging an error that was created
by running a user workload.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
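Conceptually, the root-cause search is a scan over per-engine error-capture blocks until one reports a latched error. A hypothetical sketch in which MMIO reads are replaced by an in-memory array; the struct layout and field names are assumptions, not the Gaudi2 register map:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one engine's error-capture registers. */
struct my_engine_err {
	const char *name;
	uint32_t err_valid; /* nonzero if this engine latched an error */
	uint64_t err_addr;  /* offending address captured by the hardware */
};

/* Walk every engine block and return the first one that reports an
 * error, or NULL if none did. */
static const struct my_engine_err *
my_find_decode_err(const struct my_engine_err *blocks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (blocks[i].err_valid)
			return &blocks[i];
	return NULL;
}
```

A caller would then print the returned engine's `name` and `err_addr`, which is the information the commit says is printed.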
19 months agohabanalabs/gaudi2: unsecure tpc kernel_config registers
Ofir Bitton [Wed, 18 Jan 2023 07:36:06 +0000 (09:36 +0200)]
habanalabs/gaudi2: unsecure tpc kernel_config registers

This is required in order to allow the kernel to control relevant
configuration space via load and store instructions.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: clear in_compute_reset when escalating to hard reset
Tomer Tayar [Tue, 17 Jan 2023 17:45:24 +0000 (19:45 +0200)]
habanalabs: clear in_compute_reset when escalating to hard reset

If the device is reset upon release while the release watchdog work is
scheduled, the compute reset is replaced with a hard reset.
In this case, the in_compute_reset indication in the device reset
information structure needs to be cleared.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: run error handling if scrub_device_mem fails after reset
Tomer Tayar [Tue, 17 Jan 2023 13:50:41 +0000 (15:50 +0200)]
habanalabs: run error handling if scrub_device_mem fails after reset

If device memory scrubbing from hl_device_reset() fails, we return an
error code without running the error handling code. Fix this by
running the error handling path in that case.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: enhance info printed on FW load errors
Moti Haimovski [Tue, 3 Jan 2023 08:28:24 +0000 (10:28 +0200)]
habanalabs: enhance info printed on FW load errors

This commit enhances the firmware-load error messages to also report
the type of error that occurred, in order to ease debugging of errors
detected during firmware load.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: optimize command submission completion timestamp
Ofir Bitton [Tue, 10 Jan 2023 09:41:39 +0000 (11:41 +0200)]
habanalabs: optimize command submission completion timestamp

The completion timestamp is taken during the actual command submission
release. As the release happens in a work queue, the timestamp taken
is not accurate. Hence, we now take the timestamp in the interrupt
handler itself and propagate it to the release function.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
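The timestamp change can be modeled as capturing the time in the interrupt path and carrying it to the deferred release. In this hypothetical sketch, clock readings are passed in as arguments so the effect is deterministic; the struct and function names are assumptions:

```c
#include <stdint.h>

/* Hypothetical command-submission record. */
struct my_cs {
	int64_t completion_ns; /* filled in by the interrupt handler */
};

/* Interrupt path: record the time read when the completion arrived. */
static void my_irq_handler(struct my_cs *cs, int64_t irq_time_ns)
{
	cs->completion_ns = irq_time_ns;
}

/* Deferred release, running later from a work queue. Sampling the clock
 * here (release_time_ns) would include the queueing delay, so the value
 * captured in the handler is reported instead. */
static int64_t my_cs_release(const struct my_cs *cs, int64_t release_time_ns)
{
	(void)release_time_ns;
	return cs->completion_ns;
}
```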
19 months agohabanalabs: refactor user interrupt type
Ofir Bitton [Mon, 16 Jan 2023 18:20:22 +0000 (20:20 +0200)]
habanalabs: refactor user interrupt type

In order to support more user interrupt types in the future, we
enumerate the user interrupt type instead of using a boolean.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
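A sketch of a bool-to-enum refactor of this kind, with hypothetical names: a flag such as `bool is_decoder` can only tell two kinds apart, whereas the enum leaves room for new kinds, and a switch over it keeps each kind's handling explicit.

```c
/* Replaces a hypothetical "bool is_decoder" flag. */
enum my_user_irq_type {
	MY_IRQ_TYPE_USER,
	MY_IRQ_TYPE_DECODER,
	MY_IRQ_TYPE_COUNT /* keep last; new types go above */
};

struct my_user_irq {
	enum my_user_irq_type type;
};

static const char *my_irq_type_name(enum my_user_irq_type t)
{
	switch (t) {
	case MY_IRQ_TYPE_USER:
		return "user";
	case MY_IRQ_TYPE_DECODER:
		return "decoder";
	default:
		return "unknown";
	}
}
```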
19 months agohabanalabs/gaudi2: fix emda range registers razwi handling
Dani Liberman [Mon, 16 Jan 2023 10:00:05 +0000 (12:00 +0200)]
habanalabs/gaudi2: fix emda range registers razwi handling

Handling edma razwi is different from all other engines, since edma
uses sft routers. For hbw transactions, the sft router contains a separate
interface for each edma, while for lbw there is a common interface for
both edma engines of the same dcore.

To handle the razwi correctly we need to:
1. Simplify the calculation of the sft router address.
2. Add razwi handling for edma qm errors, since the edma qman doesn't
   report an axi error response.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: block soft-reset on an unusable device
Koby Elbaz [Wed, 11 Jan 2023 13:43:21 +0000 (15:43 +0200)]
habanalabs: block soft-reset on an unusable device

A device with status malfunction indicates that it can't be used.
In such a case we do not support certain reset types, e.g.,
all kinds of soft-resets (compute reset, inference soft-reset),
and reset upon device release.

A hard-reset is the only way for an unusable device to change its
status. Any other reset procedure might ultimately cause the device
to change its status, unintentionally, back to operational.

Such a scenario recently occurred when a user requested a hard-reset
while another heavy user workload was ongoing (so the reset request
was queued).
Since the workload couldn't finish within the reset's timeout limits,
the reset failed and set the device status to malfunction.
Eventually, when the user released the FD, an unsuccessful soft-reset
occurred, followed by an additional hard-reset that changed the
ASIC's status back to operational.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
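The resulting policy can be sketched as a gate at the top of the reset path. This is a hypothetical model, not the habanalabs code: the enum names, the `hw_reset_succeeded` parameter standing in for the hardware outcome, and the -EPERM/-ETIMEDOUT return values are all assumptions.

```c
#include <errno.h>
#include <stdbool.h>

enum my_dev_status { MY_DEV_OPERATIONAL, MY_DEV_MALFUNCTION };

enum my_reset_type {
	MY_RESET_HARD,
	MY_RESET_COMPUTE,        /* soft reset */
	MY_RESET_INFERENCE_SOFT, /* soft reset */
	MY_RESET_ON_RELEASE,
};

static int my_device_reset(enum my_dev_status *status,
			   enum my_reset_type type, bool hw_reset_succeeded)
{
	/* An unusable device may only leave that state via a hard reset. */
	if (*status == MY_DEV_MALFUNCTION && type != MY_RESET_HARD)
		return -EPERM;

	if (!hw_reset_succeeded) {
		*status = MY_DEV_MALFUNCTION; /* e.g. workload outlived the timeout */
		return -ETIMEDOUT;
	}

	*status = MY_DEV_OPERATIONAL;
	return 0;
}
```

Replaying the reported scenario: a failed hard reset marks the device as malfunctioning, the reset on FD release is then refused instead of running, and only a later successful hard reset restores the operational status.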
19 months agohabanalabs/gaudi2: print page fault axi transaction id
Dani Liberman [Tue, 10 Jan 2023 13:48:36 +0000 (15:48 +0200)]
habanalabs/gaudi2: print page fault axi transaction id

AXI transaction id holds information about the initiator which caused
the page fault. In the future, it will be translated automatically by
the driver to an initiator name.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: update device status sysfs documentation
Ofir Bitton [Tue, 10 Jan 2023 19:43:28 +0000 (21:43 +0200)]
habanalabs: update device status sysfs documentation

As device status was changed recently, we must update the
documentation as well.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agoaccel: Add .mmap to DRM_ACCEL_FOPS
Jeffrey Hugo [Tue, 17 Jan 2023 17:45:58 +0000 (10:45 -0700)]
accel: Add .mmap to DRM_ACCEL_FOPS

While reviewing the ivpu driver, it became apparent that
DEFINE_DRM_ACCEL_FOPS could have been used if DRM_ACCEL_FOPS defined
.mmap to be drm_gem_mmap. Let's add that: accel drivers are a variant
of drm drivers, modern drm drivers are expected to use GEM, and mmap()
is a common operation that is expected to be heavily used in accel
drivers. Thus the common accel driver should be able to just use
DEFINE_DRM_ACCEL_FOPS() for convenience.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agoMAINTAINERS/ACCEL: Add include/drm/drm_accel.h to the accel entry
Jeffrey Hugo [Tue, 17 Jan 2023 18:04:30 +0000 (11:04 -0700)]
MAINTAINERS/ACCEL: Add include/drm/drm_accel.h to the accel entry

get_maintainer.pl does not suggest Oded Gabbay, the DRM COMPUTE
ACCELERATORS DRIVERS AND FRAMEWORK maintainer for changes that touch
the Accel Subsystem header - drm_accel.h.  This is because that file is
missing from the Accel Subsystem entry.  Fix this.

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabe/gaudi2: add cfg base when displaying razwi addresses
Dani Liberman [Wed, 11 Jan 2023 08:50:09 +0000 (10:50 +0200)]
habanalabe/gaudi2: add cfg base when displaying razwi addresses

Captured addresses of low b/w razwi information contain only the
offset from the cfg base. To make them more readable, add the cfg
base to them.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: read mmio razwi information
Dani Liberman [Thu, 5 Jan 2023 15:12:28 +0000 (17:12 +0200)]
habanalabs/gaudi2: read mmio razwi information

In gaudi2 there might be different routers for low b/w and high b/w
transactions. But in the code that collects razwi information, we used
the same router for high b/w and low b/w.

Fixed it by reading the information also from low b/w routers.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: fix bug in timestamps registration code
farah kassabri [Tue, 10 Jan 2023 10:29:55 +0000 (12:29 +0200)]
habanalabs: fix bug in timestamps registration code

Protect against re-using the same timestamp buffer record before it is
actually added to the interrupt wait list.
Mark the ts buff offset as in use inside the spinlock-protected section
of the interrupt wait list, to avoid entering the re-use section in
ts_buff_get_kernel_ts_record before the node has been added to the list.
This scenario might happen when multiple threads race on the same
offset: one thread could set data in the ts buff in
ts_buff_get_kernel_ts_record, then the other thread takes over, reaches
ts_buff_get_kernel_ts_record, tries to re-use the same ts buff offset,
and then tries to delete a non-existing node from the list.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
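The essence of the fix is that "mark as in use" and "link into the wait list" happen in the same critical section, so a thread that sees the record in use can safely unlink it. A hypothetical sketch using a C11 `atomic_flag` spinlock in place of the kernel spinlock and a node counter in place of the list; none of these names come from the driver:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal spinlock standing in for the wait-list spinlock. */
struct my_spinlock { atomic_flag f; };

static void my_lock(struct my_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
		; /* spin until the holder releases the flag */
}

static void my_unlock(struct my_spinlock *l)
{
	atomic_flag_clear_explicit(&l->f, memory_order_release);
}

struct my_wait_list {
	struct my_spinlock lock;
	int nodes; /* number of records currently linked */
};

#define MY_WAIT_LIST_INIT { { ATOMIC_FLAG_INIT }, 0 }

struct my_ts_record {
	bool in_use; /* only changed under the list lock, together with linking */
};

static void my_register_ts(struct my_wait_list *list, struct my_ts_record *rec)
{
	my_lock(&list->lock);
	if (rec->in_use)
		list->nodes--; /* re-use: unlink the previous registration */
	rec->in_use = true;
	list->nodes++;         /* link the record */
	my_unlock(&list->lock);
}
```

Registering the same record twice leaves exactly one linked node. If `in_use` were set outside the lock before linking, a racing thread could take the re-use branch and unlink a node that was never added.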
19 months agohabanalabs: bugs fixes in timestamps buff alloc
farah kassabri [Sun, 8 Jan 2023 15:33:44 +0000 (17:33 +0200)]
habanalabs: bugs fixes in timestamps buff alloc

Use the argument instead of a fixed GFP value for allocation
in the timestamps buffers alloc function.
Change the data type of size to size_t.

Fixes: 9158bf69e74f ("habanalabs: Timestamps buffers registration")
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: check pad and reserved fields in ioctls
farah kassabri [Tue, 3 Jan 2023 12:23:55 +0000 (14:23 +0200)]
habanalabs: check pad and reserved fields in ioctls

Make sure all reserved/pad fields in uapi input structures
are set to 0.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
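Checking that pad and reserved fields are zero keeps them usable for future extensions, since existing user space cannot have left garbage in them. A sketch with a hypothetical argument struct:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl input layout with explicit padding and reserved space. */
struct my_ioctl_args {
	uint64_t addr;
	uint32_t flags;
	uint32_t pad;
	uint64_t reserved[2];
};

/* Reject any input whose pad/reserved fields are nonzero. */
static int my_check_args(const struct my_ioctl_args *a)
{
	if (a->pad)
		return -EINVAL;
	for (size_t i = 0; i < sizeof(a->reserved) / sizeof(a->reserved[0]); i++)
		if (a->reserved[i])
			return -EINVAL;
	return 0;
}
```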
19 months agohabanalabs: remove unnecessary (void*) conversions
XU pengfei [Tue, 10 Jan 2023 10:35:13 +0000 (18:35 +0800)]
habanalabs: remove unnecessary (void*) conversions

data is a void * type and does not require a cast.

Signed-off-by: XU pengfei <xupengfei@nfschina.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
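In C (unlike C++), a `void *` converts implicitly to any object pointer type, which is why the cast is redundant. A small sketch with hypothetical names:

```c
struct my_ctx {
	int id;
};

/* Callback that receives an opaque pointer. */
static int my_show(void *data)
{
	/* Before: struct my_ctx *ctx = (struct my_ctx *)data;
	 * The implicit conversion from void * makes the cast unnecessary. */
	struct my_ctx *ctx = data;

	return ctx->id;
}
```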
19 months agohabanalabs: Replace zero-length arrays with flexible-array members
Gustavo A. R. Silva [Tue, 10 Jan 2023 01:39:47 +0000 (19:39 -0600)]
habanalabs: Replace zero-length arrays with flexible-array members

Zero-length arrays are deprecated[1] and we are moving towards
adopting C99 flexible-array members instead. So, replace zero-length
arrays in a couple of structures with flex-array members.

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [2].
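
For illustration, a hypothetical structure converted in the same way: the
GNU-style "uint32_t data[0];" becomes the C99 "uint32_t data[];", and the
header plus payload are allocated in one block. This user-space sketch uses
offsetof(); in the kernel the struct_size() helper is the usual way to compute
such sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical message with a trailing variable-length payload.
 * Before: uint32_t data[0];  (zero-length array, deprecated)
 * After:  uint32_t data[];   (C99 flexible-array member) */
struct example_msg {
	uint32_t count;
	uint32_t data[];
};

/* Allocate the header and n zeroed payload elements in one block. */
static struct example_msg *example_msg_alloc(uint32_t n)
{
	size_t size = offsetof(struct example_msg, data) +
		      (size_t)n * sizeof(uint32_t);
	struct example_msg *m = calloc(1, size);

	if (m)
		m->count = n;
	return m;
}
```

The flexible-array member adds nothing to sizeof(struct example_msg), which
is what lets the compiler and FORTIFY_SOURCE reason about the real bounds.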

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html
Link: https://github.com/KSPP/linux/issues/78
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: extend fatal messages to contain PCI info
Moti Haimovski [Thu, 29 Dec 2022 10:44:09 +0000 (12:44 +0200)]
habanalabs: extend fatal messages to contain PCI info

This commit attaches the PCI device address to driver fatal messages
in order to ease debugging in multi-device setups.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: remove use of razwi info received from f/w
Dani Liberman [Tue, 3 Jan 2023 22:05:03 +0000 (00:05 +0200)]
habanalabs/gaudi2: remove use of razwi info received from f/w

Because f/w does not update razwi info when sending events, remove the
use of it.
The driver is responsible for checking whether a razwi happened and for
collecting the razwi data.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: trace LBW reads/writes
Ohad Sharabi [Wed, 30 Nov 2022 12:41:49 +0000 (14:41 +0200)]
habanalabs: trace LBW reads/writes

Add traces to LBW reads/writes.
This may be handy when debugging configuration failures or events by
tracking the configuration flow.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: define events to trace PCI LBW access
Ohad Sharabi [Wed, 30 Nov 2022 12:02:00 +0000 (14:02 +0200)]
habanalabs: define events to trace PCI LBW access

There are cases where it may be useful to dump all LBW configuration
accesses. However, printing them to the kernel log would likely bury
other important messages, because LBW accesses happen in very high
volume. To address this, add trace events for them as well.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: fix log for sob value overflow/underflow
Carmit Carmel [Wed, 4 Jan 2023 09:13:01 +0000 (11:13 +0200)]
habanalabs/gaudi2: fix log for sob value overflow/underflow

The value in SM_SEI_CAUSE includes the SOB index and not the SOB group
index.
Remove log_mask from the sm_sei_cause structure, as it was never
used.

Signed-off-by: Carmit Carmel <ccarmel@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: add set engines masks ASIC function
Ohad Sharabi [Mon, 2 Jan 2023 14:44:28 +0000 (16:44 +0200)]
habanalabs: add set engines masks ASIC function

This function should be used whenever the components' enable/binning
masks need to be updated.

It is used in the following cases:
- update user (or default) component masks
- update when getting the masks from FW (either CPUCP or COMMS)

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: protect access to dynamic mem 'user_mappings'
Koby Elbaz [Fri, 23 Dec 2022 13:02:05 +0000 (15:02 +0200)]
habanalabs: protect access to dynamic mem 'user_mappings'

When the HL_INFO_USER_MAPPINGS IOCTL is called, we copy_to_user() from
dynamically allocated memory, 'user_mappings'.
Since it is freed/allocated at runtime (upon a page fault), it is not
unlikely to be accessed before it was initially allocated (i.e., a NULL
pointer access).

The solution is to simply mark when the error info has been collected,
and use that to know whether error info (either page fault or RAZWI)
is available to be read.
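
A single-threaded sketch of the "mark when collected" idea, with hypothetical
names (example_pf_info and friends are not driver symbols). The query path
checks the flag before touching user_mappings instead of dereferencing a
possibly-NULL buffer. A real driver also needs locking or memory barriers
around the flag, which this sketch deliberately omits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical page-fault record; user_mappings exists only after capture. */
struct example_pf_info {
	bool available;            /* set only once the data is complete */
	size_t num_mappings;
	unsigned long *user_mappings;
};

static int example_capture(struct example_pf_info *info,
			   const unsigned long *maps, size_t n)
{
	info->user_mappings = calloc(n, sizeof(*maps));
	if (!info->user_mappings)
		return -ENOMEM;
	memcpy(info->user_mappings, maps, n * sizeof(*maps));
	info->num_mappings = n;
	info->available = true;    /* publish last */
	return 0;
}

/* Copy out at most max entries; returns 0 when nothing was collected yet. */
static size_t example_get_mappings(const struct example_pf_info *info,
				   unsigned long *out, size_t max)
{
	size_t n;

	if (!info->available)
		return 0;
	n = info->num_mappings < max ? info->num_mappings : max;
	memcpy(out, info->user_mappings, n * sizeof(*out));
	return n;
}
```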

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: remove redundant memset
Tom Rix [Sat, 7 Jan 2023 18:48:27 +0000 (13:48 -0500)]
habanalabs: remove redundant memset

From reviewing the code, the line
  memset(kdata, 0, usize);
is not needed because kdata is either zeroed by
  kdata = kzalloc(asize, GFP_KERNEL);
when allocated at runtime, or by
  char stack_kdata[128] = {0};
when the on-stack buffer is used.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: refactor razwi/page-fault information structures
Koby Elbaz [Sun, 25 Dec 2022 10:43:04 +0000 (12:43 +0200)]
habanalabs: refactor razwi/page-fault information structures

This refactor makes the code clearer and the new variables' names
better describe their roles.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: avoid reconfiguring the same PB registers
Koby Elbaz [Wed, 21 Dec 2022 15:49:42 +0000 (17:49 +0200)]
habanalabs/gaudi2: avoid reconfiguring the same PB registers

It appears that, within the sync manager security configuration,
we reconfigure PB registers over and over without any need to do that.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi: allow device acquire while in debug mode
Ofir Bitton [Sun, 25 Dec 2022 14:27:24 +0000 (16:27 +0200)]
habanalabs/gaudi: allow device acquire while in debug mode

During device acquire, the driver uses a QMAN to clear some
registers. To avoid internal races, the driver verifies that
the device is idle before submitting the register clear job.

This check introduces an issue: debug mode causes the device to be
non-idle, which leads to a device acquire failure.

To overcome this, remove the idle check entirely, as the driver
uses the QMAN only when there is no active context.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: move some prints to debug level
Oded Gabbay [Thu, 22 Dec 2022 10:28:54 +0000 (12:28 +0200)]
habanalabs: move some prints to debug level

When entering an IOCTL, the driver prints a message in case the device
is not operational. This message should be printed at debug level, as
it can spam the kernel log and it is not an error.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: update f/w files
Oded Gabbay [Wed, 21 Dec 2022 10:51:13 +0000 (12:51 +0200)]
habanalabs: update f/w files

Update common firmware files with the latest version.
There is no functional change.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: update f/w files
Oded Gabbay [Wed, 21 Dec 2022 10:18:55 +0000 (12:18 +0200)]
habanalabs/gaudi2: update f/w files

Update gaudi2 firmware files with the latest version.
There is no functional change.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: update asic register files
Oded Gabbay [Wed, 21 Dec 2022 09:55:54 +0000 (11:55 +0200)]
habanalabs/gaudi2: update asic register files

Update some register files with the latest h/w auto-generated files.
There is no functional change.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: verify that kernel CB is destroyed only once
Tomer Tayar [Tue, 6 Dec 2022 17:54:10 +0000 (19:54 +0200)]
habanalabs: verify that kernel CB is destroyed only once

Remove the distinction between user CB and kernel CB, and verify for
both that they are not destroyed more than once.

As a kernel CB might be taken from the pre-allocated CB pool, we need
to clear the handle-destroyed indication when returning a CB to the
pool.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: add uapi to flush inbound HBM transactions
Ohad Sharabi [Sun, 18 Dec 2022 07:42:34 +0000 (09:42 +0200)]
habanalabs: add uapi to flush inbound HBM transactions

When doing p2p with a NIC device, the NIC needs to make sure that all
writes to the HBM (through the PCI bar of the Gaudi device) have been
flushed.

This can be done by either the NIC or the host reading through the PCI
bar.

To support the host side, we supply a simple uapi to perform this flush
through the driver, because the user can't create such a transaction
by itself (the PCI bar isn't exposed to normal users).

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: move driver to accel subsystem
Oded Gabbay [Mon, 26 Dec 2022 21:05:00 +0000 (23:05 +0200)]
habanalabs: move driver to accel subsystem

Now that we have a subsystem for compute accelerators, move the
habanalabs driver to it.

This patch only moves the files and fixes the Makefiles. Future
patches will change the existing code to register to the accel
subsystem and expose the accel device char files instead of the
habanalabs device char files.

Update the MAINTAINERS file to reflect this change.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/uapi: move uapi file to drm
Oded Gabbay [Tue, 20 Dec 2022 12:12:19 +0000 (14:12 +0200)]
habanalabs/uapi: move uapi file to drm

Move the habanalabs.h uapi file from include/uapi/misc to
include/uapi/drm, and rename it to habanalabs_accel.h.

This is required before moving the actual driver to the accel
subsystem.

Update MAINTAINERS file accordingly.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: fix dma-buf release handling if dma_buf_fd() fails
Tomer Tayar [Thu, 15 Dec 2022 14:36:53 +0000 (16:36 +0200)]
habanalabs: fix dma-buf release handling if dma_buf_fd() fails

The dma-buf private object is freed if a call to dma_buf_fd() fails,
and because a file was already associated with the dma-buf in
dma_buf_export(), the release op will be called and will use this
object.

Mark the 'priv' field as NULL in this case, and avoid accessing it from
the release op.
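
A heavily simplified model of the error path, with hypothetical names
(example_fd_failed() stands in for "dma_buf_fd() failed, now drop the
dma-buf", which still ends up invoking the release op). The error path frees
the private object and clears the pointer, and the release op treats a NULL
'priv' as "nothing left to do".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the dma-buf and its private object. */
struct example_priv { int id; };
struct example_dmabuf { void *priv; };

static int release_calls_with_priv;

/* Release op: must not touch priv if the error path already freed it. */
static void example_release(struct example_dmabuf *buf)
{
	struct example_priv *p = buf->priv;

	if (!p)
		return;
	release_calls_with_priv++;
	free(p);
	buf->priv = NULL;
}

/* Error path after the fd could not be installed. */
static void example_fd_failed(struct example_dmabuf *buf)
{
	free(buf->priv);
	buf->priv = NULL;       /* tell the release op it is already gone */
	example_release(buf);   /* the file still exists, so release runs */
}
```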

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: dump event description even if no cause
Ofir Bitton [Wed, 14 Dec 2022 14:52:05 +0000 (16:52 +0200)]
habanalabs/gaudi2: dump event description even if no cause

To make the no-cause error print more informative, add the event
description in addition to the event id.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: pass-through request from user to f/w
farah kassabri [Wed, 16 Nov 2022 13:40:30 +0000 (15:40 +0200)]
habanalabs: pass-through request from user to f/w

Add a uAPI, as part of the INFO IOCTL, to allow users to send
requests directly to f/w, according to a pre-defined set of opcodes
that the f/w exposes.

The f/w will put the result in a kernel-allocated buffer, which the
driver will then copy to the user-supplied buffer.

This will allow f/w tools to communicate directly with the f/w
without the need to add a new uAPI to the driver for each new type
of request.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: support receiving ascii message from preboot f/w
Tal Cohen [Thu, 1 Dec 2022 14:37:30 +0000 (16:37 +0200)]
habanalabs: support receiving ascii message from preboot f/w

An ASCII message sent from preboot to the driver indicates the
specific error that occurred in the f/w.
This commit adds support for that message and parses the ASCII string
in order to print it to the kernel log.

The commit also changes the way the descriptor struct is declared.
Since its size has increased (it is now above 1024 bytes), it is
allocated with kmalloc instead of being declared on the stack.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: fix asic-specific functions documentation
Ohad Sharabi [Thu, 8 Dec 2022 13:19:10 +0000 (15:19 +0200)]
habanalabs: fix asic-specific functions documentation

- Add missing documentation of set DRAM props
- Fix typo

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: fix wrong variable type used for vzalloc
farah kassabri [Mon, 28 Nov 2022 11:11:44 +0000 (13:11 +0200)]
habanalabs: fix wrong variable type used for vzalloc

vzalloc() returns a void *, not a void __iomem *, so the variable that
holds its result must be declared accordingly.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: wait for preboot ready if HW state is dirty
Ohad Sharabi [Wed, 30 Nov 2022 12:26:10 +0000 (14:26 +0200)]
habanalabs/gaudi2: wait for preboot ready if HW state is dirty

Instead of waiting for the BTM indication, we should wait for preboot
ready. Consider the following scenario:
    1. FW update is being triggered
           - setting the dirty bit
    2. hard reset will be triggered due to the dirty bit
    3. FW initiates the reset:
           - dirty bit cleared
           - BTM indication cleared
           - preboot ready indication cleared
    4. during hard reset:
           - BTM indication will be set
           - BIST test performed and another reset triggered
    5. only after this reset the preboot will set the preboot ready

When polling on the BTM indication alone, we can lose sync with the FW
by trying to communicate with it while it is in the middle of a reset.
To overcome this, always wait for the preboot ready indication.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: put fences in case of unexpected wait status
Tomer Tayar [Sun, 4 Dec 2022 21:23:47 +0000 (23:23 +0200)]
habanalabs: put fences in case of unexpected wait status

Fences need to be put even if an unexpected status value is received
while waiting for a fence.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: fix handling of wait CS for interrupting signals
Tomer Tayar [Sun, 4 Dec 2022 20:09:08 +0000 (22:09 +0200)]
habanalabs: fix handling of wait CS for interrupting signals

The -ERESTARTSYS return value is not handled correctly when a signal is
received while waiting for CS completion.
This can lead to bad output values being returned to the user when
waiting for a single CS completion and, more severely, it can cause a
loop that does not stop until a CS timeout when waiting for multi-CS
completion.

Fix the handling and stop waiting if this return value is received.
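
A sketch of the loop shape this fix implies, assuming a hypothetical
per-iteration wait callback (not the driver's functions). Any negative
return, including -ERESTARTSYS from an interrupting signal, ends the wait
immediately instead of being treated as "not completed yet". ERESTARTSYS is
kernel-internal (512 in include/linux/errno.h) and is defined here only so
the sketch builds in user space.

```c
#include <assert.h>
#include <errno.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512   /* kernel-internal value, for this sketch only */
#endif

/* Hypothetical wait step: <0 error, 0 not done yet, >0 all completed. */
typedef int (*example_wait_fn)(void *ctx);

static int example_wait_multi_cs(example_wait_fn wait_once, void *ctx,
				 int max_iters)
{
	for (int i = 0; i < max_iters; i++) {
		int rc = wait_once(ctx);

		if (rc < 0)
			return rc;   /* e.g. -ERESTARTSYS: stop waiting now */
		if (rc > 0)
			return 0;    /* all CSs completed */
	}
	return -ETIMEDOUT;
}

/* Simulated wait step that is always interrupted by a signal. */
static int simulated_signal_wait(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return -ERESTARTSYS;
}
```

With the signal reported on the first iteration, the loop runs exactly once
and hands -ERESTARTSYS back to the caller.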

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: fix dmabuf to export only required size
Ohad Sharabi [Thu, 10 Nov 2022 11:43:02 +0000 (13:43 +0200)]
habanalabs: fix dmabuf to export only required size

This patch fixes a bug that was found in the dmabuf flow.
Bug description, as found on a Gaudi2 device:
1. User allocates 4MB of device memory
    - Note that although the allocation size was 4MB the HMMU allocated
      a full page of 768MB to back the request.
    - The user gets a memory handle that points to a single page (768MB)
    - Mapping the handle, the user gets a virtual address to the start
      of the page.
2. User exports the buffer
3. User registers the exported buffer in the importer. This flow has
   a callback to the exporter, which in turn converts the phys_page_pack
   to an SG list for the importer. This SG list has a single entry of
   size 768MB. However, the size that was passed to the importer was
   only 4MB.

The solution is to make sure the importer is exposed only to the
exported size.

This is done by fixing the SG list created by the exporter to cover
exactly the total size of the memory actually exported by the user.
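
A sketch of the sizing rule, with a hypothetical simplified SG entry (not the
kernel's struct scatterlist). Each entry covers at most one backing page, and
the last one is trimmed so the list never exposes more than the exported size;
for the scenario above, a 4MB export backed by a single 768MB page yields one
4MB entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified SG entry. */
struct example_sg_entry {
	uint64_t addr;
	uint64_t len;
};

/* Returns the number of entries, or 0 if the pages or max_entries are
 * insufficient to cover export_size. */
static size_t example_build_sg(const uint64_t *pages, size_t npages,
			       uint64_t page_size, uint64_t export_size,
			       struct example_sg_entry *sg, size_t max_entries)
{
	size_t n = 0;
	uint64_t left = export_size;

	for (size_t i = 0; i < npages && left; i++) {
		uint64_t len = left < page_size ? left : page_size;

		if (n == max_entries)
			return 0;
		sg[n].addr = pages[i];
		sg[n].len = len;   /* trimmed on the last page */
		n++;
		left -= len;
	}
	return left ? 0 : n;
}
```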

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: modify export dmabuf API
Ohad Sharabi [Mon, 14 Nov 2022 10:16:37 +0000 (12:16 +0200)]
habanalabs: modify export dmabuf API

A previous commit deprecated the option to export from handle, leaving
the code with no support for devices with virtual memory.

This commit modifies the export API so that the uAPI uses a user address
in both cases (i.e. with and without MMU support), and adds the actual
support for devices with virtual memory.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: helper function to validate export params
Ohad Sharabi [Tue, 29 Nov 2022 07:13:34 +0000 (09:13 +0200)]
habanalabs: helper function to validate export params

Validate export parameters in a dedicated function instead of in the
main export flow.
This will be useful later, when support for exporting dmabuf on devices
with virtual memory is added.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: remove support to export dmabuf from handle
Ohad Sharabi [Tue, 29 Nov 2022 12:02:07 +0000 (14:02 +0200)]
habanalabs: remove support to export dmabuf from handle

The user API that allows exporting a DMA buffer from a handle is
deprecated here. It was never used, as it is relevant only for Gaudi2,
and the user stack has yet to add dmabuf support for Gaudi2.

Going forward, a modified API to export a DMA buffer for ASICs that
support virtual memory will be added.

Until the new API is ready, exporting a DMA buffer will not be
supported for ASICs with virtual memory.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: set log level for descriptor validation to debug
farah kassabri [Tue, 29 Nov 2022 13:37:55 +0000 (15:37 +0200)]
habanalabs: set log level for descriptor validation to debug

This warning doesn't have real consequences, and therefore can be
printed at debug level.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: trace COMMS protocol
Ohad Sharabi [Wed, 30 Nov 2022 09:31:39 +0000 (11:31 +0200)]
habanalabs: trace COMMS protocol

Call COMMS tracepoints from within the dynamic CPU FW load.
This can help debug failures or delays in the dynamic FW load flow.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: define traces for COMMS protocol
Ohad Sharabi [Wed, 30 Nov 2022 09:16:51 +0000 (11:16 +0200)]
habanalabs: define traces for COMMS protocol

As the COMMS protocol is being used more widely in our driver,
having a debug tool for the handshake will be handy.

This commit defines tracepoints at various key points of the COMMS
protocol.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: support abrupt device reset event
Ofir Bitton [Wed, 30 Nov 2022 12:35:32 +0000 (14:35 +0200)]
habanalabs/gaudi2: support abrupt device reset event

In certain scenarios, firmware might encounter a fatal event for
which a device reset is required. Hence, a proper notification
is needed for the driver to be aware of it and initiate a reset sequence.

In secured environments the reset will be performed by firmware
without an explicit request from the driver.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: skip device idle check in hpriv_release if in reset
Tomer Tayar [Wed, 30 Nov 2022 10:07:06 +0000 (12:07 +0200)]
habanalabs: skip device idle check in hpriv_release if in reset

When a user context is released and hpriv_release() is called, the
device idle status is checked to determine whether the user left the
device non-idle, in which case a reset is required.

However, if the user process is killed because of a device hard reset,
the device at this point would always be non-idle, because the device
engines were already forcefully halted.

Modify hpriv_release() to skip the idle check if reset is in progress.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: adjacent timestamps should be more accurate
Tamir Gilad-Raz [Sun, 6 Nov 2022 09:22:16 +0000 (11:22 +0200)]
habanalabs: adjacent timestamps should be more accurate

Timestamp events that expire on the same interrupt will get the same
timestamp value.
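
A sketch of the idea, assuming a hypothetical list of pending nodes and an
injectable clock (none of these names come from the driver). The clock is
read once per interrupt, and every node that expires in that pass is stamped
with the same value, so adjacent events no longer differ by the time spent
walking the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pending timestamp registration. */
struct example_ts_node {
	uint64_t target;     /* counter value the node waits for */
	uint64_t timestamp;  /* written when the node expires */
	int expired;
};

/* Stamp all nodes reached by 'counter' with one clock reading. */
static size_t example_handle_irq(struct example_ts_node *nodes, size_t n,
				 uint64_t counter, uint64_t (*now)(void))
{
	uint64_t ts = 0;
	size_t stamped = 0;

	for (size_t i = 0; i < n; i++) {
		if (nodes[i].expired || nodes[i].target > counter)
			continue;
		if (!stamped)
			ts = now();   /* read once per interrupt */
		nodes[i].timestamp = ts;
		nodes[i].expired = 1;
		stamped++;
	}
	return stamped;
}

/* Fake clock that advances on every read, so a second read would show. */
static uint64_t fake_clock_value;

static uint64_t example_fake_now(void)
{
	return ++fake_clock_value;
}
```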

Signed-off-by: Tamir Gilad-Raz <tgiladraz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: remove duplicated event prints
Ofir Bitton [Wed, 16 Nov 2022 15:27:26 +0000 (17:27 +0200)]
habanalabs/gaudi2: remove duplicated event prints

To reduce the error log, minimize the number of dumped lines while
keeping all relevant error info. In addition, completely remove the
clock throttling debug logs.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: count interrupt causes
Ofir Bitton [Wed, 23 Nov 2022 09:03:17 +0000 (11:03 +0200)]
habanalabs/gaudi2: count interrupt causes

During event handling, extract the interrupt cause and count it.
If no cause can be found, report a proper error.
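
A sketch of per-cause counting for one hypothetical 32-bit cause register
(the real Gaudi2 register layouts are not shown in this log). The function
returns how many causes were found, so the caller can report an error when
the result is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-bit counters for one cause register. */
struct example_cause_stats {
	uint64_t count[32];
	uint64_t no_cause;   /* events where no cause bit was set */
};

static unsigned int example_count_causes(struct example_cause_stats *st,
					 uint32_t cause)
{
	unsigned int found = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		if (cause & (UINT32_C(1) << bit)) {
			st->count[bit]++;
			found++;
		}
	}
	if (!found)
		st->no_cause++;   /* caller should print an error here */
	return found;
}
```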

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: update DRAM props according to preboot data
Ohad Sharabi [Sun, 27 Nov 2022 10:46:23 +0000 (12:46 +0200)]
habanalabs: update DRAM props according to preboot data

If the f/w reports the binning masks at the preboot stage, the driver
must align its DRAM properties according to the new information.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: fix double assignment in MMU V1
Marco Pagani [Tue, 29 Nov 2022 11:52:17 +0000 (12:52 +0100)]
habanalabs: fix double assignment in MMU V1

Remove the double assignment of the hop2_pte_addr
variable in dram_default_mapping_fini().

Dead store reported by clang-analyzer.

Signed-off-by: Marco Pagani <marpagan@redhat.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: make set_dram_properties an ASIC function
Ohad Sharabi [Sun, 27 Nov 2022 10:38:49 +0000 (12:38 +0200)]
habanalabs: make set_dram_properties an ASIC function

As ASICs are evolving, we will need to update the DRAM properties at
various points because we may get different information from the f/w
at different points of the initialization.

This ASIC function is a foundation for this capability.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: use dev_dbg() when hl_mmap_mem_buf_get() fails
Tomer Tayar [Thu, 24 Nov 2022 09:12:38 +0000 (11:12 +0200)]
habanalabs: use dev_dbg() when hl_mmap_mem_buf_get() fails

As hl_mmap_mem_buf_get() is also called from IOCTLs, which can receive
a bad handle from the user, modify the print for "no match to handle"
to use dev_dbg().
Calls to this function that do not depend on the user already have an
error print when the function fails.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: don't allow user to destroy CB handle more than once
Tomer Tayar [Wed, 23 Nov 2022 13:09:43 +0000 (15:09 +0200)]
habanalabs: don't allow user to destroy CB handle more than once

The refcount of a CB buffer is initialized when the user allocates a
CB, and is decreased when the user destroys the CB handle.

If this refcount is also increased by the kernel and the user sends
more than one destroy request for the handle, the buffer will be
released/freed and later accessed when the kernel side puts its
reference.

To avoid this, prevent the user from destroying the handle more than
once.
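
A minimal sketch of a one-shot destroy flag, using C11 atomics and
hypothetical names; the -EINVAL return for a repeated destroy is an
assumption for illustration, not taken from the driver. Only the first
destroy request drops the user's reference, so the kernel's reference stays
valid.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical CB: refcount plus a "handle destroyed by user" flag. */
struct example_cb {
	atomic_int refcount;
	atomic_bool handle_destroyed;
};

static void example_cb_put(struct example_cb *cb)
{
	atomic_fetch_sub(&cb->refcount, 1);   /* a real driver frees at 0 */
}

/* User destroy request: atomically test-and-set the flag, so only the
 * first call drops the user's reference. */
static int example_cb_destroy(struct example_cb *cb)
{
	if (atomic_exchange(&cb->handle_destroyed, true))
		return -EINVAL;   /* already destroyed once */
	example_cb_put(cb);
	return 0;
}
```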

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: don't notify user about clk throttling due to power
Ofir Bitton [Thu, 24 Nov 2022 09:01:44 +0000 (11:01 +0200)]
habanalabs: don't notify user about clk throttling due to power

As clock throttling due to high power consumption can happen very
frequently and there is no real reason to notify the user about it,
we skip this notification on all ASICs.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: abort waiting user threads upon error
Tomer Tayar [Tue, 8 Nov 2022 12:34:43 +0000 (14:34 +0200)]
habanalabs: abort waiting user threads upon error

The user should close the FD when notified about an error, after
which a device reset takes place.

However, if the user has pending threads that wait for completions,
the device release won't be called and eventually the watchdog timeout
will expire, leading to hard reset and killing the user process.

To avoid this, abort such waiting threads right after the error
notification, and block subsequent wait operations.
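
A user-space sketch of the abort behavior, assuming a pthread condition
variable in place of the driver's wait mechanism and hypothetical names and
return codes (-EIO is an assumption). Setting the abort flag wakes every
current waiter, and because the flag stays set, later waits fail immediately
instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical wait object shared by user threads. */
struct example_waitq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool completed;
	bool aborted;   /* set on error notification, never cleared here */
};

static void example_waitq_init(struct example_waitq *wq)
{
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->cond, NULL);
	wq->completed = false;
	wq->aborted = false;
}

static void example_complete(struct example_waitq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->completed = true;
	pthread_cond_broadcast(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
}

/* Called right after the user is notified about an error. */
static void example_abort_waiters(struct example_waitq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->aborted = true;
	pthread_cond_broadcast(&wq->cond);   /* wake all current waiters */
	pthread_mutex_unlock(&wq->lock);
}

/* 0 on completion, -EIO once waiting has been aborted. */
static int example_wait(struct example_waitq *wq)
{
	int rc;

	pthread_mutex_lock(&wq->lock);
	while (!wq->completed && !wq->aborted)
		pthread_cond_wait(&wq->cond, &wq->lock);
	rc = wq->aborted ? -EIO : 0;
	pthread_mutex_unlock(&wq->lock);
	return rc;
}
```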

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: remove releasing of user threads from device release
Tomer Tayar [Sun, 6 Nov 2022 18:29:18 +0000 (20:29 +0200)]
habanalabs: remove releasing of user threads from device release

The device file is not in use when hl_device_release() is called,
and there aren't any user threads that use IOCTLs to wait for
interrupts. Therefore there is no need to release them at this point.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs: read binning info from preboot
farah kassabri [Sun, 13 Nov 2022 15:44:17 +0000 (17:44 +0200)]
habanalabs: read binning info from preboot

Sometimes we need the binning info at a very early stage of the
driver initialization. Therefore, preboot now provides the binning
info as part of the f/w descriptor, and the driver can use it from
that point.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
19 months agohabanalabs/gaudi2: fix BMON 3rd address range
tal albo [Wed, 16 Nov 2022 20:54:24 +0000 (22:54 +0200)]
habanalabs/gaudi2: fix BMON 3rd address range

Fix programming of an incorrect value for the address range.

Signed-off-by: tal albo <talbo@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>