linux-2.6-block.git
5 hours agoMerge tag 'drm-fixes-2025-09-19' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 19 Sep 2025 15:13:46 +0000 (08:13 -0700)]
Merge tag 'drm-fixes-2025-09-19' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "Weekly fixes for drm, it's a bit busier than I'd like on the xe side
  this week, but otherwise amdgpu and some smaller fixes for i915/bridge
  and a revert on docs.

  docs:
   - fix docs build regression

  i915:
   - Honor VESA eDP backlight luminance control capability

  bridge:
   - anx7625: Fix NULL pointer dereference with early IRQ
   - cdns-mhdp8546: Fix missing mutex unlock on error path

  xe:
   - Release kobject for the failure path
   - SRIOV PF: Drop rounddown_pow_of_two fair
   - Remove type casting on hwmon
   - Defer free of NVM auxiliary container to device release
   - Fix a NULL vs IS_ERR
   - Add cleanup action in xe_device_sysfs_init
   - Fix error handling if PXP fails to start
   - Set GuC RCS/CCS yield policy

  amdgpu:
   - GC 11.0.1/4 cleaner shader support
   - DC irq fix
   - OD fix

  amdkfd:
   - S0ix fix"

* tag 'drm-fixes-2025-09-19' of https://gitlab.freedesktop.org/drm/kernel:
  drm/amdgpu: suspend KFD and KGD user queues for S0ix
  drm/amdkfd: add proper handling for S0ix
  drm/xe/guc: Set RCS/CCS yield policy
  drm/xe: Fix error handling if PXP fails to start
  drm/xe/sysfs: Add cleanup action in xe_device_sysfs_init
  drm/amd: Only restore cached manual clock settings in restore if OD enabled
  drm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue()
  drm: bridge: cdns-mhdp8546: Fix missing mutex unlock on error path
  drm/i915/backlight: Honor VESA eDP backlight luminance control capability
  drm/amd/display: Allow RX6xxx & RX7700 to invoke amdgpu_irq_get/put
  drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.0.1/11.0.4 GPUs
  drm: bridge: anx7625: Fix NULL pointer dereference with early IRQ
  drm/xe: defer free of NVM auxiliary container to device release callback
  drm/xe/hwmon: Remove type casting
  drm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitation
  drm/xe/tile: Release kobject for the failure path
  Revert "drm: Add directive to format code in comment"

18 hours agoMerge tag 'amd-drm-fixes-6.17-2025-09-18' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Fri, 19 Sep 2025 01:53:08 +0000 (11:53 +1000)]
Merge tag 'amd-drm-fixes-6.17-2025-09-18' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.17-2025-09-18:

amdgpu:
- GC 11.0.1/4 cleaner shader support
- DC irq fix
- OD fix

amdkfd:
- S0ix fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250918191428.2553105-1-alexander.deucher@amd.com
19 hours agoMerge tag 'drm-xe-fixes-2025-09-18' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Fri, 19 Sep 2025 01:19:36 +0000 (11:19 +1000)]
Merge tag 'drm-xe-fixes-2025-09-18' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- Release kobject for the failure path (Shuicheng)
- SRIOV PF: Drop rounddown_pow_of_two fair (Michal)
- Remove type casting on hwmon (Mallesh)
- Defer free of NVM auxiliary container to device release (Nitin)
- Fix a NULL vs IS_ERR (Dan)
- Add cleanup action in xe_device_sysfs_init (Zongyao)
- Fix error handling if PXP fails to start (Daniele)
- Set GuC RCS/CCS yield policy (Daniele)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aMwL7vxFP1L94IML@intel.com
20 hours agoMerge tag 'drm-misc-fixes-2025-09-18' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Fri, 19 Sep 2025 00:27:57 +0000 (10:27 +1000)]
Merge tag 'drm-misc-fixes-2025-09-18' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

One fix for a documentation warning, a null pointer dereference fix for
anx7625, and a mutex unlock fix for cdns-mhdp8546

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250918-orthodox-pretty-puma-1ddeea@houat
22 hours agoMerge tag 'trace-rv-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Thu, 18 Sep 2025 22:22:00 +0000 (15:22 -0700)]
Merge tag 'trace-rv-v6.17-rc5' of git://git./linux/kernel/git/trace/linux-trace

Pull runtime verifier fixes from Steven Rostedt:

 - Fix build in some RISC-V flavours

   Some system calls only are available for the 64bit RISC-V machines.
   #ifdef out the cases of clock_nanosleep and futex in the sleep
   monitor if they are not supported by the architecture.

 - Fix wrong cast, obsolete after refactoring

   Use container_of() to get to the rv_monitor structure from the
   enable_monitors_next() 'p' pointer. The assignment worked only
   because the list field used happened to be the first field of the
   structure.

 - Remove redundant include files

   Some include files were listed twice. Remove the extra ones and sort
   the includes.

 - Fix missing unlock on failure

   There was an error path that exited the rv_register_monitor()
   function without releasing a lock. Change that to goto the lock
   release.

 - Add Gabriele Monaco to be Runtime Verifier maintainer

   Gabriele is doing most of the work on RV as well as collecting
   patches. Add him to the maintainers file for Runtime Verification.

* tag 'trace-rv-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Add Gabriele Monaco as maintainer for Runtime Verification
  rv: Fix missing mutex unlock in rv_register_monitor()
  include/linux/rv.h: remove redundant include file
  rv: Fix wrong type cast in enabled_monitors_next()
  rv: Support systems with time64-only syscalls

24 hours agorv: Add Gabriele Monaco as maintainer for Runtime Verification
Steven Rostedt [Thu, 11 Sep 2025 15:57:44 +0000 (11:57 -0400)]
rv: Add Gabriele Monaco as maintainer for Runtime Verification

Gabriele will start taking over managing the changes to the Runtime
Verification. Make him officially one of the maintainers.

Cc: Gabriele Monaco <gmonaco@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20250911115744.66ccade3@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
26 hours agodrm/amdgpu: suspend KFD and KGD user queues for S0ix
Alex Deucher [Wed, 17 Sep 2025 16:42:11 +0000 (12:42 -0400)]
drm/amdgpu: suspend KFD and KGD user queues for S0ix

We need to make sure the user queues are preempted so
GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f8b367e6fa1716cab7cc232b9e3dff29187fc99d)
Cc: stable@vger.kernel.org
26 hours agodrm/amdkfd: add proper handling for S0ix
Alex Deucher [Wed, 17 Sep 2025 16:42:09 +0000 (12:42 -0400)]
drm/amdkfd: add proper handling for S0ix

When in S0i3, the GFX state is retained, so all we need to do
is stop the runlist so GFX can enter gfxoff.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: David Perry <david.perry@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4bfa8609934dbf39bbe6e75b4f971469384b50b1)
Cc: stable@vger.kernel.org
27 hours agoMerge tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 18 Sep 2025 17:22:02 +0000 (10:22 -0700)]
Merge tag 'net-6.17-rc7' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless. No known regressions at this point.

  Current release - fix to a fix:

   - eth: Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"

   - wifi: iwlwifi: pcie: fix byte count table for 7000/8000 devices

   - net: clear sk->sk_ino in sk_set_socket(sk, NULL), fix CRIU

  Previous releases - regressions:

   - bonding: set random address only when slaves already exist

   - rxrpc: fix untrusted unsigned subtract

   - eth:
       - ice: fix Rx page leak on multi-buffer frames
       - mlx5: don't return mlx5_link_info table when speed is unknown

  Previous releases - always broken:

   - tls: make sure to abort the stream if headers are bogus

   - tcp: fix null-deref when using TCP-AO with TCP_REPAIR

   - dpll: fix skipping last entry in clock quality level reporting

   - eth: qed: don't collect too many protection override GRC elements,
     fix memory corruption"

* tag 'net-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
  octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
  cnic: Fix use-after-free bugs in cnic_delete_task
  devlink rate: Remove unnecessary 'static' from a couple places
  MAINTAINERS: update sundance entry
  net: liquidio: fix overflow in octeon_init_instr_queue()
  net: clear sk->sk_ino in sk_set_socket(sk, NULL)
  Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
  selftests: tls: test skb copy under mem pressure and OOB
  tls: make sure to abort the stream if headers are bogus
  selftest: packetdrill: Add tcp_fastopen_server_reset-after-disconnect.pkt.
  tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
  octeon_ep: fix VF MAC address lifecycle handling
  selftests: bonding: add vlan over bond testing
  bonding: don't set oif to bond dev when getting NS target destination
  net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
  net/mlx5e: Add a miss level for ipsec crypto offload
  net/mlx5e: Harden uplink netdev access against device unbind
  MAINTAINERS: make the DPLL entry cover drivers
  doc/netlink: Fix typos in operation attributes
  igc: don't fail igc_probe() on LED setup error
  ...

28 hours agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Thu, 18 Sep 2025 16:42:55 +0000 (09:42 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "These are mostly Oliver's Arm changes: lock ordering fixes for the
  vGIC, and reverts for a buggy attempt to avoid RCU stalls on large
  VMs.

  Arm:

   - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
     visiting from an MMU notifier

   - Fixes to the TLB match process and TLB invalidation range for
     managing the VCNR pseudo-TLB

   - Prevent SPE from erroneously profiling guests due to UNKNOWN reset
     values in PMSCR_EL1

   - Fix save/restore of host MDCR_EL2 to account for eagerly
     programming at vcpu_load() on VHE systems

   - Correct lock ordering when dealing with VGIC LPIs, avoiding
     scenarios where an xarray's spinlock was nested with a *raw*
     spinlock

   - Permit stage-2 read permission aborts which are possible in the
     case of NV depending on the guest hypervisor's stage-2 translation

   - Call raw_spin_unlock() instead of the internal spinlock API

   - Fix parameter ordering when assigning VBAR_EL1

   - Reverted a couple of fixes for RCU stalls when destroying a stage-2
     page table.

     There appears to be some nasty refcounting / UAF issues lurking in
     those patches and the band-aid we tried to apply didn't hold.

  s390:

   - mm fixes, including userfaultfd bug fix

  x86:

   - Sync the vTPR from the local APIC to the VMCB even when AVIC is
     active.

     This fixes a bug where host updates to the vTPR, e.g. via
     KVM_SET_LAPIC or emulation of a guest access, are lost and result
     in interrupt delivery issues in the guest"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR even if AVIC is active
  Revert "KVM: arm64: Split kvm_pgtable_stage2_destroy()"
  Revert "KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables"
  KVM: arm64: vgic: fix incorrect spinlock API usage
  KVM: arm64: Remove stage 2 read fault check
  KVM: arm64: Fix parameter ordering for VBAR_EL1 assignment
  KVM: arm64: nv: Fix incorrect VNCR invalidation range calculation
  KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
  KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock
  KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
  KVM: arm64: Spin off release helper from vgic_put_irq()
  KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
  KVM: arm64: vgic: Drop stale comment on IRQ active state
  KVM: arm64: VHE: Save and restore host MDCR_EL2 value correctly
  KVM: arm64: Initialize PMSCR_EL1 when in VHE
  KVM: arm64: nv: fix VNCR TLB ASID match logic for non-Global entries
  KVM: s390: Fix FOLL_*/FAULT_FLAG_* confusion
  KVM: s390: Fix incorrect usage of mmu_notifier_register()
  KVM: s390: Fix access to unavailable adapter indicator pages during postcopy
  KVM: arm64: Mark freed S2 MMUs as invalid

28 hours agoMerge tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 18 Sep 2025 16:22:34 +0000 (09:22 -0700)]
Merge tag 'platform-drivers-x86-v6.17-4' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from Ilpo Järvinen:
 "Fixes and new HW support:

   - amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list

   - amd/pmf: Support new ACPI ID AMDI0108

   - asus-wmi: Re-add extra keys to ignore_key_wlan quirk

   - oxpec: Add support for AOKZOE A1X and OneXPlayer X1Pro EVA-02"

* tag 'platform-drivers-x86-v6.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: asus-wmi: Re-add extra keys to ignore_key_wlan quirk
  platform/x86/amd/pmf: Support new ACPI ID AMDI0108
  platform/x86: oxpec: Add support for AOKZOE A1X
  platform/x86: oxpec: Add support for OneXPlayer X1Pro EVA-02
  platform/x86/amd/pmc: Add MECHREVO Yilong15Pro to spurious_8042 list

28 hours agoMerge tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml...
Linus Torvalds [Thu, 18 Sep 2025 16:18:27 +0000 (09:18 -0700)]
Merge tag 'uml-for-6.17-rc7' of git://git./linux/kernel/git/uml/linux

Pull UML fixes from Johannes Berg:
 "A few fixes for UML, which I'd meant to send earlier but then forgot.

  All of them are pretty long-standing issues that are either not really
  happening (the UAF), in rarely used code (the FD buffer issue), or an
  issue only for some host configurations (the executable stack):

   - mark stack not executable to work on more modern systems with
     selinux

   - fix use-after-free in a virtio error path

   - fix stack buffer overflow in external unix socket FD receive
     function"

* tag 'uml-for-6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: Fix FD copy size in os_rcv_fd_msg()
  um: virtio_uml: Fix use-after-free after put_device in probe
  um: Don't mark stack executable

30 hours agoocteontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()
Duoming Zhou [Wed, 17 Sep 2025 06:38:53 +0000 (14:38 +0800)]
octeontx2-pf: Fix use-after-free bugs in otx2_sync_tstamp()

The original code relies on cancel_delayed_work() in otx2_ptp_destroy(),
which does not ensure that the delayed work item synctstamp_work has fully
completed if it was already running. This leads to use-after-free scenarios
where otx2_ptp is deallocated by otx2_ptp_destroy(), while synctstamp_work
remains active and attempts to dereference otx2_ptp in otx2_sync_tstamp().
Furthermore, the synctstamp_work is cyclic, the likelihood of triggering
the bug is nonnegligible.

A typical race condition is illustrated below:

CPU 0 (cleanup)           | CPU 1 (delayed work callback)
otx2_remove()             |
  otx2_ptp_destroy()      | otx2_sync_tstamp()
    cancel_delayed_work() |
    kfree(ptp)            |
                          |   ptp = container_of(...); //UAF
                          |   ptp-> //UAF

This is confirmed by a KASAN report:

BUG: KASAN: slab-use-after-free in __run_timer_base.part.0+0x7d7/0x8c0
Write of size 8 at addr ffff88800aa09a18 by task bash/136
...
Call Trace:
 <IRQ>
 dump_stack_lvl+0x55/0x70
 print_report+0xcf/0x610
 ? __run_timer_base.part.0+0x7d7/0x8c0
 kasan_report+0xb8/0xf0
 ? __run_timer_base.part.0+0x7d7/0x8c0
 __run_timer_base.part.0+0x7d7/0x8c0
 ? __pfx___run_timer_base.part.0+0x10/0x10
 ? __pfx_read_tsc+0x10/0x10
 ? ktime_get+0x60/0x140
 ? lapic_next_event+0x11/0x20
 ? clockevents_program_event+0x1d4/0x2a0
 run_timer_softirq+0xd1/0x190
 handle_softirqs+0x16a/0x550
 irq_exit_rcu+0xaf/0xe0
 sysvec_apic_timer_interrupt+0x70/0x80
 </IRQ>
...
Allocated by task 1:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x7f/0x90
 otx2_ptp_init+0xb1/0x860
 otx2_probe+0x4eb/0xc30
 local_pci_probe+0xdc/0x190
 pci_device_probe+0x2fe/0x470
 really_probe+0x1ca/0x5c0
 __driver_probe_device+0x248/0x310
 driver_probe_device+0x44/0x120
 __driver_attach+0xd2/0x310
 bus_for_each_dev+0xed/0x170
 bus_add_driver+0x208/0x500
 driver_register+0x132/0x460
 do_one_initcall+0x89/0x300
 kernel_init_freeable+0x40d/0x720
 kernel_init+0x1a/0x150
 ret_from_fork+0x10c/0x1a0
 ret_from_fork_asm+0x1a/0x30

Freed by task 136:
 kasan_save_stack+0x24/0x50
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3a/0x60
 __kasan_slab_free+0x3f/0x50
 kfree+0x137/0x370
 otx2_ptp_destroy+0x38/0x80
 otx2_remove+0x10d/0x4c0
 pci_device_remove+0xa6/0x1d0
 device_release_driver_internal+0xf8/0x210
 pci_stop_bus_device+0x105/0x150
 pci_stop_and_remove_bus_device_locked+0x15/0x30
 remove_store+0xcc/0xe0
 kernfs_fop_write_iter+0x2c3/0x440
 vfs_write+0x871/0xd70
 ksys_write+0xee/0x1c0
 do_syscall_64+0xac/0x280
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
...

Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the delayed work item is properly canceled before the otx2_ptp is
deallocated.

This bug was initially identified through static analysis. To reproduce
and test it, I simulated the OcteonTX2 PCI device in QEMU and introduced
artificial delays within the otx2_sync_tstamp() function to increase the
likelihood of triggering the bug.

Fixes: 2958d17a8984 ("octeontx2-pf: Add support for ptp 1-step mode on CN10K silicon")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
30 hours agocnic: Fix use-after-free bugs in cnic_delete_task
Duoming Zhou [Wed, 17 Sep 2025 05:46:02 +0000 (13:46 +0800)]
cnic: Fix use-after-free bugs in cnic_delete_task

The original code uses cancel_delayed_work() in cnic_cm_stop_bnx2x_hw(),
which does not guarantee that the delayed work item 'delete_task' has
fully completed if it was already running. Additionally, the delayed work
item is cyclic, the flush_workqueue() in cnic_cm_stop_bnx2x_hw() only
blocks and waits for work items that were already queued to the
workqueue prior to its invocation. Any work items submitted after
flush_workqueue() is called are not included in the set of tasks that the
flush operation awaits. This means that after the cyclic work items have
finished executing, a delayed work item may still exist in the workqueue.
This leads to use-after-free scenarios where the cnic_dev is deallocated
by cnic_free_dev(), while delete_task remains active and attempt to
dereference cnic_dev in cnic_delete_task().

A typical race condition is illustrated below:

CPU 0 (cleanup)              | CPU 1 (delayed work callback)
cnic_netdev_event()          |
  cnic_stop_hw()             | cnic_delete_task()
    cnic_cm_stop_bnx2x_hw()  | ...
      cancel_delayed_work()  | /* the queue_delayed_work()
      flush_workqueue()      |    executes after flush_workqueue()*/
                             | queue_delayed_work()
  cnic_free_dev(dev)//free   | cnic_delete_task() //new instance
                             |   dev = cp->dev; //use

Replace cancel_delayed_work() with cancel_delayed_work_sync() to ensure
that the cyclic delayed work item is properly canceled and that any
ongoing execution of the work item completes before the cnic_dev is
deallocated. Furthermore, since cancel_delayed_work_sync() uses
__flush_work(work, true) to synchronously wait for any currently
executing instance of the work item to finish, the flush_workqueue()
becomes redundant and should be removed.

This bug was identified through static analysis. To reproduce the issue
and validate the fix, I simulated the cnic PCI device in QEMU and
introduced intentional delays — such as inserting calls to ssleep()
within the cnic_delete_task() function — to increase the likelihood
of triggering the bug.

Fixes: fdf24086f475 ("cnic: Defer iscsi connection cleanup")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
30 hours agodevlink rate: Remove unnecessary 'static' from a couple places
Cosmin Ratiu [Thu, 18 Sep 2025 10:15:06 +0000 (13:15 +0300)]
devlink rate: Remove unnecessary 'static' from a couple places

devlink_rate_node_get_by_name() and devlink_rate_nodes_destroy() have a
couple of unnecessary static variables for iterating over devlink rates.

This could lead to races/corruption/unhappiness if two concurrent
operations execute the same function.

Remove 'static' from both. It's amazing this was missed for 4+ years.
While at it, I confirmed there are no more examples of this mistake in
net/ with 1, 2 or 3 levels of indentation.

Fixes: a8ecb93ef03d ("devlink: Introduce rate nodes")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
30 hours agoMAINTAINERS: update sundance entry
Denis Kirjanov [Thu, 18 Sep 2025 09:15:56 +0000 (12:15 +0300)]
MAINTAINERS: update sundance entry

Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
30 hours agonet: liquidio: fix overflow in octeon_init_instr_queue()
Alexey Nepomnyashih [Wed, 17 Sep 2025 15:30:58 +0000 (15:30 +0000)]
net: liquidio: fix overflow in octeon_init_instr_queue()

The expression `(conf->instr_type == 64) << iq_no` can overflow because
`iq_no` may be as high as 64 (`CN23XX_MAX_RINGS_PER_PF`). Casting the
operand to `u64` ensures correct 64-bit arithmetic.

Fixes: f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters")
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
30 hours agonet: clear sk->sk_ino in sk_set_socket(sk, NULL)
Eric Dumazet [Wed, 17 Sep 2025 13:53:37 +0000 (13:53 +0000)]
net: clear sk->sk_ino in sk_set_socket(sk, NULL)

Andrei Vagin reported that blamed commit broke CRIU.

Indeed, while we want to keep sk_uid unchanged when a socket
is cloned, we want to clear sk->sk_ino.

Otherwise, sock_diag might report multiple sockets sharing
the same inode number.

Move the clearing part from sock_orphan() to sk_set_socket(sk, NULL),
called both from sock_orphan() and sk_clone_lock().

Fixes: 5d6b58c932ec ("net: lockless sock_i_ino()")
Closes: https://lore.kernel.org/netdev/aMhX-VnXkYDpKd9V@google.com/
Closes: https://github.com/checkpoint-restore/criu/issues/2744
Reported-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Andrei Vagin <avagin@google.com>
Link: https://patch.msgid.link/20250917135337.1736101-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
30 hours agoRevert "net/mlx5e: Update and set Xon/Xoff upon port speed set"
Tariq Toukan [Wed, 17 Sep 2025 13:48:54 +0000 (16:48 +0300)]
Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set"

This reverts commit d24341740fe48add8a227a753e68b6eedf4b385a.
It causes errors when trying to configure QoS, as well as
loss of L2 connectivity (on multi-host devices).

Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/20250910170011.70528106@kernel.org
Fixes: d24341740fe4 ("net/mlx5e: Update and set Xon/Xoff upon port speed set")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
34 hours agoselftests: tls: test skb copy under mem pressure and OOB
Jakub Kicinski [Wed, 17 Sep 2025 00:28:14 +0000 (17:28 -0700)]
selftests: tls: test skb copy under mem pressure and OOB

Add a test which triggers mem pressure via OOB writes.

Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250917002814.1743558-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
34 hours agotls: make sure to abort the stream if headers are bogus
Jakub Kicinski [Wed, 17 Sep 2025 00:28:13 +0000 (17:28 -0700)]
tls: make sure to abort the stream if headers are bogus

Normally we wait for the socket to buffer up the whole record
before we service it. If the socket has a tiny buffer, however,
we read out the data sooner, to prevent connection stalls.
Make sure that we abort the connection when we find out late
that the record is actually invalid. Retrying the parsing is
fine in itself but since we copy some more data each time
before we parse we can overflow the allocated skb space.

Constructing a scenario in which we're under pressure without
enough data in the socket to parse the length upfront is quite
hard. syzbot figured out a way to do this by serving us the header
in small OOB sends, and then filling in the recvbuf with a large
normal send.

Make sure that tls_rx_msg_size() aborts strp, if we reach
an invalid record there's really no way to recover.

Reported-by: Lee Jones <lee@kernel.org>
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250917002814.1743558-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
37 hours agoMerge tag 'drm-intel-fixes-2025-09-17' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Thu, 18 Sep 2025 07:36:23 +0000 (17:36 +1000)]
Merge tag 'drm-intel-fixes-2025-09-17' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Honor VESA eDP backlight luminance control capability (Aaron Ma)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://lore.kernel.org/r/aMrPx4FZ66t1Kfe-@linux
40 hours agoMerge tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Thu, 18 Sep 2025 04:34:26 +0000 (21:34 -0700)]
Merge tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git./linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "15 hotfixes. 11 are cc:stable and the remainder address post-6.16
  issues or aren't considered necessary for -stable kernels. 13 of these
  fixes are for MM.

  The usual shower of singletons, plus

   - fixes from Hugh to address various misbehaviors in get_user_pages()

   - patches from SeongJae to address a quite severe issue in DAMON

   - another series also from SeongJae which completes some fixes for a
     DAMON startup issue"

* tag 'mm-hotfixes-stable-2025-09-17-21-10' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  zram: fix slot write race condition
  nilfs2: fix CFI failure when accessing /sys/fs/nilfs2/features/*
  samples/damon/mtier: avoid starting DAMON before initialization
  samples/damon/prcl: avoid starting DAMON before initialization
  samples/damon/wsse: avoid starting DAMON before initialization
  MAINTAINERS: add Lance Yang as a THP reviewer
  MAINTAINERS: add Jann Horn as rmap reviewer
  mm/damon/sysfs: use dynamically allocated repeat mode damon_call_control
  mm/damon/core: introduce damon_call_control->dealloc_on_cancel
  mm: folio_may_be_lru_cached() unless folio_test_large()
  mm: revert "mm: vmscan.c: fix OOM on swap stress test"
  mm: revert "mm/gup: clear the LRU flag of a page before adding to LRU batch"
  mm/gup: local lru_add_drain() to avoid lru_add_drain_all()
  mm/gup: check ref_count instead of lru before migration

43 hours agoMerge tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Thu, 18 Sep 2025 01:23:01 +0000 (18:23 -0700)]
Merge tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - Two fixes for remaining_data_length and offset checks in receive path

 - Don't go over max SGEs which caused smbdirect send to fail (and
   trigger disconnect)

* tag '6.17-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: smbdirect: verify remaining_data_length respects max_fragmented_recv_size
  ksmbd: smbdirect: validate data_offset and data_length field of smb_direct_data_transfer
  smb: server: let smb_direct_writev() respect SMB_DIRECT_MAX_SEND_SGES

44 hours agodrm/xe/guc: Set RCS/CCS yield policy
Daniele Ceraolo Spurio [Fri, 5 Sep 2025 23:56:33 +0000 (16:56 -0700)]
drm/xe/guc: Set RCS/CCS yield policy

All recent platforms (including all the ones officially supported by the
Xe driver) do not allow concurrent execution of RCS and CCS workloads
from different address spaces, with the HW blocking the context switch
when it detects such a scenario.
The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a
context it knows will not be able to execute. This, however, causes a new
problem: if RCS and CCS queues have pending workloads from different
address spaces, the GuC needs to choose from which of the 2 queues to
pick the next workload to execute. By default, the GuC prioritizes RCS
submissions over CCS ones, which can lead to CCS workloads being
significantly (or completely) starved of execution time.
The driver can tune this by setting a dedicated scheduling policy KLV;
this KLV allows the driver to specify a quantum (in ms) and a ratio
(percentage value between 0 and 100), and the GuC will prioritize the CCS
for that percentage of each quantum.
Given that we want to guarantee enough RCS throughput to avoid missing
frames, we set the yield policy to 20% of each 80ms interval.

v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable
in gt_sanitize

Fixes: d9a1ae0d17bd ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com
(cherry picked from commit 88434448438e4302e272b2a2b810b42e05ea024b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo added #include "xe_guc_submit.h" while backporting]

45 hours agoMerge tag 'probes-fixes-v6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 17 Sep 2025 23:52:26 +0000 (16:52 -0700)]
Merge tag 'probes-fixes-v6.17-rc6' of git://git./linux/kernel/git/trace/linux-trace

Pull probe fix from Masami Hiramatsu:

 - kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal(),
   by handling NULL return of kmemdup() correctly

* tag 'probes-fixes-v6.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal()

45 hours agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Wed, 17 Sep 2025 23:14:25 +0000 (16:14 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-09-16 (ice, i40e, ixgbe, igc)

For ice:
Jake resolves leaking pages with multi-buffer frames when a 0-sized
descriptor is encountered.

For i40e:
Maciej removes a redundant, and incorrect, memory barrier.

For ixgbe:
Jedrzej adjusts lifespan of ACI lock to ensure uses are while it is
valid.

For igc:
Kohei Enju does not fail probe on LED setup failure which resolves a
kernel panic in the cleanup path, if we were to fail.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: don't fail igc_probe() on LED setup error
  ixgbe: destroy aci.lock later within ixgbe_remove path
  ixgbe: initialize aci.lock before it's used
  i40e: remove redundant memory barrier when cleaning Tx descs
  ice: fix Rx page leak on multi-buffer frames
====================

Link: https://patch.msgid.link/20250916212801.2818440-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
45 hours agoMerge tag 'wireless-2025-09-17' of https://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Wed, 17 Sep 2025 23:12:46 +0000 (16:12 -0700)]
Merge tag 'wireless-2025-09-17' of https://git./linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Just two fixes:
 - fix crash in rfkill due to uninitialized type_name
 - fix aggregation in iwlwifi 7000/8000 devices

* tag 'wireless-2025-09-17' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
  wifi: iwlwifi: pcie: fix byte count table for some devices
====================

Link: https://patch.msgid.link/20250917105159.161583-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
46 hours agoMerge branch 'tcp-clear-tcp_sk-sk-fastopen_rsk-in-tcp_disconnect'
Jakub Kicinski [Wed, 17 Sep 2025 23:01:54 +0000 (16:01 -0700)]
Merge branch 'tcp-clear-tcp_sk-sk-fastopen_rsk-in-tcp_disconnect'

Kuniyuki Iwashima says:

====================
tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().

syzbot reported a warning in tcp_retransmit_timer() for TCP Fast
Open socket.

Patch 1 fixes the issue and Patch 2 adds a test for the scenario.
====================

Link: https://patch.msgid.link/20250915175800.118793-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
46 hours agoselftest: packetdrill: Add tcp_fastopen_server_reset-after-disconnect.pkt.
Kuniyuki Iwashima [Mon, 15 Sep 2025 17:56:47 +0000 (17:56 +0000)]
selftest: packetdrill: Add tcp_fastopen_server_reset-after-disconnect.pkt.

The test reproduces the scenario explained in the previous patch.

Without the patch, the test triggers the warning and cannot see the last
retransmitted packet.

  # ./ksft_runner.sh tcp_fastopen_server_reset-after-disconnect.pkt
  TAP version 13
  1..2
  [   29.229250] ------------[ cut here ]------------
  [   29.231414] WARNING: CPU: 26 PID: 0 at net/ipv4/tcp_timer.c:542 tcp_retransmit_timer+0x32/0x9f0
  ...
  tcp_fastopen_server_reset-after-disconnect.pkt:26: error handling packet: Timed out waiting for packet
  not ok 1 ipv4
  tcp_fastopen_server_reset-after-disconnect.pkt:26: error handling packet: Timed out waiting for packet
  not ok 2 ipv6
  # Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250915175800.118793-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
46 hours agotcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().
Kuniyuki Iwashima [Mon, 15 Sep 2025 17:56:46 +0000 (17:56 +0000)]
tcp: Clear tcp_sk(sk)->fastopen_rsk in tcp_disconnect().

syzbot reported the splat below where a socket had tcp_sk(sk)->fastopen_rsk
in the TCP_ESTABLISHED state. [0]

syzbot reused the server-side TCP Fast Open socket as a new client before
the TFO socket completes 3WHS:

  1. accept()
  2. connect(AF_UNSPEC)
  3. connect() to another destination

As of accept(), sk->sk_state is TCP_SYN_RECV, and tcp_disconnect() changes
it to TCP_CLOSE and makes connect() possible, which restarts timers.

Since tcp_disconnect() forgot to clear tcp_sk(sk)->fastopen_rsk, the
retransmit timer triggered the warning and the intended packet was not
retransmitted.

Let's call reqsk_fastopen_remove() in tcp_disconnect().

[0]:
WARNING: CPU: 2 PID: 0 at net/ipv4/tcp_timer.c:542 tcp_retransmit_timer (net/ipv4/tcp_timer.c:542 (discriminator 7))
Modules linked in:
CPU: 2 UID: 0 PID: 0 Comm: swapper/2 Not tainted 6.17.0-rc5-g201825fb4278 #62 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:tcp_retransmit_timer (net/ipv4/tcp_timer.c:542 (discriminator 7))
Code: 41 55 41 54 55 53 48 8b af b8 08 00 00 48 89 fb 48 85 ed 0f 84 55 01 00 00 0f b6 47 12 3c 03 74 0c 0f b6 47 12 3c 04 74 04 90 <0f> 0b 90 48 8b 85 c0 00 00 00 48 89 ef 48 8b 40 30 e8 6a 4f 06 3e
RSP: 0018:ffffc900002f8d40 EFLAGS: 00010293
RAX: 0000000000000002 RBX: ffff888106911400 RCX: 0000000000000017
RDX: 0000000002517619 RSI: ffffffff83764080 RDI: ffff888106911400
RBP: ffff888106d5c000 R08: 0000000000000001 R09: ffffc900002f8de8
R10: 00000000000000c2 R11: ffffc900002f8ff8 R12: ffff888106911540
R13: ffff888106911480 R14: ffff888106911840 R15: ffffc900002f8de0
FS:  0000000000000000(0000) GS:ffff88907b768000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8044d69d90 CR3: 0000000002c30003 CR4: 0000000000370ef0
Call Trace:
 <IRQ>
 tcp_write_timer (net/ipv4/tcp_timer.c:738)
 call_timer_fn (kernel/time/timer.c:1747)
 __run_timers (kernel/time/timer.c:1799 kernel/time/timer.c:2372)
 timer_expire_remote (kernel/time/timer.c:2385 kernel/time/timer.c:2376 kernel/time/timer.c:2135)
 tmigr_handle_remote_up (kernel/time/timer_migration.c:944 kernel/time/timer_migration.c:1035)
 __walk_groups.isra.0 (kernel/time/timer_migration.c:533 (discriminator 1))
 tmigr_handle_remote (kernel/time/timer_migration.c:1096)
 handle_softirqs (./arch/x86/include/asm/jump_label.h:36 ./include/trace/events/irq.h:142 kernel/softirq.c:580)
 irq_exit_rcu (kernel/softirq.c:614 kernel/softirq.c:453 kernel/softirq.c:680 kernel/softirq.c:696)
 sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1050 (discriminator 35) arch/x86/kernel/apic/apic.c:1050 (discriminator 35))
 </IRQ>

Fixes: 8336886f786f ("tcp: TCP Fast Open Server - support TFO listeners")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250915175800.118793-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
46 hours agoocteon_ep: fix VF MAC address lifecycle handling
Sathesh B Edara [Tue, 16 Sep 2025 13:32:07 +0000 (06:32 -0700)]
octeon_ep: fix VF MAC address lifecycle handling

Currently, VF MAC address info is not updated when the MAC address is
configured from VF, and it is not cleared when the VF is removed. This
leads to stale or missing MAC information in the PF, which may cause
incorrect state tracking or inconsistencies when VFs are hot-plugged
or reassigned.

Fix this by:
 - storing the VF MAC address in the PF when it is set from VF
 - clearing the stored VF MAC address when the VF is removed

This ensures that the PF always has correct VF MAC state.

Fixes: cde29af9e68e ("octeon_ep: add PF-VF mailbox communication")
Signed-off-by: Sathesh B Edara <sedara@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250916133207.21737-1-sedara@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
46 hours agotracing: kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal()
Wang Liang [Tue, 16 Sep 2025 07:58:16 +0000 (15:58 +0800)]
tracing: kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal()

A crash was observed with the following output:

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 2899 Comm: syz.2.399 Not tainted 6.17.0-rc5+ #5 PREEMPT(none)
RIP: 0010:trace_kprobe_create_internal+0x3fc/0x1440 kernel/trace/trace_kprobe.c:911
Call Trace:
 <TASK>
 trace_kprobe_create_cb+0xa2/0xf0 kernel/trace/trace_kprobe.c:1089
 trace_probe_create+0xf1/0x110 kernel/trace/trace_probe.c:2246
 dyn_event_create+0x45/0x70 kernel/trace/trace_dynevent.c:128
 create_or_delete_trace_kprobe+0x5e/0xc0 kernel/trace/trace_kprobe.c:1107
 trace_parse_run_command+0x1a5/0x330 kernel/trace/trace.c:10785
 vfs_write+0x2b6/0xd00 fs/read_write.c:684
 ksys_write+0x129/0x240 fs/read_write.c:738
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x5d/0x2d0 arch/x86/entry/syscall_64.c:94
 </TASK>

Function kmemdup() may return NULL in trace_kprobe_create_internal(), add
check for it's return value.

Link: https://lore.kernel.org/all/20250916075816.3181175-1-wangliang74@huawei.com/
Fixes: 33b4e38baa03 ("tracing: kprobe-event: Allocate string buffers from heap")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
46 hours agoselftests: bonding: add vlan over bond testing
Hangbin Liu [Tue, 16 Sep 2025 08:01:27 +0000 (08:01 +0000)]
selftests: bonding: add vlan over bond testing

Add a vlan over bond testing to make sure arp/ns target works.
Also change all the configs to mudules.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250916080127.430626-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
46 hours agobonding: don't set oif to bond dev when getting NS target destination
Hangbin Liu [Tue, 16 Sep 2025 08:01:26 +0000 (08:01 +0000)]
bonding: don't set oif to bond dev when getting NS target destination

Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.

Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.

Reported-by: David Wilder <wilder@us.ibm.com>
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh <jv@jvosburgh.net>
Tested-by: David Wilder <wilder@us.ibm.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250916080127.430626-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 days agoMerge tag 'sched_ext-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 17 Sep 2025 20:27:31 +0000 (13:27 -0700)]
Merge tag 'sched_ext-for-6.17-rc6-fixes' of git://git./linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

 - Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED

 - Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
   which was causing issues with per-CPU task scheduling and reenqueuing
   behavior

* tag 'sched_ext-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext, sched/core: Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED
  Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"

2 days agoMerge tag 'cgroup-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 17 Sep 2025 20:22:08 +0000 (13:22 -0700)]
Merge tag 'cgroup-for-6.17-rc6-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
 "This contains two cgroup changes. Both are pretty low risk.

  - Fix deadlock in cgroup destruction when repeatedly
    mounting/unmounting perf_event and net_prio controllers.

    The issue occurs because cgroup_destroy_wq has max_active=1, causing
    root destruction to wait for CSS offline operations that are queued
    behind it.

    The fix splits cgroup_destroy_wq into three separate workqueues to
    eliminate the blocking.

  - Set of->priv to NULL upon file release to make potential bugs to
    manifest as NULL pointer dereferences rather than use-after-free
    errors"

* tag 'cgroup-for-6.17-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/psi: Set of->priv to NULL upon file release
  cgroup: split cgroup_destroy_wq into 3 workqueues

2 days agoMerge tag 'kvm-s390-master-6.17-1' of https://git.kernel.org/pub/scm/linux/kernel...
Paolo Bonzini [Wed, 17 Sep 2025 17:45:21 +0000 (19:45 +0200)]
Merge tag 'kvm-s390-master-6.17-1' of https://git./linux/kernel/git/kvms390/linux into HEAD

- KVM mm fixes
- Postcopy fix

2 days agoMerge tag 'kvm-x86-fixes-6.17-rcN' of https://github.com/kvm-x86/linux into HEAD
Paolo Bonzini [Wed, 17 Sep 2025 17:45:02 +0000 (19:45 +0200)]
Merge tag 'kvm-x86-fixes-6.17-rcN' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fix for 6.17-rcN

Sync the vTPR from the local APIC to the VMCB even when AVIC is active, to fix
a bug where host updates to the vTPR, e.g. via KVM_SET_LAPIC or emulation of a
guest access, effectively get lost and result in interrupt delivery issues in
the guest.

2 days agoMerge tag 'kvmarm-fixes-6.17-2' of https://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Wed, 17 Sep 2025 17:44:40 +0000 (19:44 +0200)]
Merge tag 'kvmarm-fixes-6.17-2' of https://git./linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 changes for 6.17, round #3

 - Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
   visiting from an MMU notifier

 - Fixes to the TLB match process and TLB invalidation range for
   managing the VCNR pseudo-TLB

 - Prevent SPE from erroneously profiling guests due to UNKNOWN reset
   values in PMSCR_EL1

 - Fix save/restore of host MDCR_EL2 to account for eagerly programming
   at vcpu_load() on VHE systems

 - Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
   where an xarray's spinlock was nested with a *raw* spinlock

 - Permit stage-2 read permission aborts which are possible in the case
   of NV depending on the guest hypervisor's stage-2 translation

 - Call raw_spin_unlock() instead of the internal spinlock API

 - Fix parameter ordering when assigning VBAR_EL1

2 days agodrm/xe: Fix error handling if PXP fails to start
Daniele Ceraolo Spurio [Tue, 9 Sep 2025 22:12:40 +0000 (15:12 -0700)]
drm/xe: Fix error handling if PXP fails to start

Since the PXP start comes after __xe_exec_queue_init() has completed,
we need to cleanup what was done in that function in case of a PXP
start error.
__xe_exec_queue_init calls the submission backend init() function,
so we need to introduce an opposite for that. Unfortunately, while
we already have a fini() function pointer, it performs other
operations in addition to cleaning up what was done by the init().
Therefore, for clarity, the existing fini() has been renamed to
destroy(), while a new fini() has been added to only clean up what was
done by the init(), with the latter being called by the former (via
xe_exec_queue_fini).

Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com
(cherry picked from commit 626667321deb4c7a294725406faa3dd71c3d445d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2 days agodrm/xe/sysfs: Add cleanup action in xe_device_sysfs_init
Zongyao Bai [Mon, 15 Sep 2025 21:47:15 +0000 (05:47 +0800)]
drm/xe/sysfs: Add cleanup action in xe_device_sysfs_init

On partial failure, some sysfs files created before the failure might
not be removed. Add common cleanup step to remove them all immediately,
as is should be harmless to attempt to remove non-existing files.

Fixes: 0e414bf7ad01 ("drm/xe: Expose PCIe link downgrade attributes")
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Zongyao Bai <zongyao.bai@intel.com>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250915214716.1327379-2-zongyao.bai@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 1a869168d91f1a1a2b0db22cea0295c67908e5d8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2 days agoMerge tag 'for-6.17/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Wed, 17 Sep 2025 15:07:02 +0000 (08:07 -0700)]
Merge tag 'for-6.17/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mikulas Patocka:

 - fix integer overflow in dm-stripe

 - limit tag size in dm-integrity to 255 bytes

 - fix 'alignment inconsistency' warning in dm-raid

* tag 'for-6.17/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm-raid: don't set io_min and io_opt for raid1
  dm-integrity: limit MAX_TAG_SIZE to 255
  dm-stripe: fix a possible integer overflow

2 days agoMerge tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Wed, 17 Sep 2025 14:55:45 +0000 (07:55 -0700)]
Merge tag 'for-6.17-rc6-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - in zoned mode, turn assertion to proper code when reserving space in
   relocation block group

 - fix search key of extended ref (hardlink) when replaying log

 - fix initialization of file extent tree on filesystems without
   no-holes feature

 - add harmless data race annotation to block group comparator

* tag 'for-6.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: annotate block group access with data_race() when sorting for reclaim
  btrfs: initialize inode::file_extent_tree after i_mode has been set
  btrfs: zoned: fix incorrect ASSERT in btrfs_zoned_reserve_data_reloc_bg()
  btrfs: fix invalid extref key setup when replaying dentry

2 days agodm-raid: don't set io_min and io_opt for raid1
Mikulas Patocka [Mon, 15 Sep 2025 14:12:40 +0000 (16:12 +0200)]
dm-raid: don't set io_min and io_opt for raid1

These commands
 modprobe brd rd_size=1048576
 vgcreate vg /dev/ram*
 lvcreate -m4 -L10 -n lv vg
trigger the following warnings:
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency
device-mapper: table: 252:10: adding target device (start sect 0 len 24576) caused an alignment inconsistency

The warnings are caused by the fact that io_min is 512 and physical block
size is 4096.

If there's chunk-less raid, such as raid1, io_min shouldn't be set to zero
because it would be raised to 512 and it would trigger the warning.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
2 days agonet: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer
Hans de Goede [Sat, 13 Sep 2025 11:35:15 +0000 (13:35 +0200)]
net: rfkill: gpio: Fix crash due to dereferencering uninitialized pointer

Since commit 7d5e9737efda ("net: rfkill: gpio: get the name and type from
device property") rfkill_find_type() gets called with the possibly
uninitialized "const char *type_name;" local variable.

On x86 systems when rfkill-gpio binds to a "BCM4752" or "LNV4752"
acpi_device, the rfkill->type is set based on the ACPI acpi_device_id:

        rfkill->type = (unsigned)id->driver_data;

and there is no "type" property so device_property_read_string() will fail
and leave type_name uninitialized, leading to a potential crash.

rfkill_find_type() does accept a NULL pointer, fix the potential crash
by initializing type_name to NULL.

Note likely sofar this has not been caught because:

1. Not many x86 machines actually have a "BCM4752"/"LNV4752" acpi_device
2. The stack happened to contain NULL where type_name is stored

Fixes: 7d5e9737efda ("net: rfkill: gpio: get the name and type from device property")
Cc: stable@vger.kernel.org
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Hans de Goede <hansg@kernel.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20250913113515.21698-1-hansg@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2 days agoMerge tag 'iwlwifi-fixes-2025-09-15' of https://git.kernel.org/pub/scm/linux/kernel...
Johannes Berg [Wed, 17 Sep 2025 10:35:38 +0000 (12:35 +0200)]
Merge tag 'iwlwifi-fixes-2025-09-15' of https://git./linux/kernel/git/iwlwifi/iwlwifi-next

Miri Korenblit says:
====================
iwlwifi fix
====================

The fix is for byte count tables in 7000/8000 family devices.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2 days agosched_ext, sched/core: Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED
Tejun Heo [Tue, 16 Sep 2025 21:06:42 +0000 (11:06 -1000)]
sched_ext, sched/core: Fix build failure when !FAIR_GROUP_SCHED && EXT_GROUP_SCHED

While collecting SCX related fields in struct task_group into struct
scx_task_group, 6e6558a6bc41 ("sched_ext, sched/core: Factor out struct
scx_task_group") forgot update tg->scx_weight usage in tg_weight(), which
leads to build failure when CONFIG_FAIR_GROUP_SCHED is disabled but
CONFIG_EXT_GROUP_SCHED is enabled. Fix it.

Fixes: 6e6558a6bc41 ("sched_ext, sched/core: Factor out struct scx_task_group")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509170230.MwZsJSWa-lkp@intel.com/
Tested-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2 days agoMerge branch 'mlx5e-misc-fixes-2025-09-15'
Jakub Kicinski [Wed, 17 Sep 2025 00:19:14 +0000 (17:19 -0700)]
Merge branch 'mlx5e-misc-fixes-2025-09-15'

Tariq Toukan says:

====================
mlx5e misc fixes 2025-09-15

This patchset provides misc bug fixes from the team to the mlx5 Eth
driver.
====================

Link: https://patch.msgid.link/1757939074-617281-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 days agonet/mlx5e: Add a miss level for ipsec crypto offload
Lama Kayal [Mon, 15 Sep 2025 12:24:34 +0000 (15:24 +0300)]
net/mlx5e: Add a miss level for ipsec crypto offload

The cited commit adds a miss table for switchdev mode. But it
uses the same level as policy table. Will hit the following error
when running command:

 # ip xfrm state add src 192.168.1.22 dst 192.168.1.21 proto \
esp spi 1001 reqid 10001 aead 'rfc4106(gcm(aes))' \
0x3a189a7f9374955d3817886c8587f1da3df387ff 128 \
mode tunnel offload dev enp8s0f0 dir in
 Error: mlx5_core: Device failed to offload this state.

The dmesg error is:

 mlx5_core 0000:03:00.0: ipsec_miss_create:578:(pid 311797): fail to create IPsec miss_rule err=-22

Fix it by adding a new miss level to avoid the error.

Fixes: 7d9e292ecd67 ("net/mlx5e: Move IPSec policy check after decryption")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Chris Mi <cmi@nvidia.com>
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1757939074-617281-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 days agonet/mlx5e: Harden uplink netdev access against device unbind
Jianbo Liu [Mon, 15 Sep 2025 12:24:32 +0000 (15:24 +0300)]
net/mlx5e: Harden uplink netdev access against device unbind

The function mlx5_uplink_netdev_get() gets the uplink netdevice
pointer from mdev->mlx5e_res.uplink_netdev. However, the netdevice can
be removed and its pointer cleared when unbound from the mlx5_core.eth
driver. This results in a NULL pointer, causing a kernel panic.

 BUG: unable to handle page fault for address: 0000000000001300
 at RIP: 0010:mlx5e_vport_rep_load+0x22a/0x270 [mlx5_core]
 Call Trace:
  <TASK>
  mlx5_esw_offloads_rep_load+0x68/0xe0 [mlx5_core]
  esw_offloads_enable+0x593/0x910 [mlx5_core]
  mlx5_eswitch_enable_locked+0x341/0x420 [mlx5_core]
  mlx5_devlink_eswitch_mode_set+0x17e/0x3a0 [mlx5_core]
  devlink_nl_eswitch_set_doit+0x60/0xd0
  genl_family_rcv_msg_doit+0xe0/0x130
  genl_rcv_msg+0x183/0x290
  netlink_rcv_skb+0x4b/0xf0
  genl_rcv+0x24/0x40
  netlink_unicast+0x255/0x380
  netlink_sendmsg+0x1f3/0x420
  __sock_sendmsg+0x38/0x60
  __sys_sendto+0x119/0x180
  do_syscall_64+0x53/0x1d0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53

Ensure the pointer is valid before use by checking it for NULL. If it
is valid, immediately call netdev_hold() to take a reference, and
preventing the netdevice from being freed while it is in use.

Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1757939074-617281-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 days agoMAINTAINERS: make the DPLL entry cover drivers
Jakub Kicinski [Mon, 15 Sep 2025 23:42:55 +0000 (16:42 -0700)]
MAINTAINERS: make the DPLL entry cover drivers

DPLL maintainers should probably be CCed on driver patches, too.
Remove the *, which makes the pattern only match files directly
under drivers/dpll but not its sub-directories.

Acked-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Link: https://patch.msgid.link/20250915234255.1306612-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 days agodoc/netlink: Fix typos in operation attributes
Remy D. Farley [Sat, 13 Sep 2025 14:05:28 +0000 (14:05 +0000)]
doc/netlink: Fix typos in operation attributes

I'm trying to generate Rust bindings for netlink using the yaml spec.

It looks like there's a typo in conntrack spec: attribute set conntrack-attrs
defines attributes "counters-{orig,reply}" (plural), while get operation
references "counter-{orig,reply}" (singular). The latter should be fixed, as it
denotes multiple counters (packet and byte). The corresonding C define is
CTA_COUNTERS_ORIG.

Also, dump request references "nfgen-family" attribute, which neither exists in
conntrack-attrs attrset nor ctattr_type enum. There's member of nfgenmsg struct
with the same name, which is where family value is actually taken from.

> static int ctnetlink_dump_exp_ct(struct net *net, struct sock *ctnl,
>                struct sk_buff *skb,
>                const struct nlmsghdr *nlh,
>                const struct nlattr * const cda[],
>                struct netlink_ext_ack *extack)
> {
>   int err;
>   struct nfgenmsg *nfmsg = nlmsg_data(nlh);
>   u_int8_t u3 = nfmsg->nfgen_family;
                         ^^^^^^^^^^^^

Signed-off-by: Remy D. Farley <one-d-wide@protonmail.com>
Fixes: 23fc9311a526 ("netlink: specs: add conntrack dump and stats dump support")
Link: https://patch.msgid.link/20250913140515.1132886-1-one-d-wide@protonmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 days agoMerge tag 'perf-tools-fixes-for-v6.17-2025-09-16' of git://git.kernel.org/pub/scm...
Linus Torvalds [Tue, 16 Sep 2025 22:15:54 +0000 (15:15 -0700)]
Merge tag 'perf-tools-fixes-for-v6.17-2025-09-16' of git://git./linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Namhyung Kim:
 "A small set of fixes for crashes in different commands and conditions"

* tag 'perf-tools-fixes-for-v6.17-2025-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
  perf maps: Ensure kmap is set up for all inserts
  perf lock: Provide a host_env for session new
  perf subcmd: avoid crash in exclude_cmds when excludes is empty

2 days agodrm/amd: Only restore cached manual clock settings in restore if OD enabled
Mario Limonciello [Tue, 16 Sep 2025 01:59:02 +0000 (20:59 -0500)]
drm/amd: Only restore cached manual clock settings in restore if OD enabled

If OD is not enabled then restoring cached clock settings doesn't make
sense and actually leads to errors in resume.

Check if enabled before restoring settings.

Fixes: 4e9526924d09 ("drm/amd: Restore cached manual clock settings during resume")
Reported-by: Jérôme Lécuyer <jerome.4a4c@gmail.com>
Closes: https://lore.kernel.org/amd-gfx/0ffe2692-7bfa-4821-856e-dd0f18e2c32b@amd.com/T/#me6db8ddb192626360c462b7570ed7eba0c6c9733
Suggested-by: Jérôme Lécuyer <jerome.4a4c@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1a4dd33cc6e1baaa81efdbe68227a19f51c50f20)
Cc: stable@vger.kernel.org
3 days agoigc: don't fail igc_probe() on LED setup error
Kohei Enju [Wed, 10 Sep 2025 13:47:21 +0000 (22:47 +0900)]
igc: don't fail igc_probe() on LED setup error

When igc_led_setup() fails, igc_probe() fails and triggers kernel panic
in free_netdev() since unregister_netdev() is not called. [1]
This behavior can be tested using fault-injection framework, especially
the failslab feature. [2]

Since LED support is not mandatory, treat LED setup failures as
non-fatal and continue probe with a warning message, consequently
avoiding the kernel panic.

[1]
 kernel BUG at net/core/dev.c:12047!
 Oops: invalid opcode: 0000 [#1] SMP NOPTI
 CPU: 0 UID: 0 PID: 937 Comm: repro-igc-led-e Not tainted 6.17.0-rc4-enjuk-tnguy-00865-gc4940196ab02 #64 PREEMPT(voluntary)
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
 RIP: 0010:free_netdev+0x278/0x2b0
 [...]
 Call Trace:
  <TASK>
  igc_probe+0x370/0x910
  local_pci_probe+0x3a/0x80
  pci_device_probe+0xd1/0x200
 [...]

[2]
 #!/bin/bash -ex

 FAILSLAB_PATH=/sys/kernel/debug/failslab/
 DEVICE=0000:00:05.0
 START_ADDR=$(grep " igc_led_setup" /proc/kallsyms \
         | awk '{printf("0x%s", $1)}')
 END_ADDR=$(printf "0x%x" $((START_ADDR + 0x100)))

 echo $START_ADDR > $FAILSLAB_PATH/require-start
 echo $END_ADDR > $FAILSLAB_PATH/require-end
 echo 1 > $FAILSLAB_PATH/times
 echo 100 > $FAILSLAB_PATH/probability
 echo N > $FAILSLAB_PATH/ignore-gfp-wait

 echo $DEVICE > /sys/bus/pci/drivers/igc/bind

Fixes: ea578703b03d ("igc: Add support for LEDs on i225/i226")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 days agoixgbe: destroy aci.lock later within ixgbe_remove path
Jedrzej Jagielski [Mon, 8 Sep 2025 11:26:29 +0000 (13:26 +0200)]
ixgbe: destroy aci.lock later within ixgbe_remove path

There's another issue with aci.lock and previous patch uncovers it.
aci.lock is being destroyed during removing ixgbe while some of the
ixgbe closing routines are still ongoing. These routines use Admin
Command Interface which require taking aci.lock which has been already
destroyed what leads to call trace.

[  +0.000004] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[  +0.000007] WARNING: CPU: 12 PID: 10277 at kernel/locking/mutex.c:155 mutex_lock+0x5f/0x70
[  +0.000002] Call Trace:
[  +0.000003]  <TASK>
[  +0.000006]  ixgbe_aci_send_cmd+0xc8/0x220 [ixgbe]
[  +0.000049]  ? try_to_wake_up+0x29d/0x5d0
[  +0.000009]  ixgbe_disable_rx_e610+0xc4/0x110 [ixgbe]
[  +0.000032]  ixgbe_disable_rx+0x3d/0x200 [ixgbe]
[  +0.000027]  ixgbe_down+0x102/0x3b0 [ixgbe]
[  +0.000031]  ixgbe_close_suspend+0x28/0x90 [ixgbe]
[  +0.000028]  ixgbe_close+0xfb/0x100 [ixgbe]
[  +0.000025]  __dev_close_many+0xae/0x220
[  +0.000005]  dev_close_many+0xc2/0x1a0
[  +0.000004]  ? kernfs_should_drain_open_files+0x2a/0x40
[  +0.000005]  unregister_netdevice_many_notify+0x204/0xb00
[  +0.000006]  ? __kernfs_remove.part.0+0x109/0x210
[  +0.000006]  ? kobj_kset_leave+0x4b/0x70
[  +0.000008]  unregister_netdevice_queue+0xf6/0x130
[  +0.000006]  unregister_netdev+0x1c/0x40
[  +0.000005]  ixgbe_remove+0x216/0x290 [ixgbe]
[  +0.000021]  pci_device_remove+0x42/0xb0
[  +0.000007]  device_release_driver_internal+0x19c/0x200
[  +0.000008]  driver_detach+0x48/0x90
[  +0.000003]  bus_remove_driver+0x6d/0xf0
[  +0.000006]  pci_unregister_driver+0x2e/0xb0
[  +0.000005]  ixgbe_exit_module+0x1c/0xc80 [ixgbe]

Same as for the previous commit, the issue has been highlighted by the
commit 337369f8ce9e ("locking/mutex: Add MUTEX_WARN_ON() into fast path").

Move destroying aci.lock to the end of ixgbe_remove(), as this
simply fixes the issue.

Fixes: 4600cdf9f5ac ("ixgbe: Enable link management in E610 device")
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 days agoixgbe: initialize aci.lock before it's used
Jedrzej Jagielski [Mon, 8 Sep 2025 11:26:28 +0000 (13:26 +0200)]
ixgbe: initialize aci.lock before it's used

Currently aci.lock is initialized too late. A bunch of ACI callbacks
using the lock are called prior it's initialized.

Commit 337369f8ce9e ("locking/mutex: Add MUTEX_WARN_ON() into fast path")
highlights that issue what results in call trace.

[    4.092899] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[    4.092910] WARNING: CPU: 0 PID: 578 at kernel/locking/mutex.c:154 mutex_lock+0x6d/0x80
[    4.098757] Call Trace:
[    4.098847]  <TASK>
[    4.098922]  ixgbe_aci_send_cmd+0x8c/0x1e0 [ixgbe]
[    4.099108]  ? hrtimer_try_to_cancel+0x18/0x110
[    4.099277]  ixgbe_aci_get_fw_ver+0x52/0xa0 [ixgbe]
[    4.099460]  ixgbe_check_fw_error+0x1fc/0x2f0 [ixgbe]
[    4.099650]  ? usleep_range_state+0x69/0xd0
[    4.099811]  ? usleep_range_state+0x8c/0xd0
[    4.099964]  ixgbe_probe+0x3b0/0x12d0 [ixgbe]
[    4.100132]  local_pci_probe+0x43/0xa0
[    4.100267]  work_for_cpu_fn+0x13/0x20
[    4.101647]  </TASK>

Move aci.lock mutex initialization to ixgbe_sw_init() before any ACI
command is sent. Along with that move also related SWFW semaphore in
order to reduce size of ixgbe_probe() and that way all locks are
initialized in ixgbe_sw_init().

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Fixes: 4600cdf9f5ac ("ixgbe: Enable link management in E610 device")
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 days agoi40e: remove redundant memory barrier when cleaning Tx descs
Maciej Fijalkowski [Fri, 22 Aug 2025 15:16:17 +0000 (17:16 +0200)]
i40e: remove redundant memory barrier when cleaning Tx descs

i40e has a feature which writes to memory location last descriptor
successfully sent. Memory barrier in i40e_clean_tx_irq() was used to
avoid forward-reading descriptor fields in case DD bit was not set.
Having mentioned feature in place implies that such situation will not
happen as we know in advance how many descriptors HW has dealt with.

Besides, this barrier placement was wrong. Idea is to have this
protection *after* reading DD bit from HW descriptor, not before.
Digging through git history showed me that indeed barrier was before DD
bit check, anyways the commit introducing i40e_get_head() should have
wiped it out altogether.

Also, there was one commit doing s/read_barrier_depends/smp_rmb when get
head feature was already in place, but it was only theoretical based on
ixgbe experiences, which is different in these terms as that driver has
to read DD bit from HW descriptor.

Fixes: 1943d8ba9507 ("i40e/i40evf: enable hardware feature head write back")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 days agoice: fix Rx page leak on multi-buffer frames
Jacob Keller [Mon, 25 Aug 2025 23:00:14 +0000 (16:00 -0700)]
ice: fix Rx page leak on multi-buffer frames

The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for each
buffer in the current frame. This function was introduced as part of
handling multi-buffer XDP support in the ice driver.

It works by iterating over the buffers from first_desc up to 1 plus the
total number of fragments in the frame, cached from before the XDP program
was executed.

If the hardware posts a descriptor with a size of 0, the logic used in
ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get added
as fragments in ice_add_xdp_frag. Since the buffer isn't counted as a
fragment, we do not iterate over it in ice_put_rx_mbuf(), and thus we don't
call ice_put_rx_buf().

Because we don't call ice_put_rx_buf(), we don't attempt to re-use the
page or free it. This leaves a stale page in the ring, as we don't
increment next_to_alloc.

The ice_reuse_rx_page() assumes that the next_to_alloc has been incremented
properly, and that it always points to a buffer with a NULL page. Since
this function doesn't check, it will happily recycle a page over the top
of the next_to_alloc buffer, losing track of the old page.

Note that this leak only occurs for multi-buffer frames. The
ice_put_rx_mbuf() function always handles at least one buffer, so a
single-buffer frame will always get handled correctly. It is not clear
precisely why the hardware hands us descriptors with a size of 0 sometimes,
but it happens somewhat regularly with "jumbo frames" used by 9K MTU.

To fix ice_put_rx_mbuf(), we need to make sure to call ice_put_rx_buf() on
all buffers between first_desc and next_to_clean. Borrow the logic of a
similar function in i40e used for this same purpose. Use the same logic
also in ice_get_pgcnts().

Instead of iterating over just the number of fragments, use a loop which
iterates until the current index reaches to the next_to_clean element just
past the current frame. Unlike i40e, the ice_put_rx_mbuf() function does
call ice_put_rx_buf() on the last buffer of the frame indicating the end of
packet.

For non-linear (multi-buffer) frames, we need to take care when adjusting
the pagecnt_bias. An XDP program might release fragments from the tail of
the frame, in which case that fragment page is already released. Only
update the pagecnt_bias for the first descriptor and fragments still
remaining post-XDP program. Take care to only access the shared info for
fragmented buffers, as this avoids a significant cache miss.

The xdp_xmit value only needs to be updated if an XDP program is run, and
only once per packet. Drop the xdp_xmit pointer argument from
ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq() function
directly. This avoids needing to pass the argument and avoids an extra
bit-wise OR for each buffer in the frame.

Move the increment of the ntc local variable to ensure its updated *before*
all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the loop logic
requires the index of the element just after the current frame.

Now that we use an index pointer in the ring to identify the packet, we no
longer need to track or cache the number of fragments in the rx_ring.

Cc: Christoph Petrausch <christoph.petrausch@deepl.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Closes: https://lore.kernel.org/netdev/CAK8fFZ4hY6GUJNENz3wY9jaYLZXGfpr7dnZxzGMYoE44caRbgw@mail.gmail.com/
Fixes: 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Tested-by: Priya Singh <priyax.singh@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
3 days agoRevert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"
Andrea Righi [Fri, 12 Sep 2025 16:14:38 +0000 (18:14 +0200)]
Revert "sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()"

scx_bpf_reenqueue_local() can be called from ops.cpu_release() when a
CPU is taken by a higher scheduling class to give tasks queued to the
CPU's local DSQ a chance to be migrated somewhere else, instead of
waiting indefinitely for that CPU to become available again.

In doing so, we decided to skip migration-disabled tasks, under the
assumption that they cannot be migrated anyway.

However, when a higher scheduling class preempts a CPU, the running task
is always inserted at the head of the local DSQ as a migration-disabled
task. This means it is always skipped by scx_bpf_reenqueue_local(), and
ends up being confined to the same CPU even if that CPU is heavily
contended by other higher scheduling class tasks.

As an example, let's consider the following scenario:

 $ schedtool -a 0,1, -e yes > /dev/null
 $ sudo schedtool -F -p 99 -a 0, -e \
   stress-ng -c 1 --cpu-load 99 --cpu-load-slice 1000

The first task (SCHED_EXT) can run on CPU0 or CPU1. The second task
(SCHED_FIFO) is pinned to CPU0 and consumes ~99% of it. If the SCHED_EXT
task initially runs on CPU0, it will remain there because it always sees
CPU0 as "idle" in the short gaps left by the RT task, resulting in ~1%
utilization while CPU1 stays idle:

    0[||||||||||||||||||||||100.0%]   8[                        0.0%]
    1[                        0.0%]   9[                        0.0%]
    2[                        0.0%]  10[                        0.0%]
    3[                        0.0%]  11[                        0.0%]
    4[                        0.0%]  12[                        0.0%]
    5[                        0.0%]  13[                        0.0%]
    6[                        0.0%]  14[                        0.0%]
    7[                        0.0%]  15[                        0.0%]
  PID USER       PRI  NI  S CPU  CPU%▽MEM%   TIME+  Command
 1067 root        RT   0  R   0  99.0  0.2  0:31.16 stress-ng-cpu [run]
  975 arighi      20   0  R   0   1.0  0.0  0:26.32 yes

By allowing scx_bpf_reenqueue_local() to re-enqueue migration-disabled
tasks, the scheduler can choose to migrate them to other CPUs (CPU1 in
this case) via ops.enqueue(), leading to better CPU utilization:

    0[||||||||||||||||||||||100.0%]   8[                        0.0%]
    1[||||||||||||||||||||||100.0%]   9[                        0.0%]
    2[                        0.0%]  10[                        0.0%]
    3[                        0.0%]  11[                        0.0%]
    4[                        0.0%]  12[                        0.0%]
    5[                        0.0%]  13[                        0.0%]
    6[                        0.0%]  14[                        0.0%]
    7[                        0.0%]  15[                        0.0%]
  PID USER       PRI  NI  S CPU  CPU%▽MEM%   TIME+  Command
  577 root        RT   0  R   0 100.0  0.2  0:23.17 stress-ng-cpu [run]
  555 arighi      20   0  R   1 100.0  0.0  0:28.67 yes

It's debatable whether per-CPU tasks should be re-enqueued as well, but
doing so is probably safer: the scheduler can recognize re-enqueued
tasks through the %SCX_ENQ_REENQ flag, reassess their placement, and
either put them back at the head of the local DSQ or let another task
attempt to take the CPU.

This also prevents giving per-CPU tasks an implicit priority boost,
which would otherwise make them more likely to reclaim CPUs preempted by
higher scheduling classes.

Fixes: 97e13ecb02668 ("sched_ext: Skip per-CPU tasks in scx_bpf_reenqueue_local()")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Acked-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
3 days agodrm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue()
Dan Carpenter [Thu, 7 Aug 2025 15:53:41 +0000 (18:53 +0300)]
drm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue()

The xe_preempt_fence_create() function returns error pointers.  It
never returns NULL.  Update the error checking to match.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/aJTMBdX97cof_009@stanley.mountain
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 75cc23ffe5b422bc3cbd5cf0956b8b86e4b0e162)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 days agodrm: bridge: cdns-mhdp8546: Fix missing mutex unlock on error path
Qi Xi [Thu, 4 Sep 2025 03:44:47 +0000 (11:44 +0800)]
drm: bridge: cdns-mhdp8546: Fix missing mutex unlock on error path

Add missing mutex unlock before returning from the error path in
cdns_mhdp_atomic_enable().

Fixes: 935a92a1c400 ("drm: bridge: cdns-mhdp8546: Fix possible null pointer dereference")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qi Xi <xiqi2@huawei.com>
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20250904034447.665427-1-xiqi2@huawei.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
3 days agoplatform/x86: asus-wmi: Re-add extra keys to ignore_key_wlan quirk
Antheas Kapenekakis [Tue, 16 Sep 2025 07:28:18 +0000 (09:28 +0200)]
platform/x86: asus-wmi: Re-add extra keys to ignore_key_wlan quirk

It turns out that the dual screen models use 0x5E for attaching and
detaching the keyboard instead of 0x5F. So, re-add the codes by
reverting commit cf3940ac737d ("platform/x86: asus-wmi: Remove extra
keys from ignore_key_wlan quirk"). For our future reference, add a
comment next to 0x5E indicating that it is used for that purpose.

Fixes: cf3940ac737d ("platform/x86: asus-wmi: Remove extra keys from ignore_key_wlan quirk")
Reported-by: Rahul Chandra <rahul@chandra.net>
Closes: https://lore.kernel.org/all/10020-68c90c80-d-4ac6c580@106290038/
Cc: stable@kernel.org
Signed-off-by: Antheas Kapenekakis <lkml@antheas.dev>
Link: https://patch.msgid.link/20250916072818.196462-1-lkml@antheas.dev
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 days agoplatform/x86/amd/pmf: Support new ACPI ID AMDI0108
Shyam Sundar S K [Mon, 15 Sep 2025 09:05:46 +0000 (14:35 +0530)]
platform/x86/amd/pmf: Support new ACPI ID AMDI0108

Include the ACPI ID AMDI0108, which is used on upcoming AMD platforms, in
the PMF driver's list of supported devices.

Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20250915090546.2759130-1-Shyam-sundar.S-k@amd.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
3 days agodrm/i915/backlight: Honor VESA eDP backlight luminance control capability
Aaron Ma [Sat, 23 Aug 2025 12:16:47 +0000 (20:16 +0800)]
drm/i915/backlight: Honor VESA eDP backlight luminance control capability

The VESA AUX backlight fails to be enable luminance based backlight
mainpulation becaused luminance_set is false by default.
Fix it by using luminance support control capabitliy.

Fixes: e13af5166a359 ("drm/i915/backlight: Use drm helper to initialize edp backlight")
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com>
Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Link: https://lore.kernel.org/r/20250823121647.275834-1-aaron.ma@canonical.com
(cherry picked from commit 72136efb875d8438c20b9c8ab72945d474933471)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
3 days agozram: fix slot write race condition
Sergey Senozhatsky [Tue, 9 Sep 2025 04:48:35 +0000 (13:48 +0900)]
zram: fix slot write race condition

Parallel concurrent writes to the same zram index result in leaked
zsmalloc handles.  Schematically we can have something like this:

CPU0                              CPU1
zram_slot_lock()
zs_free(handle)
zram_slot_lock()
zram_slot_lock()
zs_free(handle)
zram_slot_lock()

compress compress
handle = zs_malloc() handle = zs_malloc()
zram_slot_lock
zram_set_handle(handle)
zram_slot_lock
zram_slot_lock
zram_set_handle(handle)
zram_slot_lock

Either CPU0 or CPU1 zsmalloc handle will leak because zs_free() is done
too early.  In fact, we need to reset zram entry right before we set its
new handle, all under the same slot lock scope.

Link: https://lkml.kernel.org/r/20250909045150.635345-1-senozhatsky@chromium.org
Fixes: 71268035f5d7 ("zram: free slot memory early during write")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reported-by: Changhui Zhong <czhong@redhat.com>
Closes: https://lore.kernel.org/all/CAGVVp+UtpGoW5WEdEU7uVTtsSCjPN=ksN6EcvyypAtFDOUf30A@mail.gmail.com/
Tested-by: Changhui Zhong <czhong@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 days agonet: natsemi: fix `rx_dropped` double accounting on `netif_rx()` failure
Yeounsu Moon [Sat, 13 Sep 2025 06:01:36 +0000 (15:01 +0900)]
net: natsemi: fix `rx_dropped` double accounting on `netif_rx()` failure

`netif_rx()` already increments `rx_dropped` core stat when it fails.
The driver was also updating `ndev->stats.rx_dropped` in the same path.
Since both are reported together via `ip -s -s` command, this resulted
in drops being counted twice in user-visible stats.

Keep the driver update on `if (unlikely(!skb))`, but skip it after
`netif_rx()` errors.

Fixes: caf586e5f23c ("net: add a core netdev->rx_dropped counter")
Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250913060135.35282-3-yyyynoom@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agoMerge branch 'mptcp-pm-nl-announce-deny-join-id0-flag'
Jakub Kicinski [Tue, 16 Sep 2025 01:12:08 +0000 (18:12 -0700)]
Merge branch 'mptcp-pm-nl-announce-deny-join-id0-flag'

Matthieu Baerts says:

====================
mptcp: pm: nl: announce deny-join-id0 flag

During the connection establishment, a peer can tell the other one that
it cannot establish new subflows to the initial IP address and port by
setting the 'C' flag [1]. Doing so makes sense when the sender is behind
a strict NAT, operating behind a legacy Layer 4 load balancer, or using
anycast IP address for example.

When this 'C' flag is set, the path-managers must then not try to
establish new subflows to the other peer's initial IP address and port.
The in-kernel PM has access to this info, but the userspace PM didn't,
not letting the userspace daemon able to respect the RFC8684.

Here are a few fixes related to this 'C' flag (aka 'deny-join-id0'):

- Patch 1: add remote_deny_join_id0 info on passive connections. A fix
  for v5.14.

- Patch 2: let the userspace PM daemon know about the deny_join_id0
  attribute, so when set, it can avoid creating new subflows to the
  initial IP address and port. A fix for v5.19.

- Patch 3: a validation for the previous commit.

- Patch 4: record the deny_join_id0 info when TFO is used. A fix for
  v6.2.

- Patch 5: not related to deny-join-id0, but it fixes errors messages in
  the sockopt selftests, not to create confusions. A fix for v6.5.
====================

Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-0-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agoselftests: mptcp: sockopt: fix error messages
Geliang Tang [Fri, 12 Sep 2025 12:52:24 +0000 (14:52 +0200)]
selftests: mptcp: sockopt: fix error messages

This patch fixes several issues in the error reporting of the MPTCP sockopt
selftest:

1. Fix diff not printed: The error messages for counter mismatches had
   the actual difference ('diff') as argument, but it was missing in the
   format string. Displaying it makes the debugging easier.

2. Fix variable usage: The error check for 'mptcpi_bytes_acked' incorrectly
   used 'ret2' (sent bytes) for both the expected value and the difference
   calculation. It now correctly uses 'ret' (received bytes), which is the
   expected value for bytes_acked.

3. Fix off-by-one in diff: The calculation for the 'mptcpi_rcv_delta' diff
   was 's.mptcpi_rcv_delta - ret', which is off-by-one. It has been
   corrected to 's.mptcpi_rcv_delta - (ret + 1)' to match the expected
   value in the condition above it.

Fixes: 5dcff89e1455 ("selftests: mptcp: explicitly tests aggregate counters")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-5-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agomptcp: tfo: record 'deny join id0' info
Matthieu Baerts (NGI0) [Fri, 12 Sep 2025 12:52:23 +0000 (14:52 +0200)]
mptcp: tfo: record 'deny join id0' info

When TFO is used, the check to see if the 'C' flag (deny join id0) was
set was bypassed.

This flag can be set when TFO is used, so the check should also be done
when TFO is used.

Note that the set_fully_established label is also used when a 4th ACK is
received. In this case, deny_join_id0 will not be set.

Fixes: dfc8d0603033 ("mptcp: implement delayed seq generation for passive fastopen")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-4-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agoselftests: mptcp: userspace pm: validate deny-join-id0 flag
Matthieu Baerts (NGI0) [Fri, 12 Sep 2025 12:52:22 +0000 (14:52 +0200)]
selftests: mptcp: userspace pm: validate deny-join-id0 flag

The previous commit adds the MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 flag. Make
sure it is correctly announced by the other peer when it has been
received.

pm_nl_ctl will now display 'deny_join_id0:1' when monitoring the events,
and when this flag was set by the other peer.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-3-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agomptcp: pm: nl: announce deny-join-id0 flag
Matthieu Baerts (NGI0) [Fri, 12 Sep 2025 12:52:21 +0000 (14:52 +0200)]
mptcp: pm: nl: announce deny-join-id0 flag

During the connection establishment, a peer can tell the other one that
it cannot establish new subflows to the initial IP address and port by
setting the 'C' flag [1]. Doing so makes sense when the sender is behind
a strict NAT, operating behind a legacy Layer 4 load balancer, or using
anycast IP address for example.

When this 'C' flag is set, the path-managers must then not try to
establish new subflows to the other peer's initial IP address and port.
The in-kernel PM has access to this info, but the userspace PM didn't.

The RFC8684 [1] is strict about that:

  (...) therefore the receiver MUST NOT try to open any additional
  subflows toward this address and port.

So it is important to tell the userspace about that as it is responsible
for the respect of this flag.

When a new connection is created and established, the Netlink events
now contain the existing but not currently used 'flags' attribute. When
MPTCP_PM_EV_FLAG_DENY_JOIN_ID0 is set, it means no other subflows
to the initial IP address and port -- info that are also part of the
event -- can be established.

Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.1-20.6
Fixes: 702c2f646d42 ("mptcp: netlink: allow userspace-driven subflow establishment")
Reported-by: Marek Majkowski <marek@cloudflare.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/532
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-2-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agomptcp: set remote_deny_join_id0 on SYN recv
Matthieu Baerts (NGI0) [Fri, 12 Sep 2025 12:52:20 +0000 (14:52 +0200)]
mptcp: set remote_deny_join_id0 on SYN recv

When a SYN containing the 'C' flag (deny join id0) was received, this
piece of information was not propagated to the path-manager.

Even if this flag is mainly set on the server side, a client can also
tell the server it cannot try to establish new subflows to the client's
initial IP address and port. The server's PM should then record such
info when received, and before sending events about the new connection.

Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-pm-uspace-deny_join_id0-v1-1-40171884ade8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agoMerge branch 'selftests-mptcp-avoid-spurious-errors-on-tcp-disconnect'
Jakub Kicinski [Tue, 16 Sep 2025 01:10:40 +0000 (18:10 -0700)]
Merge branch 'selftests-mptcp-avoid-spurious-errors-on-tcp-disconnect'

Matthieu Baerts says:

====================
selftests: mptcp: avoid spurious errors on TCP disconnect

This series should fix the recent instabilities seen by MPTCP and NIPA
CIs where the 'mptcp_connect.sh' tests fail regularly when running the
'disconnect' subtests with "plain" TCP sockets, e.g.

  # INFO: disconnect
  # 63 ns1 MPTCP -> ns1 (10.0.1.1:20001      ) MPTCP     (duration   996ms) [ OK ]
  # 64 ns1 MPTCP -> ns1 (10.0.1.1:20002      ) TCP       (duration   851ms) [ OK ]
  # 65 ns1 TCP   -> ns1 (10.0.1.1:20003      ) MPTCP     Unexpected revents: POLLERR/POLLNVAL(19)
  # (duration   896ms) [FAIL] file received by server does not match (in, out):
  # -rw-r--r-- 1 root root 11112852 Aug 19 09:16 /tmp/tmp.hlJe5DoMoq.disconnect
  # Trailing bytes are:
  # /{ga 6@=#.8:-rw------- 1 root root 10085368 Aug 19 09:16 /tmp/tmp.blClunilxx
  # Trailing bytes are:
  # /{ga 6@=#.8:66 ns1 MPTCP -> ns1 (dead:beef:1::1:20004) MPTCP     (duration   987ms) [ OK ]
  # 67 ns1 MPTCP -> ns1 (dead:beef:1::1:20005) TCP       (duration   911ms) [ OK ]
  # 68 ns1 TCP   -> ns1 (dead:beef:1::1:20006) MPTCP     (duration   980ms) [ OK ]
  # [FAIL] Tests of the full disconnection have failed

These issues started to be visible after some behavioural changes in
TCP, where too quick re-connections after a shutdown() can now be more
easily rejected. Patch 3 modifies the selftests to wait, but this
resolution revealed an issue in MPTCP which is fixed by patch 1 (a fix
for v5.9 kernel).

Patches 2 and 4 improve some errors reported by the selftests, and patch
5 helps with the debugging of such issues.
====================

Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-0-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agoselftests: mptcp: connect: print pcap prefix
Matthieu Baerts (NGI0) [Fri, 12 Sep 2025 12:25:54 +0000 (14:25 +0200)]
selftests: mptcp: connect: print pcap prefix

To be able to find which capture files have been produced after several
runs.

This prefix was not printed anywhere before.

While at it, always use the same prefix by taking info from ns1, instead
of "$connector_ns", which is sometimes ns1, sometimes ns2 in the
subtests.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-5-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agoselftests: mptcp: print trailing bytes with od
Matthieu Baerts (NGI0) [Fri, 12 Sep 2025 12:25:53 +0000 (14:25 +0200)]
selftests: mptcp: print trailing bytes with od

This is better than printing random bytes in the terminal.

Note that Jakub suggested 'hexdump', but Mat found out this tool is not
often installed by default. 'od' can do a similar job, and it is in the
POSIX specs and available in coreutils, so it should be on more systems.

While at it, display a few more bytes, just to fill in the two lines.
And no need to display the 3rd only line showing the next number of
bytes: 0000040.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-4-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agoselftests: mptcp: avoid spurious errors on TCP disconnect
Matthieu Baerts (NGI0) [Fri, 12 Sep 2025 12:25:52 +0000 (14:25 +0200)]
selftests: mptcp: avoid spurious errors on TCP disconnect

The disconnect test-case, with 'plain' TCP sockets generates spurious
errors, e.g.

  07 ns1 TCP   -> ns1 (dead:beef:1::1:10006) MPTCP
  read: Connection reset by peer
  read: Connection reset by peer
  (duration   155ms) [FAIL] client exit code 3, server 3

  netns ns1-FloSdv (listener) socket stat for 10006:
  TcpActiveOpens                  2                  0.0
  TcpPassiveOpens                 2                  0.0
  TcpEstabResets                  2                  0.0
  TcpInSegs                       274                0.0
  TcpOutSegs                      276                0.0
  TcpOutRsts                      3                  0.0
  TcpExtPruneCalled               2                  0.0
  TcpExtRcvPruned                 1                  0.0
  TcpExtTCPPureAcks               104                0.0
  TcpExtTCPRcvCollapsed           2                  0.0
  TcpExtTCPBacklogCoalesce        42                 0.0
  TcpExtTCPRcvCoalesce            43                 0.0
  TcpExtTCPChallengeACK           1                  0.0
  TcpExtTCPFromZeroWindowAdv      42                 0.0
  TcpExtTCPToZeroWindowAdv        41                 0.0
  TcpExtTCPWantZeroWindowAdv      13                 0.0
  TcpExtTCPOrigDataSent           164                0.0
  TcpExtTCPDelivered              165                0.0
  TcpExtTCPRcvQDrop               1                  0.0

In the failing scenarios (TCP -> MPTCP), the involved sockets are
actually plain TCP ones, as fallbacks for passive sockets at 2WHS time
cause the MPTCP listeners to actually create 'plain' TCP sockets.

Similar to commit 218cc166321f ("selftests: mptcp: avoid spurious errors
on disconnect"), the root cause is in the user-space bits: the test
program tries to disconnect as soon as all the pending data has been
spooled, generating an RST. If such option reaches the peer before the
connection has reached the closed status, the TCP socket will report an
error to the user-space, as per protocol specification, causing the
above failure. Note that it looks like this issue got more visible since
the "tcp: receiver changes" series from commit 06baf9bfa6ca ("Merge
branch 'tcp-receiver-changes'").

Address the issue by explicitly waiting for the TCP sockets (-t) to
reach a closed status before performing the disconnect. More precisely,
the test program now waits for plain TCP sockets or TCP subflows in
addition to the MPTCP sockets that were already monitored.

While at it, use 'ss' with '-n' to avoid resolving service names, which
is not needed here.

Fixes: 218cc166321f ("selftests: mptcp: avoid spurious errors on disconnect")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-3-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agoselftests: mptcp: connect: catch IO errors on listen side
Matthieu Baerts (NGI0) [Fri, 12 Sep 2025 12:25:51 +0000 (14:25 +0200)]
selftests: mptcp: connect: catch IO errors on listen side

IO errors were correctly printed to stderr, and propagated up to the
main loop for the server side, but the returned value was ignored. As a
consequence, the program for the listener side was no longer exiting
with an error code in case of IO issues.

Because of that, some issues might not have been seen. But very likely,
most issues either had an effect on the client side, or the file
transfer was not the expected one, e.g. the connection got reset before
the end. Still, it is better to fix this.

The main consequence of this issue is the error that was reported by the
selftests: the received and sent files were different, and the MIB
counters were not printed. Also, when such errors happened during the
'disconnect' tests, the program tried to continue until the timeout.

Now when an IO error is detected, the program exits directly with an
error.

Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-2-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agomptcp: propagate shutdown to subflows when possible
Matthieu Baerts (NGI0) [Fri, 12 Sep 2025 12:25:50 +0000 (14:25 +0200)]
mptcp: propagate shutdown to subflows when possible

When the MPTCP DATA FIN have been ACKed, there is no more MPTCP related
metadata to exchange, and all subflows can be safely shutdown.

Before this patch, the subflows were actually terminated at 'close()'
time. That's certainly fine most of the time, but not when the userspace
'shutdown()' a connection, without close()ing it. When doing so, the
subflows were staying in LAST_ACK state on one side -- and consequently
in FIN_WAIT2 on the other side -- until the 'close()' of the MPTCP
socket.

Now, when the DATA FIN have been ACKed, all subflows are shutdown. A
consequence of this is that the TCP 'FIN' flag can be set earlier now,
but the end result is the same. This affects the packetdrill tests
looking at the end of the MPTCP connections, but for a good reason.

Note that tcp_shutdown() will check the subflow state, so no need to do
that again before calling it.

Fixes: 3721b9b64676 ("mptcp: Track received DATA_FIN sequence number and add related helpers")
Cc: stable@vger.kernel.org
Fixes: 16a9a9da1723 ("mptcp: Add helper to process acks of DATA_FIN")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250912-net-mptcp-fix-sft-connect-v1-1-d40e77cbbf02@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agoselftests: bonding: add fail_over_mac testing
Hangbin Liu [Wed, 10 Sep 2025 02:43:35 +0000 (02:43 +0000)]
selftests: bonding: add fail_over_mac testing

Add a test to check each value of bond fail_over_mac option.

Also fix a minor garp_test print issue.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250910024336.400253-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agobonding: set random address only when slaves already exist
Hangbin Liu [Wed, 10 Sep 2025 02:43:34 +0000 (02:43 +0000)]
bonding: set random address only when slaves already exist

After commit 5c3bf6cba791 ("bonding: assign random address if device
address is same as bond"), bonding will erroneously randomize the MAC
address of the first interface added to the bond if fail_over_mac =
follow.

Correct this by additionally testing for the bond being empty before
randomizing the MAC.

Fixes: 5c3bf6cba791 ("bonding: assign random address if device address is same as bond")
Reported-by: Qiuling Ren <qren@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250910024336.400253-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agords: ib: Increment i_fastreg_wrs before bailing out
Håkon Bugge [Thu, 11 Sep 2025 13:33:34 +0000 (15:33 +0200)]
rds: ib: Increment i_fastreg_wrs before bailing out

We need to increment i_fastreg_wrs before we bail out from
rds_ib_post_reg_frmr().

We have a fixed budget of how many FRWR operations that can be
outstanding using the dedicated QP used for memory registrations and
de-registrations. This budget is enforced by the atomic_t
i_fastreg_wrs. If we bail out early in rds_ib_post_reg_frmr(), we will
"leak" the possibility of posting an FRWR operation, and if that
accumulates, no FRWR operation can be carried out.

Fixes: 1659185fb4d0 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode")
Fixes: 3a2886cca703 ("net/rds: Keep track of and wait for FRWR segments in use upon shutdown")
Cc: stable@vger.kernel.org
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20250911133336.451212-1-haakon.bugge@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 days agodrm/amd/display: Allow RX6xxx & RX7700 to invoke amdgpu_irq_get/put
Ivan Lipski [Tue, 2 Sep 2025 20:20:09 +0000 (16:20 -0400)]
drm/amd/display: Allow RX6xxx & RX7700 to invoke amdgpu_irq_get/put

[Why&How]
As reported on https://gitlab.freedesktop.org/drm/amd/-/issues/3936,
SMU hang can occur if the interrupts are not enabled appropriately,
causing a vblank timeout.

This patch reverts commit 5009628d8509 ("drm/amd/display: Remove unnecessary
amdgpu_irq_get/put"), but only for RX6xxx & RX7700 GPUs, on which the
issue was observed.

This will re-enable interrupts regardless of whether the user space needed
it or not.

Fixes: 5009628d8509 ("drm/amd/display: Remove unnecessary amdgpu_irq_get/put")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3936
Suggested-by: Sun peng Li <sunpeng.li@amd.com>
Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 95d168b367aa28a59f94fc690ff76ebf69312c6d)
Cc: stable@vger.kernel.org
3 days agodrm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.0.1/11.0.4 GPUs
Srinivasan Shanmugam [Wed, 10 Sep 2025 06:57:05 +0000 (12:27 +0530)]
drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.0.1/11.0.4 GPUs

Enable the cleaner shader for additional GFX11.0.1/11.0.4 series GPUs to
ensure data isolation among GPU tasks. The cleaner shader is tasked with
clearing the Local Data Store (LDS), Vector General Purpose Registers
(VGPRs), and Scalar General Purpose Registers (SGPRs), which helps avoid
data leakage and guarantees the accuracy of computational results.

This update extends cleaner shader support to GFX11.0.1/11.0.4 GPUs,
previously available for GFX11.0.3. It enhances security by clearing GPU
memory between processes and maintains a consistent GPU state across KGD
and KFD workloads.

Cc: Wasee Alam <wasee.alam@amd.com>
Cc: Mario Sopena-Novales <mario.novales@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0a71ceb27f88a944c2de2808b67b2f46ac75076b)

4 days agodrm: bridge: anx7625: Fix NULL pointer dereference with early IRQ
Loic Poulain [Wed, 9 Jul 2025 08:54:38 +0000 (10:54 +0200)]
drm: bridge: anx7625: Fix NULL pointer dereference with early IRQ

If the interrupt occurs before resource initialization is complete, the
interrupt handler/worker may access uninitialized data such as the I2C
tcpc_client device, potentially leading to NULL pointer dereference.

Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Fixes: 8bdfc5dae4e3 ("drm/bridge: anx7625: Add anx7625 MIPI DSI/DPI to DP")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250709085438.56188-1-loic.poulain@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
4 days agodrm/xe: defer free of NVM auxiliary container to device release callback
Nitin Gote [Thu, 11 Sep 2025 05:28:23 +0000 (10:58 +0530)]
drm/xe: defer free of NVM auxiliary container to device release callback

Do not kfree the intel_dg_nvm_dev in xe_nvm_fini() right after
auxiliary_device_delete/uninit. The auxiliary_device embeds the
device/kobject (and its name); freeing it too early can race
with asynchronous device_del/udev processing and cause a use-after-free.

Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Fixes: c28bfb107dac ("drm/xe/nvm: add on-die non-volatile memory device")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250911052823.226696-1-nitin.r.gote@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit d4c3ed963e41d488695cf91068eabb8eb9f538ec)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 days agodrm/xe/hwmon: Remove type casting
Mallesh Koujalagi [Fri, 12 Sep 2025 11:34:58 +0000 (17:04 +0530)]
drm/xe/hwmon: Remove type casting

Refactor: eliminate type casts by using proper u32
declarations.

v2:
- Address review comments. (Karthik)

v3:
- Use the proper u32 type and drop cast. (Lucas De Marchi)
- Modify variable when actually using u64 value.
- Change r value to reg_value with u32 type.

v4:
- Remove newline between trailer and Signed-off-by. (Lucas De Marchi)
- Change reg_val to val for more user-friendly logging.
- Use mul_u32_u32 function since both values are u32.

v5:
- mul_u32_u32 function with shift. (Lucas De Marchi)

Fixes: 7596d839f6228 ("drm/xe/hwmon: Add support to manage power limits though mailbox")
Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250912113458.2815172-1-mallesh.koujalagi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 4e1d3b5e6423dc841acd8691d75626b3d3b2b6a8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 days agoperf maps: Ensure kmap is set up for all inserts
Ian Rogers [Sun, 14 Sep 2025 18:18:08 +0000 (11:18 -0700)]
perf maps: Ensure kmap is set up for all inserts

__maps__fixup_overlap_and_insert may split or directly insert a map,
when doing this the map may need to have a kmap set up for the sake of
the kmaps. The missing kmap set up fails the check_invariants test in
maps, later "Internal error" reports from map__kmap and ultimately
causes segfaults.

Similar fixes were added in commit e0e4e0b8b7fa ("perf maps: Add
missing map__set_kmap_maps() when replacing a kernel map") and commit
25d9c0301d36 ("perf maps: Set the kmaps for newly created/added kernel
maps") but they missed cases. To try to reduce the risk of this,
update the kmap directly following any manual insert. This identified
another problem in maps__copy_from.

Fixes: e0e4e0b8b7fa ("perf maps: Add missing map__set_kmap_maps() when replacing a kernel map")
Fixes: 25d9c0301d36 ("perf maps: Set the kmaps for newly created/added kernel maps")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
4 days agoMerge tag 'for-v6.17-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux...
Linus Torvalds [Mon, 15 Sep 2025 15:15:11 +0000 (08:15 -0700)]
Merge tag 'for-v6.17-rc' of git://git./linux/kernel/git/sre/linux-power-supply

Pull power supply fixes from Sebastian Reichel:

 - bq27xxx: avoid spamming the log for missing bq27000 battery

* tag 'for-v6.17-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
  power: supply: bq27xxx: restrict no-battery detection to bq27000
  power: supply: bq27xxx: fix error return in case of no bq27000 hdq battery

4 days agodrm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitation
Michal Wajdeczko [Wed, 10 Sep 2025 22:24:39 +0000 (00:24 +0200)]
drm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitation

This effectively reverts commit 4c3fe5eae46b ("drm/xe/pf: Limit
fair VF LMEM provisioning") since we don't need it any more after
non-contig VRAM allocations were fixed. This allows larger LMEM
auto-provisioning for VFs, so instead:

 [ ] GT0: PF: LMEM available(14096M) fair(1 x 8192M)
 [ ] GT0: PF: VF1 provisioned with 8589934592 (8.00 GiB) LMEM
or
 [ ] GT0: PF: LMEM available(14096M) fair(2 x 4096M)
 [ ] GT0: PF: VF1..VF2 provisioned with 4294967296 (4.00 GiB) LMEM

we may get:

 [ ] GT0: PF: LMEM available(14096M) fair(1 x 14096M)
 [ ] GT0: PF: VF1 provisioned with 14780727296 (13.8 GiB) LMEM
and
 [ ] GT0: PF: LMEM available(14096M) fair(2 x 7048M)
 [ ] GT0: PF: VF1..VF2 provisioned with 7390363648 (6.88 GiB) LMEM

Fixes: 1e32ffbc9dc8 ("drm/xe/sriov: support non-contig VRAM provisioning")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250910222439.32869-1-michal.wajdeczko@intel.com
(cherry picked from commit 95c1cfa306087142989bff34ea0e05dcd95ddc58)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 days agodrm/xe/tile: Release kobject for the failure path
Shuicheng Lin [Tue, 19 Aug 2025 15:39:51 +0000 (15:39 +0000)]
drm/xe/tile: Release kobject for the failure path

Call kobject_put() for the failure path to release the kobject

v2: remove extra newline. (Matt)

Fixes: e3d0839aa501 ("drm/xe/tile: Abort driver load for sysfs creation failure")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250819153950.2973344-2-shuicheng.lin@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit b98775bca99511cc22ab459a2de646cd2fa7241f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
4 days agowifi: iwlwifi: pcie: fix byte count table for some devices
Johannes Berg [Mon, 15 Sep 2025 07:30:52 +0000 (10:30 +0300)]
wifi: iwlwifi: pcie: fix byte count table for some devices

In my previous fix for this condition, I erroneously listed 9000
instead of 7000 family, when 7000/8000 were already using iwlmvm.
Thus the condition ended up wrong, causing the issue I had fixed
for older devices to suddenly appear on 7000/8000 family devices.
Correct the condition accordingly.

Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/r/20250909165811.10729-1-00107082@163.com/
Fixes: 586e3cb33ba6 ("wifi: iwlwifi: fix byte count table for old devices")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250915102743.777aaafbcc6c.I84404edfdfbf400501f6fb06def5b86c501da198@changeid
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
4 days agorv: Fix missing mutex unlock in rv_register_monitor()
Zhen Ni [Wed, 3 Sep 2025 06:51:12 +0000 (14:51 +0800)]
rv: Fix missing mutex unlock in rv_register_monitor()

If create_monitor_dir() fails, the function returns directly without
releasing rv_interface_lock. This leaves the mutex locked and causes
subsequent monitor registration attempts to deadlock.

Fix it by making the error path jump to out_unlock, ensuring that the
mutex is always released before returning.

Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor")
Signed-off-by: Zhen Ni <zhen.ni@easystack.cn>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20250903065112.1878330-1-zhen.ni@easystack.cn
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
4 days agoinclude/linux/rv.h: remove redundant include file
Akhilesh Patil [Mon, 11 Aug 2025 12:12:53 +0000 (17:42 +0530)]
include/linux/rv.h: remove redundant include file

Remove redundant include <linux/types.h> to clean up the code.
Move all unique include files inside CONFIG_RV as they are only needed
when CONFIG_RV is enabled. Arrange include files alphabetically.

Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor") [1]
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202507312017.oyD08TL5-lkp@intel.com/
Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/aJneRbHGlNFg7lr9@bhairav-test.ee.iitb.ac.in
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
4 days agorv: Fix wrong type cast in enabled_monitors_next()
Nam Cao [Wed, 6 Aug 2025 12:09:11 +0000 (14:09 +0200)]
rv: Fix wrong type cast in enabled_monitors_next()

Argument 'p' of enabled_monitors_next() is not a pointer to struct
rv_monitor, it is actually a pointer to the list_head inside struct
rv_monitor. Therefore it is wrong to cast 'p' to struct rv_monitor *.

This wrong type cast has been there since the beginning. But it still
worked because the list_head was the first field in struct rv_monitor_def.
This is no longer true since commit 24cbfe18d55a ("rv: Merge struct
rv_monitor_def into struct rv_monitor") moved the list_head, and this wrong
type cast became a functional problem.

Properly use container_of() instead.

Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20250806120911.989365-1-namcao@linutronix.de
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
4 days agorv: Support systems with time64-only syscalls
Palmer Dabbelt [Mon, 4 Aug 2025 19:45:19 +0000 (12:45 -0700)]
rv: Support systems with time64-only syscalls

Some systems (like 32-bit RISC-V) only have the 64-bit time_t versions
of syscalls.  So handle the 32-bit time_t version of those being
undefined.

Fixes: f74f8bb246cf ("rv: Add rtapp_sleep monitor")
Closes: https://lore.kernel.org/oe-kbuild-all/202508160204.SsFyNfo6-lkp@intel.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Acked-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20250804194518.97620-2-palmer@dabbelt.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
4 days agobtrfs: annotate block group access with data_race() when sorting for reclaim
Filipe Manana [Mon, 8 Sep 2025 11:51:11 +0000 (12:51 +0100)]
btrfs: annotate block group access with data_race() when sorting for reclaim

When sorting the block group list for reclaim we are using a block group's
used bytes counter without taking the block group's spinlock, so we can
race with a concurrent task updating it (at btrfs_update_block_group()),
which makes tools like KCSAN unhappy and report a race.

Since the sorting is not strictly needed from a functional perspective
and such races should rarely cause any ordering changes (only load/store
tearing could cause them), not to mention that after the sorting the
ordering may no longer be accurate due to concurrent allocations and
deallocations of extents in a block group, annotate the accesses to the
used counter with data_race() to silence KCSAN and similar tools.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 days agobtrfs: initialize inode::file_extent_tree after i_mode has been set
austinchang [Thu, 11 Sep 2025 06:06:29 +0000 (06:06 +0000)]
btrfs: initialize inode::file_extent_tree after i_mode has been set

btrfs_init_file_extent_tree() uses S_ISREG() to determine if the file is
a regular file. In the beginning of btrfs_read_locked_inode(), the i_mode
hasn't been read from inode item, then file_extent_tree won't be used at
all in volumes without NO_HOLES.

Fix this by calling btrfs_init_file_extent_tree() after i_mode is
initialized in btrfs_read_locked_inode().

Fixes: 3d7db6e8bd22e6 ("btrfs: don't allocate file extent tree for non regular files")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: austinchang <austinchang@synology.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 days agobtrfs: zoned: fix incorrect ASSERT in btrfs_zoned_reserve_data_reloc_bg()
Johannes Thumshirn [Fri, 5 Sep 2025 13:54:43 +0000 (15:54 +0200)]
btrfs: zoned: fix incorrect ASSERT in btrfs_zoned_reserve_data_reloc_bg()

When moving a block-group to the dedicated data relocation space-info in
btrfs_zoned_reserve_data_reloc_bg() it is asserted that the newly
created block group for data relocation does not contain any
zone_unusable bytes.

But on disks with zone_capacity < zone_size, the difference between
zone_size and zone_capacity is accounted as zone_unusable.

Instead of asserting that the block-group does not contain any
zone_unusable bytes, remove them from the block-groups total size.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/linux-block/CAHj4cs8-cS2E+-xQ-d2Bj6vMJZ+CwT_cbdWBTju4BV35LsvEYw@mail.gmail.com/
Fixes: daa0fde322350 ("btrfs: zoned: fix data relocation block group reservation")
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>