Michal Pecio [Wed, 6 Nov 2024 10:14:59 +0000 (12:14 +0200)]
usb: xhci: Avoid queuing redundant Stop Endpoint commands
Stop Endpoint command on an already stopped endpoint fails and may be
misinterpreted as a known hardware bug by the completion handler. This
results in an unnecessary delay with repeated retries of the command.
Avoid queuing this command when endpoint state flags indicate that it's
stopped or halted and the command will fail. If commands are pending on
the endpoint, their completion handlers will process cancelled TDs so
it's done. In case of waiting for external operations like clearing TT
buffer, the endpoint is stopped and cancelled TDs can be processed now.
This eliminates practically all unnecessary retries because an endpoint
with pending URBs is maintained in Running state by the driver, unless
aforementioned commands or other operations are pending on it. This is
guaranteed by xhci_ring_ep_doorbell() and by the fact that it is called
every time any of those operations completes.
The only known exceptions are hardware bugs (the endpoint never starts
at all) and Stream Protocol errors not associated with any TRB, which
cause an endpoint reset not followed by restart. Sounds like a bug.
Generally, these retries are only expected to happen when the endpoint
fails to start for unknown/no reason, which is a worse problem itself,
and fixing the bug eliminates the retries too.
All cases were tested and found to work as expected. SET_DEQ_PENDING
was produced by patching uvcvideo to unlink URBs in 100us intervals,
which then runs into this case very often. EP_HALTED was produced by
restarting 'cat /dev/ttyUSB0' on a serial dongle with broken cable.
EP_CLEARING_TT by the same, with the dongle on an external hub.
Fixes:
fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-34-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Pecio [Wed, 6 Nov 2024 10:14:58 +0000 (12:14 +0200)]
usb: xhci: Fix TD invalidation under pending Set TR Dequeue
xhci_invalidate_cancelled_tds() may not work correctly if the hardware
is modifying endpoint or stream contexts at the same time by executing
a Set TR Dequeue command. And even if it worked, it would be unable to
queue Set TR Dequeue for the next stream, failing to clear xHC cache.
On stream endpoints, a chain of Set TR Dequeue commands may take some
time to execute and we may want to cancel more TDs during this time.
Currently this leads to Stop Endpoint completion handler calling this
function without testing for SET_DEQ_PENDING, which will trigger the
aforementioned problems when it happens.
On all endpoints, a halt condition causes Reset Endpoint to be queued
and an error status given to the class driver, which may unlink more
URBs in response. Stop Endpoint is queued and its handler may execute
concurrently with Set TR Dequeue queued by Reset Endpoint handler.
(Reset Endpoint handler calls this function too, but there seems to
be no possibility of it running concurrently with Set TR Dequeue).
Fix xhci_invalidate_cancelled_tds() to work correctly under a pending
Set TR Dequeue. Bail out of the function when SET_DEQ_PENDING is set,
then make the completion handler call the function again and also call
xhci_giveback_invalidated_tds(), which needs to be called next.
This seems to fix another potential bug, where the handler would call
xhci_invalidate_cancelled_tds(), which may clear some deferred TDs if
a sanity check fails, and the TDs wouldn't be given back promptly.
Said sanity check seems to be wrong and prone to false positives when
the endpoint halts, but fixing it is beyond the scope of this change,
besides ensuring that cleared TDs are given back properly.
Fixes:
5ceac4402f5d ("xhci: Handle TD clearing for multiple streams case")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-33-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Pecio [Wed, 6 Nov 2024 10:14:57 +0000 (12:14 +0200)]
usb: xhci: Limit Stop Endpoint retries
Some host controllers fail to atomically transition an endpoint to the
Running state on a doorbell ring and enter a hidden "Restarting" state,
which looks very much like Stopped, with the important difference that
it will spontaneously transition to Running anytime soon.
A Stop Endpoint command queued in the Restarting state typically fails
with Context State Error and the completion handler sees the Endpoint
Context State as either still Stopped or already Running. Even a case
of Halted was observed, when an error occurred right after the restart.
The Halted state is already recovered from by resetting the endpoint.
The Running state is handled by retrying Stop Endpoint.
The Stopped state was recognized as a problem on NEC controllers and
worked around also by retrying, because the endpoint soon restarts and
then stops for good. But there is a risk: the command may fail if the
endpoint is "stopped for good" already, and retries will fail forever.
The possibility of this was not realized at the time, but a number of
cases were discovered later and reproduced. Some proved difficult to
deal with, and it is outright impossible to predict if an endpoint may
fail to ever start at all due to a hardware bug. One such bug (albeit
on ASM3142, not on NEC) was found to be reliably triggered simply by
toggling an AX88179 NIC up/down in a tight loop for a few seconds.
An endless retries storm is quite nasty. Besides putting needless load
on the xHC and CPU, it causes URBs never to be given back, paralyzing
the device and connection/disconnection logic for the whole bus if the
device is unplugged. User processes waiting for URBs become unkillable,
drivers and kworker threads lock up and xhci_hcd cannot be reloaded.
For peace of mind, impose a timeout on Stop Endpoint retries in this
case. If they don't succeed in 100ms, consider the endpoint stopped
permanently for some reason and just give back the unlinked URBs. This
failure case is rare already and work is under way to make it rarer.
Start this work today by also handling one simple case of race with
Reset Endpoint, because it costs just two lines to implement.
Fixes:
fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers")
CC: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-32-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:56 +0000 (12:14 +0200)]
usb: xhci: remove irrelevant comment
The code which it is referencing does not exist in the same function,
or the file for that matter. Since it was added [1], the Interrupter
Moderation Interval can be changed within xhci addon, e.g. PCI
xhci_pci_setup().
[1], commit
0ebbab374223 ("USB: xhci: Ring allocation and initialization.")
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-31-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:55 +0000 (12:14 +0200)]
usb: xhci: add help function xhci_dequeue_td()
Add xhci_dequeue_td() helper function to reduce code duplication.
Function xhci_dequeue_td() advances the dequeue pointer past the specified
Transfer Descriptor (TD) and releases the TD.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-30-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:54 +0000 (12:14 +0200)]
usb: xhci: refactor xhci_td_cleanup() to return void
The function is modified to return 'void' instead of an integer since it
invariably returns '0'. Additionally, multiple functions which only
return xhci_td_cleanup() are also refactored to return void.
This change eliminates the need for callers to handle a return value that
does not convey meaningful information and improve code readability, as it
becomes immediately clear that the function does not produce a significant
output.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-29-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:53 +0000 (12:14 +0200)]
usb: xhci: remove unused arguments from td_to_noop()
Function td_to_noop() does not utilize arguments 'xhci' and 'ep_ring'.
These unused arguments are removed to clean up the code.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-28-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:52 +0000 (12:14 +0200)]
usb: xhci: improve xhci_clear_command_ring()
Remove redundant TRB cycle reset, the TRB cycle is already set to zero by
the preceding memset(), making the explicit reset unnecessary.
Clarify ring loop start point. Change the loop start from the dequeue
segment to the start segment. Both approaches achieve the same result,
but starting from the start segment makes it clearer that the entire ring
is being zeroed out.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-27-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:51 +0000 (12:14 +0200)]
usb: xhci: request MSI/-X according to requested amount
Variable 'max_interrupters' contains the maximum supported interrupters
or the maximum interrupters the user has requested. Thus, it should be
used instead of HCS_MAX_INTRS().
User set 'max_interrupters' value is validated in xhci_gen_setup(),
otherwise 'max_interrupters' value is 'HCS_MAX_INTRS(xhci->hcs_params1)'.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-26-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:50 +0000 (12:14 +0200)]
usb: xhci: move link TRB quirk to xhci_gen_setup()
This quirk is old and seldom seen, as a result the trace is changed
to debug message and only printed when the quirk is set.
Move it into xhci_gen_setup() where the majority of quirks are set.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-25-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:49 +0000 (12:14 +0200)]
usb: xhci: simplify TDs start and end naming scheme in struct 'xhci_td'
Old names:
* start_seg - last_trb_seg
* first_trb - last_trb
New names:
* start_seg - end_seg
* start_trb - end_trb
A Transfer Descriptor (TD) in the xhci driver is a data structure that
represents a single transaction to be performed by the USB host controller.
This transaction is defined by TRBs from 'start_trb' in 'start_seg' to
'end_trb' in 'end_seg'.
The terms "start" and "end" were chosen over "first" and "last" for ease
of searching within the codebase. The ring structure uses 'first_seg' and
'last_seg', while the TD structure uses 'start_seg' and 'end_seg'.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-24-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Shevchenko [Wed, 6 Nov 2024 10:14:48 +0000 (12:14 +0200)]
xhci: pci: Fix indentation in the PCI device ID definitions
Some of the definitions are missing the one TAB, add it to them.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-23-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Shevchenko [Wed, 6 Nov 2024 10:14:47 +0000 (12:14 +0200)]
xhci: pci: Use standard pattern for device IDs
The definitions of vendor IDs follow the pattern PCI_VENDOR_ID_#vendor,
while device IDs — PCI_DEVICE_ID_#vendor_#device.
Update the ETRON device IDs to follow the above mentioned pattern.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-22-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kuangyi Chiang [Wed, 6 Nov 2024 10:14:46 +0000 (12:14 +0200)]
xhci: Don't perform Soft Retry for Etron xHCI host
Since commit
f8f80be501aa ("xhci: Use soft retry to recover faster from
transaction errors"), unplugging USB device while enumeration results in
errors like this:
[ 364.855321] xhci_hcd 0000:0b:00.0: ERROR Transfer event for disabled endpoint slot 5 ep 2
[ 364.864622] xhci_hcd 0000:0b:00.0: @
0000002167656d70 67f03000 00000021 0c000000 05038001
[ 374.934793] xhci_hcd 0000:0b:00.0: Abort failed to stop command ring: -110
[ 374.958793] xhci_hcd 0000:0b:00.0: xHCI host controller not responding, assume dead
[ 374.967590] xhci_hcd 0000:0b:00.0: HC died; cleaning up
[ 374.973984] xhci_hcd 0000:0b:00.0: Timeout while waiting for configure endpoint command
Seems that Etorn xHCI host can not perform Soft Retry correctly, apply
XHCI_NO_SOFT_RETRY quirk to disable Soft Retry and then issue is gone.
This patch depends on commit
a4a251f8c235 ("usb: xhci: do not perform
Soft Retry for some xHCI hosts").
Fixes:
f8f80be501aa ("xhci: Use soft retry to recover faster from transaction errors")
Cc: stable@vger.kernel.org
Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-21-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kuangyi Chiang [Wed, 6 Nov 2024 10:14:45 +0000 (12:14 +0200)]
xhci: Fix control transfer error on Etron xHCI host
Performing a stability stress test on a USB3.0 2.5G ethernet adapter
results in errors like this:
[ 91.441469] r8152 2-3:1.0 eth3: get_registers -71
[ 91.458659] r8152 2-3:1.0 eth3: get_registers -71
[ 91.475911] r8152 2-3:1.0 eth3: get_registers -71
[ 91.493203] r8152 2-3:1.0 eth3: get_registers -71
[ 91.510421] r8152 2-3:1.0 eth3: get_registers -71
The r8152 driver will periodically issue lots of control-IN requests
to access the status of ethernet adapter hardware registers during
the test.
This happens when the xHCI driver enqueue a control TD (which cross
over the Link TRB between two ring segments, as shown) in the endpoint
zero's transfer ring. Seems the Etron xHCI host can not perform this
TD correctly, causing the USB transfer error occurred, maybe the upper
driver retry that control-IN request can solve problem, but not all
drivers do this.
| |
-------
| TRB | Setup Stage
-------
| TRB | Link
-------
-------
| TRB | Data Stage
-------
| TRB | Status Stage
-------
| |
To work around this, the xHCI driver should enqueue a No Op TRB if
next available TRB is the Link TRB in the ring segment, this can
prevent the Setup and Data Stage TRB to be breaked by the Link TRB.
Check if the XHCI_ETRON_HOST quirk flag is set before invoking the
workaround in xhci_queue_ctrl_tx().
Fixes:
d0e96f5a71a0 ("USB: xhci: Control transfer support.")
Cc: stable@vger.kernel.org
Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-20-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kuangyi Chiang [Wed, 6 Nov 2024 10:14:44 +0000 (12:14 +0200)]
xhci: Don't issue Reset Device command to Etron xHCI host
Sometimes the hub driver does not recognize the USB device connected
to the external USB2.0 hub when the system resumes from S4.
After the SetPortFeature(PORT_RESET) request is completed, the hub
driver calls the HCD reset_device callback, which will issue a Reset
Device command and free all structures associated with endpoints
that were disabled.
This happens when the xHCI driver issue a Reset Device command to
inform the Etron xHCI host that the USB device associated with a
device slot has been reset. Seems that the Etron xHCI host can not
perform this command correctly, affecting the USB device.
To work around this, the xHCI driver should obtain a new device slot
with reference to commit
651aaf36a7d7 ("usb: xhci: Handle USB transaction
error on address command"), which is another way to inform the Etron
xHCI host that the USB device has been reset.
Add a new XHCI_ETRON_HOST quirk flag to invoke the workaround in
xhci_discover_or_reset_device().
Fixes:
2a8f82c4ceaf ("USB: xhci: Notify the xHC when a device is reset.")
Cc: stable@vger.kernel.org
Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-19-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kuangyi Chiang [Wed, 6 Nov 2024 10:14:43 +0000 (12:14 +0200)]
xhci: Combine two if statements for Etron xHCI host
Combine two if statements, because these hosts have the same
quirk flags applied.
[Mathias: has stable tag because other fixes in series depend on this]
Fixes:
91f7a1524a92 ("xhci: Apply broken streams quirk to Etron EJ188 xHCI host")
Cc: stable@vger.kernel.org
Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-18-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:42 +0000 (12:14 +0200)]
usb: xhci: add xhci_initialize_ring_segments()
A ring consists of a list of segments, each containing a specific number of
TRBs. The xhci driver allocates and initializes ring segments and TRBs in
the same functions. This combined allocation and initialization process
leads to an issue where, after hibernation (S4 state), the xhci driver
frees all its memory and re-creates the rings, segments, and TRBs from
scratch.
Move all default ring segment initialization into function
xhci_initialize_ring_segments(). This function can be called to
reinitialize a ring without freeing and reallocating it.
Since xhci_alloc_segments_for_ring() no longer initializes segment TRBs,
xhci_initialize_ring_segments() is added to xhci_ring_expansion(). This
results in the last segment of the source ring having the 'LINK_TOGGLE'
bit set. Therefore, if the last source ring segment is not the last in
the destination ring, the 'LINK_TOGGLE' bit must be cleared.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-17-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:41 +0000 (12:14 +0200)]
usb: xhci: rework xhci_link_segments()
Prepare for splitting ring segments allocation and initialization by
reworking the xhci_link_segments() function. Segment linking and ring type
checks are moved out of xhci_link_segments(), and the function is renamed
to "xhci_set_link_trb()".
The goal is to keep ring linking within xhci_alloc_segments_for_ring() and
move initialization into a separate function.
Additionally, reorder and simplify xhci_set_link_trb() for better
readability.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-16-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:40 +0000 (12:14 +0200)]
usb: xhci: refactor xhci_link_rings() to use source and destination rings
Refactor the xhci_link_rings() function to accept two rings: a source ring
and a destination ring. Previously, the function accepted a ring and a
segment list as arguments, now the function splices the source ring segment
list into the destination ring. This new approach reduces the number of
arguments and simplifies the code, making it easier to follow.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-15-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:39 +0000 (12:14 +0200)]
usb: xhci: rework xhci_free_segments_for_ring()
The segment list is only relevant within the context of the ring.
Thus, it is more logical to pass the ring itself when freeing all segments
from a ring, rather than passing the ring list.
Rename the function to "xhci_ring_segments_free" to align with the naming
convention of xhci_segment_free().
To make the freeing process more intuitive, free segments sequentially
from start to end (i.e., 0, 1, 2, 3, ...) instead of freeing the first
segment last (i.e., 1, 2, 3, ... 0).
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-14-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:38 +0000 (12:14 +0200)]
usb: xhci: adjust xhci_alloc_segments_for_ring() arguments
The function xhci_alloc_segments_for_ring() currently takes 7 arguments,
5 of which are components of the 'xhci_ring' struct. Refactor the function
to accept a pointer to the 'xhci_ring' struct instead of passing these
components separately.
The change reduces the number of arguments, making the function signature
cleaner and easier to understand.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-13-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:37 +0000 (12:14 +0200)]
usb: xhci: remove option to change a default ring's TRB cycle bit
The TRB cycle bit indicates TRB ownership by the Host Controller (HC) or
Host Controller Driver (HCD). New rings are initialized with 'cycle_state'
equal to one, and all its TRBs' cycle bits are set to zero. When handling
ring expansion, set the source ring cycle bits to the same value as the
destination ring.
Move the cycle bit setting from xhci_segment_alloc() to xhci_link_rings(),
and remove the 'cycle_state' argument from xhci_initialize_ring_info().
The xhci_segment_alloc() function uses kzalloc_node() to allocate segments,
ensuring that all TRB cycle bits are initialized to zero.
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-12-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Niklas Neronin [Wed, 6 Nov 2024 10:14:36 +0000 (12:14 +0200)]
usb: xhci: introduce macro for ring segment list iteration
Add macro to streamline and standardize the iteration over ring
segment list.
xhci_for_each_ring_seg(): Iterates over the entire ring segment list.
The xhci_free_segments_for_ring() function's while loop has not been
updated to use the new macro. This function has some underlying issues,
and as a result, it will be handled separately in a future patch.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-11-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
WangYuli [Wed, 6 Nov 2024 10:14:35 +0000 (12:14 +0200)]
xhci: debugfs: Add virt endpoint state to xhci debugfs
The ring data structure of each xHCI endpoint might stop sending
data due to the virt endpoint state.
Show the virt endpoint state within the endpoint context via debugfs
to facilitate debugging.
Co-developed-by: Xu Rao <raoxu@uniontech.com>
Signed-off-by: Xu Rao <raoxu@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-10-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathias Nyman [Wed, 6 Nov 2024 10:14:34 +0000 (12:14 +0200)]
xhci: trace stream context at Set TR Deq command completion
A successful 'Set TR Deq' command writes a new dequeue pointer to the
stream context for stream rings, show the content of the stream ctx
at command completion.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-9-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathias Nyman [Wed, 6 Nov 2024 10:14:33 +0000 (12:14 +0200)]
xhci: add stream context tracing
Show stream id, stream context type (SCT), ring dequeue pointer and
the DMA address of the stream context during stream allocation
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-8-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathias Nyman [Wed, 6 Nov 2024 10:14:32 +0000 (12:14 +0200)]
xhci: Don't trace ring at every enqueue or dequeue increase
Don't trace ring pointers every time driver increases enqueue pointer
after queuing a TRB, or dequeue pointer after handling an event.
These xhci_inc_deq: and xhci_inc_enq: trace entries fill up the trace and
provides little useful info now that the TRB DMA address is printed with
the TRB itself.
Only trace ring during xhci_inc_enq() and xhci_inc_deq() in case we move
to a new ring segment.
Also don't show both segment and TRB addess in trace.
Segment is just TRB with 0xfff masked off.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-7-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathias Nyman [Wed, 6 Nov 2024 10:14:31 +0000 (12:14 +0200)]
xhci: show DMA address of TRB when tracing TRBs
The DMA address of a queued TRB is essential when looking at traces as
both transfer events and command completion events refer to the command
or transfer based on its DMA address.
Previously the TRB address was figured out from xhci_inc_enq and
xhci_inc_deq trace entries seen after queuing or handlong a TRB.
Now that DMA address is shown in TRB tracing we can get rid of most of the
xhci_inc_enq and xhci_inc_deq traces thus decreasing trace size.
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-6-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathias Nyman [Wed, 6 Nov 2024 10:14:30 +0000 (12:14 +0200)]
xhci: Cleanup Candence controller PCI device and vendor ID usage
Use predefined PCI vendor ID constant for Cadence controller in pci_ids.h
Rename the Candence device ID to match the pattern other PCI vendor and
device IDs use
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Closes: https://lore.kernel.org/linux-usb/ZuMOfHp9j_6_3-WC@surfacebook.localdomain
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-5-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Pecio [Wed, 6 Nov 2024 10:14:29 +0000 (12:14 +0200)]
usb: xhci: Fix sum_trb_lengths()
This function is supposed to sum the lengths of all transfer TRBs in
a TD up to a point, but it starts summing at the current dequeue since
it only ever gets called on the first pending TD.
This won't work when there are cancelled TDs at the beginning of the
ring. The function tries to exclude No-Ops from the count, but not all
cancelled TDs are No-Op'ed - not those the HW stopped on.
The absolutely obvious fix is to start counting at the TD's first TRB.
And remove the now-useless 'ring' parameter, and 'xhci' too.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Pecio [Wed, 6 Nov 2024 10:14:28 +0000 (12:14 +0200)]
usb: xhci: Remove unused parameters of next_trb()
The function has two parameters which it doesn't use and hasn't ever
used. One caller even puts NULL there, knowing it will work anyway.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathias Nyman [Wed, 6 Nov 2024 10:14:27 +0000 (12:14 +0200)]
xhci: Add Isochronous TRB fields to TRB tracer
In addition to Normal TRB fields the Isoch TRBs have a SIA flag,
Frame ID, TLBPC and TBC fields.
Add these fields to tracing output
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20241106101459.775897-2-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rob Herring (Arm) [Mon, 4 Nov 2024 19:08:18 +0000 (13:08 -0600)]
usb: Use (of|device)_property_present() for non-boolean properties
The use of (of|device)_property_read_bool() for non-boolean properties
is deprecated in favor of of_property_present() when testing for
property presence.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Peter Chen <peter.chen@kernel.org>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Link: https://lore.kernel.org/r/20241104190820.277702-1-robh@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitry Baryshkov [Thu, 17 Oct 2024 18:16:38 +0000 (21:16 +0300)]
dt-bindings: usb: qcom,dwc3: Add SAR2130P compatible
Document compatible for the Synopsys DWC3 USB Controller on SAR2130P
platform.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20241017-sar2130p-usb-v1-1-21e01264b70e@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Tue, 5 Nov 2024 08:55:37 +0000 (09:55 +0100)]
Merge v6.12-rc6 into usb-next
We need the USB fixes in here as well, and this resolves a merge
conflict in:
drivers/usb/typec/tcpm/tcpm.c
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20241101150730.090dc30f@canb.auug.org.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Charles Han [Fri, 25 Oct 2024 07:07:44 +0000 (15:07 +0800)]
phy: realtek: usb: fix NULL deref in rtk_usb3phy_probe
In rtk_usb3phy_probe() devm_kzalloc() may return NULL
but this returned value is not checked.
Fixes:
adda6e82a7de ("phy: realtek: usb: Add driver for the Realtek SoC USB 3.0 PHY")
Signed-off-by: Charles Han <hanchunchao@inspur.com>
Link: https://lore.kernel.org/r/20241025070744.149070-1-hanchunchao@inspur.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Charles Han [Fri, 25 Oct 2024 06:59:12 +0000 (14:59 +0800)]
phy: realtek: usb: fix NULL deref in rtk_usb2phy_probe
In rtk_usb2phy_probe() devm_kzalloc() may return NULL
but this returned value is not checked.
Fixes:
134e6d25f6bd ("phy: realtek: usb: Add driver for the Realtek SoC USB 2.0 PHY")
Signed-off-by: Charles Han <hanchunchao@inspur.com>
Link: https://lore.kernel.org/r/20241025065912.143692-1-hanchunchao@inspur.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Romain Gantois [Thu, 24 Oct 2024 08:54:17 +0000 (10:54 +0200)]
usb: typec: mux: Add support for the TUSB1046 crosspoint switch
The TUSB1046-DCI is a USB-C linear redriver crosspoint switch, which can
mux SuperSpeed lanes from a Type-C connector to a USB3.0 data lane or up to
4 display port lanes.
Add support for driving the TUSB1046 as a Type-C orientation switch and
DisplayPort altmode multiplexer.
Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20241024-tusb1046-v2-2-d031b1a43e6d@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Romain Gantois [Thu, 24 Oct 2024 08:54:16 +0000 (10:54 +0200)]
dt-bindings: usb: Describe TUSB1046 crosspoint switch
Describe the Texas Instruments TUSB1046-DCI USB Type-C linear redriver
crosspoint switch. This component is used to handle orientation switching
and DisplayPort altmode multiplexing for Type-C signals.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
Link: https://lore.kernel.org/r/20241024-tusb1046-v2-1-d031b1a43e6d@bootlin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Parth Pancholi [Tue, 29 Oct 2024 07:24:44 +0000 (08:24 +0100)]
USB: xhci: add support for PWRON active high
Some PCIe-to-USB controllers such as TI's TUSB73x0 3.0 xHCI host
controller supports controlling the PWRONx polarity via the USB control
register (E0h). Add support for device tree property
ti,pwron-active-high which indicates PWRONx to be active high and
configure the E0h register accordingly. This enables the software
control for the TUSB73x0's PWRONx outputs with an inverted polarity from
the default configuration which could be used as USB EN signals for the
other hubs or devices.
Signed-off-by: Parth Pancholi <parth.pancholi@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Link: https://lore.kernel.org/r/20241029072444.8827-3-francesco@dolcini.it
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Parth Pancholi [Tue, 29 Oct 2024 07:24:43 +0000 (08:24 +0100)]
dt-bindings: usb: add TUSB73x0 PCIe
Add device tree bindings for TI's TUSB73x0 PCIe-to-USB 3.0 xHCI
host controller. The controller supports software configuration
through PCIe registers, such as controlling the PWRONx polarity
via the USB control register (E0h).
Datasheet: https://www.ti.com/lit/ds/symlink/tusb7320.pdf
Signed-off-by: Parth Pancholi <parth.pancholi@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20241029072444.8827-2-francesco@dolcini.it
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Shevchenko [Thu, 31 Oct 2024 10:38:18 +0000 (12:38 +0200)]
USB: bcma: Remove unused of_gpio.h
of_gpio.h is deprecated and subject to remove. The drivers in question
don't use it, simply remove the unused header.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20241031103818.2451816-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Mon, 4 Nov 2024 00:05:52 +0000 (14:05 -1000)]
Linux 6.12-rc6
Linus Torvalds [Sun, 3 Nov 2024 20:25:05 +0000 (10:25 -1000)]
Merge tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes. 9 are cc:stable. 13 are MM and 4 are non-MM.
The usual collection of singletons - please see the changelogs"
* tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
mm: shrinker: avoid memleak in alloc_shrinker_info
.mailmap: update e-mail address for Eugen Hristev
vmscan,migrate: fix page count imbalance on node stats when demoting pages
mailmap: update Jarkko's email addresses
mm: allow set/clear page_type again
nilfs2: fix potential deadlock with newly created symlinks
Squashfs: fix variable overflow in squashfs_readpage_block
kasan: remove vmalloc_percpu test
tools/mm: -Werror fixes in page-types/slabinfo
mm, swap: avoid over reclaim of full clusters
mm: fix PSWPIN counter for large folios swap-in
mm: avoid VM_BUG_ON when try to map an anon large folio to zero page.
mm/codetag: fix null pointer check logic for ref and tag
mm/gup: stop leaking pinned pages in low memory conditions
Linus Torvalds [Sun, 3 Nov 2024 20:19:34 +0000 (10:19 -1000)]
Merge tag 'phy-fixes-6.12' of git://git./linux/kernel/git/phy/linux-phy
Pull phy fixes from Vinod Koul:
- Qualcomm QMP driver fixes for null deref on suspend, bogus supplies
fix and reset entries fix
- BCM usb driver init array fix
- cadence array offset fix
- starfive link configuration fix
- config dependency fix for rockchip driver
- freescale reset signal fix before pll lock
- tegra driver fix for error pointer check
* tag 'phy-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
phy: tegra: xusb: Add error pointer check in xusb.c
dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: Fix X1E80100 resets entries
phy: freescale: imx8m-pcie: Do CMN_RST just before PHY PLL lock check
phy: phy-rockchip-samsung-hdptx: Depend on CONFIG_COMMON_CLK
phy: ti: phy-j721e-wiz: fix usxgmii configuration
phy: starfive: jh7110-usb: Fix link configuration to controller
phy: qcom: qmp-pcie: drop bogus x1e80100 qref supplies
phy: qcom: qmp-combo: move driver data initialisation earlier
phy: qcom: qmp-usbc: fix NULL-deref on runtime suspend
phy: qcom: qmp-usb-legacy: fix NULL-deref on runtime suspend
phy: qcom: qmp-usb: fix NULL-deref on runtime suspend
dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: add missing x1e80100 pipediv2 clocks
phy: usb: disable COMMONONN for dual mode
phy: cadence: Sierra: Fix offset of DEQ open eye algorithm control register
phy: usb: Fix missing elements in BCM4908 USB init array
Linus Torvalds [Sun, 3 Nov 2024 20:15:50 +0000 (10:15 -1000)]
Merge tag 'dmaengine-fix-6.12' of git://git./linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
- TI driver fix to set EOP for cyclic BCDMA transfers
- sh rz-dmac driver fix for handling config with zero address
* tag 'dmaengine-fix-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine: ti: k3-udma: Set EOP for all TRs in cyclic BCDMA transfer
dmaengine: sh: rz-dmac: handle configs where one address is zero
Linus Torvalds [Sun, 3 Nov 2024 18:51:53 +0000 (08:51 -1000)]
Merge tag 'driver-core-6.12-rc6' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core revert from Greg KH:
"Here is a single driver core revert for 6.12-rc6. It reverts a change
that came in -rc1 that was supposed to resolve a reported problem, but
caused another one, so revert it for now so that we can get this all
worked out properly in 6.13.
The revert has been in linux-next all week with no reported issues"
* tag 'driver-core-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "driver core: Fix uevent_show() vs driver detach race"
Linus Torvalds [Sun, 3 Nov 2024 18:48:11 +0000 (08:48 -1000)]
Merge tag 'usb-6.12-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes for 6.12-rc6 that
have been sitting in my tree this week. Included in here are the
following:
- thunderbolt driver fixes for reported issues
- USB typec driver fixes
- xhci driver fixes for reported problems
- dwc2 driver revert for a broken change
- usb phy driver fix
- usbip tool fix
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tcpm: restrict SNK_WAIT_CAPABILITIES_TIMEOUT transitions to non self-powered devices
usb: phy: Fix API devm_usb_put_phy() can not release the phy
usb: typec: use cleanup facility for 'altmodes_node'
usb: typec: fix unreleased fwnode_handle in typec_port_register_altmodes()
usb: typec: qcom-pmic-typec: fix missing fwnode removal in error path
usb: typec: qcom-pmic-typec: use fwnode_handle_put() to release fwnodes
usb: acpi: fix boot hang due to early incorrect 'tunneled' USB3 device links
Revert "usb: dwc2: Skip clock gating on Broadcom SoCs"
xhci: Fix Link TRB DMA in command ring stopped completion event
xhci: Use pm_runtime_get to prevent RPM on unsupported systems
usbip: tools: Fix detach_port() invalid port error path
thunderbolt: Honor TMU requirements in the domain when setting TMU mode
thunderbolt: Fix KASAN reported stack out-of-bounds read in tb_retimer_scan()
Yu Zhao [Sat, 19 Oct 2024 01:29:39 +0000 (01:29 +0000)]
mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
When the MM_WALK capability is enabled, memory that is mostly accessed by
a VM appears younger than it really is, therefore this memory will be less
likely to be evicted. Therefore, the presence of a running VM can
significantly increase swap-outs for non-VM memory, regressing the
performance for the rest of the system.
Fix this regression by always calling {ptep,pmdp}_clear_young_notify()
whenever we clear the young bits on PMDs/PTEs.
[jthoughton@google.com: fix link-time error]
Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com
Fixes:
bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Reported-by: David Stevens <stevensd@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yu Zhao [Sat, 19 Oct 2024 01:29:38 +0000 (01:29 +0000)]
mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in
MM_WALK".
Today, the MM_WALK capability causes MGLRU to clear the young bit from
PMDs and PTEs during the page table walk before eviction, but MGLRU does
not call the clear_young() MMU notifier in this case. By not calling this
notifier, the MM walk takes less time/CPU, but it causes pages that are
accessed mostly through KVM / secondary MMUs to appear younger than they
should be.
We do call the clear_young() notifier today, but only when attempting to
evict the page, so we end up clearing young/accessed information less
frequently for secondary MMUs than for mm PTEs, and therefore they appear
younger and are less likely to be evicted. Therefore, memory that is
*not* being accessed mostly by KVM will be evicted *more* frequently,
worsening performance.
ChromeOS observed a tab-open latency regression when enabling MGLRU with a
setup that involved running a VM:
Tab-open latency histogram (ms)
Version p50 mean p95 p99 max
base 1315 1198 2347 3454 10319
mglru 2559 1311 7399 12060 43758
fix 1119 926 2470 4211 6947
This series replaces the final non-selftest patchs from this series[1],
which introduced a similar change (and a new MMU notifier) with KVM
optimizations. I'll send a separate series (to Sean and Paolo) for the
KVM optimizations.
This series also makes proactive reclaim with MGLRU possible for KVM
memory. I have verified that this functions correctly with the selftest
from [1], but given that that test is a KVM selftest, I'll send it with
the rest of the KVM optimizations later. Andrew, let me know if you'd
like to take the test now anyway.
[1]: https://lore.kernel.org/linux-mm/
20240926013506.860253-18-jthoughton@google.com/
This patch (of 2):
The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful
and become more complicated to properly compute when adding
test/clear_young() notifiers in MGLRU's mm walk.
Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com
Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com
Fixes:
bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Stevens <stevensd@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Sun, 3 Nov 2024 18:45:03 +0000 (08:45 -1000)]
Merge tag 'char-misc-6.12-rc6' of git://git./linux/kernel/git/gregkh/char-misc
Pull misc driver fixes from Greg KH:
"Here are some small char/misc/iio fixes for 6.12-rc6 that resolve
some reported issues. Included in here are the following:
- small IIO driver fixes for many reported issues
- mei driver fix for a suddenly much reported issue for an "old"
issue.
- MAINTAINERS update for a developer who has moved companies and
forgot to update their old entry.
All of these have been in linux-next this week with no reported
issues"
* tag 'char-misc-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mei: use kvmalloc for read buffer
MAINTAINERS: add netup_unidvb maintainer
iio: dac: Kconfig: Fix build error for ltc2664
iio: adc: ad7124: fix division by zero in ad7124_set_channel_odr()
staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg()
docs: iio: ad7380: fix supply for ad7380-4
iio: adc: ad7380: fix supplies for ad7380-4
iio: adc: ad7380: add missing supplies
iio: adc: ad7380: use devm_regulator_get_enable_read_voltage()
dt-bindings: iio: adc: ad7380: fix ad7380-4 reference supply
iio: light: veml6030: fix microlux value calculation
iio: gts-helper: Fix memory leaks for the error path of iio_gts_build_avail_scale_table()
iio: gts-helper: Fix memory leaks in iio_gts_build_avail_scale_table()
Linus Torvalds [Sun, 3 Nov 2024 18:35:29 +0000 (08:35 -1000)]
Merge tag 'input-for-v6.12-rc5' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix for regression in input core introduced in 6.11 preventing
re-registering input handlers
- a fix for adp5588-keys driver tyring to disable interrupt 0 at
suspend when devices is used without interrupt
- a fix for edt-ft5x06 to stop leaking regmap structure when probing
fails and to make sure it is not released too early on removal.
* tag 'input-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: fix regression when re-registering input handlers
Input: adp5588-keys - do not try to disable interrupt 0
Input: edt-ft5x06 - fix regmap leak when probe fails
Linus Torvalds [Sun, 3 Nov 2024 18:29:02 +0000 (08:29 -1000)]
Merge tag 'kbuild-fixes-v6.12-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix a memory leak in modpost
- Resolve build issues when cross-compiling RPM and Debian packages
- Fix another regression in Kconfig
- Fix incorrect MODULE_ALIAS() output in modpost
* tag 'kbuild-fixes-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
modpost: fix input MODULE_DEVICE_TABLE() built for 64-bit on 32-bit host
modpost: fix acpi MODULE_DEVICE_TABLE built with mismatched endianness
kconfig: show sub-menu entries even if the prompt is hidden
kbuild: deb-pkg: add pkg.linux-upstream.nokerneldbg build profile
kbuild: deb-pkg: add pkg.linux-upstream.nokernelheaders build profile
kbuild: rpm-pkg: disable kernel-devel package when cross-compiling
sumversion: Fix a memory leak in get_src_version()
Linus Torvalds [Sun, 3 Nov 2024 18:26:00 +0000 (08:26 -1000)]
Merge tag 'x86-urgent-2024-11-03' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A trivial compile test fix for x86:
When CONFIG_AMD_NB is not set a COMPILE_TEST of an AMD specific driver
fails due to a missing inline stub. Add the stub to cure it"
* tag 'x86-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/amd_nb: Fix compile-testing without CONFIG_AMD_NB
Linus Torvalds [Sun, 3 Nov 2024 18:22:21 +0000 (08:22 -1000)]
Merge tag 'timers-urgent-2024-11-03' of git://git./linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for posix CPU timers.
When a thread is cloned, the posix CPU timers are not inherited.
If the parent has a CPU timer armed the corresponding tick dependency
in the tasks tick_dep_mask is set and copied to the new thread, which
means the new thread and all decendants will prevent the system to go
into full NOHZ operation.
Clear the tick dependency mask in copy_process() to fix this"
* tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone
Linus Torvalds [Sun, 3 Nov 2024 18:18:28 +0000 (08:18 -1000)]
Merge tag 'sched-urgent-2024-11-03' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
- Plug a race between pick_next_task_fair() and try_to_wake_up() where
both try to write to the same task, even though both paths hold a
runqueue lock, but obviously from different runqueues.
The problem is that the store to task::on_rq in __block_task() is
visible to try_to_wake_up() which assumes that the task is not
queued. Both sides then operate on the same task.
Cure it by rearranging __block_task() so the the store to task::on_rq
is the last operation on the task.
- Prevent a potential NULL pointer dereference in task_numa_work()
task_numa_work() iterates the VMAs of a process. A concurrent unmap
of the address space can result in a NULL pointer return from
vma_next() which is unchecked.
Add the missing NULL pointer check to prevent this.
- Operate on the correct scheduler policy in task_should_scx()
task_should_scx() returns true when a task should be handled by sched
EXT. It checks the tasks scheduling policy.
This fails when the check is done before a policy has been set.
Cure it by handing the policy into task_should_scx() so it operates
on the requested value.
- Add the missing handling of sched EXT in the delayed dequeue
mechanism. This was simply forgotten.
* tag 'sched-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/ext: Fix scx vs sched_delayed
sched: Pass correct scheduling policy to __setscheduler_class
sched/numa: Fix the potential null pointer dereference in task_numa_work()
sched: Fix pick_next_task_fair() vs try_to_wake_up() race
Linus Torvalds [Sun, 3 Nov 2024 18:13:52 +0000 (08:13 -1000)]
Merge tag 'perf-urgent-2024-11-03' of git://git./linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
"perf_event_clear_cpumask() uses list_for_each_entry_rcu() without
being in a RCU read side critical section, which triggers a
'suspicious RCU usage' warning.
It turns out that the list walk does not be RCU protected because the
write side lock is held in this context.
Change it to a regular list walk"
* tag 'perf-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix missing RCU reader protection in perf_event_clear_cpumask()
Linus Torvalds [Sun, 3 Nov 2024 18:09:25 +0000 (08:09 -1000)]
Merge tag 'irq-urgent-2024-11-03' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
- Fix an off-by-one error in the failure path of msi_domain_alloc(),
which causes the cleanup loop to terminate early and leaking the
first allocated interrupt.
- Handle a corner case in GIC-V4 versus a lazily mapped Virtual
Processing Element (VPE). If the VPE has not been mapped because the
guest has not yet emitted a mapping command, then the set_affinity()
callback returns an error code, which causes the vCPU management to
fail.
Return success in this case without touching the hardware. This will
be done later when the guest issues the mapping command.
* tag 'irq-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v4: Correctly deal with set_affinity on lazily-mapped VPEs
genirq/msi: Fix off-by-one error in msi_domain_alloc()
Masahiro Yamada [Sun, 3 Nov 2024 12:52:57 +0000 (21:52 +0900)]
modpost: fix input MODULE_DEVICE_TABLE() built for 64-bit on 32-bit host
When building a 64-bit kernel on a 32-bit build host, incorrect
input MODULE_ALIAS() entries may be generated.
For example, when compiling a 64-bit kernel with CONFIG_INPUT_MOUSEDEV=m
on a 64-bit build machine, you will get the correct output:
$ grep MODULE_ALIAS drivers/input/mousedev.mod.c
MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*110,*r*0,*1,*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*r*8,*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*14A,*r*a*0,*1,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*145,*r*a*0,*1,*18,*1C,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*110,*r*a*0,*1,*m*l*s*f*w*");
However, building the same kernel on a 32-bit machine results in
incorrect output:
$ grep MODULE_ALIAS drivers/input/mousedev.mod.c
MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*110,*130,*r*0,*1,*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*r*8,*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*14A,*16A,*r*a*0,*1,*20,*21,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*145,*165,*r*a*0,*1,*18,*1C,*20,*21,*38,*3C,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*110,*130,*r*a*0,*1,*20,*21,*m*l*s*f*w*");
A similar issue occurs with CONFIG_INPUT_JOYDEV=m. On a 64-bit build
machine, the output is:
$ grep MODULE_ALIAS drivers/input/joydev.mod.c
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*0,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*2,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*8,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*6,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*120,*r*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*130,*r*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*2C0,*r*a*m*l*s*f*w*");
However, on a 32-bit machine, the output is incorrect:
$ grep MODULE_ALIAS drivers/input/joydev.mod.c
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*0,*20,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*2,*22,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*8,*28,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*6,*26,*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*11F,*13F,*r*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*11F,*13F,*r*a*m*l*s*f*w*");
MODULE_ALIAS("input:b*v*p*e*-e*1,*k*2C0,*2E0,*r*a*m*l*s*f*w*");
When building a 64-bit kernel, BITS_PER_LONG is defined as 64. However,
on a 32-bit build machine, the constant 1L is a signed 32-bit value.
Left-shifting it beyond 32 bits causes wraparound, and shifting by 31
or 63 bits makes it a negative value.
The fix in commit
e0e92632715f ("[PATCH] PATCH: 1 line 2.6.18 bugfix:
modpost-64bit-fix.patch") is incorrect; it only addresses cases where
a 64-bit kernel is built on a 64-bit build machine, overlooking cases
on a 32-bit build machine.
Using 1ULL ensures a 64-bit width on both 32-bit and 64-bit machines,
avoiding the wraparound issue.
Fixes:
e0e92632715f ("[PATCH] PATCH: 1 line 2.6.18 bugfix: modpost-64bit-fix.patch")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Masahiro Yamada [Sun, 3 Nov 2024 12:46:50 +0000 (21:46 +0900)]
modpost: fix acpi MODULE_DEVICE_TABLE built with mismatched endianness
When CONFIG_SATA_AHCI_PLATFORM=m, modpost outputs incorect acpi
MODULE_ALIAS() if the endianness of the target and the build machine
do not match.
When the endianness of the target kernel and the build machine match,
the output is correct:
$ grep 'MODULE_ALIAS("acpi' drivers/ata/ahci_platform.mod.c
MODULE_ALIAS("acpi*:APMC0D33:*");
MODULE_ALIAS("acpi*:010601:*");
However, when building a little-endian kernel on a big-endian machine
(or vice versa), the output is incorrect:
$ grep 'MODULE_ALIAS("acpi' drivers/ata/ahci_platform.mod.c
MODULE_ALIAS("acpi*:APMC0D33:*");
MODULE_ALIAS("acpi*:0601??:*");
The 'cls' and 'cls_msk' fields are 32-bit.
DEF_FIELD() must be used instead of DEF_FIELD_ADDR() to correctly handle
endianness of these 32-bit fields.
The check 'if (cls)' was unnecessary; it never became NULL, as it was
the pointer to 'symval' plus the offset to the 'cls' field.
Fixes:
26095a01d359 ("ACPI / scan: Add support for ACPI _CLS device matching")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Dmitry Torokhov [Mon, 28 Oct 2024 05:31:15 +0000 (22:31 -0700)]
Input: fix regression when re-registering input handlers
Commit
d469647bafd9 ("Input: simplify event handling logic") introduced
code that would set handler->events() method to either
input_handler_events_filter() or input_handler_events_default() or
input_handler_events_null(), depending on the kind of input handler
(a filter or a regular one) we are dealing with. Unfortunately this
breaks cases when we try to re-register the same filter (as is the case
with sysrq handler): after initial registration the handler will have 2
event handling methods defined, and will run afoul of the check in
input_handler_check_methods():
input: input_handler_check_methods: only one event processing method can be defined (sysrq)
sysrq: Failed to register input handler, error -22
Fix this by adding handle_events() method to input_handle structure and
setting it up when registering a new input handle according to event
handling methods defined in associated input_handler structure, thus
avoiding modifying the input_handler structure.
Reported-by: "Ned T. Crigler" <crigler@gmail.com>
Reported-by: Christian Heusel <christian@heusel.eu>
Tested-by: "Ned T. Crigler" <crigler@gmail.com>
Tested-by: Peter Seiderer <ps.report@gmx.net>
Fixes:
d469647bafd9 ("Input: simplify event handling logic")
Link: https://lore.kernel.org/r/Zx2iQp6csn42PJA7@xavtug
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Linus Torvalds [Sat, 2 Nov 2024 19:27:11 +0000 (09:27 -1000)]
Merge tag 'nfsd-6.12-3' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix two async COPY bugs found during NFS bake-a-thon
- Fix an svcrdma memory leak
* tag 'nfsd-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
rpcrdma: Always release the rpcrdma_device's xa_array
NFSD: Never decrement pending_async_copies on error
NFSD: Initialize struct nfsd4_copy earlier
Linus Torvalds [Sat, 2 Nov 2024 19:22:16 +0000 (09:22 -1000)]
Merge tag 'xfs-6.12-fixes-6' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
- fix a sysbot reported crash on filestreams
- Reduce cpu time spent searching for extents in a very fragmented FS
- Check for delayed allocations before setting extsize
* tag 'xfs-6.12-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: streamline xfs_filestream_pick_ag
xfs: fix finding a last resort AG in xfs_filestream_pick_ag
xfs: Reduce unnecessary searches when searching for the best extents
xfs: Check for delayed allocations before setting extsize
Linus Torvalds [Sat, 2 Nov 2024 02:05:50 +0000 (16:05 -1000)]
Merge tag 'linux_kselftest-fixes-6.12-rc6' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- fix syntax error in frequency calculation arithmetic expression in
intel_pstate run.sh
- add missing cpupower dependency check intel_pstate run.sh
- fix idmap_mount_tree_invalid test failure due to incorrect argument
- fix watchdog-test run leaving the watchdog timer enabled causing
system reboot. With this fix, the test disables the watchdog timer
when it gets terminated with SIGTERM, SIGKILL, and SIGQUIT in
addition to SIGINT
* tag 'linux_kselftest-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/watchdog-test: Fix system accidentally reset after watchdog-test
selftests/intel_pstate: check if cpupower is installed
selftests/intel_pstate: fix operand expected error
selftests/mount_setattr: fix idmap_mount_tree_invalid failed to run
Linus Torvalds [Sat, 2 Nov 2024 01:59:46 +0000 (15:59 -1000)]
Merge tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Avoid build errors with old 'rustc's without LLVM patch version
(important since it impacts people that do not even enable Rust)
- Update LLVM version for 'HAVE_CFI_ICALL_NORMALIZE_INTEGERS' in
'depends on' condition (the fix was eventually backported rather
than land in LLVM 19)"
* tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linux:
cfi: tweak llvm version for HAVE_CFI_ICALL_NORMALIZE_INTEGERS
kbuild: rust: avoid errors with old `rustc`s without LLVM patch version
Linus Torvalds [Sat, 2 Nov 2024 01:44:23 +0000 (15:44 -1000)]
Merge tag 'pci-v6.12-fixes-2' of git://git./linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:
- Enable device-specific ACS-like functionality even if the device
doesn't advertise an ACS capability, which got broken when adding
fancy ACS kernel parameter (Jason Gunthorpe)
* tag 'pci-v6.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Fix pci_enable_acs() support for the ACS quirks
Linus Torvalds [Sat, 2 Nov 2024 01:37:09 +0000 (15:37 -1000)]
Merge tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Regular fixes pull, nothing too out of the ordinary, the mediatek
fixes came in a batch that I might have preferred a bit earlier but
all seem fine, otherwise regular xe/amdgpu and a few misc ones.
xe:
- Fix missing HPD interrupt enabling, bringing one PM refactor with it
- Workaround LNL GGTT invalidation not being visible to GuC
- Avoid getting jobs stuck without a protecting timeout
ivpu:
- Fix firewall IRQ handling
panthor:
- Fix firmware initialization wrt page sizes
- Fix handling and reporting of dead job groups
sched:
- Guarantee forward progress via WC_MEM_RECLAIM
tests:
- Fix memory leak in drm_display_mode_from_cea_vic()
amdgpu:
- DCN 3.5 fix
- Vangogh SMU KASAN fix
- SMU 13 profile reporting fix
mediatek:
- Fix degradation problem of alpha blending
- Fix color format MACROs in OVL
- Fix get efuse issue for MT8188 DPTX
- Fix potential NULL dereference in mtk_crtc_destroy()
- Correct dpi power-domains property
- Add split subschema property constraints"
* tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel: (27 commits)
drm/xe: Don't short circuit TDR on jobs not started
drm/xe: Add mmio read before GGTT invalidate
drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic()
drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic()
drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic()
drm/panthor: Report group as timedout when we fail to properly suspend
drm/panthor: Fail job creation when the group is dead
drm/panthor: Fix firmware initialization on systems with a page size > 4k
accel/ivpu: Fix NOC firewall interrupt handling
drm/xe/display: Add missing HPD interrupt enabling during non-d3cold RPM resume
drm/xe/display: Separate the d3cold and non-d3cold runtime PM handling
drm/xe: Remove runtime argument from display s/r functions
drm/amdgpu/smu13: fix profile reporting
drm/amd/pm: Vangogh: Fix kernel memory out of bounds write
Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"
drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM
drm/tegra: Fix NULL vs IS_ERR() check in probe()
dt-bindings: display: mediatek: split: add subschema property constraints
dt-bindings: display: mediatek: dpi: correct power-domains property
drm/mediatek: Fix potential NULL dereference in mtk_crtc_destroy()
...
Linus Torvalds [Sat, 2 Nov 2024 01:22:57 +0000 (15:22 -1000)]
Merge tag 'cxl-fixes-6.12-rc6' of git://git./linux/kernel/git/cxl/cxl
Pull cxl fixes from Ira Weiny:
"The bulk of these fixes center around an initialization order bug
reported by Gregory Price and some additional fall out from the
debugging effort.
In summary, cxl_acpi and cxl_mem race and previously worked because of
a bus_rescan_devices() while testing without modules built in.
Unfortunately with modules built in the rescan would fail due to the
cxl_port driver being registered late via the build order. Furthermore
it was found bus_rescan_devices() did not guarantee a probe barrier
which CXL was expecting. Additional fixes to cxl-test and decoder
allocation came along as they were found in this debugging effort.
The other fixes are pretty minor but one affects trace point data seen
by user space.
Summary:
- Fix crashes when running with cxl-test code
- Fix Trace DRAM Event Record field decodes
- Fix module/built in initialization order errors
- Fix use after free on decoder shutdowns
- Fix out of order decoder allocations
- Improve cxl-test to better reflect real world systems"
* tag 'cxl-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/test: Improve init-order fidelity relative to real-world systems
cxl/port: Prevent out-of-order decoder allocation
cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
cxl/port: Fix CXL port initialization order when the subsystem is built-in
cxl/events: Fix Trace DRAM Event Record
cxl/core: Return error when cxl_endpoint_gather_bandwidth() handles a non-PCI device
Linus Torvalds [Fri, 1 Nov 2024 23:41:55 +0000 (13:41 -1000)]
Merge tag 'block-6.12-
20241101' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fixup for a recent blk_rq_map_user_bvec() patch
- NVMe pull request via Keith:
- Spec compliant identification fix (Keith)
- Module parameter to enable backward compatibility on unusual
namespace formats (Keith)
- Target double free fix when using keys (Vitaliy)
- Passthrough command error handling fix (Keith)
* tag 'block-6.12-
20241101' of git://git.kernel.dk/linux:
nvme: re-fix error-handling for io_uring nvme-passthrough
nvmet-auth: assign dh_key to NULL after kfree_sensitive
nvme: module parameter to disable pi with offsets
block: fix queue limits checks in blk_rq_map_user_bvec for real
nvme: enhance cns version checking
Linus Torvalds [Fri, 1 Nov 2024 23:38:01 +0000 (13:38 -1000)]
Merge tag 'io_uring-6.12-
20241101' of git://git.kernel.dk/linux
Pull io_uring fix from Jens Axboe:
- Fix not honoring IOCB_NOWAIT for starting buffered writes in terms of
calling sb_start_write(), leading to a deadlock if someone is
attempting to freeze the file system with writes in progress, as each
side will end up waiting for the other to make progress.
* tag 'io_uring-6.12-
20241101' of git://git.kernel.dk/linux:
io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
Linus Torvalds [Fri, 1 Nov 2024 19:04:23 +0000 (09:04 -1000)]
Merge tag 'acpi-6.12-rc6' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Make the ACPI CPPC library use a raw spinlock for operations carried
out in scheduler context via the schedutil governor and the ACPI CPPC
cpufreq driver (Pierre Gondois)"
* tag 'acpi-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Make rmw_lock a raw_spin_lock
Linus Torvalds [Fri, 1 Nov 2024 19:03:02 +0000 (09:03 -1000)]
Merge tag 'gpio-fixes-for-v6.12-rc6' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix an uninitialized variable in GPIO swnode code
- add a missing return value check for devm_mutex_init()
- fix an old issue with debugfs output
* tag 'gpio-fixes-for-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: fix debugfs dangling chip separator
gpiolib: fix debugfs newline separators
gpio: sloppy-logic-analyzer: Check for error code from devm_mutex_init() call
gpio: fix uninit-value in swnode_find_gpio
Dave Airlie [Fri, 1 Nov 2024 18:44:02 +0000 (04:44 +1000)]
Merge tag 'drm-xe-fixes-2024-10-31' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix missing HPD interrupt enabling, bringing one PM refactor with it
(Imre / Maarten)
- Workaround LNL GGTT invalidation not being visible to GuC
(Matthew Brost)
- Avoid getting jobs stuck without a protecting timeout (Matthew Brost)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/tsbftadm7owyizzdaqnqu7u4tqggxgeqeztlfvmj5fryxlfomi@5m5bfv2zvzmw
Linus Torvalds [Fri, 1 Nov 2024 18:26:38 +0000 (08:26 -1000)]
Merge tag 'riscv-for-linus-6.11-rc6' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- Avoid accessing the early boot ACPI tables via unsafe memory
attributes, which can result in incorrect ACPI table data appearing.
This can cause all sorts of bad behavior.
- Avoid compiler-inserted library calls in the VDSO.
- GCC+Rust builds have been disabled, to avoid issues related to ISA
string mismatched between the GCC and LLVM Rust implementations.
- The NX flag is now set in the EFI PE/COFF headers, which is necessary
for some distro GRUB versions to boot images.
- A fix to avoid leaking DT node reference counts on ACPI systems
during cache info parsing.
- CPU numbers are now printed as unsigned values during hotplug.
- A pair of build fixes for usused macros, which can trigger warnings
on some configurations.
* tag 'riscv-for-linus-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Remove duplicated GET_RM
riscv: Remove unused GENERATING_ASM_OFFSETS
riscv: Use '%u' to format the output of 'cpu'
riscv: Prevent a bad reference count on CPU nodes
riscv: efi: Set NX compat flag in PE/COFF header
RISC-V: disallow gcc + rust builds
riscv: Do not use fortify in early code
RISC-V: ACPI: fix early_ioremap to early_memremap
riscv: vdso: Prevent the compiler from inserting calls to memset()
Linus Torvalds [Fri, 1 Nov 2024 17:54:11 +0000 (07:54 -1000)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The important one is a change to the way in which we handle protection
keys around signal delivery so that we're more closely aligned with
the x86 behaviour, however there is also a revert of the previous fix
to disable software tag-based KASAN with GCC, since a workaround
materialised shortly afterwards.
I'd love to say we're done with 6.12, but we're aware of some
longstanding fpsimd register corruption issues that we're almost at
the bottom of resolving.
Summary:
- Fix handling of POR_EL0 during signal delivery so that pushing the
signal context doesn't fail based on the pkey configuration of the
interrupted context and align our user-visible behaviour with that
of x86.
- Fix a bogus pointer being passed to the CPU hotplug code from the
Arm SDEI driver.
- Re-enable software tag-based KASAN with GCC by using an alternative
implementation of '__no_sanitize_address'"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state()
Revert "kasan: Disable Software Tag-Based KASAN with GCC"
kasan: Fix Software Tag-Based KASAN with GCC
Linus Torvalds [Fri, 1 Nov 2024 17:45:00 +0000 (07:45 -1000)]
Merge tag 'vfs-6.12-rc6.iomap' of gitolite.pub/scm/linux/kernel/git/vfs/vfs
Pull iomap fixes from Christian Brauner:
"Fixes for iomap to prevent data corruption bugs in the fallocate
unshare range implementation of fsdax and a small cleanup to turn
iomap_want_unshare_iter() into an inline function"
* tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
iomap: turn iomap_want_unshare_iter into an inline function
fsdax: dax_unshare_iter needs to copy entire blocks
fsdax: remove zeroing code from dax_unshare_iter
iomap: share iomap_unshare_iter predicate code with fsdax
xfs: don't allocate COW extents when unsharing a hole
Linus Torvalds [Fri, 1 Nov 2024 17:37:10 +0000 (07:37 -1000)]
Merge tag 'vfs-6.12-rc6.fixes' of gitolite.pub/scm/linux/kernel/git/vfs/vfs
Pull filesystem fixes from Christian Brauner:
"VFS:
- Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set
- Add a get_tree_bdev_flags() helper that allows to modify e.g.,
whether errors are logged into the filesystem context during
superblock creation. This is used by erofs to fix a userspace
regression where an error is currently logged when its used on a
regular file which is an new allowed mode in erofs.
netfs:
- Fix the sysfs debug path in the documentation.
- Fix iov_iter_get_pages*() for folio queues by skipping the page
extracation if we're at the end of a folio.
afs:
- Fix moving subdirectories to different parent directory.
autofs:
- Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in
validate_dev_ioctl(). The actual ioctl number, not the ioctl
command needs to be checked for autofs"
* tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP
autofs: fix thinko in validate_dev_ioctl()
iov_iter: Fix iov_iter_get_pages*() for folio_queue
afs: Fix missing subdir edit when renamed between parent dirs
doc: correcting the debug path for cachefiles
erofs: use get_tree_bdev_flags() to avoid misleading messages
fs/super.c: introduce get_tree_bdev_flags()
Linus Torvalds [Fri, 1 Nov 2024 17:31:47 +0000 (07:31 -1000)]
Merge tag 'for-6.12-rc5-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more stability fixes. There's one patch adding export of MIPS
cmpxchg helper, used in the error propagation fix.
- fix error propagation from split bios to the original btrfs bio
- fix merging of adjacent extents (normal operation, defragmentation)
- fix potential use after free after freeing btrfs device structures"
* tag 'for-6.12-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix defrag not merging contiguous extents due to merged extent maps
btrfs: fix extent map merging not happening for adjacent extents
btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()
btrfs: fix error propagation of split bios
MIPS: export __cmpxchg_small()
Linus Torvalds [Fri, 1 Nov 2024 17:21:03 +0000 (07:21 -1000)]
Merge tag 'bcachefs-2024-10-31' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Various syzbot fixes, and the more notable ones:
- Fix for pointers in an extent overflowing the max (16) on a
filesystem with many devices: we were creating too many cached
copies when moving data around. Now, we only create at most one
cached copy if there's a promote target set.
Caching will be a bit broken for reflinked data until 6.13: I have
larger series queued up which significantly improves the plumbing
for data options down into the extent (bch_extent_rebalance) to fix
this.
- Fix for deadlock on -ENOSPC on tiny filesystems
Allocation from the partial open_bucket list wasn't correctly
accounting partial open_buckets as free: this fixes the main cause
of tests timing out in the automated tests"
* tag 'bcachefs-2024-10-31' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix NULL ptr dereference in btree_node_iter_and_journal_peek
bcachefs: fix possible null-ptr-deref in __bch2_ec_stripe_head_get()
bcachefs: Fix deadlock on -ENOSPC w.r.t. partial open buckets
bcachefs: Don't filter partial list buckets in open_buckets_to_text()
bcachefs: Don't keep tons of cached pointers around
bcachefs: init freespace inited bits to 0 in bch2_fs_initialize
bcachefs: Fix unhandled transaction restart in fallocate
bcachefs: Fix UAF in bch2_reconstruct_alloc()
bcachefs: fix null-ptr-deref in have_stripes()
bcachefs: fix shift oob in alloc_lru_idx_fragmentation
bcachefs: Fix invalid shift in validate_sb_layout()
Vlastimil Babka [Thu, 24 Oct 2024 15:12:29 +0000 (17:12 +0200)]
mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
Since commit
efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
boundaries") a mmap() of anonymous memory without a specific address hint
and of at least PMD_SIZE will be aligned to PMD so that it can benefit
from a THP backing page.
However this change has been shown to regress some workloads
significantly. [1] reports regressions in various spec benchmarks, with
up to 600% slowdown of the cactusBSSN benchmark on some platforms. The
benchmark seems to create many mappings of 4632kB, which would have merged
to a large THP-backed area before commit
efa7df3e3bb5 and now they are
fragmented to multiple areas each aligned to PMD boundary with gaps
between. The regression then seems to be caused mainly due to the
benchmark's memory access pattern suffering from TLB or cache aliasing due
to the aligned boundaries of the individual areas.
Another known regression bisected to commit
efa7df3e3bb5 is darktable [2]
[3] and early testing suggests this patch fixes the regression there as
well.
To fix the regression but still try to benefit from THP-friendly anonymous
mapping alignment, add a condition that the size of the mapping must be a
multiple of PMD size instead of at least PMD size. In case of many
odd-sized mapping like the cactusBSSN creates, those will stop being
aligned and with gaps between, and instead naturally merge again.
Link: https://lkml.kernel.org/r/20241024151228.101841-2-vbabka@suse.cz
Fixes:
efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Michael Matz <matz@suse.de>
Debugged-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Closes: https://bugzilla.suse.com/show_bug.cgi?id=
1229012 [1]
Reported-by: Matthias Bodenbinder <matthias@bodenbinder.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219366 [2]
Closes: https://lore.kernel.org/all/
2050f0d4-57b0-481d-bab8-
05e8d48fed0c@leemhuis.info/ [3]
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Yang Shi <yang@os.amperecomputing.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Petr Tesarik <ptesarik@suse.com>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chen Ridong [Fri, 25 Oct 2024 06:09:42 +0000 (06:09 +0000)]
mm: shrinker: avoid memleak in alloc_shrinker_info
A memleak was found as below:
unreferenced object 0xffff8881010d2a80 (size 32):
comm "mkdir", pid 1559, jiffies
4294932666
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 @...............
backtrace (crc
2e7ef6fa):
[<
ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470
[<
ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0
[<
ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0
[<
ffffffff81198dd9>] online_css+0x29/0xa0
[<
ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360
[<
ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0
[<
ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90
[<
ffffffff813dbb24>] vfs_mkdir+0x144/0x220
[<
ffffffff813e1c97>] do_mkdirat+0x87/0x130
[<
ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70
[<
ffffffff81f8c928>] do_syscall_64+0x68/0x140
[<
ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
alloc_shrinker_info(), when shrinker_unit_alloc() returns an errer, the
info won't be freed. Just fix it.
Link: https://lkml.kernel.org/r/20241025060942.1049263-1-chenridong@huaweicloud.com
Fixes:
307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Wang Weiyang <wangweiyang2@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Eugen Hristev [Fri, 25 Oct 2024 08:58:48 +0000 (11:58 +0300)]
.mailmap: update e-mail address for Eugen Hristev
Update e-mail address.
Link: https://lkml.kernel.org/r/20241025085848.483149-1-eugen.hristev@linaro.org
Signed-off-by: Eugen Hristev <eugen.hristev@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Gregory Price [Fri, 25 Oct 2024 14:17:24 +0000 (10:17 -0400)]
vmscan,migrate: fix page count imbalance on node stats when demoting pages
When numa balancing is enabled with demotion, vmscan will call
migrate_pages when shrinking LRUs. migrate_pages will decrement the
the node's isolated page count, leading to an imbalanced count when
invoked from (MG)LRU code.
The result is dmesg output like such:
$ cat /proc/sys/vm/stat_refresh
[77383.088417] vmstat_refresh: nr_isolated_anon -103212
[77383.088417] vmstat_refresh: nr_isolated_file -899642
This negative value may impact compaction and reclaim throttling.
The following path produces the decrement:
shrink_folio_list
demote_folio_list
migrate_pages
migrate_pages_batch
migrate_folio_move
migrate_folio_done
mod_node_page_state(-ve) <- decrement
This path happens for SUCCESSFUL migrations, not failures. Typically
callers to migrate_pages are required to handle putback/accounting for
failures, but this is already handled in the shrink code.
When accounting for migrations, instead do not decrement the count when
the migration reason is MR_DEMOTION. As of v6.11, this demotion logic
is the only source of MR_DEMOTION.
Link: https://lkml.kernel.org/r/20241025141724.17927-1-gourry@gourry.net
Fixes:
26aa2d199d6f ("mm/migrate: demote pages during reclaim")
Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jarkko Sakkinen [Fri, 25 Oct 2024 18:15:28 +0000 (21:15 +0300)]
mailmap: update Jarkko's email addresses
Remove my previous work email, and the new one. The previous was never
used in the commit log, so there's no good reason to spare it.
Link: https://lkml.kernel.org/r/20241025181530.6151-1-jarkko@kernel.org
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Alex Elder <elder@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geliang Tang <geliang@kernel.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: Matt Ranostay <matt@ranostay.sg>
Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Cc: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Fri, 1 Nov 2024 02:49:23 +0000 (16:49 -1000)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
- Put the QP netlink dump back in cxgb4, fixes a user visible
regression
- Don't change the rounding style in mlx5 for user provided rd_atomic
values
- Resolve a race in bnxt_re around the qp-handle table array
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/bnxt_re: synchronize the qp-handle table array
RDMA/bnxt_re: Fix the usage of control path spin locks
RDMA/mlx5: Round max_rd_atomic/max_dest_rd_atomic up instead of down
RDMA/cxgb4: Dump vendor specific QP details
Linus Torvalds [Fri, 1 Nov 2024 00:56:19 +0000 (14:56 -1000)]
Merge tag 'bpf-fixes' of git://git./linux/kernel/git/bpf/bpf
Pull bpf fixes from Daniel Borkmann:
- Fix BPF verifier to force a checkpoint when the program's jump
history becomes too long (Eduard Zingerman)
- Add several fixes to the BPF bits iterator addressing issues like
memory leaks and overflow problems (Hou Tao)
- Fix an out-of-bounds write in trie_get_next_key (Byeonguk Jeong)
- Fix BPF test infra's LIVE_FRAME frame update after a page has been
recycled (Toke Høiland-Jørgensen)
- Fix BPF verifier and undo the 40-bytes extra stack space for
bpf_fastcall patterns due to various bugs (Eduard Zingerman)
- Fix a BPF sockmap race condition which could trigger a NULL pointer
dereference in sock_map_link_update_prog (Cong Wang)
- Fix tcp_bpf_recvmsg_parser to retrieve seq_copied from tcp_sk under
the socket lock (Jiayuan Chen)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled
selftests/bpf: Add three test cases for bits_iter
bpf: Use __u64 to save the bits in bits iterator
bpf: Check the validity of nr_words in bpf_iter_bits_new()
bpf: Add bpf_mem_alloc_check_size() helper
bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
bpf: disallow 40-bytes extra stack for bpf_fastcall patterns
selftests/bpf: Add test for trie_get_next_key()
bpf: Fix out-of-bounds write in trie_get_next_key()
selftests/bpf: Test with a very short loop
bpf: Force checkpoint when jmp history is too long
bpf: fix filed access without lock
sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()
Linus Torvalds [Thu, 31 Oct 2024 22:39:58 +0000 (12:39 -1000)]
Merge tag 'net-6.12-rc6' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from WiFi, bluetooth and netfilter.
No known new regressions outstanding.
Current release - regressions:
- wifi: mt76: do not increase mcu skb refcount if retry is not
supported
Current release - new code bugs:
- wifi:
- rtw88: fix the RX aggregation in USB 3 mode
- mac80211: fix memory corruption bug in struct ieee80211_chanctx
Previous releases - regressions:
- sched:
- stop qdisc_tree_reduce_backlog on TC_H_ROOT
- sch_api: fix xa_insert() error path in tcf_block_get_ext()
- wifi:
- revert "wifi: iwlwifi: remove retry loops in start"
- cfg80211: clear wdev->cqm_config pointer on free
- netfilter: fix potential crash in nf_send_reset6()
- ip_tunnel: fix suspicious RCU usage warning in ip_tunnel_find()
- bluetooth: fix null-ptr-deref in hci_read_supported_codecs
- eth: mlxsw: add missing verification before pushing Tx header
- eth: hns3: fixed hclge_fetch_pf_reg accesses bar space out of
bounds issue
Previous releases - always broken:
- wifi: mac80211: do not pass a stopped vif to the driver in
.get_txpower
- netfilter: sanitize offset and length before calling skb_checksum()
- core:
- fix crash when config small gso_max_size/gso_ipv4_max_size
- skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
- mptcp: protect sched with rcu_read_lock
- eth: ice: fix crash on probe for DPLL enabled E810 LOM
- eth: macsec: fix use-after-free while sending the offloading packet
- eth: stmmac: fix unbalanced DMA map/unmap for non-paged SKB data
- eth: hns3: fix kernel crash when 1588 is sent on HIP08 devices
- eth: mtk_wed: fix path of MT7988 WO firmware"
* tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
net: hns3: fix kernel crash when 1588 is sent on HIP08 devices
net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue
net: hns3: initialize reset_timer before hclgevf_misc_irq_init()
net: hns3: don't auto enable misc vector
net: hns3: Resolved the issue that the debugfs query result is inconsistent.
net: hns3: fix missing features due to dev->features configuration too early
net: hns3: fixed reset failure issues caused by the incorrect reset type
net: hns3: add sync command to sync io-pgtable
net: hns3: default enable tx bounce buffer when smmu enabled
netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
net: ethernet: mtk_wed: fix path of MT7988 WO firmware
selftests: forwarding: Add IPv6 GRE remote change tests
mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address
mlxsw: pci: Sync Rx buffers for device
mlxsw: pci: Sync Rx buffers for CPU
mlxsw: spectrum_ptp: Add missing verification before pushing Tx header
net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs
netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()
netfilter: Fix use-after-free in get_info()
...
Dave Airlie [Thu, 31 Oct 2024 21:34:14 +0000 (07:34 +1000)]
Merge tag 'mediatek-drm-fixes-
20241028' of https://git./linux/kernel/git/chunkuang.hu/linux into drm-fixes
Mediatek DRM Fixes -
20241028
1. Fix degradation problem of alpha blending
2. Fix color format MACROs in OVL
3. Fix get efuse issue for MT8188 DPTX
4. Fix potential NULL dereference in mtk_crtc_destroy()
5. Correct dpi power-domains property
6. Add split subschema property constraints
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241028135846.3570-1-chunkuang.hu@kernel.org
Dave Airlie [Thu, 31 Oct 2024 21:24:37 +0000 (07:24 +1000)]
Merge tag 'amd-drm-fixes-6.12-2024-10-31' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.12-2024-10-31:
amdgpu:
- DCN 3.5 fix
- Vangogh SMU KASAN fix
- SMU 13 profile reporting fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241031151539.3523633-1-alexander.deucher@amd.com
Dave Airlie [Thu, 31 Oct 2024 19:05:41 +0000 (05:05 +1000)]
Merge tag 'drm-misc-fixes-2024-10-31' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
ivpu:
- Fix firewall IRQ handling
panthor:
- Fix firmware initialization wrt page sizes
- Fix handling and reporting of dead job groups
sched:
- Guarantee forward progress via WC_MEM_RECLAIM
tests:
- Fix memory leak in drm_display_mode_from_cea_vic()
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241031144348.GA7826@linux-2.fritz.box
Linus Torvalds [Thu, 31 Oct 2024 18:15:40 +0000 (08:15 -1000)]
Merge tag 'sound-6.12-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here we see slightly more commits than wished, but basically all are
small and mostly trivial fixes.
The only core change is the workaround for __counted_by() usage in
ASoC DAPM code, while the rest are device-specific fixes for Intel
Baytrail devices, Cirrus and wcd937x codecs, and HD-audio / USB-audio
devices"
* tag 'sound-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1
ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3
ALSA: usb-audio: Add quirks for Dell WD19 dock
ASoC: codecs: wcd937x: relax the AUX PDM watchdog
ASoC: codecs: wcd937x: add missing LO Switch control
ASoC: dt-bindings: rockchip,rk3308-codec: add port property
ALSA: hda/realtek: Add subwoofer quirk for Infinix ZERO BOOK 13
ASoC: dapm: fix bounds checker error in dapm_widget_list_create
ASoC: Intel: sst: Fix used of uninitialized ctx to log an error
ASoC: cs42l51: Fix some error handling paths in cs42l51_probe()
ASoC: Intel: sst: Support LPE0F28 ACPI HID
ALSA: hda/realtek: Limit internal Mic boost on Dell platform
ASoC: Intel: bytcr_rt5640: Add DMI quirk for Vexia Edu Atla 10 tablet
ASoC: Intel: bytcr_rt5640: Add support for non ACPI instantiated codec
ASoC: codecs: rt5640: Always disable IRQs from rt5640_cancel_work()
Johan Hovold [Mon, 28 Oct 2024 12:49:59 +0000 (13:49 +0100)]
gpiolib: fix debugfs dangling chip separator
Add the missing newline after entries for recently removed gpio chips
so that the chip sections are separated by a newline as intended.
Fixes:
e348544f7994 ("gpio: protect the list of GPIO devices with SRCU")
Cc: stable@vger.kernel.org # 6.9
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20241028125000.24051-3-johan+linaro@kernel.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Johan Hovold [Mon, 28 Oct 2024 12:49:58 +0000 (13:49 +0100)]
gpiolib: fix debugfs newline separators
The gpiolib debugfs interface exports a list of all gpio chips in a
system and the state of their pins.
The gpio chip sections are supposed to be separated by a newline
character, but a long-standing bug prevents the separator from
being included when output is generated in multiple sessions, making the
output inconsistent and hard to read.
Make sure to only suppress the newline separator at the beginning of the
file as intended.
Fixes:
f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface")
Cc: stable@vger.kernel.org # 3.7
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20241028125000.24051-2-johan+linaro@kernel.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Filipe Manana [Tue, 29 Oct 2024 15:18:45 +0000 (15:18 +0000)]
btrfs: fix defrag not merging contiguous extents due to merged extent maps
When running defrag (manual defrag) against a file that has extents that
are contiguous and we already have the respective extent maps loaded and
merged, we end up not defragging the range covered by those contiguous
extents. This happens when we have an extent map that was the result of
merging multiple extent maps for contiguous extents and the length of the
merged extent map is greater than or equals to the defrag threshold
length.
The script below reproduces this scenario:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV
mount $DEV $MNT
# Create a 256K file with 4 extents of 64K each.
xfs_io -f -c "falloc 0 64K" \
-c "pwrite 0 64K" \
-c "falloc 64K 64K" \
-c "pwrite 64K 64K" \
-c "falloc 128K 64K" \
-c "pwrite 128K 64K" \
-c "falloc 192K 64K" \
-c "pwrite 192K 64K" \
$MNT/foo
umount $MNT
echo -n "Initial number of file extent items: "
btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l
mount $DEV $MNT
# Read the whole file in order to load and merge extent maps.
cat $MNT/foo > /dev/null
btrfs filesystem defragment -t 128K $MNT/foo
umount $MNT
echo -n "Number of file extent items after defrag with 128K threshold: "
btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l
mount $DEV $MNT
# Read the whole file in order to load and merge extent maps.
cat $MNT/foo > /dev/null
btrfs filesystem defragment -t 256K $MNT/foo
umount $MNT
echo -n "Number of file extent items after defrag with 256K threshold: "
btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l
Running it:
$ ./test.sh
Initial number of file extent items: 4
Number of file extent items after defrag with 128K threshold: 4
Number of file extent items after defrag with 256K threshold: 4
The 4 extents don't get merged because we have an extent map with a size
of 256K that is the result of merging the individual extent maps for each
of the four 64K extents and at defrag_lookup_extent() we have a value of
zero for the generation threshold ('newer_than' argument) since this is a
manual defrag. As a consequence we don't call defrag_get_extent() to get
an extent map representing a single file extent item in the inode's
subvolume tree, so we end up using the merged extent map at
defrag_collect_targets() and decide not to defrag.
Fix this by updating defrag_lookup_extent() to always discard extent maps
that were merged and call defrag_get_extent() regardless of the minimum
generation threshold ('newer_than' argument).
A test case for fstests will be sent along soon.
CC: stable@vger.kernel.org # 6.1+
Fixes:
199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 28 Oct 2024 16:23:00 +0000 (16:23 +0000)]
btrfs: fix extent map merging not happening for adjacent extents
If we have 3 or more adjacent extents in a file, that is, consecutive file
extent items pointing to adjacent extents, within a contiguous file range
and compatible flags, we end up not merging all the extents into a single
extent map.
For example:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt/sdc
$ xfs_io -f -d -c "pwrite -b 64K 0 64K" \
-c "pwrite -b 64K 64K 64K" \
-c "pwrite -b 64K 128K 64K" \
-c "pwrite -b 64K 192K 64K" \
/mnt/sdc/foo
After all the ordered extents complete we unpin the extent maps and try
to merge them, but instead of getting a single extent map we get two
because:
1) When the first ordered extent completes (file range [0, 64K)) we
unpin its extent map and attempt to merge it with the extent map for
the range [64K, 128K), but we can't because that extent map is still
pinned;
2) When the second ordered extent completes (file range [64K, 128K)), we
unpin its extent map and merge it with the previous extent map, for
file range [0, 64K), but we can't merge with the next extent map, for
the file range [128K, 192K), because this one is still pinned.
The merged extent map for the file range [0, 128K) gets the flag
EXTENT_MAP_MERGED set;
3) When the third ordered extent completes (file range [128K, 192K)), we
unpin its extent map and attempt to merge it with the previous extent
map, for file range [0, 128K), but we can't because that extent map
has the flag EXTENT_MAP_MERGED set (mergeable_maps() returns false
due to different flags) while the extent map for the range [128K, 192K)
doesn't have that flag set.
We also can't merge it with the next extent map, for file range
[192K, 256K), because that one is still pinned.
At this moment we have 3 extent maps:
One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set.
One for file range [128K, 192K).
One for file range [192K, 256K) which is still pinned;
4) When the fourth and final extent completes (file range [192K, 256K)),
we unpin its extent map and attempt to merge it with the previous
extent map, for file range [128K, 192K), which succeeds since none
of these extent maps have the EXTENT_MAP_MERGED flag set.
So we end up with 2 extent maps:
One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set.
One for file range [128K, 256K), with the flag EXTENT_MAP_MERGED set.
Since after merging extent maps we don't attempt to merge again, that
is, merge the resulting extent map with the one that is now preceding
it (and the one following it), we end up with those two extent maps,
when we could have had a single extent map to represent the whole file.
Fix this by making mergeable_maps() ignore the EXTENT_MAP_MERGED flag.
While this doesn't present any functional issue, it prevents the merging
of extent maps which allows to save memory, and can make defrag not
merging extents too (that will be addressed in the next patch).
Fixes:
199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Toke Høiland-Jørgensen [Wed, 30 Oct 2024 10:48:26 +0000 (11:48 +0100)]
bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled
The test_run code detects whether a page has been modified and
re-initialises the xdp_frame structure if it has, using
xdp_update_frame_from_buff(). However, xdp_update_frame_from_buff()
doesn't touch frame->mem, so that wasn't correctly re-initialised, which
led to the pages from page_pool not being returned correctly. Syzbot
noticed this as a memory leak.
Fix this by also copying the frame->mem structure when re-initialising
the frame, like we do on initialisation of a new page from page_pool.
Fixes:
e5995bc7e2ba ("bpf, test_run: fix crashes due to XDP frame overwriting/corruption")
Fixes:
b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Reported-by: syzbot+d121e098da06af416d23@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+d121e098da06af416d23@syzkaller.appspotmail.com
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/bpf/20241030-test-run-mem-fix-v1-1-41e88e8cae43@redhat.com
Jens Axboe [Thu, 31 Oct 2024 15:10:07 +0000 (09:10 -0600)]
Merge tag 'nvme-6.12-2024-10-31' of git://git.infradead.org/nvme into block-6.12
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.12
- Spec compliant identification fix (Keith)
- Module parameter to enable backward compatibility on unusual
namespace formats (Keith)
- Target double free fix when using keys (Vitaliy)
- Passthrough command error handling fix (Keith)"
* tag 'nvme-6.12-2024-10-31' of git://git.infradead.org/nvme:
nvme: re-fix error-handling for io_uring nvme-passthrough
nvmet-auth: assign dh_key to NULL after kfree_sensitive
nvme: module parameter to disable pi with offsets
nvme: enhance cns version checking
Jens Axboe [Thu, 31 Oct 2024 14:05:44 +0000 (08:05 -0600)]
io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
When io_uring starts a write, it'll call kiocb_start_write() to bump the
super block rwsem, preventing any freezes from happening while that
write is in-flight. The freeze side will grab that rwsem for writing,
excluding any new writers from happening and waiting for existing writes
to finish. But io_uring unconditionally uses kiocb_start_write(), which
will block if someone is currently attempting to freeze the mount point.
This causes a deadlock where freeze is waiting for previous writes to
complete, but the previous writes cannot complete, as the task that is
supposed to complete them is blocked waiting on starting a new write.
This results in the following stuck trace showing that dependency with
the write blocked starting a new write:
task:fio state:D stack:0 pid:886 tgid:886 ppid:876
Call trace:
__switch_to+0x1d8/0x348
__schedule+0x8e8/0x2248
schedule+0x110/0x3f0
percpu_rwsem_wait+0x1e8/0x3f8
__percpu_down_read+0xe8/0x500
io_write+0xbb8/0xff8
io_issue_sqe+0x10c/0x1020
io_submit_sqes+0x614/0x2110
__arm64_sys_io_uring_enter+0x524/0x1038
invoke_syscall+0x74/0x268
el0_svc_common.constprop.0+0x160/0x238
do_el0_svc+0x44/0x60
el0_svc+0x44/0xb0
el0t_64_sync_handler+0x118/0x128
el0t_64_sync+0x168/0x170
INFO: task fsfreeze:7364 blocked for more than 15 seconds.
Not tainted
6.12.0-rc5-00063-g76aaf945701c #7963
with the attempting freezer stuck trying to grab the rwsem:
task:fsfreeze state:D stack:0 pid:7364 tgid:7364 ppid:995
Call trace:
__switch_to+0x1d8/0x348
__schedule+0x8e8/0x2248
schedule+0x110/0x3f0
percpu_down_write+0x2b0/0x680
freeze_super+0x248/0x8a8
do_vfs_ioctl+0x149c/0x1b18
__arm64_sys_ioctl+0xd0/0x1a0
invoke_syscall+0x74/0x268
el0_svc_common.constprop.0+0x160/0x238
do_el0_svc+0x44/0x60
el0_svc+0x44/0xb0
el0t_64_sync_handler+0x118/0x128
el0t_64_sync+0x168/0x170
Fix this by having the io_uring side honor IOCB_NOWAIT, and only attempt a
blocking grab of the super block rwsem if it isn't set. For normal issue
where IOCB_NOWAIT would always be set, this returns -EAGAIN which will
have io_uring core issue a blocking attempt of the write. That will in
turn also get completions run, ensuring forward progress.
Since freezing requires CAP_SYS_ADMIN in the first place, this isn't
something that can be triggered by a regular user.
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Peter Mann <peter.mann@sh.cz>
Link: https://lore.kernel.org/io-uring/38c94aec-81c9-4f62-b44e-1d87f5597644@sh.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Matthew Brost [Fri, 25 Oct 2024 21:43:29 +0000 (14:43 -0700)]
drm/xe: Don't short circuit TDR on jobs not started
Short circuiting TDR on jobs not started is an optimization which is not
required. On LNL we are facing an issue where jobs do not get scheduled
by the GuC if it misses a GGTT page update. When this occurs let the TDR
fire, toggle the scheduling which may get the job unstuck, and print a
warning message. If the TDR fires twice on job that hasn't started,
timeout the job.
v2:
- Add warning message (Paulo)
- Add fixes tag (Paulo)
- Timeout job which hasn't started after TDR firing twice
v3:
- Include local change
v4:
- Short circuit check_timeout on job not started
- use warn level rather than notice (Paulo)
Fixes:
7ddb9403dd74 ("drm/xe: Sample ctx timestamp to determine if jobs have timed out")
Cc: stable@vger.kernel.org
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241025214330.2010521-2-matthew.brost@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit
35d25a4a0012e690ef0cc4c5440231176db595cc)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>