David Sterba [Mon, 17 Feb 2025 22:01:15 +0000 (23:01 +0100)]
btrfs: pass struct btrfs_inode to new_simple_dir()
Pass a struct btrfs_inode to new_simple_dir() as it's an internal
interface, allowing to remove some use of BTRFS_I.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 17 Feb 2025 21:36:17 +0000 (22:36 +0100)]
btrfs: pass struct btrfs_inode to btrfs_iget_locked()
Pass a struct btrfs_inode to btrfs_inode() as it's an internal
interface, allowing to remove some use of BTRFS_I.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 17 Feb 2025 21:29:45 +0000 (22:29 +0100)]
btrfs: pass struct btrfs_inode to btrfs_read_locked_inode()
Pass a struct btrfs_inode to btrfs_read_locked_inode() as it's an
internal interface, allowing to remove some use of BTRFS_I.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 17 Feb 2025 21:20:27 +0000 (22:20 +0100)]
btrfs: pass struct btrfs_inode to extent_range_clear_dirty_for_io()
Pass a struct btrfs_inode to extent_range_clear_dirty_for_io() as it's
an internal interface, allowing to remove some use of BTRFS_I.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Mon, 17 Feb 2025 21:15:25 +0000 (22:15 +0100)]
btrfs: pass struct btrfs_inode to can_nocow_extent()
Pass a struct btrfs_inode to can_nocow_extent() as it's an internal
interface, allowing to remove some use of BTRFS_I.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 12 Feb 2025 20:22:16 +0000 (21:22 +0100)]
btrfs: update include and forward declarations in headers
Pass over all header files and add missing forward declarations,
includes or fix include types.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 12 Feb 2025 20:22:14 +0000 (21:22 +0100)]
btrfs: simplify returns and labels in btrfs_init_fs_root()
There's a label that does nothing else than return, so remove it and
also change other gotos to immediate returns as the function is short
enough for this pattern.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 12 Feb 2025 20:22:07 +0000 (21:22 +0100)]
btrfs: unify ordering of btrfs_key initializations
The btrfs_key is defined as objectid/type/offset and the keys are also
printed like that. For better readability, update all key
initializations to match this order.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 12 Feb 2025 20:22:05 +0000 (21:22 +0100)]
btrfs: zstd: remove local variable for storing page offsets
When using offset_in_page() it's clear what it means, we don't need to
store it in the local variable just to use it right away. There's no
change in the generated code, but keeps the declarations smaller.
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 12 Feb 2025 20:22:02 +0000 (21:22 +0100)]
btrfs: zstd: move zstd_parameters to the workspace
Reduce stack consumption of zstd_compress_folios() by 40 bytes
(10*sizeof(int)) as we can store struct zstd_parameters in the workspace
that is reused for each call.
typedef struct {
ZSTD_compressionParameters cParams;
ZSTD_frameParameters fParams;
} ZSTD_parameters;
typedef struct {
unsigned windowLog;
unsigned chainLog;
unsigned hashLog;
unsigned searchLog;
unsigned minMatch;
unsigned targetLength;
ZSTD_strategy strategy;
} ZSTD_compressionParameters;
typedef struct {
int contentSizeFlag;
int checksumFlag;
int noDictIDFlag;
} ZSTD_frameParameters;
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 12 Feb 2025 20:22:00 +0000 (21:22 +0100)]
btrfs: async-thread: switch local variables need_order bool
Use bool for 0/1 indicators in thresh_exec_hook() and
btrfs_work_helper().
Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 12 Feb 2025 20:21:50 +0000 (21:21 +0100)]
btrfs: add __cold attribute to extent_io_tree_panic()
This is a wrapper that leads to a panic, so add the annotation like the
other similar functions have.
Signed-off-by: David Sterba <dsterba@suse.com>
Johannes Thumshirn [Wed, 12 Feb 2025 14:05:00 +0000 (15:05 +0100)]
btrfs: zoned: exit btrfs_can_activate_zone if BTRFS_FS_NEED_ZONE_FINISH is set
If BTRFS_FS_NEED_ZONE_FINISH is already set for the whole filesystem, exit
early in btrfs_can_activate_zone(). There's no need to check if
BTRFS_FS_NEED_ZONE_FINISH needs to be set if it is already set.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 29 Jan 2025 03:59:30 +0000 (14:29 +1030)]
btrfs: require strict data/metadata split for subpage checks
Since we have btrfs_meta_is_subpage(), we should make btrfs_is_subpage()
to be data inode specific.
This change involves:
- Simplify btrfs_is_subpage()
Now we only need to do a very simple sectorsize check against
PAGE_SIZE.
And since the function is pretty simple now, just make it an inline
function.
- Add an extra ASSERT() to make sure btrfs_is_subpage() is only called
on data inode mapping
- Migrate btree_csum_one_bio() to use btrfs_meta_folio_*() helpers
- Migrate alloc_extent_buffer() to use btrfs_meta_folio_*() helpers
- Migrate end_bbio_meta_write() to use btrfs_meta_folio_*() helpers
Or we will trigger the ASSERT() due to calling btrfs_folio_*() on
metadata folios.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 29 Jan 2025 03:53:02 +0000 (14:23 +1030)]
btrfs: simplify subpage handling of read_extent_buffer_pages_nowait()
By using a shared bio_add_folio_nofail() with calculated
range_start/range_len, so no more explicit subpage routine needed.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 29 Jan 2025 03:35:26 +0000 (14:05 +1030)]
btrfs: simplify subpage handling of write_one_eb()
Currently inside write_one_eb() we have two different ways of handling
subpage and regular metadata.
The differences are:
- Extra offset/length calculation when adding the folio range to bio for
subpage cases
- Only decrease wbc->nr_to_write if the whole page is no longer dirty
for subpage cases
- Use subpage helper for subpage cases
Merge the tow ways into a shared one:
- Always calculate the to-be-queued range
So that bio_add_folio() can use the same calculated resulted length
and offset for both cases.
- Use btrfs_meta_folio_clear_dirty() and
btrfs_meta_folio_set_writeback() helpers
This will cover both cases.
- Only decrease wbc->nr_to_write if the folio is no longer dirty
Since we have the folio locked, no one else can modify the folio dirty
flags (set_extent_buffer_dirty() will also lock the folio for subpage
cases).
Thus after our btrfs_meta_folio_clear_dirty() call, if the whole folio
is no longer dirty, we're submitting the last dirty eb of the folio,
and can decrease wbc->nr_to_write properly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 29 Jan 2025 03:23:17 +0000 (13:53 +1030)]
btrfs: simplify subpage handling of btrfs_clear_buffer_dirty()
The function btrfs_clear_buffer_dirty() is called on dirty extent buffer
that will not be written back.
The function will call btree_clear_folio_dirty() to clear the folio
dirty flag and also clear PAGECACHE_TAG_DIRTY flag.
And we split the subpage and regular handling, as for subpage cases we
should only clear PAGECACHE_TAG_DIRTY if the last dirty extent buffer in
the page is cleared.
So here we can simplify the function by:
- Use the newly introduced btrfs_meta_folio_clear_and_test_dirty() helper
The helper will return true if we cleared the folio dirty flag.
With that we can use the same helper for both subpage and regular
cases.
- Rename btree_clear_folio_dirty() to btree_clear_folio_dirty_tag()
As we move the folio dirty clearing in the btrfs_clear_buffer_dirty().
- Call btrfs_meta_folio_clear_and_test_dirty() to clear the dirty flags
for both regular and subpage metadata cases
- Only call btree_clear_folio_dirty_tag() when the folio is no longer
dirty
- Update the comment inside set_extent_buffer_dirty()
As there is no separate clear_subpage_extent_buffer_dirty() anymore.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Wed, 29 Jan 2025 02:57:39 +0000 (13:27 +1030)]
btrfs: use metadata specific helpers to simplify extent buffer helpers
The following functions are doing metadata specific checks:
- set_extent_buffer_uptodate()
- clear_extent_buffer_uptodate()
The reason why we do not use btrfs_folio_*() helpers for those helpers
is, btrfs_is_subpage() cannot handle dummy extent buffer if nodesize >=
PAGE_SIZE but block size < PAGE_SIZE.
In that case, we do not need to attach extra bitmaps to the extent
buffer folio. But since dummy extent buffer folios are not attached to
btree inode, btrfs_is_subpage() will return true, causing problems.
And the following are using btrfs_folio_*() helpers for metadata, but
in theory we should use metadata specific checks:
- set_extent_buffer_dirty()
This is not causing problems because a dummy extent buffer should never
be marked dirty.
To make code simpler, introduce btrfs_meta_folio_*() helpers, to do
the metadata specific handling, so that we do not to open-code such
checks in above involved functions.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 28 Jan 2025 05:24:40 +0000 (15:54 +1030)]
btrfs: make subpage attach and detach handle metadata properly
Currently subpage attach/detach is not doing proper dummy extent buffer
subpage check, as btrfs_is_subpage() is not reliable for dummy extent
buffer folios.
Since we have a metadata specific check now, use that for
btrfs_attach_subpage() first.
Then enhance btrfs_detach_subpage() to accept a type parameter, so that
we can do extra checks for dummy extent buffers properly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 28 Jan 2025 04:56:42 +0000 (15:26 +1030)]
btrfs: factor out metadata subpage detection into a dedicated helper
Currently we have only one btrfs_is_subpage() to cover both data and
metadata.
But there is a special case for metadata:
- dummy extent buffer, sector size < PAGE_SIZE and node size >= PAGE_SIZE
In such case, btrfs_is_subpage() will return true for extent buffer
folio.
But that is not correct, and that's exactly why we have some open-coded
checks for functions like set_extent_buffer_uptodate() and
clear_extent_buffer_uptodate().
Just extract the metadata specific checks into a helper, and replace
those call sites.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 24 Jan 2025 05:17:22 +0000 (15:47 +1030)]
btrfs: remove btrfs_fs_info::sectors_per_page
For the future large folio support, our filemap can have folios with
different sizes, thus we can no longer rely on a fixed blocks_per_page
value.
To prepare for that future, here we do:
- Remove btrfs_fs_info::sectors_per_page
- Introduce a helper, btrfs_blocks_per_folio()
Which uses the folio size to calculate the number of blocks for each
folio.
- Migrate the existing btrfs_fs_info::sectors_per_page to use that
helper
There are some exceptions:
* Metadata nodesize < page size support
In the future, even if we support large folios, we will only
allocate a folio that matches our nodesize.
Thus we won't have a folio covering multiple metadata unless
nodesize < page size.
* Existing subpage bitmap dump
We use a single unsigned long to store the bitmap.
That means until we change the bitmap dumping code, our upper limit
for folio size will only be 256K (4K block size, 64 bit unsigned
long).
* btrfs_is_subpage() check
This will be migrated into a future patch.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Daniel Vacek [Thu, 30 Jan 2025 17:58:19 +0000 (18:58 +0100)]
btrfs: zstd: enable negative compression levels mount option
Allow using the fast modes (negative compression levels) of zstd as a
mount option.
As per the results, the compression ratio is (expectedly) lower:
for level in {-15..-1} 1 2 3; \
do printf "level %3d\n" $level; \
mount -o compress=zstd:$level /dev/sdb /mnt/test/; \
grep sdb /proc/mounts; \
cp -r /usr/bin /mnt/test/; sync; compsize /mnt/test/bin; \
cp -r /usr/share/doc /mnt/test/; sync; compsize /mnt/test/doc; \
cp enwik9 /mnt/test/; sync; compsize /mnt/test/enwik9; \
cp linux-6.13.tar /mnt/test/; sync; compsize /mnt/test/linux-6.13.tar; \
rm -r /mnt/test/{bin,doc,enwik9,linux-6.13.tar}; \
umount /mnt/test/; \
done |& tee results | \
awk '/^level/{print}/^TOTAL/{print$3"\t"$2" |"}' | paste - - - - -
266M bin | 45M doc | 953M wiki | 1.4G source
=============================+===============+===============+===============+
level -15 180M 67% | 30M 68% | 694M 72% | 598M 40% |
level -14 180M 67% | 30M 67% | 683M 71% | 581M 39% |
level -13 177M 66% | 29M 66% | 671M 70% | 566M 38% |
level -12 174M 65% | 29M 65% | 658M 69% | 548M 37% |
level -11 174M 65% | 28M 64% | 645M 67% | 530M 35% |
level -10 171M 64% | 28M 62% | 631M 66% | 512M 34% |
level -9 165M 62% | 27M 61% | 615M 64% | 493M 33% |
level -8 161M 60% | 27M 59% | 598M 62% | 475M 32% |
level -7 155M 58% | 26M 58% | 582M 61% | 457M 30% |
level -6 151M 56% | 25M 56% | 565M 59% | 437M 29% |
level -5 145M 54% | 24M 55% | 545M 57% | 417M 28% |
level -4 139M 52% | 23M 52% | 520M 54% | 391M 26% |
level -3 135M 50% | 22M 50% | 495M 51% | 369M 24% |
level -2 127M 47% | 22M 48% | 470M 49% | 349M 23% |
level -1 120M 45% | 21M 47% | 452M 47% | 332M 22% |
level 1 110M 41% | 17M 39% | 362M 38% | 290M 19% |
level 2 106M 40% | 17M 38% | 349M 36% | 288M 19% |
level 3 104M 39% | 16M 37% | 340M 35% | 276M 18% |
The samples represent some data sets that can be commonly found and show
approximate compressibility. The fast levels trade off speed for ratio
and are best suitable for highly compressible data.
As can be seen above, comparing the results to the current default zstd
level 3, the negative levels are roughly 2x worse at -15 and the
ratio increases almost linearly with each level.
Signed-off-by: Daniel Vacek <neelx@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 13 Jan 2025 03:39:24 +0000 (14:09 +1030)]
btrfs: move ordered extent cleanup to where they are allocated
The ordered extent cleanup is hard to grasp because it doesn't follow
the common cleanup-asap pattern.
E.g. run_delalloc_nocow() and cow_file_range() allocate one or more
ordered extent, but if any error is hit, the cleanup is done later inside
btrfs_run_delalloc_range().
To change the existing delayed cleanup:
- Update the comment on error handling of run_delalloc_nocow()
There are in fact 3 different cases other than 2 if we are doing
ordered extents cleanup inside run_delalloc_nocow():
1) @cow_start and @cow_end not set
No fallback to COW at all.
Before @cur_offset we need to cleanup the OE and page dirty.
After @cur_offset just clear all involved page and extent flags.
2) @cow_start set but @cow_end not set.
This means we failed before even calling fallback_to_cow().
It's just a variant of case 1), where it's @cow_start splitting
the two parts (and we should just ignore @cur_offset since it's
advanced without any new ordered extent).
3) @cow_start and @cow_end both set
This means fallback_to_cow() failed, meaning [start, cow_start)
needs the regular OE and dirty folio cleanup, and skip range
[cow_start, cow_end) as cow_file_range() has done the cleanup,
and eventually cleanup [cow_end, end) range.
- Only reset @cow_start after fallback_to_cow() succeeded
As above case 2) and 3) are both relying on @cow_start to determine
the cleanup range.
- Move btrfs_cleanup_ordered_extents() into run_delalloc_nocow(),
cow_file_range() and nocow_one_range()
For cow_file_range() it's pretty straightforward and easy.
For run_delalloc_nocow() refer to the above 3 different error cases.
For nocow_one_range() if we hit an error, we need to cleanup the
ordered extents by ourselves.
And then it will fallback to case 1), since @cur_offset is not yet
advanced, the existing cleanup will co-operate with nocow_one_range()
well.
- Remove the btrfs_cleanup_ordered_extents() inside submit_uncompressed_range()
As failed cow_file_range() will do all the proper cleanup now.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 13 Jan 2025 03:23:41 +0000 (13:53 +1030)]
btrfs: factor out nocow ordered extent and extent map generation into a helper
Currently we're doing all the ordered extent and extent map generation
inside a while() loop of run_delalloc_nocow(). This makes it pretty
hard to read, nor doing proper error handling.
So move that part of code into a helper, nocow_one_range().
This should not change anything, but there is a tiny timing change where
btrfs_dec_nocow_writers() is only called after nocow_one_range() helper
exits.
This timing change is small, and makes error handling easier, thus
should be fine.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 27 Jan 2025 23:18:18 +0000 (09:48 +1030)]
btrfs: expose per-inode stable writes flag
The address space flag AS_STABLE_WRITES determine if FGP_STABLE for will
wait for the folio to finish its writeback.
For btrfs, due to the default data checksum behavior, if we modify the
folio while it's still under writeback, it will cause data checksum
mismatch. Thus for quite some call sites we manually call
folio_wait_writeback() to prevent such problem from happening.
Currently there is only one call site inside btrfs really utilizing
FGP_STABLE, and in that case we also manually call folio_wait_writeback()
to do the waiting.
But it's better to properly expose the stable writes flag to a per-inode
basis, to allow call sites to fully benefit from FGP_STABLE flag.
E.g. for inodes with NODATASUM allowing beginning dirtying the page
without waiting for writeback.
This involves:
- Update the mapping's stable write flag when setting/clearing NODATASUM
inode flag using ioctl
This only works for empty files, so it should be fine.
- Update the mapping's stable write flag when reading an inode from disk
- Remove the explicit folio_wait_writeback() for FGP_BEGINWRITE call
site
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 24 Jan 2025 06:59:58 +0000 (17:29 +1030)]
btrfs: zlib: refactor S390x HW acceleration buffer preparation
Currently for s390x HW zlib compression, to get the best performance we
need a buffer size which is larger than a page.
This means we need to copy multiple pages into workspace->buf, then use
that buffer as zlib compression input.
Currently it's hardcoded using page sized folio, and all the handling
are deep inside a loop.
Refactor the code by:
- Introduce a dedicated helper to do the buffer copy
The new helper will be called copy_data_into_buffer().
- Add extra ASSERT()s
* Make sure we only go into the function for hardware acceleration
* Make sure we still get page sized folio
- Prepare for future large folios
This means we will rely on the folio size, other than PAGE_SIZE to do
the copy.
- Handle the folio mapping and unmapping inside the helper function
For S390x hardware acceleration case, it never utilize the @data_in
pointer, thus we can do folio mapping/unmapping all inside the function.
Acked-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 5 Feb 2025 11:30:34 +0000 (11:30 +0000)]
btrfs: avoid assigning twice to block_start at btrfs_do_readpage()
At btrfs_do_readpage() if we get an extent map for a prealloc extent we
end up assigning twice to the 'block_start' variable, first the value
returned by extent_map_block_start() and then EXTENT_MAP_HOLE. This is
pointless so make it more clear by using an if-else statement and doing
only one assignment. Also, while at it, move the declaration of
'block_start' into the while loop's scope, since it's not used outside of
it and the related 'disk_bytenr' is also declared in this scope.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Tue, 4 Feb 2025 03:16:09 +0000 (13:46 +1030)]
btrfs: always fallback to buffered write if the inode requires checksum
[BUG]
It is a long known bug that VM image on btrfs can lead to data csum
mismatch, if the qemu is using direct-io for the image (this is commonly
known as cache mode 'none').
[CAUSE]
Inside the VM, if the fs is EXT4 or XFS, or even NTFS from Windows, the
fs is allowed to dirty/modify the folio even if the folio is under
writeback (as long as the address space doesn't have AS_STABLE_WRITES
flag inherited from the block device).
This is a valid optimization to improve the concurrency, and since these
filesystems have no extra checksum on data, the content change is not a
problem at all.
But the final write into the image file is handled by btrfs, which needs
the content not to be modified during writeback, or the checksum will
not match the data (checksum is calculated before submitting the bio).
So EXT4/XFS/NTRFS assume they can modify the folio under writeback, but
btrfs requires no modification, this leads to the false csum mismatch.
This is only a controlled example, there are even cases where
multi-thread programs can submit a direct IO write, then another thread
modifies the direct IO buffer for whatever reason.
For such cases, btrfs has no sane way to detect such cases and leads to
false data csum mismatch.
[FIX]
I have considered the following ideas to solve the problem:
- Make direct IO to always skip data checksum
This not only requires a new incompatible flag, as it breaks the
current per-inode NODATASUM flag.
But also requires extra handling for no csum found cases.
And this also reduces our checksum protection.
- Let hardware handle all the checksum
AKA, just nodatasum mount option.
That requires trust for hardware (which is not that trustful in a lot
of cases), and it's not generic at all.
- Always fallback to buffered write if the inode requires checksum
This was suggested by Christoph, and is the solution utilized by this
patch.
The cost is obvious, the extra buffer copying into page cache, thus it
reduces the performance.
But at least it's still user configurable, if the end user still wants
the zero-copy performance, just set NODATASUM flag for the inode
(which is a common practice for VM images on btrfs).
Since we cannot trust user space programs to keep the buffer
consistent during direct IO, we have no choice but always falling back
to buffered IO. At least by this, we avoid the more deadly false data
checksum mismatch error.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Mon, 27 Jan 2025 02:30:49 +0000 (13:00 +1030)]
btrfs: remove duplicated metadata folio flag update in end_bbio_meta_read()
In that function we call set_extent_buffer_uptodate() or
clear_extent_buffer_uptodate(), which will already update the uptodate
flag for all the involved extent buffer folios.
Thus there is no need to update the folio uptodate flags again.
Just remove the open-coded part.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Matthew Wilcox (Oracle) [Tue, 21 Jan 2025 05:40:52 +0000 (05:40 +0000)]
btrfs: convert io_ctl_prepare_pages() to work on folios
Retrieve folios instead of pages and work on them throughout. Removes
a few calls to compound_head() and a reference to page->mapping.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Sterba <dsterba@suse.com>
Matthew Wilcox (Oracle) [Tue, 21 Jan 2025 05:40:50 +0000 (05:40 +0000)]
btrfs: update some folio related comments
Remove references to the page lock and page->mapping. Also btrfs folios
can never be swizzled into swap (mentioned in extent_write_cache_pages()).
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Daniel Vacek [Wed, 15 Jan 2025 15:24:58 +0000 (16:24 +0100)]
btrfs: keep private struct on stack for sync reads in btrfs_encoded_read_regular_fill_pages()
Only allocate the btrfs_encoded_read_private structure for asynchronous
(io_uring) mode.
There's no need to allocate an object from slab in the synchronous mode. In
such a case stack can be happily used as it used to be before
68d3b27e05c7
("btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()")
which was a preparation for the async mode.
While at it, fix the comment to reflect the atomic => refcount change in
d29662695ed7 ("btrfs: fix use-after-free waiting for encoded read endios").
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Daniel Vacek <neelx@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Sun, 16 Mar 2025 22:55:17 +0000 (12:55 -1000)]
Linux 6.14-rc7
Linus Torvalds [Sun, 16 Mar 2025 19:18:46 +0000 (09:18 -1000)]
Merge tag 'media/v6.14-3' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"rtl2832 driver regression fix"
* tag 'media/v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: rtl2832_sdr: assign vb2 lock before vb2_queue_init
Linus Torvalds [Sun, 16 Mar 2025 19:09:44 +0000 (09:09 -1000)]
Merge tag 'i2c-for-6.14-rc7' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- omap: fix irq ACKS to avoid irq storming and system hang
- ali1535, ali15x3, sis630: fix error path at probe exit
* tag 'i2c-for-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: sis630: Fix an error handling path in sis630_probe()
i2c: ali15x3: Fix an error handling path in ali15x3_probe()
i2c: ali1535: Fix an error handling path in ali1535_probe()
i2c: omap: fix IRQ storms
Linus Torvalds [Sun, 16 Mar 2025 19:05:00 +0000 (09:05 -1000)]
Merge tag 'trace-v6.14-rc5' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix ref count of trace_array in error path of histogram file open
Tracing instances have a ref count to keep them around while files
within their directories are open. This prevents them from being
deleted while they are used.
The histogram code had some files that needed to take the ref count
and that was added, but the error paths did not decrement the ref
counts. This caused the instances from ever being removed if a
histogram file failed to open due to some error"
* tag 'trace-v6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Correct the refcount if the hist/hist_debug file fails to open
Linus Torvalds [Sun, 16 Mar 2025 06:39:55 +0000 (20:39 -1000)]
Merge tag 'usb-6.14-rc7' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes and new
usb-serial device ids. Included in here are:
- new usb-serial device ids
- typec driver bugfix
- thunderbolt driver resume bugfix
All of these have been in linux-next with no reported issues"
* tag 'usb-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tcpm: fix state transition for SNK_WAIT_CAPABILITIES state in run_state_machine()
USB: serial: ftdi_sio: add support for Altera USB Blaster 3
thunderbolt: Prevent use-after-free in resume from hibernate
USB: serial: option: fix Telit Cinterion FE990A name
USB: serial: option: add Telit Cinterion FE990B compositions
USB: serial: option: match on interface class for Telit FN990B
Linus Torvalds [Sun, 16 Mar 2025 01:46:29 +0000 (15:46 -1000)]
Merge tag 'input-for-v6.14-rc6' of git://git./linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- several new device IDs added to xpad game controller driver
- support for imagis IST3038H variant of chip added to imagis touch
controller driver
- a fix for GPIO allocation for ads7846 touch controller driver
- a fix for iqs7222 driver to properly support status register
- a fix for goodix-berlin touch controller driver to use the right name
for the regulator
- more i8042 quirks to better handle several old Clevo devices.
* tag 'input-for-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
MAINTAINERS: Remove myself from the goodix touchscreen maintainers
Input: iqs7222 - preserve system status register
Input: i8042 - swap old quirk combination with new quirk for more devices
Input: i8042 - swap old quirk combination with new quirk for several devices
Input: i8042 - add required quirks for missing old boardnames
Input: i8042 - swap old quirk combination with new quirk for NHxxRZQ
Input: xpad - rename QH controller to Legion Go S
Input: xpad - add support for TECNO Pocket Go
Input: xpad - add support for ZOTAC Gaming Zone
Input: goodix-berlin - fix vddio regulator references
Input: goodix-berlin - fix comment referencing wrong regulator
Input: imagis - add support for imagis IST3038H
dt-bindings: input/touchscreen: imagis: add compatible for ist3038h
Input: xpad - add multiple supported devices
Input: xpad - add 8BitDo SN30 Pro, Hyperkin X91 and Gamesir G7 SE controllers
Input: ads7846 - fix gpiod allocation
Input: wdt87xx_i2c - fix compiler warning
Linus Torvalds [Sun, 16 Mar 2025 01:40:42 +0000 (15:40 -1000)]
Merge tag 'rust-fixes-6.14-3' of git://git./linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Disallow BTF generation with Rust + LTO
- Improve rust-analyzer support
'kernel' crate:
- 'init' module: remove 'Zeroable' implementation for a couple types
that should not have it
- 'alloc' module: fix macOS failure in host test by satisfying POSIX
alignment requirement
- Add missing '\n's to 'pr_*!()' calls
And a couple other minor cleanups"
* tag 'rust-fixes-6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
scripts: generate_rust_analyzer: add uapi crate
scripts: generate_rust_analyzer: add missing include_dirs
scripts: generate_rust_analyzer: add missing macros deps
rust: Disallow BTF generation with Rust + LTO
rust: task: fix `SAFETY` comment in `Task::wake_up`
rust: workqueue: add missing newline to pr_info! examples
rust: sync: add missing newline in locked_by log example
rust: init: add missing newline to pr_info! calls
rust: error: add missing newline to pr_warn! calls
rust: docs: add missing newline to printing macro examples
rust: alloc: satisfy POSIX alignment requirement
rust: init: fix `Zeroable` implementation for `Option<NonNull<T>>` and `Option<KBox<T>>`
rust: remove leftover mentions of the `alloc` crate
Linus Torvalds [Sat, 15 Mar 2025 18:32:16 +0000 (08:32 -1000)]
Merge tag 'fsnotify_for_v6.14-rc7' of git://git./linux/kernel/git/jack/linux-fs
Pull fsnotify reverts from Jan Kara:
"Syzbot has found out that fsnotify HSM events generated on page fault
can be generated while we already hold freeze protection for the
filesystem (when you do buffered write from a buffer which is mmapped
file on the same filesystem) which violates expectations for HSM
events and could lead to deadlocks of HSM clients with filesystem
freezing.
Since it's quite late in the cycle we've decided to revert changes
implementing HSM events on page fault for now and instead just
generate one event for the whole range on mmap(2) so that HSM client
can fetch the data at that moment"
* tag 'fsnotify_for_v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
Revert "fanotify: disable readahead if we have pre-content watches"
Revert "mm: don't allow huge faults for files with pre content watches"
Revert "fsnotify: generate pre-content permission event on page fault"
Revert "xfs: add pre-content fsnotify hook for DAX faults"
Revert "ext4: add pre-content fsnotify hook for DAX faults"
fsnotify: add pre-content hooks on mmap()
Wolfram Sang [Sat, 15 Mar 2025 08:28:41 +0000 (09:28 +0100)]
Merge tag 'i2c-host-fixes-6.14-rc7' of git://git./linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.14-rc7
- omap: fixed irq ACKS to avoid irq storming and system hang.
- ali1535, ali15x3, sis630: fixed error path at probe exit.
Linus Torvalds [Sat, 15 Mar 2025 04:43:37 +0000 (18:43 -1000)]
Merge tag 'v6.14-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Two fixes for oplock break/lease races
* tag 'v6.14-rc6-smb3-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: prevent connection release during oplock break notification
ksmbd: fix use-after-free in ksmbd_free_work_struct
Linus Torvalds [Sat, 15 Mar 2025 00:24:05 +0000 (14:24 -1000)]
Merge tag 'v6.14-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Six smb3 client fixes, all also for stable"
* tag 'v6.14-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Fix match_session bug preventing session reuse
cifs: Fix integer overflow while processing closetimeo mount option
cifs: Fix integer overflow while processing actimeo mount option
cifs: Fix integer overflow while processing acdirmax mount option
cifs: Fix integer overflow while processing acregmax mount option
smb: client: fix regression with guest option
Linus Torvalds [Sat, 15 Mar 2025 00:17:37 +0000 (14:17 -1000)]
Merge tag 'bcachefs-2025-03-14.2' of git://evilpiepirate.org/bcachefs
Pull another bcachefs hotfix from Kent Overstreet:
- fix 32 bit build breakage
* tag 'bcachefs-2025-03-14.2' of git://evilpiepirate.org/bcachefs:
bcachefs: fix build on 32 bit in get_random_u64_below()
Kent Overstreet [Fri, 14 Mar 2025 22:20:20 +0000 (18:20 -0400)]
bcachefs: fix build on 32 bit in get_random_u64_below()
bare 64 bit divides not allowed, whoops
arm-linux-gnueabi-ld: drivers/char/random.o: in function `__get_random_u64_below':
drivers/char/random.c:602:(.text+0xc70): undefined reference to `__aeabi_uldivmod'
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Linus Torvalds [Fri, 14 Mar 2025 23:21:31 +0000 (13:21 -1000)]
Merge tag 'xfs-fixes-6.14-rc7' of git://git./fs/xfs/xfs-linux
Pull xfs cleanup from Carlos Maiolino:
"Use abs_diff instead of XFS_ABSDIFF"
* tag 'xfs-fixes-6.14-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Use abs_diff instead of XFS_ABSDIFF
Linus Torvalds [Fri, 14 Mar 2025 22:14:32 +0000 (12:14 -1000)]
Merge tag 'bcachefs-2025-03-14' of git://evilpiepirate.org/bcachefs
Pull bcachefs hotfix from Kent Overstreet:
"This one is high priority: a user hit an assertion in the upgrade to
6.14, and we don't have a reproducer, so this changes the assertion to
an emergency read-only with more info so we can debug it"
* tag 'bcachefs-2025-03-14' of git://evilpiepirate.org/bcachefs:
bcachefs: Change btree wb assert to runtime error
Linus Torvalds [Fri, 14 Mar 2025 21:31:57 +0000 (11:31 -1000)]
Merge tag 'for-6.14/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fix from Mikulas Patocka:
- dm-flakey: fix memory corruption in optional corrupt_bio_byte feature
* tag 'for-6.14/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-flakey: Fix memory corruption in optional corrupt_bio_byte feature
Linus Torvalds [Fri, 14 Mar 2025 21:22:05 +0000 (11:22 -1000)]
Merge tag 'block-6.14-
20250313' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Concurrent pci error and hotplug handling fix (Keith)
- Endpoint function fixes (Damien)
- Fix for a regression introduced in this cycle with error checking for
batched request completions (Shin'ichiro)
* tag 'block-6.14-
20250313' of git://git.kernel.dk/linux:
block: change blk_mq_add_to_batch() third argument type to bool
nvme: move error logging from nvme_end_req() to __nvme_end_req()
nvmet: pci-epf: Do not add an IRQ vector if not needed
nvmet: pci-epf: Set NVMET_PCI_EPF_Q_LIVE when a queue is fully created
nvme-pci: fix stuck reset on concurrent DPC and HP
Linus Torvalds [Fri, 14 Mar 2025 20:57:28 +0000 (10:57 -1000)]
Merge tag 'platform-drivers-x86-v6.14-5' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
"Fixes and new HW support.
The diff is a bit larger than I'd prefer at this point due to
unwinding the amd/pmf driver's error handling properly instead of
calling a deinit function that was a can full of worms.
Summary:
- amd/pmf:
- Fix error handling in amd_pmf_init_smart_pc()
- Fix missing hidden options for Smart PC
- surface: aggregator_registry: Add Support for Surface Pro 11"
* tag 'platform-drivers-x86-v6.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
MAINTAINERS: Update Ike Panhc's email address
platform/x86/amd: pmf: Fix missing hidden options for Smart PC
platform/surface: aggregator_registry: Add Support for Surface Pro 11
platform/x86/amd/pmf: fix cleanup in amd_pmf_init_smart_pc()
Linus Torvalds [Fri, 14 Mar 2025 20:39:41 +0000 (10:39 -1000)]
Merge tag 'gpio-fixes-for-v6.14-rc7' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"The first fix is a backport from my v6.15-rc1 queue that turned out to
be needed in v6.14 as well but as the former diverged from my fixes
branch I had to adjust the patch a bit.
The second one fixes a regression observed in user-space where closing
a file descriptor associated with a GPIO device results in a ~10ms
delay due to the atomic notifier calling rcu_synchronize() when
unregistering.
Summary:
- don't check the return value of gpio_chip::get_direction() when
registering a GPIO chip
- use raw notifier for line state events"
* tag 'gpio-fixes-for-v6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: cdev: use raw notifier for line state events
gpiolib: don't check the retval of get_direction() when registering a chip
Linus Torvalds [Fri, 14 Mar 2025 20:35:39 +0000 (10:35 -1000)]
Merge tag 'sound-6.14-rc7' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of last-minute fixes.
Most of them are for ASoC, and the only one core fix is for reverting
the previous change, while the rest are all device-specific quirks and
fixes, which should be relatively safe to apply"
* tag 'sound-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: cs42l43: convert to SYSTEM_SLEEP_PM_OPS
ALSA: hda/realtek: Add mute LED quirk for HP Pavilion x360 14-dy1xxx
ASoC: codecs: wm0010: Fix error handling path in wm0010_spi_probe()
ASoC: rt722-sdca: add missing readable registers
ASoC: amd: yc: Support mic on another Lenovo ThinkPad E16 Gen 2 model
ASoC: cs42l43: Fix maximum ADC Volume
ASoC: ops: Consistently treat platform_max as control value
ASoC: rt1320: set wake_capable = 0 explicitly
ASoC: cs42l43: Add jack delay debounce after suspend
ASoC: tegra: Fix ADX S24_LE audio format
ASoC: codecs: wsa884x: report temps to hwmon in millidegree of Celsius
ASoC: Intel: sof_sdw: Fix unlikely uninitialized variable use in create_sdw_dailinks()
Linus Torvalds [Fri, 14 Mar 2025 20:24:57 +0000 (10:24 -1000)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main one is a horrible macro fix for our TLB flushing code which
resulted in over-invalidation on the MMU notifier path.
Summary:
- Fix population of the vmemmap for regions of memory that are
smaller than a section (128 MiB)
- Fix range-based TLB over-invalidation when invoked via a MMU
notifier"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Fix mmu notifiers for range-based invalidates
arm64: mm: Populate vmemmap at the page level if not section aligned
Linus Torvalds [Fri, 14 Mar 2025 20:07:16 +0000 (10:07 -1000)]
Merge tag 'x86-urgent-2025-03-14' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Fix the bootup of SEV-SNP enabled guests under VMware hypervisors"
* tag 'x86-urgent-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vmware: Parse MP tables for SEV-SNP enabled guests under VMware hypervisors
Linus Torvalds [Fri, 14 Mar 2025 19:56:46 +0000 (09:56 -1000)]
Merge tag 'sched-urgent-2025-03-14' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a sleeping-while-atomic bug caused by a recent optimization
utilizing static keys that didn't consider that the
static_key_disable() call could be triggered in atomic context.
Revert the optimization"
* tag 'sched-urgent-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/clock: Don't define sched_clock_irqtime as static key
Linus Torvalds [Fri, 14 Mar 2025 19:41:36 +0000 (09:41 -1000)]
Merge tag 'locking-urgent-2025-03-14' of git://git./linux/kernel/git/tip/tip
Pull misc locking fixes from Ingo Molnar:
- Restrict the Rust runtime from unintended access to dynamically
allocated LockClassKeys
- KernelDoc annotation fix
- Fix a lock ordering bug in semaphore::up(), related to trying to
printk() and wake up the console within critical sections
* tag 'locking-urgent-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/semaphore: Use wake_q to wake up processes outside lock critical section
locking/rtmutex: Use the 'struct' keyword in kernel-doc comment
rust: lockdep: Remove support for dynamically allocated LockClassKeys
Linus Torvalds [Fri, 14 Mar 2025 19:12:28 +0000 (09:12 -1000)]
Merge tag 'core-urgent-2025-03-14' of git://git./linux/kernel/git/tip/tip
Pull core fix from Ingo Molnar:
"Fix a Sparse false positive warning triggered by no_free_ptr()"
* tag 'core-urgent-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
<linux/cleanup.h>: Allow the passing of both iomem and non-iomem pointers to no_free_ptr()
Kent Overstreet [Fri, 14 Mar 2025 13:54:43 +0000 (09:54 -0400)]
bcachefs: Change btree wb assert to runtime error
We just had a report of the assert for "btree in write buffer for
non-write buffer btree" popping during the 6.14 upgrade.
- 150TB filesystem, after a reboot the upgrade was able to continue from
where it left off, so no major damage.
But with 6.14 about to come out we want to get this tracked down asap,
and need more data if other users hit this.
Convert the BUG_ON() to an emergency read-only, and print out btree, the
key itself, and stack trace from the original write buffer update (which
did not have this check before).
Reported-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Ike Panhc [Fri, 14 Mar 2025 04:57:32 +0000 (12:57 +0800)]
MAINTAINERS: Update Ike Panhc's email address
I am no longer at Canonical and update with my personal email address.
Signed-off-by: Ike Panhc <ike.pan@canonical.com>
Link: https://lore.kernel.org/r/20250314045732.389973-1-ike.pan@canonical.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Matthew Wilcox (Oracle) [Mon, 3 Mar 2025 18:02:32 +0000 (18:02 +0000)]
xfs: Use abs_diff instead of XFS_ABSDIFF
We have a central definition for this function since 2023, used by
a number of different parts of the kernel.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Tengda Wu [Fri, 14 Mar 2025 06:53:35 +0000 (06:53 +0000)]
tracing: Correct the refcount if the hist/hist_debug file fails to open
The function event_{hist,hist_debug}_open() maintains the refcount of
'file->tr' and 'file' through tracing_open_file_tr(). However, it does
not roll back these counts on subsequent failure paths, resulting in a
refcount leak.
A very obvious case is that if the hist/hist_debug file belongs to a
specific instance, the refcount leak will prevent the deletion of that
instance, as it relies on the condition 'tr->ref == 1' within
__remove_instance().
Fix this by calling tracing_release_file_tr() on all failure paths in
event_{hist,hist_debug}_open() to correct the refcount.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Link: https://lore.kernel.org/20250314065335.1202817-1-wutengda@huaweicloud.com
Fixes:
1cc111b9cddc ("tracing: Fix uaf issue when open the hist or hist_debug file")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Linus Torvalds [Fri, 14 Mar 2025 08:52:52 +0000 (22:52 -1000)]
Merge tag 'leds-fixes-6.14' of git://git./linux/kernel/git/lee/leds
Pull LED fix from Lee Jones:
- Fix NULL pointer in STMicroelectronics LED1202 LED support
* tag 'leds-fixes-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds:
leds: leds-st1202: Fix NULL pointer access on race condition
Linus Torvalds [Fri, 14 Mar 2025 08:45:25 +0000 (22:45 -1000)]
Merge tag 'drm-fixes-2025-03-14' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Regular weekly fixes pull, the usual leaders in amdgpu/xe, a couple of
i915, and some scattered misc fixes.
panic:
- two clippy fixes
dp_mst
- locking fix
atomic:
- fix redundant DPMS calls
i915:
- Do cdclk post plane programming later
- Bump MMAP_GTT_VERSION: missing indication of partial mmaps support
xe:
- Release guc ids before cancelling work
- Fix new warnings around userptr
- Temporaritly disable D3Cold on BMG
- Retry and wait longer for GuC PC to start
- Remove redundant check in xe_vm_create_ioctl
amdgpu:
- GC 12.x DCC fix
- DC DCE 6.x fix
- Hibernation fix
- HPD fix
- Backlight fixes
- Color depth fix
- UAF fix in hdcp_work
- VCE 2.x fix
- GC 12.x PTE fix
amdkfd:
- Queue eviction fix
gma500:
- fix NULL pointer check"
* tag 'drm-fixes-2025-03-14' of https://gitlab.freedesktop.org/drm/kernel: (23 commits)
drm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flags
drm/amd/amdkfd: Evict all queues even HWS remove queue failed
drm/i915: Increase I915_PARAM_MMAP_GTT_VERSION version to indicate support for partial mmaps
drm/dp_mst: Fix locking when skipping CSN before topology probing
drm/amdgpu/vce2: fix ip block reference
drm/amd/display: Fix slab-use-after-free on hdcp_work
drm/amd/display: Assign normalized_pix_clk when color depth = 14
drm/amd/display: Restore correct backlight brightness after a GPU reset
drm/amd/display: fix default brightness
drm/amd/display: Disable unneeded hpd interrupts during dm_init
drm/amd: Keep display off while going into S4
drm/amd/display: fix missing .is_two_pixels_per_container
drm/amdgpu/display: Allow DCC for video formats on GFX12
drm/xe: remove redundant check in xe_vm_create_ioctl()
drm/atomic: Filter out redundant DPMS calls
drm/xe/guc_pc: Retry and wait longer for GuC PC start
drm/xe/pm: Temporarily disable D3Cold on BMG
drm/i915/cdclk: Do cdclk post plane programming later
drm/xe/userptr: Fix an incorrect assert
drm/xe: Release guc ids before cancelling work
...
Amit Sunil Dhamne [Tue, 11 Mar 2025 02:19:07 +0000 (19:19 -0700)]
usb: typec: tcpm: fix state transition for SNK_WAIT_CAPABILITIES state in run_state_machine()
A subtle error got introduced while manually fixing merge conflict in
tcpm.c for commit
85c4efbe6088 ("Merge v6.12-rc6 into usb-next"). As a
result of this error, the next state is unconditionally set to
SNK_WAIT_CAPABILITIES_TIMEOUT while handling SNK_WAIT_CAPABILITIES state
in run_state_machine(...).
Fix this by setting new state of TCPM state machine to `upcoming_state`
(that is set to different values based on conditions).
Cc: stable@vger.kernel.org
Fixes:
85c4efbe60888 ("Merge v6.12-rc6 into usb-next")
Signed-off-by: Amit Sunil Dhamne <amitsd@google.com>
Reviewed-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20250310-fix-snk-wait-timeout-v6-14-rc6-v1-1-5db14475798f@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Fri, 14 Mar 2025 07:43:39 +0000 (08:43 +0100)]
Merge tag 'usb-serial-6.14-rc7' of ssh://gitolite./linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial device ids for 6.14-rc7
Here are some new modem device ids and a couple of related fixes, and
support for Altera USB Blaster 3.
All have been in linux-next with no reported issues.
* tag 'usb-serial-6.14-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: ftdi_sio: add support for Altera USB Blaster 3
USB: serial: option: fix Telit Cinterion FE990A name
USB: serial: option: add Telit Cinterion FE990B compositions
USB: serial: option: match on interface class for Telit FN990B
Dave Airlie [Fri, 14 Mar 2025 03:42:13 +0000 (13:42 +1000)]
Merge tag 'drm-xe-fixes-2025-03-13' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
- Release guc ids before cancelling work (Tejas)
- Fix new warnings around userptr (Thomas)
- Temporaritly disable D3Cold on BMG (Rodrigo)
- Retry and wait longer for GuC PC to start (Rodrigo)
- Remove redundant check in xe_vm_create_ioctl (Xin)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z9MJWeIlZPuvXZ_G@intel.com
Dave Airlie [Fri, 14 Mar 2025 02:30:43 +0000 (12:30 +1000)]
Merge tag 'drm-intel-fixes-2025-03-13' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Do cdclk post plane programming later (Ville)
- Bump MMAP_GTT_VERSION: missing indication of partial mmaps support (Jose)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Z9MG4fH-6Q8dTHE1@intel.com
Linus Torvalds [Fri, 14 Mar 2025 01:34:26 +0000 (15:34 -1000)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A few clk driver fixes for Samsung and Qualcomm clk drivers:
- Suspend on Google GS101 crashes when trying to save some clk
registers that we shouldn't be saving so we don't do that anymore
- The PLL lock time was wrong on the Tesla FSD which could lead to
the PLL never locking
- Qualcomm's display clk controller on SM8750 was trying to change
the frequency of a parent clk for the DSI device when it should
have stopped and adjusted the divider. The failure is that the clk
frequency was half what was expected, leading to broken display"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: samsung: update PLL locktime for PLL142XX used on FSD platform
clk: samsung: gs101: fix synchronous external abort in samsung_clk_save()
clk: qcom: dispcc-sm8750: Drop incorrect CLK_SET_RATE_PARENT on byte intf parent
Linus Torvalds [Fri, 14 Mar 2025 01:10:59 +0000 (15:10 -1000)]
Merge tag 'bcachefs-2025-03-13' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Roxana caught an unitialized value that might explain some of the
rebalance weirdness we're still tracking down - cool.
Otherwise pretty minor"
* tag 'bcachefs-2025-03-13' of git://evilpiepirate.org/bcachefs:
bcachefs: bch2_get_random_u64_below()
bcachefs: target_congested -> get_random_u32_below()
bcachefs: fix tiny leak in bch2_dev_add()
bcachefs: Make sure trans is unlocked when submitting read IO
bcachefs: Initialize from_inode members for bch_io_opts
bcachefs: Fix b->written overflow
Dave Airlie [Fri, 14 Mar 2025 01:09:31 +0000 (11:09 +1000)]
Merge tag 'drm-misc-fixes-2025-03-13' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A null pointer check for gma500, two clippy fixes for panic, a fix for
an interaction between DPMS and atomic leading to dropped frames, and
a locking fix for dp_mst
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250313-holistic-clay-moose-fead28@houat
Dave Airlie [Thu, 13 Mar 2025 23:17:28 +0000 (09:17 +1000)]
Merge tag 'amd-drm-fixes-6.14-2025-03-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.14-2025-03-12:
amdgpu:
- GC 12.x DCC fix
- DC DCE 6.x fix
- Hibernation fix
- HPD fix
- Backlight fixes
- Color depth fix
- UAF fix in hdcp_work
- VCE 2.x fix
- GC 12.x PTE fix
amdkfd:
- Queue eviction fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250312190931.216506-1-alexander.deucher@amd.com
Ajay Kaher [Thu, 13 Mar 2025 17:31:11 +0000 (17:31 +0000)]
x86/vmware: Parse MP tables for SEV-SNP enabled guests under VMware hypervisors
Under VMware hypervisors, SEV-SNP enabled VMs are fundamentally able to boot
without UEFI, but this regressed a year ago due to:
0f4a1e80989a ("x86/sev: Skip ROM range scans and validation for SEV-SNP guests")
In this case, mpparse_find_mptable() has to be called to parse MP
tables which contains the necessary boot information.
[ mingo: Updated the changelog. ]
Fixes:
0f4a1e80989a ("x86/sev: Skip ROM range scans and validation for SEV-SNP guests")
Co-developed-by: Ye Li <ye.li@broadcom.com>
Signed-off-by: Ye Li <ye.li@broadcom.com>
Signed-off-by: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Ye Li <ye.li@broadcom.com>
Reviewed-by: Kevin Loughlin <kevinloughlin@google.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250313173111.10918-1-ajay.kaher@broadcom.com
Linus Torvalds [Thu, 13 Mar 2025 17:58:48 +0000 (07:58 -1000)]
Merge tag 'net-6.14-rc7' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter, bluetooth and wireless.
No known regressions outstanding.
Current release - regressions:
- wifi: nl80211: fix assoc link handling
- eth: lan78xx: sanitize return values of register read/write
functions
Current release - new code bugs:
- ethtool: tsinfo: fix dump command
- bluetooth: btusb: configure altsetting for HCI_USER_CHANNEL
- eth: mlx5: DR, use the right action structs for STEv3
Previous releases - regressions:
- netfilter: nf_tables: make destruction work queue pernet
- gre: fix IPv6 link-local address generation.
- wifi: iwlwifi: fix TSO preparation
- bluetooth: revert "bluetooth: hci_core: fix sleeping function
called from invalid context"
- ovs: revert "openvswitch: switch to per-action label counting in
conntrack"
- eth:
- ice: fix switchdev slow-path in LAG
- bonding: fix incorrect MAC address setting to receive NS
messages
Previous releases - always broken:
- core: prevent TX of unreadable skbs
- sched: prevent creation of classes with TC_H_ROOT
- netfilter: nft_exthdr: fix offset with ipv4_find_option()
- wifi: cfg80211: cancel wiphy_work before freeing wiphy
- mctp: copy headers if cloned
- phy: nxp-c45-tja11xx: add errata for TJA112XA/B
- eth:
- bnxt: fix kernel panic in the bnxt_get_queue_stats{rx | tx}
- mlx5: bridge, fix the crash caused by LAG state check"
* tag 'net-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
net: mana: cleanup mana struct after debugfs_remove()
net/mlx5e: Prevent bridge link show failure for non-eswitch-allowed devices
net/mlx5: Bridge, fix the crash caused by LAG state check
net/mlx5: Lag, Check shared fdb before creating MultiPort E-Switch
net/mlx5: Fix incorrect IRQ pool usage when releasing IRQs
net/mlx5: HWS, Rightsize bwc matcher priority
net/mlx5: DR, use the right action structs for STEv3
Revert "openvswitch: switch to per-action label counting in conntrack"
net: openvswitch: remove misbehaving actions length check
selftests: Add IPv6 link-local address generation tests for GRE devices.
gre: Fix IPv6 link-local address generation.
netfilter: nft_exthdr: fix offset with ipv4_find_option()
selftests/tc-testing: Add a test case for DRR class with TC_H_ROOT
net_sched: Prevent creation of classes with TC_H_ROOT
ipvs: prevent integer overflow in do_ip_vs_get_ctl()
selftests: netfilter: skip br_netfilter queue tests if kernel is tainted
netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree()
wifi: mac80211: fix MPDU length parsing for EHT 5/6 GHz
qlcnic: fix memory leak issues in qlcnic_sriov_common.c
rtase: Fix improper release of ring list entries in rtase_sw_reset
...
Kent Overstreet [Sat, 8 Mar 2025 15:50:08 +0000 (10:50 -0500)]
dm-flakey: Fix memory corruption in optional corrupt_bio_byte feature
Fix memory corruption due to incorrect parameter being passed to bio_init
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v6.5+
Fixes:
1d9a94389853 ("dm flakey: clone pages on write bio before corrupting them")
Linus Torvalds [Thu, 13 Mar 2025 17:53:25 +0000 (07:53 -1000)]
Merge tag 'vfs-6.14-rc7.fixes' of gitolite.pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Bring in an RCU pathwalk fix for afs. This is brought in as a merge
from the vfs-6.15.shared.afs branch that needs this commit and other
trees already depend on it.
- Fix vboxfs unterminated string handling.
* tag 'vfs-6.14-rc7.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
vboxsf: Add __nonstring annotations for unterminated strings
afs: Fix afs_atcell_get_link() to handle RCU pathwalk
Kent Overstreet [Thu, 13 Mar 2025 15:16:28 +0000 (11:16 -0400)]
bcachefs: bch2_get_random_u64_below()
steal the (clever) algorithm from get_random_u32_below()
this fixes a bug where we were passing roundup_pow_of_two() a 64 bit
number - we're squaring device latencies now:
[ +1.681698] ------------[ cut here ]------------
[ +0.000010] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[ +0.000011] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[ +0.000011] CPU: 1 UID: 0 PID: 196 Comm: kworker/u32:13 Not tainted 6.14.0-rc6-dave+ #10
[ +0.000012] Hardware name: ASUS System Product Name/PRIME B460I-PLUS, BIOS 1301 07/13/2021
[ +0.000005] Workqueue: events_unbound __bch2_read_endio [bcachefs]
[ +0.000354] Call Trace:
[ +0.000005] <TASK>
[ +0.000007] dump_stack_lvl+0x5d/0x80
[ +0.000018] ubsan_epilogue+0x5/0x30
[ +0.000008] __ubsan_handle_shift_out_of_bounds.cold+0x61/0xe6
[ +0.000011] bch2_rand_range.cold+0x17/0x20 [bcachefs]
[ +0.000231] bch2_bkey_pick_read_device+0x547/0x920 [bcachefs]
[ +0.000229] __bch2_read_extent+0x1e4/0x18e0 [bcachefs]
[ +0.000241] ? bch2_btree_iter_peek_slot+0x3df/0x800 [bcachefs]
[ +0.000180] ? bch2_read_retry_nodecode+0x270/0x330 [bcachefs]
[ +0.000230] bch2_read_retry_nodecode+0x270/0x330 [bcachefs]
[ +0.000230] bch2_rbio_retry+0x1fa/0x600 [bcachefs]
[ +0.000224] ? bch2_printbuf_make_room+0x71/0xb0 [bcachefs]
[ +0.000243] ? bch2_read_csum_err+0x4a4/0x610 [bcachefs]
[ +0.000278] bch2_read_csum_err+0x4a4/0x610 [bcachefs]
[ +0.000227] ? __bch2_read_endio+0x58b/0x870 [bcachefs]
[ +0.000220] __bch2_read_endio+0x58b/0x870 [bcachefs]
[ +0.000268] ? try_to_wake_up+0x31c/0x7f0
[ +0.000011] ? process_one_work+0x176/0x330
[ +0.000008] process_one_work+0x176/0x330
[ +0.000008] worker_thread+0x252/0x390
[ +0.000008] ? __pfx_worker_thread+0x10/0x10
[ +0.000006] kthread+0xec/0x230
[ +0.000011] ? __pfx_kthread+0x10/0x10
[ +0.000009] ret_from_fork+0x31/0x50
[ +0.000009] ? __pfx_kthread+0x10/0x10
[ +0.000008] ret_from_fork_asm+0x1a/0x30
[ +0.000012] </TASK>
[ +0.000046] ---[ end trace ]---
Reported-by: Roland Vet <vet.roland@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 13 Mar 2025 13:56:07 +0000 (09:56 -0400)]
bcachefs: target_congested -> get_random_u32_below()
get_random_u32_below() has a better algorithm than bch2_rand_range(),
it just didn't exist at the time.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Jens Axboe [Thu, 13 Mar 2025 15:41:57 +0000 (09:41 -0600)]
Merge tag 'nvme-6.14-2025-03-13' of git://git.infradead.org/nvme into block-6.14
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.14
- Concurrent pci error and hotplug handling fix (Keith)
- Endpoint function fixes (Damien)"
* tag 'nvme-6.14-2025-03-13' of git://git.infradead.org/nvme:
nvmet: pci-epf: Do not add an IRQ vector if not needed
nvmet: pci-epf: Set NVMET_PCI_EPF_Q_LIVE when a queue is fully created
nvme-pci: fix stuck reset on concurrent DPC and HP
Amir Goldstein [Wed, 12 Mar 2025 07:38:52 +0000 (08:38 +0100)]
Revert "fanotify: disable readahead if we have pre-content watches"
This reverts commit
fac84846a28c0950d4433118b3dffd44306df62d.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250312073852.2123409-7-amir73il@gmail.com
Amir Goldstein [Wed, 12 Mar 2025 07:38:51 +0000 (08:38 +0100)]
Revert "mm: don't allow huge faults for files with pre content watches"
This reverts commit
20bf82a898b65c129af76deb96a1b415d3098a28.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250312073852.2123409-6-amir73il@gmail.com
Amir Goldstein [Wed, 12 Mar 2025 07:38:50 +0000 (08:38 +0100)]
Revert "fsnotify: generate pre-content permission event on page fault"
This reverts commit
8392bc2ff8c8bf7c4c5e6dfa71ccd893a3c046f6.
In the use case of buffered write whose input buffer is mmapped file on a
filesystem with a pre-content mark, the prefaulting of the buffer can
happen under the filesystem freeze protection (obtained in vfs_write())
which breaks assumptions of pre-content hook and introduces potential
deadlock of HSM handler in userspace with filesystem freezing.
Now that we have pre-content hooks at file mmap() time, disable the
pre-content event hooks on page fault to avoid the potential deadlock.
Reported-by: syzbot+7229071b47908b19d5b7@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-fsdevel/7ehxrhbvehlrjwvrduoxsao5k3x4aw275patsb3krkwuq573yv@o2hskrfawbnc/
Fixes:
8392bc2ff8c8 ("fsnotify: generate pre-content permission event on page fault")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250312073852.2123409-5-amir73il@gmail.com
Amir Goldstein [Wed, 12 Mar 2025 07:38:49 +0000 (08:38 +0100)]
Revert "xfs: add pre-content fsnotify hook for DAX faults"
This reverts commit
7f4796a46571ced5d3d5b0942e1bfea1eedaaecd.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250312073852.2123409-4-amir73il@gmail.com
Amir Goldstein [Wed, 12 Mar 2025 07:38:48 +0000 (08:38 +0100)]
Revert "ext4: add pre-content fsnotify hook for DAX faults"
This reverts commit
bb480760ffc7018e21ee6f60241c2b99ff26ee0e.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250312073852.2123409-3-amir73il@gmail.com
Paolo Abeni [Thu, 13 Mar 2025 14:04:26 +0000 (15:04 +0100)]
Merge tag 'nf-25-03-13' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for net:
1) Missing initialization of cpu and jiffies32 fields in conncount,
from Kohei Enju.
2) Skip several tests in case kernel is tainted, otherwise tests bogusly
report failure too as they also check for tainted kernel,
from Florian Westphal.
3) Fix a hyphothetical integer overflow in do_ip_vs_get_ctl() leading
to bogus error logs, from Dan Carpenter.
4) Fix incorrect offset in ipv4 option match in nft_exthdr, from
Alexey Kashavkin.
netfilter pull request 25-03-13
* tag 'nf-25-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_exthdr: fix offset with ipv4_find_option()
ipvs: prevent integer overflow in do_ip_vs_get_ctl()
selftests: netfilter: skip br_netfilter queue tests if kernel is tainted
netfilter: nf_conncount: Fully initialize struct nf_conncount_tuple in insert_tree()
====================
Link: https://patch.msgid.link/20250313095636.2186-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Henrique Carvalho [Tue, 11 Mar 2025 18:23:59 +0000 (15:23 -0300)]
smb: client: Fix match_session bug preventing session reuse
Fix a bug in match_session() that can causes the session to not be
reused in some cases.
Reproduction steps:
mount.cifs //server/share /mnt/a -o credentials=creds
mount.cifs //server/share /mnt/b -o credentials=creds,sec=ntlmssp
cat /proc/fs/cifs/DebugData | grep SessionId | wc -l
mount.cifs //server/share /mnt/b -o credentials=creds,sec=ntlmssp
mount.cifs //server/share /mnt/a -o credentials=creds
cat /proc/fs/cifs/DebugData | grep SessionId | wc -l
Cc: stable@vger.kernel.org
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Murad Masimov [Tue, 11 Mar 2025 14:22:06 +0000 (17:22 +0300)]
cifs: Fix integer overflow while processing closetimeo mount option
User-provided mount parameter closetimeo of type u32 is intended to have
an upper limit, but before it is validated, the value is converted from
seconds to jiffies which can lead to an integer overflow.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
5efdd9122eff ("smb3: allow deferred close timeout to be configurable")
Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
Murad Masimov [Tue, 11 Mar 2025 14:22:05 +0000 (17:22 +0300)]
cifs: Fix integer overflow while processing actimeo mount option
User-provided mount parameter actimeo of type u32 is intended to have
an upper limit, but before it is validated, the value is converted from
seconds to jiffies which can lead to an integer overflow.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
6d20e8406f09 ("cifs: add attribute cache timeout (actimeo) tunable")
Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
Murad Masimov [Tue, 11 Mar 2025 14:22:04 +0000 (17:22 +0300)]
cifs: Fix integer overflow while processing acdirmax mount option
User-provided mount parameter acdirmax of type u32 is intended to have
an upper limit, but before it is validated, the value is converted from
seconds to jiffies which can lead to an integer overflow.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
4c9f948142a5 ("cifs: Add new mount parameter "acdirmax" to allow caching directory metadata")
Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
Murad Masimov [Tue, 11 Mar 2025 14:22:03 +0000 (17:22 +0300)]
cifs: Fix integer overflow while processing acregmax mount option
User-provided mount parameter acregmax of type u32 is intended to have
an upper limit, but before it is validated, the value is converted from
seconds to jiffies which can lead to an integer overflow.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes:
5780464614f6 ("cifs: Add new parameter "acregmax" for distinct file and directory metadata timeout")
Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru>
Signed-off-by: Steve French <stfrench@microsoft.com>
Paulo Alcantara [Wed, 12 Mar 2025 13:51:31 +0000 (10:51 -0300)]
smb: client: fix regression with guest option
When mounting a CIFS share with 'guest' mount option, mount.cifs(8)
will set empty password= and password2= options. Currently we only
handle empty strings from user= and password= options, so the mount
will fail with
cifs: Bad value for 'password2'
Fix this by handling empty string from password2= option as well.
Link: https://bbs.archlinux.org/viewtopic.php?id=303927
Reported-by: Adam Williamson <awilliam@redhat.com>
Closes: https://lore.kernel.org/r/
83c00b5fea81c07f6897a5dd3ef50fd3b290f56c.camel@redhat.com
Fixes:
35f834265e0d ("smb3: fix broken reconnect when password changing on the server by allowing password rotation")
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Mario Limonciello [Thu, 6 Mar 2025 03:44:02 +0000 (21:44 -0600)]
platform/x86/amd: pmf: Fix missing hidden options for Smart PC
amd_pmf_get_slider_info() checks the current profile to report correct
value to the TA inputs. If hidden options are in use then the wrong
values will be reported to TA.
Add the two compat options PLATFORM_PROFILE_BALANCED_PERFORMANCE and
PLATFORM_PROFILE_QUIET for this use.
Reported-by: Yijun Shen <Yijun.Shen@dell.com>
Fixes:
9a43102daf64d ("platform/x86/amd: pmf: Add balanced-performance to hidden choices")
Fixes:
44e94fece5170 ("platform/x86/amd: pmf: Add 'quiet' to hidden choices")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://lore.kernel.org/r/20250306034402.50478-1-superm1@kernel.org
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Shradha Gupta [Tue, 11 Mar 2025 10:17:40 +0000 (03:17 -0700)]
net: mana: cleanup mana struct after debugfs_remove()
When on a MANA VM hibernation is triggered, as part of hibernate_snapshot(),
mana_gd_suspend() and mana_gd_resume() are called. If during this
mana_gd_resume(), a failure occurs with HWC creation, mana_port_debugfs
pointer does not get reinitialized and ends up pointing to older,
cleaned-up dentry.
Further in the hibernation path, as part of power_down(), mana_gd_shutdown()
is triggered. This call, unaware of the failures in resume, tries to cleanup
the already cleaned up mana_port_debugfs value and hits the following bug:
[ 191.359296] mana 7870:00:00.0: Shutdown was called
[ 191.359918] BUG: kernel NULL pointer dereference, address:
0000000000000098
[ 191.360584] #PF: supervisor write access in kernel mode
[ 191.361125] #PF: error_code(0x0002) - not-present page
[ 191.361727] PGD
1080ea067 P4D 0
[ 191.362172] Oops: Oops: 0002 [#1] SMP NOPTI
[ 191.362606] CPU: 11 UID: 0 PID: 1674 Comm: bash Not tainted 6.14.0-rc5+ #2
[ 191.363292] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024
[ 191.364124] RIP: 0010:down_write+0x19/0x50
[ 191.364537] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 de cd ff ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 16 65 48 8b 05 88 24 4c 6a 48 89 43 08 48 8b 5d
[ 191.365867] RSP: 0000:
ff45fbe0c1c037b8 EFLAGS:
00010246
[ 191.366350] RAX:
0000000000000000 RBX:
0000000000000098 RCX:
ffffff8100000000
[ 191.366951] RDX:
0000000000000001 RSI:
0000000000000064 RDI:
0000000000000098
[ 191.367600] RBP:
ff45fbe0c1c037c0 R08:
0000000000000000 R09:
0000000000000001
[ 191.368225] R10:
ff45fbe0d2b01000 R11:
0000000000000008 R12:
0000000000000000
[ 191.368874] R13:
000000000000000b R14:
ff43dc27509d67c0 R15:
0000000000000020
[ 191.369549] FS:
00007dbc5001e740(0000) GS:
ff43dc663f380000(0000) knlGS:
0000000000000000
[ 191.370213] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 191.370830] CR2:
0000000000000098 CR3:
0000000168e8e002 CR4:
0000000000b73ef0
[ 191.371557] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 191.372192] DR3:
0000000000000000 DR6:
00000000fffe07f0 DR7:
0000000000000400
[ 191.372906] Call Trace:
[ 191.373262] <TASK>
[ 191.373621] ? show_regs+0x64/0x70
[ 191.374040] ? __die+0x24/0x70
[ 191.374468] ? page_fault_oops+0x290/0x5b0
[ 191.374875] ? do_user_addr_fault+0x448/0x800
[ 191.375357] ? exc_page_fault+0x7a/0x160
[ 191.375971] ? asm_exc_page_fault+0x27/0x30
[ 191.376416] ? down_write+0x19/0x50
[ 191.376832] ? down_write+0x12/0x50
[ 191.377232] simple_recursive_removal+0x4a/0x2a0
[ 191.377679] ? __pfx_remove_one+0x10/0x10
[ 191.378088] debugfs_remove+0x44/0x70
[ 191.378530] mana_detach+0x17c/0x4f0
[ 191.378950] ? __flush_work+0x1e2/0x3b0
[ 191.379362] ? __cond_resched+0x1a/0x50
[ 191.379787] mana_remove+0xf2/0x1a0
[ 191.380193] mana_gd_shutdown+0x3b/0x70
[ 191.380642] pci_device_shutdown+0x3a/0x80
[ 191.381063] device_shutdown+0x13e/0x230
[ 191.381480] kernel_power_off+0x35/0x80
[ 191.381890] hibernate+0x3c6/0x470
[ 191.382312] state_store+0xcb/0xd0
[ 191.382734] kobj_attr_store+0x12/0x30
[ 191.383211] sysfs_kf_write+0x3e/0x50
[ 191.383640] kernfs_fop_write_iter+0x140/0x1d0
[ 191.384106] vfs_write+0x271/0x440
[ 191.384521] ksys_write+0x72/0xf0
[ 191.384924] __x64_sys_write+0x19/0x20
[ 191.385313] x64_sys_call+0x2b0/0x20b0
[ 191.385736] do_syscall_64+0x79/0x150
[ 191.386146] ? __mod_memcg_lruvec_state+0xe7/0x240
[ 191.386676] ? __lruvec_stat_mod_folio+0x79/0xb0
[ 191.387124] ? __pfx_lru_add+0x10/0x10
[ 191.387515] ? queued_spin_unlock+0x9/0x10
[ 191.387937] ? do_anonymous_page+0x33c/0xa00
[ 191.388374] ? __handle_mm_fault+0xcf3/0x1210
[ 191.388805] ? __count_memcg_events+0xbe/0x180
[ 191.389235] ? handle_mm_fault+0xae/0x300
[ 191.389588] ? do_user_addr_fault+0x559/0x800
[ 191.390027] ? irqentry_exit_to_user_mode+0x43/0x230
[ 191.390525] ? irqentry_exit+0x1d/0x30
[ 191.390879] ? exc_page_fault+0x86/0x160
[ 191.391235] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 191.391745] RIP: 0033:0x7dbc4ff1c574
[ 191.392111] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d d5 ea 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[ 191.393412] RSP: 002b:
00007ffd95a23ab8 EFLAGS:
00000202 ORIG_RAX:
0000000000000001
[ 191.393990] RAX:
ffffffffffffffda RBX:
0000000000000005 RCX:
00007dbc4ff1c574
[ 191.394594] RDX:
0000000000000005 RSI:
00005a6eeadb0ce0 RDI:
0000000000000001
[ 191.395215] RBP:
00007ffd95a23ae0 R08:
00007dbc50003b20 R09:
0000000000000000
[ 191.395805] R10:
0000000000000001 R11:
0000000000000202 R12:
0000000000000005
[ 191.396404] R13:
00005a6eeadb0ce0 R14:
00007dbc500045c0 R15:
00007dbc50001ee0
[ 191.396987] </TASK>
To fix this, we explicitly set such mana debugfs variables to NULL after
debugfs_remove() is called.
Fixes:
6607c17c6c5e ("net: mana: Enable debugfs files for MANA device")
Cc: stable@vger.kernel.org
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/1741688260-28922-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 13 Mar 2025 12:11:18 +0000 (13:11 +0100)]
Merge branch 'mlx5-misc-fixes-2025-03-10'
Tariq Toukan says:
====================
mlx5 misc fixes 2025-03-10
This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.
====================
Link: https://patch.msgid.link/1741644104-97767-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Carolina Jubran [Mon, 10 Mar 2025 22:01:44 +0000 (00:01 +0200)]
net/mlx5e: Prevent bridge link show failure for non-eswitch-allowed devices
mlx5_eswitch_get_vepa returns -EPERM if the device lacks
eswitch_manager capability, blocking mlx5e_bridge_getlink from
retrieving VEPA mode. Since mlx5e_bridge_getlink implements
ndo_bridge_getlink, returning -EPERM causes bridge link show to fail
instead of skipping devices without this capability.
To avoid this, return -EOPNOTSUPP from mlx5e_bridge_getlink when
mlx5_eswitch_get_vepa fails, ensuring the command continues processing
other devices while ignoring those without the necessary capability.
Fixes:
4b89251de024 ("net/mlx5: Support ndo bridge_setlink and getlink")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jianbo Liu [Mon, 10 Mar 2025 22:01:43 +0000 (00:01 +0200)]
net/mlx5: Bridge, fix the crash caused by LAG state check
When removing LAG device from bridge, NETDEV_CHANGEUPPER event is
triggered. Driver finds the lower devices (PFs) to flush all the
offloaded entries. And mlx5_lag_is_shared_fdb is checked, it returns
false if one of PF is unloaded. In such case,
mlx5_esw_bridge_lag_rep_get() and its caller return NULL, instead of
the alive PF, and the flush is skipped.
Besides, the bridge fdb entry's lastuse is updated in mlx5 bridge
event handler. But this SWITCHDEV_FDB_ADD_TO_BRIDGE event can be
ignored in this case because the upper interface for bond is deleted,
and the entry will never be aged because lastuse is never updated.
To make things worse, as the entry is alive, mlx5 bridge workqueue
keeps sending that event, which is then handled by kernel bridge
notifier. It causes the following crash when accessing the passed bond
netdev which is already destroyed.
To fix this issue, remove such checks. LAG state is already checked in
commit
15f8f168952f ("net/mlx5: Bridge, verify LAG state when adding
bond to bridge"), driver still need to skip offload if LAG becomes
invalid state after initialization.
Oops: stack segment: 0000 [#1] SMP
CPU: 3 UID: 0 PID: 23695 Comm: kworker/u40:3 Tainted: G OE 6.11.0_mlnx #1
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_bridge_wq mlx5_esw_bridge_update_work [mlx5_core]
RIP: 0010:br_switchdev_event+0x2c/0x110 [bridge]
Code: 44 00 00 48 8b 02 48 f7 00 00 02 00 00 74 69 41 54 55 53 48 83 ec 08 48 8b a8 08 01 00 00 48 85 ed 74 4a 48 83 fe 02 48 89 d3 <4c> 8b 65 00 74 23 76 49 48 83 fe 05 74 7e 48 83 fe 06 75 2f 0f b7
RSP: 0018:
ffffc900092cfda0 EFLAGS:
00010297
RAX:
ffff888123bfe000 RBX:
ffffc900092cfe08 RCX:
00000000ffffffff
RDX:
ffffc900092cfe08 RSI:
0000000000000001 RDI:
ffffffffa0c585f0
RBP:
6669746f6e690a30 R08:
0000000000000000 R09:
ffff888123ae92c8
R10:
0000000000000000 R11:
fefefefefefefeff R12:
ffff888123ae9c60
R13:
0000000000000001 R14:
ffffc900092cfe08 R15:
0000000000000000
FS:
0000000000000000(0000) GS:
ffff88852c980000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f15914c8734 CR3:
0000000002830005 CR4:
0000000000770ef0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
PKRU:
55555554
Call Trace:
<TASK>
? __die_body+0x1a/0x60
? die+0x38/0x60
? do_trap+0x10b/0x120
? do_error_trap+0x64/0xa0
? exc_stack_segment+0x33/0x50
? asm_exc_stack_segment+0x22/0x30
? br_switchdev_event+0x2c/0x110 [bridge]
? sched_balance_newidle.isra.149+0x248/0x390
notifier_call_chain+0x4b/0xa0
atomic_notifier_call_chain+0x16/0x20
mlx5_esw_bridge_update+0xec/0x170 [mlx5_core]
mlx5_esw_bridge_update_work+0x19/0x40 [mlx5_core]
process_scheduled_works+0x81/0x390
worker_thread+0x106/0x250
? bh_worker+0x110/0x110
kthread+0xb7/0xe0
? kthread_park+0x80/0x80
ret_from_fork+0x2d/0x50
? kthread_park+0x80/0x80
ret_from_fork_asm+0x11/0x20
</TASK>
Fixes:
ff9b7521468b ("net/mlx5: Bridge, support LAG")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shay Drory [Mon, 10 Mar 2025 22:01:42 +0000 (00:01 +0200)]
net/mlx5: Lag, Check shared fdb before creating MultiPort E-Switch
Currently, MultiPort E-Switch is requesting to create a LAG with shared
FDB without checking the LAG is supporting shared FDB.
Add the check.
Fixes:
a32327a3a02c ("net/mlx5: Lag, Control MultiPort E-Switch single FDB mode")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Shay Drory [Mon, 10 Mar 2025 22:01:41 +0000 (00:01 +0200)]
net/mlx5: Fix incorrect IRQ pool usage when releasing IRQs
mlx5_irq_pool_get() is a getter for completion IRQ pool only.
However, after the cited commit, mlx5_irq_pool_get() is called during
ctrl IRQ release flow to retrieve the pool, resulting in the use of an
incorrect IRQ pool.
Hence, use the newly introduced mlx5_irq_get_pool() getter to retrieve
the correct IRQ pool based on the IRQ itself. While at it, rename
mlx5_irq_pool_get() to mlx5_irq_table_get_comp_irq_pool() which
accurately reflects its purpose and improves code readability.
Fixes:
0477d5168bbb ("net/mlx5: Expose SFs IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/1741644104-97767-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Vlad Dogaru [Mon, 10 Mar 2025 22:01:40 +0000 (00:01 +0200)]
net/mlx5: HWS, Rightsize bwc matcher priority
The bwc layer was clamping the matcher priority from 32 bits to 16 bits.
This didn't show up until a matcher was resized, since the initial
native matcher was created using the correct 32 bit value.
The fix also reorders fields to avoid some padding.
Fixes:
2111bb970c78 ("net/mlx5: HWS, added backward-compatible API handling")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741644104-97767-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yevgeny Kliteynik [Mon, 10 Mar 2025 22:01:39 +0000 (00:01 +0200)]
net/mlx5: DR, use the right action structs for STEv3
Some actions in ConnectX-8 (STEv3) have different structure,
and they are handled separately in ste_ctx_v3.
This separate handling was missing two actions: INSERT_HDR
and REMOVE_HDR, which broke SWS for Linux Bridge.
This patch resolves the issue by introducing dedicated
callbacks for the insert and remove header functions,
with version-specific implementations for each STE variant.
Fixes:
4d617b57574f ("net/mlx5: DR, add support for ConnectX-8 steering")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1741644104-97767-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Xin Long [Sat, 8 Mar 2025 18:05:43 +0000 (13:05 -0500)]
Revert "openvswitch: switch to per-action label counting in conntrack"
Currently, ovs_ct_set_labels() is only called for confirmed conntrack
entries (ct) within ovs_ct_commit(). However, if the conntrack entry
does not have the labels_ext extension, attempting to allocate it in
ovs_ct_get_conn_labels() for a confirmed entry triggers a warning in
nf_ct_ext_add():
WARN_ON(nf_ct_is_confirmed(ct));
This happens when the conntrack entry is created externally before OVS
increments net->ct.labels_used. The issue has become more likely since
commit
fcb1aa5163b1 ("openvswitch: switch to per-action label counting
in conntrack"), which changed to use per-action label counting and
increment net->ct.labels_used when a flow with ct action is added.
Since there’s no straightforward way to fully resolve this issue at the
moment, this reverts the commit to avoid breaking existing use cases.
Fixes:
fcb1aa5163b1 ("openvswitch: switch to per-action label counting in conntrack")
Reported-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/1bdeb2f3a812bca016a225d3de714427b2cd4772.1741457143.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>