linux-2.6-block.git
4 months ago
Christoph Hellwig [Thu, 2 May 2024 07:33:55 +0000 (09:33 +0200)]
xfs: simplify iext overflow checking and upgrade

Currently the calls to xfs_iext_count_may_overflow and
xfs_iext_count_upgrade are always paired.  Merge them into a single
function to simplify the callers and the actual check and upgrade
logic itself.
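
As a rough illustration of what a merged check-and-upgrade helper looks like, here is a minimal Python model. The function name, the counter limits, and the Inode class are hypothetical and chosen only for this sketch; they are not the kernel's definitions.

```python
import errno

# Illustrative limits only; the real per-fork limits differ.
SMALL_MAX_EXTENTS = 2**31 - 1
LARGE_MAX_EXTENTS = 2**47 - 1


class Inode:
    """Hypothetical stand-in for an inode with an extent counter."""

    def __init__(self, nextents, large_counters=False):
        self.nextents = nextents
        self.large_counters = large_counters


def iext_count_extend(inode, nr_to_add, can_upgrade=True):
    """Check for counter overflow and upgrade to wide counters if needed.

    Returns 0 on success or a negative errno, mirroring kernel style.
    """
    limit = LARGE_MAX_EXTENTS if inode.large_counters else SMALL_MAX_EXTENTS
    if inode.nextents + nr_to_add <= limit:
        return 0  # no overflow, nothing to upgrade
    if inode.large_counters or not can_upgrade:
        return -errno.EFBIG
    if inode.nextents + nr_to_add > LARGE_MAX_EXTENTS:
        return -errno.EFBIG
    inode.large_counters = True  # the upgrade step, folded into the check
    return 0
```

With a single entry point, a caller makes one call and checks one return value instead of pairing two calls.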

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Thu, 2 May 2024 07:33:54 +0000 (09:33 +0200)]
xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent

Accessing if_bytes without the ilock is racy.  Remove the initial
if_bytes == 0 check in xfs_reflink_end_cow_extent and let
xfs_iext_lookup_extent fail for this case after we've taken the ilock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Thu, 2 May 2024 07:33:53 +0000 (09:33 +0200)]
xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later

Defer the extent counter size upgrade until we know we're going to
modify the extent mapping.  This also defers dirtying the transaction
and will allow us to safely back out later in the function in later
changes.

Fixes: 4f86bb4b66c9 ("xfs: Conditionally upgrade existing inodes to use large extent counters")
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Tue, 30 Apr 2024 12:58:22 +0000 (14:58 +0200)]
xfs: xfs_quota_unreserve_blkres can't fail

Unreserving quotas can't fail due to quota limits, and we'll notice a
shut down file system a bit later in all the callers anyway.  Return
void and remove the error checking and propagation in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Tue, 30 Apr 2024 12:58:21 +0000 (14:58 +0200)]
xfs: consolidate the xfs_quota_reserve_blkres definitions

xfs_trans_reserve_quota_nblks is already stubbed out if quota support
is disabled, no need for an extra xfs_quota_reserve_blkres stub.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Tue, 30 Apr 2024 04:07:56 +0000 (06:07 +0200)]
xfs: clean up buffer allocation in xlog_do_recovery_pass

Merge the initial xlog_alloc_buffer calls, and pass the variable
designating the length that is initialized to 1 above instead of passing
the open coded 1 directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Tue, 30 Apr 2024 04:07:55 +0000 (06:07 +0200)]
xfs: fix log recovery buffer allocation for the legacy h_size fixup

Commit a70f9fe52daa ("xfs: detect and handle invalid iclog size set by
mkfs") added a fixup for incorrect h_size values used for the initial
umount record in old xfsprogs versions.  Later commit 0c771b99d6c9
("xfs: clean up calculation of LR header blocks") cleaned up the log
recovery buffer calculation, but stopped using the fixed up h_size value
to size the log recovery buffer, which can lead to an out of bounds
access when the incorrect h_size does not come from the old mkfs
tool, but a fuzzer.

Fix this by open coding xlog_logrec_hblks and taking the fixed h_size
into account for this calculation.
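
The effect of sizing the header buffer from the raw versus the fixed-up h_size can be sketched in Python. This is a simplified model, not the kernel code: the 32 KiB cycle size, the fixup condition, and both function names are assumptions made for illustration.

```python
HEADER_CYCLE_SIZE = 32 * 1024  # assumed bytes covered by one header block


def header_blocks(h_size):
    """Number of header blocks needed for a log record of h_size bytes."""
    if h_size > HEADER_CYCLE_SIZE:
        return -(-h_size // HEADER_CYCLE_SIZE)  # ceiling division
    return 1


def fixed_h_size(h_size, h_len, log_buf_size, num_logops):
    """Assumed model of the legacy fixup for a bad h_size in the umount record."""
    if h_len > h_size and h_len <= log_buf_size and num_logops == 1:
        return log_buf_size
    return h_size


raw = 32 * 1024
fixed = fixed_h_size(raw, h_len=100000, log_buf_size=256 * 1024, num_logops=1)
undersized = header_blocks(raw)    # computed from the on-disk value
required = header_blocks(fixed)    # computed from the fixed-up value
```

In this model the buffer sized from the raw value is smaller than what the fixed-up record needs, which is the mismatch the commit closes.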

Fixes: 0c771b99d6c9 ("xfs: clean up calculation of LR header blocks")
Reported-by: Sam Sun <samsun1006219@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Chandan Babu R [Fri, 3 May 2024 05:35:39 +0000 (11:05 +0530)]
Merge tag 'xfs-cleanups-6.10_2024-05-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeF

xfs: last round of cleanups for 6.10

Here are the reviewed cleanups at the head of the fsverity series.
Apparently there's other work that could use some of these things, so
let's try to get it in for 6.10.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'xfs-cleanups-6.10_2024-05-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: widen flags argument to the xfs_iflags_* helpers
  xfs: minor cleanups of xfs_attr3_rmt_blocks
  xfs: create a helper to compute the blockcount of a max sized remote value
  xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function
  xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c

4 months ago
Darrick J. Wong [Thu, 2 May 2024 14:48:37 +0000 (07:48 -0700)]
xfs: widen flags argument to the xfs_iflags_* helpers

xfs_inode.i_flags is an unsigned long, so make these helpers take that
as the flags argument instead of unsigned short.  This is needed for the
next patch.

While we're at it, remove the iflags variable from xfs_iget_cache_miss
because we no longer need it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
4 months ago
Darrick J. Wong [Thu, 2 May 2024 14:48:37 +0000 (07:48 -0700)]
xfs: minor cleanups of xfs_attr3_rmt_blocks

Clean up the type signature of this function since we don't have
negative attr lengths or block counts.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago
Darrick J. Wong [Thu, 2 May 2024 14:48:36 +0000 (07:48 -0700)]
xfs: create a helper to compute the blockcount of a max sized remote value

Create a helper function to compute the number of fsblocks needed to
store a maximally-sized extended attribute value.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago
Darrick J. Wong [Thu, 2 May 2024 14:48:36 +0000 (07:48 -0700)]
xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function

Turn this into a properly typechecked function, and actually use the
correct blocksize for extended attributes.  The function cannot be
static inline because xfsprogs userspace uses it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago
Darrick J. Wong [Thu, 2 May 2024 14:48:35 +0000 (07:48 -0700)]
xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c

In the next few patches we're going to refactor the attr remote code so
that we can support headerless remote xattr values for storing merkle
tree blocks.  For now, let's change the code to use unsigned int to
describe quantities of bytes and blocks that cannot be negative.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 months ago
Christoph Hellwig [Mon, 29 Apr 2024 06:15:28 +0000 (08:15 +0200)]
xfs: do not allocate the entire delalloc extent in xfs_bmapi_write

Trying to convert the entire delalloc extent is a good decision for
regular writeback, as it leads to larger contiguous on-disk extents.
For other callers of xfs_bmapi_write it is rather questionable, as it
forces them to loop creating new transactions just in case there is no
large enough contiguous extent to cover the whole delalloc reservation.

Change xfs_bmapi_write to only allocate the passed in range instead,
while the writeback path through xfs_bmapi_convert_delalloc and
xfs_bmapi_allocate still always converts the full extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Mon, 29 Apr 2024 06:15:27 +0000 (08:15 +0200)]
xfs: fix xfs_bmap_add_extent_delay_real for partial conversions

xfs_bmap_add_extent_delay_real takes parts or all of a delalloc extent
and converts them to a real extent.  It is written to deal with any
potential overlap of the to be converted range with the delalloc extent,
but it turns out that currently only converting the entire extents, or a
part starting at the beginning is actually exercised, as the only caller
always tries to convert the entire delalloc extent, and either succeeds
or at least progresses partially from the start.

If it only converts a tiny part of a delalloc extent, the indirect block
calculation for the new delalloc extent (da_new) might be equivalent to that
of the existing delalloc extent (da_old).  If this extent conversion now
requires allocating an indirect block that gets accounted into da_new,
the assert that da_new must be smaller or equal to da_old unless we split
the extent triggers.

Except for the assert, that case is actually handled by just trying to
allocate more space, as that is already handled for the split case (which
currently can't be reached at all), so just reusing it should be fine.
Except that without dipping into the reserved block pool that would make
it a bit too easy to trigger a fs shutdown due to ENOSPC.  So in addition
to adjusting the assert, also dip into the reserved block pool.

Note that I could only reproduce the assert with a change to only convert
the actually asked range instead of the full delalloc extent from
xfs_bmapi_write.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Mon, 29 Apr 2024 06:15:26 +0000 (08:15 +0200)]
xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate

Both callers of xfs_bmapi_allocate already initialize bma->prev, don't
redo that in xfs_bmapi_allocate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Mon, 29 Apr 2024 06:15:25 +0000 (08:15 +0200)]
xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate

xfs_bmapi_allocate currently overwrites offset and len when converting
delayed allocations, and duplicates the length cap done for non-delalloc
allocations.  Move all that logic into the callers to avoid duplication
and to make the calling conventions more obvious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Mon, 29 Apr 2024 06:15:24 +0000 (08:15 +0200)]
xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write

XFS_FILBLKS_MIN uses min_t and thus does the comparison using the correct
xfs_filblks_t type.  Use it in xfs_bmapi_write and slightly adjust the
comment documenting the potential pitfall to take account of this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Mon, 29 Apr 2024 06:15:23 +0000 (08:15 +0200)]
xfs: lift a xfs_valid_startblock into xfs_bmapi_allocate

xfs_bmapi_convert_delalloc has a xfs_valid_startblock check on the block
allocated by xfs_bmapi_allocate.  Lift it into xfs_bmapi_allocate as
we should assert the same for xfs_bmapi_write.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Mon, 29 Apr 2024 06:15:22 +0000 (08:15 +0200)]
xfs: remove the unused tmp_logflags variable in xfs_bmapi_allocate

tmp_logflags is initialized to 0 and then ORed into bma->logflags, which
isn't actually doing anything.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Mon, 29 Apr 2024 06:15:21 +0000 (08:15 +0200)]
xfs: fix error returns from xfs_bmapi_write

xfs_bmapi_write can return 0 without actually returning a mapping in
mval in two different cases:

 1) when there is absolutely no space available to do an allocation
 2) when converting delalloc space, and the allocation is so small
    that it only covers parts of the delalloc extent before the
    range requested by the caller

Callers at best can handle one of these cases, but in many cases can't
cope with either one.  Switch xfs_bmapi_write to always return a
mapping or return an error code instead.  For case 1) above ENOSPC is
the obvious choice which is very much what the callers expect anyway.
For case 2) there is no really good error code, so pick a funky one
from the SysV streams portfolio.
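
The "always return a mapping or an error" contract can be modelled in a few lines of Python. This is a sketch under stated assumptions: the function name, its arguments, and the two exception classes are hypothetical stand-ins for the two error codes, and block ranges are plain integers.

```python
class NoSpaceError(Exception):
    """Stand-in for case 1): nothing could be allocated at all."""


class ShortAllocationError(Exception):
    """Stand-in for case 2): the allocation ended before the requested range."""


def bmapi_write_result(alloc_start, alloc_len, want_start, want_len):
    """Return the (start, length) mapping for the caller, or raise.

    alloc_start/alloc_len describe what the allocator converted;
    want_start/want_len describe the range the caller asked for.
    """
    if alloc_len == 0:
        raise NoSpaceError("no space available for the allocation")
    alloc_end = alloc_start + alloc_len
    if alloc_end <= want_start:
        raise ShortAllocationError("allocation ends before requested range")
    start = max(alloc_start, want_start)
    end = min(alloc_end, want_start + want_len)
    return (start, end - start)
```

A caller of this model never has to handle a "success" that carries no mapping; it either gets a usable range or a distinct error for each case.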

This fixes the reproducer here:

    https://lore.kernel.org/linux-xfs/CAEJPjCvT3Uag-pMTYuigEjWZHn1sGMZ0GCjVVCv29tNHK76Cgg@mail.gmail.com0/

which uses reserved blocks to create file systems that are gravely
out of space and thus cause at least xfs_alloc_file_space to hang
and trigger the lack of ENOSPC handling in xfs_dquot_disk_alloc.

Note that this patch does not actually make any caller but
xfs_alloc_file_space deal intelligently with case 2) above.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: 刘通 <lyutoon@gmail.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Zhang Yi [Thu, 25 Apr 2024 13:13:30 +0000 (21:13 +0800)]
xfs: convert delayed extents to unwritten when zeroing post eof blocks

The current clone operation can be non-atomic if the destination range
of a file is beyond EOF; a user could get a file with corrupted (zeroed)
data after a crash.

The problem is about preallocations. If you write some data into a file:

[A...B)

and XFS decides to preallocate some post-eof blocks, then it can create
a delayed allocation reservation:

[A.........D)

The writeback path tries to convert delayed extents to real ones by
allocating blocks. If there isn't enough contiguous free space, we can
end up with two extents, the first real and the second still delalloc:

[A....C)[C.D)

After that, both the in-memory and the on-disk file sizes are still B.
If we clone into the range [E...F) from another file:

[A....C)[C.D)      [E...F)

then xfs_reflink_zero_posteof() calls iomap_zero_range() to zero out the
range [B, E) beyond EOF and flush it. Since [C, D) is still a delalloc
extent, its pagecache will be zeroed and both the in-memory and on-disk
size will be updated to D after flushing but before cloning. This is
wrong, because the user can see the size change and read the zeroes
while the clone operation is ongoing.

We need to keep the in-memory and on-disk size before the clone
operation starts, so instead of writing zeroes through the page cache
for delayed ranges beyond EOF, we convert these ranges to unwritten and
invalidate any cached data over that range beyond EOF.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Zhang Yi [Thu, 25 Apr 2024 13:13:29 +0000 (21:13 +0800)]
xfs: make xfs_bmapi_convert_delalloc() to allocate the target offset

xfs_bmapi_convert_delalloc() only attempts to allocate the entire
delalloc extent, so it can require multiple invocations to allocate the
target offset.  xfs_convert_blocks() adds a loop to do this job and is
called in the writeback path, but xfs_convert_blocks() isn't a common
helper.  Do the loop in xfs_bmapi_convert_delalloc() instead and drop
xfs_convert_blocks(), in preparation for converting post-EOF delalloc
blocks in the buffered write begin path.
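
The loop that moves into the conversion helper can be pictured with a small Python model. It assumes, purely for illustration, that each conversion step turns at most max_contig blocks from the front of the remaining delalloc extent into a real extent; the function and parameter names are hypothetical.

```python
def convert_delalloc(delalloc_start, delalloc_end, target, max_contig):
    """Convert chunks front to back until one covers the target offset.

    Returns the (start, end) of the real extent that contains target.
    """
    if not delalloc_start <= target < delalloc_end:
        raise ValueError("target is outside the delalloc extent")
    if max_contig <= 0:
        raise ValueError("max_contig must be positive")
    start = delalloc_start
    while True:
        # One conversion step: limited by the largest free contiguous run.
        end = min(start + max_contig, delalloc_end)
        if start <= target < end:
            return (start, end)
        start = end  # target not reached yet, convert the next chunk
```

For example, with a delalloc extent of blocks 0 to 99, a target of 42, and chunks of 16 blocks, the third step is the one that covers the target.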

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Zhang Yi [Thu, 25 Apr 2024 13:13:28 +0000 (21:13 +0800)]
xfs: make the seq argument to xfs_bmapi_convert_delalloc() optional

Allow callers to pass a NULL seq argument if they don't care about
the fork sequence number.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Zhang Yi [Thu, 25 Apr 2024 13:13:27 +0000 (21:13 +0800)]
xfs: match lock mode in xfs_buffered_write_iomap_begin()

Commit 1aa91d9c9933 ("xfs: Add async buffered write support") replaced
xfs_ilock(XFS_ILOCK_EXCL) with xfs_ilock_for_iomap() when locking the
writing inode, and a new variable lockmode is used to indicate the lock
mode. Although the lockmode should always be XFS_ILOCK_EXCL, it's still
better to use this variable instead of using XFS_ILOCK_EXCL directly
when unlocking the inode.

Fixes: 1aa91d9c9933 ("xfs: Add async buffered write support")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Thu, 25 Apr 2024 13:17:03 +0000 (15:17 +0200)]
xfs: refactor dir format helpers

Add a new enum and a xfs_dir2_format helper that returns it to allow
the code to switch on the format of a directory in a single operation
and switch all callers of xfs_dir2_isblock and xfs_dir2_isleaf to it.

This also removes the explicit xfs_iread_extents call in a few of the
call sites given that xfs_bmap_last_offset already takes care of it
underneath.
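
The pattern of classifying the directory format once and then switching on the result can be sketched in Python. The enum members, the boolean inputs, and the handler names below are hypothetical; the sketch does not reproduce how the kernel decides the format.

```python
from enum import Enum


class DirFormat(Enum):
    SF = "shortform"
    BLOCK = "block"
    LEAF = "leaf"
    NODE = "node"


def dir_format(is_inline, is_single_block, is_single_leaf):
    """Classify the directory once, instead of probing block/leaf separately."""
    if is_inline:
        return DirFormat.SF
    if is_single_block:
        return DirFormat.BLOCK
    if is_single_leaf:
        return DirFormat.LEAF
    return DirFormat.NODE


def lookup_handler(fmt):
    """Dispatch on the format in a single step (handler names are made up)."""
    handlers = {
        DirFormat.SF: "sf_lookup",
        DirFormat.BLOCK: "block_lookup",
        DirFormat.LEAF: "leaf_lookup",
        DirFormat.NODE: "node_lookup",
    }
    return handlers[fmt]
```

Each directory operation then needs one classification call followed by one dispatch, rather than a chain of separate format checks.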

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Thu, 25 Apr 2024 13:17:02 +0000 (15:17 +0200)]
xfs: factor out a xfs_dir_replace_args helper

Add a helper to switch between the different directory formats for
replacing a directory entry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Thu, 25 Apr 2024 13:17:01 +0000 (15:17 +0200)]
xfs: factor out a xfs_dir_removename_args helper

Add a helper to switch between the different directory formats for
removing a directory entry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Thu, 25 Apr 2024 13:17:00 +0000 (15:17 +0200)]
xfs: factor out a xfs_dir_createname_args helper

Add a helper to switch between the different directory formats for
creating a directory entry and to handle the XFS_DA_OP_JUSTCHECK flag
based on the passed in ino number field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
4 months ago
Christoph Hellwig [Thu, 25 Apr 2024 13:16:59 +0000 (15:16 +0200)]
xfs: factor out a xfs_dir_lookup_args helper

Add a helper to switch between the different directory formats for
lookup and to handle the -EEXIST return for a successful lookup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
5 months ago
Jiapeng Chong [Wed, 24 Apr 2024 02:06:38 +0000 (10:06 +0800)]
xfs: Remove unused function xrep_dir_self_parent

The function is defined in the dir_repair.c file, but not called
elsewhere, so delete the unused function.

fs/xfs/scrub/dir_repair.c:186:1: warning: unused function 'xrep_dir_self_parent'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8867
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
5 months ago
Chandan Babu R [Wed, 24 Apr 2024 07:01:13 +0000 (12:31 +0530)]
Merge tag 'repair-fixes-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC

xfs: minor fixes to online repair

Here are some miscellaneous bug fixes for the online repair code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'repair-fixes-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: invalidate dentries for a file before moving it to the orphanage
  xfs: exchange-range for repairs is no longer dynamic
  xfs: fix iunlock calls in xrep_adoption_trans_alloc
  xfs: drop the scrub file's iolock when transaction allocation fails

5 months ago
Chandan Babu R [Wed, 24 Apr 2024 06:57:33 +0000 (12:27 +0530)]
Merge tag 'reduce-scrub-iget-overhead-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC

xfs: reduce iget overhead in scrub

This patchset looks to reduce iget overhead in two ways: First, a
previous patch conditionally set DONTCACHE on inodes during xchk_irele
on the grounds that we knew better at irele time if an inode should be
dropped.  Unfortunately, over time that patch morphed into a call to
d_mark_dontcache, which resulted in inodes being dropped even if they
were referenced by the dcache.  This actually caused *more* recycle
overhead than if we'd simply called xfs_iget to set DONTCACHE only on
misses.

The second patch reduces the cost of untrusted iget for a vectored scrub
call by having the scrubv code maintain a separate refcount to the inode
so that the cache will always hit.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'reduce-scrub-iget-overhead-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: only iget the file once when doing vectored scrub-by-handle
  xfs: use dontcache for grabbing inodes during scrub

5 months ago
Chandan Babu R [Wed, 24 Apr 2024 06:52:07 +0000 (12:22 +0530)]
Merge tag 'vectorized-scrub-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC

xfs: vectorize scrub kernel calls

Create a vectorized version of the metadata scrub and repair ioctl, and
adapt xfs_scrub to use that.  This mitigates the impact of system call
overhead on xfs_scrub runtime.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'vectorized-scrub-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: introduce vectored scrub mode
  xfs: move xfs_ioc_scrub_metadata to scrub.c
  xfs: reduce the rate of cond_resched calls inside scrub

5 months ago
Chandan Babu R [Wed, 24 Apr 2024 06:47:51 +0000 (12:17 +0530)]
Merge tag 'scrub-directory-tree-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC

xfs: detect and correct directory tree problems

Historically, checking the tree-ness of the directory tree structure has
not been complete.  Cycles of subdirectories break the tree properties,
as do subdirectories with multiple parents.  It's easy enough for DFS to
detect problems as long as one of the participants is reachable from the
root, but this technique cannot find unconnected cycles.

Directory parent pointers change that, because we can discover all of
these problems from a simple walk from a subdirectory towards the root.
For each child we start with, if the walk terminates without reaching
the root, we know the path is disconnected and ought to be attached to
the lost and found.  If we find ourselves, we know this is a cycle and
can delete an incoming edge.  If we find multiple paths to the root, we
know to delete an incoming edge.

Even better, once we've finished walking paths, we've identified the
good ones and know which other path(s) to remove.
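
The upward walk described above can be modelled in Python. This is a minimal sketch, assuming the parent pointers are available as a dictionary from each directory to the list of directories that claim it as a child; the function name and the outcome labels are hypothetical.

```python
def walk_paths(parents, start, root):
    """Follow every parent-pointer path from start toward the root.

    Returns a sorted list with one outcome per path:
    "root" (reached the root), "disconnected" (ran out of parents),
    or "cycle" (came back to a directory already on this path).
    """
    outcomes = []
    stack = [(start, frozenset([start]))]
    while stack:
        node, seen = stack.pop()
        if node == root:
            outcomes.append("root")
            continue
        ups = parents.get(node, [])
        if not ups:
            outcomes.append("disconnected")
            continue
        for up in ups:
            if up in seen:
                outcomes.append("cycle")
            else:
                stack.append((up, seen | {up}))
    return sorted(outcomes)
```

In this model a healthy subdirectory yields exactly one "root" outcome; any "cycle" or "disconnected" outcome, or more than one "root", marks an edge that a repair would need to remove or a subtree that needs reattaching.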

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'scrub-directory-tree-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: fix corruptions in the directory tree
  xfs: report directory tree corruption in the health information
  xfs: invalidate dirloop scrub path data when concurrent updates happen
  xfs: teach online scrub to find directory tree structure problems

5 months ago
Chandan Babu R [Wed, 24 Apr 2024 06:43:09 +0000 (12:13 +0530)]
Merge tag 'repair-pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC

xfs: online repair for parent pointers

This series implements online repair for directory parent pointer
metadata.  The checking half is fairly straightforward -- for each
outgoing directory link (forward or backwards), grab the inode at the
other end, and confirm that there's a corresponding link.  If we can't
grab an inode or lock it, we'll save that link for a slower loop that
cycles all the locks, confirms the continued existence of the link, and
rechecks the link if it's actually still there.
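
The core of the checking half, confirming that every link has a counterpart at the other end, reduces to a set comparison. The Python sketch below uses hypothetical tuple layouts and does not model the locking or the slower deferred recheck loop.

```python
def unmatched_links(dirents, pptrs):
    """Compare forward links (dirents) against backward links (parent pointers).

    dirents: set of (parent_ino, name, child_ino) tuples.
    pptrs:   set of (child_ino, parent_ino, name) tuples.
    Returns (dirents without a parent pointer, parent pointers without a dirent),
    both expressed as (parent_ino, name, child_ino).
    """
    forward = set(dirents)
    backward = {(parent, name, child) for (child, parent, name) in pptrs}
    return forward - backward, backward - forward
```

Two empty sets mean the forward and backward links agree; anything left over is a link that the checker would flag.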

Repairs are a bit more involved -- for directories, we walk the entire
filesystem to rebuild the dirents from parent pointer information.
Parent pointer repairs do the same walk but rebuild the pptrs from the
dirent information, but with the added twist that it duplicates all the
xattrs so that it can use the atomic extent swapping code to commit the
repairs atomically.

This introduces an added twist to the xattr repair code -- we use dirent
hooks to detect a colliding update to the pptr data while we're not
holding the ILOCKs; if one is detected, we restart the xattr salvaging
process but this time hold all the ILOCKs until the end of the scan.

For offline repair, the phase6 directory connectivity scan generates an
index of all the expected parent pointers in the filesystem.  Then it
walks each file and compares the parent pointers attached to that file
against the index generated, and resyncs the results as necessary.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'repair-pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: inode repair should ensure there's an attr fork to store parent pointers
  xfs: repair link count of nondirectories after rebuilding parent pointers
  xfs: adapt the orphanage code to handle parent pointers
  xfs: actually rebuild the parent pointer xattrs
  xfs: add a per-leaf block callback to xchk_xattr_walk
  xfs: split xfs_bmap_add_attrfork into two pieces
  xfs: remove pointless unlocked assertion
  xfs: implement live updates for parent pointer repairs
  xfs: repair directory parent pointers by scanning for dirents
  xfs: replay unlocked parent pointer updates that accrue during xattr repair
  xfs: implement live updates for directory repairs
  xfs: repair directories by scanning directory parent pointers
  xfs: add raw parent pointer apis to support repair
  xfs: salvage parent pointers when rebuilding xattr structures
  xfs: make the reserved block permission flag explicit in xfs_attr_set
  xfs: remove some boilerplate from xfs_attr_set

5 months ago
Chandan Babu R [Wed, 24 Apr 2024 06:36:51 +0000 (12:06 +0530)]
Merge tag 'scrub-pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC

xfs: scrubbing for parent pointers

Teach online fsck to use parent pointers to assist in checking
directories, parent pointers, extended attributes, and link counts.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'scrub-pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: check parent pointer xattrs when scrubbing
  xfs: walk directory parent pointers to determine backref count
  xfs: deferred scrub of parent pointers
  xfs: scrub parent pointers
  xfs: deferred scrub of dirents
  xfs: check dirents have parent pointers
  xfs: revert commit 44af6c7e59b12

5 months ago
Chandan Babu R [Wed, 24 Apr 2024 06:24:37 +0000 (11:54 +0530)]
Merge tag 'pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC

xfs: Parent Pointers

This is the latest version of the parent pointer attributes for xfs.
The goal of this patch set is to add a parent pointer attribute to each
inode.  The attribute name contains the parent inode, generation, and
directory offset, while the attribute value contains the file name.
This feature will enable future optimizations for online scrub, shrink,
nfs handles, verity, or any other feature that could make use of quickly
deriving an inode's path from the mount point.

Directory parent pointers are stored as namespaced extended attributes
of a file.  Because parent pointers are an indivisible tuple of
(dirent_name, parent_ino, parent_gen) we cannot use the usual attr name
lookup functions to find a parent pointer.  This is solvable by
introducing a new lookup mode that checks both the name and the value of
the xattr.

Therefore, introduce this new name-value lookup mode that's gated on the
XFS_ATTR_PARENT namespace.  This requires the introduction of new
opcodes for the extended attribute update log intent items, which
actually means that parent pointers (themselves an INCOMPAT feature) do
not depend on the LOGGED_XATTRS log incompat feature bit.

To reduce collisions on the dirent names of parent pointers, introduce a
new attr hash mode that is the dir2 namehash of the dirent name xor'd
with the parent inode number.
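
As a rough illustration of the hash mode described above, the sketch
below xors a name hash with the parent inode number.  It is not the
kernel code: demo_name_hash is a simple rotate-and-xor stand-in for the
dir2 namehash, and folding the 64-bit inode number to 32 bits by xoring
its two halves is an assumption of this sketch.  All demo_* names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by s bits (0 < s < 32). */
static uint32_t demo_rol32(uint32_t v, unsigned int s)
{
	return (v << s) | (v >> (32 - s));
}

/* Stand-in name hash; NOT the real dir2 namehash. */
uint32_t demo_name_hash(const uint8_t *name, size_t len)
{
	uint32_t hash = 0;
	size_t i;

	for (i = 0; i < len; i++)
		hash = name[i] ^ demo_rol32(hash, 7);
	return hash;
}

/*
 * Mix the parent inode number into the name hash so that identical
 * names under different parents tend to land on different hash values.
 * Folding the inode number by xoring its halves is an assumption.
 */
uint32_t demo_pptr_hash(const uint8_t *name, size_t len, uint64_t parent_ino)
{
	return demo_name_hash(name, len) ^
	       (uint32_t)(parent_ino >> 32) ^ (uint32_t)parent_ino;
}
```

With this stand-in, two hardlinks named "a" under parents 128 and 129
hash differently even though their dirent names are identical.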

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'pptrs-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: enable parent pointers
  xfs: drop compatibility minimum log size computations for reflink
  xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res
  xfs: add a incompat feature bit for parent pointers
  xfs: don't remove the attr fork when parent pointers are enabled
  xfs: add parent pointer ioctls
  xfs: split out handle management helpers a bit
  xfs: move handle ioctl code to xfs_handle.c
  xfs: pass the attr value to put_listent when possible
  xfs: don't return XFS_ATTR_PARENT attributes via listxattr
  xfs: Add parent pointers to xfs_cross_rename
  xfs: Add parent pointers to rename
  xfs: remove parent pointers in unlink
  xfs: add parent attributes to symlink
  xfs: add parent attributes to link
  xfs: parent pointer attribute creation
  xfs: create a hashname function for parent pointers
  xfs: extend transaction reservations for parent attributes
  xfs: add parent pointer validator functions
  xfs: Expose init_xattrs in xfs_create_tmpfile
  xfs: record inode generation in xattr update log intent items
  xfs: create attr log item opcodes and formats for parent pointers
  xfs: refactor xfs_is_using_logged_xattrs checks in attr item recovery
  xfs: allow xattr matching on name and value for parent pointers
  xfs: define parent pointer ondisk extended attribute format
  xfs: add parent pointer support to attribute code
  xfs: create a separate hashname function for extended attributes
  xfs: move xfs_attr_defer_add to xfs_attr_item.c
  xfs: check the flags earlier in xfs_attr_match
  xfs: rearrange xfs_attr_match parameters

5 months agoMerge tag 'improve-attr-validation-6.10_2024-04-23' of https://git.kernel.org/pub...
Chandan Babu R [Wed, 24 Apr 2024 06:20:04 +0000 (11:50 +0530)]
Merge tag 'improve-attr-validation-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC

xfs: improve extended attribute validation

Prior to introducing parent pointer extended attributes, let's spend
some time cleaning up the attr code and strengthening the validation
that it performs on attrs coming in from the disk.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'improve-attr-validation-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: enforce one namespace per attribute
  xfs: refactor name/value iovec validation in xlog_recover_attri_commit_pass2
  xfs: refactor name/length checks in xfs_attri_validate
  xfs: use local variables for name and value length in _attri_commit_pass2
  xfs: always set args->value in xfs_attri_item_recover
  xfs: validate recovered name buffers when recovering xattr items
  xfs: use helpers to extract xattr op from opflags
  xfs: restructure xfs_attr_complete_op a bit
  xfs: check shortform attr entry flags specifically
  xfs: fix missing check for invalid attr flags
  xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2
  xfs: use an XFS_OPSTATE_ flag for detecting if logged xattrs are available
  xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recovery
  xfs: attr fork iext must be loaded before calling xfs_attr_is_leaf

5 months agoMerge tag 'shrink-dirattr-args-6.10_2024-04-23' of https://git.kernel.org/pub/scm...
Chandan Babu R [Wed, 24 Apr 2024 05:44:36 +0000 (11:14 +0530)]
Merge tag 'shrink-dirattr-args-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeC

xfs: shrink struct xfs_da_args

Let's clean out some unused flags and fields from struct xfs_da_args.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'shrink-dirattr-args-6.10_2024-04-23' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
  xfs: rearrange xfs_da_args a bit to use less space
  xfs: make attr removal an explicit operation
  xfs: remove xfs_da_args.attr_flags
  xfs: remove XFS_DA_OP_NOTIME
  xfs: remove XFS_DA_OP_REMOVE

5 months agoxfs: invalidate dentries for a file before moving it to the orphanage
Darrick J. Wong [Mon, 22 Apr 2024 16:48:30 +0000 (09:48 -0700)]
xfs: invalidate dentries for a file before moving it to the orphanage

Invalidate the cached dentries that point to the file that we're moving
to lost+found before we actually move it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: exchange-range for repairs is no longer dynamic
Darrick J. Wong [Mon, 22 Apr 2024 16:48:29 +0000 (09:48 -0700)]
xfs: exchange-range for repairs is no longer dynamic

The atomic file exchange-range functionality is now a permanent
filesystem feature instead of a dynamic log-incompat feature.  It cannot
be turned on at runtime, so we no longer need the XCHK_FSGATES flags and
whatnot that supported it.  Remove the flag and the enable function, and
move the xfs_has_exchange_range checks to the start of the repair
functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: fix iunlock calls in xrep_adoption_trans_alloc
Darrick J. Wong [Mon, 22 Apr 2024 16:48:28 +0000 (09:48 -0700)]
xfs: fix iunlock calls in xrep_adoption_trans_alloc

If the transaction allocation in xrep_adoption_trans_alloc fails, we
should drop only the locks that we took.  In this case this is
ILOCK_EXCL of both the orphanage and the file being repaired.  Dropping
any IOLOCK here is incorrect.

Found by fuzzing u3.sfdir3.list[1].name = zeroes in xfs/1546.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: only iget the file once when doing vectored scrub-by-handle
Darrick J. Wong [Mon, 22 Apr 2024 16:48:26 +0000 (09:48 -0700)]
xfs: only iget the file once when doing vectored scrub-by-handle

If a program wants us to perform a scrub on a file handle and the fd
passed to ioctl() is not the file referenced in the handle, iget the
file once and pass it into the scrub code.  This amortizes the untrusted
iget lookup over /all/ the scrubbers mentioned in the scrubv call.

When running fstests in "rebuild all metadata after each test" mode, I
observed a 10% reduction in runtime on account of avoiding repeated
inobt lookups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: introduce vectored scrub mode
Darrick J. Wong [Mon, 22 Apr 2024 16:48:25 +0000 (09:48 -0700)]
xfs: introduce vectored scrub mode

Introduce a variant on XFS_SCRUB_METADATA that allows for a vectored
mode.  The caller specifies the principal metadata object that they want
to scrub (allocation group, inode, etc.) once, followed by an array of
scrub types they want called on that object.  The kernel runs the scrub
operations and writes the output flags and errno code to the
corresponding array element.

A new pseudo scrub type BARRIER is introduced to force the kernel to
return to userspace if any corruptions have been found when scrubbing
the previous scrub types in the array.  This enables userspace to
schedule, for example, the sequence:

 1. data fork
 2. barrier
 3. directory

If the data fork scrub is clean, then the kernel will perform the
directory scrub.  If not, the barrier in 2 will exit back to userspace.

The alternative would have been an interface where userspace passes a
pointer to an empty buffer, and the kernel formats that with
xfs_scrub_vecs that tell userspace what it scrubbed and what the outcome
was.  With that the kernel would have to communicate that the buffer
needed to have been at least X size, even though for our cases
XFS_SCRUB_TYPE_NR + 2 would always be enough.

Compared to that, this design keeps all the dependency policy and
ordering logic in userspace where it already resides instead of
duplicating it in the kernel. The downside of that is that it needs the
barrier logic.

When running fstests in "rebuild all metadata after each test" mode, I
observed a 10% reduction in runtime due to fewer transitions across the
system call boundary.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
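
The barrier behaviour described in "xfs: introduce vectored scrub mode"
can be sketched in userspace C roughly as follows.  This is only a model
of the control flow: the demo_* types, the DEMO_FLAG_CORRUPT bit, the
callback signature, and the choice of -ECANCELED as the barrier's result
are assumptions of this sketch, not the kernel's ioctl structures.

```c
#include <errno.h>
#include <stddef.h>

enum demo_type { DEMO_DATA_FORK, DEMO_DIRECTORY, DEMO_BARRIER };

#define DEMO_FLAG_CORRUPT	0x1u

struct demo_vec {
	enum demo_type	type;	/* in: what to scrub */
	unsigned int	flags;	/* out: result flags */
	int		ret;	/* out: errno-style result */
};

typedef int (*demo_scrub_fn)(enum demo_type type, unsigned int *flags);

/*
 * Run each vector entry in order.  A barrier stops the run if any
 * earlier entry reported corruption.  Returns the index at which
 * processing stopped, or nr if every entry was processed.
 */
size_t demo_scrubv(struct demo_vec *vecs, size_t nr, demo_scrub_fn scrub)
{
	int saw_corruption = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		struct demo_vec *v = &vecs[i];

		if (v->type == DEMO_BARRIER) {
			if (saw_corruption) {
				v->ret = -ECANCELED;
				break;	/* back to the caller */
			}
			v->ret = 0;
			continue;
		}

		v->flags = 0;
		v->ret = scrub(v->type, &v->flags);
		if (v->flags & DEMO_FLAG_CORRUPT)
			saw_corruption = 1;
	}
	return i;
}

/* Example scrubber that finds nothing wrong. */
int demo_scrub_clean(enum demo_type type, unsigned int *flags)
{
	(void)type;
	*flags = 0;
	return 0;
}

/* Example scrubber that reports a corrupt data fork. */
int demo_scrub_bad_data_fork(enum demo_type type, unsigned int *flags)
{
	*flags = (type == DEMO_DATA_FORK) ? DEMO_FLAG_CORRUPT : 0;
	return 0;
}
```

Given the sequence data fork, barrier, directory, the directory entry is
only reached when the data fork scrub comes back clean; otherwise the
run stops at the barrier and the caller decides what to do next.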
5 months agoxfs: drop the scrub file's iolock when transaction allocation fails
Darrick J. Wong [Mon, 22 Apr 2024 16:48:27 +0000 (09:48 -0700)]
xfs: drop the scrub file's iolock when transaction allocation fails

If the transaction allocation in the !orphanage_available case of
xrep_nlinks_repair_inode fails, we need to drop the IOLOCK of the file
being scrubbed before exiting.

Found by fuzzing u3.sfdir3.list[1].name = zeroes in xfs/1546.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: use dontcache for grabbing inodes during scrub
Darrick J. Wong [Mon, 22 Apr 2024 16:48:26 +0000 (09:48 -0700)]
xfs: use dontcache for grabbing inodes during scrub

Back when I wrote commit a03297a0ca9f2, I had thought that we'd be doing
users a favor by only marking inodes dontcache at the end of a scrub
operation, and only if there's only one reference to that inode.  This
was more or less true back when I_DONTCACHE was an XFS iflag and the
only thing it did was change the outcome of xfs_fs_drop_inode to 1.

Note: If there are dentries pointing to the inode when scrub finishes,
the inode will have positive i_count and stay around in cache until
dentry reclaim.

But now we have d_mark_dontcache, which causes the inode *and* the
dentries attached to it all to be marked I_DONTCACHE, which means that
we drop the dentries ASAP, which drops the inode ASAP.

This is bad if scrub found problems with the inode, because now it can
be scheduled for inactivation, which can cause inodegc to trip on it and
shut down the filesystem.

Even if the inode isn't bad, this is still suboptimal because phases 3-7
each initiate inode scans.  Dropping the inode immediately during phase
3 is silly because phase 5 will reload it and drop it immediately, etc.
It's fine to mark the inodes dontcache, but if there have been accesses
to the file that set up dentries, we should keep them.

I validated this by setting up ftrace to capture xfs_iget_recycle*
tracepoints and ran xfs/285 for 30 seconds.  With current djwong-wtf I
saw ~30,000 recycle events.  I then dropped the d_mark_dontcache calls
and set XFS_IGET_DONTCACHE, and the recycle events dropped to ~5,000 per
30 seconds.

Therefore, grab the inode with XFS_IGET_DONTCACHE, which only has the
effect of setting I_DONTCACHE for cache misses.  Remove the
d_mark_dontcache call that can happen in xchk_irele.

Fixes: a03297a0ca9f2 ("xfs: manage inode DONTCACHE status at irele time")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: fix corruptions in the directory tree
Darrick J. Wong [Mon, 22 Apr 2024 16:48:22 +0000 (09:48 -0700)]
xfs: fix corruptions in the directory tree

Repair corruptions in the directory tree itself.  Cycles are broken by
removing an incoming parent->child link.  Multiply-owned directories are
fixed by pruning the extra parent -> child links.  Disconnected subtrees
are reconnected to the lost and found.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: move xfs_ioc_scrub_metadata to scrub.c
Darrick J. Wong [Mon, 22 Apr 2024 16:48:24 +0000 (09:48 -0700)]
xfs: move xfs_ioc_scrub_metadata to scrub.c

Move the scrub ioctl handler to scrub.c to keep the code together and to
reduce unnecessary code when CONFIG_XFS_ONLINE_SCRUB=n.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: report directory tree corruption in the health information
Darrick J. Wong [Mon, 22 Apr 2024 16:48:21 +0000 (09:48 -0700)]
xfs: report directory tree corruption in the health information

Report directories that are the source of corruption in the directory
tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: reduce the rate of cond_resched calls inside scrub
Darrick J. Wong [Mon, 22 Apr 2024 16:48:23 +0000 (09:48 -0700)]
xfs: reduce the rate of cond_resched calls inside scrub

We really don't want to call cond_resched every single time we go
through a loop in scrub -- there may be billions of records, and probing
into the scheduler itself has overhead.  Reduce this overhead by only
calling cond_resched 10x per second; and add a counter so that we only
check jiffies once every 1000 records or so.

Surprisingly, this reduces scrub-only fstests runtime by about 2%.  I
used the bmapinflate xfs_db command to produce a billion-extent file and
this stupid gadget reduced the scrub runtime by about 4%.

From a stupid microbenchmark of calling these things 1 billion times, I
estimate that cond_resched costs about 5.5ns per call; jiffies costs
about 0.3ns per read; and fatal_signal_pending costs about 0.4ns per
call.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
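
The two-level throttle described in "xfs: reduce the rate of
cond_resched calls inside scrub" might look something like the userspace
sketch below: a cheap counter gates the clock read, and the clock gates
the yield.  The demo_* names, the callback-based clock, and the fake
clock used for demonstration are all assumptions of this sketch; the
kernel code uses jiffies and cond_resched directly.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CHECK_EVERY	1000u	/* records between clock reads */

typedef uint64_t (*demo_clock_fn)(void);

struct demo_relax {
	unsigned int	countdown;	/* calls left before next clock read */
	uint64_t	next_yield;	/* tick at or after which we yield */
	uint64_t	interval;	/* ticks between yields */
};

/*
 * Call once per record.  Returns true when the caller should yield the
 * CPU.  The clock is read at most once per DEMO_CHECK_EVERY calls.
 */
bool demo_should_yield(struct demo_relax *r, demo_clock_fn now)
{
	uint64_t t;

	if (r->countdown > 0) {
		r->countdown--;
		return false;
	}
	r->countdown = DEMO_CHECK_EVERY - 1;

	t = now();
	if (t < r->next_yield)
		return false;

	r->next_yield = t + r->interval;
	return true;
}

/* Fake clock for demonstration; counts how often it is read. */
uint64_t demo_fake_now;
unsigned int demo_clock_reads;

uint64_t demo_fake_clock(void)
{
	demo_clock_reads++;
	return demo_fake_now;
}
```

Setting interval to one tenth of the clock frequency gives the "10x per
second" rate from the commit message, and the countdown keeps the
per-record cost to a decrement and a branch.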
5 months agoxfs: inode repair should ensure there's an attr fork to store parent pointers
Darrick J. Wong [Mon, 22 Apr 2024 16:48:19 +0000 (09:48 -0700)]
xfs: inode repair should ensure there's an attr fork to store parent pointers

The runtime parent pointer update code expects that any file being moved
around the directory tree already has an attr fork.  However, if we had
to rebuild an inode core record, there's a chance that we zeroed forkoff
as part of the inode to pass the iget verifiers.

Therefore, if we performed any repairs on an inode core, ensure that the
inode has a nonzero forkoff before unlocking the inode.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: invalidate dirloop scrub path data when concurrent updates happen
Darrick J. Wong [Mon, 22 Apr 2024 16:48:21 +0000 (09:48 -0700)]
xfs: invalidate dirloop scrub path data when concurrent updates happen

Add a dirent update hook so that we can detect directory tree updates
that affect any of the paths found by this scrubber and force it to
rescan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: repair link count of nondirectories after rebuilding parent pointers
Darrick J. Wong [Mon, 22 Apr 2024 16:48:18 +0000 (09:48 -0700)]
xfs: repair link count of nondirectories after rebuilding parent pointers

Since the parent pointer scrubber does not exhaustively search the
filesystem for missing parent pointers, it doesn't have a good way to
determine that there are pointers missing from an otherwise uncorrupt
xattr structure.  Instead, for nondirectories it employs a heuristic of
comparing the file link count to the number of parent pointers found.

However, we don't want this heuristic flagging a false corruption after
a repair has actually scanned the entire filesystem to rebuild the
parent pointers.  Therefore, reset the file link count in this one case
because we actually know the correct link count.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
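
The heuristic mentioned in "xfs: repair link count of nondirectories
after rebuilding parent pointers" amounts to a comparison like the one
sketched here.  The function name and parameters are hypothetical, and
directories are simply skipped because the commit message only describes
the nondirectory case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a nondirectory's link count disagrees with the number
 * of parent pointers found in its xattr structure.  A mismatch hints
 * that parent pointers are missing (or that the link count is wrong).
 */
bool demo_pptr_count_mismatch(bool is_dir, uint32_t nlink,
			      uint64_t pptrs_found)
{
	if (is_dir)
		return false;	/* not covered by this sketch */
	return (uint64_t)nlink != pptrs_found;
}
```

After a repair that scanned the whole filesystem, the parent pointer
count is authoritative, which is why the commit resets the link count
rather than letting this comparison flag a false corruption.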
5 months agoxfs: teach online scrub to find directory tree structure problems
Darrick J. Wong [Mon, 22 Apr 2024 16:48:20 +0000 (09:48 -0700)]
xfs: teach online scrub to find directory tree structure problems

Create a new scrubber that detects corruptions within the directory tree
structure itself.  It can detect directories with multiple parents;
loops within the directory tree; and directory loops not accessible from
the root.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: adapt the orphanage code to handle parent pointers
Darrick J. Wong [Mon, 22 Apr 2024 16:48:17 +0000 (09:48 -0700)]
xfs: adapt the orphanage code to handle parent pointers

Adapt the orphanage's adoption code to update the child file's parent
pointers as part of the reparenting process.  Also ensure that the child
has an attr fork to receive the parent pointer update, since the runtime
code assumes one exists.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: actually rebuild the parent pointer xattrs
Darrick J. Wong [Mon, 22 Apr 2024 16:48:16 +0000 (09:48 -0700)]
xfs: actually rebuild the parent pointer xattrs

Once we've assembled all the parent pointers for a file, we need to
commit the new dataset atomically to that file.  Parent pointer records
are embedded in the xattr structure, which means that we must write a
new extended attribute structure, again, atomically.  Therefore, we must
copy the non-parent-pointer attributes from the file being repaired into
the temporary file's extended attributes and then call the atomic extent
swap mechanism to exchange the blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: add a per-leaf block callback to xchk_xattr_walk
Darrick J. Wong [Mon, 22 Apr 2024 16:48:15 +0000 (09:48 -0700)]
xfs: add a per-leaf block callback to xchk_xattr_walk

Add a second callback function to xchk_xattr_walk so that we can do
something in between attr leaf blocks.  This will be used by the next
patch to see if we should flush cached parent pointer updates to
constrain memory usage.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: split xfs_bmap_add_attrfork into two pieces
Darrick J. Wong [Mon, 22 Apr 2024 16:48:15 +0000 (09:48 -0700)]
xfs: split xfs_bmap_add_attrfork into two pieces

Split this function into two pieces -- one to make the actual changes to
the inode core to add the attr fork, and another one to deal with
getting the transaction and locking the inodes.

The next couple of patches will need this to be split into two.  One
patch implements committing new parent pointer recordsets to damaged
files.  If one file has an attr fork and the other does not, we have to
create the missing attr fork before the atomic swap transaction, and can
use the behavior encoded in the current xfs_bmap_add_attrfork.

The second patch adapts /lost+found adoptions to handle parent pointers
correctly.  The adoption process will add a parent pointer to a child
that is being moved to /lost+found, but this requires that the attr fork
already exists.  We don't know if we're actually going to commit the
adoption until we've already reserved a transaction and taken the
ILOCKs, which means that we must have a way to bypass the start of the
current xfs_bmap_add_attrfork.

Therefore, create xfs_attr_add_fork as the helper that creates a
transaction and takes locks; and make xfs_bmap_add_attrfork the function
that updates the inode core and allocates the incore attr fork.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: remove pointless unlocked assertion
Darrick J. Wong [Mon, 22 Apr 2024 16:48:14 +0000 (09:48 -0700)]
xfs: remove pointless unlocked assertion

Remove this assertion about the inode not having an attr fork from
xfs_bmap_add_attrfork because the function handles that case just fine.
Weirder still, the function actually /requires/ the caller not to hold
the ILOCK, which means that its accesses are not stabilized.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: implement live updates for parent pointer repairs
Darrick J. Wong [Mon, 22 Apr 2024 16:48:13 +0000 (09:48 -0700)]
xfs: implement live updates for parent pointer repairs

While we're scanning the filesystem for dirents that we can turn into
parent pointers, we cannot hold the IOLOCK or ILOCK of the file being
repaired.  Therefore, we need to set up a dirent hook so that we can
keep the temporary file's parent pointers up to date with the rest of
the filesystem.  Hence we add the ability to *remove* pptrs from the
temporary file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: repair directory parent pointers by scanning for dirents
Darrick J. Wong [Mon, 22 Apr 2024 16:48:12 +0000 (09:48 -0700)]
xfs: repair directory parent pointers by scanning for dirents

If parent pointers are enabled on the filesystem, we can repair the
entire dataset by walking the directories of the filesystem looking for
dirents that we can turn into parent pointers.  Once we have a full
incore dataset, we'll figure out what to do with it, but that's for a
subsequent patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: replay unlocked parent pointer updates that accrue during xattr repair
Darrick J. Wong [Mon, 22 Apr 2024 16:48:11 +0000 (09:48 -0700)]
xfs: replay unlocked parent pointer updates that accrue during xattr repair

There are a few places where the extended attribute repair code drops
the ILOCK to apply stashed xattrs to the temporary file.  Although
setxattr and removexattr are still locked out because we retain our hold
on the IOLOCK, this doesn't prevent renames from updating parent
pointers, because the VFS doesn't take i_rwsem on children that are
being moved.

Therefore, set up a dirent hook to capture parent pointer updates for
this file, and replay(?) the updates.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: implement live updates for directory repairs
Darrick J. Wong [Mon, 22 Apr 2024 16:48:10 +0000 (09:48 -0700)]
xfs: implement live updates for directory repairs

While we're scanning the filesystem for parent pointers that we can turn
into dirents, we cannot hold the IOLOCK or ILOCK of the directory being
repaired.  Therefore, we need to set up a dirent hook so that we can
keep the temporary directory up to date with the rest of the filesystem.
Hence we add the ability to *remove* entries from the temporary dir.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: repair directories by scanning directory parent pointers
Darrick J. Wong [Mon, 22 Apr 2024 16:48:10 +0000 (09:48 -0700)]
xfs: repair directories by scanning directory parent pointers

For filesystems with parent pointers, scan the entire filesystem looking
for parent pointers that target the directory we're rebuilding instead
of trying to salvage whatever we can from the directory data blocks.
This will be more robust than salvaging, but there's more code to come.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: add raw parent pointer apis to support repair
Darrick J. Wong [Mon, 22 Apr 2024 16:48:09 +0000 (09:48 -0700)]
xfs: add raw parent pointer apis to support repair

Add a couple of utility functions to set or remove parent pointers from
a file.  These functions will be used by repair code, hence they skip
the xattr logging that regular parent pointer updates use.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: check parent pointer xattrs when scrubbing
Darrick J. Wong [Mon, 22 Apr 2024 16:48:05 +0000 (09:48 -0700)]
xfs: check parent pointer xattrs when scrubbing

Check parent pointer xattrs as part of scrubbing xattrs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: salvage parent pointers when rebuilding xattr structures
Darrick J. Wong [Mon, 22 Apr 2024 16:48:08 +0000 (09:48 -0700)]
xfs: salvage parent pointers when rebuilding xattr structures

When we're salvaging extended attributes, make sure we validate the ones
that claim to be parent pointers before adding them to the salvage pile.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: walk directory parent pointers to determine backref count
Darrick J. Wong [Mon, 22 Apr 2024 16:48:05 +0000 (09:48 -0700)]
xfs: walk directory parent pointers to determine backref count

If the filesystem has parent pointers enabled, walk the parent pointers
of subdirectories to determine the true backref count.  In theory each
subdir should have a single parent reachable via dotdot, but in the case
of (corrupt) subdirs with multiple parents, we need to keep the link
counts high enough that the directory loop detector will be able to
correct the multiple parents problems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: make the reserved block permission flag explicit in xfs_attr_set
Darrick J. Wong [Mon, 22 Apr 2024 16:48:07 +0000 (09:48 -0700)]
xfs: make the reserved block permission flag explicit in xfs_attr_set

Make the use of reserved blocks an explicit parameter to xfs_attr_set.
Userspace setting XFS_ATTR_ROOT attrs should continue to be able to use
it, but for online repairs we can back out and therefore do not care.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: remove some boilerplate from xfs_attr_set
Darrick J. Wong [Mon, 22 Apr 2024 16:48:06 +0000 (09:48 -0700)]
xfs: remove some boilerplate from xfs_attr_set

In preparation for online/offline repair wanting to use xfs_attr_set,
move some of the boilerplate out of this function into the callers.
Repair can initialize the da_args completely, and the userspace flag
handling/twisting goes away once we move it to xfs_attr_change.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: deferred scrub of parent pointers
Darrick J. Wong [Mon, 22 Apr 2024 16:48:04 +0000 (09:48 -0700)]
xfs: deferred scrub of parent pointers

If the trylock-based dirent check fails, retain those parent pointers
and check them at the end.  This may involve dropping the locks on the
file being scanned, so yay.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: scrub parent pointers
Darrick J. Wong [Mon, 22 Apr 2024 16:48:03 +0000 (09:48 -0700)]
xfs: scrub parent pointers

Actually check parent pointers now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: deferred scrub of dirents
Darrick J. Wong [Mon, 22 Apr 2024 16:48:02 +0000 (09:48 -0700)]
xfs: deferred scrub of dirents

If the trylock-based parent pointer check fails, retain those dirents
and check them at the end.  This may involve dropping the locks on the
file being scanned, so yay.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: check dirents have parent pointers
Darrick J. Wong [Mon, 22 Apr 2024 16:48:01 +0000 (09:48 -0700)]
xfs: check dirents have parent pointers

If the fs has parent pointers, we need to check that each child dirent
points to a file that has a parent pointer pointing back at us.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: enable parent pointers
Darrick J. Wong [Mon, 22 Apr 2024 16:48:00 +0000 (09:48 -0700)]
xfs: enable parent pointers

Add parent pointers to the list of supported features.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: revert commit 44af6c7e59b12
Darrick J. Wong [Mon, 22 Apr 2024 16:48:00 +0000 (09:48 -0700)]
xfs: revert commit 44af6c7e59b12

In my haste to fix what I thought was a performance problem in the attr
scrub code, I neglected to notice that the xfs_attr_get_ilocked call also had
the effect of checking that attributes can actually be looked up through
the attr dabtree.  Fix this.

Fixes: 44af6c7e59b12 ("xfs: don't load local xattr values during scrub")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: drop compatibility minimum log size computations for reflink
Darrick J. Wong [Mon, 22 Apr 2024 16:47:59 +0000 (09:47 -0700)]
xfs: drop compatibility minimum log size computations for reflink

Let's also drop the oversized minimum log computations for reflink and
rmap that were the result of bugs introduced many years ago.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res
Darrick J. Wong [Mon, 22 Apr 2024 16:47:58 +0000 (09:47 -0700)]
xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res

Dave and I were discussing some recent test regressions as a result of
me turning on nrext64=1 on realtime filesystems, when we noticed that
the minimum log size of a 32M filesystem jumped from 954 blocks to 4287
blocks.

Digging through xfs_log_calc_max_attrsetm_res, Dave noticed that @size
contains the maximum estimated amount of space needed for a local format
xattr, in bytes, but we feed this quantity to XFS_NEXTENTADD_SPACE_RES,
which requires units of blocks.  This has resulted in an overestimation
of the minimum log size over the years.

We should nominally correct this, but there's a backwards compatibility
problem -- if we enable it now, the minimum log size will decrease.  If
a corrected mkfs formats a filesystem with this new smaller log size, a
user will encounter mount failures on an uncorrected kernel due to the
larger minimum log size computations there.

Therefore, turn this on for parent pointers because it wasn't merged at
all upstream when this issue was discovered.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
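
To make the unit mismatch in "xfs: fix unit conversion error in
xfs_log_calc_max_attrsetm_res" concrete, the sketch below shows a
byte-to-block round-up.  demo_bytes_to_blocks is a hypothetical helper
for illustration only; it is not the kernel's conversion macro, and the
real reservation formula is not reproduced here.

```c
#include <stdint.h>

/*
 * Round a byte count up to whole filesystem blocks.  blocksize must be
 * nonzero.
 */
uint64_t demo_bytes_to_blocks(uint64_t bytes, uint32_t blocksize)
{
	return (bytes + blocksize - 1) / blocksize;
}
```

With 4096-byte blocks, a 65536-byte estimate is 16 blocks.  Feeding the
raw value 65536 to a formula that expects blocks overstates the input by
a factor of 4096, which is the kind of overestimation the commit
describes.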
5 months agoxfs: add a incompat feature bit for parent pointers
Allison Henderson [Mon, 22 Apr 2024 16:47:57 +0000 (09:47 -0700)]
xfs: add a incompat feature bit for parent pointers

Create an incompat feature bit and a fs geometry flag so that we can
enable the feature in the ondisk superblock and advertise its existence
to userspace.

Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
5 months agoxfs: don't remove the attr fork when parent pointers are enabled
Allison Henderson [Mon, 22 Apr 2024 16:47:56 +0000 (09:47 -0700)]
xfs: don't remove the attr fork when parent pointers are enabled

When an inode is removed, it may also cause the attribute fork to be
removed if it is the last attribute. This transaction gets flushed to
the log, but if the system goes down before we could inactivate the symlink,
the log recovery tries to inactivate this inode (since it is on the unlinked
list) but the verifier trips over the remote value and leaks it.

Hence we ended up with a file in this odd state on a "clean" mount.  The
"obvious" fix is to prohibit erasure of the attr fork to avoid tripping
over the verifiers when pptrs are enabled.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: add parent pointer ioctls
Darrick J. Wong [Mon, 22 Apr 2024 16:47:55 +0000 (09:47 -0700)]
xfs: add parent pointer ioctls

This patch adds a pair of new file ioctls to retrieve the parent pointer
of a given inode.  They both return the same results, but one operates
on the file descriptor passed to ioctl() whereas the other allows the
caller to specify a file handle for which the caller wants results.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: split out handle management helpers a bit
Darrick J. Wong [Mon, 22 Apr 2024 16:47:55 +0000 (09:47 -0700)]
xfs: split out handle management helpers a bit

Split out the functions that generate file/fs handles and map them back
into dentries in preparation for the GETPARENTS ioctl next.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: move handle ioctl code to xfs_handle.c
Darrick J. Wong [Mon, 22 Apr 2024 16:47:54 +0000 (09:47 -0700)]
xfs: move handle ioctl code to xfs_handle.c

Move the handle management code (and the attrmulti code that uses it) to
xfs_handle.c.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: pass the attr value to put_listent when possible
Allison Henderson [Mon, 22 Apr 2024 16:47:53 +0000 (09:47 -0700)]
xfs: pass the attr value to put_listent when possible

Pass the attr value to put_listent when we have local xattrs or
shortform xattrs.  This will enable the GETPARENTS ioctl to use
xfs_attr_list as its backend.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: don't return XFS_ATTR_PARENT attributes via listxattr
Allison Henderson [Mon, 22 Apr 2024 16:47:52 +0000 (09:47 -0700)]
xfs: don't return XFS_ATTR_PARENT attributes via listxattr

Parent pointers are internal filesystem metadata.  They're not intended
to be directly visible to userspace, so filter them out of
xfs_xattr_put_listent so that they don't appear in listxattr.
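
The filtering described above can be sketched roughly as follows.  This is
an illustration only, not the kernel code: the flag name and value, the
attr_entry structure, and list_visible() are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical namespace flag; the value is made up for this sketch. */
#define SKETCH_ATTR_PARENT	(1u << 0)

/* Hypothetical listing entry; not an XFS structure. */
struct attr_entry {
	const char	*name;
	unsigned int	flags;
};

/*
 * Copy the names that userspace may see into out[] and return how many
 * were copied.  Entries flagged as parent pointers are skipped.
 */
static size_t list_visible(const struct attr_entry *ents, size_t nr,
			   const char **out, size_t out_max)
{
	size_t n = 0;

	assert(ents != NULL && out != NULL);
	for (size_t i = 0; i < nr && n < out_max; i++) {
		if (ents[i].flags & SKETCH_ATTR_PARENT)
			continue;
		out[n++] = ents[i].name;
	}
	return n;
}
```

Given three entries of which the middle one carries the parent flag, the
sketch returns two names and the caller never sees the flagged one.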

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Inspired-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: change this to XFS_ATTR_PRIVATE_NSP_MASK per fsverity patchset]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: Add parent pointers to xfs_cross_rename
Allison Henderson [Mon, 22 Apr 2024 16:47:51 +0000 (09:47 -0700)]
xfs: Add parent pointers to xfs_cross_rename

Cross renames are handled separately from standard renames, and
need different handling to update the parent attributes correctly.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: Add parent pointers to rename
Allison Henderson [Mon, 22 Apr 2024 16:47:50 +0000 (09:47 -0700)]
xfs: Add parent pointers to rename

This patch removes the old parent pointer attribute during the rename
operation, and re-adds the updated parent pointer.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: adjust to new ondisk format]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: remove parent pointers in unlink
Allison Henderson [Mon, 22 Apr 2024 16:47:49 +0000 (09:47 -0700)]
xfs: remove parent pointers in unlink

This patch removes the parent pointer attribute during unlink.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: adjust to new ondisk format, minor rebase fixes]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: add parent attributes to symlink
Allison Henderson [Mon, 22 Apr 2024 16:47:49 +0000 (09:47 -0700)]
xfs: add parent attributes to symlink

This patch modifies xfs_symlink to add a parent pointer to the inode.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor rebase fixups]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: add parent attributes to link
Allison Henderson [Mon, 22 Apr 2024 16:47:48 +0000 (09:47 -0700)]
xfs: add parent attributes to link

This patch modifies xfs_link to add a parent pointer to the inode.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor rebase fixes]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: parent pointer attribute creation
Allison Henderson [Mon, 22 Apr 2024 16:47:47 +0000 (09:47 -0700)]
xfs: parent pointer attribute creation

Add parent pointer attribute during xfs_create, and subroutines to
initialize attributes.  Note that the xfs_attr_intent object contains a
pointer to the caller's xfs_da_args object, so the latter must persist
until transaction commit.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: shorten names, adjust to new format, set init_xattrs for parent
pointers]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: create a hashname function for parent pointers
Darrick J. Wong [Mon, 22 Apr 2024 16:47:46 +0000 (09:47 -0700)]
xfs: create a hashname function for parent pointers

Although directory entry and parent pointer recordsets look very similar
(name -> ino), there's one major difference between them: a file can be
hardlinked from multiple parent directories with the same filename.
This is common in shared container environments where a base directory
tree might be hardlink-copied multiple times.  IOWs the same 'ls'
program might be hardlinked to multiple /srv/*/bin/ls paths.

We don't want parent pointer operations to bog down on hash collisions
between the same dirent name, so create a special hash function that
mixes in the parent directory inode number.
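
A minimal sketch of the idea, under stated assumptions: name_hash() is a
stand-in (32-bit FNV-1a), not the XFS dirent hash, and folding both halves
of the inode number in with XOR is just one plausible way to "mix in" the
parent.  The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in name hash (32-bit FNV-1a); NOT the XFS dirent hash. */
static uint32_t name_hash(const unsigned char *name, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= name[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Hypothetical parent pointer hash: start from the name hash and fold
 * both halves of the 64-bit parent inode number into it.
 */
static uint32_t pptr_hash(const unsigned char *name, size_t len,
			  uint64_t parent_ino)
{
	assert(name != NULL);
	return name_hash(name, len) ^
	       (uint32_t)(parent_ino >> 32) ^
	       (uint32_t)parent_ino;
}
```

With this mix, two "ls" entries under parents whose inode numbers differ
in their low 32 bits hash to different values, so lookups for one parent
do not have to walk past every other hardlink of the same name.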

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: extend transaction reservations for parent attributes
Allison Henderson [Mon, 22 Apr 2024 16:47:45 +0000 (09:47 -0700)]
xfs: extend transaction reservations for parent attributes

We need to add, remove or modify parent pointer attributes during
create/link/unlink/rename operations atomically with the dirents in the
parent directories being modified. This means they need to be modified
in the same transaction as the parent directories, and so we need to add
the required space for the attribute modifications to the transaction
reservations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix indenting errors, adjust for new log format]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: add parent pointer validator functions
Allison Henderson [Mon, 22 Apr 2024 16:47:44 +0000 (09:47 -0700)]
xfs: add parent pointer validator functions

The attr name of a parent pointer is a string, and the attr value of a
parent pointer is (more or less) a file handle.  So we need to modify
attr_namecheck to verify the parent pointer name, and add a
xfs_parent_valuecheck function to sanitize the handle.  At the same
time, we need to validate attr values during log recovery if the xattr
is really a parent pointer.
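
The two checks might look something like the sketch below.  Everything in
it is an assumption made for illustration: the pptr_value layout, the
255-byte name limit, and the specific rules (no '/' or NUL in the name,
exact value size, nonzero inode number) are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handle layout: parent inode number plus generation. */
struct pptr_value {
	uint64_t	parent_ino;
	uint32_t	parent_gen;
};

#define SKETCH_MAXNAMELEN	255	/* assumed dirent name limit */

/* A parent pointer name must look like a dirent name. */
static bool pptr_namecheck(const char *name, size_t len)
{
	if (name == NULL || len == 0 || len > SKETCH_MAXNAMELEN)
		return false;
	for (size_t i = 0; i < len; i++) {
		if (name[i] == '/' || name[i] == '\0')
			return false;
	}
	return true;
}

/* The value must be exactly one handle and name a nonzero inode. */
static bool pptr_valuecheck(const struct pptr_value *val, size_t len)
{
	if (val == NULL || len != sizeof(*val))
		return false;
	return val->parent_ino != 0;
}
```

Log recovery would run both checks on a recovered attr item before
replaying it, and treat a failure of either one as corruption.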

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: move functions to xfs_parent.c, adjust for new disk format]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: Expose init_xattrs in xfs_create_tmpfile
Allison Henderson [Mon, 22 Apr 2024 16:47:43 +0000 (09:47 -0700)]
xfs: Expose init_xattrs in xfs_create_tmpfile

Tmp files are used as part of rename operations and will need attr forks
initialized for parent pointers.  Expose the init_xattrs parameter to
the calling function to initialize the fork.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: record inode generation in xattr update log intent items
Darrick J. Wong [Mon, 22 Apr 2024 16:47:43 +0000 (09:47 -0700)]
xfs: record inode generation in xattr update log intent items

For parent pointer updates, record the i_generation of the file that is
being updated so that we don't accidentally jump generations.
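
As a rough illustration of why the generation matters: an inode number
can be reused after the original file is freed, so a recovered intent
should only be applied if both the number and the generation still match.
The structures and function below are hypothetical and do not reflect the
ondisk log format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical logged intent: which file the update was meant for. */
struct sketch_intent {
	uint64_t	ino;
	uint32_t	gen;
};

/* Hypothetical in-core inode identity. */
struct sketch_inode {
	uint64_t	ino;
	uint32_t	gen;
};

/* True only if the inode is the same incarnation the intent recorded. */
static bool intent_targets_inode(const struct sketch_intent *it,
				 const struct sketch_inode *ip)
{
	assert(it != NULL && ip != NULL);
	return it->ino == ip->ino && it->gen == ip->gen;
}
```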

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: create attr log item opcodes and formats for parent pointers
Darrick J. Wong [Mon, 22 Apr 2024 16:47:42 +0000 (09:47 -0700)]
xfs: create attr log item opcodes and formats for parent pointers

Make the necessary alterations to the extended attribute log intent item
ondisk format so that we can log parent pointer operations.  This
requires the creation of new opcodes specific to parent pointers, and a
new four-argument replace operation to handle renames.  At this point
this part of the patchset has changed so much from what Allison originally
wrote that I no longer think her SoB applies.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: refactor xfs_is_using_logged_xattrs checks in attr item recovery
Darrick J. Wong [Mon, 22 Apr 2024 16:47:41 +0000 (09:47 -0700)]
xfs: refactor xfs_is_using_logged_xattrs checks in attr item recovery

Move this feature check down to the per-op checks so that we can ensure
that we never see parent pointer attr items on non-pptr filesystems, and
that logged xattrs are turned on for non-pptr attr items.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
5 months agoxfs: allow xattr matching on name and value for parent pointers
Darrick J. Wong [Mon, 22 Apr 2024 16:47:40 +0000 (09:47 -0700)]
xfs: allow xattr matching on name and value for parent pointers

If a file is hardlinked with the same name but from multiple parents,
the parent pointers will all have the same dirent name (== attr name)
but with different parent_ino/parent_gen values.  To disambiguate, we
need to be able to match on both the attr name and the attr value.  This
is in contrast to regular xattrs, which are matched only on name.

Therefore, plumb in the ability to match shortform and local attrs on
name and value in the XFS_ATTR_PARENT namespace.  Parent pointer attr
values are never large enough to be stored in a remote attr, so we
can reject these cases as corruption.
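
The matching rule can be sketched as below.  This is a simplified model,
not the kernel implementation: attr_rec is a made-up in-memory record
rather than the shortform or leaf ondisk layout, and attr_match() is a
hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory attr record; not the XFS ondisk layout. */
struct attr_rec {
	bool		is_parent;	/* in the parent pointer namespace */
	const void	*name;
	size_t		namelen;
	const void	*value;
	size_t		valuelen;
};

/* Compare one (pointer, length) pair against another. */
static bool buf_eq(const void *a, size_t alen, const void *b, size_t blen)
{
	return alen == blen && (alen == 0 || memcmp(a, b, alen) == 0);
}

/*
 * Regular attrs match on name alone; parent pointer attrs must also
 * match on value, because several of them can share one name.
 */
static bool attr_match(const struct attr_rec *rec, const struct attr_rec *key)
{
	assert(rec != NULL && key != NULL);
	if (rec->is_parent != key->is_parent)
		return false;
	if (!buf_eq(rec->name, rec->namelen, key->name, key->namelen))
		return false;
	if (!rec->is_parent)
		return true;
	return buf_eq(rec->value, rec->valuelen, key->value, key->valuelen);
}
```

Two parent pointers named "ls" with different values are distinct keys in
this model, while two regular attrs named "ls" match regardless of value.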

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>