bcachefs: Runtime self healing for keys for deleted snapshots
author: Kent Overstreet <kent.overstreet@linux.dev>
Wed, 28 May 2025 02:20:27 +0000 (22:20 -0400)
committer: Kent Overstreet <kent.overstreet@linux.dev>
Sun, 1 Jun 2025 02:03:17 +0000 (22:03 -0400)
If snapshot deletion incorrectly misses some keys and leaves keys for
deleted snapshots behind, that causes a problem for data move: we
can't move an extent for a nonexistent snapshot, because the extent
might have to be fragmented, and maintaining correct visibility in child
snapshots doesn't work if the extent's snapshot doesn't exist.

Previously we'd just skip these keys, but it turns out that causes
copygc to spin.

So we need runtime self healing, i.e. calling
bch2_check_key_has_snapshot() from the data move path.

Snapshot deletion v2 included sentinel values for deleted snapshot
nodes, so this is quite safe.
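
The resulting control flow in the data move path can be sketched as a
small standalone model. This is only an illustration, not bcachefs code:
snapshot_check_model(), ERR_DONE_NO_SNAPSHOT and its value are
hypothetical names, and the results of the snapshot check and the
transaction commit are passed in as plain integers instead of being
computed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for BCH_ERR_data_update_done_no_snapshot. */
#define ERR_DONE_NO_SNAPSHOT 1000

/*
 * Model of the decision made before a data update is set up.
 *
 * snapshot_id: snapshot field of the key (0 means the key has none)
 * in_recovery: filesystem is still running recovery passes
 * id_is_empty: no snapshot node and no "deleted" sentinel for this ID
 * check_ret:   assumed result of the key check:
 *              < 0 error, 0 snapshot is fine, > 0 key was deleted
 * commit_ret:  assumed result of committing the deletion (0 on success)
 *
 * Returns 0 to proceed with the move, or a negative code to stop.
 */
static int snapshot_check_model(unsigned snapshot_id, bool in_recovery,
                                bool id_is_empty, int check_ret,
                                int commit_ret)
{
    if (!snapshot_id)
        return 0;                       /* nothing to verify */

    /* During recovery, leave missing snapshots for repair to handle. */
    if (in_recovery && id_is_empty)
        return -ERR_DONE_NO_SNAPSHOT;

    if (check_ret < 0)
        return check_ret;               /* propagate the error */

    if (check_ret > 0)                  /* stale key was deleted */
        return commit_ret ? commit_ret : -ERR_DONE_NO_SNAPSHOT;

    return 0;                           /* snapshot exists: move the data */
}
```

The point of the model is that the stale key is now removed instead of
skipped, so a caller such as copygc should not see the same unmovable
key again on its next pass.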

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
fs/bcachefs/data_update.c

index 63a10ea83c97432ec2774b32ee4e1f7ac0bf308e..fafe7a57ea4161b4f6558e48e762c4f698c101a6 100644
@@ -821,13 +821,24 @@ int bch2_data_update_init(struct btree_trans *trans,
        struct bch_fs *c = trans->c;
        int ret = 0;
 
-       /*
-        * fs is corrupt  we have a key for a snapshot node that doesn't exist,
-        * and we have to check for this because we go rw before repairing the
-        * snapshots table - just skip it, we can move it later.
-        */
-       if (unlikely(k.k->p.snapshot && !bch2_snapshot_exists(c, k.k->p.snapshot)))
-               return -BCH_ERR_data_update_done_no_snapshot;
+       if (k.k->p.snapshot) {
+               /*
+                * We'll go ERO if we see a key for a missing snapshot, and if
+                * we're still in recovery we want to give that a chance to
+                * repair:
+                */
+               if (unlikely(test_bit(BCH_FS_in_recovery, &c->flags) &&
+                            bch2_snapshot_id_state(c, k.k->p.snapshot) == SNAPSHOT_ID_empty))
+                       return -BCH_ERR_data_update_done_no_snapshot;
+
+               ret = bch2_check_key_has_snapshot(trans, iter, k);
+               if (ret < 0)
+                       return ret;
+               if (ret) /* key was deleted */
+                       return bch2_trans_commit(trans, NULL, NULL, BCH_TRANS_COMMIT_no_enospc) ?:
+                               -BCH_ERR_data_update_done_no_snapshot;
+               ret = 0;
+       }
 
        bch2_bkey_buf_init(&m->k);
        bch2_bkey_buf_reassemble(&m->k, c, k);