blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_pol...
authorYu Kuai <yukuai3@huawei.com>
Thu, 19 Jan 2023 11:03:50 +0000 (19:03 +0800)
committerJens Axboe <axboe@kernel.dk>
Sun, 29 Jan 2023 22:19:04 +0000 (15:19 -0700)
commitf1c006f1c6850c14040f8337753a63119bba39b9
tree1bd458198e20174933ee162f16f315a0d82e339e
parentdfd6200a095440b663099d8d42f1efb0175a1ce3
blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()

Currently parent pd can be freed before child pd:

t1: remove cgroup C1
blkcg_destroy_blkgs
 blkg_destroy
  list_del_init(&blkg->q_node)
  // remove blkg from queue list
  percpu_ref_kill(&blkg->refcnt)
   blkg_release
    call_rcu

t2: from t1
__blkg_release
 blkg_free
  schedule_work
t4: deactivate policy
blkcg_deactivate_policy
 pd_free_fn
 // parent of C1 is freed first
t3: from t2
 blkg_free_workfn
  pd_free_fn

If policy(for example, ioc_timer_fn() from iocost) access parent pd from
child pd after pd_offline_fn(), then UAF can be triggered.

Fix the problem by delaying 'list_del_init(&blkg->q_node)' from
blkg_destroy() to blkg_free_workfn(), and using a new disk level mutex to
synchronize blkg_free_workfn() and blkcg_deactivate_policy().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230119110350.2287325-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
block/blk-cgroup.c
include/linux/blkdev.h