.. _split_page_table_lock:

=====================
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of the
mm_struct. But this approach leads to poor page fault scalability of
multi-threaded applications due to high contention on the lock. To improve
scalability, the split page table lock was introduced.

With the split page table lock we have a separate per-table lock to serialize
access to the table. At the moment we use the split lock for PTE and PMD
tables. Access to higher-level tables is protected by mm->page_table_lock.

There are helpers to lock/unlock a table and other accessor functions:

 - pte_offset_map_lock()
	maps pte and takes PTE table lock, returns pointer to the taken
	lock;
 - pte_unmap_unlock()
	unlocks and unmaps PTE table;
 - pte_alloc_map_lock()
	allocates PTE table if needed and takes the lock, returns pointer
	to taken lock or NULL if allocation failed;
 - pte_lockptr()
	returns pointer to PTE table lock;
 - pmd_lock()
	takes PMD table lock, returns pointer to taken lock;
 - pmd_lockptr()
	returns pointer to PMD table lock;

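As a rough userspace illustration of how the lock and unlock helpers pair
up, the sketch below models one lock per table with a C11 atomic_flag. Every
name in it (model_table, model_offset_map_lock() and so on) is hypothetical
and not kernel API, and handing the lock back through an output argument is
a choice made for this model only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_ENTRIES 8	/* hypothetical table size, far smaller than a real table */

/* Hypothetical stand-in for one page table and the lock that guards it. */
struct model_table {
	atomic_flag lock;			/* per-table "split" lock */
	unsigned long entry[MODEL_ENTRIES];
};

static void model_table_init(struct model_table *t)
{
	size_t i;

	atomic_flag_clear(&t->lock);
	for (i = 0; i < MODEL_ENTRIES; i++)
		t->entry[i] = 0;
}

/* Find an entry, take the lock of its table, report the lock via *ptlp. */
static unsigned long *model_offset_map_lock(struct model_table *t, size_t idx,
					    atomic_flag **ptlp)
{
	assert(idx < MODEL_ENTRIES);
	*ptlp = &t->lock;
	while (atomic_flag_test_and_set_explicit(*ptlp, memory_order_acquire))
		;	/* spin until the current holder releases the lock */
	return &t->entry[idx];
}

/* Release the table lock; the model has nothing to unmap. */
static void model_unmap_unlock(unsigned long *entry, atomic_flag *ptl)
{
	(void)entry;
	atomic_flag_clear_explicit(ptl, memory_order_release);
}

/* Update one entry under its table lock and return the stored value. */
static unsigned long model_set_entry(struct model_table *t, size_t idx,
				     unsigned long val)
{
	atomic_flag *ptl;
	unsigned long *e = model_offset_map_lock(t, idx, &ptl);

	*e = val;
	val = *e;
	model_unmap_unlock(e, ptl);
	return val;
}
```

Because each table carries its own lock, a thread working on one table in
this model never spins on the lock of another table, which is the contention
that the single mm->page_table_lock could not avoid.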
Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If split lock is disabled, all tables are guarded by mm->page_table_lock.

Split page table lock for PMD tables is enabled if it's enabled for PTE
tables and the architecture supports it (see below).

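The two conditions can be sketched with the preprocessor. This is a
standalone model, not the kernel's headers: the values of NR_CPUS and
CONFIG_SPLIT_PTLOCK_CPUS are assumed for the example, CFG_ARCH_SPLIT_PMD is
a hypothetical stand-in for the architecture opt-in, and the cfg_* types and
functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed configuration values; a real build takes these from Kconfig. */
#define NR_CPUS				8
#define CONFIG_SPLIT_PTLOCK_CPUS	4
#define CFG_ARCH_SPLIT_PMD		1	/* hypothetical arch opt-in */

/* PTE split lock: enabled when the CPU count reaches the threshold. */
#define CFG_SPLIT_PTE	(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
/* PMD split lock: additionally needs the architecture to opt in. */
#define CFG_SPLIT_PMD	(CFG_SPLIT_PTE && CFG_ARCH_SPLIT_PMD)

struct cfg_mm    { int page_table_lock; };	/* int stands in for a spinlock */
struct cfg_table { int ptl; };			/* per-table lock */

/* Pick the lock that guards a PTE table under the modelled configuration. */
static int *cfg_pte_lockptr(struct cfg_mm *mm, struct cfg_table *pte_table)
{
	assert(mm != NULL && pte_table != NULL);
#if CFG_SPLIT_PTE
	return &pte_table->ptl;
#else
	return &mm->page_table_lock;
#endif
}

/* Pick the lock that guards a PMD table under the modelled configuration. */
static int *cfg_pmd_lockptr(struct cfg_mm *mm, struct cfg_table *pmd_table)
{
	assert(mm != NULL && pmd_table != NULL);
#if CFG_SPLIT_PMD
	return &pmd_table->ptl;
#else
	return &mm->page_table_lock;
#endif
}
```

Setting NR_CPUS below the threshold in this model makes both functions fall
back to the single per-mm lock; clearing only CFG_ARCH_SPLIT_PMD keeps the
split lock for PTE tables while PMD tables use the per-mm lock.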
Hugetlb and split page table lock
=================================

Hugetlb can support several page sizes. We use split lock only for PMD
level, but not for PUD.

Hugetlb-specific helpers:

 - huge_pte_lock()
	takes pmd split lock for PMD_SIZE page, mm->page_table_lock
	otherwise;
 - huge_pte_lockptr()
	returns pointer to table lock;

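The selection rule described for huge_pte_lock() can be sketched in
userspace as follows. The hl_* names are hypothetical, an int stands in for
a spinlock, and HL_PMD_SIZE is assumed to be 2 MiB purely for the example;
the real PMD_SIZE depends on the architecture and configuration.

```c
#include <assert.h>
#include <stddef.h>

#define HL_PMD_SIZE	(2UL << 20)	/* assumed 2 MiB for this example */

struct hl_mm        { int page_table_lock; };	/* int stands in for a spinlock */
struct hl_pmd_table { int ptl; };		/* split lock of one PMD table */

/*
 * Return the lock that guards a huge page mapping: the PMD table's split
 * lock for PMD-sized pages, the per-mm lock for every other size.
 */
static int *hl_huge_pte_lockptr(unsigned long huge_page_size,
				struct hl_mm *mm, struct hl_pmd_table *table)
{
	assert(mm != NULL && table != NULL);
	if (huge_page_size == HL_PMD_SIZE)
		return &table->ptl;
	return &mm->page_table_lock;
}
```

In this model a 2 MiB page resolves to the table's own lock, while a larger
page (for example 1 GiB, mapped at PUD level) resolves to the per-mm lock.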
Support of split page table lock by an architecture
===================================================

The PTE split page table lock needs no special enabling: everything
required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which
must be called on PTE table allocation / freeing.

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages.
This field shares storage with page->ptl.

PMD split lock only makes sense if you have more than two page table
levels.

Enabling the PMD split lock requires a pgtable_pmd_page_ctor() call on PMD
table allocation and a pgtable_pmd_page_dtor() call on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs in pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail -- this
must be handled properly.

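A minimal userspace sketch of handling a failing constructor is given
below. It only mirrors the shape of an allocation path: ct_page, the ct_*
functions and the ct_fail_ctor switch are all hypothetical, with calloc()
and free() standing in for page allocation and freeing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the page that backs a page table. */
struct ct_page {
	int *ptl;		/* separately allocated lock (model only) */
	bool is_table;
};

static bool ct_fail_ctor;	/* set to true to simulate a failing constructor */

/* Model constructor: sets up the lock, may fail. */
static bool ct_pte_page_ctor(struct ct_page *page)
{
	assert(page != NULL);
	if (ct_fail_ctor)
		return false;
	page->ptl = calloc(1, sizeof(*page->ptl));
	if (!page->ptl)
		return false;
	page->is_table = true;
	return true;
}

/* Model destructor: undoes what the constructor did. */
static void ct_pte_page_dtor(struct ct_page *page)
{
	assert(page != NULL);
	free(page->ptl);
	page->ptl = NULL;
	page->is_table = false;
}

/* Allocation path: release the page again if the constructor fails. */
static struct ct_page *ct_pte_alloc_one(void)
{
	struct ct_page *page = calloc(1, sizeof(*page));

	if (!page)
		return NULL;
	if (!ct_pte_page_ctor(page)) {
		free(page);	/* do not leak the page on ctor failure */
		return NULL;
	}
	return page;
}

/* Freeing path: always run the destructor before giving the page back. */
static void ct_pte_free(struct ct_page *page)
{
	ct_pte_page_dtor(page);
	free(page);
}
```

The point of the sketch is the pairing: every path that allocates a table
checks the constructor's result and backs out on failure, and every path
that frees a table runs the destructor first.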
page->ptl
=========

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best performance,
we use a trick:

 - if spinlock_t fits into long, we use page->ptl as spinlock, so we
   can avoid indirect access and save a cache line;
 - if size of spinlock_t is bigger than size of long, we use page->ptl as
   pointer to spinlock_t and allocate it dynamically. This allows split
   lock to be used with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled, but
   costs one more cache line for indirect access.

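The trick can be modelled in userspace roughly as follows. Everything here
is hypothetical: tr_lock stands in for spinlock_t, defining TR_DEBUG_LOCK
stands in for enabling the debug options, and the tr_* helpers are invented
for the example. A static assertion checks that the chosen mode matches the
actual sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#ifdef TR_DEBUG_LOCK
/* Hypothetical "debug" lock carrying extra bookkeeping; larger than long. */
typedef struct { int locked; long magic; void *owner; } tr_lock;
#define TR_ALLOC_PTLOCKS 1
#else
/* Hypothetical plain lock; fits into a long. */
typedef struct { int locked; } tr_lock;
#define TR_ALLOC_PTLOCKS 0
#endif

_Static_assert((sizeof(tr_lock) > sizeof(long)) == TR_ALLOC_PTLOCKS,
	       "storage mode must match the size of the lock type");

struct tr_page {
	union {
		unsigned long private;	/* other user of the same word */
#if TR_ALLOC_PTLOCKS
		tr_lock *ptl;		/* pointer to a separately allocated lock */
#else
		tr_lock ptl;		/* lock stored in place */
#endif
	};
};

/* Set up the lock; only the pointer variant can fail. */
static bool tr_ptlock_init(struct tr_page *page)
{
#if TR_ALLOC_PTLOCKS
	page->ptl = calloc(1, sizeof(*page->ptl));
	return page->ptl != NULL;
#else
	page->ptl.locked = 0;
	return true;
#endif
}

/* The accessor hides which variant is in use from callers. */
static tr_lock *tr_ptlock_ptr(struct tr_page *page)
{
#if TR_ALLOC_PTLOCKS
	return page->ptl;
#else
	return &page->ptl;
#endif
}

static void tr_ptlock_free(struct tr_page *page)
{
#if TR_ALLOC_PTLOCKS
	free(page->ptl);
	page->ptl = NULL;
#else
	(void)page;	/* nothing to free for the in-place variant */
#endif
}
```

Callers in this model only ever go through tr_ptlock_ptr(), so switching
between the in-place and the allocated variant needs no change on their
side.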
The spinlock_t is allocated in pgtable_pte_page_ctor() for PTE tables and in
pgtable_pmd_page_ctor() for PMD tables.

Please never access page->ptl directly; use the appropriate helper.