.. _mmu_notifier:

When do you need to notify inside the page table lock?
======================================================

When clearing a pte/pmd we are given a choice to notify the event (the notify
version of \*_clear_flush calls mmu_notifier_invalidate_range) under the page
table lock. But that notification is not necessary in all cases.

For a secondary TLB (non-CPU TLB) such as an IOMMU TLB or a device TLB (when
the device uses something like ATS/PASID to get the IOMMU to walk the CPU page
table to access a process virtual address space), there are only two cases in
which you need to notify the secondary TLB while holding the page table lock
when clearing a pte/pmd:

  A) the page backing the address is freed before
     mmu_notifier_invalidate_range_end()
  B) a page table entry is updated to point to a new page (COW, write fault
     on zero page, __replace_page(), ...)

Case A is obvious: you do not want to risk the device writing to a page that
might now be used by some completely different task.

Case B is more subtle. For correctness it requires the following sequence to
happen:

  - take page table lock
  - clear page table entry and notify ([pmd/pte]p_huge_clear_flush_notify())
  - set page table entry to point to new page

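The three steps above can be sketched as a small userspace toy model. This is
only an illustration under stated assumptions: the class ``ToyMM``, its
dictionaries and its method names are hypothetical and are not kernel APIs. One
dict stands in for the page table, another for the secondary TLB, and dropping
the cached translation stands in for the notification.

```python
import threading


class ToyMM:
    """Hypothetical toy model: one page table plus one secondary (device) TLB."""

    def __init__(self):
        self.ptl = threading.Lock()   # stands in for the page table lock
        self.page_table = {}          # virtual address -> page name
        self.device_tlb = {}          # translations cached by the device

    def device_translate(self, addr):
        # The device uses its TLB first and walks the page table only on a miss.
        if addr not in self.device_tlb:
            self.device_tlb[addr] = self.page_table[addr]
        return self.device_tlb[addr]

    def replace_page(self, addr, new_page):
        with self.ptl:                          # take page table lock
            del self.page_table[addr]           # clear page table entry ...
            self.device_tlb.pop(addr, None)     # ... and notify the secondary TLB
            self.page_table[addr] = new_page    # point the entry at the new page


mm = ToyMM()
mm.page_table[0x1000] = "old"
print(mm.device_translate(0x1000))  # old (the device TLB now caches it)
mm.replace_page(0x1000, "new")
print(mm.device_translate(0x1000))  # new (the stale translation was dropped)
```

Because the invalidation happens before the new entry becomes visible, the toy
device can never hold the old translation once the new page is reachable
through the page table.
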
If clearing the page table entry is not followed by a notification before
setting the new pte/pmd value, then you can break a memory model such as C11
or C++11 for the device.

Consider the following scenario (the device uses a feature similar to
ATS/PASID):

Take two addresses addrA and addrB such that \|addrA - addrB\| >= PAGE_SIZE.
We assume they are write protected for COW (the other cases of B apply too).

::

 [Time N] --------------------------------------------------------------------
 CPU-thread-0  {try to write to addrA}
 CPU-thread-1  {try to write to addrB}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {read addrA and populate device TLB}
 DEV-thread-2  {read addrB and populate device TLB}
 [Time N+1] ------------------------------------------------------------------
 CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
 CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+2] ------------------------------------------------------------------
 CPU-thread-0  {COW_step1: {update page table to point to new page for addrA}}
 CPU-thread-1  {COW_step1: {update page table to point to new page for addrB}}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+3] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {preempted}
 CPU-thread-2  {write to addrA which is a write to new page}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+4] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {preempted}
 CPU-thread-2  {}
 CPU-thread-3  {write to addrB which is a write to new page}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+5] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+6] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {read addrA from old page}
 DEV-thread-2  {read addrB from new page}

So here, because at time N+2 the clearing of the page table entry was not
paired with a notification to invalidate the secondary TLB, the device sees the
new value for addrB before seeing the new value for addrA. This breaks total
memory ordering for the device.

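The scenario can be replayed with a short userspace simulation. This is a
sketch under stated assumptions, not kernel code: the function ``replay``, the
page names and the flag ``notify_under_lock`` are hypothetical, page contents
are modelled as integers (0 before the CPU write, 1 after), and the device TLB
is a dict that is filled on a miss and emptied on invalidation.

```python
def replay(notify_under_lock):
    """Replay the COW scenario; return what the device reads for (addrA, addrB)."""
    memory = {"oldA": 0, "oldB": 0, "newA": 0, "newB": 0}   # page -> content
    page_table = {"addrA": "oldA", "addrB": "oldB"}
    device_tlb = {}

    def dev_read(addr):
        # Fill the device TLB on a miss, then read through the cached translation.
        if addr not in device_tlb:
            device_tlb[addr] = page_table[addr]
        return memory[device_tlb[addr]]

    def cow_update(addr, new_page):
        # COW step 1: point the page table entry at the new page.
        page_table[addr] = new_page
        if notify_under_lock:
            device_tlb.pop(addr, None)   # notification paired with the update

    def range_end(addr):
        # Models the invalidation done by mmu_notifier_invalidate_range_end().
        device_tlb.pop(addr, None)

    # The device reads both addresses and populates its TLB.
    dev_read("addrA")
    dev_read("addrB")
    # CPU-thread-0 and CPU-thread-1 update the page table, then get preempted.
    cow_update("addrA", "newA")
    cow_update("addrB", "newB")
    # CPU-thread-2 writes addrA, then CPU-thread-3 writes addrB (new pages).
    memory[page_table["addrA"]] = 1
    memory[page_table["addrB"]] = 1
    # Only CPU-thread-1 reaches range_end; CPU-thread-0 is still preempted.
    range_end("addrB")
    # The device reads both addresses again.
    return dev_read("addrA"), dev_read("addrB")


print(replay(notify_under_lock=False))  # (0, 1): new addrB seen, addrA still old
print(replay(notify_under_lock=True))   # (1, 1): both writes are visible
```

In this model the only difference between the two runs is whether the device
TLB entry is dropped together with the page table update, which is what decides
whether the device can observe the two writes out of order.
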
When changing a pte to write protect it, or to point to a new write protected
page with the same content (KSM), it is fine to delay the
mmu_notifier_invalidate_range call to mmu_notifier_invalidate_range_end(),
outside the page table lock. This is true even if the thread doing the page
table update is preempted right after releasing the page table lock but before
calling mmu_notifier_invalidate_range_end().