.. _swap_numa:

===========================================
Automatically bind swap device to NUMA node
===========================================

If the system has more than one swap device and the swap devices carry NUMA
node information, get_swap_pages() can use this information to decide which
swap device to use, preferring devices attached to the local node for better
performance.


How to use this feature
=======================

Each swap device has a priority, which decides the order in which the swap
devices are used. To make use of automatic binding, there is no need to change
the priority settings of the swap devices. For example, on a 2-node machine,
assume two swap devices, swapA and swapB, are going to be swapped on, with
swapA attached to node 0 and swapB attached to node 1. Simply swap them on::

	# swapon /dev/swapA
	# swapon /dev/swapB

Then node 0 will use the two swap devices in the order swapA then swapB, and
node 1 will use them in the order swapB then swapA. Note that the order in
which they are swapped on doesn't matter.

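The two-node case can be checked with a small user-space model. This is a sketch, not kernel code: the helper name node_order is hypothetical, and it assumes, as described under Implementation details, that devices swapped on without a priority receive -2, -3, ... in swapon order and that a device is promoted to -1 on the node it is attached to. It computes each node's order for both swapon orders.

.. code-block:: python

    def node_order(swapon_order, dev_node, node):
        """Order in which `node` tries the devices (hypothetical model)."""
        # Assumed: devices without a user priority get -2, -3, ... in swapon order.
        prio = {dev: -2 - i for i, dev in enumerate(swapon_order)}
        # A device attached to `node` is promoted to -1 on that node's list.
        eff = {dev: -1 if dev_node[dev] == node else p for dev, p in prio.items()}
        # Higher priority first; sorted() is stable, so ties keep swapon order.
        return sorted(swapon_order, key=lambda dev: -eff[dev])


    nodes = {"swapA": 0, "swapB": 1}
    for order in (["swapA", "swapB"], ["swapB", "swapA"]):
        print(order, node_order(order, nodes, 0), node_order(order, nodes, 1))

Both swapon orders yield swapA then swapB on node 0, and swapB then swapA on node 1.
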

Here is a more complex example on a 4-node machine. Assume 6 swap devices are
going to be swapped on: swapA and swapB are attached to node 0, swapC is
attached to node 1, swapD and swapE are attached to node 2, and swapF is
attached to node 3. They are swapped on in the same way as above::

	# swapon /dev/swapA
	# swapon /dev/swapB
	# swapon /dev/swapC
	# swapon /dev/swapD
	# swapon /dev/swapE
	# swapon /dev/swapF

Then node 0 will use them in this order::

	swapA/swapB -> swapC -> swapD -> swapE -> swapF

swapA and swapB will be used in round robin mode before any other swap device.

Node 1 will use them in this order::

	swapC -> swapA -> swapB -> swapD -> swapE -> swapF

Node 2 will use them in this order::

	swapD/swapE -> swapA -> swapB -> swapC -> swapF

Similarly, swapD and swapE will be used in round robin mode before any
other swap device.

Node 3 will use them in this order::

	swapF -> swapA -> swapB -> swapC -> swapD -> swapE

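The four per-node orders listed above can be reproduced with the same kind of model. Again this is a hypothetical sketch (order_string, DEV_NODE and SWAPON_ORDER are illustrative names), assuming auto-assigned priorities -2, -3, ... in swapon order and promotion to -1 on the attached node. Devices that end up with equal priority are joined with "/", matching the notation used above for round robin groups.

.. code-block:: python

    from itertools import groupby

    # Hypothetical layout from the 4-node example: device -> attached node.
    DEV_NODE = {"swapA": 0, "swapB": 0, "swapC": 1,
                "swapD": 2, "swapE": 2, "swapF": 3}
    SWAPON_ORDER = ["swapA", "swapB", "swapC", "swapD", "swapE", "swapF"]


    def order_string(node):
        """Render node's order; devices sharing a priority are joined by '/'."""
        # Assumed: auto-assigned priorities -2, -3, ... in swapon order.
        prio = {dev: -2 - i for i, dev in enumerate(SWAPON_ORDER)}
        # Devices attached to `node` are promoted to -1 and so tie with each other.
        eff = {dev: -1 if DEV_NODE[dev] == node else p for dev, p in prio.items()}
        ranked = sorted(SWAPON_ORDER, key=lambda dev: -eff[dev])
        # Consecutive devices with equal priority form one round robin group.
        groups = groupby(ranked, key=lambda dev: eff[dev])
        return " -> ".join("/".join(devs) for _, devs in groups)


    for node in range(4):
        print(f"node {node}: {order_string(node)}")

Because non-local devices keep their auto-assigned priorities, their relative order on every node follows the order in which they were swapped on.
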

Implementation details
======================

Previously, the kernel used a single priority based list, swap_avail_list, to
decide which swap device to use; swap devices sharing the same priority are
used round robin. This feature replaces the single global swap_avail_list with
a per-NUMA-node list, i.e. each NUMA node sees its own priority based list of
available swap devices. A swap device's priority can then be promoted on the
swap_avail_list of the node it is attached to.

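The per-node lists and the round robin among equal priorities can be sketched as a toy model. This is not the kernel's plist code: the class PerNodeAvailLists and its methods are hypothetical. pick() takes the head of a node's list and then moves it behind the other entries of the same priority, which is assumed here to be modelled on the kernel's plist_requeue(). Devices filling up and leaving the list are not modelled.

.. code-block:: python

    class PerNodeAvailLists:
        """Toy model: one priority-sorted avail list per NUMA node (hypothetical)."""

        def __init__(self, nr_nodes):
            # lists[node] holds (prio, dev) pairs, highest priority first.
            self.lists = [[] for _ in range(nr_nodes)]

        @staticmethod
        def _insert(lst, prio, dev):
            # Place the entry after every entry with the same or higher priority.
            idx = sum(1 for p, _ in lst if p >= prio)
            lst.insert(idx, (prio, dev))

        def add(self, dev, prio, dev_node):
            for node, lst in enumerate(self.lists):
                # Only system-assigned (negative) priorities are promoted, and
                # only on the list of the node the device is attached to.
                p = -1 if prio < 0 and node == dev_node else prio
                self._insert(lst, p, dev)

        def pick(self, node):
            # Use the head of this node's list, then requeue it behind its
            # equal-priority peers so that they take turns (round robin).
            prio, dev = self.lists[node].pop(0)
            self._insert(self.lists[node], prio, dev)
            return dev


    avail = PerNodeAvailLists(nr_nodes=2)
    avail.add("swapA", -2, dev_node=0)
    avail.add("swapB", -3, dev_node=0)
    avail.add("swapC", -4, dev_node=1)
    node0 = [avail.pick(0) for _ in range(4)]
    node1 = [avail.pick(1) for _ in range(2)]
    print(node0, node1)

On node 0 the two local devices alternate. On node 1, swapC is the only device at the promoted priority, so it keeps being chosen; in the real kernel that holds only while it has free space, which this model does not track.
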

Originally, a swap device's priority was set as follows: the user could set a
value >= 0, or else the system picked one, starting from -1 and going
downwards. The priority value stored in swap_avail_list is the negation of the
swap device's priority, because a plist is sorted from low to high. The new
policy doesn't change the semantics for priorities >= 0. System-assigned
priorities, which previously started from -1 and went downwards, now start
from -2 and go downwards, and -1 is reserved as the promoted value. So if
multiple swap devices are attached to the same node, they are all promoted to
priority -1 on that node's plist and are used round robin before any other
swap device with a system-assigned priority.
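
The priority arithmetic can be written out explicitly. In this hypothetical sketch, assign_priorities hands out -2, -3, ... to devices without a user-set priority, and plist_key returns the value a device would have on a given node's swap_avail_list: the negated priority, or 1 (the negation of the reserved -1) when a system-prioritized device is promoted on its own node. The device names and the user priority of 5 are made up for illustration.

.. code-block:: python

    def assign_priorities(devices):
        """Map name -> priority; devices without a user priority get -2, -3, ..."""
        prios, next_auto = {}, -2
        for name, user_prio, _node in devices:
            if user_prio is None:
                prios[name] = next_auto
                next_auto -= 1
            else:
                prios[name] = user_prio  # user-set values are >= 0
        return prios


    def plist_key(prio, dev_node, node):
        """Value on `node`'s swap_avail_list; the plist is sorted low to high."""
        if prio < 0 and dev_node == node:
            return 1  # promoted to -1, then negated
        return -prio  # user-set (>= 0) and non-local devices: plain negation


    # (name, user-set priority or None, attached node); values are illustrative.
    devices = [("swapA", None, 0), ("swapB", None, 1), ("swapU", 5, 1)]
    prios = assign_priorities(devices)
    orders = {}
    for node in (0, 1):
        keys = {name: plist_key(prios[name], dev_node, node)
                for name, _, dev_node in devices}
        orders[node] = sorted(keys, key=keys.get)
    print(prios, orders)

In this model the user-prioritized swapU (key -5) sorts ahead of the promoted local device (key 1) on both nodes, and the promoted device sorts ahead of every other system-prioritized device (key 2 or greater).
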