Commit | Line | Data |
---|---|---|
52d7e21f MR |
1 | .. _memory_hotplug: |
2 | ||
3 | ============== | |
4 | Memory hotplug | |
5 | ============== | |
98cee674 | 6 | |
98cee674 MR |
7 | Memory hotplug event notifier |
8 | ============================= | |
9 | ||
10 | Hotplugging events are sent to a notification queue. | |
11 | ||
12 | There are six types of notification defined in ``include/linux/memory.h``: | |
13 | ||
14 | MEM_GOING_ONLINE | |
15 | Generated before new memory becomes available so that subsystems can | |
16 | prepare to handle it. The page allocator is still unable | |
17 | to allocate from the new memory. | |
18 | ||
19 | MEM_CANCEL_ONLINE | |
20 | Generated if MEM_GOING_ONLINE fails. | |
21 | ||
22 | MEM_ONLINE | |
23 | Generated when memory has been successfully brought online. The callback may | |
24 | allocate pages from the new memory. | |
25 | ||
26 | MEM_GOING_OFFLINE | |
27 | Generated to begin the process of offlining memory. Allocations are no | |
28 | longer possible from the memory but some of the memory to be offlined | |
29 | is still in use. The callback can be used to free memory known to a | |
30 | subsystem from the indicated memory block. | |
31 | ||
32 | MEM_CANCEL_OFFLINE | |
33 | Generated if MEM_GOING_OFFLINE fails. Memory is available again from | |
34 | the memory block that we attempted to offline. | |
35 | ||
36 | MEM_OFFLINE | |
37 | Generated after offlining memory is complete. | |
38 | ||
39 | A callback routine can be registered by calling:: | |
40 | ||
41 | hotplug_memory_notifier(callback_func, priority) | |
42 | ||
43 | Callback functions with higher values of priority are called before callback | |
44 | functions with lower values. | |
45 | ||
46 | A callback function must have the following prototype:: | |
47 | ||
48 | int callback_func( | |
49 | struct notifier_block *self, unsigned long action, void *arg); | |
50 | ||
51 | The first argument of the callback function (self) is a pointer to the block | |
52 | of the notifier chain that points to the callback function itself. | |
53 | The second argument (action) is one of the event types described above. | |
54 | The third argument (arg) is a pointer to a struct memory_notify:: | |
55 | ||
56 | struct memory_notify { | |
57 | unsigned long start_pfn; | |
58 | unsigned long nr_pages; | |
59 | int status_change_nid_normal; | |
98cee674 MR |
60 | int status_change_nid; |
61 | }; | |
62 | ||
63 | - start_pfn is the first page frame number of the memory being onlined/offlined. | |
64 | - nr_pages is the number of pages being onlined/offlined. | |
65 | - status_change_nid_normal is set to the node id when N_NORMAL_MEMORY of the | |
66 | nodemask is (will be) set/cleared. If this is -1, the nodemask status is not changed. | |
98cee674 MR |
67 | - status_change_nid is set to the node id when N_MEMORY of the nodemask is (will be) |
68 | set/cleared. This means that a new (memoryless) node gets its first memory by onlining, or | |
69 | that a node loses all of its memory. If this is -1, the nodemask status is not changed. | |
70 | ||
71 | If status_change_nid* >= 0, the callback should create/discard structures for the | |
72 | node if necessary. | |
73 | ||
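The callback contract described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the macros and the struct layout are local stand-ins for what the kernel provides in ``include/linux/notifier.h`` and ``include/linux/memory.h``, and ``demo_mem_callback``, ``demo_node_ready`` and ``DEMO_MAX_NODES`` are hypothetical names that do not exist in the kernel. A real callback would be registered with hotplug_memory_notifier() instead of being called directly.

```c
#include <stddef.h>

/* Local stand-ins; in kernel code use the definitions from
 * <linux/notifier.h> and <linux/memory.h> instead. */
#define NOTIFY_OK         0x0001
#define NOTIFY_STOP_MASK  0x8000
#define NOTIFY_BAD        (NOTIFY_STOP_MASK | 0x0002)

#define MEM_OFFLINE       (1 << 2)
#define MEM_GOING_ONLINE  (1 << 3)
#define MEM_CANCEL_ONLINE (1 << 4)

struct notifier_block;          /* opaque in this sketch */

struct memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
	int status_change_nid_normal;
	int status_change_nid;
};

/* Hypothetical per-node state owned by an imaginary subsystem. */
#define DEMO_MAX_NODES 8
int demo_node_ready[DEMO_MAX_NODES];

int demo_mem_callback(struct notifier_block *self,
		      unsigned long action, void *arg)
{
	struct memory_notify *mn = arg;
	int nid = mn->status_change_nid;

	(void)self;
	if (nid < 0)
		return NOTIFY_OK;       /* node state does not change */
	if (nid >= DEMO_MAX_NODES)
		return NOTIFY_BAD;      /* cannot track this node: veto */

	switch (action) {
	case MEM_GOING_ONLINE:
		demo_node_ready[nid] = 1;   /* create per-node structures */
		break;
	case MEM_CANCEL_ONLINE:
	case MEM_OFFLINE:
		demo_node_ready[nid] = 0;   /* discard them again */
		break;
	}
	return NOTIFY_OK;
}
```

The model creates its per-node state at MEM_GOING_ONLINE so that the state exists before the page allocator can use the new memory, and undoes it at MEM_CANCEL_ONLINE.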
74 | The callback routine shall return one of the values | |
75 | NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP | |
76 | defined in ``include/linux/notifier.h``. | |
77 | ||
78 | NOTIFY_DONE and NOTIFY_OK have no effect on the further processing. | |
79 | ||
80 | NOTIFY_BAD is used as a response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, | |
81 | MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops | |
82 | further processing of the notification queue. | |
83 | ||
84 | NOTIFY_STOP stops further processing of the notification queue. | |
3a7452c5 DH |
85 | |
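To illustrate how priority ordering and the return values interact, the following is a simplified user-space model of a notifier chain. It is a sketch, not the kernel implementation: the ``demo_*`` functions and variables are hypothetical, and the struct and macros are local stand-ins modelled on ``include/linux/notifier.h``.

```c
#include <stddef.h>

/* Local stand-ins modelled on <linux/notifier.h>. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

struct notifier_block {
	int (*notifier_call)(struct notifier_block *self,
			     unsigned long action, void *arg);
	struct notifier_block *next;
	int priority;
};

/* Insert nb so that higher priorities are visited first. */
void demo_chain_register(struct notifier_block **head,
			 struct notifier_block *nb)
{
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/* Walk the chain; NOTIFY_BAD and NOTIFY_STOP end the walk early. */
int demo_chain_call(struct notifier_block *head,
		    unsigned long action, void *arg)
{
	int ret = NOTIFY_DONE;

	for (; head; head = head->next) {
		ret = head->notifier_call(head, action, arg);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Records the priority of each callback in the order it was invoked. */
int demo_order[4];
int demo_count;

int demo_record(struct notifier_block *self,
		unsigned long action, void *arg)
{
	(void)action;
	(void)arg;
	demo_order[demo_count++] = self->priority;
	/* The priority-5 entry plays the role of a vetoing subsystem. */
	return self->priority == 5 ? NOTIFY_BAD : NOTIFY_OK;
}

int demo_run(void)
{
	static struct notifier_block low  = { demo_record, NULL, 1 };
	static struct notifier_block veto = { demo_record, NULL, 5 };
	static struct notifier_block high = { demo_record, NULL, 10 };
	struct notifier_block *chain = NULL;

	demo_count = 0;
	demo_chain_register(&chain, &low);
	demo_chain_register(&chain, &veto);
	demo_chain_register(&chain, &high);
	return demo_chain_call(chain, 0, NULL);
}
```

In this model, demo_run() invokes the priority-10 callback first, then the priority-5 callback, whose NOTIFY_BAD ends the walk, so the priority-1 callback is never reached.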
86 | Locking Internals | |
87 | ================= | |
88 | ||
89 | When adding/removing memory that uses memory block devices (i.e. ordinary RAM), | |
90 | the device_hotplug_lock should be held to: | |
91 | ||
92 | - synchronize against online/offline requests (e.g. via sysfs). This way, memory | |
93 | block devices can only be accessed (.online/.state attributes) by user | |
94 | space once memory has been fully added. And when removing memory, we | |
95 | know nobody is in critical sections. | |
96 | - synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC) | |
97 | ||
98 | In particular, holding device_hotplug_lock avoids a possible lock inversion | |
99 | that occurs when memory is added and user space tries to online that | |
100 | memory faster than expected: | |
101 | ||
102 | - device_online() will first take the device_lock(), followed by | |
103 | mem_hotplug_lock | |
104 | - add_memory_resource() will first take the mem_hotplug_lock, followed by | |
105 | the device_lock() (while creating the devices, during bus_add_device()). | |
106 | ||
107 | As the device is visible to user space before taking the device_lock(), this | |
108 | can result in a lock inversion. | |
109 | ||
110 | Onlining/offlining of memory should be done via device_online()/ | |
111 | device_offline() to make sure it is properly synchronized with actions | |
112 | via sysfs. Holding device_hotplug_lock is advised (e.g. to protect online_type). | |
113 | ||
114 | When adding/removing/onlining/offlining memory or adding/removing | |
115 | heterogeneous/device memory, we should always hold the mem_hotplug_lock in | |
116 | write mode to serialise memory hotplug (e.g. access to global/zone | |
117 | variables). | |
118 | ||
119 | In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read | |
120 | mode allows for a quite efficient get_online_mems/put_online_mems | |
121 | implementation, so code accessing memory can protect itself against that memory | |
122 | vanishing.