Per-task statistics interface
-----------------------------


Taskstats is a netlink-based interface for sending per-task and
per-process statistics from the kernel to userspace.

Taskstats was designed to provide the following benefits:

- efficiently provide statistics during the lifetime of a task and on its exit
- unified interface for multiple accounting subsystems
- extensibility for use by future accounting patches

Terminology
-----------

"pid", "tid" and "task" are used interchangeably and refer to the standard
Linux task defined by struct task_struct. Per-pid stats are the same as
per-task stats.

"tgid", "process" and "thread group" are used interchangeably and refer to the
tasks that share an mm_struct, i.e. the traditional Unix process. Despite the
use of tgid, there is no special treatment for the task that is the thread
group leader: a process is deemed alive as long as it has any task belonging
to it.

Usage
-----

To get statistics during a task's lifetime, userspace opens a unicast netlink
socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid.
The response contains statistics for a task (if a pid is specified) or the sum
of statistics for all tasks of the process (if a tgid is specified).

To obtain statistics for tasks which are exiting, userspace opens a multicast
netlink socket. Each time a task exits, the kernel sends two records to each
listener on the multicast socket. The first is the exiting task's per-pid
statistics and the second is the sum for all tasks of the process to which the
task belongs (the task does not need to be the thread group leader). The need
for per-tgid stats to be sent for each exiting task is explained in the
per-tgid stats section below.


Interface
---------

The user-kernel interface is encapsulated in include/linux/taskstats.h.

To avoid this documentation becoming obsolete as the interface evolves, only
an outline of the current version is given. taskstats.h always overrides the
description here.

struct taskstats is the common accounting structure for both per-pid and
per-tgid data. It is versioned and can be extended by each accounting subsystem
that is added to the kernel. The fields and their semantics are defined in the
taskstats.h file.

The data exchanged between user and kernel space is a netlink message belonging
to the NETLINK_GENERIC family and using the netlink attributes interface.
The messages are in the format:

    +----------+- - -+-------------+-------------------+
    | nlmsghdr | Pad |  genlmsghdr | taskstats payload |
    +----------+- - -+-------------+-------------------+

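As an illustration of this format, here is a minimal sketch that lays out a TASKSTATS_CMD_GET command for one pid in a caller-supplied buffer. It assumes the uapi headers <linux/netlink.h>, <linux/genetlink.h> and <linux/taskstats.h> are installed; the helper name build_pid_request() is hypothetical. The generic netlink family id is assumed to have been resolved beforehand (for example by asking the genetlink controller for the family named TASKSTATS_GENL_NAME) and is simply passed in. Note that NLMSG_HDRLEN already includes the Pad shown in the diagram.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>
#include <linux/taskstats.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lays out nlmsghdr | pad | genlmsghdr | one u32 attribute.
 * family_id is assumed to be already resolved via the genetlink controller.
 * Returns the message length, or 0 if buf is too small. */
static size_t build_pid_request(void *buf, size_t buflen, uint16_t family_id,
                                uint32_t pid, uint32_t seq)
{
    size_t attr_len = NLA_HDRLEN + sizeof(pid);      /* attribute header + u32 */
    size_t msg_len = NLMSG_LENGTH(GENL_HDRLEN + NLA_ALIGN(attr_len));

    if (buflen < NLMSG_ALIGN(msg_len))
        return 0;
    memset(buf, 0, NLMSG_ALIGN(msg_len));

    /* Netlink header; NLMSG_HDRLEN (used by NLMSG_DATA) includes the pad. */
    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_len = (uint32_t)msg_len;
    nlh->nlmsg_type = family_id;
    nlh->nlmsg_flags = NLM_F_REQUEST;
    nlh->nlmsg_seq = seq;

    /* Generic netlink header directly after the padded netlink header. */
    struct genlmsghdr *genl = NLMSG_DATA(nlh);
    genl->cmd = TASKSTATS_CMD_GET;
    genl->version = TASKSTATS_GENL_VERSION;

    /* taskstats payload: a single TASKSTATS_CMD_ATTR_PID attribute. */
    struct nlattr *na = (struct nlattr *)((char *)genl + GENL_HDRLEN);
    na->nla_type = TASKSTATS_CMD_ATTR_PID;
    na->nla_len = (uint16_t)attr_len;
    memcpy((char *)na + NLA_HDRLEN, &pid, sizeof(pid));

    return msg_len;
}
```

The resulting buffer would then be sent with sendto() on a socket created with socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); a TGID request differs only in using TASKSTATS_CMD_ATTR_TGID.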

The taskstats payload is one of the following three kinds:

1. Commands: sent from user to kernel. The payload is one attribute, of type
TASKSTATS_CMD_ATTR_PID/TGID, containing a u32 pid or tgid in the attribute
payload. The pid/tgid denotes the task/process for which userspace wants
statistics.

2. Response for a command: sent from the kernel in response to a userspace
command. The payload is a series of three attributes of type:

a) TASKSTATS_TYPE_AGGR_PID/TGID: an attribute that carries no payload but
indicates that a pid/tgid will be followed by some stats.

b) TASKSTATS_TYPE_PID/TGID: an attribute whose payload is the pid/tgid whose
stats are being returned.

c) TASKSTATS_TYPE_STATS: an attribute with a struct taskstats as payload. The
same structure is used for both per-pid and per-tgid stats.

3. New message sent by the kernel whenever a task exits. The payload consists
of a series of attributes of the following types:

a) TASKSTATS_TYPE_AGGR_PID: indicates the next two attributes will be pid+stats
b) TASKSTATS_TYPE_PID: contains the exiting task's pid
c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
d) TASKSTATS_TYPE_AGGR_TGID: indicates the next two attributes will be tgid+stats
e) TASKSTATS_TYPE_TGID: contains the tgid of the process to which the task belongs
f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for the exiting task's process

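The attribute sequences above can be decoded with a small walker. The sketch below is illustrative only: struct exit_record, walk_attrs() and parse_taskstats_payload() are hypothetical names, and the payload pointer is assumed to point just past the genlmsghdr (GENL_HDRLEN bytes after NLMSG_DATA()). Because it descends into the payload of every AGGR_* attribute, it copes both with an empty marker followed by its pid/tgid and stats siblings, as described above, and with the nested encoding that current kernels produce. Unknown attribute types are skipped, and each struct taskstats is copied only up to the size this program was built with.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/taskstats.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for the records of one exit notification. */
struct exit_record {
    uint32_t pid;
    uint32_t tgid;
    struct taskstats pid_stats;   /* left zero-filled if not present */
    struct taskstats tgid_stats;
};

/* Walk a run of netlink attributes; *in_tgid tracks which AGGR_* we are under. */
static void walk_attrs(const char *p, size_t len, struct exit_record *rec,
                       int *in_tgid)
{
    while (len >= NLA_HDRLEN) {
        const struct nlattr *na = (const struct nlattr *)p;
        size_t alen = na->nla_len;
        if (alen < NLA_HDRLEN || alen > len)
            return;                                  /* malformed: stop */
        const char *data = p + NLA_HDRLEN;
        size_t dlen = alen - NLA_HDRLEN;
        int type = na->nla_type & NLA_TYPE_MASK;     /* drop flag bits */

        switch (type) {
        case TASKSTATS_TYPE_AGGR_PID:
        case TASKSTATS_TYPE_AGGR_TGID:
            *in_tgid = (type == TASKSTATS_TYPE_AGGR_TGID);
            walk_attrs(data, dlen, rec, in_tgid);    /* no-op if empty marker */
            break;
        case TASKSTATS_TYPE_PID:
        case TASKSTATS_TYPE_TGID: {
            uint32_t *dst = (type == TASKSTATS_TYPE_PID) ? &rec->pid : &rec->tgid;
            if (dlen >= sizeof(*dst))
                memcpy(dst, data, sizeof(*dst));
            break;
        }
        case TASKSTATS_TYPE_STATS: {
            struct taskstats *dst = *in_tgid ? &rec->tgid_stats : &rec->pid_stats;
            /* Copy only what was sent, never more than our struct holds. */
            memcpy(dst, data, dlen < sizeof(*dst) ? dlen : sizeof(*dst));
            break;
        }
        default:
            break;                /* padding or a type we don't understand */
        }
        size_t step = NLA_ALIGN(alen);
        if (step > len)
            return;
        p += step;
        len -= step;
    }
}

static void parse_taskstats_payload(const void *payload, size_t len,
                                    struct exit_record *rec)
{
    int in_tgid = 0;
    memset(rec, 0, sizeof(*rec));
    walk_attrs(payload, len, rec, &in_tgid);
}
```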

per-tgid stats
--------------

Taskstats provides per-process stats, in addition to per-task stats, since
resource management is often done at a process granularity and aggregating task
stats in userspace alone is inefficient and potentially inaccurate (due to lack
of atomicity).

However, maintaining per-process stats in addition to per-task stats within
the kernel has space and time overheads. Hence the taskstats implementation
dynamically sums up the per-task stats for each task belonging to a process
whenever per-process stats are needed.
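The on-demand summation described above can be pictured as follows. This is a simplified sketch of the idea, not the kernel's code: struct task_rec and struct tgid_sum are hypothetical stand-ins with two counters, whereas the kernel works on struct task_struct and fills a struct taskstats.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-task record (not struct task_struct). */
struct task_rec {
    uint32_t pid;
    uint32_t tgid;
    uint64_t cpu_ns;     /* CPU time consumed by this task */
    uint64_t io_bytes;   /* bytes read + written by this task */
};

/* Hypothetical per-process totals, computed only when asked for. */
struct tgid_sum {
    uint32_t tgid;
    uint32_t nr_tasks;
    uint64_t cpu_ns;
    uint64_t io_bytes;
};

/* Sum the counters of every task in `tasks` that belongs to `tgid`.
 * No per-process state is kept between calls. */
static struct tgid_sum sum_tgid(const struct task_rec *tasks, size_t n,
                                uint32_t tgid)
{
    struct tgid_sum s = { .tgid = tgid };
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].tgid != tgid)
            continue;
        s.nr_tasks++;
        s.cpu_ns += tasks[i].cpu_ns;
        s.io_bytes += tasks[i].io_bytes;
    }
    return s;
}
```

Nothing per-process is stored between queries; the price is that every per-tgid request walks the tasks again.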

Not maintaining per-tgid stats creates a problem when userspace is interested
in getting these stats when the process dies, i.e. when the last thread of
a process exits. At that point there are no remaining tasks whose per-task
stats could be summed, so it isn't possible to simply return some aggregated
per-process statistic from the kernel.

The approach taken by taskstats is to return the per-tgid stats *each* time
a task exits, in addition to the per-pid stats for that task. Userspace can
maintain task<->process mappings and use them to maintain the per-process stats
in userspace, updating the aggregate appropriately as the tasks of a process
exit.
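One possible userspace policy is sketched below, under the assumption that each per-tgid record received at a task's exit supersedes the previous one for the same tgid. It keeps a live-task count per tgid (the task<->process mapping, reduced here to a counter), remembers the latest per-tgid value, and treats that value as final once the last known task of the process has exited. The fixed-size table, the single cpu_ns counter and the function names note_task() and on_task_exit() are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCS 64   /* hypothetical fixed table size for this sketch */

/* Hypothetical userspace bookkeeping for one process (tgid). */
struct proc_entry {
    uint32_t tgid;
    uint32_t live_tasks;    /* tasks of this tgid not yet seen exiting */
    uint64_t last_cpu_ns;   /* latest per-tgid value received */
    bool used;
};

static struct proc_entry procs[MAX_PROCS];

/* Find the entry for tgid; optionally claim a free slot for it. */
static struct proc_entry *lookup(uint32_t tgid, bool create)
{
    struct proc_entry *free_slot = NULL;
    for (size_t i = 0; i < MAX_PROCS; i++) {
        if (procs[i].used && procs[i].tgid == tgid)
            return &procs[i];
        if (!procs[i].used && !free_slot)
            free_slot = &procs[i];
    }
    if (create && free_slot) {
        *free_slot = (struct proc_entry){ .tgid = tgid, .used = true };
        return free_slot;
    }
    return NULL;
}

/* Record that a newly seen task belongs to process tgid. */
static void note_task(uint32_t tgid)
{
    struct proc_entry *p = lookup(tgid, true);
    if (p)
        p->live_tasks++;
}

/* Handle one exit record. Returns true, storing the final per-process value
 * in *final_cpu_ns, when the last known task of the process has exited. */
static bool on_task_exit(uint32_t tgid, uint64_t tgid_cpu_ns,
                         uint64_t *final_cpu_ns)
{
    struct proc_entry *p = lookup(tgid, false);
    if (!p)
        return false;                 /* process never noted: ignore */
    p->last_cpu_ns = tgid_cpu_ns;     /* newer per-tgid record supersedes */
    if (p->live_tasks > 0)
        p->live_tasks--;
    if (p->live_tasks == 0) {
        *final_cpu_ns = p->last_cpu_ns;
        p->used = false;              /* process is gone: free the slot */
        return true;
    }
    return false;
}
```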

Extending taskstats
-------------------

There are two ways to extend the taskstats interface to export more
per-task/process stats as patches that collect them are added to the kernel
in the future:

1. Adding more fields to the end of the existing struct taskstats. Backward
compatibility is ensured by the version number within the structure.
Userspace will use only the fields of the struct that correspond to the
version it is using.

2. Defining separate statistic structs and using the netlink attributes
interface to return them. Since userspace processes each netlink attribute
independently, it can always ignore attributes whose type it does not
understand (because it is using an older version of the interface).
133 | interface to return them. Since userspace processes each netlink attribute | |
134 | independently, it can always ignore attributes whose type it does not | |
135 | understand (because it is using an older version of the interface). | |
136 | ||
137 | ||
138 | Choosing between 1. and 2. is a matter of trading off flexibility and | |
139 | overhead. If only a few fields need to be added, then 1. is the preferable | |
140 | path since the kernel and userspace don't need to incur the overhead of | |
141 | processing new netlink attributes. But if the new fields expand the existing | |
142 | struct too much, requiring disparate userspace accounting utilities to | |
143 | unnecessarily receive large structures whose fields are of no interest, then | |
144 | extending the attributes structure would be worthwhile. | |
145 | ||
146 | ---- |