Commit | Line | Data |
---|---|---|
b7cb8405 SK |
1 | .. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0) |
2 | ||
3 | ====================================================== | |
4 | Discovering Linux kernel subsystems used by a workload | |
5 | ====================================================== | |
6 | ||
7 | :Authors: - Shuah Khan <skhan@linuxfoundation.org> | |
8 | - Shefali Sharma <sshefali021@gmail.com> | |
9 | :maintained-by: Shuah Khan <skhan@linuxfoundation.org> | |
10 | ||
11 | Key Points | |
12 | ========== | |
13 | ||
14 | * Understanding system resources necessary to build and run a workload | |
15 | is important. | |
16 | * Linux tracing and strace can be used to discover the system resources | |
17 | in use by a workload. The completeness of the system usage information | |
18 | depends on the completeness of coverage of a workload. | |
19 | * Performance and security of the operating system can be analyzed with | |
20 | the help of tools such as: | |
21 | `perf <https://man7.org/linux/man-pages/man1/perf.1.html>`_, | |
22 | `stress-ng <https://www.mankier.com/1/stress-ng>`_, | |
23 | `paxtest <https://github.com/opntr/paxtest-freebsd>`_. | |
24 | * Once we discover and understand the workload needs, we can focus on them |
25 | to avoid regressions and use them to evaluate safety considerations. |
26 | ||
27 | Methodology | |
28 | =========== | |
29 | ||
30 | `strace <https://man7.org/linux/man-pages/man1/strace.1.html>`_ is a | |
31 | diagnostic, instructional, and debugging tool and can be used to discover | |
32 | the system resources in use by a workload. Once we discover and understand |
33 | the workload needs, we can focus on them to avoid regressions and use them |
34 | to evaluate safety considerations. We use the strace tool to trace workloads. |
35 | ||
36 | This method of tracing using strace tells us the system calls invoked by | |
37 | the workload and doesn't include all the system calls that can be invoked | |
38 | by it. In addition, this tracing method tells us just the code paths within | |
39 | these system calls that are invoked. As an example, if a workload opens a | |
40 | file and reads from it successfully, then the success path is the one that | |
41 | is traced. Any error paths in that system call will not be traced. If there |
42 | is a test that provides full coverage of a workload, then the method |
43 | outlined here will trace and find all possible code paths. The completeness |
44 | of the system usage information depends on the completeness of coverage of a | |
45 | workload. | |
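The point about success and error paths can be made concrete with a small test program. The sketch below is only an illustration, not part of the traced workloads: it opens and reads an existing file, then deliberately opens a missing one, so that both outcomes of the open call appear when the script is run under strace. The function name and the file names are hypothetical.

```python
import errno
import os
import tempfile


def exercise_open_paths(directory):
    """Open one existing file and one missing file; return both outcomes."""
    existing = os.path.join(directory, "present.txt")
    with open(existing, "w") as handle:
        handle.write("data")

    outcomes = {}

    # Success path: the file exists, so the open and the read both succeed.
    fd = os.open(existing, os.O_RDONLY)
    try:
        outcomes["success"] = os.read(fd, 4)
    finally:
        os.close(fd)

    # Error path: the file is missing, so the open fails with ENOENT.
    try:
        os.open(os.path.join(directory, "missing.txt"), os.O_RDONLY)
    except OSError as err:
        outcomes["error"] = errno.errorcode[err.errno]

    return outcomes


with tempfile.TemporaryDirectory() as tmp:
    result = exercise_open_paths(tmp)
print(result)
```

A workload that only ever performed the first open would leave the failure handling of that system call untraced, which is why coverage of the workload determines how complete the trace is.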
46 | ||
47 | The goal is tracing a workload on a system running a default kernel without | |
48 | requiring custom kernel installs. | |
49 | ||
50 | How do we gather fine-grained system information? | |
51 | ================================================= | |
52 | ||
53 | The strace tool can be used to trace system calls made by a process and signals |
54 | it receives. System calls are the fundamental interface between an | |
55 | application and the operating system kernel. They enable a program to | |
56 | request services from the kernel. For instance, the open() system call in | |
57 | Linux is used to provide access to a file in the file system. strace enables | |
58 | us to track all the system calls made by an application. It lists all the | |
59 | system calls made by a process and their resulting output. | |
60 | ||
61 | You can generate profiling data by combining the strace and perf record tools to |
62 | record the events and information associated with a process. This provides |
63 | insight into the process. The "perf annotate" tool generates the statistics of |
64 | each instruction of the program. This document goes over the details of how | |
65 | to gather fine-grained information on a workload's usage of system resources. | |
66 | ||
67 | We used strace to trace the perf, stress-ng, paxtest workloads to illustrate | |
68 | our methodology to discover resources used by a workload. This process can | |
69 | be applied to trace other workloads. | |
70 | ||
71 | Getting the system ready for tracing | |
72 | ==================================== | |
73 | ||
74 | Before we can get started we will show you how to get your system ready. | |
75 | We assume that you have a Linux distribution running on a physical system | |
76 | or a virtual machine. Most distributions will include the strace command. Let’s |
77 | install other tools that aren’t usually included to build the Linux kernel. |
78 | Please note that the following works on Debian based distributions. You | |
79 | might have to find equivalent packages on other Linux distributions. | |
80 | ||
81 | Install tools to build the Linux kernel and the tools in the kernel repository. |
82 | scripts/ver_linux is a good way to check if your system already has | |
83 | the necessary tools:: | |
84 | ||
85 | sudo apt-get install build-essential flex bison yacc |
86 | sudo apt install libelf-dev systemtap-sdt-dev libaudit-dev libslang2-dev libperl-dev libdw-dev | |
87 | ||
88 | cscope is a good tool to browse kernel sources. Let's install it now:: | |
89 | ||
90 | sudo apt-get install cscope | |
91 | ||
92 | Install stress-ng and paxtest:: | |
93 | ||
94 | sudo apt-get install stress-ng |
95 | sudo apt-get install paxtest |
96 | ||
97 | Workload overview | |
98 | ================= | |
99 | ||
100 | As mentioned earlier, we used strace to trace perf bench, stress-ng and | |
101 | paxtest workloads to show how to analyze a workload and identify Linux | |
102 | subsystems used by these workloads. Let's start with an overview of these | |
103 | three workloads to get a better understanding of what they do and how to | |
104 | use them. | |
105 | ||
106 | perf bench (all) workload | |
107 | ------------------------- | |
108 | ||
109 | The perf bench command contains multiple multi-threaded microkernel | |
110 | benchmarks for executing different subsystems in the Linux kernel and | |
111 | system calls. This allows us to easily measure the impact of changes, | |
112 | which can help mitigate performance regressions. It also acts as a common | |
113 | benchmarking framework, enabling developers to easily create test cases, | |
114 | integrate transparently, and use performance-rich tooling subsystems. | |
115 | ||
116 | Stress-ng netdev stressor workload | |
117 | ---------------------------------- | |
118 | ||
119 | stress-ng is used for performing stress testing on the kernel. It allows | |
120 | you to exercise various physical subsystems of the computer, as well as | |
121 | interfaces of the OS kernel, using "stressor-s". They are available for | |
122 | CPU, CPU cache, devices, I/O, interrupts, file system, memory, network, | |
123 | operating system, pipelines, schedulers, and virtual machines. Please refer | |
124 | to the `stress-ng man-page <https://www.mankier.com/1/stress-ng>`_ to | |
125 | find the description of all the available stressor-s. The netdev stressor | |
126 | starts specified number (N) of workers that exercise various netdevice | |
127 | ioctl commands across all the available network devices. | |
128 | ||
129 | paxtest kiddie workload | |
130 | ----------------------- | |
131 | ||
132 | paxtest is a program that tests buffer overflows in the kernel. It tests | |
133 | kernel enforcements over memory usage. Generally, execution in some memory | |
134 | segments makes buffer overflows possible. It runs a set of programs that | |
135 | attempt to subvert memory usage. It is used as a regression test suite for | |
136 | PaX, but might be useful to test other memory protection patches for the | |
137 | kernel. We used paxtest kiddie mode which looks for simple vulnerabilities. | |
138 | ||
139 | What is strace and how do we use it? | |
140 | ==================================== | |
141 | ||
142 | As mentioned earlier, strace is a useful diagnostic, instructional, and |
143 | debugging tool that can be used to discover the system resources in use |
144 | by a workload. It can be used: |
145 | ||
146 | * To see how a process interacts with the kernel. | |
147 | * To see why a process is failing or hanging. | |
148 | * For reverse engineering a process. | |
149 | * To find the files on which a program depends. | |
150 | * For analyzing the performance of an application. | |
151 | * For troubleshooting various problems related to the operating system. | |
152 | ||
153 | In addition, strace can generate run-time statistics on times, calls, and | |
154 | errors for each system call and report a summary when the program exits, |
155 | suppressing the regular output. This attempts to show system time (CPU time | |
156 | spent running in the kernel) independent of wall clock time. We plan to use | |
157 | these features to get information on workload system usage. | |
158 | ||
159 | The strace command supports basic, verbose, and stats modes. When run in |
160 | verbose mode, strace gives more detailed information about the system calls |
161 | invoked by a process. |
162 | ||
163 | Running strace -c generates a report of the percentage of time spent in each |
164 | system call, the total time in seconds, the microseconds per call, the total |
165 | number of calls, the number of calls that failed with an error, and the name |
166 | of the system call made. |
167 | ||
168 | * Usage: strace <command we want to trace> | |
169 | * Verbose mode usage: strace -v <command> | |
170 | * Gather statistics: strace -c <command> | |
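The summary printed by "strace -c" is a plain text table, so it can be turned into structured data for further analysis. The following is a minimal sketch, assuming the usual column order (% time, seconds, usecs/call, calls, errors, syscall) and that the summary was saved to a file, for example with strace's -o option. The function name parse_strace_summary and the sample numbers are made up for illustration; they are not output from the workloads traced in this document.

```python
def parse_strace_summary(text):
    """Parse the table printed by `strace -c` into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a percentage; skip header, rules and blanks.
        if not fields or not fields[0].replace(".", "", 1).isdigit():
            continue
        # The last row only carries the grand totals.
        if fields[-1] == "total":
            continue
        # The errors column is blank when no call failed, so a data row
        # has either five or six whitespace-separated fields.
        if len(fields) == 6:
            percent, seconds, usecs, calls, errors, name = fields
        elif len(fields) == 5:
            percent, seconds, usecs, calls, name = fields
            errors = "0"
        else:
            continue
        rows.append({
            "syscall": name,
            "percent_time": float(percent),
            "seconds": float(seconds),
            "usecs_per_call": int(usecs),
            "calls": int(calls),
            "errors": int(errors),
        })
    return rows


# Hypothetical sample in the layout strace -c prints.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.00    0.000300          10        30           mmap
 40.00    0.000200          20        10         2 openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.000500                    40         2 total
"""

rows = parse_strace_summary(SAMPLE)
for row in rows:
    print(row["syscall"], row["calls"], row["errors"])
```

Keeping the per-call error count alongside the call count makes it easy to see which system calls exercised an error path during the run.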
171 | ||
172 | We used the “-c” option to gather fine-grained run-time statistics for the |
173 | three workloads we have chosen for this analysis. |
174 | ||
175 | * perf | |
176 | * stress-ng | |
177 | * paxtest | |
178 | ||
179 | What is cscope and how do we use it? | |
180 | ==================================== | |
181 | ||
182 | Now let’s look at `cscope <https://cscope.sourceforge.net/>`_, a command | |
183 | line tool for browsing C, C++ or Java code-bases. We can use it to find | |
184 | all the references to a symbol, global definitions, functions called by a | |
185 | function, functions calling a function, text strings, regular expression | |
186 | patterns, files including a file. | |
187 | ||
188 | We can use cscope to find which system call belongs to which subsystem. | |
189 | This way we can find the kernel subsystems used by a process when it is | |
190 | executed. | |
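As a complement to an interactive cscope session, the same lookup can be approximated with a script. The sketch below assumes that native system calls are defined in the kernel sources with the SYSCALL_DEFINE0 to SYSCALL_DEFINE6 macros, and uses the top-level source directory of the defining file as a rough hint for the subsystem. The helper names and the sample excerpts are hypothetical (the COMPAT line in particular is invented to show what the pattern skips); on a real tree you would feed it the contents of the files themselves.

```python
import re

# Native system calls are defined with SYSCALL_DEFINE0..SYSCALL_DEFINE6.
# The leading \b keeps COMPAT_SYSCALL_DEFINEn entries out of the result.
SYSCALL_DEFINE_RE = re.compile(r"\bSYSCALL_DEFINE[0-6]\(\s*(\w+)")


def syscalls_defined_in(source_text):
    """Return the names of system calls defined in a piece of C source."""
    return SYSCALL_DEFINE_RE.findall(source_text)


def top_level_directory(path):
    """Return the first path component, e.g. 'fs' for 'fs/read_write.c'."""
    return path.split("/", 1)[0]


# Hypothetical excerpts standing in for real file contents.
SAMPLE_SOURCES = {
    "fs/read_write.c":
        "SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)\n",
    "kernel/sys.c":
        "SYSCALL_DEFINE0(getpid)\n"
        "COMPAT_SYSCALL_DEFINE2(example, int, a, int, b)\n",
}

where_defined = {
    name: top_level_directory(path)
    for path, text in SAMPLE_SOURCES.items()
    for name in syscalls_defined_in(text)
}
print(where_defined)
```

The directory is only a hint: the final assignment of a system call to a subsystem still needs a look at the code, which is where cscope helps.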
191 | ||
192 | Let’s check out the latest Linux repository and build the cscope database:: |
193 | ||
194 | git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux | |
195 | cd linux | |
196 | cscope -R -p10 # builds cscope.out database before starting browse session | |
197 | cscope -d -p10 # starts browse session on cscope.out database | |
198 | ||
199 | Note: Run "cscope -R -p10" to build the database and "cscope -d -p10" to |
200 | enter into the browsing session. cscope uses the cscope.out database by default. |
201 | To get out of this mode press ctrl+d. The -p option is used to specify the |
202 | number of file path components to display. -p10 is optimal for browsing |
203 | kernel sources. |
204 | ||
205 | What is perf and how do we use it? | |
206 | ================================== | |
207 | ||
208 | Perf is an analysis tool based on Linux 2.6+ systems, which abstracts the | |
209 | CPU hardware difference in performance measurement in Linux, and provides | |
210 | a simple command line interface. Perf is based on the perf_events interface | |
211 | exported by the kernel. It is very useful for profiling the system and | |
212 | finding performance bottlenecks in an application. | |
213 | ||
214 | If you haven't already checked out the Linux mainline repository, you can do | |
215 | so and then build kernel and perf tool:: | |
216 | ||
217 | git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux | |
218 | cd linux | |
219 | make -j3 all | |
220 | cd tools/perf | |
221 | make | |
222 | ||
223 | Note: The perf command can be built without building the kernel in the | |
224 | repository and can be run on older kernels. However matching the kernel | |
225 | and perf revisions gives more accurate information on the subsystem usage. | |
226 | ||
227 | We used "perf stat" and "perf bench" options. For detailed information on |
228 | the perf tool, run "perf -h". | |
229 | ||
230 | perf stat | |
231 | --------- | |
232 | The perf stat command generates a report of various hardware and software | |
233 | events. It does so with the help of hardware counter registers found in | |
234 | modern CPUs that keep the count of these activities. "perf stat cal" shows | |
235 | stats for cal command. | |
236 | ||
237 | perf bench |
238 | ---------- | |
239 | The perf bench command contains multiple multi-threaded microkernel | |
240 | benchmarks for executing different subsystems in the Linux kernel and | |
241 | system calls. This allows us to easily measure the impact of changes, | |
242 | which can help mitigate performance regressions. It also acts as a common | |
243 | benchmarking framework, enabling developers to easily create test cases, | |
244 | integrate transparently, and use performance-rich tooling. | |
245 | ||
246 | "perf bench all" command runs the following benchmarks: | |
247 | ||
248 | * sched/messaging | |
249 | * sched/pipe | |
250 | * syscall/basic | |
251 | * mem/memcpy | |
252 | * mem/memset | |
253 | ||
254 | What is stress-ng and how do we use it? | |
255 | ======================================= | |
256 | ||
257 | As mentioned earlier, stress-ng is used for performing stress testing on | |
258 | the kernel. It allows you to exercise various physical subsystems of the | |
259 | computer, as well as interfaces of the OS kernel, using stressor-s. They | |
260 | are available for CPU, CPU cache, devices, I/O, interrupts, file system, | |
261 | memory, network, operating system, pipelines, schedulers, and virtual | |
262 | machines. | |
263 | ||
264 | The netdev stressor starts N workers that exercise various netdevice ioctl | |
265 | commands across all the available network devices. The following ioctls are | |
266 | exercised: | |
267 | ||
268 | * SIOCGIFCONF, SIOCGIFINDEX, SIOCGIFNAME, SIOCGIFFLAGS | |
269 | * SIOCGIFADDR, SIOCGIFNETMASK, SIOCGIFMETRIC, SIOCGIFMTU | |
270 | * SIOCGIFHWADDR, SIOCGIFMAP, SIOCGIFTXQLEN | |
271 | ||
272 | The following command runs the stressor:: | |
273 | ||
274 | stress-ng --netdev 1 -t 60 --metrics |
275 | ||
276 | We can use the perf record command to record the events and information | |
277 | associated with a process. This command records the profiling data in the | |
278 | perf.data file in the same directory. | |
279 | ||
280 | Using the following commands you can record the events associated with the |
281 | netdev stressor, view the report generated from perf.data, and annotate it to |
282 | view the statistics of each instruction of the program:: |
283 | ||
284 | perf record stress-ng --netdev 1 -t 60 --metrics |
285 | perf report | |
286 | perf annotate | |
287 | ||
288 | What is paxtest and how do we use it? | |
289 | ===================================== | |
290 | ||
291 | paxtest is a program that tests buffer overflows in the kernel. It tests | |
292 | kernel enforcements over memory usage. Generally, execution in some memory | |
293 | segments makes buffer overflows possible. It runs a set of programs that | |
294 | attempt to subvert memory usage. It is used as a regression test suite for | |
295 | PaX, and will be useful to test other memory protection patches for the | |
296 | kernel. | |
297 | ||
298 | paxtest provides kiddie and blackhat modes. The paxtest kiddie mode runs | |
299 | in normal mode, whereas the blackhat mode tries to get around the protection | |
300 | of the kernel testing for vulnerabilities. We focus on the kiddie mode here | |
301 | and combine "paxtest kiddie" run with "perf record" to collect CPU stack | |
302 | traces for the paxtest kiddie run to see which function is calling other | |
303 | functions in the performance profile. Then the "dwarf" (DWARF's Call Frame | |
304 | Information) mode can be used to unwind the stack. | |
305 | ||
306 | The following commands can be used to view the resulting report in call-graph |
307 | format:: | |
308 | ||
309 | perf record --call-graph dwarf paxtest kiddie | |
310 | perf report --stdio | |
311 | ||
312 | Tracing workloads | |
313 | ================= | |
314 | ||
315 | Now that we understand the workloads, let's start tracing them. | |
316 | ||
317 | Tracing perf bench all workload | |
318 | ------------------------------- | |
319 | ||
320 | Run the following command to trace perf bench all workload:: | |
321 | ||
322 | strace -c perf bench all | |
323 | ||
324 | **System Calls made by the workload** | |
325 | ||
326 | The table below shows the system calls invoked by the workload, the number of |
327 | times each system call is invoked, and the corresponding Linux subsystem. |
328 | ||
329 | +-------------------+-----------+-----------------+-------------------------+ | |
330 | | System Call | # calls | Linux Subsystem | System Call (API) | | |
331 | +===================+===========+=================+=========================+ | |
332 | | getppid | 10000001 | Process Mgmt. | sys_getppid() | |
333 | +-------------------+-----------+-----------------+-------------------------+ | |
334 | | clone | 1077 | Process Mgmt. | sys_clone() | | |
335 | +-------------------+-----------+-----------------+-------------------------+ | |
336 | | prctl | 23 | Process Mgmt. | sys_prctl() | | |
337 | +-------------------+-----------+-----------------+-------------------------+ | |
338 | | prlimit64 | 7 | Process Mgmt. | sys_prlimit64() | | |
339 | +-------------------+-----------+-----------------+-------------------------+ | |
340 | | getpid | 10 | Process Mgmt. | sys_getpid() | | |
341 | +-------------------+-----------+-----------------+-------------------------+ | |
342 | | uname | 3 | Process Mgmt. | sys_uname() | | |
343 | +-------------------+-----------+-----------------+-------------------------+ | |
344 | | sysinfo | 1 | Process Mgmt. | sys_sysinfo() | | |
345 | +-------------------+-----------+-----------------+-------------------------+ | |
346 | | getuid | 1 | Process Mgmt. | sys_getuid() | | |
347 | +-------------------+-----------+-----------------+-------------------------+ | |
348 | | getgid | 1 | Process Mgmt. | sys_getgid() | | |
349 | +-------------------+-----------+-----------------+-------------------------+ | |
350 | | geteuid | 1 | Process Mgmt. | sys_geteuid() | | |
351 | +-------------------+-----------+-----------------+-------------------------+ | |
352 | | getegid | 1 | Process Mgmt. | sys_getegid() | |
353 | +-------------------+-----------+-----------------+-------------------------+ | |
354 | | close | 49951 | Filesystem | sys_close() | | |
355 | +-------------------+-----------+-----------------+-------------------------+ | |
356 | | pipe | 604 | Filesystem | sys_pipe() | | |
357 | +-------------------+-----------+-----------------+-------------------------+ | |
358 | | openat | 48560 | Filesystem | sys_openat() | |
359 | +-------------------+-----------+-----------------+-------------------------+ | |
360 | | fstat | 8338 | Filesystem | sys_fstat() | | |
361 | +-------------------+-----------+-----------------+-------------------------+ | |
362 | | stat | 1573 | Filesystem | sys_stat() | | |
363 | +-------------------+-----------+-----------------+-------------------------+ | |
364 | | pread64 | 9646 | Filesystem | sys_pread64() | | |
365 | +-------------------+-----------+-----------------+-------------------------+ | |
366 | | getdents64 | 1873 | Filesystem | sys_getdents64() | | |
367 | +-------------------+-----------+-----------------+-------------------------+ | |
368 | | access | 3 | Filesystem | sys_access() | | |
369 | +-------------------+-----------+-----------------+-------------------------+ | |
370 | | lstat | 1880 | Filesystem | sys_lstat() | | |
371 | +-------------------+-----------+-----------------+-------------------------+ | |
372 | | lseek | 6 | Filesystem | sys_lseek() | | |
373 | +-------------------+-----------+-----------------+-------------------------+ | |
374 | | ioctl | 3 | Filesystem | sys_ioctl() | | |
375 | +-------------------+-----------+-----------------+-------------------------+ | |
376 | | dup2 | 1 | Filesystem | sys_dup2() | | |
377 | +-------------------+-----------+-----------------+-------------------------+ | |
378 | | execve | 2 | Filesystem | sys_execve() | | |
379 | +-------------------+-----------+-----------------+-------------------------+ | |
380 | | fcntl | 8779 | Filesystem | sys_fcntl() | | |
381 | +-------------------+-----------+-----------------+-------------------------+ | |
382 | | statfs | 1 | Filesystem | sys_statfs() | | |
383 | +-------------------+-----------+-----------------+-------------------------+ | |
384 | | epoll_create | 2 | Filesystem | sys_epoll_create() | | |
385 | +-------------------+-----------+-----------------+-------------------------+ | |
386 | | epoll_ctl | 64 | Filesystem | sys_epoll_ctl() | | |
387 | +-------------------+-----------+-----------------+-------------------------+ | |
388 | | newfstatat | 8318 | Filesystem | sys_newfstatat() | | |
389 | +-------------------+-----------+-----------------+-------------------------+ | |
390 | | eventfd2 | 192 | Filesystem | sys_eventfd2() | | |
391 | +-------------------+-----------+-----------------+-------------------------+ | |
392 | | mmap | 243 | Memory Mgmt. | sys_mmap() | | |
393 | +-------------------+-----------+-----------------+-------------------------+ | |
394 | | mprotect | 32 | Memory Mgmt. | sys_mprotect() | | |
395 | +-------------------+-----------+-----------------+-------------------------+ | |
396 | | brk | 21 | Memory Mgmt. | sys_brk() | | |
397 | +-------------------+-----------+-----------------+-------------------------+ | |
398 | | munmap | 128 | Memory Mgmt. | sys_munmap() | | |
399 | +-------------------+-----------+-----------------+-------------------------+ | |
400 | | set_mempolicy | 156 | Memory Mgmt. | sys_set_mempolicy() | | |
401 | +-------------------+-----------+-----------------+-------------------------+ | |
402 | | set_tid_address | 1 | Process Mgmt. | sys_set_tid_address() | | |
403 | +-------------------+-----------+-----------------+-------------------------+ | |
404 | | set_robust_list | 1 | Futex | sys_set_robust_list() | | |
405 | +-------------------+-----------+-----------------+-------------------------+ | |
406 | | futex | 341 | Futex | sys_futex() | | |
407 | +-------------------+-----------+-----------------+-------------------------+ | |
408 | | sched_getaffinity | 79 | Scheduler | sys_sched_getaffinity() | | |
409 | +-------------------+-----------+-----------------+-------------------------+ | |
410 | | sched_setaffinity | 223 | Scheduler | sys_sched_setaffinity() | | |
411 | +-------------------+-----------+-----------------+-------------------------+ | |
412 | | socketpair | 202 | Network | sys_socketpair() | | |
413 | +-------------------+-----------+-----------------+-------------------------+ | |
414 | | rt_sigprocmask | 21 | Signal | sys_rt_sigprocmask() | | |
415 | +-------------------+-----------+-----------------+-------------------------+ | |
416 | | rt_sigaction | 36 | Signal | sys_rt_sigaction() | | |
417 | +-------------------+-----------+-----------------+-------------------------+ | |
418 | | rt_sigreturn | 2 | Signal | sys_rt_sigreturn() | | |
419 | +-------------------+-----------+-----------------+-------------------------+ | |
420 | | wait4 | 889 | Time | sys_wait4() | | |
421 | +-------------------+-----------+-----------------+-------------------------+ | |
422 | | clock_nanosleep | 37 | Time | sys_clock_nanosleep() | | |
423 | +-------------------+-----------+-----------------+-------------------------+ | |
424 | | capget | 4 | Capability | sys_capget() | | |
425 | +-------------------+-----------+-----------------+-------------------------+ | |
426 | ||
427 | Tracing stress-ng netdev stressor workload | |
428 | ------------------------------------------ | |
429 | ||
430 | Run the following command to trace stress-ng netdev stressor workload:: | |
431 | ||
432 | strace -c stress-ng --netdev 1 -t 60 --metrics | |
433 | ||
434 | **System Calls made by the workload** | |
435 | ||
436 | The table below shows the system calls invoked by the workload, the number of |
437 | times each system call is invoked, and the corresponding Linux subsystem. |
438 | ||
439 | +-------------------+-----------+-----------------+-------------------------+ | |
440 | | System Call | # calls | Linux Subsystem | System Call (API) | | |
441 | +===================+===========+=================+=========================+ | |
442 | | openat | 74 | Filesystem | sys_openat() | | |
443 | +-------------------+-----------+-----------------+-------------------------+ | |
444 | | close | 75 | Filesystem | sys_close() | | |
445 | +-------------------+-----------+-----------------+-------------------------+ | |
446 | | read | 58 | Filesystem | sys_read() | | |
447 | +-------------------+-----------+-----------------+-------------------------+ | |
448 | | fstat | 20 | Filesystem | sys_fstat() | | |
449 | +-------------------+-----------+-----------------+-------------------------+ | |
450 | | flock | 10 | Filesystem | sys_flock() | | |
451 | +-------------------+-----------+-----------------+-------------------------+ | |
452 | | write | 7 | Filesystem | sys_write() | | |
453 | +-------------------+-----------+-----------------+-------------------------+ | |
454 | | getdents64 | 8 | Filesystem | sys_getdents64() | | |
455 | +-------------------+-----------+-----------------+-------------------------+ | |
456 | | pread64 | 8 | Filesystem | sys_pread64() | | |
457 | +-------------------+-----------+-----------------+-------------------------+ | |
458 | | lseek | 1 | Filesystem | sys_lseek() | | |
459 | +-------------------+-----------+-----------------+-------------------------+ | |
460 | | access | 2 | Filesystem | sys_access() | | |
461 | +-------------------+-----------+-----------------+-------------------------+ | |
462 | | getcwd | 1 | Filesystem | sys_getcwd() | | |
463 | +-------------------+-----------+-----------------+-------------------------+ | |
464 | | execve | 1 | Filesystem | sys_execve() | | |
465 | +-------------------+-----------+-----------------+-------------------------+ | |
466 | | mmap | 61 | Memory Mgmt. | sys_mmap() | | |
467 | +-------------------+-----------+-----------------+-------------------------+ | |
468 | | munmap | 3 | Memory Mgmt. | sys_munmap() | | |
469 | +-------------------+-----------+-----------------+-------------------------+ | |
470 | | mprotect | 20 | Memory Mgmt. | sys_mprotect() | | |
471 | +-------------------+-----------+-----------------+-------------------------+ | |
472 | | mlock | 2 | Memory Mgmt. | sys_mlock() | | |
473 | +-------------------+-----------+-----------------+-------------------------+ | |
474 | | brk | 3 | Memory Mgmt. | sys_brk() | | |
475 | +-------------------+-----------+-----------------+-------------------------+ | |
476 | | rt_sigaction | 21 | Signal | sys_rt_sigaction() | | |
477 | +-------------------+-----------+-----------------+-------------------------+ | |
478 | | rt_sigprocmask | 1 | Signal | sys_rt_sigprocmask() | | |
479 | +-------------------+-----------+-----------------+-------------------------+ | |
480 | | sigaltstack | 1 | Signal | sys_sigaltstack() | | |
481 | +-------------------+-----------+-----------------+-------------------------+ | |
482 | | rt_sigreturn | 1 | Signal | sys_rt_sigreturn() | | |
483 | +-------------------+-----------+-----------------+-------------------------+ | |
484 | | getpid | 8 | Process Mgmt. | sys_getpid() | | |
485 | +-------------------+-----------+-----------------+-------------------------+ | |
486 | | prlimit64 | 5 | Process Mgmt. | sys_prlimit64() | | |
487 | +-------------------+-----------+-----------------+-------------------------+ | |
488 | | arch_prctl | 2 | Process Mgmt. | sys_arch_prctl() | | |
489 | +-------------------+-----------+-----------------+-------------------------+ | |
490 | | sysinfo | 2 | Process Mgmt. | sys_sysinfo() | | |
491 | +-------------------+-----------+-----------------+-------------------------+ | |
492 | | getuid | 2 | Process Mgmt. | sys_getuid() | | |
493 | +-------------------+-----------+-----------------+-------------------------+ | |
494 | | uname | 1 | Process Mgmt. | sys_uname() | | |
495 | +-------------------+-----------+-----------------+-------------------------+ | |
496 | | setpgid | 1 | Process Mgmt. | sys_setpgid() | | |
497 | +-------------------+-----------+-----------------+-------------------------+ | |
498 | | getrusage | 1 | Process Mgmt. | sys_getrusage() | | |
499 | +-------------------+-----------+-----------------+-------------------------+ | |
500 | | geteuid | 1 | Process Mgmt. | sys_geteuid() | | |
501 | +-------------------+-----------+-----------------+-------------------------+ | |
502 | | getppid | 1 | Process Mgmt. | sys_getppid() | | |
503 | +-------------------+-----------+-----------------+-------------------------+ | |
504 | | sendto | 3 | Network | sys_sendto() | | |
505 | +-------------------+-----------+-----------------+-------------------------+ | |
506 | | connect | 1 | Network | sys_connect() | | |
507 | +-------------------+-----------+-----------------+-------------------------+ | |
508 | | socket | 1 | Network | sys_socket() | | |
509 | +-------------------+-----------+-----------------+-------------------------+ | |
510 | | clone | 1 | Process Mgmt. | sys_clone() | | |
511 | +-------------------+-----------+-----------------+-------------------------+ | |
512 | | set_tid_address | 1 | Process Mgmt. | sys_set_tid_address() | | |
513 | +-------------------+-----------+-----------------+-------------------------+ | |
514 | | wait4 | 2 | Time | sys_wait4() | | |
515 | +-------------------+-----------+-----------------+-------------------------+ | |
516 | | alarm | 1 | Time | sys_alarm() | | |
517 | +-------------------+-----------+-----------------+-------------------------+ | |
518 | | set_robust_list | 1 | Futex | sys_set_robust_list() | | |
519 | +-------------------+-----------+-----------------+-------------------------+ | |
520 | ||
521 | Tracing paxtest kiddie workload | |
522 | ------------------------------- | |
523 | ||
524 | Run the following command to trace paxtest kiddie workload:: | |
525 | ||
526 | strace -c paxtest kiddie | |
527 | ||
528 | **System Calls made by the workload** | |
529 | ||
530 | The table below shows the system calls invoked by the workload, the number of |
531 | times each system call is invoked, and the corresponding Linux subsystem. |
532 | ||
533 | +-------------------+-----------+-----------------+----------------------+ | |
534 | | System Call | # calls | Linux Subsystem | System Call (API) | | |
535 | +===================+===========+=================+======================+ | |
536 | | read | 3 | Filesystem | sys_read() | | |
537 | +-------------------+-----------+-----------------+----------------------+ | |
538 | | write | 11 | Filesystem | sys_write() | | |
539 | +-------------------+-----------+-----------------+----------------------+ | |
540 | | close | 41 | Filesystem | sys_close() | | |
541 | +-------------------+-----------+-----------------+----------------------+ | |
542 | | stat | 24 | Filesystem | sys_stat() | | |
543 | +-------------------+-----------+-----------------+----------------------+ | |
544 | | fstat | 2 | Filesystem | sys_fstat() | | |
545 | +-------------------+-----------+-----------------+----------------------+ | |
546 | | pread64 | 6 | Filesystem | sys_pread64() | | |
547 | +-------------------+-----------+-----------------+----------------------+ | |
548 | | access | 1 | Filesystem | sys_access() | | |
549 | +-------------------+-----------+-----------------+----------------------+ | |
550 | | pipe | 1 | Filesystem | sys_pipe() | | |
551 | +-------------------+-----------+-----------------+----------------------+ | |
552 | | dup2 | 24 | Filesystem | sys_dup2() | | |
553 | +-------------------+-----------+-----------------+----------------------+ | |
554 | | execve | 1 | Filesystem | sys_execve() | | |
555 | +-------------------+-----------+-----------------+----------------------+ | |
556 | | fcntl | 26 | Filesystem | sys_fcntl() | | |
557 | +-------------------+-----------+-----------------+----------------------+ | |
558 | | openat | 14 | Filesystem | sys_openat() | | |
559 | +-------------------+-----------+-----------------+----------------------+ | |
560 | | rt_sigaction | 7 | Signal | sys_rt_sigaction() | | |
561 | +-------------------+-----------+-----------------+----------------------+ | |
562 | | rt_sigreturn | 38 | Signal | sys_rt_sigreturn() | | |
563 | +-------------------+-----------+-----------------+----------------------+ | |
564 | | clone | 38 | Process Mgmt. | sys_clone() | | |
565 | +-------------------+-----------+-----------------+----------------------+ | |
566 | | wait4 | 44 | Time | sys_wait4() | | |
567 | +-------------------+-----------+-----------------+----------------------+ | |
568 | | mmap | 7 | Memory Mgmt. | sys_mmap() | | |
569 | +-------------------+-----------+-----------------+----------------------+ | |
570 | | mprotect | 3 | Memory Mgmt. | sys_mprotect() | | |
571 | +-------------------+-----------+-----------------+----------------------+ | |
572 | | munmap | 1 | Memory Mgmt. | sys_munmap() | | |
573 | +-------------------+-----------+-----------------+----------------------+ | |
574 | | brk | 3 | Memory Mgmt. | sys_brk() | | |
575 | +-------------------+-----------+-----------------+----------------------+ | |
576 | | getpid | 1 | Process Mgmt. | sys_getpid() | | |
577 | +-------------------+-----------+-----------------+----------------------+ | |
578 | | getuid | 1 | Process Mgmt. | sys_getuid() | | |
579 | +-------------------+-----------+-----------------+----------------------+ | |
580 | | getgid | 1 | Process Mgmt. | sys_getgid() | | |
581 | +-------------------+-----------+-----------------+----------------------+ | |
582 | | geteuid | 2 | Process Mgmt. | sys_geteuid() | | |
583 | +-------------------+-----------+-----------------+----------------------+ | |
584 | | getegid | 1 | Process Mgmt. | sys_getegid() | | |
585 | +-------------------+-----------+-----------------+----------------------+ | |
586 | | getppid | 1 | Process Mgmt. | sys_getppid() | | |
587 | +-------------------+-----------+-----------------+----------------------+ | |
588 | | arch_prctl | 2 | Process Mgmt. | sys_arch_prctl() | | |
589 | +-------------------+-----------+-----------------+----------------------+ | |
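
The per-system-call counts can be rolled up into a per-subsystem view, which
makes it easier to see where the workload spends its system calls. The
following Python sketch is illustrative only: it copies the rows of the table
above into a list and sums the call counts for each subsystem label. The names
``ROWS`` and ``calls_per_subsystem`` are hypothetical helpers introduced for
this example; they are not part of strace or paxtest.

```python
from collections import Counter

# (system call, number of calls, subsystem) rows copied from the table above.
ROWS = [
    ("read", 3, "Filesystem"),
    ("write", 11, "Filesystem"),
    ("close", 41, "Filesystem"),
    ("stat", 24, "Filesystem"),
    ("fstat", 2, "Filesystem"),
    ("pread64", 6, "Filesystem"),
    ("access", 1, "Filesystem"),
    ("pipe", 1, "Filesystem"),
    ("dup2", 24, "Filesystem"),
    ("execve", 1, "Filesystem"),
    ("fcntl", 26, "Filesystem"),
    ("openat", 14, "Filesystem"),
    ("rt_sigaction", 7, "Signal"),
    ("rt_sigreturn", 38, "Signal"),
    ("clone", 38, "Process Mgmt."),
    ("wait4", 44, "Time"),
    ("mmap", 7, "Memory Mgmt."),
    ("mprotect", 3, "Memory Mgmt."),
    ("munmap", 1, "Memory Mgmt."),
    ("brk", 3, "Memory Mgmt."),
    ("getpid", 1, "Process Mgmt."),
    ("getuid", 1, "Process Mgmt."),
    ("getgid", 1, "Process Mgmt."),
    ("geteuid", 2, "Process Mgmt."),
    ("getegid", 1, "Process Mgmt."),
    ("getppid", 1, "Process Mgmt."),
    ("arch_prctl", 2, "Process Mgmt."),
]


def calls_per_subsystem(rows):
    """Sum the call counts of all system calls that share a subsystem label."""
    totals = Counter()
    for _syscall, calls, subsystem in rows:
        totals[subsystem] += calls
    return totals


# Print the subsystems from most to least frequently used.
for subsystem, calls in calls_per_subsystem(ROWS).most_common():
    print(f"{subsystem:<15}{calls:>5}")
```

For this table, the roll-up attributes about half of the calls (154 of 304)
to the filesystem, which suggests where regression testing for this workload
could start.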
590 | ||
Conclusion
==========

This document is intended as a guide to gathering fine-grained information
on the resources in use by workloads using strace.

References
==========

* `Discovery Linux Kernel Subsystems used by OpenAPS <https://elisa.tech/blog/2022/02/02/discovery-linux-kernel-subsystems-used-by-openaps>`_
* `ELISA-White-Papers-Discovering Linux kernel subsystems used by a workload <https://github.com/elisa-tech/ELISA-White-Papers/blob/master/Processes/Discovering_Linux_kernel_subsystems_used_by_a_workload.md>`_
* `strace <https://man7.org/linux/man-pages/man1/strace.1.html>`_
* `perf <https://man7.org/linux/man-pages/man1/perf.1.html>`_
* `paxtest README <https://github.com/opntr/paxtest-freebsd/blob/hardenedbsd/0.9.14-hbsd/README>`_
* `stress-ng <https://www.mankier.com/1/stress-ng>`_
* `Monitoring and managing system status and performance <https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/index>`_