Commit | Line | Data |
---|---|---|
bce29ac9 DBO |
1 | ============== |
2 | OSNOISE Tracer | |
3 | ============== | |
4 | ||
5 | In the context of high-performance computing (HPC), the Operating System | |
6 | Noise (*osnoise*) refers to the interference experienced by an application | |
7 | due to activities inside the operating system. In the context of Linux, | |
8 | NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the | |
9 | system. Moreover, hardware-related jobs can also cause noise, for example, | |
10 | via SMIs. | |
11 | ||
12 | hwlat_detector is one of the tools used to identify the most complex | |
13 | source of noise: *hardware noise*. | |
14 | ||
15 | In a nutshell, the hwlat_detector creates a thread that runs | |
16 | periodically for a given period. At the beginning of a period, the thread | |
17 | disables interrupt and starts sampling. While running, the hwlatd | |
18 | thread reads the time in a loop. As interrupts are disabled, threads, | |
19 | IRQs, and SoftIRQs cannot interfere with the hwlatd thread. Hence, the | |
20 | cause of any gap between two different reads of the time roots either on | |
21 | NMI or in the hardware itself. At the end of the period, hwlatd enables | |
22 | interrupts and reports the max observed gap between the reads. It also | |
23 | prints a NMI occurrence counter. If the output does not report NMI | |
24 | executions, the user can conclude that the hardware is the culprit for | |
25 | the latency. The hwlat detects the NMI execution by observing | |
26 | the entry and exit of a NMI. | |
27 | ||
28 | The osnoise tracer leverages the hwlat_detector by running a | |
29 | similar loop with preemption, SoftIRQs and IRQs enabled, thus allowing | |
30 | all the sources of *osnoise* during its execution. Using the same approach | |
31 | of hwlat, osnoise takes note of the entry and exit point of any | |
32 | source of interferences, increasing a per-cpu interference counter. The | |
33 | osnoise tracer also saves an interference counter for each source of | |
34 | interference. The interference counter for NMI, IRQs, SoftIRQs, and | |
35 | threads is increased anytime the tool observes these interferences' entry | |
36 | events. When a noise happens without any interference from the operating | |
37 | system level, the hardware noise counter increases, pointing to a | |
38 | hardware-related noise. In this way, osnoise can account for any | |
39 | source of interference. At the end of the period, the osnoise tracer | |
40 | prints the sum of all noise, the max single noise, the percentage of CPU | |
41 | available for the thread, and the counters for the noise sources. | |
42 | ||
43 | Usage | |
44 | ----- | |
45 | ||
46 | Write the ASCII text "osnoise" into the current_tracer file of the | |
47 | tracing system (generally mounted at /sys/kernel/tracing). | |
48 | ||
49 | For example:: | |
50 | ||
51 | [root@f32 ~]# cd /sys/kernel/tracing/ | |
52 | [root@f32 tracing]# echo osnoise > current_tracer | |
53 | ||
62a40dad | 54 | It is possible to follow the trace by reading the trace file:: |
bce29ac9 DBO |
55 | |
56 | [root@f32 tracing]# cat trace | |
57 | # tracer: osnoise | |
58 | # | |
59 | # _-----=> irqs-off | |
60 | # / _----=> need-resched | |
61 | # | / _---=> hardirq/softirq | |
62 | # || / _--=> preempt-depth MAX | |
63 | # || / SINGLE Interference counters: | |
64 | # |||| RUNTIME NOISE % OF CPU NOISE +-----------------------------+ | |
65 | # TASK-PID CPU# |||| TIMESTAMP IN US IN US AVAILABLE IN US HW NMI IRQ SIRQ THREAD | |
66 | # | | | |||| | | | | | | | | | | | |
67 | <...>-859 [000] .... 81.637220: 1000000 190 99.98100 9 18 0 1007 18 1 | |
68 | <...>-860 [001] .... 81.638154: 1000000 656 99.93440 74 23 0 1006 16 3 | |
69 | <...>-861 [002] .... 81.638193: 1000000 5675 99.43250 202 6 0 1013 25 21 | |
70 | <...>-862 [003] .... 81.638242: 1000000 125 99.98750 45 1 0 1011 23 0 | |
71 | <...>-863 [004] .... 81.638260: 1000000 1721 99.82790 168 7 0 1002 49 41 | |
72 | <...>-864 [005] .... 81.638286: 1000000 263 99.97370 57 6 0 1006 26 2 | |
73 | <...>-865 [006] .... 81.638302: 1000000 109 99.98910 21 3 0 1006 18 1 | |
74 | <...>-866 [007] .... 81.638326: 1000000 7816 99.21840 107 8 0 1016 39 19 | |
75 | ||
76 | In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the | |
77 | tracer prints a message at the end of each period for each CPU that is | |
78 | running an osnoise/ thread. The osnoise specific fields report: | |
79 | ||
bd09c055 | 80 | - The RUNTIME IN US reports the amount of time in microseconds that |
bce29ac9 DBO |
81 | the osnoise thread kept looping reading the time. |
82 | - The NOISE IN US reports the sum of noise in microseconds observed | |
83 | by the osnoise tracer during the associated runtime. | |
84 | - The % OF CPU AVAILABLE reports the percentage of CPU available for | |
85 | the osnoise thread during the runtime window. | |
86 | - The MAX SINGLE NOISE IN US reports the maximum single noise observed | |
87 | during the runtime window. | |
88 | - The Interference counters display how many each of the respective | |
89 | interference happened during the runtime window. | |
90 | ||
91 | Note that the example above shows a high number of HW noise samples. | |
92 | The reason being is that this sample was taken on a virtual machine, | |
93 | and the host interference is detected as a hardware interference. | |
94 | ||
d358dfe6 DBO |
95 | Tracer Configuration |
96 | -------------------- | |
bce29ac9 DBO |
97 | |
98 | The tracer has a set of options inside the osnoise directory, they are: | |
99 | ||
100 | - osnoise/cpus: CPUs at which a osnoise thread will execute. | |
101 | - osnoise/period_us: the period of the osnoise thread. | |
102 | - osnoise/runtime_us: how long an osnoise thread will look for noise. | |
103 | - osnoise/stop_tracing_us: stop the system tracing if a single noise | |
104 | higher than the configured value happens. Writing 0 disables this | |
105 | option. | |
106 | - osnoise/stop_tracing_total_us: stop the system tracing if total noise | |
107 | higher than the configured value happens. Writing 0 disables this | |
108 | option. | |
109 | - tracing_threshold: the minimum delta between two time() reads to be | |
110 | considered as noise, in us. When set to 0, the default value will | |
62a40dad | 111 | be used, which is currently 5 us. |
67543cd6 DBO |
112 | - osnoise/options: a set of on/off options that can be enabled by |
113 | writing the option name to the file or disabled by writing the option | |
0e162c6f | 114 | name preceded with the 'NO\_' prefix. For example, writing |
67543cd6 DBO |
115 | NO_OSNOISE_WORKLOAD disables the OSNOISE_WORKLOAD option. The |
116 | special DEAFAULTS option resets all options to the default value. | |
bce29ac9 | 117 | |
d358dfe6 DBO |
118 | Tracer Options |
119 | -------------- | |
120 | ||
121 | The osnoise/options file exposes a set of on/off configuration options for | |
122 | the osnoise tracer. These options are: | |
123 | ||
124 | - DEFAULTS: reset the options to the default value. | |
125 | - OSNOISE_WORKLOAD: do not dispatch osnoise workload (see dedicated | |
126 | section below). | |
127 | - PANIC_ON_STOP: call panic() if the tracer stops. This option serves to | |
128 | capture a vmcore. | |
129 | - OSNOISE_PREEMPT_DISABLE: disable preemption while running the osnoise | |
130 | workload, allowing only IRQ and hardware-related noise. | |
131 | - OSNOISE_IRQ_DISABLE: disable IRQs while running the osnoise workload, | |
132 | allowing only NMIs and hardware-related noise, like hwlat tracer. | |
133 | ||
bce29ac9 DBO |
134 | Additional Tracing |
135 | ------------------ | |
136 | ||
137 | In addition to the tracer, a set of tracepoints were added to | |
138 | facilitate the identification of the osnoise source. | |
139 | ||
140 | - osnoise:sample_threshold: printed anytime a noise is higher than | |
141 | the configurable tolerance_ns. | |
142 | - osnoise:nmi_noise: noise from NMI, including the duration. | |
143 | - osnoise:irq_noise: noise from an IRQ, including the duration. | |
144 | - osnoise:softirq_noise: noise from a SoftIRQ, including the | |
145 | duration. | |
146 | - osnoise:thread_noise: noise from a thread, including the duration. | |
147 | ||
148 | Note that all the values are *net values*. For example, if while osnoise | |
149 | is running, another thread preempts the osnoise thread, it will start a | |
150 | thread_noise duration at the start. Then, an IRQ takes place, preempting | |
151 | the thread_noise, starting a irq_noise. When the IRQ ends its execution, | |
152 | it will compute its duration, and this duration will be subtracted from | |
153 | the thread_noise, in such a way as to avoid the double accounting of the | |
154 | IRQ execution. This logic is valid for all sources of noise. | |
155 | ||
156 | Here is one example of the usage of these tracepoints:: | |
157 | ||
158 | osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns | |
159 | osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns | |
160 | migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns | |
161 | osnoise/8-961 [008] .... 5789.858413: sample_threshold: start 5789.858404555 duration 8812 ns interferences 2 | |
162 | ||
163 | In this example, a noise sample of 8 microseconds was reported in the last | |
164 | line, pointing to two interferences. Looking backward in the trace, the | |
165 | two previous entries were about the migration thread running after a | |
166 | timer IRQ execution. The first event is not part of the noise because | |
167 | it took place one millisecond before. | |
168 | ||
169 | It is worth noticing that the sum of the duration reported in the | |
170 | tracepoints is smaller than eight us reported in the sample_threshold. | |
171 | The reason roots in the overhead of the entry and exit code that happens | |
172 | before and after any interference execution. This justifies the dual | |
173 | approach: measuring thread and tracing. | |
67543cd6 DBO |
174 | |
175 | Running osnoise tracer without workload | |
176 | --------------------------------------- | |
177 | ||
178 | By enabling the osnoise tracer with the NO_OSNOISE_WORKLOAD option set, | |
179 | the osnoise: tracepoints serve to measure the execution time of | |
180 | any type of Linux task, free from the interference of other tasks. |