Commit | Line | Data |
---|---|---|
16b585fe TZ |
1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | ||
3 | ====================== | |
4 | Histogram Design Notes | |
5 | ====================== | |
6 | ||
7 | :Author: Tom Zanussi <zanussi@kernel.org> | |
8 | ||
9 | This document attempts to provide a description of how the ftrace | |
10 | histograms work and how the individual pieces map to the data | |
11 | structures used to implement them in trace_events_hist.c and | |
12 | tracing_map.c. | |
13 | ||
14 | Note: All the ftrace histogram command examples assume the working | |
15 | directory is the ftrace /tracing directory. For example:: | |
16 | ||
daceabf1 | 17 | # cd /sys/kernel/debug/tracing |
16b585fe TZ |
18 | |
19 | Also, the histogram output displayed for those commands will be | |
20 | generally be truncated - only enough to make the point is displayed. | |
21 | ||
22 | 'hist_debug' trace event files | |
23 | ============================== | |
24 | ||
25 | If the kernel is compiled with CONFIG_HIST_TRIGGERS_DEBUG set, an | |
26 | event file named 'hist_debug' will appear in each event's | |
27 | subdirectory. This file can be read at any time and will display some | |
28 | of the hist trigger internals described in this document. Specific | |
29 | examples and output will be described in test cases below. | |
30 | ||
31 | Basic histograms | |
32 | ================ | |
33 | ||
34 | First, basic histograms. Below is pretty much the simplest thing you | |
35 | can do with histograms - create one with a single key on a single | |
36 | event and cat the output:: | |
37 | ||
38 | # echo 'hist:keys=pid' >> events/sched/sched_waking/trigger | |
39 | ||
40 | # cat events/sched/sched_waking/hist | |
41 | ||
42 | { pid: 18249 } hitcount: 1 | |
43 | { pid: 13399 } hitcount: 1 | |
44 | { pid: 17973 } hitcount: 1 | |
45 | { pid: 12572 } hitcount: 1 | |
46 | ... | |
47 | { pid: 10 } hitcount: 921 | |
48 | { pid: 18255 } hitcount: 1444 | |
49 | { pid: 25526 } hitcount: 2055 | |
50 | { pid: 5257 } hitcount: 2055 | |
51 | { pid: 27367 } hitcount: 2055 | |
52 | { pid: 1728 } hitcount: 2161 | |
53 | ||
54 | Totals: | |
55 | Hits: 21305 | |
56 | Entries: 183 | |
57 | Dropped: 0 | |
58 | ||
59 | What this does is create a histogram on the sched_waking event using | |
60 | pid as a key and with a single value, hitcount, which even if not | |
61 | explicitly specified, exists for every histogram regardless. | |
62 | ||
63 | The hitcount value is a per-bucket value that's automatically | |
64 | incremented on every hit for the given key, which in this case is the | |
65 | pid. | |
66 | ||
67 | So in this histogram, there's a separate bucket for each pid, and each | |
68 | bucket contains a value for that bucket, counting the number of times | |
69 | sched_waking was called for that pid. | |
70 | ||
71 | Each histogram is represented by a hist_data struct. | |
72 | ||
73 | To keep track of each key and value field in the histogram, hist_data | |
74 | keeps an array of these fields named fields[]. The fields[] array is | |
75 | an array containing struct hist_field representations of each | |
76 | histogram val and key in the histogram (variables are also included | |
77 | here, but are discussed later). So for the above histogram we have one | |
78 | key and one value; in this case the one value is the hitcount value, | |
79 | which all histograms have, regardless of whether they define that | |
80 | value or not, which the above histogram does not. | |
81 | ||
82 | Each struct hist_field contains a pointer to the ftrace_event_field | |
83 | from the event's trace_event_file along with various bits related to | |
84 | that such as the size, offset, type, and a hist_field_fn_t function, | |
85 | which is used to grab the field's data from the ftrace event buffer | |
86 | (in most cases - some hist_fields such as hitcount don't directly map | |
87 | to an event field in the trace buffer - in these cases the function | |
88 | implementation gets its value from somewhere else). The flags field | |
89 | indicates which type of field it is - key, value, variable, variable | |
90 | reference, etc., with value being the default. | |
91 | ||
92 | The other important hist_data data structure in addition to the | |
93 | fields[] array is the tracing_map instance created for the histogram, | |
94 | which is held in the .map member. The tracing_map implements the | |
95 | lock-free hash table used to implement histograms (see | |
96 | kernel/trace/tracing_map.h for much more discussion about the | |
97 | low-level data structures implementing the tracing_map). For the | |
98 | purposes of this discussion, the tracing_map contains a number of | |
99 | buckets, each bucket corresponding to a particular tracing_map_elt | |
100 | object hashed by a given histogram key. | |
101 | ||
102 | Below is a diagram the first part of which describes the hist_data and | |
103 | associated key and value fields for the histogram described above. As | |
104 | you can see, there are two fields in the fields array, one val field | |
105 | for the hitcount and one key field for the pid key. | |
106 | ||
107 | Below that is a diagram of a run-time snapshot of what the tracing_map | |
108 | might look like for a given run. It attempts to show the | |
109 | relationships between the hist_data fields and the tracing_map | |
daceabf1 | 110 | elements for a couple hypothetical keys and values.:: |
16b585fe TZ |
111 | |
112 | +------------------+ | |
113 | | hist_data | | |
114 | +------------------+ +----------------+ | |
115 | | .fields[] |---->| val = hitcount |----------------------------+ | |
116 | +----------------+ +----------------+ | | |
117 | | .map | | .size | | | |
118 | +----------------+ +--------------+ | | |
119 | | .offset | | | |
120 | +--------------+ | | |
121 | | .fn() | | | |
122 | +--------------+ | | |
123 | . | | |
124 | . | | |
125 | . | | |
126 | +----------------+ <--- n_vals | | |
127 | | key = pid |----------------------------|--+ | |
128 | +----------------+ | | | |
129 | | .size | | | | |
130 | +--------------+ | | | |
131 | | .offset | | | | |
132 | +--------------+ | | | |
133 | | .fn() | | | | |
134 | +----------------+ <--- n_fields | | | |
135 | | unused | | | | |
136 | +----------------+ | | | |
137 | | | | | | |
138 | +--------------+ | | | |
139 | | | | | | |
140 | +--------------+ | | | |
141 | | | | | | |
142 | +--------------+ | | | |
143 | n_keys = n_fields - n_vals | | | |
daceabf1 | 144 | |
16b585fe TZ |
145 | The hist_data n_vals and n_fields delineate the extent of the fields[] | | |
146 | array and separate keys from values for the rest of the code. | | | |
daceabf1 | 147 | |
16b585fe TZ |
148 | Below is a run-time representation of the tracing_map part of the | | |
149 | histogram, with pointers from various parts of the fields[] array | | | |
150 | to corresponding parts of the tracing_map. | | | |
daceabf1 | 151 | |
16b585fe TZ |
152 | The tracing_map consists of an array of tracing_map_entrys and a set | | |
153 | of preallocated tracing_map_elts (abbreviated below as map_entry and | | | |
154 | map_elt). The total number of map_entrys in the hist_data.map array = | | | |
155 | map->max_elts (actually map->map_size but only max_elts of those are | | | |
156 | used. This is a property required by the map_insert() algorithm). | | | |
daceabf1 | 157 | |
16b585fe TZ |
158 | If a map_entry is unused, meaning no key has yet hashed into it, its | | |
159 | .key value is 0 and its .val pointer is NULL. Once a map_entry has | | | |
160 | been claimed, the .key value contains the key's hash value and the | | | |
161 | .val member points to a map_elt containing the full key and an entry | | | |
162 | for each key or value in the map_elt.fields[] array. There is an | | | |
163 | entry in the map_elt.fields[] array corresponding to each hist_field | | | |
164 | in the histogram, and this is where the continually aggregated sums | | | |
165 | corresponding to each histogram value are kept. | | | |
daceabf1 | 166 | |
16b585fe TZ |
167 | The diagram attempts to show the relationship between the | | |
168 | hist_data.fields[] and the map_elt.fields[] with the links drawn | | | |
daceabf1 TZ |
169 | between diagrams:: |
170 | ||
16b585fe TZ |
171 | +-----------+ | | |
172 | | hist_data | | | | |
173 | +-----------+ | | | |
174 | | .fields | | | | |
175 | +---------+ +-----------+ | | | |
176 | | .map |---->| map_entry | | | | |
177 | +---------+ +-----------+ | | | |
178 | | .key |---> 0 | | | |
179 | +---------+ | | | |
180 | | .val |---> NULL | | | |
181 | +-----------+ | | | |
182 | | map_entry | | | | |
183 | +-----------+ | | | |
184 | | .key |---> pid = 999 | | | |
185 | +---------+ +-----------+ | | | |
186 | | .val |--->| map_elt | | | | |
187 | +---------+ +-----------+ | | | |
188 | . | .key |---> full key * | | | |
189 | . +---------+ +---------------+ | | | |
190 | . | .fields |--->| .sum (val) |<-+ | | |
191 | +-----------+ +---------+ | 2345 | | | | |
192 | | map_entry | +---------------+ | | | |
193 | +-----------+ | .offset (key) |<----+ | |
194 | | .key |---> 0 | 0 | | | | |
195 | +---------+ +---------------+ | | | |
196 | | .val |---> NULL . | | | |
197 | +-----------+ . | | | |
198 | | map_entry | . | | | |
199 | +-----------+ +---------------+ | | | |
200 | | .key | | .sum (val) or | | | | |
201 | +---------+ +---------+ | .offset (key) | | | | |
202 | | .val |--->| map_elt | +---------------+ | | | |
203 | +-----------+ +---------+ | .sum (val) or | | | | |
204 | | map_entry | | .offset (key) | | | | |
205 | +-----------+ +---------------+ | | | |
206 | | .key |---> pid = 4444 | | | |
207 | +---------+ +-----------+ | | | |
208 | | .val | | map_elt | | | | |
209 | +---------+ +-----------+ | | | |
210 | | .key |---> full key * | | | |
211 | +---------+ +---------------+ | | | |
212 | | .fields |--->| .sum (val) |<-+ | | |
213 | +---------+ | 65523 | | | |
214 | +---------------+ | | |
215 | | .offset (key) |<----+ | |
216 | | 0 | | |
217 | +---------------+ | |
218 | . | |
219 | . | |
220 | . | |
221 | +---------------+ | |
222 | | .sum (val) or | | |
223 | | .offset (key) | | |
224 | +---------------+ | |
225 | | .sum (val) or | | |
226 | | .offset (key) | | |
227 | +---------------+ | |
228 | ||
229 | Abbreviations used in the diagrams:: | |
230 | ||
231 | hist_data = struct hist_trigger_data | |
232 | hist_data.fields = struct hist_field | |
233 | fn = hist_field_fn_t | |
234 | map_entry = struct tracing_map_entry | |
235 | map_elt = struct tracing_map_elt | |
236 | map_elt.fields = struct tracing_map_field | |
237 | ||
238 | Whenever a new event occurs and it has a hist trigger associated with | |
239 | it, event_hist_trigger() is called. event_hist_trigger() first deals | |
240 | with the key: for each subkey in the key (in the above example, there | |
241 | is just one subkey corresponding to pid), the hist_field that | |
242 | represents that subkey is retrieved from hist_data.fields[] and the | |
243 | hist_field_fn_t fn() associated with that field, along with the | |
244 | field's size and offset, is used to grab that subkey's data from the | |
245 | current trace record. | |
246 | ||
247 | Once the complete key has been retrieved, it's used to look that key | |
248 | up in the tracing_map. If there's no tracing_map_elt associated with | |
249 | that key, an empty one is claimed and inserted in the map for the new | |
250 | key. In either case, the tracing_map_elt associated with that key is | |
251 | returned. | |
252 | ||
253 | Once a tracing_map_elt available, hist_trigger_elt_update() is called. | |
254 | As the name implies, this updates the element, which basically means | |
255 | updating the element's fields. There's a tracing_map_field associated | |
256 | with each key and value in the histogram, and each of these correspond | |
257 | to the key and value hist_fields created when the histogram was | |
258 | created. hist_trigger_elt_update() goes through each value hist_field | |
259 | and, as for the keys, uses the hist_field's fn() and size and offset | |
260 | to grab the field's value from the current trace record. Once it has | |
261 | that value, it simply adds that value to that field's | |
262 | continually-updated tracing_map_field.sum member. Some hist_field | |
263 | fn()s, such as for the hitcount, don't actually grab anything from the | |
264 | trace record (the hitcount fn() just increments the counter sum by 1), | |
265 | but the idea is the same. | |
266 | ||
267 | Once all the values have been updated, hist_trigger_elt_update() is | |
268 | done and returns. Note that there are also tracing_map_fields for | |
269 | each subkey in the key, but hist_trigger_elt_update() doesn't look at | |
270 | them or update anything - those exist only for sorting, which can | |
271 | happen later. | |
272 | ||
273 | Basic histogram test | |
274 | -------------------- | |
275 | ||
276 | This is a good example to try. It produces 3 value fields and 2 key | |
277 | fields in the output:: | |
278 | ||
279 | # echo 'hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger | |
280 | ||
281 | To see the debug data, cat the kmem/kmalloc's 'hist_debug' file. It | |
282 | will show the trigger info of the histogram it corresponds to, along | |
283 | with the address of the hist_data associated with the histogram, which | |
284 | will become useful in later examples. It then displays the number of | |
285 | total hist_fields associated with the histogram along with a count of | |
286 | how many of those correspond to keys and how many correspond to values. | |
287 | ||
288 | It then goes on to display details for each field, including the | |
289 | field's flags and the position of each field in the hist_data's | |
290 | fields[] array, which is useful information for verifying that things | |
291 | internally appear correct or not, and which again will become even | |
292 | more useful in further examples:: | |
293 | ||
294 | # cat events/kmem/kmalloc/hist_debug | |
295 | ||
296 | # event histogram | |
297 | # | |
298 | # trigger info: hist:keys=common_pid,call_site.sym:vals=hitcount,bytes_req,bytes_alloc:sort=hitcount:size=2048 [active] | |
299 | # | |
300 | ||
301 | hist_data: 000000005e48c9a5 | |
302 | ||
303 | n_vals: 3 | |
304 | n_keys: 2 | |
305 | n_fields: 5 | |
306 | ||
307 | val fields: | |
308 | ||
309 | hist_data->fields[0]: | |
310 | flags: | |
311 | VAL: HIST_FIELD_FL_HITCOUNT | |
312 | type: u64 | |
313 | size: 8 | |
314 | is_signed: 0 | |
315 | ||
316 | hist_data->fields[1]: | |
317 | flags: | |
318 | VAL: normal u64 value | |
319 | ftrace_event_field name: bytes_req | |
320 | type: size_t | |
321 | size: 8 | |
322 | is_signed: 0 | |
323 | ||
324 | hist_data->fields[2]: | |
325 | flags: | |
326 | VAL: normal u64 value | |
327 | ftrace_event_field name: bytes_alloc | |
328 | type: size_t | |
329 | size: 8 | |
330 | is_signed: 0 | |
331 | ||
332 | key fields: | |
333 | ||
334 | hist_data->fields[3]: | |
335 | flags: | |
336 | HIST_FIELD_FL_KEY | |
337 | ftrace_event_field name: common_pid | |
338 | type: int | |
339 | size: 8 | |
340 | is_signed: 1 | |
341 | ||
342 | hist_data->fields[4]: | |
343 | flags: | |
344 | HIST_FIELD_FL_KEY | |
345 | ftrace_event_field name: call_site | |
346 | type: unsigned long | |
347 | size: 8 | |
348 | is_signed: 0 | |
349 | ||
350 | The commands below can be used to clean things up for the next test:: | |
351 | ||
352 | # echo '!hist:keys=common_pid,call_site.sym:values=bytes_req,bytes_alloc,hitcount' >> events/kmem/kmalloc/trigger | |
353 | ||
354 | Variables | |
355 | ========= | |
356 | ||
357 | Variables allow data from one hist trigger to be saved by one hist | |
358 | trigger and retrieved by another hist trigger. For example, a trigger | |
359 | on the sched_waking event can capture a timestamp for a particular | |
360 | pid, and later a sched_switch event that switches to that pid event | |
361 | can grab the timestamp and use it to calculate a time delta between | |
362 | the two events:: | |
363 | ||
364 | # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> | |
365 | events/sched/sched_waking/trigger | |
366 | ||
367 | # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> | |
368 | events/sched/sched_switch/trigger | |
369 | ||
370 | In terms of the histogram data structures, variables are implemented | |
371 | as another type of hist_field and for a given hist trigger are added | |
372 | to the hist_data.fields[] array just after all the val fields. To | |
373 | distinguish them from the existing key and val fields, they're given a | |
374 | new flag type, HIST_FIELD_FL_VAR (abbreviated FL_VAR) and they also | |
375 | make use of a new .var.idx field member in struct hist_field, which | |
376 | maps them to an index in a new map_elt.vars[] array added to the | |
377 | map_elt specifically designed to store and retrieve variable values. | |
378 | The diagram below shows those new elements and adds a new variable | |
379 | entry, ts0, corresponding to the ts0 variable in the sched_waking | |
380 | trigger above. | |
381 | ||
382 | sched_waking histogram | |
daceabf1 | 383 | ----------------------:: |
16b585fe TZ |
384 | |
385 | +------------------+ | |
386 | | hist_data |<-------------------------------------------------------+ | |
387 | +------------------+ +-------------------+ | | |
388 | | .fields[] |-->| val = hitcount | | | |
389 | +----------------+ +-------------------+ | | |
390 | | .map | | .size | | | |
391 | +----------------+ +-----------------+ | | |
392 | | .offset | | | |
393 | +-----------------+ | | |
394 | | .fn() | | | |
395 | +-----------------+ | | |
396 | | .flags | | | |
397 | +-----------------+ | | |
398 | | .var.idx | | | |
399 | +-------------------+ | | |
400 | | var = ts0 | | | |
401 | +-------------------+ | | |
402 | | .size | | | |
403 | +-----------------+ | | |
404 | | .offset | | | |
405 | +-----------------+ | | |
406 | | .fn() | | | |
407 | +-----------------+ | | |
408 | | .flags & FL_VAR | | | |
409 | +-----------------+ | | |
410 | | .var.idx |----------------------------+-+ | | |
411 | +-----------------+ | | | | |
412 | . | | | | |
413 | . | | | | |
414 | . | | | | |
415 | +-------------------+ <--- n_vals | | | | |
416 | | key = pid | | | | | |
417 | +-------------------+ | | | | |
418 | | .size | | | | | |
419 | +-----------------+ | | | | |
420 | | .offset | | | | | |
421 | +-----------------+ | | | | |
422 | | .fn() | | | | | |
423 | +-----------------+ | | | | |
424 | | .flags & FL_KEY | | | | | |
425 | +-----------------+ | | | | |
426 | | .var.idx | | | | | |
427 | +-------------------+ <--- n_fields | | | | |
428 | | unused | | | | | |
429 | +-------------------+ | | | | |
430 | | | | | | | |
431 | +-----------------+ | | | | |
432 | | | | | | | |
433 | +-----------------+ | | | | |
434 | | | | | | | |
435 | +-----------------+ | | | | |
436 | | | | | | | |
437 | +-----------------+ | | | | |
438 | | | | | | | |
439 | +-----------------+ | | | | |
440 | n_keys = n_fields - n_vals | | | | |
441 | | | | | |
daceabf1 | 442 | |
16b585fe TZ |
443 | This is very similar to the basic case. In the above diagram, we can | | | |
444 | see a new .flags member has been added to the struct hist_field | | | | |
445 | struct, and a new entry added to hist_data.fields representing the ts0 | | | | |
446 | variable. For a normal val hist_field, .flags is just 0 (modulo | | | | |
447 | modifier flags), but if the value is defined as a variable, the .flags | | | | |
448 | contains a set FL_VAR bit. | | | | |
daceabf1 | 449 | |
16b585fe TZ |
450 | As you can see, the ts0 entry's .var.idx member contains the index | | | |
451 | into the tracing_map_elts' .vars[] array containing variable values. | | | | |
452 | This idx is used whenever the value of the variable is set or read. | | | | |
453 | The map_elt.vars idx assigned to the given variable is assigned and | | | | |
454 | saved in .var.idx by create_tracing_map_fields() after it calls | | | | |
455 | tracing_map_add_var(). | | | | |
daceabf1 | 456 | |
16b585fe TZ |
457 | Below is a representation of the histogram at run-time, which | | | |
458 | populates the map, along with correspondence to the above hist_data and | | | | |
459 | hist_field data structures. | | | | |
daceabf1 | 460 | |
16b585fe TZ |
461 | The diagram attempts to show the relationship between the | | | |
462 | hist_data.fields[] and the map_elt.fields[] and map_elt.vars[] with | | | | |
463 | the links drawn between diagrams. For each of the map_elts, you can | | | | |
464 | see that the .fields[] members point to the .sum or .offset of a key | | | | |
465 | or val and the .vars[] members point to the value of a variable. The | | | | |
466 | arrows between the two diagrams show the linkages between those | | | | |
467 | tracing_map members and the field definitions in the corresponding | | | | |
daceabf1 TZ |
468 | hist_data fields[] members.:: |
469 | ||
16b585fe TZ |
470 | +-----------+ | | | |
471 | | hist_data | | | | | |
472 | +-----------+ | | | | |
473 | | .fields | | | | | |
474 | +---------+ +-----------+ | | | | |
475 | | .map |---->| map_entry | | | | | |
476 | +---------+ +-----------+ | | | | |
477 | | .key |---> 0 | | | | |
478 | +---------+ | | | | |
479 | | .val |---> NULL | | | | |
480 | +-----------+ | | | | |
481 | | map_entry | | | | | |
482 | +-----------+ | | | | |
483 | | .key |---> pid = 999 | | | | |
484 | +---------+ +-----------+ | | | | |
485 | | .val |--->| map_elt | | | | | |
486 | +---------+ +-----------+ | | | | |
487 | . | .key |---> full key * | | | | |
488 | . +---------+ +---------------+ | | | | |
489 | . | .fields |--->| .sum (val) | | | | | |
490 | . +---------+ | 2345 | | | | | |
491 | . +--| .vars | +---------------+ | | | | |
492 | . | +---------+ | .offset (key) | | | | | |
493 | . | | 0 | | | | | |
494 | . | +---------------+ | | | | |
495 | . | . | | | | |
496 | . | . | | | | |
497 | . | . | | | | |
498 | . | +---------------+ | | | | |
499 | . | | .sum (val) or | | | | | |
500 | . | | .offset (key) | | | | | |
501 | . | +---------------+ | | | | |
502 | . | | .sum (val) or | | | | | |
503 | . | | .offset (key) | | | | | |
504 | . | +---------------+ | | | | |
505 | . | | | | | |
506 | . +---------------->+---------------+ | | | | |
507 | . | ts0 |<--+ | | | |
508 | . | 113345679876 | | | | | |
509 | . +---------------+ | | | | |
510 | . | unused | | | | | |
511 | . | | | | | | |
512 | . +---------------+ | | | | |
513 | . . | | | | |
514 | . . | | | | |
515 | . . | | | | |
516 | . +---------------+ | | | | |
517 | . | unused | | | | | |
518 | . | | | | | | |
519 | . +---------------+ | | | | |
520 | . | unused | | | | | |
521 | . | | | | | | |
522 | . +---------------+ | | | | |
523 | . | | | | |
524 | +-----------+ | | | | |
525 | | map_entry | | | | | |
526 | +-----------+ | | | | |
527 | | .key |---> pid = 4444 | | | | |
528 | +---------+ +-----------+ | | | | |
529 | | .val |--->| map_elt | | | | | |
530 | +---------+ +-----------+ | | | | |
531 | . | .key |---> full key * | | | | |
532 | . +---------+ +---------------+ | | | | |
533 | . | .fields |--->| .sum (val) | | | | | |
534 | +---------+ | 2345 | | | | | |
535 | +--| .vars | +---------------+ | | | | |
536 | | +---------+ | .offset (key) | | | | | |
537 | | | 0 | | | | | |
538 | | +---------------+ | | | | |
539 | | . | | | | |
540 | | . | | | | |
541 | | . | | | | |
542 | | +---------------+ | | | | |
543 | | | .sum (val) or | | | | | |
544 | | | .offset (key) | | | | | |
545 | | +---------------+ | | | | |
546 | | | .sum (val) or | | | | | |
547 | | | .offset (key) | | | | | |
548 | | +---------------+ | | | | |
549 | | | | | | |
550 | | +---------------+ | | | | |
551 | +---------------->| ts0 |<--+ | | | |
552 | | 213499240729 | | | | |
553 | +---------------+ | | | |
554 | | unused | | | | |
555 | | | | | | |
556 | +---------------+ | | | |
557 | . | | | |
558 | . | | | |
559 | . | | | |
560 | +---------------+ | | | |
561 | | unused | | | | |
562 | | | | | | |
563 | +---------------+ | | | |
564 | | unused | | | | |
565 | | | | | | |
566 | +---------------+ | | | |
daceabf1 | 567 | |
16b585fe TZ |
568 | For each used map entry, there's a map_elt pointing to an array of | | |
569 | .vars containing the current value of the variables associated with | | | |
570 | that histogram entry. So in the above, the timestamp associated with | | | |
571 | pid 999 is 113345679876, and the timestamp variable in the same | | | |
572 | .var.idx for pid 4444 is 213499240729. | | | |
daceabf1 | 573 | |
16b585fe TZ |
574 | sched_switch histogram | | |
575 | ---------------------- | | | |
daceabf1 | 576 | |
16b585fe TZ |
577 | The sched_switch histogram paired with the above sched_waking | | |
578 | histogram is shown below. The most important aspect of the | | | |
579 | sched_switch histogram is that it references a variable on the | | | |
580 | sched_waking histogram above. | | | |
daceabf1 | 581 | |
16b585fe TZ |
582 | The histogram diagram is very similar to the others so far displayed, | | |
583 | but it adds variable references. You can see the normal hitcount and | | | |
584 | key fields along with a new wakeup_lat variable implemented in the | | | |
585 | same way as the sched_waking ts0 variable, but in addition there's an | | | |
586 | entry with the new FL_VAR_REF (short for HIST_FIELD_FL_VAR_REF) flag. | | | |
daceabf1 | 587 | |
16b585fe TZ |
588 | Associated with the new var ref field are a couple of new hist_field | | |
589 | members, var.hist_data and var_ref_idx. For a variable reference, the | | | |
590 | var.hist_data goes with the var.idx, which together uniquely identify | | | |
591 | a particular variable on a particular histogram. The var_ref_idx is | | | |
592 | just the index into the var_ref_vals[] array that caches the values of | | | |
593 | each variable whenever a hist trigger is updated. Those resulting | | | |
594 | values are then finally accessed by other code such as trace action | | | |
595 | code that uses the var_ref_idx values to assign param values. | | | |
daceabf1 | 596 | |
16b585fe | 597 | The diagram below describes the situation for the sched_switch | | |
daceabf1 TZ |
598 | histogram referred to before:: |
599 | ||
16b585fe TZ |
600 | # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> | | |
601 | events/sched/sched_switch/trigger | | | |
602 | | | | |
603 | +------------------+ | | | |
604 | | hist_data | | | | |
605 | +------------------+ +-----------------------+ | | | |
606 | | .fields[] |-->| val = hitcount | | | | |
607 | +----------------+ +-----------------------+ | | | |
608 | | .map | | .size | | | | |
609 | +----------------+ +---------------------+ | | | |
610 | +--| .var_refs[] | | .offset | | | | |
611 | | +----------------+ +---------------------+ | | | |
612 | | | .fn() | | | | |
613 | | var_ref_vals[] +---------------------+ | | | |
614 | | +-------------+ | .flags | | | | |
615 | | | $ts0 |<---+ +---------------------+ | | | |
616 | | +-------------+ | | .var.idx | | | | |
617 | | | | | +---------------------+ | | | |
618 | | +-------------+ | | .var.hist_data | | | | |
619 | | | | | +---------------------+ | | | |
620 | | +-------------+ | | .var_ref_idx | | | | |
621 | | | | | +-----------------------+ | | | |
622 | | +-------------+ | | var = wakeup_lat | | | | |
623 | | . | +-----------------------+ | | | |
624 | | . | | .size | | | | |
625 | | . | +---------------------+ | | | |
626 | | +-------------+ | | .offset | | | | |
627 | | | | | +---------------------+ | | | |
628 | | +-------------+ | | .fn() | | | | |
629 | | | | | +---------------------+ | | | |
630 | | +-------------+ | | .flags & FL_VAR | | | | |
631 | | | +---------------------+ | | | |
632 | | | | .var.idx | | | | |
633 | | | +---------------------+ | | | |
634 | | | | .var.hist_data | | | | |
635 | | | +---------------------+ | | | |
636 | | | | .var_ref_idx | | | | |
637 | | | +---------------------+ | | | |
638 | | | . | | | |
639 | | | . | | | |
640 | | | . | | | |
641 | | | +-----------------------+ <--- n_vals | | | |
642 | | | | key = pid | | | | |
643 | | | +-----------------------+ | | | |
644 | | | | .size | | | | |
645 | | | +---------------------+ | | | |
646 | | | | .offset | | | | |
647 | | | +---------------------+ | | | |
648 | | | | .fn() | | | | |
649 | | | +---------------------+ | | | |
650 | | | | .flags | | | | |
651 | | | +---------------------+ | | | |
652 | | | | .var.idx | | | | |
653 | | | +-----------------------+ <--- n_fields | | | |
654 | | | | unused | | | | |
655 | | | +-----------------------+ | | | |
656 | | | | | | | | |
657 | | | +---------------------+ | | | |
658 | | | | | | | | |
659 | | | +---------------------+ | | | |
660 | | | | | | | | |
661 | | | +---------------------+ | | | |
662 | | | | | | | | |
663 | | | +---------------------+ | | | |
664 | | | | | | | | |
665 | | | +---------------------+ | | | |
666 | | | n_keys = n_fields - n_vals | | | |
667 | | | | | | |
668 | | | | | | |
669 | | | +-----------------------+ | | | |
670 | +---------------------->| var_ref = $ts0 | | | | |
671 | | +-----------------------+ | | | |
672 | | | .size | | | | |
673 | | +---------------------+ | | | |
674 | | | .offset | | | | |
675 | | +---------------------+ | | | |
676 | | | .fn() | | | | |
677 | | +---------------------+ | | | |
678 | | | .flags & FL_VAR_REF | | | | |
679 | | +---------------------+ | | | |
680 | | | .var.idx |--------------------------+ | | |
681 | | +---------------------+ | | |
682 | | | .var.hist_data |----------------------------+ | |
683 | | +---------------------+ | |
684 | +---| .var_ref_idx | | |
685 | +---------------------+ | |
686 | ||
687 | Abbreviations used in the diagrams:: | |
688 | ||
689 | hist_data = struct hist_trigger_data | |
690 | hist_data.fields = struct hist_field | |
691 | fn = hist_field_fn_t | |
692 | FL_KEY = HIST_FIELD_FL_KEY | |
693 | FL_VAR = HIST_FIELD_FL_VAR | |
694 | FL_VAR_REF = HIST_FIELD_FL_VAR_REF | |
695 | ||
696 | When a hist trigger makes use of a variable, a new hist_field is | |
697 | created with flag HIST_FIELD_FL_VAR_REF. For a VAR_REF field, the | |
698 | var.idx and var.hist_data take the same values as the referenced | |
699 | variable, as well as the referenced variable's size, type, and | |
700 | is_signed values. The VAR_REF field's .name is set to the name of the | |
701 | variable it references. If a variable reference was created using the | |
702 | explicit system.event.$var_ref notation, the hist_field's system and | |
09068445 | 703 | event_name variables are also set. |
16b585fe TZ |
704 | |
705 | So, in order to handle an event for the sched_switch histogram, | |
706 | because we have a reference to a variable on another histogram, we | |
707 | need to resolve all variable references first. This is done via the | |
708 | resolve_var_refs() calls made from event_hist_trigger(). What this | |
709 | does is grabs the var_refs[] array from the hist_data representing the | |
710 | sched_switch histogram. For each one of those, the referenced | |
711 | variable's var.hist_data along with the current key is used to look up | |
712 | the corresponding tracing_map_elt in that histogram. Once found, the | |
713 | referenced variable's var.idx is used to look up the variable's value | |
714 | using tracing_map_read_var(elt, var.idx), which yields the value of | |
715 | the variable for that element, ts0 in the case above. Note that both | |
716 | the hist_fields representing both the variable and the variable | |
717 | reference have the same var.idx, so this is straightforward. | |
718 | ||
719 | Variable and variable reference test | |
720 | ------------------------------------ | |
721 | ||
722 | This example creates a variable on the sched_waking event, ts0, and | |
723 | uses it in the sched_switch trigger. The sched_switch trigger also | |
724 | creates its own variable, wakeup_lat, but nothing yet uses it:: | |
725 | ||
726 | # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger | |
727 | ||
728 | # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger | |
729 | ||
730 | Looking at the sched_waking 'hist_debug' output, in addition to the | |
731 | normal key and value hist_fields, in the val fields section we see a | |
732 | field with the HIST_FIELD_FL_VAR flag, which indicates that that field | |
733 | represents a variable. Note that in addition to the variable name, | |
734 | contained in the var.name field, it includes the var.idx, which is the | |
735 | index into the tracing_map_elt.vars[] array of the actual variable | |
736 | location. Note also that the output shows that variables live in the | |
737 | same part of the hist_data->fields[] array as normal values:: | |
738 | ||
739 | # cat events/sched/sched_waking/hist_debug | |
740 | ||
741 | # event histogram | |
742 | # | |
743 | # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] | |
744 | # | |
745 | ||
746 | hist_data: 000000009536f554 | |
747 | ||
748 | n_vals: 2 | |
749 | n_keys: 1 | |
750 | n_fields: 3 | |
751 | ||
752 | val fields: | |
753 | ||
754 | hist_data->fields[0]: | |
755 | flags: | |
756 | VAL: HIST_FIELD_FL_HITCOUNT | |
757 | type: u64 | |
758 | size: 8 | |
759 | is_signed: 0 | |
760 | ||
761 | hist_data->fields[1]: | |
762 | flags: | |
763 | HIST_FIELD_FL_VAR | |
764 | var.name: ts0 | |
765 | var.idx (into tracing_map_elt.vars[]): 0 | |
766 | type: u64 | |
767 | size: 8 | |
768 | is_signed: 0 | |
769 | ||
770 | key fields: | |
771 | ||
772 | hist_data->fields[2]: | |
773 | flags: | |
774 | HIST_FIELD_FL_KEY | |
775 | ftrace_event_field name: pid | |
776 | type: pid_t | |
777 | size: 8 | |
778 | is_signed: 1 | |
779 | ||
780 | Moving on to the sched_switch trigger hist_debug output, in addition | |
781 | to the unused wakeup_lat variable, we see a new section displaying | |
782 | variable references. Variable references are displayed in a separate | |
2e171403 | 783 | section because in addition to being logically separate from |
16b585fe TZ |
784 | variables and values, they actually live in a separate hist_data |
785 | array, var_refs[]. | |
786 | ||
787 | In this example, the sched_switch trigger has a reference to a | |
788 | variable on the sched_waking trigger, $ts0. Looking at the details, | |
789 | we can see that the var.hist_data value of the referenced variable | |
790 | matches the previously displayed sched_waking trigger, and the var.idx | |
791 | value matches the previously displayed var.idx value for that | |
792 | variable. Also displayed is the var_ref_idx value for that variable | |
793 | reference, which is where the value for that variable is cached for | |
794 | use when the trigger is invoked:: | |
795 | ||
796 | # cat events/sched/sched_switch/hist_debug | |
797 | ||
798 | # event histogram | |
799 | # | |
800 | # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global [active] | |
801 | # | |
802 | ||
803 | hist_data: 00000000f4ee8006 | |
804 | ||
805 | n_vals: 2 | |
806 | n_keys: 1 | |
807 | n_fields: 3 | |
808 | ||
809 | val fields: | |
810 | ||
811 | hist_data->fields[0]: | |
812 | flags: | |
813 | VAL: HIST_FIELD_FL_HITCOUNT | |
814 | type: u64 | |
815 | size: 8 | |
816 | is_signed: 0 | |
817 | ||
818 | hist_data->fields[1]: | |
819 | flags: | |
820 | HIST_FIELD_FL_VAR | |
821 | var.name: wakeup_lat | |
822 | var.idx (into tracing_map_elt.vars[]): 0 | |
823 | type: u64 | |
824 | size: 0 | |
825 | is_signed: 0 | |
826 | ||
827 | key fields: | |
828 | ||
829 | hist_data->fields[2]: | |
830 | flags: | |
831 | HIST_FIELD_FL_KEY | |
832 | ftrace_event_field name: next_pid | |
833 | type: pid_t | |
834 | size: 8 | |
835 | is_signed: 1 | |
836 | ||
837 | variable reference fields: | |
838 | ||
839 | hist_data->var_refs[0]: | |
840 | flags: | |
841 | HIST_FIELD_FL_VAR_REF | |
842 | name: ts0 | |
843 | var.idx (into tracing_map_elt.vars[]): 0 | |
844 | var.hist_data: 000000009536f554 | |
845 | var_ref_idx (into hist_data->var_refs[]): 0 | |
846 | type: u64 | |
847 | size: 8 | |
848 | is_signed: 0 | |
849 | ||
850 | The commands below can be used to clean things up for the next test:: | |
851 | ||
852 | # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0' >> events/sched/sched_switch/trigger | |
853 | ||
854 | # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger | |
855 | ||
856 | Actions and Handlers | |
857 | ==================== | |
858 | ||
859 | Adding onto the previous example, we will now do something with that | |
860 | wakeup_lat variable, namely send it and another field as a synthetic | |
861 | event. | |
862 | ||
863 | The onmatch() action below basically says that whenever we have a | |
864 | sched_switch event, if we have a matching sched_waking event, in this | |
865 | case if we have a pid in the sched_waking histogram that matches the | |
2e171403 | 866 | next_pid field on this sched_switch event, we retrieve the |
16b585fe TZ |
867 | variables specified in the wakeup_latency() trace action, and use |
868 | them to generate a new wakeup_latency event into the trace stream. | |
869 | ||
870 | Note that the way the trace handlers such as wakeup_latency() (which | |
871 | could equivalently be written trace(wakeup_latency,$wakeup_lat,next_pid) | |
872 | are implemented, the parameters specified to the trace handler must be | |
873 | variables. In this case, $wakeup_lat is obviously a variable, but | |
874 | next_pid isn't, since it's just naming a field in the sched_switch | |
875 | trace event. Since this is something that almost every trace() and | |
876 | save() action does, a special shortcut is implemented to allow field | |
877 | names to be used directly in those cases. How it works is that under | |
878 | the covers, a temporary variable is created for the named field, and | |
879 | this variable is what is actually passed to the trace handler. In the | |
880 | code and documentation, this type of variable is called a 'field | |
881 | variable'. | |
882 | ||
883 | Fields on other trace event's histograms can be used as well. In that | |
884 | case we have to generate a new histogram and an unfortunately named | |
885 | 'synthetic_field' (the use of synthetic here has nothing to do with | |
886 | synthetic events) and use that special histogram field as a variable. | |
887 | ||
888 | The diagram below illustrates the new elements described above in the | |
889 | context of the sched_switch histogram using the onmatch() handler and | |
890 | the trace() action. | |
891 | ||
892 | First, we define the wakeup_latency synthetic event:: | |
893 | ||
894 | # echo 'wakeup_latency u64 lat; pid_t pid' >> synthetic_events | |
895 | ||
896 | Next, the sched_waking hist trigger as before:: | |
897 | ||
898 | # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> | |
899 | events/sched/sched_waking/trigger | |
900 | ||
901 | Finally, we create a hist trigger on the sched_switch event that | |
902 | generates a wakeup_latency() trace event. In this case we pass | |
903 | next_pid into the wakeup_latency synthetic event invocation, which | |
904 | means it will be automatically converted into a field variable:: | |
905 | ||
906 | # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \ | |
907 | onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid)' >> | |
908 | /sys/kernel/debug/tracing/events/sched/sched_switch/trigger | |
909 | ||
910 | The diagram for the sched_switch event is similar to previous examples | |
911 | but shows the additional field_vars[] array for hist_data and shows | |
912 | the linkages between the field_vars and the variables and references | |
913 | created to implement the field variables. The details are discussed | |
914 | below:: | |
915 | ||
916 | +------------------+ | |
917 | | hist_data | | |
918 | +------------------+ +-----------------------+ | |
919 | | .fields[] |-->| val = hitcount | | |
920 | +----------------+ +-----------------------+ | |
921 | | .map | | .size | | |
922 | +----------------+ +---------------------+ | |
923 | +---| .field_vars[] | | .offset | | |
924 | | +----------------+ +---------------------+ | |
925 | |+--| .var_refs[] | | .offset | | |
926 | || +----------------+ +---------------------+ | |
927 | || | .fn() | | |
928 | || var_ref_vals[] +---------------------+ | |
929 | || +-------------+ | .flags | | |
930 | || | $ts0 |<---+ +---------------------+ | |
931 | || +-------------+ | | .var.idx | | |
932 | || | $next_pid |<-+ | +---------------------+ | |
933 | || +-------------+ | | | .var.hist_data | | |
934 | ||+>| $wakeup_lat | | | +---------------------+ | |
935 | ||| +-------------+ | | | .var_ref_idx | | |
936 | ||| | | | | +-----------------------+ | |
937 | ||| +-------------+ | | | var = wakeup_lat | | |
938 | ||| . | | +-----------------------+ | |
939 | ||| . | | | .size | | |
940 | ||| . | | +---------------------+ | |
941 | ||| +-------------+ | | | .offset | | |
942 | ||| | | | | +---------------------+ | |
943 | ||| +-------------+ | | | .fn() | | |
944 | ||| | | | | +---------------------+ | |
945 | ||| +-------------+ | | | .flags & FL_VAR | | |
946 | ||| | | +---------------------+ | |
947 | ||| | | | .var.idx | | |
948 | ||| | | +---------------------+ | |
949 | ||| | | | .var.hist_data | | |
950 | ||| | | +---------------------+ | |
951 | ||| | | | .var_ref_idx | | |
952 | ||| | | +---------------------+ | |
953 | ||| | | . | |
954 | ||| | | . | |
955 | ||| | | . | |
956 | ||| | | . | |
957 | ||| +--------------+ | | . | |
958 | +-->| field_var | | | . | |
959 | || +--------------+ | | . | |
960 | || | var | | | . | |
961 | || +------------+ | | . | |
962 | || | val | | | . | |
963 | || +--------------+ | | . | |
964 | || | field_var | | | . | |
965 | || +--------------+ | | . | |
966 | || | var | | | . | |
967 | || +------------+ | | . | |
968 | || | val | | | . | |
969 | || +------------+ | | . | |
970 | || . | | . | |
971 | || . | | . | |
972 | || . | | +-----------------------+ <--- n_vals | |
973 | || +--------------+ | | | key = pid | | |
974 | || | field_var | | | +-----------------------+ | |
975 | || +--------------+ | | | .size | | |
976 | || | var |--+| +---------------------+ | |
977 | || +------------+ ||| | .offset | | |
978 | || | val |-+|| +---------------------+ | |
979 | || +------------+ ||| | .fn() | | |
980 | || ||| +---------------------+ | |
981 | || ||| | .flags | | |
982 | || ||| +---------------------+ | |
983 | || ||| | .var.idx | | |
984 | || ||| +---------------------+ <--- n_fields | |
985 | || ||| | |
986 | || ||| n_keys = n_fields - n_vals | |
987 | || ||| +-----------------------+ | |
988 | || |+->| var = next_pid | | |
989 | || | | +-----------------------+ | |
990 | || | | | .size | | |
991 | || | | +---------------------+ | |
992 | || | | | .offset | | |
993 | || | | +---------------------+ | |
994 | || | | | .flags & FL_VAR | | |
995 | || | | +---------------------+ | |
996 | || | | | .var.idx | | |
997 | || | | +---------------------+ | |
998 | || | | | .var.hist_data | | |
999 | || | | +-----------------------+ | |
1000 | || +-->| val for next_pid | | |
1001 | || | | +-----------------------+ | |
1002 | || | | | .size | | |
1003 | || | | +---------------------+ | |
1004 | || | | | .offset | | |
1005 | || | | +---------------------+ | |
1006 | || | | | .fn() | | |
1007 | || | | +---------------------+ | |
1008 | || | | | .flags | | |
1009 | || | | +---------------------+ | |
1010 | || | | | | | |
1011 | || | | +---------------------+ | |
1012 | || | | | |
1013 | || | | | |
1014 | || | | +-----------------------+ | |
1015 | +|------------------|-|>| var_ref = $ts0 | | |
1016 | | | | +-----------------------+ | |
1017 | | | | | .size | | |
1018 | | | | +---------------------+ | |
1019 | | | | | .offset | | |
1020 | | | | +---------------------+ | |
1021 | | | | | .fn() | | |
1022 | | | | +---------------------+ | |
1023 | | | | | .flags & FL_VAR_REF | | |
1024 | | | | +---------------------+ | |
1025 | | | +---| .var_ref_idx | | |
1026 | | | +-----------------------+ | |
1027 | | | | var_ref = $next_pid | | |
1028 | | | +-----------------------+ | |
1029 | | | | .size | | |
1030 | | | +---------------------+ | |
1031 | | | | .offset | | |
1032 | | | +---------------------+ | |
1033 | | | | .fn() | | |
1034 | | | +---------------------+ | |
1035 | | | | .flags & FL_VAR_REF | | |
1036 | | | +---------------------+ | |
1037 | | +-----| .var_ref_idx | | |
1038 | | +-----------------------+ | |
1039 | | | var_ref = $wakeup_lat | | |
1040 | | +-----------------------+ | |
1041 | | | .size | | |
1042 | | +---------------------+ | |
1043 | | | .offset | | |
1044 | | +---------------------+ | |
1045 | | | .fn() | | |
1046 | | +---------------------+ | |
1047 | | | .flags & FL_VAR_REF | | |
1048 | | +---------------------+ | |
1049 | +------------------------| .var_ref_idx | | |
1050 | +---------------------+ | |
1051 | ||
1052 | As you can see, for a field variable, two hist_fields are created: one | |
1053 | representing the variable, in this case next_pid, and one to actually | |
1054 | get the value of the field from the trace stream, like a normal val | |
1055 | field does. These are created separately from normal variable | |
1056 | creation and are saved in the hist_data->field_vars[] array. See | |
1057 | below for how these are used. In addition, a reference hist_field is | |
1058 | also created, which is needed to reference the field variables such as | |
1059 | $next_pid variable in the trace() action. | |
1060 | ||
1061 | Note that $wakeup_lat is also a variable reference, referencing the | |
1062 | value of the expression common_timestamp-$ts0, and so also needs to | |
1063 | have a hist field entry representing that reference created. | |
1064 | ||
1065 | When hist_trigger_elt_update() is called to get the normal key and | |
1066 | value fields, it also calls update_field_vars(), which goes through | |
1067 | each field_var created for the histogram, and available from | |
1068 | hist_data->field_vars and calls val->fn() to get the data from the | |
1069 | current trace record, and then uses the var's var.idx to set the | |
1070 | variable at the var.idx offset in the appropriate tracing_map_elt's | |
1071 | variable at elt->vars[var.idx]. | |
1072 | ||
1073 | Once all the variables have been updated, resolve_var_refs() can be | |
1074 | called from event_hist_trigger(), and not only can our $ts0 and | |
1075 | $next_pid references be resolved but the $wakeup_lat reference as | |
1076 | well. At this point, the trace() action can simply access the values | |
1077 | assembled in the var_ref_vals[] array and generate the trace event. | |
1078 | ||
1079 | The same process occurs for the field variables associated with the | |
1080 | save() action. | |
1081 | ||
1082 | Abbreviations used in the diagram:: | |
1083 | ||
1084 | hist_data = struct hist_trigger_data | |
1085 | hist_data.fields = struct hist_field | |
1086 | field_var = struct field_var | |
1087 | fn = hist_field_fn_t | |
1088 | FL_KEY = HIST_FIELD_FL_KEY | |
1089 | FL_VAR = HIST_FIELD_FL_VAR | |
1090 | FL_VAR_REF = HIST_FIELD_FL_VAR_REF | |
1091 | ||
1092 | trace() action field variable test | |
1093 | ---------------------------------- | |
1094 | ||
1095 | This example adds to the previous test example by finally making use | |
1096 | of the wakeup_lat variable, but in addition also creates a couple of | |
1097 | field variables that then are all passed to the wakeup_latency() trace | |
1098 | action via the onmatch() handler. | |
1099 | ||
1100 | First, we create the wakeup_latency synthetic event:: | |
1101 | ||
1102 | # echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events | |
1103 | ||
1104 | Next, the sched_waking trigger from previous examples:: | |
1105 | ||
1106 | # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger | |
1107 | ||
1108 | Finally, as in the previous test example, we calculate and assign the | |
1109 | wakeup latency using the $ts0 reference from the sched_waking trigger | |
1110 | to the wakeup_lat variable, and finally use it along with a couple | |
1111 | sched_switch event fields, next_pid and next_comm, to generate a | |
1112 | wakeup_latency trace event. The next_pid and next_comm event fields | |
1113 | are automatically converted into field variables for this purpose:: | |
1114 | ||
1115 | # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger | |
1116 | ||
1117 | The sched_waking hist_debug output shows the same data as in the | |
1118 | previous test example:: | |
1119 | ||
1120 | # cat events/sched/sched_waking/hist_debug | |
1121 | ||
1122 | # event histogram | |
1123 | # | |
1124 | # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] | |
1125 | # | |
1126 | ||
1127 | hist_data: 00000000d60ff61f | |
1128 | ||
1129 | n_vals: 2 | |
1130 | n_keys: 1 | |
1131 | n_fields: 3 | |
1132 | ||
1133 | val fields: | |
1134 | ||
1135 | hist_data->fields[0]: | |
1136 | flags: | |
1137 | VAL: HIST_FIELD_FL_HITCOUNT | |
1138 | type: u64 | |
1139 | size: 8 | |
1140 | is_signed: 0 | |
1141 | ||
1142 | hist_data->fields[1]: | |
1143 | flags: | |
1144 | HIST_FIELD_FL_VAR | |
1145 | var.name: ts0 | |
1146 | var.idx (into tracing_map_elt.vars[]): 0 | |
1147 | type: u64 | |
1148 | size: 8 | |
1149 | is_signed: 0 | |
1150 | ||
1151 | key fields: | |
1152 | ||
1153 | hist_data->fields[2]: | |
1154 | flags: | |
1155 | HIST_FIELD_FL_KEY | |
1156 | ftrace_event_field name: pid | |
1157 | type: pid_t | |
1158 | size: 8 | |
1159 | is_signed: 1 | |
1160 | ||
1161 | The sched_switch hist_debug output shows the same key and value fields | |
1162 | as in the previous test example - note that wakeup_lat is still in the | |
1163 | val fields section, but that the new field variables are not there - | |
1164 | although the field variables are variables, they're held separately in | |
1165 | the hist_data's field_vars[] array. Although the field variables and | |
1166 | the normal variables are located in separate places, you can see that | |
1167 | the actual variable locations for those variables in the | |
1168 | tracing_map_elt.vars[] do have increasing indices as expected: | |
1169 | wakeup_lat takes the var.idx = 0 slot, while the field variables for | |
1170 | next_pid and next_comm have values var.idx = 1, and var.idx = 2. Note | |
1171 | also that those are the same values displayed for the variable | |
1172 | references corresponding to those variables in the variable reference | |
1173 | fields section. Since there are two triggers and thus two hist_data | |
1174 | addresses, those addresses also need to be accounted for when doing | |
1175 | the matching - you can see that the first variable refers to the 0 | |
1176 | var.idx on the previous hist trigger (see the hist_data address | |
1177 | associated with that trigger), while the second variable refers to the | |
1178 | 0 var.idx on the sched_switch hist trigger, as do all the remaining | |
1179 | variable references. | |
1180 | ||
1181 | Finally, the action tracking variables section just shows the system | |
1182 | and event name for the onmatch() handler:: | |
1183 | ||
1184 | # cat events/sched/sched_switch/hist_debug | |
1185 | ||
1186 | # event histogram | |
1187 | # | |
1188 | # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm) [active] | |
1189 | # | |
1190 | ||
1191 | hist_data: 0000000008f551b7 | |
1192 | ||
1193 | n_vals: 2 | |
1194 | n_keys: 1 | |
1195 | n_fields: 3 | |
1196 | ||
1197 | val fields: | |
1198 | ||
1199 | hist_data->fields[0]: | |
1200 | flags: | |
1201 | VAL: HIST_FIELD_FL_HITCOUNT | |
1202 | type: u64 | |
1203 | size: 8 | |
1204 | is_signed: 0 | |
1205 | ||
1206 | hist_data->fields[1]: | |
1207 | flags: | |
1208 | HIST_FIELD_FL_VAR | |
1209 | var.name: wakeup_lat | |
1210 | var.idx (into tracing_map_elt.vars[]): 0 | |
1211 | type: u64 | |
1212 | size: 0 | |
1213 | is_signed: 0 | |
1214 | ||
1215 | key fields: | |
1216 | ||
1217 | hist_data->fields[2]: | |
1218 | flags: | |
1219 | HIST_FIELD_FL_KEY | |
1220 | ftrace_event_field name: next_pid | |
1221 | type: pid_t | |
1222 | size: 8 | |
1223 | is_signed: 1 | |
1224 | ||
1225 | variable reference fields: | |
1226 | ||
1227 | hist_data->var_refs[0]: | |
1228 | flags: | |
1229 | HIST_FIELD_FL_VAR_REF | |
1230 | name: ts0 | |
1231 | var.idx (into tracing_map_elt.vars[]): 0 | |
1232 | var.hist_data: 00000000d60ff61f | |
1233 | var_ref_idx (into hist_data->var_refs[]): 0 | |
1234 | type: u64 | |
1235 | size: 8 | |
1236 | is_signed: 0 | |
1237 | ||
1238 | hist_data->var_refs[1]: | |
1239 | flags: | |
1240 | HIST_FIELD_FL_VAR_REF | |
1241 | name: wakeup_lat | |
1242 | var.idx (into tracing_map_elt.vars[]): 0 | |
1243 | var.hist_data: 0000000008f551b7 | |
1244 | var_ref_idx (into hist_data->var_refs[]): 1 | |
1245 | type: u64 | |
1246 | size: 0 | |
1247 | is_signed: 0 | |
1248 | ||
1249 | hist_data->var_refs[2]: | |
1250 | flags: | |
1251 | HIST_FIELD_FL_VAR_REF | |
1252 | name: next_pid | |
1253 | var.idx (into tracing_map_elt.vars[]): 1 | |
1254 | var.hist_data: 0000000008f551b7 | |
1255 | var_ref_idx (into hist_data->var_refs[]): 2 | |
1256 | type: pid_t | |
1257 | size: 4 | |
1258 | is_signed: 0 | |
1259 | ||
1260 | hist_data->var_refs[3]: | |
1261 | flags: | |
1262 | HIST_FIELD_FL_VAR_REF | |
1263 | name: next_comm | |
1264 | var.idx (into tracing_map_elt.vars[]): 2 | |
1265 | var.hist_data: 0000000008f551b7 | |
1266 | var_ref_idx (into hist_data->var_refs[]): 3 | |
1267 | type: char[16] | |
1268 | size: 256 | |
1269 | is_signed: 0 | |
1270 | ||
1271 | field variables: | |
1272 | ||
1273 | hist_data->field_vars[0]: | |
1274 | ||
1275 | field_vars[0].var: | |
1276 | flags: | |
1277 | HIST_FIELD_FL_VAR | |
1278 | var.name: next_pid | |
1279 | var.idx (into tracing_map_elt.vars[]): 1 | |
1280 | ||
1281 | field_vars[0].val: | |
1282 | ftrace_event_field name: next_pid | |
1283 | type: pid_t | |
1284 | size: 4 | |
1285 | is_signed: 1 | |
1286 | ||
1287 | hist_data->field_vars[1]: | |
1288 | ||
1289 | field_vars[1].var: | |
1290 | flags: | |
1291 | HIST_FIELD_FL_VAR | |
1292 | var.name: next_comm | |
1293 | var.idx (into tracing_map_elt.vars[]): 2 | |
1294 | ||
1295 | field_vars[1].val: | |
1296 | ftrace_event_field name: next_comm | |
1297 | type: char[16] | |
1298 | size: 256 | |
1299 | is_signed: 0 | |
1300 | ||
1301 | action tracking variables (for onmax()/onchange()/onmatch()): | |
1302 | ||
1303 | hist_data->actions[0].match_data.event_system: sched | |
1304 | hist_data->actions[0].match_data.event: sched_waking | |
1305 | ||
1306 | The commands below can be used to clean things up for the next test:: | |
1307 | ||
1308 | # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_comm)' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger | |
1309 | ||
1310 | # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger | |
1311 | ||
1312 | # echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events | |
1313 | ||
1314 | action_data and the trace() action | |
1315 | ---------------------------------- | |
1316 | ||
1317 | As mentioned above, when the trace() action generates a synthetic | |
1318 | event, all the parameters to the synthetic event either already are | |
1319 | variables or are converted into variables (via field variables), and | |
1320 | finally all those variable values are collected via references to them | |
1321 | into a var_ref_vals[] array. | |
1322 | ||
1323 | The values in the var_ref_vals[] array, however, don't necessarily | |
1324 | follow the same ordering as the synthetic event params. To address | |
1325 | that, struct action_data contains another array, var_ref_idx[] that | |
1326 | maps the trace action params to the var_ref_vals[] values. Below is a | |
1327 | diagram illustrating that for the wakeup_latency() synthetic event:: | |
1328 | ||
1329 | +------------------+ wakeup_latency() | |
1330 | | action_data | event params var_ref_vals[] | |
1331 | +------------------+ +-----------------+ +-----------------+ | |
1332 | | .var_ref_idx[] |--->| $wakeup_lat idx |---+ | | | |
1333 | +----------------+ +-----------------+ | +-----------------+ | |
1334 | | .synth_event | | $next_pid idx |---|-+ | $wakeup_lat val | | |
1335 | +----------------+ +-----------------+ | | +-----------------+ | |
1336 | . | +->| $next_pid val | | |
1337 | . | +-----------------+ | |
1338 | . | . | |
1339 | +-----------------+ | . | |
1340 | | | | . | |
1341 | +-----------------+ | +-----------------+ | |
1342 | +--->| $wakeup_lat val | | |
1343 | +-----------------+ | |
1344 | ||
1345 | Basically, how this ends up getting used in the synthetic event probe | |
1346 | function, trace_event_raw_event_synth(), is as follows:: | |
1347 | ||
1348 | for each field i in .synth_event | |
1349 | val_idx = .var_ref_idx[i] | |
1350 | val = var_ref_vals[val_idx] | |
1351 | ||
1352 | action_data and the onXXX() handlers | |
1353 | ------------------------------------ | |
1354 | ||
1355 | The hist trigger onXXX() actions other than onmatch(), such as onmax() | |
1356 | and onchange(), also make use of and internally create hidden | |
1357 | variables. This information is contained in the | |
1358 | action_data.track_data struct, and is also visible in the hist_debug | |
1359 | output as will be described in the example below. | |
1360 | ||
1361 | Typically, the onmax() or onchange() handlers are used in conjunction | |
1362 | with the save() and snapshot() actions. For example:: | |
1363 | ||
1364 | # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \ | |
1365 | onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> | |
1366 | /sys/kernel/debug/tracing/events/sched/sched_switch/trigger | |
1367 | ||
1368 | or:: | |
1369 | ||
1370 | # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0: \ | |
1371 | onmax($wakeup_lat).snapshot()' >> | |
1372 | /sys/kernel/debug/tracing/events/sched/sched_switch/trigger | |
1373 | ||
1374 | save() action field variable test | |
1375 | --------------------------------- | |
1376 | ||
1377 | For this example, instead of generating a synthetic event, the save() | |
1378 | action is used to save field values whenever an onmax() handler | |
1379 | detects that a new max latency has been hit. As in the previous | |
1380 | example, the values being saved are also field values, but in this | |
1381 | case, are kept in a separate hist_data array named save_vars[]. | |
1382 | ||
1383 | As in previous test examples, we set up the sched_waking trigger:: | |
1384 | ||
1385 | # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger | |
1386 | ||
1387 | In this case, however, we set up the sched_switch trigger to save some | |
1388 | sched_switch field values whenever we hit a new maximum latency. For | |
1389 | both the onmax() handler and save() action, variables will be created, | |
1390 | which we can use the hist_debug files to examine:: | |
1391 | ||
1392 | # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger | |
1393 | ||
1394 | The sched_waking hist_debug output shows the same data as in the | |
1395 | previous test examples:: | |
1396 | ||
1397 | # cat events/sched/sched_waking/hist_debug | |
1398 | ||
1399 | # | |
1400 | # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] | |
1401 | # | |
1402 | ||
1403 | hist_data: 00000000e6290f48 | |
1404 | ||
1405 | n_vals: 2 | |
1406 | n_keys: 1 | |
1407 | n_fields: 3 | |
1408 | ||
1409 | val fields: | |
1410 | ||
1411 | hist_data->fields[0]: | |
1412 | flags: | |
1413 | VAL: HIST_FIELD_FL_HITCOUNT | |
1414 | type: u64 | |
1415 | size: 8 | |
1416 | is_signed: 0 | |
1417 | ||
1418 | hist_data->fields[1]: | |
1419 | flags: | |
1420 | HIST_FIELD_FL_VAR | |
1421 | var.name: ts0 | |
1422 | var.idx (into tracing_map_elt.vars[]): 0 | |
1423 | type: u64 | |
1424 | size: 8 | |
1425 | is_signed: 0 | |
1426 | ||
1427 | key fields: | |
1428 | ||
1429 | hist_data->fields[2]: | |
1430 | flags: | |
1431 | HIST_FIELD_FL_KEY | |
1432 | ftrace_event_field name: pid | |
1433 | type: pid_t | |
1434 | size: 8 | |
1435 | is_signed: 1 | |
1436 | ||
1437 | The output of the sched_switch trigger shows the same val and key | |
1438 | values as before, but also shows a couple new sections. | |
1439 | ||
1440 | First, the action tracking variables section now shows the | |
1441 | actions[].track_data information describing the special tracking | |
1442 | variables and references used to track, in this case, the running | |
1443 | maximum value. The actions[].track_data.var_ref member contains the | |
1444 | reference to the variable being tracked, in this case the $wakeup_lat | |
1445 | variable. In order to perform the onmax() handler function, there | |
1446 | also needs to be a variable that tracks the current maximum by getting | |
1447 | updated whenever a new maximum is hit. In this case, we can see that | |
09068445 | 1448 | an auto-generated variable named ' __max' has been created and is |
16b585fe TZ |
1449 | visible in the actions[].track_data.track_var variable. |
1450 | ||
1451 | Finally, in the new 'save action variables' section, we can see that | |
1452 | the 4 params to the save() function have resulted in 4 field variables | |
1453 | being created for the purposes of saving the values of the named | |
1454 | fields when the max is hit. These variables are kept in a separate | |
1455 | save_vars[] array off of hist_data, so are displayed in a separate | |
1456 | section:: | |
1457 | ||
1458 | # cat events/sched/sched_switch/hist_debug | |
1459 | ||
1460 | # event histogram | |
1461 | # | |
1462 | # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) [active] | |
1463 | # | |
1464 | ||
1465 | hist_data: 0000000057bcd28d | |
1466 | ||
1467 | n_vals: 2 | |
1468 | n_keys: 1 | |
1469 | n_fields: 3 | |
1470 | ||
1471 | val fields: | |
1472 | ||
1473 | hist_data->fields[0]: | |
1474 | flags: | |
1475 | VAL: HIST_FIELD_FL_HITCOUNT | |
1476 | type: u64 | |
1477 | size: 8 | |
1478 | is_signed: 0 | |
1479 | ||
1480 | hist_data->fields[1]: | |
1481 | flags: | |
1482 | HIST_FIELD_FL_VAR | |
1483 | var.name: wakeup_lat | |
1484 | var.idx (into tracing_map_elt.vars[]): 0 | |
1485 | type: u64 | |
1486 | size: 0 | |
1487 | is_signed: 0 | |
1488 | ||
1489 | key fields: | |
1490 | ||
1491 | hist_data->fields[2]: | |
1492 | flags: | |
1493 | HIST_FIELD_FL_KEY | |
1494 | ftrace_event_field name: next_pid | |
1495 | type: pid_t | |
1496 | size: 8 | |
1497 | is_signed: 1 | |
1498 | ||
1499 | variable reference fields: | |
1500 | ||
1501 | hist_data->var_refs[0]: | |
1502 | flags: | |
1503 | HIST_FIELD_FL_VAR_REF | |
1504 | name: ts0 | |
1505 | var.idx (into tracing_map_elt.vars[]): 0 | |
1506 | var.hist_data: 00000000e6290f48 | |
1507 | var_ref_idx (into hist_data->var_refs[]): 0 | |
1508 | type: u64 | |
1509 | size: 8 | |
1510 | is_signed: 0 | |
1511 | ||
1512 | hist_data->var_refs[1]: | |
1513 | flags: | |
1514 | HIST_FIELD_FL_VAR_REF | |
1515 | name: wakeup_lat | |
1516 | var.idx (into tracing_map_elt.vars[]): 0 | |
1517 | var.hist_data: 0000000057bcd28d | |
1518 | var_ref_idx (into hist_data->var_refs[]): 1 | |
1519 | type: u64 | |
1520 | size: 0 | |
1521 | is_signed: 0 | |
1522 | ||
1523 | action tracking variables (for onmax()/onchange()/onmatch()): | |
1524 | ||
1525 | hist_data->actions[0].track_data.var_ref: | |
1526 | flags: | |
1527 | HIST_FIELD_FL_VAR_REF | |
1528 | name: wakeup_lat | |
1529 | var.idx (into tracing_map_elt.vars[]): 0 | |
1530 | var.hist_data: 0000000057bcd28d | |
1531 | var_ref_idx (into hist_data->var_refs[]): 1 | |
1532 | type: u64 | |
1533 | size: 0 | |
1534 | is_signed: 0 | |
1535 | ||
1536 | hist_data->actions[0].track_data.track_var: | |
1537 | flags: | |
1538 | HIST_FIELD_FL_VAR | |
1539 | var.name: __max | |
1540 | var.idx (into tracing_map_elt.vars[]): 1 | |
1541 | type: u64 | |
1542 | size: 8 | |
1543 | is_signed: 0 | |
1544 | ||
1545 | save action variables (save() params): | |
1546 | ||
1547 | hist_data->save_vars[0]: | |
1548 | ||
1549 | save_vars[0].var: | |
1550 | flags: | |
1551 | HIST_FIELD_FL_VAR | |
1552 | var.name: next_comm | |
1553 | var.idx (into tracing_map_elt.vars[]): 2 | |
1554 | ||
1555 | save_vars[0].val: | |
1556 | ftrace_event_field name: next_comm | |
1557 | type: char[16] | |
1558 | size: 256 | |
1559 | is_signed: 0 | |
1560 | ||
1561 | hist_data->save_vars[1]: | |
1562 | ||
1563 | save_vars[1].var: | |
1564 | flags: | |
1565 | HIST_FIELD_FL_VAR | |
1566 | var.name: prev_pid | |
1567 | var.idx (into tracing_map_elt.vars[]): 3 | |
1568 | ||
1569 | save_vars[1].val: | |
1570 | ftrace_event_field name: prev_pid | |
1571 | type: pid_t | |
1572 | size: 4 | |
1573 | is_signed: 1 | |
1574 | ||
1575 | hist_data->save_vars[2]: | |
1576 | ||
1577 | save_vars[2].var: | |
1578 | flags: | |
1579 | HIST_FIELD_FL_VAR | |
1580 | var.name: prev_prio | |
1581 | var.idx (into tracing_map_elt.vars[]): 4 | |
1582 | ||
1583 | save_vars[2].val: | |
1584 | ftrace_event_field name: prev_prio | |
1585 | type: int | |
1586 | size: 4 | |
1587 | is_signed: 1 | |
1588 | ||
1589 | hist_data->save_vars[3]: | |
1590 | ||
1591 | save_vars[3].var: | |
1592 | flags: | |
1593 | HIST_FIELD_FL_VAR | |
1594 | var.name: prev_comm | |
1595 | var.idx (into tracing_map_elt.vars[]): 5 | |
1596 | ||
1597 | save_vars[3].val: | |
1598 | ftrace_event_field name: prev_comm | |
1599 | type: char[16] | |
1600 | size: 256 | |
1601 | is_signed: 0 | |
1602 | ||
1603 | The commands below can be used to clean things up for the next test:: | |
1604 | ||
1605 | # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm)' >> events/sched/sched_switch/trigger | |
1606 | ||
1607 | # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger | |
1608 | ||
1609 | A couple special cases | |
1610 | ====================== | |
1611 | ||
1612 | While the above covers the basics of the histogram internals, there | |
1613 | are a couple of special cases that should be discussed, since they | |
09068445 | 1614 | tend to create even more confusion. Those are field variables on other |
16b585fe TZ |
1615 | histograms, and aliases, both described below through example tests |
1616 | using the hist_debug files. | |
1617 | ||
1618 | Test of field variables on other histograms | |
1619 | ------------------------------------------- | |
1620 | ||
1621 | This example is similar to the previous examples, but in this case, | |
1622 | the sched_switch trigger references a hist trigger field on another | |
1623 | event, namely the sched_waking event. In order to accomplish this, a | |
1624 | field variable is created for the other event, but since an existing | |
1625 | histogram can't be used, as existing histograms are immutable, a new | |
1626 | histogram with a matching variable is created and used, and we'll see | |
1627 | that reflected in the hist_debug output shown below. | |
1628 | ||
1629 | First, we create the wakeup_latency synthetic event. Note the | |
1630 | addition of the prio field:: | |
1631 | ||
1632 | # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events | |
1633 | ||
1634 | As in previous test examples, we set up the sched_waking trigger:: | |
1635 | ||
1636 | # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger | |
1637 | ||
1638 | Here we set up a hist trigger on sched_switch to send a wakeup_latency | |
1639 | event using an onmatch handler naming the sched_waking event. Note | |
1640 | that the third param being passed to the wakeup_latency() is prio, | |
1641 | which is a field name that needs to have a field variable created for | |
1642 | it. There isn't however any prio field on the sched_switch event so | |
1643 | it would seem that it wouldn't be possible to create a field variable | |
1644 | for it. The matching sched_waking event does have a prio field, so it | |
1645 | should be possible to make use of it for this purpose. The problem | |
1646 | with that is that it's not currently possible to define a new variable | |
1647 | on an existing histogram, so it's not possible to add a new prio field | |
1648 | variable to the existing sched_waking histogram. It is however | |
1649 | possible to create an additional new 'matching' sched_waking histogram | |
1650 | for the same event, meaning that it uses the same key and filters, and | |
1651 | define the new prio field variable on that. | |
1652 | ||
1653 | Here's the sched_switch trigger:: | |
1654 | ||
1655 | # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger | |
1656 | ||
1657 | And here's the output of the hist_debug information for the | |
1658 | sched_waking hist trigger. Note that there are two histograms | |
1659 | displayed in the output: the first is the normal sched_waking | |
1660 | histogram we've seen in the previous examples, and the second is the | |
1661 | special histogram we created to provide the prio field variable. | |
1662 | ||
1663 | Looking at the second histogram below, we see a variable with the name | |
1664 | synthetic_prio. This is the field variable created for the prio field | |
1665 | on that sched_waking histogram:: | |
1666 | ||
1667 | # cat events/sched/sched_waking/hist_debug | |
1668 | ||
1669 | # event histogram | |
1670 | # | |
1671 | # trigger info: hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] | |
1672 | # | |
1673 | ||
1674 | hist_data: 00000000349570e4 | |
1675 | ||
1676 | n_vals: 2 | |
1677 | n_keys: 1 | |
1678 | n_fields: 3 | |
1679 | ||
1680 | val fields: | |
1681 | ||
1682 | hist_data->fields[0]: | |
1683 | flags: | |
1684 | VAL: HIST_FIELD_FL_HITCOUNT | |
1685 | type: u64 | |
1686 | size: 8 | |
1687 | is_signed: 0 | |
1688 | ||
1689 | hist_data->fields[1]: | |
1690 | flags: | |
1691 | HIST_FIELD_FL_VAR | |
1692 | var.name: ts0 | |
1693 | var.idx (into tracing_map_elt.vars[]): 0 | |
1694 | type: u64 | |
1695 | size: 8 | |
1696 | is_signed: 0 | |
1697 | ||
1698 | key fields: | |
1699 | ||
1700 | hist_data->fields[2]: | |
1701 | flags: | |
1702 | HIST_FIELD_FL_KEY | |
1703 | ftrace_event_field name: pid | |
1704 | type: pid_t | |
1705 | size: 8 | |
1706 | is_signed: 1 | |
1707 | ||
1708 | ||
1709 | # event histogram | |
1710 | # | |
1711 | # trigger info: hist:keys=pid:vals=hitcount:synthetic_prio=prio:sort=hitcount:size=2048 [active] | |
1712 | # | |
1713 | ||
1714 | hist_data: 000000006920cf38 | |
1715 | ||
1716 | n_vals: 2 | |
1717 | n_keys: 1 | |
1718 | n_fields: 3 | |
1719 | ||
1720 | val fields: | |
1721 | ||
1722 | hist_data->fields[0]: | |
1723 | flags: | |
1724 | VAL: HIST_FIELD_FL_HITCOUNT | |
1725 | type: u64 | |
1726 | size: 8 | |
1727 | is_signed: 0 | |
1728 | ||
1729 | hist_data->fields[1]: | |
1730 | flags: | |
1731 | HIST_FIELD_FL_VAR | |
1732 | ftrace_event_field name: prio | |
1733 | var.name: synthetic_prio | |
1734 | var.idx (into tracing_map_elt.vars[]): 0 | |
1735 | type: int | |
1736 | size: 4 | |
1737 | is_signed: 1 | |
1738 | ||
1739 | key fields: | |
1740 | ||
1741 | hist_data->fields[2]: | |
1742 | flags: | |
1743 | HIST_FIELD_FL_KEY | |
1744 | ftrace_event_field name: pid | |
1745 | type: pid_t | |
1746 | size: 8 | |
1747 | is_signed: 1 | |
1748 | ||
1749 | Looking at the sched_switch histogram below, we can see a reference to | |
1750 | the synthetic_prio variable on sched_waking, and looking at the | |
1751 | associated hist_data address we see that it is indeed associated with | |
1752 | the new histogram. Note also that the other references are to a | |
1753 | normal variable, wakeup_lat, and to a normal field variable, next_pid, | |
1754 | the details of which are in the field variables section:: | |
1755 | ||
1756 | # cat events/sched/sched_switch/hist_debug | |
1757 | ||
1758 | # event histogram | |
1759 | # | |
1760 | # trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio) [active] | |
1761 | # | |
1762 | ||
1763 | hist_data: 00000000a73b67df | |
1764 | ||
1765 | n_vals: 2 | |
1766 | n_keys: 1 | |
1767 | n_fields: 3 | |
1768 | ||
1769 | val fields: | |
1770 | ||
1771 | hist_data->fields[0]: | |
1772 | flags: | |
1773 | VAL: HIST_FIELD_FL_HITCOUNT | |
1774 | type: u64 | |
1775 | size: 8 | |
1776 | is_signed: 0 | |
1777 | ||
1778 | hist_data->fields[1]: | |
1779 | flags: | |
1780 | HIST_FIELD_FL_VAR | |
1781 | var.name: wakeup_lat | |
1782 | var.idx (into tracing_map_elt.vars[]): 0 | |
1783 | type: u64 | |
1784 | size: 0 | |
1785 | is_signed: 0 | |
1786 | ||
1787 | key fields: | |
1788 | ||
1789 | hist_data->fields[2]: | |
1790 | flags: | |
1791 | HIST_FIELD_FL_KEY | |
1792 | ftrace_event_field name: next_pid | |
1793 | type: pid_t | |
1794 | size: 8 | |
1795 | is_signed: 1 | |
1796 | ||
1797 | variable reference fields: | |
1798 | ||
1799 | hist_data->var_refs[0]: | |
1800 | flags: | |
1801 | HIST_FIELD_FL_VAR_REF | |
1802 | name: ts0 | |
1803 | var.idx (into tracing_map_elt.vars[]): 0 | |
1804 | var.hist_data: 00000000349570e4 | |
1805 | var_ref_idx (into hist_data->var_refs[]): 0 | |
1806 | type: u64 | |
1807 | size: 8 | |
1808 | is_signed: 0 | |
1809 | ||
1810 | hist_data->var_refs[1]: | |
1811 | flags: | |
1812 | HIST_FIELD_FL_VAR_REF | |
1813 | name: wakeup_lat | |
1814 | var.idx (into tracing_map_elt.vars[]): 0 | |
1815 | var.hist_data: 00000000a73b67df | |
1816 | var_ref_idx (into hist_data->var_refs[]): 1 | |
1817 | type: u64 | |
1818 | size: 0 | |
1819 | is_signed: 0 | |
1820 | ||
1821 | hist_data->var_refs[2]: | |
1822 | flags: | |
1823 | HIST_FIELD_FL_VAR_REF | |
1824 | name: next_pid | |
1825 | var.idx (into tracing_map_elt.vars[]): 1 | |
1826 | var.hist_data: 00000000a73b67df | |
1827 | var_ref_idx (into hist_data->var_refs[]): 2 | |
1828 | type: pid_t | |
1829 | size: 4 | |
1830 | is_signed: 0 | |
1831 | ||
1832 | hist_data->var_refs[3]: | |
1833 | flags: | |
1834 | HIST_FIELD_FL_VAR_REF | |
1835 | name: synthetic_prio | |
1836 | var.idx (into tracing_map_elt.vars[]): 0 | |
1837 | var.hist_data: 000000006920cf38 | |
1838 | var_ref_idx (into hist_data->var_refs[]): 3 | |
1839 | type: int | |
1840 | size: 4 | |
1841 | is_signed: 1 | |
1842 | ||
1843 | field variables: | |
1844 | ||
1845 | hist_data->field_vars[0]: | |
1846 | ||
1847 | field_vars[0].var: | |
1848 | flags: | |
1849 | HIST_FIELD_FL_VAR | |
1850 | var.name: next_pid | |
1851 | var.idx (into tracing_map_elt.vars[]): 1 | |
1852 | ||
1853 | field_vars[0].val: | |
1854 | ftrace_event_field name: next_pid | |
1855 | type: pid_t | |
1856 | size: 4 | |
1857 | is_signed: 1 | |
1858 | ||
1859 | action tracking variables (for onmax()/onchange()/onmatch()): | |
1860 | ||
1861 | hist_data->actions[0].match_data.event_system: sched | |
1862 | hist_data->actions[0].match_data.event: sched_waking | |
1863 | ||
1864 | The commands below can be used to clean things up for the next test:: | |
1865 | ||
1866 | # echo '!hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> events/sched/sched_switch/trigger | |
1867 | ||
1868 | # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger | |
1869 | ||
1870 | # echo '!wakeup_latency u64 lat; pid_t pid; int prio' >> synthetic_events | |
1871 | ||
1872 | Alias test | |
1873 | ---------- | |
1874 | ||
1875 | This example is very similar to previous examples, but demonstrates | |
1876 | the alias flag. | |
1877 | ||
1878 | First, we create the wakeup_latency synthetic event:: | |
1879 | ||
1880 | # echo 'wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events | |
1881 | ||
1882 | Next, we create a sched_waking trigger similar to previous examples, | |
1883 | but in this case we save the pid in the waking_pid variable:: | |
1884 | ||
1885 | # echo 'hist:keys=pid:waking_pid=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger | |
1886 | ||
1887 | For the sched_switch trigger, instead of using $waking_pid directly in | |
1888 | the wakeup_latency synthetic event invocation, we create an alias of | |
1889 | $waking_pid named $woken_pid, and use that in the synthetic event | |
1890 | invocation instead:: | |
1891 | ||
1892 | # echo 'hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger | |
1893 | ||
1894 | Looking at the sched_waking hist_debug output, in addition to the | |
1895 | normal fields, we can see the waking_pid variable:: | |
1896 | ||
1897 | # cat events/sched/sched_waking/hist_debug | |
1898 | ||
1899 | # event histogram | |
1900 | # | |
1901 | # trigger info: hist:keys=pid:vals=hitcount:waking_pid=pid,ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] | |
1902 | # | |
1903 | ||
1904 | hist_data: 00000000a250528c | |
1905 | ||
1906 | n_vals: 3 | |
1907 | n_keys: 1 | |
1908 | n_fields: 4 | |
1909 | ||
1910 | val fields: | |
1911 | ||
1912 | hist_data->fields[0]: | |
1913 | flags: | |
1914 | VAL: HIST_FIELD_FL_HITCOUNT | |
1915 | type: u64 | |
1916 | size: 8 | |
1917 | is_signed: 0 | |
1918 | ||
1919 | hist_data->fields[1]: | |
1920 | flags: | |
1921 | HIST_FIELD_FL_VAR | |
1922 | ftrace_event_field name: pid | |
1923 | var.name: waking_pid | |
1924 | var.idx (into tracing_map_elt.vars[]): 0 | |
1925 | type: pid_t | |
1926 | size: 4 | |
1927 | is_signed: 1 | |
1928 | ||
1929 | hist_data->fields[2]: | |
1930 | flags: | |
1931 | HIST_FIELD_FL_VAR | |
1932 | var.name: ts0 | |
1933 | var.idx (into tracing_map_elt.vars[]): 1 | |
1934 | type: u64 | |
1935 | size: 8 | |
1936 | is_signed: 0 | |
1937 | ||
1938 | key fields: | |
1939 | ||
1940 | hist_data->fields[3]: | |
1941 | flags: | |
1942 | HIST_FIELD_FL_KEY | |
1943 | ftrace_event_field name: pid | |
1944 | type: pid_t | |
1945 | size: 8 | |
1946 | is_signed: 1 | |
1947 | ||
1948 | The sched_switch hist_debug output shows that a variable named | |
1949 | woken_pid has been created but that it also has the | |
1950 | HIST_FIELD_FL_ALIAS flag set. It also has the HIST_FIELD_FL_VAR flag | |
1951 | set, which is why it appears in the val field section. | |
1952 | ||
1953 | Despite that implementation detail, an alias variable is actually more | |
1954 | like a variable reference; in fact it can be thought of as a reference | |
1955 | to a reference. The implementation copies the var_ref->fn() from the | |
1956 | variable reference being referenced, in this case, the waking_pid | |
1957 | fn(), which is hist_field_var_ref() and makes that the fn() of the | |
1958 | alias. The hist_field_var_ref() fn() requires the var_ref_idx of the | |
1959 | variable reference it's using, so waking_pid's var_ref_idx is also | |
1960 | copied to the alias. The end result is that when the value of alias | |
1961 | is retrieved, in the end it just does the same thing the original | |
1962 | reference would have done and retrieves the same value from the | |
1963 | var_ref_vals[] array. You can verify this in the output by noting | |
1964 | that the var_ref_idx of the alias, in this case woken_pid, is the same | |
1965 | as the var_ref_idx of the reference, waking_pid, in the variable | |
1966 | reference fields section. | |
1967 | ||
1968 | Additionally, once it gets that value, since it is also a variable, it | |
1969 | then saves that value into its var.idx. So the var.idx of the | |
1970 | woken_pid alias is 0, which it fills with the value from var_ref_idx 0 | |
1971 | when its fn() is called to update itself. You'll also notice that | |
1972 | there's a woken_pid var_ref in the variable refs section. That is the | |
1973 | reference to the woken_pid alias variable, and you can see that it | |
1974 | retrieves the value from the same var.idx as the woken_pid alias, 0, | |
1975 | and then in turn saves that value in its own var_ref_idx slot, 3, and | |
1976 | the value at this position is finally what gets assigned to the | |
1977 | $woken_pid slot in the trace event invocation:: | |
1978 | ||
1979 | # cat events/sched/sched_switch/hist_debug | |
1980 | ||
1981 | # event histogram | |
1982 | # | |
1983 | # trigger info: hist:keys=next_pid:vals=hitcount:woken_pid=$waking_pid,wakeup_lat=common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm) [active] | |
1984 | # | |
1985 | ||
1986 | hist_data: 0000000055d65ed0 | |
1987 | ||
1988 | n_vals: 3 | |
1989 | n_keys: 1 | |
1990 | n_fields: 4 | |
1991 | ||
1992 | val fields: | |
1993 | ||
1994 | hist_data->fields[0]: | |
1995 | flags: | |
1996 | VAL: HIST_FIELD_FL_HITCOUNT | |
1997 | type: u64 | |
1998 | size: 8 | |
1999 | is_signed: 0 | |
2000 | ||
2001 | hist_data->fields[1]: | |
2002 | flags: | |
2003 | HIST_FIELD_FL_VAR | |
2004 | HIST_FIELD_FL_ALIAS | |
2005 | var.name: woken_pid | |
2006 | var.idx (into tracing_map_elt.vars[]): 0 | |
2007 | var_ref_idx (into hist_data->var_refs[]): 0 | |
2008 | type: pid_t | |
2009 | size: 4 | |
2010 | is_signed: 1 | |
2011 | ||
2012 | hist_data->fields[2]: | |
2013 | flags: | |
2014 | HIST_FIELD_FL_VAR | |
2015 | var.name: wakeup_lat | |
2016 | var.idx (into tracing_map_elt.vars[]): 1 | |
2017 | type: u64 | |
2018 | size: 0 | |
2019 | is_signed: 0 | |
2020 | ||
2021 | key fields: | |
2022 | ||
2023 | hist_data->fields[3]: | |
2024 | flags: | |
2025 | HIST_FIELD_FL_KEY | |
2026 | ftrace_event_field name: next_pid | |
2027 | type: pid_t | |
2028 | size: 8 | |
2029 | is_signed: 1 | |
2030 | ||
2031 | variable reference fields: | |
2032 | ||
2033 | hist_data->var_refs[0]: | |
2034 | flags: | |
2035 | HIST_FIELD_FL_VAR_REF | |
2036 | name: waking_pid | |
2037 | var.idx (into tracing_map_elt.vars[]): 0 | |
2038 | var.hist_data: 00000000a250528c | |
2039 | var_ref_idx (into hist_data->var_refs[]): 0 | |
2040 | type: pid_t | |
2041 | size: 4 | |
2042 | is_signed: 1 | |
2043 | ||
2044 | hist_data->var_refs[1]: | |
2045 | flags: | |
2046 | HIST_FIELD_FL_VAR_REF | |
2047 | name: ts0 | |
2048 | var.idx (into tracing_map_elt.vars[]): 1 | |
2049 | var.hist_data: 00000000a250528c | |
2050 | var_ref_idx (into hist_data->var_refs[]): 1 | |
2051 | type: u64 | |
2052 | size: 8 | |
2053 | is_signed: 0 | |
2054 | ||
2055 | hist_data->var_refs[2]: | |
2056 | flags: | |
2057 | HIST_FIELD_FL_VAR_REF | |
2058 | name: wakeup_lat | |
2059 | var.idx (into tracing_map_elt.vars[]): 1 | |
2060 | var.hist_data: 0000000055d65ed0 | |
2061 | var_ref_idx (into hist_data->var_refs[]): 2 | |
2062 | type: u64 | |
2063 | size: 0 | |
2064 | is_signed: 0 | |
2065 | ||
2066 | hist_data->var_refs[3]: | |
2067 | flags: | |
2068 | HIST_FIELD_FL_VAR_REF | |
2069 | name: woken_pid | |
2070 | var.idx (into tracing_map_elt.vars[]): 0 | |
2071 | var.hist_data: 0000000055d65ed0 | |
2072 | var_ref_idx (into hist_data->var_refs[]): 3 | |
2073 | type: pid_t | |
2074 | size: 4 | |
2075 | is_signed: 1 | |
2076 | ||
2077 | hist_data->var_refs[4]: | |
2078 | flags: | |
2079 | HIST_FIELD_FL_VAR_REF | |
2080 | name: next_comm | |
2081 | var.idx (into tracing_map_elt.vars[]): 2 | |
2082 | var.hist_data: 0000000055d65ed0 | |
2083 | var_ref_idx (into hist_data->var_refs[]): 4 | |
2084 | type: char[16] | |
2085 | size: 256 | |
2086 | is_signed: 0 | |
2087 | ||
2088 | field variables: | |
2089 | ||
2090 | hist_data->field_vars[0]: | |
2091 | ||
2092 | field_vars[0].var: | |
2093 | flags: | |
2094 | HIST_FIELD_FL_VAR | |
2095 | var.name: next_comm | |
2096 | var.idx (into tracing_map_elt.vars[]): 2 | |
2097 | ||
2098 | field_vars[0].val: | |
2099 | ftrace_event_field name: next_comm | |
2100 | type: char[16] | |
2101 | size: 256 | |
2102 | is_signed: 0 | |
2103 | ||
2104 | action tracking variables (for onmax()/onchange()/onmatch()): | |
2105 | ||
2106 | hist_data->actions[0].match_data.event_system: sched | |
2107 | hist_data->actions[0].match_data.event: sched_waking | |
2108 | ||
2109 | The commands below can be used to clean things up for the next test:: | |
2110 | ||
2111 | # echo '!hist:keys=next_pid:woken_pid=$waking_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,$woken_pid,next_comm)' >> events/sched/sched_switch/trigger | |
2112 | ||
2113 | # echo '!hist:keys=pid:ts0=common_timestamp.usecs' >> events/sched/sched_waking/trigger | |
2114 | ||
2115 | # echo '!wakeup_latency u64 lat; pid_t pid; char comm[16]' >> synthetic_events |