                Coresight - HW Assisted Tracing on ARM
                ======================================

   Author:   Mathieu Poirier <mathieu.poirier@linaro.org>
   Date:     September 11th, 2014

Introduction
------------

Coresight is an umbrella of technologies allowing for the debugging of ARM
based SoCs.  It includes solutions for JTAG and HW assisted tracing.  This
document is concerned with the latter.

HW assisted tracing is becoming increasingly useful when dealing with systems
that have many SoCs and other components like GPU and DMA engines.  ARM has
developed a HW assisted tracing solution by means of different components, each
being added to a design at synthesis time to cater to specific tracing needs.
Components are generally categorised as sources, links and sinks and are
(usually) discovered using the AMBA bus.

"Sources" generate a compressed stream representing the processor instruction
path based on tracing scenarios as configured by users.  From there the stream
flows through the coresight system (via the ATB bus) using links that connect
the emanating source to one or more sinks.  Sinks serve as endpoints to the
coresight implementation, either storing the compressed stream in a memory
buffer or creating an interface to the outside world where data can be
transferred to a host without fear of filling up the onboard coresight memory
buffer.

A typical coresight system would look like this:

  *****************************************************************
 **************************** AMBA AXI  ****************************===||
  *****************************************************************    ||
        ^                    ^                            |            ||
        |                    |                            *            **
     0000000    :::::     0000000    :::::    :::::    @@@@@@@    ||||||||||||
     0 CPU 0<-->: C :     0 CPU 0<-->: C :    : C :    @ STM @    || System ||
  |->0000000    : T :  |->0000000    : T :    : T :<--->@@@@@     || Memory ||
  |  #######<-->: I :  |  #######<-->: I :    : I :      @@@<-|   ||||||||||||
  |  # ETM #    :::::  |  # PTM #    :::::    :::::       @   |
  |   #####      ^ ^   |   #####      ^ !      ^ !        .   |   |||||||||
  | |->###       | !   | |->###       | !      | !        .   |   || DAP ||
  | |   #        | !   | |   #        | !      | !        .   |   |||||||||
  | |   .        | !   | |   .        | !      | !        .   |      |  |
  | |   .        | !   | |   .        | !      | !        .   |      |  *
  | |   .        | !   | |   .        | !      | !        .   |      | SWD/
  | |   .        | !   | |   .        | !      | !        .   |      | JTAG
  *****************************************************************<-|
 *************************** AMBA Debug APB ************************
  *****************************************************************
   |    .          !         .          !        !        .    |
   |    .          *         .          *        *        .    |
  *****************************************************************
 ******************** Cross Trigger Matrix (CTM) *******************
  *****************************************************************
   |    .     ^              .                            .    |
   |    *     !              *                            *    |
  *****************************************************************
 ****************** AMBA Advanced Trace Bus (ATB) ******************
  *****************************************************************
   |          !                        ===============         |
   |          *                         ===== F =====<---------|
   |   :::::::::                         ==== U ====
   |-->:: CTI ::<!!                       === N ===
   |   :::::::::  !                        == N ==
   |    ^         *                        == E ==
   |    !  &&&&&&&&&       IIIIIII         == L ==
   |------>&& ETB &&<......II     I        =======
   |    !  &&&&&&&&&       II     I           .
   |    !                    I     I          .
   |    !                    I REP I<..........
   |    !                    I     I
   |    !!>&&&&&&&&&       II     I           *Source: ARM ltd.
   |------>& TPIU  &<......II    I            DAP = Debug Access Port
           &&&&&&&&&       IIIIIII            ETM = Embedded Trace Macrocell
             ;                                PTM = Program Trace Macrocell
             ;                                CTI = Cross Trigger Interface
             *                                ETB = Embedded Trace Buffer
        To trace port                         TPIU= Trace Port Interface Unit
                                              SWD = Serial Wire Debug

While on-target configuration of the components is done via the APB bus,
all trace data are carried out-of-band on the ATB bus.  The CTM provides
a way to aggregate and distribute signals between CoreSight components.

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform.  This first implementation centers on
the basic tracing functionality, enabling components such as ETM/PTM, funnel,
replicator, TMC, TPIU and ETB.  Future work will enable more
intricate IP blocks such as STM and CTI.


Acronyms and Classification
---------------------------

Acronyms:

PTM:     Program Trace Macrocell
ETM:     Embedded Trace Macrocell
STM:     System Trace Macrocell
ETB:     Embedded Trace Buffer
ITM:     Instrumentation Trace Macrocell
TPIU:    Trace Port Interface Unit
TMC-ETR: Trace Memory Controller, configured as Embedded Trace Router
TMC-ETF: Trace Memory Controller, configured as Embedded Trace FIFO
CTI:     Cross Trigger Interface

Classification:

Source:
   ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
Link:
   Funnel, replicator (intelligent or not), TMC-ETR
Sinks:
   ETBv1.0, ETBv1.1, TPIU, TMC-ETF
Misc:
   CTI


Device Tree Bindings
--------------------

See Documentation/devicetree/bindings/arm/coresight.txt for details.

As of this writing drivers for ITM, STMs and CTIs are not provided but are
expected to be added as the solution matures.


Framework and implementation
----------------------------

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform.  Any coresight compliant device can
register with the framework as long as it uses the right APIs:

struct coresight_device *coresight_register(struct coresight_desc *desc);
void coresight_unregister(struct coresight_device *csdev);

The registering function takes a "struct coresight_desc *desc" and
registers the device with the core framework.  The unregister function takes
a reference to a "struct coresight_device *csdev", obtained at registration
time.

If everything goes well during the registration process the new devices will
show up under /sys/bus/coresight/devices, as shown here for a TC2 platform:

root:~# ls /sys/bus/coresight/devices/
replicator  20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
20010000.etb         20040000.funnel  2201d000.ptm  2203d000.etm
root:~#

The registering function takes a "struct coresight_desc", which looks like
this:

struct coresight_desc {
        enum coresight_dev_type type;
        struct coresight_dev_subtype subtype;
        const struct coresight_ops *ops;
        struct coresight_platform_data *pdata;
        struct device *dev;
        const struct attribute_group **groups;
};


The "coresight_dev_type" identifies what the device is, i.e., source, link or
sink, while the "coresight_dev_subtype" will characterise that type further.

The "struct coresight_ops" is mandatory and will tell the framework how to
perform base operations related to the components, each component having
a different set of requirements.  For that "struct coresight_ops_sink",
"struct coresight_ops_link" and "struct coresight_ops_source" have been
provided.

The next field, "struct coresight_platform_data *pdata", is acquired by calling
"of_get_coresight_platform_data()" as part of the driver's _probe routine, and
"struct device *dev" gets the device reference embedded in the "amba_device":

static int etm_probe(struct amba_device *adev, const struct amba_id *id)
{
 ...
 ...
 drvdata->dev = &adev->dev;
 ...
}

Specific classes of device (source, link, or sink) have generic operations
that can be performed on them (see "struct coresight_ops").  The
"**groups" is a list of sysfs entries pertaining to operations
specific to that component only.  "Implementation defined" customisations are
expected to be accessed and controlled using those entries.


Device Naming scheme
--------------------

The devices that appear on the "coresight" bus were named the same as their
parent devices, i.e., the real devices that appear on the AMBA bus or the
platform bus.  Thus the names were based on the Linux Open Firmware layer
naming convention, which uses the base physical address of the device
followed by the device type, e.g.:

root:~# ls /sys/bus/coresight/devices/
 20010000.etf  20040000.funnel      20100000.stm     22040000.etm
 22140000.etm  230c0000.funnel      23240000.etm     20030000.tpiu
 20070000.etr  20120000.replicator  220c0000.funnel
 23040000.etm  23140000.etm         23340000.etm

However, with the introduction of ACPI support, the names of the real
devices are a bit cryptic and non-obvious.  Thus, a new naming scheme was
introduced to use more generic names based on the type of the device.  The
following rules apply:

  1) Devices that are bound to CPUs are named based on the CPU logical
     number.

     e.g., an ETM bound to CPU0 is named "etm0"

  2) All other devices follow the pattern "<device_type_prefix>N", where:

     <device_type_prefix> - A prefix specific to the type of the device
     N                    - A sequential number assigned based on the order
                            of probing.

     e.g., tmc_etf0, tmc_etr0, funnel0, funnel1

Thus, with the new scheme the devices could appear as:

root:~# ls /sys/bus/coresight/devices/
 etm0     etm1     etm2         etm3  etm4      etm5      funnel0
 funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0

Some of the examples below use the old naming scheme and some use the newer
one, so that whichever form shows up on your system is not unexpected.  Always
use the names exactly as they appear on your system under the locations
specified.
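
As an illustration of the two schemes, the following is a minimal sketch of a
shell helper that tells them apart.  The function name "cs_name_scheme" and the
regular expressions are assumptions made for this example, derived only from
the listings above; they are not part of the coresight framework.

```shell
# Hypothetical helper: classify a coresight device name by naming scheme.
# Prints "address-based" (e.g. 20010000.etf), "type-based" (e.g. tmc_etf0)
# or "unknown" for anything that fits neither pattern.
cs_name_scheme() {
    if printf '%s\n' "$1" | grep -Eq '^[0-9a-f]+\.[a-z_]+$'; then
        # <base physical address>.<device type>
        echo "address-based"
    elif printf '%s\n' "$1" | grep -Eq '^[a-z_]+[0-9]+$'; then
        # <device_type_prefix>N, or a CPU-bound name such as etm0
        echo "type-based"
    else
        echo "unknown"
    fi
}
```

For instance, "cs_name_scheme 20040000.funnel" reports the old scheme and
"cs_name_scheme funnel1" the new one, while a bare name such as "replicator"
is reported as unknown.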

How to use the tracer modules
-----------------------------

There are two ways to use the Coresight framework: 1) interacting directly
with the Coresight devices using the sysFS interface and 2) using the perf cmd
line tools.  Preference is given to the latter as using the sysFS interface
requires a deep understanding of the Coresight HW.  The following sections
provide details on using both methods.

1) Using the sysFS interface:

Before trace collection can start, a coresight sink needs to be identified.
There is no limit on the number of sinks (nor sources) that can be enabled at
any given moment.  As a generic operation, all devices pertaining to the sink
class will have an "enable_sink" entry in sysfs:

root:/sys/bus/coresight/devices# ls
replicator  20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
20010000.etb         20040000.funnel  2201d000.ptm  2203d000.etm
root:/sys/bus/coresight/devices# ls 20010000.etb
enable_sink  status  trigger_cntr
root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
1
root:/sys/bus/coresight/devices#
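
The echo/cat pair above can be wrapped in a small helper.  What follows is
only a sketch: the function name "cs_write_attr" and its argument order are
invented for this example, and the devices directory is passed in explicitly
so the helper can be tried against any directory tree.

```shell
# Hypothetical helper: write VALUE to ROOT/DEVICE/ATTR and confirm it stuck.
# Usage: cs_write_attr ROOT DEVICE ATTR VALUE
cs_write_attr() {
    attr="$1/$2/$3"
    # Refuse to create files: the attribute must already exist.
    if [ ! -w "$attr" ]; then
        echo "no writable attribute: $attr" >&2
        return 1
    fi
    printf '%s\n' "$4" > "$attr" || return 1
    # Read the value back, as the "cat" in the session above does.
    [ "$(cat "$attr")" = "$4" ]
}
```

On the TC2 platform shown above this would be invoked as
"cs_write_attr /sys/bus/coresight/devices 20010000.etb enable_sink 1".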

At boot time the current etm3x driver will configure the first address
comparator with "_stext" and "_etext", essentially tracing any instruction
that falls within that range.  As such "enabling" a source will immediately
trigger a trace capture:

root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
1
root:/sys/bus/coresight/devices# cat 20010000.etb/status
Depth:          0x2000
Status:         0x1
RAM read ptr:   0x0
RAM wrt ptr:    0x19d3   <----- The write pointer is moving
Trigger cnt:    0x0
Control:        0x1
Flush status:   0x0
Flush ctrl:     0x2001
root:/sys/bus/coresight/devices#
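
To check from a script that the write pointer is moving, the status text can
be filtered.  This is a sketch that assumes the "RAM wrt ptr:" label is
formatted as in the output above; the function name "etb_wrt_ptr" is
made up for this example.

```shell
# Hypothetical helper: read an ETB "status" dump on stdin and print the
# value of the RAM write pointer (first word after the "RAM wrt ptr:" label).
etb_wrt_ptr() {
    awk -F': *' '/^RAM wrt ptr:/ { split($2, f, " "); print f[1] }'
}
```

Calling "etb_wrt_ptr < 20010000.etb/status" twice a moment apart and comparing
the two values shows whether trace data is still being written.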

Trace collection is stopped the same way:

root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
root:/sys/bus/coresight/devices#

The content of the ETB buffer can be harvested directly from /dev:

root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
of=~/cstrace.bin

64+0 records in
64+0 records out
32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
root:/sys/bus/coresight/devices#
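
The byte count reported by dd can be cross-checked against the "Depth" value
from the status output.  The arithmetic below assumes that the depth is
counted in 32-bit words, which is an assumption rather than something stated
in this document, although it matches the figures in the session above.

```shell
# Values taken from the example session; word_bytes is an assumption.
depth=0x2000      # "Depth" field of 20010000.etb/status
word_bytes=4      # assumed: depth is expressed in 32-bit words
dd_block=512      # dd's default block size in bytes

buf_bytes=$(( ${depth} * ${word_bytes} ))
records=$(( buf_bytes / dd_block ))
echo "$buf_bytes bytes, $records dd records"
```

With these inputs the result is 32768 bytes in 64 records, in line with the
"64+0 records" and "32768 bytes" printed by dd.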

The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.

Following is a DS-5 output of an experimental loop that increments a variable up
to a certain value.  The example is simple and yet provides a glimpse of the
wealth of possibilities that coresight provides.

Info                    Tracing enabled
Instruction  106378866  0x8026B53C  E52DE004  false  PUSH  {lr}
Instruction  0          0x8026B540  E24DD00C  false  SUB   sp,sp,#0xc
Instruction  0          0x8026B544  E3A03000  false  MOV   r3,#0
Instruction  0          0x8026B548  E58D3004  false  STR   r3,[sp,#4]
Instruction  0          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
Timestamp               Timestamp: 17106715833
Instruction  319        0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
Instruction  9          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
Instruction  7          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
Instruction  7          0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
Instruction  10         0x8026B54C  E59D3004  false  LDR   r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP   r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD   r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR   r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE   {pc}-0x10 ; 0x8026b54c
Instruction  6          0x8026B560  EE1D3F30  false  MRC   p15,#0x0,r3,c13,c0,#1
Instruction  0          0x8026B564  E1A0100D  false  MOV   r1,sp
Instruction  0          0x8026B568  E3C12D7F  false  BIC   r2,r1,#0x1fc0
Instruction  0          0x8026B56C  E3C2203F  false  BIC   r2,r2,#0x3f
Instruction  0          0x8026B570  E59D1004  false  LDR   r1,[sp,#4]
Instruction  0          0x8026B574  E59F0010  false  LDR   r0,[pc,#16] ; [0x8026B58C] = 0x80550368
Instruction  0          0x8026B578  E592200C  false  LDR   r2,[r2,#0xc]
Instruction  0          0x8026B57C  E59221D0  false  LDR   r2,[r2,#0x1d0]
Instruction  0          0x8026B580  EB07A4CF  true   BL    {pc}+0x1e9344 ; 0x804548c4
Info                    Tracing enabled
Instruction  13570831   0x8026B584  E28DD00C  false  ADD   sp,sp,#0xc
Instruction  0          0x8026B588  E8BD8000  true   LDM   sp!,{pc}
Timestamp               Timestamp: 17107041535

2) Using perf framework:

Coresight tracers are represented using the Perf framework's Performance
Monitoring Unit (PMU) abstraction.  As such the perf framework takes charge of
controlling when tracing gets enabled based on when the process of interest is
scheduled.  When configured in a system, Coresight PMUs will be listed when
queried by the perf command line tool:

        linaro@linaro-nano:~$ ./perf list pmu

                List of pre-defined events (to be used in -e):

                cs_etm//                                    [Kernel PMU event]

        linaro@linaro-nano:~$

Regardless of the number of tracers available in a system (usually equal to the
number of processor cores), the "cs_etm" PMU will be listed only once.

A Coresight PMU works the same way as any other PMU, i.e. the name of the PMU is
listed along with configuration options within forward slashes '/'.  Since a
Coresight system will typically have more than one sink, the name of the sink to
work with needs to be specified as an event option.
On newer kernels the available sinks are listed in sysFS under:
($SYSFS)/bus/event_source/devices/cs_etm/sinks/

        root@localhost:/sys/bus/event_source/devices/cs_etm/sinks# ls
        tmc_etf0  tmc_etr0  tpiu0

On older kernels, this may need to be found from the list of coresight devices,
available under ($SYSFS)/bus/coresight/devices/:

        root:~# ls /sys/bus/coresight/devices/
         etm0     etm1     etm2         etm3  etm4      etm5      funnel0
         funnel1  funnel2  replicator0  stm0  tmc_etf0  tmc_etr0  tpiu0

        root@linaro-nano:~# perf record -e cs_etm/@tmc_etr0/u --per-thread program

As mentioned above in section "Device Naming scheme", the names of the devices
could look different from what is used in the example above.  One must use the
device names as they appear in sysFS.
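
The two lookup locations can be combined in a script.  The following is a
sketch only: the function name "cs_list_sinks" is invented here, and the
fallback relies on the observation made earlier in this document that devices
of the sink class expose an "enable_sink" entry.

```shell
# Hypothetical helper: list the sinks usable with the cs_etm PMU.
# Usage: cs_list_sinks [SYSFS]      (SYSFS defaults to /sys)
cs_list_sinks() {
    sysfs="${1:-/sys}"
    new="$sysfs/bus/event_source/devices/cs_etm/sinks"
    old="$sysfs/bus/coresight/devices"
    if [ -d "$new" ]; then
        # Newer kernels: the sinks are listed directly.
        ls "$new"
    else
        # Older kernels: keep the devices that have an enable_sink entry.
        for dev in "$old"/*; do
            if [ -e "$dev/enable_sink" ]; then
                basename "$dev"
            fi
        done
    fi
}
```

Running "cs_list_sinks" with no argument inspects /sys; a different root can
be given to try the helper against a copy of the tree.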

The syntax within the forward slashes '/' is important.  The '@' character
tells the parser that a sink is about to be specified and that this is the sink
to use for the trace session.
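
A small helper makes the shape of the event string explicit.  This is a
sketch: the function name "cs_etm_event" and the default modifier "u" are
choices made for this example, mirroring the command line shown earlier.

```shell
# Hypothetical helper: build a cs_etm event string for "perf record -e".
# Usage: cs_etm_event SINK [MODIFIER]     (MODIFIER defaults to "u")
cs_etm_event() {
    sink="$1"
    modifier="${2:-u}"
    if [ -z "$sink" ]; then
        echo "usage: cs_etm_event SINK [MODIFIER]" >&2
        return 1
    fi
    # '@' introduces the sink name between the forward slashes.
    printf 'cs_etm/@%s/%s\n' "$sink" "$modifier"
}
```

It could then be used as
perf record -e "$(cs_etm_event tmc_etr0)" --per-thread program.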

More information on the above and other examples of how to use Coresight with
the perf tools can be found in the "HOWTO.md" file of the openCSD GitHub
repository [3].

2.1) AutoFDO analysis using the perf tools:

perf can be used to record and analyze traces of programs.

Execution can be recorded using 'perf record' with the cs_etm event,
specifying the name of the sink to record to, e.g.:

    perf record -e cs_etm/@tmc_etr0/u --per-thread

The 'perf report' and 'perf script' commands can be used to analyze execution,
synthesizing instruction and branch events from the instruction trace.
'perf inject' can be used to replace the trace data with the synthesized events.
The --itrace option controls the type and frequency of synthesized events
(see perf documentation).

Note that only 64-bit programs are currently supported - further work is
required to support instruction decode of 32-bit Arm programs.


Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------

'perf inject' accepts the --itrace option in which case tracing data is
removed and replaced with the synthesized events, e.g.:

        perf inject --itrace --strip -i perf.data -o perf.data.new

Below is an example of using ARM ETM for autoFDO.  It requires autofdo
(https://github.com/google/autofdo) and gcc version 5.  The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).

        $ gcc-5 -O3 sort.c -o sort
        $ taskset -c 2 ./sort
        Bubble sorting array of 30000 elements
        5910 ms

        $ perf record -e cs_etm/@tmc_etr0/u --per-thread taskset -c 2 ./sort
        Bubble sorting array of 30000 elements
        12543 ms
        [ perf record: Woken up 35 times to write data ]
        [ perf record: Captured and wrote 69.640 MB perf.data ]

        $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
        $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
        $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
        $ taskset -c 2 ./sort_autofdo
        Bubble sorting array of 30000 elements
        5806 ms
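
The three timings in the example can be put in perspective with a little
arithmetic.  The helper below is a sketch (the name "autofdo_summary" is made
up) and only restates the single-run figures printed above; it is not a
benchmark.

```shell
# Hypothetical helper: summarise the timings of the AutoFDO example.
# Usage: autofdo_summary BASELINE_MS TRACED_MS OPTIMISED_MS
autofdo_summary() {
    awk -v b="$1" -v t="$2" -v f="$3" 'BEGIN {
        # Cost of running under "perf record" relative to the plain run.
        printf "tracing slowdown: %.2fx\n", t / b
        # Run time saved by the -fauto-profile build relative to the plain run.
        printf "AutoFDO gain: %.1f%%\n", (b - f) / b * 100
    }'
}

autofdo_summary 5910 12543 5806
```

For the run shown above this reports a slowdown of about 2.12x while tracing
and a gain of about 1.8% for the sort_autofdo binary.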


How to use the STM module
-------------------------

Using the System Trace Macrocell module is the same as using the tracers - the
only difference is that clients are driving the trace capture rather
than the program flow through the code.

As with any other CoreSight component, specifics about the STM tracer can be
found in sysfs with more information on each entry being found in [1]:

root@genericarmv8:~# ls /sys/bus/coresight/devices/stm0
enable_source   hwevent_select  port_enable     subsystem       uevent
hwevent_enable  mgmt            port_select     traceid
root@genericarmv8:~#

Like any other source, a sink needs to be identified and the STM enabled before
being used:

root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink
root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/stm0/enable_source

From there user space applications can request and use channels using the devfs
interface provided for that purpose by the generic STM API:

root@genericarmv8:~# ls -l /dev/stm0
crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/stm0
root@genericarmv8:~#

Details on how to use the generic STM API can be found here [2].

[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
[2]. Documentation/trace/stm.rst
[3]. https://github.com/Linaro/perf-opencsd