2750ce1d JY |
Intel hybrid support
--------------------
Support for Intel hybrid events within perf tools.

Some Intel platforms, such as AlderLake, are hybrid platforms that
consist of atom cpus and core cpus. Each cpu type has a dedicated event
list: some events are available only on core cpus, some only on atom
cpus, and some on both.

The kernel exports two new cpu pmus via sysfs:
/sys/devices/cpu_core
/sys/devices/cpu_atom

Each of these directories contains a 'cpus' file listing the cpus that
belong to that pmu. For example,

cat /sys/devices/cpu_core/cpus
0-15

cat /sys/devices/cpu_atom/cpus
16-23

This indicates that cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.
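
As a minimal sketch, assuming the usual sysfs list format of comma-separated
cpu numbers and 'lo-hi' ranges (e.g. "0-3,8,10-11"), the following C code
decides which pmu a cpu belongs to from the contents of the two 'cpus'
files. cpulist_contains(), pmu_for_cpu() and enum hybrid_pmu are
hypothetical names for illustration, not perf's own cpu map code, and
reading the files themselves is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result type: which hybrid pmu owns a cpu. */
enum hybrid_pmu { PMU_NONE, PMU_CORE, PMU_ATOM };

/*
 * Return true if 'cpu' appears in a sysfs cpu list such as "0-15",
 * "16-23\n" or "0-3,8,10-11". Malformed input yields false.
 */
static bool cpulist_contains(const char *list, int cpu)
{
	const char *p = list;

	assert(list != NULL);
	while (*p != '\0' && *p != '\n') {
		char *end;
		long lo = strtol(p, &end, 10);
		long hi;

		if (end == p)
			return false;	/* no number where one was expected */
		hi = lo;
		if (*end == '-') {	/* range "lo-hi" */
			p = end + 1;
			hi = strtol(p, &end, 10);
			if (end == p)
				return false;
		}
		if (cpu >= lo && cpu <= hi)
			return true;
		p = end;
		if (*p == ',')		/* skip the separator between entries */
			p++;
	}
	return false;
}

/* Map a cpu to its pmu, given the contents of the two 'cpus' files. */
static enum hybrid_pmu pmu_for_cpu(int cpu, const char *core_cpus,
				   const char *atom_cpus)
{
	if (cpulist_contains(core_cpus, cpu))
		return PMU_CORE;
	if (cpulist_contains(atom_cpus, cpu))
		return PMU_ATOM;
	return PMU_NONE;
}
```

With the lists above, pmu_for_cpu(20, "0-15", "16-23") yields PMU_ATOM and
pmu_for_cpu(3, "0-15", "16-23") yields PMU_CORE. Malformed lists are treated
as not containing the cpu rather than reported as errors, which keeps the
sketch short.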

As before, use perf-list to list the symbolic events.

perf list

inst_retired.any
[Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
inst_retired.any
[Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]

The 'Unit: xxx' suffix is added to the brief description to indicate
which pmu the event belongs to. This allows the same event name to be
supported on different pmus.

Enable hybrid event with a specific pmu
---------------------------------------
To enable a core-only or atom-only event, use the following syntax:

cpu_core/<event name>/
or
cpu_atom/<event name>/

For example, count the 'cycles' event on core cpus:

perf stat -e cpu_core/cycles/

Create two events for one hardware event automatically
------------------------------------------------------
When an event is available on both atom and core, two events are
created automatically: one for atom and one for core. Most hardware
events and cache events are available on both cpu_core and cpu_atom.

Hardware events have pre-defined configs (e.g. 0 for cycles). But on a
hybrid platform, the kernel needs to know which pmu (atom or core) the
event is for. The original perf event type PERF_TYPE_HARDWARE can't
carry pmu information, so this type is extended to be PMU aware. The
PMU type ID is stored in attr.config[63:32].

The PMU type ID is retrieved from sysfs:
/sys/devices/cpu_atom/type
/sys/devices/cpu_core/type
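
A minimal C sketch of reading the PMU type ID from these sysfs files.
read_pmu_type() and pmu_type_by_name() are hypothetical helpers; the sketch
assumes the 'type' file holds a single decimal number and returns -1 on any
failure.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a PMU type ID from a sysfs 'type' file, e.g.
 * /sys/devices/cpu_core/type. Returns -1 if the file cannot be
 * opened or does not start with a number.
 */
static int read_pmu_type(const char *path)
{
	FILE *f;
	int type;

	assert(path != NULL);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}

/* Build "/sys/devices/<pmu>/type" and read it; pmu is e.g. "cpu_atom". */
static int pmu_type_by_name(const char *pmu)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "/sys/devices/%s/type", pmu);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* name too long for the buffer */
	return read_pmu_type(path);
}
```

On the example system used in this document, pmu_type_by_name("cpu_core")
would be expected to return 4 and pmu_type_by_name("cpu_atom") 8. Since the
atom ID is not fixed, a tool should read it at run time rather than
hard-code it.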

The new attr.config layout for PERF_TYPE_HARDWARE:

PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA
AA: hardware event ID
EEEEEEEE: PMU type ID
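
The layout above amounts to a shift and a mask. A minimal C sketch,
assuming the PMU type ID has already been read from sysfs; the macro and
function names are hypothetical, chosen so the code does not depend on a
particular kernel header version (recent <linux/perf_event.h> versions
define PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK for the same split).
HW_CPU_CYCLES has the value of PERF_COUNT_HW_CPU_CYCLES, the 0 for cycles
mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Local, hypothetical names for the config split. */
#define HYBRID_PMU_TYPE_SHIFT	32		/* config[63:32]: PMU type ID */
#define HYBRID_HW_EVENT_MASK	0xffffffffULL	/* config[31:0]: event ID */
#define HW_CPU_CYCLES		0		/* PERF_COUNT_HW_CPU_CYCLES */
#define HW_INSTRUCTIONS		1		/* PERF_COUNT_HW_INSTRUCTIONS */

/* Encode attr.config for a PERF_TYPE_HARDWARE event on a given pmu. */
static uint64_t hybrid_hw_config(uint32_t pmu_type, uint64_t hw_event_id)
{
	/* The event ID must fit in the low 32 bits. */
	assert((hw_event_id & ~HYBRID_HW_EVENT_MASK) == 0);
	return ((uint64_t)pmu_type << HYBRID_PMU_TYPE_SHIFT) | hw_event_id;
}

/* Decode the PMU type ID, e.g. when reading a perf_event_attr dump. */
static uint32_t config_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> HYBRID_PMU_TYPE_SHIFT);
}

/* Decode the generic hardware event ID. */
static uint64_t config_hw_event(uint64_t config)
{
	return config & HYBRID_HW_EVENT_MASK;
}
```

With the example type IDs used in this document, hybrid_hw_config(4,
HW_CPU_CYCLES) gives 0x400000000 (cpu_core cycles) and hybrid_hw_config(8,
HW_CPU_CYCLES) gives 0x800000000 (cpu_atom cycles); attr.type stays
PERF_TYPE_HARDWARE in both cases.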

Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to be
PMU aware. The PMU type ID is stored in attr.config[63:32].

The new attr.config layout for PERF_TYPE_HW_CACHE:

PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB
BB: hardware cache ID
CC: hardware cache op ID
DD: hardware cache op result ID
EEEEEEEE: PMU type ID
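
A matching C sketch for cache events, filling in the BB, CC, DD and
EEEEEEEE fields of the layout above. The enum constants are local,
hypothetical names whose values mirror a subset of perf_hw_cache_id,
perf_hw_cache_op_id and perf_hw_cache_op_result_id from
<linux/perf_event.h>; hybrid_cache_config() is likewise illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Subset of the generic cache IDs, ops and results (values as in the uapi enums). */
enum { HW_CACHE_L1D = 0, HW_CACHE_L1I = 1, HW_CACHE_LL = 2 };
enum { HW_CACHE_OP_READ = 0, HW_CACHE_OP_WRITE = 1, HW_CACHE_OP_PREFETCH = 2 };
enum { HW_CACHE_RESULT_ACCESS = 0, HW_CACHE_RESULT_MISS = 1 };

/* Encode attr.config for a PERF_TYPE_HW_CACHE event: 0xEEEEEEEE00DDCCBB. */
static uint64_t hybrid_cache_config(uint32_t pmu_type, unsigned int cache_id,
				    unsigned int op_id, unsigned int result_id)
{
	/* Each of BB, CC and DD is an 8-bit field. */
	assert(cache_id <= 0xff && op_id <= 0xff && result_id <= 0xff);
	return ((uint64_t)pmu_type << 32) |	/* EEEEEEEE */
	       ((uint64_t)result_id << 16) |	/* DD */
	       ((uint64_t)op_id << 8) |		/* CC */
	       (uint64_t)cache_id;		/* BB */
}
```

For instance, the generic L1-icache-loads event is L1I / read / access, so
on a pmu with type ID 8 its config would be 0x800000001.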

When a hardware event is enabled without a specified pmu, for example
with 'perf stat -e cycles -a' (system-wide in this example), two events
are created automatically.

------------------------------------------------------------
perf_event_attr:
size 120
config 0x400000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
exclude_guest 1
------------------------------------------------------------

and

------------------------------------------------------------
perf_event_attr:
size 120
config 0x800000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
exclude_guest 1
------------------------------------------------------------

The type is 0 (PERF_TYPE_HARDWARE) for both events.
0x4 in 0x400000000 indicates the cpu_core pmu.
0x8 in 0x800000000 indicates the cpu_atom pmu (the atom pmu type ID is
not fixed; it is assigned dynamically).

The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus)
and 'cycles' (0x800000000) on cpu16-cpu23 (atom cpus).

The perf-stat result displays two events:

Performance counter stats for 'system wide':

6,744,979 cpu_core/cycles/
1,965,552 cpu_atom/cycles/

The first 'cycles' is the core event and the second 'cycles' is the atom event.

Thread mode example:
--------------------
perf-stat reports scaled counts for hybrid events, with a percentage
displayed. The percentage is the event's running time divided by its
enabled time.

For example, when 'triad_loop' runs on cpu16 (an atom cpu), the scaled
value for core cycles is 233,066,666 and the percentage is 0.43%.

perf stat -e cycles \-- taskset -c 16 ./triad_loop

As before, two events are created.

------------------------------------------------------------
perf_event_attr:
size 120
config 0x400000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------

and

------------------------------------------------------------
perf_event_attr:
size 120
config 0x800000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------

Performance counter stats for 'taskset -c 16 ./triad_loop':

233,066,666 cpu_core/cycles/ (0.43%)
604,097,080 cpu_atom/cycles/ (99.57%)
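
The scaling can be sketched in C as follows. The two times correspond to
the TOTAL_TIME_ENABLED and TOTAL_TIME_RUNNING values requested via
read_format in the dumps above. scale_count() and running_percent() are
hypothetical helpers; this is a sketch of the idea, not perf's own code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a raw count up to the full enabled time:
 *   scaled = count * time_enabled / time_running
 * Returns 0 if the event never ran (time_running == 0).
 */
static uint64_t scale_count(uint64_t count, uint64_t time_enabled,
			    uint64_t time_running)
{
	if (time_running == 0)
		return 0;
	return (uint64_t)((double)count * time_enabled / time_running);
}

/* Percentage shown by perf-stat: running time / enabled time * 100. */
static double running_percent(uint64_t time_enabled, uint64_t time_running)
{
	/* An event cannot run for longer than it is enabled. */
	assert(time_running <= time_enabled);
	if (time_enabled == 0)
		return 0.0;
	return 100.0 * time_running / time_enabled;
}
```

Applied to the output above: cpu_core/cycles/ was running for only about
0.43% of its enabled time, so its raw count was multiplied by roughly
1/0.0043 (about 230). The scaled 233,066,666 therefore rests on a raw count
of only around one million cycles and should be read with that in mind.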

perf-record:
------------
If no '-e' is specified in perf record on a hybrid platform, perf
creates two default 'cycles' events and adds them to the event list:
one for core and one for atom.

perf-stat:
----------
If no '-e' is specified in perf stat on a hybrid platform, the following
events are created and added to the event list in order, in addition to
the software events:

cpu_core/cycles/,
cpu_atom/cycles/,
cpu_core/instructions/,
cpu_atom/instructions/,
cpu_core/branches/,
cpu_atom/branches/,
cpu_core/branch-misses/,
cpu_atom/branch-misses/

Of course, both perf-stat and perf-record support enabling a hybrid
event with a specific pmu.

e.g.
perf stat -e cpu_core/cycles/
perf stat -e cpu_atom/cycles/
perf stat -e cpu_core/r1a/
perf stat -e cpu_atom/L1-icache-loads/
perf stat -e cpu_core/cycles/,cpu_atom/instructions/
perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'

But '{cpu_core/cycles/,cpu_atom/instructions/}' prints a warning and
grouping is disabled, because the pmus in the group do not match
(cpu_core vs. cpu_atom).
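
A simplified C sketch of the grouping rule above: every member of a
'{...}' group must name the same pmu. It only compares the textual 'pmu/'
prefixes of 'pmu/event/' strings, so it is not perf's implementation, which
works on parsed events; events written without a pmu prefix are out of
scope here. group_pmus_match() and pmu_prefix_len() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the pmu prefix in "pmu/event/", or 0 when there is none. */
static size_t pmu_prefix_len(const char *event)
{
	const char *slash = strchr(event, '/');

	return slash ? (size_t)(slash - event) : 0;
}

/* True if every event in the group names the same pmu prefix. */
static bool group_pmus_match(const char *const *events, size_t n)
{
	size_t first_len;

	assert(events != NULL || n == 0);
	if (n == 0)
		return true;
	first_len = pmu_prefix_len(events[0]);
	for (size_t i = 1; i < n; i++) {
		/* Prefixes must have the same length and the same text. */
		if (pmu_prefix_len(events[i]) != first_len ||
		    strncmp(events[0], events[i], first_len) != 0)
			return false;
	}
	return true;
}
```

With the two groups from the text, the all-cpu_core group passes and the
cpu_core/cpu_atom mix fails, which is the case where perf warns and
disables grouping.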