CONFIG_RCU_TRACE debugfs Files and Formats


The rcutree and rcutiny implementations of RCU provide debugfs trace
output that summarizes counters and state. This information is useful for
debugging RCU itself, and can sometimes also help to debug abuses of RCU.
The following sections describe the debugfs files and formats, first
for rcutree and next for rcutiny.


CONFIG_TREE_RCU and CONFIG_TREE_PREEMPT_RCU debugfs Files and Formats

These implementations of RCU provide several debugfs files under the
top-level directory "rcu":

rcu/rcudata:
	Displays fields in struct rcu_data.
rcu/rcudata.csv:
	Comma-separated values spreadsheet version of rcudata.
rcu/rcugp:
	Displays grace-period counters.
rcu/rcuhier:
	Displays the struct rcu_node hierarchy.
rcu/rcu_pending:
	Displays counts of the reasons rcu_pending() decided that RCU had
	work to do.
rcu/rcutorture:
	Displays rcutorture test progress.
rcu/rcuboost:
	Displays RCU boosting statistics. Only present if
	CONFIG_RCU_BOOST=y.

The output of "cat rcu/rcudata" looks as follows:

rcu_sched:
  0 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=545/1/0 df=50 of=0 ri=0 ql=163 qs=NRW. kt=0/W/0 ktl=ebc3 b=10 ci=153737 co=0 ca=0
  1 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=967/1/0 df=58 of=0 ri=0 ql=634 qs=NRW. kt=0/W/1 ktl=58c b=10 ci=191037 co=0 ca=0
  2 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=1081/1/0 df=175 of=0 ri=0 ql=74 qs=N.W. kt=0/W/2 ktl=da94 b=10 ci=75991 co=0 ca=0
  3 c=20942 g=20943 pq=1 pqc=20942 qp=1 dt=1846/0/0 df=404 of=0 ri=0 ql=0 qs=.... kt=0/W/3 ktl=d1cd b=10 ci=72261 co=0 ca=0
  4 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=369/1/0 df=83 of=0 ri=0 ql=48 qs=N.W. kt=0/W/4 ktl=e0e7 b=10 ci=128365 co=0 ca=0
  5 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=381/1/0 df=64 of=0 ri=0 ql=169 qs=NRW. kt=0/W/5 ktl=fb2f b=10 ci=164360 co=0 ca=0
  6 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=1037/1/0 df=183 of=0 ri=0 ql=62 qs=N.W. kt=0/W/6 ktl=d2ad b=10 ci=65663 co=0 ca=0
  7 c=20897 g=20897 pq=1 pqc=20896 qp=0 dt=1572/0/0 df=382 of=0 ri=0 ql=0 qs=.... kt=0/W/7 ktl=cf15 b=10 ci=75006 co=0 ca=0
rcu_bh:
  0 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=545/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/0 ktl=ebc3 b=10 ci=0 co=0 ca=0
  1 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=967/1/0 df=3 of=0 ri=1 ql=0 qs=.... kt=0/W/1 ktl=58c b=10 ci=151 co=0 ca=0
  2 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=1081/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/2 ktl=da94 b=10 ci=0 co=0 ca=0
  3 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=1846/0/0 df=8 of=0 ri=1 ql=0 qs=.... kt=0/W/3 ktl=d1cd b=10 ci=0 co=0 ca=0
  4 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=369/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/4 ktl=e0e7 b=10 ci=0 co=0 ca=0
  5 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=381/1/0 df=4 of=0 ri=1 ql=0 qs=.... kt=0/W/5 ktl=fb2f b=10 ci=0 co=0 ca=0
  6 c=1480 g=1480 pq=1 pqc=1479 qp=0 dt=1037/1/0 df=6 of=0 ri=1 ql=0 qs=.... kt=0/W/6 ktl=d2ad b=10 ci=0 co=0 ca=0
  7 c=1474 g=1474 pq=1 pqc=1473 qp=0 dt=1572/0/0 df=8 of=0 ri=1 ql=0 qs=.... kt=0/W/7 ktl=cf15 b=10 ci=0 co=0 ca=0

The first section lists the rcu_data structures for rcu_sched, the second
for rcu_bh. Note that CONFIG_TREE_PREEMPT_RCU kernels will have an
additional section for rcu_preempt. Each section has one line per CPU,
or eight for this 8-CPU system. The fields are as follows:

o	The number at the beginning of each line is the CPU number.
	CPU numbers followed by an exclamation mark are offline,
	but have been online at least once since boot. There will be
	no output for CPUs that have never been online, which can be
	a good thing in the surprisingly common case where NR_CPUS is
	substantially larger than the number of actual CPUs.

o	"c" is the count of grace periods that this CPU believes have
	completed. Offlined CPUs and CPUs in dynticks idle mode may
	lag quite a ways behind, for example, CPU 7 under "rcu_sched"
	above, which is 75 grace periods behind most of the other
	CPUs. It is not unusual to see CPUs lagging by thousands of
	grace periods.

o	"g" is the count of grace periods that this CPU believes have
	started. Again, offlined CPUs and CPUs in dynticks idle mode
	may lag behind. If the "c" and "g" values are equal, this CPU
	has already reported a quiescent state for the last RCU grace
	period that it is aware of; otherwise, the CPU believes that it
	owes RCU a quiescent state.

o	"pq" indicates that this CPU has passed through a quiescent state
	for the current grace period. It is possible for "pq" to be
	"1" and "c" different than "g", which indicates that although
	the CPU has passed through a quiescent state, either (1) this
	CPU has not yet reported that fact, (2) some other CPU has not
	yet reported for this grace period, or (3) both.

o	"pqc" indicates which grace period the last-observed quiescent
	state for this CPU corresponds to. This is important for handling
	the race between CPU 0 reporting an extended dynticks-idle
	quiescent state for CPU 1 and CPU 1 suddenly waking up and
	reporting its own quiescent state. If CPU 1 was the last CPU
	for the current grace period, then the CPU that loses this race
	will attempt to incorrectly mark CPU 1 as having checked in for
	the next grace period!

o	"qp" indicates that RCU still expects a quiescent state from
	this CPU. Offlined CPUs and CPUs in dyntick idle mode might
	well have qp=1, which is OK: RCU is still ignoring them.

o	"dt" is the current value of the dyntick counter that is incremented
	when entering or leaving dynticks idle state, either by the
	scheduler or by irq. The number after the first "/" is the
	interrupt nesting depth when in dyntick-idle state, or one
	greater than the interrupt-nesting depth otherwise.

	This field is displayed only for CONFIG_NO_HZ kernels.

o	"dn" is the current value of the dyntick counter that is incremented
	when entering or leaving dynticks idle state via NMI. If both
	the "dt" and "dn" values are even, then this CPU is in dynticks
	idle mode and may be ignored by RCU. If either of these two
	counters is odd, then RCU must be alert to the possibility of
	an RCU read-side critical section running on this CPU.

	This field is displayed only for CONFIG_NO_HZ kernels.

o	"df" is the number of times that some other CPU has forced a
	quiescent state on behalf of this CPU due to this CPU being in
	dynticks-idle state.

	This field is displayed only for CONFIG_NO_HZ kernels.

o	"of" is the number of times that some other CPU has forced a
	quiescent state on behalf of this CPU due to this CPU being
	offline. In a perfect world, this might never happen, but it
	turns out that offlining and onlining a CPU can take several grace
	periods, and so there is likely to be an extended period of time
	when RCU believes that the CPU is online when it really is not.
	Please note that erring in the other direction (RCU believing a
	CPU is offline when it is really alive and kicking) is a fatal
	error, so it makes sense to err conservatively.

o	"ri" is the number of times that RCU has seen fit to send a
	reschedule IPI to this CPU in order to get it to report a
	quiescent state.

o	"ql" is the number of RCU callbacks currently residing on
	this CPU. This is the total number of callbacks, regardless
	of what state they are in (new, waiting for grace period to
	start, waiting for grace period to end, ready to invoke).

o	"qs" gives an indication of the state of the callback queue
	with four characters:

	"N"	Indicates that there are callbacks queued that are not
		ready to be handled by the next grace period, and thus
		will be handled by the grace period following the next
		one.

	"R"	Indicates that there are callbacks queued that are
		ready to be handled by the next grace period.

	"W"	Indicates that there are callbacks queued that are
		waiting on the current grace period.

	"D"	Indicates that there are callbacks queued that have
		already been handled by a prior grace period, and are
		thus waiting to be invoked. Note that callbacks in
		the process of being invoked are not counted here.
		Callbacks in the process of being invoked are those
		that have been removed from the rcu_data structures
		queues by rcu_do_batch(), but which have not yet been
		invoked.

	If there are no callbacks in a given one of the above states,
	the corresponding character is replaced by ".".

o	"kt" is the per-CPU kernel-thread state. The digit preceding
	the first slash is zero if there is no work pending and 1
	otherwise. The character between the first pair of slashes is
	as follows:

	"S"	The kernel thread is stopped, in other words, all
		CPUs corresponding to this rcu_node structure are
		offline.

	"R"	The kernel thread is running.

	"W"	The kernel thread is waiting because there is no work
		for it to do.

	"O"	The kernel thread is waiting because it has been
		forced off of its designated CPU or because its
		->cpus_allowed mask permits it to run on other than
		its designated CPU.

	"Y"	The kernel thread is yielding to avoid hogging CPU.

	"?"	Unknown value, indicates a bug.

	The number after the final slash is the CPU that the kthread
	is actually running on.

o	"ktl" is the low-order 16 bits (in hexadecimal) of the count of
	the number of times that this CPU's per-CPU kthread has gone
	through its loop servicing invoke_rcu_cpu_kthread() requests.

o	"b" is the batch limit for this CPU. If more than this number
	of RCU callbacks is ready to invoke, then the remainder will
	be deferred.

o	"ci" is the number of RCU callbacks that have been invoked for
	this CPU. Note that ci+ql is the number of callbacks that have
	been registered in the absence of CPU-hotplug activity.

o	"co" is the number of RCU callbacks that have been orphaned due to
	this CPU going offline. These orphaned callbacks have been moved
	to an arbitrarily chosen online CPU.

o	"ca" is the number of RCU callbacks that have been adopted due to
	other CPUs going offline. Note that ci+co-ca+ql is the number of
	RCU callbacks registered on this CPU.
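
As a rough illustration of how the per-CPU fields above fit together, the
following Python sketch parses one line of rcu/rcudata and derives a few of
the quantities described in the list. The function names are hypothetical
and are not part of the kernel. The sketch assumes that the "!" offline
marker directly follows the CPU number and that every remaining field has
the key=value form shown in the sample output.

```python
import re

# CPU number, optional "!" offline marker, then the key=value fields.
LINE_RE = re.compile(r"^\s*(\d+)(!?)\s*(.*)$")


def parse_rcudata_line(line):
    """Split one per-CPU line of rcu/rcudata into a dict of raw fields."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a per-CPU rcudata line: {line!r}")
    cpu, bang, rest = match.groups()
    fields = {"cpu": int(cpu), "offline": bang == "!"}
    for token in rest.split():
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def summarize(fields):
    """Derive the quantities that the field descriptions spell out."""
    c, g = int(fields["c"]), int(fields["g"])
    ci, co, ca, ql = (int(fields[k]) for k in ("ci", "co", "ca", "ql"))
    pending, state, kthread_cpu = fields["kt"].split("/")
    return {
        # "c" != "g": the CPU believes it owes RCU a quiescent state.
        "owes_quiescent_state": c != g,
        # Characters other than "." name the non-empty callback states.
        "callback_states": [ch for ch in fields["qs"] if ch != "."],
        "kthread_work_pending": pending != "0",
        "kthread_state": state,
        "kthread_cpu": int(kthread_cpu),
        # ci+co-ca+ql: callbacks registered on this CPU.
        "callbacks_registered": ci + co - ca + ql,
    }


sample = ("  0 c=20972 g=20973 pq=1 pqc=20972 qp=0 dt=545/1/0 df=50 of=0 "
          "ri=0 ql=163 qs=NRW. kt=0/W/0 ktl=ebc3 b=10 ci=153737 co=0 ca=0")
fields = parse_rcudata_line(sample)
info = summarize(fields)
print(info)
```

For the CPU 0 line of the rcu_sched sample, this reports that the CPU still
owes a quiescent state (c=20972, g=20973) and that 153737+0-0+163 = 153900
callbacks have been registered on it.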

There is also an rcu/rcudata.csv file with the same information in
comma-separated-value spreadsheet format.


The output of "cat rcu/rcugp" looks as follows:

rcu_sched: completed=33062 gpnum=33063
rcu_bh: completed=464 gpnum=464

Again, this output is for both "rcu_sched" and "rcu_bh". Note that
kernels built with CONFIG_TREE_PREEMPT_RCU will have an additional
"rcu_preempt" line. The fields are taken from the rcu_state structure,
and are as follows:

o	"completed" is the number of grace periods that have completed.
	It is comparable to the "c" field from rcu/rcudata in that a
	CPU whose "c" field matches the value of "completed" is aware
	that the corresponding RCU grace period has completed.

o	"gpnum" is the number of grace periods that have started. It is
	comparable to the "g" field from rcu/rcudata in that a CPU
	whose "g" field matches the value of "gpnum" is aware that the
	corresponding RCU grace period has started.

	If these two fields are equal (as they are for "rcu_bh" above),
	then there is no grace period in progress; in other words, RCU
	is idle. On the other hand, if the two fields differ (as they
	do for "rcu_sched" above), then an RCU grace period is in progress.
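
The idle-versus-in-progress rule above can be sketched in a few lines of
Python. The parse_rcugp() helper is a hypothetical name introduced only for
this illustration, and the sketch assumes the "flavor: completed=N gpnum=M"
layout shown in the sample output.

```python
def parse_rcugp(text):
    """Map each RCU flavor to its (completed, gpnum) counters."""
    counters = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        flavor, _, rest = line.partition(":")
        values = dict(token.split("=") for token in rest.split())
        counters[flavor.strip()] = (int(values["completed"]),
                                    int(values["gpnum"]))
    return counters


sample = """\
rcu_sched: completed=33062 gpnum=33063
rcu_bh: completed=464 gpnum=464
"""

for flavor, (completed, gpnum) in parse_rcugp(sample).items():
    # Equal counters mean that no grace period is in progress.
    state = "idle" if completed == gpnum else "grace period in progress"
    print(f"{flavor}: {state}")
```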


The output of "cat rcu/rcuhier" looks as follows, with very long lines:

c=6902 g=6903 s=2 jfq=3 j=72c7 nfqs=13142/nfqsng=0(13142) fqlh=6
1/1 ..>. 0:127 ^0
3/3 ..>. 0:35 ^0 0/0 ..>. 36:71 ^1 0/0 ..>. 72:107 ^2 0/0 ..>. 108:127 ^3
3/3f ..>. 0:5 ^0 2/3 ..>. 6:11 ^1 0/0 ..>. 12:17 ^2 0/0 ..>. 18:23 ^3 0/0 ..>. 24:29 ^4 0/0 ..>. 30:35 ^5 0/0 ..>. 36:41 ^0 0/0 ..>. 42:47 ^1 0/0 ..>. 48:53 ^2 0/0 ..>. 54:59 ^3 0/0 ..>. 60:65 ^4 0/0 ..>. 66:71 ^5 0/0 ..>. 72:77 ^0 0/0 ..>. 78:83 ^1 0/0 ..>. 84:89 ^2 0/0 ..>. 90:95 ^3 0/0 ..>. 96:101 ^4 0/0 ..>. 102:107 ^5 0/0 ..>. 108:113 ^0 0/0 ..>. 114:119 ^1 0/0 ..>. 120:125 ^2 0/0 ..>. 126:127 ^3
rcu_bh:
c=-226 g=-226 s=1 jfq=-5701 j=72c7 nfqs=88/nfqsng=0(88) fqlh=0
0/1 ..>. 0:127 ^0
0/3 ..>. 0:35 ^0 0/0 ..>. 36:71 ^1 0/0 ..>. 72:107 ^2 0/0 ..>. 108:127 ^3
0/3f ..>. 0:5 ^0 0/3 ..>. 6:11 ^1 0/0 ..>. 12:17 ^2 0/0 ..>. 18:23 ^3 0/0 ..>. 24:29 ^4 0/0 ..>. 30:35 ^5 0/0 ..>. 36:41 ^0 0/0 ..>. 42:47 ^1 0/0 ..>. 48:53 ^2 0/0 ..>. 54:59 ^3 0/0 ..>. 60:65 ^4 0/0 ..>. 66:71 ^5 0/0 ..>. 72:77 ^0 0/0 ..>. 78:83 ^1 0/0 ..>. 84:89 ^2 0/0 ..>. 90:95 ^3 0/0 ..>. 96:101 ^4 0/0 ..>. 102:107 ^5 0/0 ..>. 108:113 ^0 0/0 ..>. 114:119 ^1 0/0 ..>. 120:125 ^2 0/0 ..>. 126:127 ^3

This is once again split into "rcu_sched" and "rcu_bh" portions,
and CONFIG_TREE_PREEMPT_RCU kernels will again have an additional
"rcu_preempt" section. The fields are as follows:

o	"c" is exactly the same as "completed" under rcu/rcugp.

o	"g" is exactly the same as "gpnum" under rcu/rcugp.

o	"s" is the "signaled" state that drives force_quiescent_state()'s
	state machine.

o	"jfq" is the number of jiffies remaining for this grace period
	before force_quiescent_state() is invoked to help push things
	along. Note that CPUs in dyntick-idle mode throughout the grace
	period will not report on their own, but rather must be checked by
	some other CPU via force_quiescent_state().

o	"j" is the low-order four hex digits of the jiffies counter.
	Yes, Paul did run into a number of problems that turned out to
	be due to the jiffies counter no longer counting. Why do you ask?

o	"nfqs" is the number of calls to force_quiescent_state() since
	boot.

o	"nfqsng" is the number of useless calls to force_quiescent_state(),
	where there wasn't actually a grace period active. This can
	happen due to races. The number in parentheses is the difference
	between "nfqs" and "nfqsng", or the number of times that
	force_quiescent_state() actually did some real work.

o	"fqlh" is the number of calls to force_quiescent_state() that
	exited immediately (without even being counted in nfqs above)
	due to contention on ->fqslock.

o	Each element of the form "1/1 ..>. 0:127 ^0" represents one struct
	rcu_node. Each line represents one level of the hierarchy, from
	root to leaves. It is best to think of the rcu_data structures
	as forming yet another level after the leaves. Note that there
	might be either one, two, or three levels of rcu_node structures,
	depending on the relationship between CONFIG_RCU_FANOUT and
	CONFIG_NR_CPUS.

	o	The numbers separated by the "/" are the qsmask followed
		by the qsmaskinit. The qsmask will have one bit
		set for each entity in the next lower level that
		has not yet checked in for the current grace period.
		The qsmaskinit will have one bit for each entity that is
		currently expected to check in during each grace period.
		The value of qsmaskinit is copied into qsmask at the
		beginning of each grace period.

		For example, for "rcu_sched", the qsmask of the first
		entry of the lowest level is 0x3, meaning that we
		are still waiting for CPUs 0 and 1 to check in for the
		current grace period.

	o	The characters separated by the ">" indicate the state
		of the blocked-tasks lists. A "G" preceding the ">"
		indicates that at least one task blocked in an RCU
		read-side critical section blocks the current grace
		period, while an "E" preceding the ">" indicates that
		at least one task blocked in an RCU read-side critical
		section blocks the current expedited grace period.
		A "T" character following the ">" indicates that at
		least one task is blocked within an RCU read-side
		critical section, regardless of whether any current
		grace period (expedited or normal) is inconvenienced.
		A "." character appears if the corresponding condition
		does not hold, so that "..>." indicates that no tasks
		are blocked. In contrast, "GE>T" indicates maximal
		inconvenience from blocked tasks.

	o	The numbers separated by the ":" are the range of CPUs
		served by this struct rcu_node. This can be helpful
		in working out how the hierarchy is wired together.

		For example, the first entry at the lowest level shows
		"0:5", indicating that it covers CPUs 0 through 5.

	o	The number after the "^" indicates the bit in the
		next higher level rcu_node structure that this
		rcu_node structure corresponds to.

		For example, the first entry at the lowest level shows
		"^0", indicating that it corresponds to bit zero in
		the first entry at the middle level.
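
To make the rcu_node element format more concrete, here is a minimal Python
sketch that decodes a single leaf-level element into the CPUs that RCU is
still waiting on. The decode_leaf_node() name is hypothetical. The sketch
assumes that both masks are printed in hexadecimal and that bit i of a leaf
node's masks corresponds to the i-th CPU of its range, which matches the
sample output but is an assumption rather than something stated above.

```python
def decode_leaf_node(element):
    """Decode one leaf-level rcuhier element such as '3/3f ..>. 0:5 ^0'."""
    masks, blocked, cpu_range, parent_bit = element.split()
    qsmask, qsmaskinit = (int(mask, 16) for mask in masks.split("/"))
    first_cpu, last_cpu = (int(n) for n in cpu_range.split(":"))
    cpus = range(first_cpu, last_cpu + 1)
    # Assumption: bit i of a leaf's masks corresponds to CPU first_cpu + i.
    holdouts = [cpu for i, cpu in enumerate(cpus) if qsmask & (1 << i)]
    expected = [cpu for i, cpu in enumerate(cpus) if qsmaskinit & (1 << i)]
    return {
        "holdouts": holdouts,        # CPUs that have not yet checked in
        "expected": expected,        # CPUs expected to check in at all
        "blocked_tasks": blocked,    # e.g. "..>." when no tasks are blocked
        "parent_bit": int(parent_bit.lstrip("^")),
    }


print(decode_leaf_node("3/3f ..>. 0:5 ^0"))
print(decode_leaf_node("2/3 ..>. 6:11 ^1"))
```

Applied to the first two leaf entries of the rcu_sched sample, this yields
holdout CPUs 0 and 1 for the "0:5" node and CPU 7 for the "6:11" node.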


The output of "cat rcu/rcu_pending" looks as follows:

rcu_sched:
  0 np=255892 qsp=53936 rpq=85 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
  1 np=261224 qsp=54638 rpq=33 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
  2 np=237496 qsp=49664 rpq=23 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629
  3 np=236249 qsp=48766 rpq=98 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723
  4 np=221310 qsp=46850 rpq=7 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110
  5 np=237332 qsp=48449 rpq=9 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456
  6 np=219995 qsp=46718 rpq=12 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834
  7 np=249893 qsp=49390 rpq=42 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888
rcu_bh:
  0 np=146741 qsp=1419 rpq=6 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
  1 np=155792 qsp=12597 rpq=3 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
  2 np=136629 qsp=18680 rpq=1 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936
  3 np=137723 qsp=2843 rpq=0 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863
  4 np=123110 qsp=12433 rpq=0 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671
  5 np=137456 qsp=4210 rpq=1 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235
  6 np=120834 qsp=9902 rpq=2 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921
  7 np=144888 qsp=26336 rpq=0 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542

As always, this is once again split into "rcu_sched" and "rcu_bh"
portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional
"rcu_preempt" section. The fields are as follows:

o	"np" is the number of times that __rcu_pending() has been invoked
	for the corresponding flavor of RCU.

o	"qsp" is the number of times that RCU was waiting for a
	quiescent state from this CPU.

o	"rpq" is the number of times that the CPU had passed through
	a quiescent state, but not yet reported it to RCU.

o	"cbr" is the number of times that this CPU had RCU callbacks
	that had passed through a grace period, and were thus ready
	to be invoked.

o	"cng" is the number of times that this CPU needed another
	grace period while RCU was idle.

o	"gpc" is the number of times that an old grace period had
	completed, but this CPU was not yet aware of it.

o	"gps" is the number of times that a new grace period had started,
	but this CPU was not yet aware of it.

o	"nf" is the number of times that this CPU suspected that the
	current grace period had run for too long, and thus needed to
	be forced.

	Please note that "forcing" consists of sending resched IPIs
	to holdout CPUs. If that CPU really still is in an old RCU
	read-side critical section, then we really do have to wait for it.
	The assumption behind "forcing" is that the CPU is not still in
	an old RCU read-side critical section, but has not yet responded
	for some other reason.

o	"nn" is the number of times that this CPU needed nothing. Alert
	readers will note that the rcu_sched "nn" number for a given CPU
	very closely matches the rcu_bh "np" number for that same CPU.
	This is due to short-circuit evaluation in rcu_pending().
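
The relationship between the rcu_sched "nn" count and the rcu_bh "np" count
can be checked mechanically. The following Python sketch does so for the
first two CPUs of the sample output. The parse_rcu_pending() helper is a
hypothetical name, and the sketch assumes the plain "CPU field=value ..."
layout shown in the sample, with no offline markers.

```python
def parse_rcu_pending(text):
    """Return {flavor: {cpu: {field: count}}} from rcu/rcu_pending output."""
    result = {}
    flavor = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.endswith(":"):
            # A line such as "rcu_sched:" starts a new per-flavor section.
            flavor = stripped[:-1]
            result[flavor] = {}
            continue
        cpu, *tokens = stripped.split()
        result[flavor][int(cpu)] = {
            key: int(value)
            for key, value in (token.split("=") for token in tokens)
        }
    return result


sample = """\
rcu_sched:
  0 np=255892 qsp=53936 rpq=85 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741
  1 np=261224 qsp=54638 rpq=33 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792
rcu_bh:
  0 np=146741 qsp=1419 rpq=6 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314
  1 np=155792 qsp=12597 rpq=3 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180
"""

pending = parse_rcu_pending(sample)
for cpu, sched in pending["rcu_sched"].items():
    bh = pending["rcu_bh"][cpu]
    # rcu_bh is consulted only when rcu_sched needed nothing.
    print(cpu, sched["nn"], bh["np"], sched["nn"] == bh["np"])
```

In this snapshot the two counts happen to match exactly for both CPUs. On a
running system they may differ slightly, because the counters are sampled
while rcu_pending() continues to be invoked.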


The output of "cat rcu/rcutorture" looks as follows:

rcutorture test sequence: 0 (test in progress)
rcutorture update version number: 615

The first line shows the number of rcutorture tests that have completed
since boot. If a test is currently running, the "(test in progress)"
string will appear as shown above. The second line shows the number of
update cycles that the current test has started, or zero if there is
no test in progress.


The output of "cat rcu/rcuboost" looks as follows:

0:5 tasks=.... kt=W ntb=0 neb=0 nnb=0 j=2f95 bt=300f
       balk: nt=0 egt=989 bt=0 nb=0 ny=0 nos=16
6:7 tasks=.... kt=W ntb=0 neb=0 nnb=0 j=2f95 bt=300f
       balk: nt=0 egt=225 bt=0 nb=0 ny=0 nos=6

This information is output only for rcu_preempt. Each two-line entry
corresponds to a leaf rcu_node structure. The fields are as follows:

o	"n:m" is the CPU-number range for the corresponding two-line
	entry. In the sample output above, the first entry covers
	CPUs zero through five and the second entry covers CPUs 6
	and 7.

o	"tasks=TNEB" gives the state of the various segments of the
	rnp->blocked_tasks list:

	"T"	This indicates that there are some tasks that blocked
		while running on one of the corresponding CPUs while
		in an RCU read-side critical section.

	"N"	This indicates that some of the blocked tasks are preventing
		the current normal (non-expedited) grace period from
		completing.

	"E"	This indicates that some of the blocked tasks are preventing
		the current expedited grace period from completing.

	"B"	This indicates that some of the blocked tasks are in
		need of RCU priority boosting.

	Each character is replaced with "." if the corresponding
	condition does not hold.

o	"kt" is the state of the RCU priority-boosting kernel
	thread associated with the corresponding rcu_node structure.
	The state can be one of the following:

	"S"	The kernel thread is stopped, in other words, all
		CPUs corresponding to this rcu_node structure are
		offline.

	"R"	The kernel thread is running.

	"W"	The kernel thread is waiting because there is no work
		for it to do.

	"Y"	The kernel thread is yielding to avoid hogging CPU.

	"?"	Unknown value, indicates a bug.

o	"ntb" is the number of tasks boosted.

o	"neb" is the number of tasks boosted in order to complete an
	expedited grace period.

o	"nnb" is the number of tasks boosted in order to complete a
	normal (non-expedited) grace period. When boosting a task
	that was blocking both an expedited and a normal grace period,
	it is counted against the expedited total above.

o	"j" is the low-order 16 bits of the jiffies counter in
	hexadecimal.

o	"bt" is the low-order 16 bits of the value that the jiffies
	counter will have when we next start boosting, assuming that
	the current grace period does not end beforehand. This is
	also in hexadecimal.

o	"balk: nt" counts the number of times we didn't boost (in
	other words, we balked) even though it was time to boost because
	there were no blocked tasks to boost. This situation occurs
	when there is one blocked task on one rcu_node structure and
	none on some other rcu_node structure.

o	"egt" counts the number of times we balked because although
	there were blocked tasks, none of them were blocking the
	current grace period, whether expedited or otherwise.

o	"bt" counts the number of times we balked because boosting
	had already been initiated for the current grace period.

o	"nb" counts the number of times we balked because there
	was at least one task blocking the current non-expedited grace
	period that never had blocked. If it is already running, it
	just won't help to boost its priority!

o	"ny" counts the number of times we balked because it was
	not yet time to start boosting.

o	"nos" counts the number of times we balked for other
	reasons, e.g., the grace period ended first.
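
Because "j" and the first-line "bt" field each hold only the low-order 16
bits of a jiffies value, the time remaining until boosting can start has to
be computed modulo 2^16. The Python sketch below shows one way to do that.
The jiffies_until_boost() name is hypothetical, and the calculation assumes
that the boost time lies less than 65536 jiffies in the future; a boost
time that has already passed would show up as a value close to 65535.

```python
def jiffies_until_boost(j_hex, bt_hex):
    """Jiffies remaining until boosting may start, from 16-bit hex fields."""
    j = int(j_hex, 16)
    bt = int(bt_hex, 16)
    # Both fields keep only the low-order 16 bits, so subtract modulo 2**16
    # to get the right answer even when the counter wraps between them.
    return (bt - j) & 0xFFFF


# Values taken from the sample output: j=2f95 bt=300f.
print(jiffies_until_boost("2f95", "300f"))

# A made-up pair that straddles the 16-bit wrap.
print(jiffies_until_boost("fff0", "0010"))
```

For the sample entry, the result is 0x300f - 0x2f95 = 0x7a, or 122 jiffies.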


CONFIG_TINY_RCU and CONFIG_TINY_PREEMPT_RCU debugfs Files and Formats

These implementations of RCU provide a single debugfs file under the
top-level directory "rcu", namely rcu/rcudata, which displays fields in
rcu_bh_ctrlblk, rcu_sched_ctrlblk and, for CONFIG_TINY_PREEMPT_RCU,
rcu_preempt_ctrlblk.

The output of "cat rcu/rcudata" is as follows:

rcu_preempt: qlen=24 gp=1097669 g197/p197/c197 tasks=...
             ttb=. btg=no ntb=184 neb=0 nnb=183 j=01f7 bt=0274
             normal balk: nt=1097669 gt=0 bt=371 b=0 ny=25073378 nos=0
             exp balk: bt=0 nos=0
rcu_sched: qlen: 0
rcu_bh: qlen: 0

This is split into rcu_preempt, rcu_sched, and rcu_bh sections, with the
rcu_preempt section appearing only in CONFIG_TINY_PREEMPT_RCU builds.
The last three lines of the rcu_preempt section appear only in
CONFIG_RCU_BOOST kernel builds. The fields are as follows:

o	"qlen" is the number of RCU callbacks currently waiting either
	for an RCU grace period or waiting to be invoked. This is the
	only field present for rcu_sched and rcu_bh, due to the
	short-circuiting of grace periods in those two cases.

o	"gp" is the number of grace periods that have completed.

o	"g197/p197/c197" displays the grace-period state, with the
	"g" number being the number of grace periods that have started
	(mod 256), the "p" number being the number of grace periods
	that the CPU has responded to (also mod 256), and the "c"
	number being the number of grace periods that have completed
	(once again mod 256).

	Why have both "gp" and "g"? Because the data flowing into
	"gp" is only present in a CONFIG_RCU_TRACE kernel.

o	"tasks" is a set of bits. The first bit is "T" if there are
	currently tasks that have recently blocked within an RCU
	read-side critical section, the second bit is "N" if any of the
	aforementioned tasks are blocking the current RCU grace period,
	and the third bit is "E" if any of the aforementioned tasks are
	blocking the current expedited grace period. Each bit is "."
	if the corresponding condition does not hold.

o	"ttb" is a single bit. It is "B" if any of the blocked tasks
	need to be priority boosted and "." otherwise.

o	"btg" indicates whether boosting has been carried out during
	the current grace period, with "exp" indicating that boosting
	is in progress for an expedited grace period, "no" indicating
	that boosting has not yet started for a normal grace period,
	"begun" indicating that boosting has begun for a normal grace
	period, and "done" indicating that boosting has completed for
	a normal grace period.

o	"ntb" is the total number of tasks subjected to RCU priority
	boosting since boot.

o	"neb" is the number of expedited grace periods that have had
	to resort to RCU priority boosting since boot.

o	"nnb" is the number of normal grace periods that have had
	to resort to RCU priority boosting since boot.

o	"j" is the low-order 16 bits of the jiffies counter in hexadecimal.

o	"bt" is the low-order 16 bits of the value that the jiffies counter
	will have at the next time that boosting is scheduled to begin.

o	In the line beginning with "normal balk", the fields are as follows:

	o	"nt" is the number of times that the system balked from
		boosting because there were no blocked tasks to boost.
		Note that the system will balk from boosting even if the
		grace period is overdue when the currently running task
		is looping within an RCU read-side critical section.
		There is no point in boosting in this case, because
		boosting a running task won't make it run any faster.

	o	"gt" is the number of times that the system balked
		from boosting because, although there were blocked tasks,
		none of them were preventing the current grace period
		from completing.

	o	"bt" is the number of times that the system balked
		from boosting because boosting was already in progress.

	o	"b" is the number of times that the system balked from
		boosting because boosting had already completed for
		the grace period in question.

	o	"ny" is the number of times that the system balked from
		boosting because it was not yet time to start boosting
		the grace period in question.

	o	"nos" is the number of times that the system balked from
		boosting for inexplicable ("not otherwise specified")
		reasons. This can actually happen due to races involving
		increments of the jiffies counter.

o	In the line beginning with "exp balk", the fields are as follows:

	o	"bt" is the number of times that the system balked from
		boosting because there were no blocked tasks to boost.

	o	"nos" is the number of times that the system balked from
		boosting for inexplicable ("not otherwise specified")
		reasons.
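
The mod-256 relationship between "gp" and the "g/p/c" triple in the sample
output can be checked with a short Python sketch. The parse_gp_state()
name is hypothetical. Treating equal "g" and "c" values as "no grace
period in progress" is an assumption made by analogy with the "completed"
and "gpnum" fields of rcu/rcugp, not something stated in the list above.

```python
import re


def parse_gp_state(token):
    """Parse a token such as 'g197/p197/c197' into three small integers."""
    match = re.fullmatch(r"g(\d+)/p(\d+)/c(\d+)", token)
    if match is None:
        raise ValueError(f"unexpected grace-period state: {token!r}")
    started, responded, completed = (int(n) for n in match.groups())
    return started, responded, completed


gp = 1097669  # "gp" from the sample output
started, responded, completed = parse_gp_state("g197/p197/c197")

# The full completed count and the "c" number agree modulo 256.
print(gp % 256, completed)

# Assumption: equal "g" and "c" values mean no grace period is in progress.
print("idle" if started == completed else "grace period in progress")
```

Here 1097669 = 4287 * 256 + 197, so the full "gp" count reduces to the same
197 that appears as the "c" number.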