Commit | Line | Data |
---|---|---|
09338fb0 | 1 | ======== |
48dba8ab | 2 | CPU load |
09338fb0 | 3 | ======== |
48dba8ab | 4 | |
09338fb0 MCC | 5 | Linux exports various bits of information via ``/proc/stat`` and |
6 | ``/proc/uptime`` that userland tools, such as top(1), use to calculate | |
7 | the average time the system spent in a particular state, for example:: | |
48dba8ab VK | 8 | |
9 | $ iostat | |
10 | Linux 2.6.18.3-exp (linmac) 02/20/2007 | |
11 | ||
12 | avg-cpu: %user %nice %system %iowait %steal %idle | |
13 | 10.01 0.00 2.92 5.44 0.00 81.63 | |
14 | ||
15 | ... | |
16 | ||
17 | Here the system thinks that over the default sampling period the | |
18 | system spent 10.01% of the time doing work in user space, 2.92% in the | |
19 | kernel, and was idle 81.63% of the time overall. | |
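As an illustration of how a userland tool could turn the cumulative counters into percentages like those above, here is a minimal sketch in C. It assumes the first ``cpu`` line of ``/proc/stat`` carries at least eight fields in the order user, nice, system, idle, iowait, irq, softirq, steal. The names ``cpu_sample``, ``parse_cpu_line``, ``read_cpu_sample`` and ``pct`` are hypothetical helpers invented for this example, and the arithmetic is not claimed to match what iostat or top(1) actually do.

```c
#include <stdio.h>

/* Hypothetical container for the cumulative tick counters on the
 * first "cpu" line of /proc/stat (assumed field order, see above). */
struct cpu_sample {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse the aggregate "cpu ..." line only (not "cpu0", "cpu1", ...).
 * Returns 0 on success, -1 if fewer than eight counters were found. */
int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu",
                   &s->user, &s->nice, &s->system, &s->idle,
                   &s->iowait, &s->irq, &s->softirq, &s->steal);
    return n == 8 ? 0 : -1;
}

/* Read the first line of the given file (normally "/proc/stat"). */
int read_cpu_sample(const char *path, struct cpu_sample *s)
{
    char line[256];
    int ret = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fgets(line, sizeof line, f))
        ret = parse_cpu_line(line, s);
    fclose(f);
    return ret;
}

/* Sum of all counters in one sample. */
unsigned long long total_ticks(const struct cpu_sample *s)
{
    return s->user + s->nice + s->system + s->idle +
           s->iowait + s->irq + s->softirq + s->steal;
}

/* Share of the interval between samples a and b spent in one state,
 * given that state's counter before and after. Returns 0.0 if no
 * ticks elapsed between the two samples. */
double pct(unsigned long long before, unsigned long long after,
           const struct cpu_sample *a, const struct cpu_sample *b)
{
    unsigned long long total = total_ticks(b) - total_ticks(a);

    return total ? 100.0 * (double)(after - before) / (double)total : 0.0;
}
```

A caller would take one sample with ``read_cpu_sample("/proc/stat", &a)``, wait for the sampling period, take a second sample ``b``, and then evaluate, for example, ``pct(a.user, b.user, &a, &b)`` for the user-space share.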
20 | ||
09338fb0 | 21 | In most cases the ``/proc/stat`` information reflects the reality quite |
48dba8ab VK | 22 | closely. However, because of how and when the kernel collects |
23 | this data, it sometimes cannot be trusted at all. | |
24 | ||
25 | So how is this information collected? Whenever a timer interrupt is | |
26 | signalled, the kernel looks at what kind of task was running at that | |
27 | moment and increments the counter that corresponds to this task's | |
28 | kind/state. The problem with this is that the system could have | |
29 | switched between various states multiple times between two timer | |
30 | interrupts, yet the counter is incremented only for the last state. | |
31 | ||
32 | ||
33 | Example | |
34 | ------- | |
35 | ||
36 | Imagine a system with one task that periodically burns cycles | |
09338fb0 | 37 | in the following manner:: |
48dba8ab | 38 | |
09338fb0 MCC | 39 | time line between two timer interrupts |
40 | |--------------------------------------| | |
41 | ^ ^ | |
42 | |_ something begins working | | |
43 | |_ something goes to sleep | |
44 | (only to be awakened quite soon) | |
48dba8ab VK | 45 | |
46 | In the above situation the system will be 0% loaded according to | |
09338fb0 | 47 | ``/proc/stat`` (since the timer interrupt will always happen when the |
48dba8ab VK | 48 | system is executing the idle handler), but in reality the load is |
49 | closer to 99%. | |
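The gap between the reported and the real load in this example can be reproduced with a small model. The sketch below is an assumption-laden simplification, not kernel code: time is counted in microseconds, ``struct workload`` and the three functions are hypothetical names, and the task is assumed to be busy during the same window of every tick period.

```c
/* Hypothetical model: the task is busy during the window
 * [busy_from_us, busy_to_us) of every tick period, measured in
 * microseconds from the previous timer interrupt. */
struct workload {
    unsigned period_us;
    unsigned busy_from_us;
    unsigned busy_to_us;
};

/* Is the task running at absolute time t_us? */
int busy_at(const struct workload *w, unsigned long long t_us)
{
    unsigned off = (unsigned)(t_us % w->period_us);

    return off >= w->busy_from_us && off < w->busy_to_us;
}

/* Load as tick-based accounting sees it: the state is inspected
 * only at the instants where a timer interrupt fires. */
double sampled_load_pct(const struct workload *w, unsigned nticks)
{
    unsigned busy = 0, i;

    for (i = 1; i <= nticks; ++i)
        busy += busy_at(w, (unsigned long long)i * w->period_us);
    return nticks ? 100.0 * busy / nticks : 0.0;
}

/* Load the task really generates over whole tick periods. */
double true_load_pct(const struct workload *w)
{
    return 100.0 * (w->busy_to_us - w->busy_from_us) / w->period_us;
}
```

With an assumed 4000-microsecond tick and a task that works from 20 to 3980 microseconds after each interrupt (``struct workload w = { 4000, 20, 3980 };``), ``sampled_load_pct(&w, 1000)`` yields 0 while ``true_load_pct(&w)`` yields 99, matching the situation sketched in the time line.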
50 | ||
51 | One can imagine many more situations where this behavior of the kernel | |
09338fb0 MCC | 52 | will lead to quite erratic information inside ``/proc/stat``:: |
53 | ||
54 | ||
55 | /* gcc -o hog smallhog.c */ | |
56 | #include <time.h> | |
57 | #include <limits.h> | |
58 | #include <signal.h> | |
59 | #include <sys/time.h> | |
60 | #define HIST 10 | |
61 | ||
62 | static volatile sig_atomic_t stop; | |
63 | ||
64 | static void sighandler (int signr) | |
65 | { | |
66 | (void) signr; | |
67 | stop = 1; | |
68 | } | |
69 | static unsigned long hog (unsigned long niters) | |
70 | { | |
71 | stop = 0; | |
72 | while (!stop && --niters); | |
73 | return niters; | |
74 | } | |
75 | int main (void) | |
76 | { | |
77 | int i; | |
78 | struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 }, | |
79 | .it_value = { .tv_sec = 0, .tv_usec = 1 } }; | |
80 | sigset_t set; | |
81 | unsigned long v[HIST]; | |
82 | double tmp = 0.0; | |
83 | unsigned long n; | |
84 | signal (SIGALRM, &sighandler); | |
85 | setitimer (ITIMER_REAL, &it, NULL); | |
86 | ||
87 | hog (ULONG_MAX); | |
88 | for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX); | |
89 | for (i = 0; i < HIST; ++i) tmp += v[i]; | |
90 | tmp /= HIST; | |
91 | n = tmp - (tmp / 3.0); | |
92 | ||
93 | sigemptyset (&set); | |
94 | sigaddset (&set, SIGALRM); | |
95 | ||
96 | for (;;) { | |
97 | hog (n); | |
98 | sigwait (&set, &i); | |
99 | } | |
100 | return 0; | |
101 | } | |
48dba8ab VK | 102 | |
103 | ||
104 | References | |
105 | ---------- | |
106 | ||
09338fb0 MCC | 107 | - http://lkml.org/lkml/2007/2/12/6 |
108 | - Documentation/filesystems/proc.txt (1.8) | |
48dba8ab VK | 109 | |
110 | ||
111 | Thanks | |
112 | ------ | |
113 | ||
114 | Con Kolivas, Pavel Machek |