Commit | Line | Data |
---|---|---|
898bd37a | 1 | ================= |
cbb5901b JA |
2 | Queue sysfs files |
3 | ================= | |
4 | ||
5 | This text file will detail the queue files that are located in the sysfs tree | |
6 | for each block device. Note that stacked devices typically do not export | |
7 | any settings, since their queue merely functions as a remapping target. | |
8 | These files are the ones found in the /sys/block/xxx/queue/ directory. | |
9 | ||
10 | Files denoted with a RO postfix are read-only and the RW postfix means | |
11 | read-write. | |
12 | ||
4004e90c | 13 | add_random (RW) |
898bd37a | 14 | --------------- |
db4ced14 | 15 | This file allows turning off the disk entropy contribution. The default |
4004e90c NJ |
16 | value of this file is '1' (on). |
17 | ||
6728ac33 BVA |
18 | chunk_sectors (RO) |
19 | ------------------ | |
20 | This has a different meaning depending on the type of the block device. | |
21 | For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors | |
22 | of the RAID volume stripe segment. For a zoned block device, either host-aware | |
23 | or host-managed, chunk_sectors indicates the size in 512B sectors of the zones | |
24 | of the device, with the possible exception of the last zone of the device, which | |
25 | may be smaller. | |
26 | ||
005411ea JL |
27 | dax (RO) |
28 | -------- | |
29 | This file indicates whether the device supports Direct Access (DAX), | |
30 | used by CPU-addressable storage to bypass the page cache. It shows '1' | |
31 | if true, '0' if not. | |
32 | ||
4004e90c | 33 | discard_granularity (RO) |
898bd37a | 34 | ------------------------ |
4004e90c NJ |
35 | This shows the size of internal allocation of the device in bytes, if |
36 | reported by the device. A value of '0' means the device does not support | |
37 | the discard functionality. | |
38 | ||
0034af03 | 39 | discard_max_hw_bytes (RO) |
898bd37a | 40 | ------------------------- |
4004e90c NJ |
41 | Devices that support discard functionality may have internal limits on |
42 | the number of bytes that can be trimmed or unmapped in a single operation. | |
43 | The discard_max_hw_bytes parameter is set by the device driver to the maximum | |
44 | number of bytes that can be discarded in a single operation. Discard | |
45 | requests issued to the device must not exceed this limit. A discard_max_hw_bytes | |
46 | value of 0 means that the device does not support discard functionality. | |
47 | ||
0034af03 JA |
48 | discard_max_bytes (RW) |
49 | ---------------------- | |
50 | While discard_max_hw_bytes is the hardware limit for the device, this | |
51 | setting is the software limit. Some devices exhibit large latencies when | |
52 | large discards are issued; setting this value lower will make Linux issue | |
53 | smaller discards and potentially help reduce latencies induced by large | |
54 | discard operations. | |
55 | ||
fbbe7c86 BVA |
56 | discard_zeroes_data (RO) |
57 | ------------------------ | |
58 | Obsolete. Always zero. | |
59 | ||
60 | fua (RO) | |
61 | -------- | |
62 | Whether or not the block driver supports the FUA flag for write requests. | |
63 | FUA stands for Force Unit Access. If the FUA flag is set that means that | |
64 | write requests must bypass the volatile cache of the storage device. | |
65 | ||
cbb5901b JA |
66 | hw_sector_size (RO) |
67 | ------------------- | |
68 | This is the hardware sector size of the device, in bytes. | |
69 | ||
005411ea JL |
70 | io_poll (RW) |
71 | ------------ | |
7158339d JM |
72 | When read, this file shows whether polling is enabled (1) or disabled |
73 | (0). Writing '0' to this file will disable polling for this device. | |
74 | Writing any non-zero value will enable this feature. | |
005411ea | 75 | |
10e6246e JA |
76 | io_poll_delay (RW) |
77 | ------------------ | |
78 | If polling is enabled, this controls what kind of polling will be | |
79 | performed. It defaults to -1, which is classic polling. In this mode, | |
80 | the CPU will repeatedly ask for completions without giving up any time. | |
81 | If set to 0, a hybrid polling mode is used, where the kernel will attempt | |
82 | to make an educated guess at when the IO will complete. Based on this | |
83 | guess, the kernel will put the process issuing IO to sleep for an amount | |
84 | of time, before entering a classic poll loop. This mode might be a | |
85 | little slower than pure classic polling, but it will be more efficient. | |
86 | If set to a value larger than 0, the kernel will put the process issuing | |
f9824952 | 87 | IO to sleep for this many microseconds before entering classic |
10e6246e JA |
88 | polling. |
89 | ||
bb351aba WZ |
90 | io_timeout (RW) |
91 | --------------- | |
92 | io_timeout is the request timeout in milliseconds. If a request does not | |
93 | complete in this time then the block driver timeout handler is invoked. | |
94 | That timeout handler can decide to retry the request, to fail it or to start | |
95 | a device recovery strategy. | |
96 | ||
4004e90c NJ |
97 | iostats (RW) |
98 | ------------ | |
99 | This file is used to control (on/off) the iostats accounting of the | |
100 | disk. | |
101 | ||
102 | logical_block_size (RO) | |
103 | ----------------------- | |
141fd28c | 104 | This is the logical block size of the device, in bytes. |
4004e90c | 105 | |
fbbe7c86 BVA |
106 | max_discard_segments (RO) |
107 | ------------------------- | |
108 | The maximum number of DMA scatter/gather entries in a discard request. | |
109 | ||
cbb5901b JA |
110 | max_hw_sectors_kb (RO) |
111 | ---------------------- | |
112 | This is the maximum number of kilobytes supported in a single data transfer. | |
113 | ||
4004e90c NJ |
114 | max_integrity_segments (RO) |
115 | --------------------------- | |
0c766e78 BVA |
116 | Maximum number of elements in a DMA scatter/gather list with integrity |
117 | data that will be submitted by the block layer core to the associated | |
118 | block driver. | |
4004e90c | 119 | |
cbb5901b JA |
120 | max_sectors_kb (RW) |
121 | ------------------- | |
122 | This is the maximum number of kilobytes that the block layer will allow | |
123 | for a filesystem request. Must be smaller than or equal to the maximum | |
124 | size allowed by the hardware. | |
125 | ||
4004e90c NJ |
126 | max_segments (RO) |
127 | ----------------- | |
0c766e78 BVA |
128 | Maximum number of elements in a DMA scatter/gather list that is submitted |
129 | to the associated block driver. | |
4004e90c NJ |
130 | |
131 | max_segment_size (RO) | |
132 | --------------------- | |
0c766e78 | 133 | Maximum size in bytes of a single element in a DMA scatter/gather list. |
4004e90c NJ |
134 | |
135 | minimum_io_size (RO) | |
136 | -------------------- | |
db4ced14 | 137 | This is the smallest preferred IO size reported by the device. |
4004e90c | 138 | |
cbb5901b JA |
139 | nomerges (RW) |
140 | ------------- | |
488991e2 AB |
141 | This enables the user to disable the lookup logic involved with merging |
142 | IO requests in the block layer. By default (0) all merges are | |
143 | enabled. When set to 1 only simple one-hit merges will be tried. When | |
144 | set to 2 no merge algorithms will be tried (including one-hit or more | |
145 | complex tree/hash lookups). | |
cbb5901b JA |
146 | |
147 | nr_requests (RW) | |
148 | ---------------- | |
149 | This controls how many requests may be allocated in the block layer for | |
150 | read or write requests. Note that the total allocated number may be twice | |
151 | this amount, since it applies only to reads or writes (not the accumulated | |
152 | sum). | |
153 | ||
a051661c TH |
154 | To avoid priority inversion through request starvation, a request |
155 | queue maintains a separate request pool per cgroup when | |
156 | CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such | |
157 | per-block-cgroup request pool. In other words, if there are N block cgroups, | |
f884ab15 | 158 | each request queue may have up to N request pools, each independently |
a051661c TH |
159 | regulated by nr_requests. |
160 | ||
6728ac33 BVA |
161 | nr_zones (RO) |
162 | ------------- | |
163 | For zoned block devices (zoned attribute indicating "host-managed" or | |
164 | "host-aware"), this indicates the total number of zones of the device. | |
165 | This is always 0 for regular block devices. | |
166 | ||
4004e90c NJ |
167 | optimal_io_size (RO) |
168 | -------------------- | |
db4ced14 | 169 | This is the optimal IO size reported by the device. |
4004e90c NJ |
170 | |
171 | physical_block_size (RO) | |
172 | ------------------------ | |
173 | This is the physical block size of the device, in bytes. | |
174 | ||
cbb5901b JA |
175 | read_ahead_kb (RW) |
176 | ------------------ | |
177 | Maximum number of kilobytes to read-ahead for filesystems on this block | |
178 | device. | |
179 | ||
4004e90c NJ |
180 | rotational (RW) |
181 | --------------- | |
182 | This file is used to state whether the device is of rotational type or | |
183 | non-rotational type. | |
184 | ||
cbb5901b JA |
185 | rq_affinity (RW) |
186 | ---------------- | |
5757a6d7 DW |
187 | If this option is '1', the block layer will migrate request completions to the |
188 | cpu "group" that originally submitted the request. For some workloads this | |
189 | provides a significant reduction in CPU cycles due to caching effects. | |
190 | ||
191 | For storage configurations that need to maximize distribution of completion | |
192 | processing, setting this option to '2' forces the completion to run on the | |
193 | requesting cpu (bypassing the "group" aggregation logic). | |
cbb5901b JA |
194 | |
195 | scheduler (RW) | |
196 | -------------- | |
197 | When read, this file will display the current and available IO schedulers | |
198 | for this block device. The currently active IO scheduler will be enclosed | |
199 | in [] brackets. Writing an IO scheduler name to this file will switch | |
200 | control of this block device to that new IO scheduler. Note that writing | |
201 | an IO scheduler name to this file will attempt to load that IO scheduler | |
202 | module, if it isn't already present in the system. | |
203 | ||
93e9d8e8 JA |
204 | write_cache (RW) |
205 | ---------------- | |
206 | When read, this file will display whether the device has write back | |
207 | caching enabled or not. It will return "write back" for the former | |
208 | case, and "write through" for the latter. Writing to this file can | |
209 | change the kernel's view of the device, but it doesn't alter the | |
210 | device state. This means that it might not be safe to toggle the | |
211 | setting from "write back" to "write through", since that will also | |
212 | eliminate cache flushes issued by the kernel. | |
cbb5901b | 213 | |
005411ea JL |
214 | write_same_max_bytes (RO) |
215 | ------------------------- | |
216 | This is the number of bytes the device can write in a single write-same | |
217 | command. A value of '0' means write-same is not supported by this | |
218 | device. | |
219 | ||
152c7776 BVA |
220 | wbt_lat_usec (RW) |
221 | ----------------- | |
87760e5e JA |
222 | If the device is registered for writeback throttling, then this file shows |
223 | the target minimum read latency. If this latency is exceeded in a given | |
224 | window of time (see wb_window_usec), then the writeback throttling will start | |
80e091d1 JA |
225 | scaling back writes. Writing a value of '0' to this file disables the |
226 | feature. Writing a value of '-1' to this file resets the value to the | |
227 | default setting. | |
87760e5e | 228 | |
297e3d85 SL |
229 | throttle_sample_time (RW) |
230 | ------------------------- | |
231 | This is the time window over which blk-throttle samples data, in milliseconds. | |
232 | blk-throttle makes decisions based on the samples. A lower time means cgroups | |
233 | have smoother throughput, but higher CPU overhead. This exists only when | |
234 | CONFIG_BLK_DEV_THROTTLING_LOW is enabled. | |
cbb5901b | 235 | |
fbbe7c86 BVA |
236 | write_zeroes_max_bytes (RO) |
237 | --------------------------- | |
238 | For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of | |
239 | bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES | |
240 | is not supported. | |
241 | ||
f9824952 DLM |
242 | zoned (RO) |
243 | ---------- | |
244 | This indicates if the device is a zoned block device and the zone model of the | |
245 | device if it is indeed zoned. The possible values indicated by zoned are | |
246 | "none" for regular block devices and "host-aware" or "host-managed" for zoned | |
247 | block devices. The characteristics of host-aware and host-managed zoned block | |
248 | devices are described in the ZBC (Zoned Block Commands) and ZAC | |
249 | (Zoned Device ATA Command Set) standards. These standards also define the | |
250 | "drive-managed" zone model. However, since drive-managed zoned block devices | |
251 | do not support zone commands, they will be treated as regular block devices | |
252 | and zoned will report "none". | |
253 | ||
cbb5901b | 254 | Jens Axboe <jens.axboe@oracle.com>, February 2009 |
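
The queue files described above are plain text, so they can be read with ordinary file I/O. The following Python sketch is illustrative only and is not part of the kernel documentation: the helper names (read_queue_attr, parse_scheduler, sectors_to_bytes) and the sysfs_root parameter are assumptions introduced here. It shows how the bracketed entry in the scheduler file can be picked out as the active IO scheduler, and how a chunk_sectors value in 512B sectors converts to bytes.

```python
from pathlib import Path


def read_queue_attr(device, name, sysfs_root="/sys/block"):
    """Return the stripped contents of <sysfs_root>/<device>/queue/<name>.

    sysfs_root is a hypothetical parameter that lets the helper be
    pointed at a test directory instead of the real sysfs tree.
    """
    return (Path(sysfs_root) / device / "queue" / name).read_text().strip()


def parse_scheduler(text):
    """Split the contents of the scheduler file into (active, available).

    The currently active IO scheduler is the entry enclosed in []
    brackets. If no entry is bracketed, active is None.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def sectors_to_bytes(sectors):
    """Convert a count of 512B sectors (as in chunk_sectors) to bytes."""
    return int(sectors) * 512
```

On a real system this might be used as parse_scheduler(read_queue_attr("sda", "scheduler")), where "sda" is only an example device name. Writing to RW files generally requires root privileges and is deliberately left out of the sketch.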