Queue sysfs files
=================

This file details the queue files that are located in the sysfs tree
for each block device. Note that stacked devices typically do not export
any settings, since their queue merely functions as a remapping target.
These files are the ones found in the /sys/block/xxx/queue/ directory.

Files denoted with a RO postfix are read-only and the RW postfix means
read-write.

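As a minimal sketch of how these files can be consumed from user space, the Python function below (its name is hypothetical) reads every regular file in a queue directory into a dictionary of stripped strings. Subdirectories and files that cannot be read are skipped. All values are left as text, since their formats differ between attributes.

```python
import os


def read_queue_attrs(queue_dir: str) -> dict[str, str]:
    """Return {attribute name: stripped contents} for each readable file in queue_dir."""
    attrs = {}
    for name in sorted(os.listdir(queue_dir)):
        path = os.path.join(queue_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories such as iosched/
        try:
            with open(path) as f:
                attrs[name] = f.read().strip()
        except OSError:
            continue  # unreadable attribute: leave it out
    return attrs
```

For example, `read_queue_attrs("/sys/block/sda/queue")["rotational"]` would return "0" or "1" on a system that has an `sda` device; `sda` is only a placeholder name.
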
add_random (RW)
----------------
This file allows turning off the disk entropy contribution. The default
value of this file is '1' (on).

chunk_sectors (RO)
------------------
This has a different meaning depending on the type of the block device.
For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
of the RAID volume stripe segment. For a zoned block device, either host-aware
or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
of the device, with the possible exception of the last zone of the device,
which may be smaller.

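Because chunk_sectors is counted in 512-byte sectors, converting it to bytes or finding the zone (or stripe segment) that contains a given sector is plain integer arithmetic. The sketch below assumes, as described above, that every zone except possibly the last is exactly chunk_sectors long. The function names are illustrative.

```python
SECTOR_SIZE = 512  # chunk_sectors is always counted in 512-byte units


def chunk_bytes(chunk_sectors: int) -> int:
    """Size of one zone or stripe segment in bytes."""
    return chunk_sectors * SECTOR_SIZE


def zone_of(sector: int, chunk_sectors: int) -> tuple[int, int]:
    """Return (zone index, first sector of that zone) for a 512-byte sector number."""
    if chunk_sectors <= 0:
        raise ValueError("chunk_sectors must be positive")
    index = sector // chunk_sectors
    return index, index * chunk_sectors
```

For instance, a chunk_sectors value of 524288 corresponds to 256 MiB zones, and sector 1000000 falls in zone 1, which starts at sector 524288.
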
dax (RO)
--------
This file indicates whether the device supports Direct Access (DAX),
used by CPU-addressable storage to bypass the pagecache. It shows '1'
if true, '0' if not.

discard_granularity (RO)
------------------------
This shows the size of internal allocation of the device in bytes, if
reported by the device. A value of '0' means the device does not support
discard functionality.

discard_max_hw_bytes (RO)
-------------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
The discard_max_hw_bytes parameter is set by the device driver to the maximum
number of bytes that can be discarded in a single operation. Discard
requests issued to the device must not exceed this limit. A
discard_max_hw_bytes value of 0 means that the device does not support
discard functionality.

discard_max_bytes (RW)
----------------------
While discard_max_hw_bytes is the hardware limit for the device, this
setting is the software limit. Some devices exhibit large latencies when
large discards are issued. Setting this value lower will make Linux issue
smaller discards and may help reduce latencies induced by large discard
operations.

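To illustrate how a lower discard_max_bytes leads to smaller discards, the sketch below splits one large discard range into pieces no longer than a given byte limit. It optionally rounds that limit down to a multiple of discard_granularity. This is a simplified model with a hypothetical function name; the kernel's actual splitting logic is more involved and is not reproduced here.

```python
def split_discard(offset: int, length: int, max_bytes: int,
                  granularity: int = 0) -> list[tuple[int, int]]:
    """Split the byte range [offset, offset + length) into (offset, length) pieces.

    Each piece is at most max_bytes long. When granularity is non-zero and no
    larger than max_bytes, the per-piece limit is rounded down to a multiple of
    it. A max_bytes of 0 is treated as "discard not supported".
    """
    if max_bytes <= 0 or length <= 0:
        return []
    limit = max_bytes
    if 0 < granularity <= max_bytes:
        limit -= max_bytes % granularity  # keep each piece granule-aligned in size
    pieces = []
    while length > 0:
        step = min(limit, length)
        pieces.append((offset, step))
        offset += step
        length -= step
    return pieces
```

For example, with a 4 MiB limit, a 10 MiB discard becomes three pieces of 4, 4 and 2 MiB.
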
discard_zeroes_data (RO)
------------------------
Obsolete. Always zero.

fua (RO)
--------
Whether or not the block driver supports the FUA flag for write requests.
FUA stands for Force Unit Access. If the FUA flag is set, write requests
must bypass the volatile cache of the storage device.

hw_sector_size (RO)
-------------------
This is the hardware sector size of the device, in bytes.

io_poll (RW)
------------
When read, this file shows whether polling is enabled (1) or disabled
(0). Writing '0' to this file will disable polling for this device.
Writing any non-zero value will enable this feature.

io_poll_delay (RW)
------------------
If polling is enabled, this controls what kind of polling will be
performed. It defaults to -1, which is classic polling. In this mode,
the CPU will repeatedly ask for completions without giving up any time.
If set to 0, a hybrid polling mode is used, where the kernel will attempt
to make an educated guess at when the IO will complete. Based on this
guess, the kernel will put the process issuing IO to sleep for an amount
of time before entering a classic poll loop. This mode might be a
little slower than pure classic polling, but it uses CPU time more
efficiently. If set to a value larger than 0, the kernel will put the
process issuing IO to sleep for that many microseconds before entering
classic polling.

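The three cases described above can be summarized as a small lookup. In the sketch below, the mode labels ("classic", "hybrid", "fixed-sleep") and the function name are chosen for this example and are not strings the kernel uses. Rejecting negative values other than -1 is likewise an assumption of the example.

```python
from typing import Optional


def poll_mode(io_poll_delay: int) -> tuple[str, Optional[int]]:
    """Map an io_poll_delay value to (mode label, fixed sleep in microseconds).

    -1 -> classic polling, no sleep
     0 -> hybrid polling, the kernel estimates the sleep time (None here)
    >0 -> sleep exactly that many microseconds, then poll classically
    """
    if io_poll_delay == -1:
        return "classic", 0
    if io_poll_delay == 0:
        return "hybrid", None
    if io_poll_delay > 0:
        return "fixed-sleep", io_poll_delay
    raise ValueError(f"unexpected io_poll_delay value: {io_poll_delay}")
```
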
io_timeout (RW)
---------------
io_timeout is the request timeout in milliseconds. If a request does not
complete within this time, the block driver's timeout handler is invoked.
That timeout handler can decide to retry the request, fail it, or start
a device recovery strategy.

iostats (RW)
-------------
This file is used to turn the iostats accounting of the disk on or off.

logical_block_size (RO)
-----------------------
This is the logical block size of the device, in bytes.

max_discard_segments (RO)
-------------------------
The maximum number of DMA scatter/gather entries in a discard request.

max_hw_sectors_kb (RO)
----------------------
This is the maximum number of kilobytes supported in a single data transfer.

max_integrity_segments (RO)
---------------------------
Maximum number of elements in a DMA scatter/gather list with integrity
data that will be submitted by the block layer core to the associated
block driver.

max_sectors_kb (RW)
-------------------
This is the maximum number of kilobytes that the block layer will allow
for a filesystem request. The value must be less than or equal to
max_hw_sectors_kb, the maximum size allowed by the hardware.

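A tool that adjusts max_sectors_kb can check the hardware limit first, as in the sketch below. The function name is hypothetical, and the lower bound of 1 KB is an assumption of this example; the kernel may reject values it considers too small even when they pass this check.

```python
import os


def set_max_sectors_kb(queue_dir: str, kb: int) -> None:
    """Write kb to max_sectors_kb after checking it against max_hw_sectors_kb."""
    with open(os.path.join(queue_dir, "max_hw_sectors_kb")) as f:
        hw_limit = int(f.read())
    if not 0 < kb <= hw_limit:
        raise ValueError(f"max_sectors_kb must be in 1..{hw_limit}, got {kb}")
    # On a real /sys/block/xxx/queue directory this write needs root privileges.
    with open(os.path.join(queue_dir, "max_sectors_kb"), "w") as f:
        f.write(f"{kb}\n")
```
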
max_segments (RO)
-----------------
Maximum number of elements in a DMA scatter/gather list that is submitted
to the associated block driver.

max_segment_size (RO)
---------------------
Maximum size in bytes of a single element in a DMA scatter/gather list.

minimum_io_size (RO)
--------------------
This is the smallest preferred IO size reported by the device.

nomerges (RW)
-------------
This enables the user to disable the lookup logic involved with IO
merging requests in the block layer. By default (0) all merges are
enabled. When set to 1, only simple one-hit merges will be tried. When
set to 2, no merge algorithms will be tried (including one-hit or more
complex tree/hash lookups).

nr_requests (RW)
----------------
This controls how many requests may be allocated in the block layer for
read or write requests. Note that the total allocated number may be twice
this amount, since the limit applies to reads and writes separately (not
to their combined sum).

To avoid priority inversion through request starvation, a request
queue maintains a separate request pool for each cgroup when
CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
per-block-cgroup request pool. In other words, if there are N block
cgroups, each request queue may have up to N request pools, each
independently regulated by nr_requests.

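Read together, the two paragraphs above give a rough upper bound on how many requests one queue may allocate: nr_requests per direction, per request pool, with up to one pool per block cgroup. The sketch below simply multiplies these factors. It is an interpretation of this text rather than a figure the kernel reports, and the function name is illustrative.

```python
def max_allocated_requests(nr_requests: int, n_cgroups: int = 1) -> int:
    """Upper bound on allocated requests for one queue, per the text above.

    Each request pool allows nr_requests reads plus nr_requests writes, and
    with CONFIG_BLK_CGROUP there is up to one pool per block cgroup
    (at least one pool is always assumed).
    """
    pools = max(1, n_cgroups)
    return pools * 2 * nr_requests
```

With nr_requests set to 128 and four block cgroups, for example, this bound is 4 * 2 * 128 = 1024 requests.
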
nr_zones (RO)
-------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), this indicates the total number of zones of the device.
This is always 0 for regular block devices.

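Given the description of chunk_sectors above, one would expect nr_zones to equal the device capacity divided by the zone size, rounded up to account for a smaller last zone. The sketch below assumes the capacity is given in 512-byte sectors, as reported by /sys/block/xxx/size. It takes the contents of the zoned file so that regular devices yield 0. The function name is hypothetical.

```python
def expected_nr_zones(capacity_sectors: int, chunk_sectors: int, zoned: str) -> int:
    """Estimate nr_zones from capacity (512-byte sectors), chunk_sectors and zoned."""
    if zoned.strip() == "none":
        return 0  # regular block devices always report 0 zones
    if chunk_sectors <= 0:
        raise ValueError("a zoned device must report a positive chunk_sectors")
    # Ceiling division: a trailing partial zone still counts as a zone.
    return -(-capacity_sectors // chunk_sectors)
```
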
optimal_io_size (RO)
--------------------
This is the optimal IO size reported by the device.

physical_block_size (RO)
------------------------
This is the physical block size of the device, in bytes.

read_ahead_kb (RW)
------------------
Maximum number of kilobytes to read ahead for filesystems on this block
device.

rotational (RW)
---------------
This file is used to state whether the device is of rotational type or
non-rotational type.

rq_affinity (RW)
----------------
If this option is '1', the block layer will migrate request completions to the
CPU "group" that originally submitted the request. For some workloads this
provides a significant reduction in CPU cycles due to caching effects.

For storage configurations that need to maximize distribution of completion
processing, setting this option to '2' forces the completion to run on the
requesting CPU (bypassing the "group" aggregation logic).

scheduler (RW)
--------------
When read, this file will display the current and available IO schedulers
for this block device. The currently active IO scheduler will be enclosed
in [] brackets. Writing an IO scheduler name to this file will switch
control of this block device to that new IO scheduler. Note that writing
an IO scheduler name to this file will attempt to load that IO scheduler
module, if it isn't already present in the system.

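The bracket convention makes the scheduler file easy to parse. The sketch below (function name hypothetical) splits its contents into the active scheduler and the list of available ones. Which names actually appear depends on the kernel configuration.

```python
from typing import Optional


def parse_scheduler(text: str) -> tuple[Optional[str], list[str]]:
    """Split the contents of the scheduler file into (active, available).

    The active scheduler is the one enclosed in [] brackets; it is also
    included, without brackets, in the available list.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available
```

Switching schedulers is the reverse operation: write one of the returned names back to the same file, which requires root privileges.
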
write_cache (RW)
----------------
When read, this file will display whether the device has write back
caching enabled or not. It will return "write back" for the former
case, and "write through" for the latter. Writing to this file can
change the kernel's view of the device, but it does not alter the
device state. This means that it might not be safe to toggle the
setting from "write back" to "write through", since that will also
eliminate cache flushes issued by the kernel.

write_same_max_bytes (RO)
-------------------------
This is the number of bytes the device can write in a single write-same
command. A value of '0' means write-same is not supported by this
device.

wbt_lat_usec (RW)
-----------------
If the device is registered for writeback throttling, then this file shows
the target minimum read latency. If this latency is exceeded in a given
window of time (see wb_window_usec), then the writeback throttling will start
scaling back writes. Writing a value of '0' to this file disables the
feature. Writing a value of '-1' to this file resets the value to the
default setting.

throttle_sample_time (RW)
-------------------------
This is the time window, in milliseconds, over which blk-throttle samples
data. blk-throttle makes decisions based on these samples. A lower value
gives cgroups smoother throughput, but at a higher CPU overhead. This file
exists only when CONFIG_BLK_DEV_THROTTLING_LOW is enabled.

write_zeroes_max_bytes (RO)
---------------------------
For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
is not supported.

zoned (RO)
----------
This indicates whether the device is a zoned block device and, if so, its
zone model. The possible values are "none" for regular block devices and
"host-aware" or "host-managed" for zoned block devices. The characteristics
of host-aware and host-managed zoned block devices are described in the ZBC
(Zoned Block Commands) and ZAC (Zoned Device ATA Command Set) standards.
These standards also define the "drive-managed" zone model. However, since
drive-managed zoned block devices do not support zone commands, they are
treated as regular block devices and zoned reports "none".

Jens Axboe <jens.axboe@oracle.com>, February 2009