author     Bar David <> 2021-06-17 15:39:58 +0300
committer  Bar David <> 2021-07-15 08:55:15 +0300
commit     0d71aa983a4dce75a088b3a4831d5b217df066fb
tree       81b63a555ce0f7353067e3ad0e040fe80c9ac894 /DEDUPE-TODO
parent     77c72e0f504364adf6a0e8f1155fdf3fd68ef248
dedupe: allow to generate dedupe buffers from working set
This commit introduces a new dedupe generation mode, "working_set". Working set mode simulates a more realistic approach to deduped data, in which deduped buffers are generated from a pre-existing working set sized as a percentage of the device or file. In other words, a dedupe buffer is not usually expected to be written close in time to its source buffer, and source buffers usually come from a small subset of the entire file or device.

Signed-off-by: Bar David <>
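A minimal Python sketch of the working-set idea, assuming that every source buffer can be regenerated from a 64-bit seed and that the working set is chosen once up front. The class `WorkingSetDedupe`, its parameters and `BLOCK_SIZE` are hypothetical illustrations, not fio's implementation. A dedupe hit picks a random member of the working set rather than the previous buffer, so the copy is not adjacent in time to its source and all sources come from a small slice of the device.

```python
import random

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def block_from_seed(seed: int, size: int = BLOCK_SIZE) -> bytes:
    """Regenerate the same pseudo-random (uncompressible) block from a seed."""
    return random.Random(seed).randbytes(size)


class WorkingSetDedupe:
    """Hypothetical generator: dedupe hits are drawn from a fixed working set."""

    def __init__(self, dev_size: int, working_set_pct: int, dedupe_pct: int,
                 rng_seed: int = 0):
        self.rng = random.Random(rng_seed)
        n_blocks = dev_size // BLOCK_SIZE
        # The working set covers working_set_pct% of the device, at least one block.
        ws_blocks = max(1, n_blocks * working_set_pct // 100)
        # Keep only seeds; each seed identifies one source buffer.
        self.working_set = [self.rng.getrandbits(64) for _ in range(ws_blocks)]
        self.dedupe_pct = dedupe_pct

    def next_buffer(self) -> bytes:
        if self.rng.randrange(100) < self.dedupe_pct:
            # Deduped write: repeat a randomly chosen working-set buffer,
            # not the most recently generated one.
            seed = self.rng.choice(self.working_set)
        else:
            # Unique write: a fresh seed yields a new random buffer.
            seed = self.rng.getrandbits(64)
        return block_from_seed(seed)
```

With `dedupe_pct=50`, roughly half of the returned buffers repeat a working-set buffer. A real generator would also have to make sure the working-set buffers were written before they are used as dedupe sources; the sketch does not model that.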
Diffstat (limited to 'DEDUPE-TODO')
1 file changed, 19 insertions, 0 deletions
new file mode 100644
index 00000000..1f3ee9da
--- /dev/null
@@ -0,0 +1,19 @@
+- Mixed buffers of dedupe-able and compressible data.
+ A major use case in performance benchmarking of storage subsystems.
+- Shifted dedupe-able data.
+ Allow dedupe buffer generation to shift contents by a random number
+ of sectors (filling the gaps with uncompressible data). Some storage
+ subsystems have updated their deduplication detection algorithms to
+ also look for shifted data. For example, some databases prepend a
+ timestamp to written blocks, which makes the underlying data
+ dedupe-able at a different alignment. fio should be able to simulate
+ such a workload.
+- Generation of similar (but not identical) data.
+ Similarity detection is a rising trend in enterprise storage systems.
+ Generating "similar" data means producing random uncompressible
+ buffers that differ from each other by a few (configurable number
+ of) bits. The storage subsystem usually identifies similar buffers
+ using locality-sensitive hashing or other methods.
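One possible reading of the "mixed buffers" item, sketched in Python: each block is either an exact copy of an earlier block (dedupe-able) or a fresh block whose tail is zero-filled (compressible), so a single stream exercises both data-reduction paths. Zero-filling the compressible part is an assumption made for simplicity, and `mixed_stream`, `compressible_block` and their parameters are hypothetical names.

```python
import random
import zlib

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def compressible_block(rng: random.Random, compress_pct: int,
                       size: int = BLOCK_SIZE) -> bytes:
    """Random bytes followed by a zero-filled tail covering compress_pct% of the block."""
    n_zero = size * compress_pct // 100
    return rng.randbytes(size - n_zero) + bytes(n_zero)


def mixed_stream(n: int, dedupe_pct: int, compress_pct: int, seed: int = 0):
    """Yield n blocks; about dedupe_pct% of them repeat an earlier unique block."""
    rng = random.Random(seed)
    written = []  # unique blocks generated so far
    for _ in range(n):
        if written and rng.randrange(100) < dedupe_pct:
            buf = rng.choice(written)  # exact copy of an earlier block
        else:
            buf = compressible_block(rng, compress_pct)
            written.append(buf)
        yield buf
```

Because the dedupe copies are taken from compressible blocks, they are compressible too, so deduplication and compression savings overlap in this sketch.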
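For the similar-data item, a minimal Python sketch that derives variants differing from a random base buffer in exactly `n_bits` bit positions. `similar_block` and `bit_distance` are hypothetical helpers, and choosing the flipped positions uniformly at random is an assumption rather than a stated requirement.

```python
import random

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def similar_block(base: bytes, n_bits: int, rng: random.Random) -> bytes:
    """Copy of base with exactly n_bits distinct bit positions flipped."""
    out = bytearray(base)
    # sample() picks distinct positions, so no bit is flipped twice.
    for pos in rng.sample(range(len(base) * 8), n_bits):
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)


def bit_distance(a: bytes, b: bytes) -> int:
    """Hamming distance in bits between two equal-length buffers."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

Any two variants of the same base differ from each other in at most `2 * n_bits` bits, which is the kind of closeness a locality-sensitive hash is meant to detect while exact-match deduplication misses it.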