author | Bar David <Bar.David@dell.com> | 2021-06-17 15:39:58 +0300 |
---|---|---|
committer | Bar David <bardavvid@gmail.com> | 2021-07-15 08:55:15 +0300 |
commit | 0d71aa983a4dce75a088b3a4831d5b217df066fb (patch) | |
tree | 81b63a555ce0f7353067e3ad0e040fe80c9ac894 /DEDUPE-TODO | |
parent | 77c72e0f504364adf6a0e8f1155fdf3fd68ef248 (diff) | |
download | fio-0d71aa983a4dce75a088b3a4831d5b217df066fb.tar.gz fio-0d71aa983a4dce75a088b3a4831d5b217df066fb.tar.bz2 |
dedupe: allow to generate dedupe buffers from working set
This commit introduces a new dedupe generation mode, "working_set".
Working set mode simulates deduped data more realistically:
deduped buffers are generated from a pre-existing working set,
sized as a percentage of the device or file.
In other words, a duplicate is not usually expected to be written
close in time to its source buffer, and source buffers
usually come from a small subset of the entire file or device.
Signed-off-by: Bar David <bardavvid@gmail.com>
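
The working-set idea described above can be illustrated with a minimal sketch. This is not fio's implementation: the function names, the parameters (`dedupe_pct`, `working_set_pct`) and the seed-based block generation are all assumptions made for illustration. A fixed pool of seeds stands in for the working set, and each written block is either regenerated from one of those seeds (a duplicate) or from a fresh seed (unique data).

```python
import random


def make_block(seed: int, block_size: int) -> bytes:
    """Deterministically expand a seed into one pseudo-random block (hypothetical helper)."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(block_size))


def generate_blocks(num_blocks, block_size, dedupe_pct, working_set_pct, seed=0):
    """Yield num_blocks buffers; about dedupe_pct percent are drawn from the working set.

    Assumption: the working set is working_set_pct percent of num_blocks,
    represented as a list of seeds chosen up front.
    """
    rng = random.Random(seed)
    ws_size = max(1, num_blocks * working_set_pct // 100)
    working_set_seeds = [rng.getrandbits(64) for _ in range(ws_size)]

    for _ in range(num_blocks):
        if rng.randrange(100) < dedupe_pct:
            # Duplicate: reuse a source buffer from anywhere in the working set,
            # not just the most recently written block.
            block_seed = rng.choice(working_set_seeds)
        else:
            # Unique data: a fresh seed that is not part of the working set.
            block_seed = rng.getrandbits(64)
        yield make_block(block_seed, block_size)
```

With `dedupe_pct=100`, the stream can contain at most `ws_size` distinct blocks, however many blocks are written. Duplicates therefore land far apart in the stream rather than right after their source.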
Diffstat (limited to 'DEDUPE-TODO')
-rw-r--r-- | DEDUPE-TODO | 19 |
1 files changed, 19 insertions, 0 deletions
diff --git a/DEDUPE-TODO b/DEDUPE-TODO
new file mode 100644
index 00000000..1f3ee9da
--- /dev/null
+++ b/DEDUPE-TODO
@@ -0,0 +1,19 @@
+- Mixed buffers of dedupe-able and compressible data.
+  Major usecase in performance benchmarking of storage subsystems.
+
+- Shifted dedup-able data.
+  Allow for dedup buffer generation to shift contents by random number
+  of sectors (fill the gaps with uncompressible data). Some storage
+  subsystems modernized the deduplication detection algorithms to look
+  for shifted data as well. For example, some databases push a timestamp
+  on the prefix of written blocks, which makes the underlying data
+  dedup-able in different alignment. FIO should be able to simulate such
+  workload.
+
+- Generation of similar data (but not exact).
+  A rising trend in enterprise storage systems.
+  Generation of "similar" data means random uncompressible buffers
+  that differ by few(configurable number of) bits from each other.
+  The storage subsystem usually identifies the similar buffers using
+  locality-sensitive hashing or other methods.
+
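
Two of the TODO items above, shifted data and similar data, can be sketched roughly as follows. Neither exists in fio as of this commit. The function names, the 512-byte sector size and the choice to truncate the shifted buffer to its original length are assumptions made for illustration only.

```python
import random

SECTOR_SIZE = 512  # assumed sector size, for illustration only


def shifted_buffer(base: bytes, sectors: int, rng: random.Random) -> bytes:
    """Shift base right by a number of sectors, filling the gap with random bytes."""
    shift = sectors * SECTOR_SIZE
    if shift > len(base):
        raise ValueError("shift exceeds buffer length")
    # Random filler approximates incompressible data in the gap.
    filler = bytes(rng.getrandbits(8) for _ in range(shift))
    # Keep the original length; the tail of base falls off the end.
    return (filler + base)[:len(base)]


def similar_buffer(base: bytes, num_bits: int, rng: random.Random) -> bytes:
    """Return a copy of base that differs from it in exactly num_bits bit positions."""
    out = bytearray(base)
    # sample() picks distinct positions, so no flip cancels another.
    for pos in rng.sample(range(len(base) * 8), num_bits):
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length buffers."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

A storage subsystem that only deduplicates aligned, identical blocks would treat both outputs as new data. One that detects shifted or near-duplicate content could still match them against `base`.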