Diffstat (limited to 'DEDUPE-TODO')
-rw-r--r-- | DEDUPE-TODO | 19 |
1 files changed, 19 insertions, 0 deletions
diff --git a/DEDUPE-TODO b/DEDUPE-TODO
new file mode 100644
index 00000000..1f3ee9da
--- /dev/null
+++ b/DEDUPE-TODO
@@ -0,0 +1,19 @@
+- Mixed buffers of dedupe-able and compressible data.
+  Major usecase in performance benchmarking of storage subsystems.
+
+- Shifted dedup-able data.
+  Allow for dedup buffer generation to shift contents by random number
+  of sectors (fill the gaps with uncompressible data). Some storage
+  subsystems modernized the deduplication detection algorithms to look
+  for shifted data as well. For example, some databases push a timestamp
+  on the prefix of written blocks, which makes the underlying data
+  dedup-able in different alignment. FIO should be able to simulate such
+  workload.
+
+- Generation of similar data (but not exact).
+  A rising trend in enterprise storage systems.
+  Generation of "similar" data means random uncompressible buffers
+  that differ by few(configurable number of) bits from each other.
+  The storage subsystem usually identifies the similar buffers using
+  locality-sensitive hashing or other methods.
+
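
The second and third TODO items describe buffer layouts that are easier to picture with a small example. The following is only an illustrative sketch in Python, not fio code and not the way fio would necessarily implement either feature. The 512-byte sector size and the helper names `shifted_dedupe_buffer`, `similar_buffer`, and `hamming_distance` are assumptions made up for this illustration.

```python
import os
import random

SECTOR = 512  # assumed sector size in bytes (hypothetical, for illustration)


def shifted_dedupe_buffer(payload: bytes, total_sectors: int, rng: random.Random):
    """Place a dedupe-able payload at a random sector offset inside a buffer.

    The rest of the buffer (the "gaps") is filled with random bytes, which
    are effectively uncompressible. Returns (buffer, shift_in_sectors).
    """
    if len(payload) % SECTOR != 0:
        raise ValueError("payload must be a whole number of sectors")
    payload_sectors = len(payload) // SECTOR
    if payload_sectors > total_sectors:
        raise ValueError("payload does not fit in the buffer")

    # Pick how many sectors to shift the payload by.
    shift = rng.randrange(total_sectors - payload_sectors + 1)

    # Start from random filler, then overwrite one window with the payload.
    buf = bytearray(os.urandom(total_sectors * SECTOR))
    start = shift * SECTOR
    buf[start:start + len(payload)] = payload
    return bytes(buf), shift


def similar_buffer(base: bytes, nbits: int, rng: random.Random) -> bytes:
    """Return a copy of base that differs from it in exactly nbits bits."""
    out = bytearray(base)
    # Sample distinct bit positions so no flip cancels another.
    for pos in rng.sample(range(len(base) * 8), nbits):
        out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length buffers."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(1234)
    payload = os.urandom(4 * SECTOR)

    buf, shift = shifted_dedupe_buffer(payload, total_sectors=16, rng=rng)
    print("payload shifted by", shift, "sectors")

    near = similar_buffer(payload, nbits=8, rng=rng)
    print("bits changed:", hamming_distance(payload, near))
```

Two buffers built from the same payload with different shifts share content only at different alignments, which is the case a fixed-block deduplicator would miss. The similar-data helper keeps the base buffer random and changes a configurable number of bits, so exact-match deduplication sees distinct blocks while a similarity detector could still group them.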