Diffstat (limited to 'DEDUPE-TODO')
-rw-r--r-- DEDUPE-TODO 19
1 files changed, 19 insertions, 0 deletions
diff --git a/DEDUPE-TODO b/DEDUPE-TODO
new file mode 100644
index 00000000..1f3ee9da
--- /dev/null
+++ b/DEDUPE-TODO
@@ -0,0 +1,19 @@
+- Mixed buffers of dedupe-able and compressible data.
+ A major use case in performance benchmarking of storage subsystems.
+
+- Shifted dedup-able data.
+ Allow dedup buffer generation to shift contents by a random number
+ of sectors (filling the gaps with incompressible data). Some storage
+ subsystems have updated their deduplication detection algorithms to
+ look for shifted data as well. For example, some databases prepend a
+ timestamp to written blocks, which makes the underlying data
+ dedup-able at a different alignment. FIO should be able to simulate
+ such a workload.
+
+- Generation of similar (but not identical) data.
+ A rising trend in enterprise storage systems.
+ Generation of "similar" data means random incompressible buffers
+ that differ from each other by a few (a configurable number of) bits.
+ The storage subsystem usually identifies the similar buffers using
+ locality-sensitive hashing or other methods.
+
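
The "shifted" and "similar" items above could be prototyped roughly as follows. This is only a sketch under stated assumptions, not FIO code: the 512-byte sector size and the names `random_bytes`, `shifted_buffer`, `similar_buffer` and `hamming_bits` are hypothetical, and pseudo-random bytes stand in for incompressible filler.

```python
import random

SECTOR = 512  # assumed sector size in bytes (not taken from FIO)


def random_bytes(n, rng):
    """Return n pseudo-random bytes, used here as incompressible filler."""
    return bytes(rng.getrandbits(8) for _ in range(n))


def shifted_buffer(payload, buf_sectors, rng):
    """Place a dedup-able payload at a random sector offset in a buffer.

    The rest of the buffer (the gaps around the payload) is random filler.
    Returns the buffer and the chosen shift, in sectors.
    """
    if len(payload) % SECTOR != 0:
        raise ValueError("payload must be a whole number of sectors")
    max_shift = buf_sectors - len(payload) // SECTOR
    if max_shift < 0:
        raise ValueError("payload does not fit in the buffer")
    shift = rng.randint(0, max_shift)
    buf = bytearray(random_bytes(buf_sectors * SECTOR, rng))
    start = shift * SECTOR
    buf[start:start + len(payload)] = payload  # same-length slice, size unchanged
    return bytes(buf), shift


def similar_buffer(base, nbits, rng):
    """Return a copy of base with exactly nbits distinct bits flipped."""
    buf = bytearray(base)
    for pos in rng.sample(range(len(buf) * 8), nbits):
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)


def hamming_bits(a, b):
    """Count differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

For example, `shifted_buffer(payload, 16, random.Random(0))` embeds `payload` at a sector-aligned offset inside a 16-sector buffer, and `similar_buffer(payload, 5, rng)` yields a buffer at Hamming distance 5 from `payload`. Because the flipped bit positions are sampled without replacement, the distance is exactly the requested number of bits.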