summaryrefslogtreecommitdiff
path: root/gettime.c
diff options
context:
space:
mode:
authorJens Axboe <axboe@kernel.dk>2012-12-18 08:23:44 +0100
committerJens Axboe <axboe@kernel.dk>2012-12-18 08:23:44 +0100
commit1d69e8acd7d338822bfe9630e7cdceee159c7bd2 (patch)
treef8883dfac15d6f19e435602b5db58907402d968f /gettime.c
parent23893646c9306b785f866fb9e976f3db673af0ca (diff)
downloadfio-1d69e8acd7d338822bfe9630e7cdceee159c7bd2.tar.gz
fio-1d69e8acd7d338822bfe9630e7cdceee159c7bd2.tar.bz2
gettime: fix race/bug with threads and time keeping
Sam writes: First, the 1 second latency events come in batches and those batches occur suspiciously close to a usec wrap (0.999999 us -> 1.000000 us). Second, if you subtract exactly 1 second from these outlier latencies, the remaining amount is very close to what our instrumented low level driver records for the IO latency and consistent with the expected latencies of our SSD. Similarly, the tv_usec portion of the timeval structs shows increasing values. See snippet below. Format is like start: <start_time.tv_sec>.<start_time.tv_usec> latency: 1004657 us, lba: 1111289192, start: 1355776806.995294 issue: 1355776806.995312 complete: 1355776807.999969 latency: 1000494 us, lba: 203093568, start: 1355776806.999456 issue: 1355776806.999475 complete: 1355776807.999969 latency: 1000404 us, lba: 551350992, start: 1355776806.999546 issue: 1355776806.999565 complete: 1355776807.999969 latency: 1000477 us, lba: 449672928, start: 1355776806.999484 issue: 1355776806.999492 complete: 1355776807.999969 All this pointed to the time collection code being buggy. Reviewing the code, I spotted this in fio_gettime(): /* * If Linux is using the tsc clock on non-synced processors, * sometimes time can appear to drift backwards. Fix that up. */ if (last_tv_valid) { if (tp->tv_sec < last_tv.tv_sec) tp->tv_sec = last_tv.tv_sec; else if (last_tv.tv_sec == tp->tv_sec && tp->tv_usec < last_tv.tv_usec) tp->tv_usec = last_tv.tv_usec; } last_tv_valid = 1; memcpy(&last_tv, tp, sizeof(*tp)); This does not appear to be multi-thread safe. Pre-emption can occur between either comparison and the subsequent update. Commenting it out makes the problem go away (at the expense of being subject to drift). How about making last_tv & last_tv_valid thread-local? Reported-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Diffstat (limited to 'gettime.c')
-rw-r--r--gettime.c4
1 files changed, 2 insertions, 2 deletions
diff --git a/gettime.c b/gettime.c
index f5be6bd0..1a7af377 100644
--- a/gettime.c
+++ b/gettime.c
@@ -18,8 +18,8 @@ static unsigned long cycles_per_usec;
static unsigned long last_cycles;
int tsc_reliable = 0;
#endif
-static struct timeval last_tv;
-static int last_tv_valid;
+static __thread struct timeval last_tv;
+static __thread int last_tv_valid;
enum fio_cs fio_clock_source = FIO_PREFERRED_CLOCK_SOURCE;
int fio_clock_source_set = 0;