gettime: fix cpuclock-test on AMD platforms
author Vincent Fu <vincentfu@gmail.com>
Tue, 27 Feb 2024 15:26:00 +0000 (10:26 -0500)
committer Vincent Fu <vincent.fu@samsung.com>
Tue, 27 Feb 2024 17:36:45 +0000 (12:36 -0500)
Starting with gcc 11, __sync_synchronize() compiles to

lock or QWORD PTR [rsp], 0

on x86_64 platforms. Previously it compiled to an mfence instruction.

See line 47 of https://godbolt.org/z/xfE18K7b4 for an example.
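
The code generation can also be checked locally with a one-function
file. This is only a minimal sketch: the file name barrier.c and the
function name full_barrier are made up for illustration and are not part
of fio. Compiling it with `gcc -O2 -S barrier.c` and reading barrier.s
should show either the mfence form or the lock or form, depending on the
gcc version in use.

```c
/* barrier.c: hypothetical file name, used only for this illustration. */

/*
 * Wrap the GCC builtin in a non-inline function so that its generated
 * code is easy to locate in the assembly listing. full_barrier is an
 * illustrative name, not a fio symbol.
 */
void full_barrier(void)
{
	__sync_synchronize();
}
```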

On Intel platforms this change does not affect the result of fio's CPU
clock test. But on AMD platforms, this change causes fio's CPU clock
test to fail and fio to fall back to clock_gettime() instead of using
the CPU clock for timing.

This patch has fio explicitly use an mfence instruction instead of
__sync_synchronize() in the CPU clock test code on x86_64 platforms in
order to allow the CPU clock test to pass on AMD platforms.
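
The idea can be sketched outside of fio as follows. This is an
illustration under stated assumptions, not the patched code itself: the
names example_barrier, example_read_clock and example_fenced_clock are
hypothetical, and the clock_gettime() branch is only a stand-in so the
sketch also builds on architectures without rdtsc.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__)
/* Explicit mfence, independent of how the compiler lowers
 * __sync_synchronize(). */
static inline void example_barrier(void)
{
	__asm__ __volatile__("mfence" ::: "memory");
}

/* Read the time stamp counter; rdtsc returns it split across edx:eax. */
static inline uint64_t example_read_clock(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
	return ((uint64_t) hi << 32) | lo;
}
#else
/* Other architectures keep the compiler builtin. */
static inline void example_barrier(void)
{
	__sync_synchronize();
}

/* Stand-in clock for this sketch only (assumption: POSIX clock_gettime). */
static inline uint64_t example_read_clock(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}
#endif

/* Issue the barrier first, then sample the clock. */
static inline uint64_t example_fenced_clock(void)
{
	example_barrier();
	return example_read_clock();
}
```

A caller would use example_fenced_clock() wherever a clock sample must
not be taken before the preceding memory operations have completed.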

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20240227155856.5012-1-vincent.fu@samsung.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
arch/arch-x86_64.h
arch/arch.h
gettime.c

index 86ce1b7ed7dd0f56de77575bcbf2d2ba99e85901..b402dc6df39dbc4c7d677747571b6bb487a9c628 100644
--- a/arch/arch-x86_64.h
+++ b/arch/arch-x86_64.h
@@ -26,6 +26,11 @@ static inline unsigned long arch_ffz(unsigned long bitmask)
        return bitmask;
 }
 
+static inline void tsc_barrier(void)
+{
+       __asm__ __volatile__("mfence":::"memory");
+}
+
 static inline unsigned long long get_cpu_clock(void)
 {
        unsigned int lo, hi;
index 3ee9b0538dbc62a4e047771413b5b73db5a6d03b..7e294ddfb888c73baceded2ad0bcb0c3d4b1d2ad 100644
--- a/arch/arch.h
+++ b/arch/arch.h
@@ -108,6 +108,13 @@ extern unsigned long arch_flags;
 #include "arch-generic.h"
 #endif
 
+#if !defined(__x86_64__) && defined(CONFIG_SYNC_SYNC)
+static inline void tsc_barrier(void)
+{
+       __sync_synchronize();
+}
+#endif
+
 #include "../lib/ffz.h"
 /* IWYU pragma: end_exports */
 
index bc66a3ac9f908cb8512afdc73c9c240dd1ade8e3..5ca3120633dd409c669da255dd7e422258b3021d 100644
--- a/gettime.c
+++ b/gettime.c
@@ -623,7 +623,7 @@ static void *clock_thread_fn(void *data)
                        seq = *t->seq;
                        if (seq == UINT_MAX)
                                break;
-                       __sync_synchronize();
+                       tsc_barrier();
                        tsc = get_cpu_clock();
                } while (seq != atomic32_compare_and_swap(t->seq, seq, seq + 1));