io_uring: Avoid needless update of completion queue head pointer

I'm seeing a slowdown in io_uring performance on a POWER9 box when
the userspace and kernel polling threads are on two cores that
share an L2 cache.

fio_ioring_cqring_reap() always stores to the completion queue head
pointer, even if nothing was reaped and the value hasn't changed.

Changing this to store the head pointer only when at least one event
was reaped results in a 95% improvement in performance on this
particular test.
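
For illustration only, here is a minimal sketch of the pattern, not the
actual fio code. The struct cq_ring layout, the cq_reap() name and the
head_stores counter are hypothetical and exist only to make the effect
visible; the real ring is shared with the kernel and carries CQE data.

```c
#include <stdatomic.h>

/* Hypothetical, simplified completion ring (not fio's real layout). */
struct cq_ring {
	_Atomic unsigned head;	/* consumer index, written by userspace */
	_Atomic unsigned tail;	/* producer index, written by the kernel */
	unsigned head_stores;	/* illustration only: counts stores to head */
};

/* Reap up to max completions and return how many were consumed. */
static unsigned cq_reap(struct cq_ring *ring, unsigned max)
{
	unsigned head = atomic_load_explicit(&ring->head, memory_order_relaxed);
	unsigned tail = atomic_load_explicit(&ring->tail, memory_order_acquire);
	unsigned reaped = 0;

	while (head != tail && reaped < max) {
		/* A real consumer would process the entry at head here. */
		head++;
		reaped++;
	}

	/* Only touch the shared head when it actually moved. */
	if (reaped) {
		atomic_store_explicit(&ring->head, head, memory_order_release);
		ring->head_stores++;
	}

	return reaped;
}
```

With an empty ring, cq_reap() returns 0 and performs no store to head,
so a polling loop that finds nothing does not write to the shared
index at all.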
Signed-off-by: Anton Blanchard <anton@ozlabs.org>