engines/io_uring: relax CQ head atomic store ordering

fio_ioring_getevents() advances the io_uring CQ head index in
fio_ioring_cqring_reap() before fio_ioring_event() is called to read the
CQEs. In general this would allow the kernel to reuse the CQE slot
prematurely, but the CQ is sized large enough for the maximum iodepth
and a new io_uring operation isn't submitted until the CQE is processed.

Add a comment to explain why it's safe to advance the CQ head index
early. Use relaxed ordering for the store, as there aren't any accesses
to the CQEs that need to be ordered before the store.
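
The reasoning above can be sketched with a toy single-threaded model.
This is not fio's code: struct toy_ring, toy_submit(), toy_complete(),
toy_reap() and toy_demo() are hypothetical names, and the "kernel" is
just a function call. It only illustrates that when the CQ has at least
iodepth entries and nothing new is submitted until a CQE is processed,
CQEs stay intact after the head has been advanced with a relaxed store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_IODEPTH    4u	/* hypothetical maximum iodepth */
#define TOY_CQ_ENTRIES 4u	/* CQ sized to hold at least iodepth CQEs */
#define TOY_CQ_MASK    (TOY_CQ_ENTRIES - 1u)

struct toy_cqe {
	uint64_t user_data;
	int32_t res;
};

struct toy_ring {
	_Atomic uint32_t head;		/* written by the application */
	_Atomic uint32_t tail;		/* written by the "kernel" */
	struct toy_cqe cqes[TOY_CQ_ENTRIES];
	unsigned int in_flight;		/* submitted but not yet processed */
};

/* Refuse new submissions while iodepth operations are unprocessed. */
static int toy_submit(struct toy_ring *r)
{
	if (r->in_flight >= TOY_IODEPTH)
		return -1;
	r->in_flight++;
	return 0;
}

/* Kernel stand-in: fill the next CQE, then publish it via the tail. */
static void toy_complete(struct toy_ring *r, uint64_t user_data, int32_t res)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	struct toy_cqe *cqe = &r->cqes[tail & TOY_CQ_MASK];

	cqe->user_data = user_data;
	cqe->res = res;
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
}

/*
 * Advance the head past all available CQEs before any of them is read.
 * The acquire load of the tail makes the CQE contents visible. The head
 * store is relaxed: no CQE access has to be ordered before it, because
 * no slot can be reused until toy_submit() succeeds again.
 */
static uint32_t toy_reap(struct toy_ring *r, uint32_t *first)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	*first = head;
	atomic_store_explicit(&r->head, tail, memory_order_relaxed);
	return tail - head;
}

/* Returns the sum of all CQE results, or -1 if the model misbehaves. */
int toy_demo(void)
{
	struct toy_ring r = { 0 };
	uint32_t first, n, i;
	int sum = 0;

	for (i = 0; i < TOY_IODEPTH; i++)
		if (toy_submit(&r) != 0)
			return -1;
	for (i = 0; i < TOY_IODEPTH; i++)
		toy_complete(&r, i, (int32_t)(10 * (i + 1)));

	n = toy_reap(&r, &first);	/* head already advanced here */
	assert(n <= TOY_CQ_ENTRIES);

	/* Unprocessed CQEs still count against iodepth: no new submission. */
	if (toy_submit(&r) == 0)
		return -1;

	/* Read the CQEs only now, after the head store. */
	for (i = 0; i < n; i++) {
		sum += r.cqes[(first + i) & TOY_CQ_MASK].res;
		r.in_flight--;
	}
	return sum;
}
```

In this model toy_demo() reads all four CQEs after the head store and
the extra toy_submit() call is rejected, which is the property the early
head advance relies on.
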
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>