On atomic types (atomic_t, atomic64_t and atomic_long_t).

The atomic type provides an interface to the architecture's means of atomic
RMW operations between CPUs (atomic operations on MMIO are not supported and
can lead to fatal traps on some platforms).

API
---

The 'full' API consists of (atomic64_ and atomic_long_ prefixes omitted for
brevity):

Non-RMW ops:

  atomic_read(), atomic_set()
  atomic_read_acquire(), atomic_set_release()


RMW atomic operations:

Arithmetic:

  atomic_{add,sub,inc,dec}()
  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()


Bitwise:

  atomic_{and,or,xor,andnot}()
  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()


Swap:

  atomic_xchg{,_relaxed,_acquire,_release}()
  atomic_cmpxchg{,_relaxed,_acquire,_release}()
  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()


Reference count (but please see refcount_t):

  atomic_add_unless(), atomic_inc_not_zero()
  atomic_sub_and_test(), atomic_dec_and_test()


Misc:

  atomic_inc_and_test(), atomic_add_negative()
  atomic_dec_unless_positive(), atomic_inc_unless_negative()


Barriers:

  smp_mb__{before,after}_atomic()


TYPES (signed vs unsigned)
-----

While atomic_t, atomic_long_t and atomic64_t use int, long and s64
respectively (for hysterical raisins), the kernel uses -fno-strict-overflow
(which implies -fwrapv) and defines signed overflow to behave like
2s-complement.

Therefore, an explicitly unsigned variant of the atomic ops is strictly
unnecessary and we can simply cast; there is no UB.

There was a bug in UBSAN prior to GCC-8 that would generate UB warnings for
signed types.

With this we also conform to the C/C++ _Atomic behaviour and things like
P1236R1.
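
As a rough userspace illustration of the above (C11 <stdatomic.h>, not the
kernel API), the sketch below increments a signed atomic past INT_MAX and
inspects the result through an unsigned cast. The helper names
add_return_unsigned() and wrap_selftest() are made up for this example. C11
defines atomic arithmetic on signed types to wrap silently, which matches the
behaviour described here.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical helper: add to a signed atomic, view the new value as unsigned. */
static unsigned int add_return_unsigned(atomic_int *v, int i)
{
	/* C11 atomic arithmetic on signed types wraps silently; no UB here. */
	int old = atomic_fetch_add(v, i);

	/* Finish the sum in unsigned arithmetic, which always wraps. */
	return (unsigned int)old + (unsigned int)i;
}

/* Hypothetical self-check: step over INT_MAX and look at both views. */
static void wrap_selftest(void)
{
	atomic_int v;
	unsigned int u;

	atomic_init(&v, INT_MAX);
	u = add_return_unsigned(&v, 1);

	assert(u == (unsigned int)INT_MAX + 1u);	/* unsigned view */
	assert(atomic_load(&v) == INT_MIN);		/* signed view wrapped */
}
```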


SEMANTICS
---------

Non-RMW ops:

The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
smp_store_release() respectively.

The one detail to this is that atomic_set{}() should be observable to the RMW
ops. That is:

  C atomic-set

  {
    atomic_set(v, 1);
  }

  P1(atomic_t *v)
  {
    atomic_add_unless(v, 1, 0);
  }

  P2(atomic_t *v)
  {
    atomic_set(v, 0);
  }

  exists
  (v=2)

In this case we would expect the atomic_set() from P2 to either happen
before the atomic_add_unless(), in which case the latter would be a no-op, or
_after_ it, in which case we'd overwrite its result. In no case is "2" a valid
outcome.

This is typically true on 'normal' platforms, where a regular competing STORE
will invalidate a LL/SC or fail a CMPXCHG.
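
To make the CMPXCHG case concrete, here is a minimal userspace sketch using
C11 <stdatomic.h> rather than the kernel primitives. add_unless() and
replay_race() are hypothetical names; replay_race() replays the problematic
interleaving by hand on a single thread, so it only illustrates the failure
of the compare-and-exchange and is not a real concurrency test.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical analogue of atomic_add_unless(): add @a to @v unless
 * @v == @u. Returns true if the addition happened.
 */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load_explicit(v, memory_order_relaxed);

	do {
		if (old == u)
			return false;
		/* On failure 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(v, &old, old + a));

	return true;
}

/*
 * Replay the interleaving: the RMW reads 1, a competing store of 0
 * lands, then the RMW tries to commit 1 -> 2.
 */
static int replay_race(void)
{
	atomic_int v;
	bool added;
	int old;

	atomic_init(&v, 1);
	old = atomic_load(&v);		/* RMW reads 1 */
	atomic_store(&v, 0);		/* competing store */

	/* The commit fails because v no longer holds the value read. */
	if (atomic_compare_exchange_strong(&v, &old, old + 1))
		return -1;		/* not reached */

	/* The retry sees 0 == u and does nothing. */
	added = add_unless(&v, 1, 0);
	assert(!added);
	(void)added;

	return atomic_load(&v);		/* 0, never 2 */
}
```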

The obvious case where this is not so is when we need to implement atomic ops
with a lock:

  CPU0						CPU1

  atomic_add_unless(v, 1, 0);
    lock();
    ret = READ_ONCE(v->counter); // == 1
						atomic_set(v, 0);
    if (ret != u)				  WRITE_ONCE(v->counter, 0);
      WRITE_ONCE(v->counter, ret + 1);
    unlock();

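Here CPU1's plain store lands between CPU0's read and write and is then
overwritten, leaving v == 2. The typical solution is then to implement
atomic_set{}() with atomic_xchg().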
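
A minimal sketch of such a lock-based implementation, written as userspace C11
and not taken from any architecture, might look as follows. struct
locked_atomic and the la_*() helpers are hypothetical names, and the spinlock
is a bare atomic_flag chosen only to keep the example self-contained. The
point is that la_set() goes through la_xchg() and therefore takes the same
lock as la_add_unless(), so the store can no longer land in the middle of the
locked section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock-based atomic: a plain counter guarded by a spinlock. */
struct locked_atomic {
	atomic_flag lock;
	int counter;
};

static void la_lock(struct locked_atomic *v)
{
	while (atomic_flag_test_and_set_explicit(&v->lock, memory_order_acquire))
		;	/* spin until the flag was clear */
}

static void la_unlock(struct locked_atomic *v)
{
	atomic_flag_clear_explicit(&v->lock, memory_order_release);
}

/* Exchange under the lock; returns the previous value. */
static int la_xchg(struct locked_atomic *v, int val)
{
	int old;

	la_lock(v);
	old = v->counter;
	v->counter = val;
	la_unlock(v);
	return old;
}

/* set is an xchg whose result is discarded, so it serializes on the lock. */
static void la_set(struct locked_atomic *v, int val)
{
	(void)la_xchg(v, val);
}

/* Add @a unless the counter equals @u; true if the addition happened. */
static bool la_add_unless(struct locked_atomic *v, int a, int u)
{
	bool done = false;

	la_lock(v);
	if (v->counter != u) {
		v->counter += a;
		done = true;
	}
	la_unlock(v);
	return done;
}

/* Hypothetical self-check of the two possible orders from the litmus test. */
static void la_selftest(void)
{
	struct locked_atomic a = { ATOMIC_FLAG_INIT, 1 };
	struct locked_atomic b = { ATOMIC_FLAG_INIT, 1 };
	bool added;

	la_set(&a, 0);				/* set first ... */
	added = la_add_unless(&a, 1, 0);	/* ... then the add no-ops */
	assert(!added && a.counter == 0);

	added = la_add_unless(&b, 1, 0);	/* add first ... */
	la_set(&b, 0);				/* ... then set overwrites */
	assert(added && b.counter == 0);
	(void)added;
}
```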


RMW ops:

These come in various forms:

 - plain operations without return value: atomic_{}()

 - operations which return the modified value: atomic_{}_return()

   these are limited to the arithmetic operations because those are
   reversible. Bitops are irreversible and therefore the modified value
   is of dubious utility.

 - operations which return the original value: atomic_fetch_{}()

 - swap operations: xchg(), cmpxchg() and try_cmpxchg()

 - misc; the special purpose operations that are commonly used and would,
   given the interface, normally be implemented using (try_)cmpxchg loops but
   are time critical and can, (typically) on LL/SC architectures, be more
   efficiently implemented.

All these operations are SMP atomic; that is, the operations (for a single
atomic variable) can be fully ordered and no intermediate state is lost or
visible.
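
The different forms can be illustrated with a small userspace sketch in C11
<stdatomic.h>; this is an analogy, not the kernel API. add_return() and
inc_unless_negative() are hypothetical helpers written for this example: the
first derives the modified value from the original one, the second shows the
kind of compare-and-exchange loop a 'misc' operation would otherwise need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical analogue of atomic_add_return(): returns the modified value. */
static int add_return(atomic_int *v, int i)
{
	/* fetch_add yields the original value; addition is reversible. */
	return atomic_fetch_add(v, i) + i;
}

/*
 * Hypothetical analogue of atomic_inc_unless_negative() as a
 * try_cmpxchg-style loop. Overflow at INT_MAX is ignored in this sketch.
 */
static bool inc_unless_negative(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old < 0)
			return false;
		/* On failure 'old' is updated to the current value. */
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));

	return true;
}

static void rmw_forms_selftest(void)
{
	atomic_int v;
	bool ok;

	atomic_init(&v, 5);
	assert(atomic_fetch_add(&v, 3) == 5);	/* original value; v is now 8 */
	assert(add_return(&v, 2) == 10);	/* modified value; v is now 10 */

	/* Bitops lose information: 10 & 2 == 2, but so are 6 & 2 and 3 & 2. */
	assert(atomic_fetch_and(&v, 2) == 10);	/* only the fetch form gives 10 */
	assert(atomic_load(&v) == 2);

	ok = inc_unless_negative(&v);
	assert(ok && atomic_load(&v) == 3);
	(void)ok;
}
```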


ORDERING  (go read memory-barriers.txt first)
--------

The rule of thumb:

 - non-RMW operations are unordered;

 - RMW operations that have no return value are unordered;

 - RMW operations that have a return value are fully ordered;

 - RMW operations that are conditional are unordered on FAILURE,
   otherwise the above rules apply.

Except of course when an operation has an explicit ordering like:

 {}_relaxed: unordered
 {}_acquire: the R of the RMW (or atomic_read) is an ACQUIRE
 {}_release: the W of the RMW (or atomic_set)  is a  RELEASE

Where 'unordered' is against other memory locations. Address dependencies are
not defeated.

Fully ordered primitives are ordered against everything prior and everything
subsequent. Therefore a fully ordered primitive is like having an smp_mb()
before and an smp_mb() after the primitive.
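
As a hedged illustration of the _acquire and _release variants listed above,
the following userspace C11 sketch publishes a plain variable through an
atomic flag. The names payload, ready, publish() and consume() are invented
for this example, and the C11 memory orders only approximate what
atomic_set_release() and atomic_read_acquire() provide in the kernel. The
self-check runs on one thread, so it shows the shape of the pattern and not
the ordering itself.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;		/* plain data, published via 'ready' */
static atomic_int ready;	/* 0 = not published, 1 = published */

static void publish(int value)
{
	payload = value;
	/* Roughly atomic_set_release(): prior accesses stay before the store. */
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Returns the payload, or -1 if nothing has been published yet. */
static int consume(void)
{
	/* Roughly atomic_read_acquire(): later accesses stay after the load. */
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return -1;
	return payload;
}

static void mp_selftest(void)
{
	assert(consume() == -1);	/* flag not yet set */
	publish(42);
	assert(consume() == 42);	/* flag set, payload visible */
}
```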


The barriers:

  smp_mb__{before,after}_atomic()

only apply to the RMW ops and can be used to augment/upgrade the ordering
inherent to the used atomic op. These barriers provide a full smp_mb().

These helper barriers exist because architectures have varying implicit
ordering on their SMP atomic primitives. For example, our TSO architectures
provide fully ordered atomics and these barriers are no-ops.

Thus:

  atomic_fetch_add();

is equivalent to:

  smp_mb__before_atomic();
  atomic_fetch_add_relaxed();
  smp_mb__after_atomic();

However, atomic_fetch_add() might be implemented more efficiently.
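
The closest userspace analogue we can offer is a relaxed C11 RMW bracketed by
sequentially consistent fences. This is only an approximation:
atomic_thread_fence() is not defined in terms of the kernel memory model, and
fetch_add_full() is a hypothetical name used for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: relaxed RMW upgraded with a fence on either side. */
static int fetch_add_full(atomic_int *v, int i)
{
	int old;

	atomic_thread_fence(memory_order_seq_cst);	/* ~ smp_mb__before_atomic() */
	old = atomic_fetch_add_explicit(v, i, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* ~ smp_mb__after_atomic() */

	return old;	/* original value, as with atomic_fetch_add() */
}

static void fetch_add_full_selftest(void)
{
	atomic_int v;

	atomic_init(&v, 1);
	assert(fetch_add_full(&v, 2) == 1);
	assert(atomic_load(&v) == 3);
}
```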

Further, while something like:

  smp_mb__before_atomic();
  atomic_dec(&X);

is a 'typical' RELEASE pattern, the barrier is strictly stronger than
a RELEASE. Similarly, something like:

  atomic_inc(&X);
  smp_mb__after_atomic();

is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:

  C strong-acquire

  {
  }

  P1(int *x, atomic_t *y)
  {
    r0 = READ_ONCE(*x);
    smp_rmb();
    r1 = atomic_read(y);
  }

  P2(int *x, atomic_t *y)
  {
    atomic_inc(y);
    smp_mb__after_atomic();
    WRITE_ONCE(*x, 1);
  }

  exists
  (r0=1 /\ r1=0)

This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
since then:

  P1			P2

			t = LL.acq *y (0)
			t++;
			*x = 1;
  r0 = *x (1)
  RMB
  r1 = *y (0)
			SC *y, t;

is allowed.