[Sorry about the slow response.]
Post by Ivan MaidanskiHello Hans,
I agree about cas_full to be defined as acq_rel | acquire to avoid
performance regression on replacement asm with C11 atomics.
Post by Ivan MaidanskiBut the documentation should be refined and the whole code base should be
reviewed to avoid (or minimize) inconsistencies (e.g. the behavior should
not depend on AO_PREFER_GENERALIZED presence/absence).
Post by Ivan MaidanskiAnd we definitely do not want stability issues in apps using old AO
releases after update to the upcoming one (except for AArch64 which already
uses C11 atomics some of them might have too strict memory ordering
mapping).
Actually I agree that Aarch64 is a problem. We should probably change the
documentation to make it clear that for _acquire and _read the ordering is
between the load and subsequent operations, except in the nop case, where
it orders all prior and all subsequent operations. And similarly for
_release and _write. The Aarch64 implementation already weakens it in this
way.
The alternative is to not use Aarch64 acquire/release operations, and use
DMB instead. But that plays poorly with code that does this right.
Post by Ivan MaidanskiFor operations that don't include a read operation, we seem to map
_full to something like a release (if there's a write) plus a trailing
seq_cst fence.
Post by Ivan MaidanskiI see. Please correct the mapping below.
* read | acquire
* write | release
Yes. In hindsight, we should never have used read and write.
Post by Ivan Maidanski* full | acq_rel or seq_cst? should it match __sync_synchronize?
It should just be seq_cst, which I hope finally matches sync_synchronize.
Nobody should be using the latter.
Post by Ivan Maidanski* none | relax
* acq | acquire
Yes.
Post by Ivan Maidanski* read = load + nop_read
Or better yet, acquire. We should be strengthening the read and write
stuff to the more general acquire/release versions. It almost never makes a
difference and is much less error-prone.
Post by Ivan Maidanski* full = load + nop_full or load + nop_read?
I think I would just map it to seq_cst and clarify the spec.
Post by Ivan Maidanski* anything else worth to map directly to C11 atomics (among existing ao orders)?
I would map dd_acquire_read to acquire. Ideally it should map to consume,
but that's currently broken, and will map to acquire anyway.
Post by Ivan Maidanski* none | relax
* rel | release
* write = nop_write + none
I would again just use release.
Post by Ivan Maidanski* full = release + nop_full or nop_read + release?
* anything else?
* none | relax
* acq | acquire
* rel | release
Yes.
Post by Ivan Maidanski* full | acq_rel or seq_cst? (note that nop_full =
test_and_set_full(&dummy))
I would make it seq_cst. It probably doesn't really matter. We should
explicitly define nop_full to be atomic_thread_fence(memory_order_seq_cst).
test_and_set_full(&dummy)) actually doesn't do the right thing on ARMv8,
where it needs to be "dmb ish".
Post by Ivan Maidanski* anything else?
fetch_and_add: same as test_and_set?
I think so.
Post by Ivan Maidanskiand, or, xor: same as fetch_and_add?
I would change the documentation of _full to
_full: The associated operation is ordered with respect to both earlier and
later memory ops.
If the associated operation is nop, then this orders all earlier
memory operations with respect to subsequent ones.
AO_store_full or AO_nop_full are the normal ways to force a store
to be ordered with respect to a later load.