The memory order of the `store` to `weak` in `is_unique` is over-strict #149376
I expand the code of using `is_unique` and `downgrade` as something like the following, which models that there are three `Arc` instances.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
fn main() {
let strong = AtomicUsize::new(3);
let weak = AtomicUsize::new(1);
thread::scope(|s| {
s.spawn(|| { // t1
if weak
.compare_exchange(1, usize::MAX, Ordering::Acquire, Ordering::Relaxed)
.is_ok()
{
let unique = strong.load(Ordering::Acquire) == 1; // #0
weak.store(1, Ordering::Relaxed); // #1
assert!(unique == false);
}
});
s.spawn(|| { // t2
let mut val = weak.load(Ordering::Relaxed);
loop {
if val == usize::MAX {
val = weak.load(Ordering::Relaxed);
continue;
}
if weak
.compare_exchange(val, val + 1, Ordering::Acquire, Ordering::Relaxed)
.is_ok()
{
break;
}
}
strong.fetch_sub(1, Ordering::Release); // #2
});
s.spawn(|| { // t3
let mut val = weak.load(Ordering::Relaxed);
loop {
if val == usize::MAX {
val = weak.load(Ordering::Relaxed);
continue;
}
if weak
.compare_exchange(val, val + 1, Ordering::Acquire, Ordering::Relaxed)
.is_ok()
{
break;
}
}
strong.fetch_sub(1, Ordering::Release); // #3
});
});
}
Merely change `weak.store(1, Ordering::Release);`(in the original implementation) to `weak.store(1, Ordering::Relaxed);`
Assuming the `CAS` operation in `t1` succeeds, that means the loops in `t2` and `t3` cannot exit except that one thereof reads `#1` and the other reads the value written by the RMW operation that reads `#1`. The question is whether `#0` can read `1` and the assertion fails. Assume that `#0` reads `1`, because the initial value of `strong` is `3`, this implies that `#2` and `#3` both happen before `#0`, which in turn implies that the load part of `weak` in `t2` and `t3` both happen before `#1`. According to [intro.races] p13
> If a value computation A of an atomic object M happens before an operation B that modifies M, then A takes its value from a side effect X on M, where X precedes B in the modification order of M.
`#1` is not visible to the load part of `weak` in `t2` and `t3`, so the loop in `t2` and `t3` cannot exit; this means, `#2` and `#3` cannot be reached by the corresponding control flows in their respective threads. This contradicts the assumption that `#0` reads `1` because `#2` and `#3` won't be executed in the lifetime of the program; otherwise, it would violate [intro.races] p10
> The value of an atomic object M, as determined by evaluation B, is the value stored by some unspecified side effect A that modifies M, where B does not happen before A.
More generally speaking, any thread claiming that its RMW operation on `strong` would be read by `#0` is impossible, because its `CAS` operation on `weak` does not expect `MAX`, and `#1` is not visible to the load part, so its loop cannot exit. And no other RMW operation on `weak` that increases count can precede it; otherwise, the modification order of `weak` is invalid
1 < MAX < weak_store_1 < RMW_weak_a < ... < RMW_weak_current < ... < weak_store_1
So, `#0` cannot read `1` and `2`; however, it can only read the initial value `3`. Is my analysis right? If the target is, as described in the comment of the source code, to check whether the thread holds the unique strong, I think the memory order used for the case is over-strict, and the memory order of the store to `weak` that writes `1` can at least be `relaxed`, IIUC.