Is the C/C++ compiler guaranteed to use the newer atomic instructions?
This PR should add an explicit flag telling Clang to target Armv8.1 or newer, e.g. `-mcpu=apple-m1`. Currently, it relies on my observation that Clang targets something newer than Armv8.0 by default on M1. If Apple decides to change that default internally, we might end up in a situation where these compiler intrinsics are lowered to Armv8.0 instructions without the memory barrier, which means potential non-reproducible race conditions somewhere in the VM.
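A minimal sketch of a compile-time guard that would catch a changed default, assuming a GCC- or Clang-compatible compiler. `__ARM_FEATURE_ATOMICS` is the ACLE feature macro for the Armv8.1 LSE atomics; `kTargetsLse` and `CompareExchange` are hypothetical names, not code from this PR. Note that the guard only confirms the target feature set. It does not prove which instruction the compiler selects at a given call site.

```cpp
#include <atomic>
#include <cstdint>

// ACLE defines __ARM_FEATURE_ATOMICS when the compilation target includes
// the Armv8.1 LSE atomic instructions (for example with -march=armv8.1-a
// or -mcpu=apple-m1). Fail the build on macOS arm64 if that ever stops
// being true, instead of relying on the toolchain default.
#if defined(__APPLE__) && defined(__aarch64__) && !defined(__ARM_FEATURE_ATOMICS)
#error "macOS arm64 build does not target LSE atomics; pass -mcpu=apple-m1 or -march=armv8.1-a"
#endif

// Hypothetical constant: true when the target feature set includes LSE.
constexpr bool kTargetsLse =
#if defined(__ARM_FEATURE_ATOMICS)
    true;
#else
    false;
#endif

// Hypothetical helper standing in for the VM's interlocked primitive.
// Returns the value that was stored in dest before the operation.
inline int32_t CompareExchange(std::atomic<int32_t>& dest,
                               int32_t exchange,
                               int32_t comparand) {
    // On failure, compare_exchange_strong writes the observed value into
    // comparand; on success comparand already equals the old value.
    dest.compare_exchange_strong(comparand, exchange, std::memory_order_seq_cst);
    return comparand;
}
```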
How does passing in `-mcpu=apple-m1` guarantee that the compiler will only ever use the new instructions?
LLVM maps `apple-m1` to `ARMV8_5A`, as seen in https:/llvm/llvm-project/blob/5ba0a9571b3ee3bc76f65e16549012a440d5a0fb/llvm/include/llvm/Support/AArch64TargetParser.def#L256-L257. However, I think the concern is valid, and the foolproof way to address it is to check explicitly, the way it is done for the Windows counterpart in #70921. I am working on a PR that will add a similar check for linux-arm64 (reason stated in #70921 (comment)), so it should take care of these things for OSX as well.
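For illustration, one way an explicit runtime check could look on Linux and macOS. This is a sketch and is not taken from #70921 or the follow-up PR. `CpuHasLseAtomics` is a hypothetical name; the sketch assumes the Linux `HWCAP_ATOMICS` auxiliary-vector bit and the macOS `hw.optional.armv8_1_atomics` sysctl key are the right signals for LSE support.

```cpp
#include <cstddef>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>    // getauxval, AT_HWCAP
#include <asm/hwcap.h>   // HWCAP_ATOMICS
#elif defined(__aarch64__) && defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>  // sysctlbyname
#endif

// Hypothetical helper: reports whether the running CPU implements the
// Armv8.1 LSE atomic instructions. Intended to run once at startup so the
// process can refuse to start on hardware the binary was not built for.
inline bool CpuHasLseAtomics() {
#if defined(__aarch64__) && defined(__linux__)
    // The kernel advertises LSE through the HWCAP_ATOMICS bit.
    return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#elif defined(__aarch64__) && defined(__APPLE__)
    int value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname("hw.optional.armv8_1_atomics", &value, &size, nullptr, 0) != 0) {
        return false;  // key missing or query failed: treat as unsupported
    }
    return value != 0;
#else
    return false;  // not AArch64 (or an OS this sketch does not cover)
#endif
}
```

A startup path could call `CpuHasLseAtomics()` and fail fast with a clear message when it returns false on an AArch64 build that assumes LSE.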
I think inline asm solves all problems here (might be tricky with templates)
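A rough sketch of the inline asm route for the 32-bit case, assuming GCC/Clang extended asm syntax. `CompareExchangeLse` is a hypothetical name. The asm path is only compiled when the target declares LSE via `__ARM_FEATURE_ATOMICS`; the builtin fallback exists only so the sketch builds on other targets.

```cpp
#include <cstdint>

// Hypothetical helper: 32-bit compare-and-swap that always uses CASAL on
// AArch64 targets with LSE. Returns the value stored in *dest before the
// operation.
inline int32_t CompareExchangeLse(volatile int32_t* dest,
                                  int32_t exchange,
                                  int32_t comparand) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_ATOMICS)
    int32_t old = comparand;
    // casal Ws, Wt, [Xn]: compares *Xn with Ws, stores Wt on a match, and
    // always leaves the previous memory value in Ws. The "al" suffix gives
    // acquire and release semantics. "Q" is a memory operand addressed by a
    // single base register with no offset, which is what CAS requires.
    __asm__ __volatile__(
        "casal %w[old], %w[xchg], %[mem]"
        : [old] "+r"(old), [mem] "+Q"(*dest)
        : [xchg] "r"(exchange)
        : "memory");
    return old;
#else
    // Fallback for other targets, so the sketch compiles everywhere.
    __atomic_compare_exchange_n(dest, &comparand, exchange, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return comparand;
#endif
}
```

The template difficulty shows up here: the mnemonic and register width depend on the operand size (`casalb`, `casalh`, `%w` for 32-bit, `%x` for 64-bit), so a generic version would likely need one specialization per size.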
Alternatively we can write a small test that validates that the intrinsic is lowered into LSE 🤷
I do not think you can reliably test for this. For example, you may see the old instruction only when there is a certain addressing mode needed or only when the code is cold.
Does it use `casal` in debug builds? Can it switch to the old LL/SC helper because of register pressure, or if the old implementation is one day found to be faster (it could be)?
It feels like inline asm could give more reliable guarantees.