[JIT] Enable EGPRs in JIT by adding REX2 encoding to the backend. #106557
Closed
Update comments.
Merge the REX2 changes into the original legacy emit path
bug fix: Set REX2.W with correct mask code.
register encoding and prefix emitting logic.
Add REX2 prefix emit logic
bug fixes
Add Stress mode for REX2 encoding and some bug fixes
resolve comments: 1. add assertion check for UD opcodes. 2. add checks for EGPRs.
Add REX2 to emitOutputAM, and let LEA be REX2 compatible.
Add REX2.X encoding for SIB byte
Bug fixes: add REX2 prefix on the path in RI where MOV is specially handled.
Enable REX2 encoding for `movups`
fixed bugs in REX2 prefix emitting logic when working with map 1 instructions, and enabled REX2 for POPCNT
legacy map index-er
bug fixes
some clean-up
Adding initial APX unit testing path.
Adding a coredistools dll that has LLVM APX disasm capability. It must be copied into a CORE_ROOT manually.
clean up work for REX2
narrow the REX2 scope to `sub` only
some clean up based on the comments.
bug fix
resolve comment
- SV path is mostly for debugging purposes
Added encoding unit tests for instructions with immediates
Code refactoring: AddX86PrefixIfNeeded.
… missing in JIT, may indicate these instructions are not being used in JIT, drop them for now.
…lled before adding any prefix.
Refactor REX2 encoding stress logic.
(This has the side effect that the estimated code size goes up and no longer matches the actual code size.)
dotnet-issue-labeler (bot) added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Aug 16, 2024.
dotnet-policy-service (bot) added the community-contribution label (Indicates that the PR has been added by a community member) on Aug 16, 2024.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
BruceForstall added the apx label (Related to the Intel Advanced Performance Extensions (APX)) on Sep 5, 2024.
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.
Labels
apx: Related to the Intel Advanced Performance Extensions (APX)
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
community-contribution: Indicates that the PR has been added by a community member
Overview
This PR is the follow-up to #104637, which added the initial CPUID and XSAVE updates for APX.
This PR adds REX2 encoding functionality for legacy instructions, which enables the use of EGPRs for `add`, `sub`, etc. Note that this PR focuses on REX2 encoding only: a follow-up PR will enable EGPR support via the register allocator.
Specification
REX2 is a 2-byte prefix with a leading byte of `0xD5`, followed by a payload byte.
Similar to the REX prefix, it provides the extension bits for the MODRM.REG field (REX2.R4/R3), the MODRM.R/M field (REX2.B4/B3), and the index register in the SIB byte (REX2.X4/X3). These bits act as the 5th and 4th bits of the register number and combine with the 3-bit field in the MODRM or SIB byte to form a 5-bit value that can address up to 32 registers.
The REX2 prefix is generally available for legacy-map-0 and legacy-map-1 instructions, that is, 1-byte opcodes and 2-byte opcodes with the escape byte 0x0F, with some exceptions.
Like VEX/EVEX, REX2 is treated as the last prefix before the main opcode, so it cannot co-exist with REX/VEX/EVEX.
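To make the bit packing concrete, here is a minimal C++ sketch of how the two prefix bytes could be assembled. It is not the emitter code from this PR: `Rex2Fields`, `EncodeRex2`, and `EncodeModRMRegDirect` are hypothetical names, and the payload layout (M0 R4 X4 B4 W R3 X3 B3, from bit 7 down to bit 0) is taken from the published APX specification as I read it, so it should be checked against the spec before being relied on.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative only: these names are hypothetical and are not the JIT emitter's API.
struct Rex2Fields
{
    std::uint8_t reg;   // register number for MODRM.REG (0..31)
    std::uint8_t rm;    // register number for MODRM.R/M or SIB.base (0..31)
    std::uint8_t index; // register number for SIB.index (0..31), 0 when unused
    bool         w;     // REX2.W: 64-bit operand size
    bool         map1;  // REX2.M0: legacy map 1 (stands in for the 0x0F escape)
};

constexpr std::uint8_t kRex2LeadByte = 0xD5;

// Builds the two prefix bytes. Assumed payload layout, bit 7 down to bit 0:
// M0 R4 X4 B4 W R3 X3 B3.
inline std::array<std::uint8_t, 2> EncodeRex2(const Rex2Fields& f)
{
    assert(f.reg < 32 && f.rm < 32 && f.index < 32);

    unsigned payload = 0;
    payload |= (f.map1 ? 1u : 0u) << 7;    // M0
    payload |= ((f.reg >> 4) & 1u) << 6;   // R4
    payload |= ((f.index >> 4) & 1u) << 5; // X4
    payload |= ((f.rm >> 4) & 1u) << 4;    // B4
    payload |= (f.w ? 1u : 0u) << 3;       // W
    payload |= ((f.reg >> 3) & 1u) << 2;   // R3
    payload |= ((f.index >> 3) & 1u) << 1; // X3
    payload |= ((f.rm >> 3) & 1u) << 0;    // B3

    return {kRex2LeadByte, static_cast<std::uint8_t>(payload)};
}

// Register-direct MODRM byte (mod = 11b); only the low 3 bits of each
// register number are stored here, the upper two live in the REX2 payload.
inline std::uint8_t EncodeModRMRegDirect(std::uint8_t reg, std::uint8_t rm)
{
    return static_cast<std::uint8_t>(0xC0u | ((reg & 7u) << 3) | (rm & 7u));
}
```

Under these assumptions, a 64-bit `add r16, r17` using opcode `0x01` (destination in MODRM.R/M, source in MODRM.REG) would have the prefix bytes `D5 58` and the MODRM byte `C8`.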
Design
The bulk of the changes occur in the backend emitter.
As there is no existing hardware with APX support yet, we added some hacks to bypass the CPUID checks. In this PR, `DOTNET_JitStressRex2Encoding` forces all eligible instructions to be encoded with REX2, regardless of whether any operand is an EGPR. There is another switch, `DOTNET_JitBypassAPXCheck`, which only bypasses the APX CPUID check; the JIT then emits REX2 only when needed. This switch will be more useful once the LSRA changes land.
Testing
We followed a multi-step testing plan to verify the encoding correctness and the semantic correctness.
Testing results will be presented below.
1. Emitter unit tests
In `codegenxarch.cpp`, similar to `genAmd64EmitterUnitTestsSse2`, we used the `JitLateDisasm` feature to insert instructions to encode as emitter unit tests. `LateDisasm` invokes LLVM to disassemble the code stream, which lets us cross-validate the disassembly from the JIT against LLVM. The goal of this step is to verify that the emit paths generate "correct" code that does not trigger #UD or have the wrong semantics.
Note that we are using a custom `coredistools.dll` which uses a recent LLVM that supports APX decoding.
2. SuperPMI
In this step, we run the SuperPMI pipeline to get the asmdiffs with REX2 on and off, using all the MCH files as inputs. This step lets us check for any assertion failure or internal error within the JIT, and since the pipeline invokes `coredistools.dll` as well, we can verify the encoding correctness at a larger scope.
To ensure the new changes do not regress throughput on the existing code paths, we ran tpdiff with the base JIT being the main branch that the changes are based on, and the diff JIT being the one with all the REX2 changes.
3. JIT unit tests
The two steps above mainly verify the encoding correctness of the generated binary code. The last step examines the semantic correctness of the generated code: since we are simply forcing all compatible instructions to be encoded with REX2, the original semantics should not change, so we expect exactly the same output with REX2 on and off.
We used the existing CoreCLR unit test set, `JIT`, and ran it in the Intel SDE emulator.
Follow-up plans
This PR is only intended to provide the REX2 encoding functionality to the JIT backend. As for how to use it properly, we are preparing another PR that includes the LSRA updates, so that the JIT will allocate EGPRs only when needed and generate optimal code.