
Enable AVX512 Additional 16 SIMD Registers #79544

Merged
merged 36 commits into dotnet:main on Feb 8, 2023

Conversation

anthonycanino
Contributor

@anthonycanino anthonycanino commented Dec 12, 2022

This PR enables the 16 additional AVX-512 SIMD registers for use by the register allocator and the codegen emitter. The PR largely consists of the following:

  1. Extend the xarch regMask type from uint32 to uint64 to make room for the additional registers.
  2. Add XMM16-XMM31 to the register definitions (targetamd64.h).
  3. Adjust the RBM macros to refer to globally defined variables (float register availability, caller/callee-save register sets, etc.) that can be initialized at runtime, so the upper SIMD registers can be added to the definitions when AVX-512 is available. Previously, the bank of registers was determined entirely by the architecture; it is now determined by both the architecture and ISA availability.
  4. Add a new mode, JitStressRegs=4, which forces an upper SIMD register to be selected during register allocation.
  5. Ensure that an upper SIMD register is selected for a gentree node only if the node can be lowered to an instruction that supports EVEX encoding.
  6. Refactor parts of the instruction-description backend to pass around an instrDesc struct instead of a bare instruction when determining whether an EVEX prefix is needed. Previously, the instruction itself was enough to decide between SSE/VEX; now we require more information about how the instruction is used, e.g., whether it uses an upper SIMD register.
  7. Add logic to the emitter to set the proper bits in the EVEX prefix for the upper SIMD registers (a rough sketch follows this list).
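
As a point of reference for (7): registers XMM16-XMM31 need a fifth register-number bit, which only the EVEX prefix can carry. A rough, hedged sketch of where those bits live (names and structure are illustrative, not the emitter's actual code):

```cpp
// Illustrative sketch only. EVEX stores these bits inverted, so a register >= 16 is
// indicated by *clearing* the corresponding bit in the prefix payload.
void EncodeEvexUpperRegBits(uint8_t evex[4], regNumber reg, regNumber rmReg, regNumber vvvvReg)
{
    // evex[1] layout: [ R X B R' 0 0 m m ]   evex[3] layout: [ z L' L b V' a a a ]
    if (reg >= REG_XMM16)
    {
        evex[1] &= ~0x10; // EVEX.R' carries bit 4 of the ModRM.reg register
    }
    if (rmReg >= REG_XMM16)
    {
        evex[1] &= ~0x40; // EVEX.X carries bit 4 of a register ModRM.rm operand
    }
    if (vvvvReg >= REG_XMM16)
    {
        evex[3] &= ~0x08; // EVEX.V' carries bit 4 of the vvvv operand
    }
}
```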

Of all the points above, (5) requires a little more elaboration. When the LSRA constraints are built up for a node, typically RBM_NONE is passed as the constraint mask for a ref interval, which eventually leads to the LSRA selecting the set of available registers for the ref interval's type. For nodes whose lowering does not have an available EVEX form (see the commit message), one would like to set the constraint mask to RBM_LOWFLOAT or an equivalent set of registers; however, this came into conflict with the way contained nodes are handled during lowering (and comments even state that the constraint mask feature will not work if contained nodes do not have the same type as the containing node).

To avoid a lot of rewriting, I created a special RBM_LOWSIMD tag in the regMask type which functions much like RBM_NONE, except that during the creation of a new ref position (LinearScan::newRefPosition) it selects the low SIMD registers if the interval is a floating-point/SIMD type; otherwise it falls back to the RBM_NONE behavior.
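
A minimal sketch of that behavior (the helper name and signature are made up for illustration; the real change lives inside LinearScan::newRefPosition):

```cpp
// Sketch only: resolve the RBM_LOWSIMD sentinel when a RefPosition is created.
regMaskTP resolveLowSimdMask(Interval* theInterval, regMaskTP mask)
{
    if (mask == RBM_LOWSIMD)
    {
        if (varTypeIsFloating(theInterval->registerType) || varTypeIsSIMD(theInterval->registerType))
        {
            mask = RBM_LOWFLOAT; // constrain float/SIMD intervals to XMM0-XMM15
        }
        else
        {
            mask = RBM_NONE; // everything else behaves exactly like RBM_NONE
        }
    }
    return mask;
}
```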

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 12, 2022
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Dec 12, 2022
@ghost

ghost commented Dec 12, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

PR is a draft for testing purposes. Will update with a full description of the changes when ready for review.

Author: anthonycanino
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@kunalspathak
Member

Probably we need to figure out first how to fix the regressions showing up in windows/x86:

[image: TP diff results]

{
case IF_RWR_RRD_RRD:
case IF_RWR_RRD_RRD_CNS:
case IF_RWR_RRD_RRD_RRD:
Member

@tannergooding tannergooding Dec 14, 2022

Is it expected that this one is here and in idHasReg4?

Could we maybe simplify it to default: return idHasReg4()?

@@ -78,10 +78,16 @@
#endif // !UNIX_AMD64_ABI
#define CSE_CONSTS 1 // Enable if we want to CSE constants

#define RBM_ALLFLOAT (RBM_XMM0 | RBM_XMM1 | RBM_XMM2 | RBM_XMM3 | RBM_XMM4 | RBM_XMM5 | RBM_XMM6 | RBM_XMM7 | RBM_XMM8 | RBM_XMM9 | RBM_XMM10 | RBM_XMM11 | RBM_XMM12 | RBM_XMM13 | RBM_XMM14 | RBM_XMM15)
#define RBM_LOWFLOAT (RBM_XMM0 | RBM_XMM1 | RBM_XMM2 | RBM_XMM3 | RBM_XMM4 | RBM_XMM5 | RBM_XMM6 | RBM_XMM7 | RBM_XMM8 | RBM_XMM9 | RBM_XMM10 | RBM_XMM11 | RBM_XMM12 | RBM_XMM13 | RBM_XMM14 | RBM_XMM15 )
#define RBM_HIGHFLOAT (RBM_XMM16 | RBM_XMM17 | RBM_XMM18 | RBM_XMM19 | RBM_XMM20 | RBM_XMM21 | RBM_XMM22 | RBM_XMM23 | RBM_XMM24 | RBM_XMM25 | RBM_XMM26 | RBM_XMM27 | RBM_XMM28 | RBM_XMM29 | RBM_XMM30 | RBM_XMM31)
Member

I wonder if we could come up with a better name than "HIGHFLOAT"

From the initial 8 SIMD registers, we've had two extensions so far, and it's conceivable a third could happen in the future.

Contributor Author

I've used "high" to refer to the 16 AVX512 registers mostly because that is the term reference in the Intel manual, with "next" being used to reference the 8 AVX registers.

I'm in favor of a better naming scheme, but not certain what to choose. We could select RBM_AVX512FLOAT or something along those lines...

Contributor Author

@tannergooding any more thoughts on this?

Member

I don't have a particular preference and I don't think it's "strictly" necessary to update. More a passing thought about how it might be confusing.

@anthonycanino anthonycanino marked this pull request as ready for review December 14, 2022 18:05
@anthonycanino
Contributor Author

Probably we need to figure out first how to fix the regressions showing up in windows/x86:

[image: TP diff results]

I am investigating why we see a throughput hit on x86; it wasn't quite what I thought it was earlier.

@BruceForstall
Member

@anthonycanino Looks like you need to fix the formatting.

Diffs -- no diffs but reasonably significant TP regressions (~0.5%) on x64.

@anthonycanino
Contributor Author

Do you want to include https://github.com/dotnet/runtime/compare/main...BruceForstall:EnableJitStressHighAvx512Regs?expand=1 in this change?

Yes, I think that would be a good idea. Do I merge it into this PR or do we do that separately?

@BruceForstall
Member

Yes, I think that would be a good idea. Do I merge it into this PR or do we do that separately?

Include the changes as a new commit. Doesn't need to be a branch merge from mine or anything like that.

@BruceForstall
Member

Looks like this is likely to conflict with #79478

@anthonycanino
Contributor Author

Looks like this is likely to conflict with #79478

I've rebased and pushed to test now.

src/coreclr/jit/compiler.h (resolved)
src/coreclr/jit/compiler.h (resolved)
src/coreclr/jit/compiler.cpp (resolved)
Comment on lines 3329 to 3340
#if defined(TARGET_AMD64)
rbmAllFloat = RBM_ALLFLOAT_INIT;
rbmFltCalleeTrash = RBM_FLT_CALLEE_TRASH_INIT;
rbmCntCalleeTrashFloat = CNT_CALLEE_TRASH_FLOAT_INIT;

if (DoJitStressEvexEncoding())
{
rbmAllFloat |= RBM_HIGHFLOAT;
rbmFltCalleeTrash |= RBM_HIGHFLOAT;
rbmCntCalleeTrashFloat += 16;
}
rbmCalleeTrash = RBM_CALLEE_TRASH_INIT;
rbmCalleeTrashNoGC = RBM_CALLEE_TRASH_NOGC_INIT;
rbmCalleeTrashWriteBarrier = RBM_CALLEE_TRASH_WRITEBARRIER_INIT;
rbmCalleeGCTrashWriteBarrier = RBM_CALLEE_GCTRASH_WRITEBARRIER_INIT;
rbmCalleeTrashWriteBarrierByref = RBM_CALLEE_TRASH_WRITEBARRIER_BYREF_INIT;
rbmCalleeGCTrashWriteBarrierByref = RBM_CALLEE_GCTRASH_WRITEBARRIER_BYREF_INIT;
rbmStopForGCTrash = RBM_STOP_FOR_GC_TRASH_INIT;
rbmInitPInvokeFrameTrash = RBM_INIT_PINVOKE_FRAME_TRASH_INIT;
rbmProfilerEnterTrash = RBM_PROFILER_ENTER_TRASH_INIT;
rbmProfilerLeaveTrash = RBM_PROFILER_LEAVE_TRASH_INIT;
rbmProfilerTailcallTrash = RBM_PROFILER_TAILCALL_TRASH_INIT;
#endif // TARGET_AMD64
Member

So these are global variables now, that are initialized every time the JIT is initialized?

Can we assume the memory write is atomic? Should they be initialized using InterlockedExchange64? Using Interlocked, we could assert they either have the same value as written, or zero (uninitialized).

I presume they are not on Compiler because either (1) it was inconvenient to plumb a Compiler* everywhere they are needed, (2) throughput, or (3) you couldn't/didn't want to use the TLS Compiler* (maybe only available in DEBUG?)

Could there be a scenario where these change for a single clrjit.dll instance? E.g., what if crossgen2/AOT compilation loads the JIT and compiles both an AVX-512 targeting function and a non-AVX-512 targeting function?

Currently, they are only changed with the JitStress mode. They will presumably be set based on ISA. Does this initialization location have all the appropriate ISA information?
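
Regarding the InterlockedExchange64 question above, a sketch of what that could look like (illustrative only, assuming the masks have been widened to 64 bits):

```cpp
// Sketch only: publish the mask once and assert any prior value was zero or identical.
regMaskTP newValue = RBM_ALLFLOAT_INIT;
if (DoJitStressEvexEncoding())
{
    newValue |= RBM_HIGHFLOAT;
}

LONG64 oldValue = InterlockedExchange64((LONG64*)&rbmAllFloat, (LONG64)newValue);
assert((oldValue == 0) || (oldValue == (LONG64)newValue));
```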

Contributor Author

So these are global variables now, that are initialized every time the JIT is initialized?

Can we assume the memory write is atomic? Should they be initialized using InterlockedExchange64? Using Interlocked, we could assert they either have the same value as written, or zero (uninitialized).

I presume they are not on Compiler because either (1) it was inconvenient to plumb a Compiler* everywhere they are needed, (2) throughput, or (3) you couldn't/didn't want to use the TLS Compiler* (maybe only available in DEBUG?)

Yes, this is why --- passing compiler around was both clunky and also incurred its own overhead because it needed to be checked on each access.

Could there be a scenario where these change for a single clrjit.dll instance? E.g., what if crossgen2/AOT compilation loads the JIT and compiles both an AVX-512 targeting function and a non-AVX-512 targeting function?

I could use some help here, because honestly I'm not familiar enough with the internals to understand this. As it's done now, I am assuming that these are unchanged and compiling for a single arch/ISA.

Currently, they are only changed with the JitStress mode. They will presumably be set based on ISA. Does this initialization location have all the appropriate ISA information?

The compiler object has this information as far as I know. Essentially, I would envision replacing this with a compOpportunisticallyDependsOn(InstructionSet_AVX512F_VL) check.

@tannergooding @kunalspathak are my assumptions here correct? Will this be an issue with crossgen?
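
A sketch of the ISA-gated initialization being described (illustrative only; it mirrors the snippet quoted above rather than the final code):

```cpp
// Sketch only: gate the upper-register masks on actual ISA support rather than JitStress.
rbmAllFloat            = RBM_ALLFLOAT_INIT;
rbmFltCalleeTrash      = RBM_FLT_CALLEE_TRASH_INIT;
rbmCntCalleeTrashFloat = CNT_CALLEE_TRASH_FLOAT_INIT;

if (compOpportunisticallyDependsOn(InstructionSet_AVX512F))
{
    rbmAllFloat            |= RBM_HIGHFLOAT;
    rbmFltCalleeTrash      |= RBM_HIGHFLOAT;
    rbmCntCalleeTrashFloat += 16;
}
```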

Member

@tannergooding tannergooding Jan 6, 2023

I think it needs to be filtered on compOpportunisticallyDependsOn(InstructionSet_AVX512F), but I agree that I would expect crossgen to work correctly presuming that.

I do think this needs to be "per compiler instance" and not part of the global state as we can have AltJits or other scenarios where one compiler supports an ISA but another does not.

If we could cheapen this further, such as pulling the relevant state into a local where the register allocator needs it, that would presumably be "better".

If the TP regression is "too much" even with that, I could imagine a few ways we could use a template or inlining to ensure the LSRA code still uses a constant and is primarily filtered from a centralized high level check (of course, this comes at the cost of increased deployment size for clrjit.dll).
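
One hedged sketch of the template idea (all names here are illustrative, not a proposed API):

```cpp
// Sketch only: let each instantiation see a compile-time-constant register set,
// selected once by a centralized check.
template <bool hasEvexRegs>
void LinearScan::allocateRegistersImpl()
{
    const regMaskTP allFloat = hasEvexRegs ? (RBM_LOWFLOAT | RBM_HIGHFLOAT) : RBM_LOWFLOAT;
    // ... allocation loop uses allFloat as a constant ...
}

void LinearScan::allocateRegisters()
{
    if (evexIsSupported) // hypothetical centralized ISA/stress check
    {
        allocateRegistersImpl<true>();
    }
    else
    {
        allocateRegistersImpl<false>();
    }
}
```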

Member

If we could cheapen this further, such as pulling the relevant state into a local where the register allocator needs it

The problem is that it is not just LSRA that depends on them, but the optimizer and codegen as well. Hence it is a little hard to extract that out just at the LSRA level.

Member

I presume they are not on Compiler because either (1) it was inconvenient to plumb a Compiler* everywhere they are needed, (2) throughput, or (3) you couldn't/didn't want to use the TLS Compiler* (maybe only available in DEBUG?)

The reason was #1, but what you have brought up is a legitimate concern. The only way to avoid it is to have them on the compiler object and then pass the compiler instance around. Depending on which areas are using these macros, we could do something like the following: you have a macro (per file, since there are not that many) which takes the compiler object, but at all the places where it is needed, you use a version of the macro that calls the one taking the compiler object. That way, at the call site, you don't have to pass the compiler object. It's ugly :(

#define ABC_WITH_COMPILER(compiler) ((compiler)->isEvex ? x : y)
#define ABC ABC_WITH_COMPILER(this)

... = ABC; // call site does not need to pass the compiler object

Contributor Author

@kunalspathak @tannergooding do you have any more thoughts on how to adjust the PR for this?

Member

Did you try the approach in #79544 (comment) and #79544 (comment)?

Contributor Author

Ok, let me try this approach #79544 (comment) then and see how the numbers look.

Contributor Author

@kunalspathak I have a draft of this up at 9c386bc . Currently waiting for test data / throughput.

To be honest, I do not like it; it seems really messy (even if we move the macro definitions to the call sites). The alternatives don't seem great either.

Contributor Author

FYI, with the change 9c386bc ...

[image: TP diff results]

It seems like most approaches have had a pretty similar throughput impact.

src/coreclr/jit/compiler.cpp (resolved)
src/coreclr/jit/hwintrinsic.h (resolved)
src/coreclr/jit/hwintrinsic.h (resolved)
src/coreclr/jit/hwintrinsic.h (resolved)
src/coreclr/jit/target.h (resolved)
src/coreclr/jit/emitxarch.cpp (resolved)
@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Dec 15, 2022
Member

@kunalspathak kunalspathak left a comment

Added some comments for code around LSRA.

@@ -214,7 +214,8 @@ inline ssize_t emitter::emitGetInsAmdAny(instrDesc* id)

/*static*/ inline void emitter::emitEncodeCallGCregs(regMaskTP regmask, instrDesc* id)
{
assert((regmask & RBM_CALLEE_TRASH) == 0);
// TODO-XARCH-AVX512 global defined in compiler.h, not in scope here
Member

fix and uncomment?

src/coreclr/jit/targetamd64.h (resolved)
src/coreclr/jit/targetamd64.h (resolved)
@@ -1900,21 +1905,30 @@ int LinearScan::BuildIntrinsic(GenTree* tree)
break;
}
assert(tree->gtGetOp2IfPresent() == nullptr);

// TODO-XARCH-AVX512 this is overly constraining register available as NI_System_Math_Abs
Member

Can we have this extracted out into a separate method (along with the comment) instead of having this check in multiple places?

Member

Also, shouldn't this be under an isEvexCompatible check instead of just using RBM_LOWSIMD for AMD64?

Contributor Author

In this case, it's the NamedIntrinsics and not the HWIntrinsics, i.e., those defined in namedintrinsiclist.h.

I suppose I can try to add a similar marking for these, but as far as I know there is not a set of flags for these named intrinsics?

srcCount += isRMW ? BuildDelayFreeUses(op3, op1) : BuildOperandUses(op3);
regMaskTP op3RegCandidates = RBM_NONE;
#if defined(TARGET_AMD64)
if (!isEvexCompatible)
Member

This is how I would expect the above code to look, too.

if (mask == RBM_LOWSIMD)
{
// Constrain if we have to for float/simd types
if (varTypeIsFloating(theInterval->registerType) || varTypeIsSIMD(theInterval->registerType))
Member

This is a heavy check to add in newRefPosition. Is there a way to move this check (you can have this logic in a helper method or something) to the relevant callers which would definitely pass mask == RBM_LOWSIMD?

Member

At the callsite, since you already have the GenTree* for the RefPosition, you can do something like this:

regMaskTP mask = RBM_NONE;
if (varTypeIsFloating(node) || varTypeIsSIMD(node))
{
    mask = lowSIMDRegs();
}

BuildUse(node, mask);

Contributor Author

I believe there was a case that specifying the mask at a BuildUse call site did not cover, something related to a contained GT_INDIR node whose register set was not the same as its containing node's.

Let me see if I can find the example.

Contributor Author

With this change (6cd4d43), we can see the following example cause some potential issues:

[MethodImpl(MethodImplOptions.NoInlining)]
static bool Test_Sse41_TestZ_Normal(in Vector128<Int32> x, in Vector128<Int32> y)
{
    return Sse41.TestZ(x, y);
}

which lowers to....

N001 (  1,  1) [000000] -----------                    t0 =    LCL_VAR   byref  V00 arg0         u:1 (last use) $80
                                                            /--*  t0     byref  
N002 (  7,  5) [000001] ---XG------                    t1 = *  IND       simd16
N003 (  1,  1) [000002] -----------                    t2 =    LCL_VAR   byref  V01 arg1         u:1 (last use) $81
                                                            /--*  t2     byref  
N004 (  7,  5) [000003] -c-XG------                    t3 = *  IND       simd16
                                                            /--*  t1     simd16 
                                                            +--*  t3     simd16 
N005 ( 15, 11) [000004] ---XG------                         *  HWINTRINSIC void   int PTEST <l:$243, c:$242>

Since TestZ's associated lowered instruction ptest does not have an EVEX form, we restrict its operands (which are both float variables that are contained) to the lower SIMD registers. I believe this creates an issue, though, since if we specify constraints on t3 = IND they also propagate to its child node t2 = LCL_VAR byref ..., but the nodes have different variable types. Previously, RBM_NONE would have been passed down, which would let newRefPosition select a set of registers based on the ref position's type.

Does this make sense? Essentially, RBM_LOWSIMD was a version of RBM_NONE to get around this issue.

Member

Hmm, I thought you were planning to remove it. Did you forget to push?

@kunalspathak
Member

Latest TP numbers:

[image: TP diff results]

@anthonycanino
Contributor Author

@dotnet/avx512-contrib can I get a rerun of the avx512 stress pipelines?

@tannergooding
Member

Requeued the failing jobs.

@anthonycanino
Contributor Author

Latest TP numbers:

[image: TP diff results]

What are our thoughts on this? Are we in an acceptable range?

@tannergooding
Member

tannergooding commented Feb 7, 2023

Just worth noting that between November and now we've gotten multiple TP improvements in, including some in the -0.4% range, so we likely won't be regressing anything as compared to .NET 7.

There are likely more TP improvements to be had for this code of course, but it's potentially something we can iterate on separately.

@kunalspathak
Member

but it's potentially something we can iterate on separately.

I agree.

@anthonycanino
Contributor Author

So are we ok with the changes now?

@BruceForstall
Member

BruceForstall commented Feb 8, 2023

Can you remind me why there is any TP impact on x86?

[Edit] I suppose that could be due to the instruction ins => const instrDesc* id parameter change, for example.

@anthonycanino
Contributor Author

Can you remind me why there is any TP impact on x86?

[Edit] I suppose that could be due to the instruction ins => const instrDesc* id parameter change, for example.

Yes, this is what I think.
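
For context, a hedged sketch of the kind of query the ins => instrDesc* change affects (illustrative, not the exact diff):

```cpp
// Sketch only: deciding on an EVEX prefix now needs the operands, not just the opcode,
// so hot emitter paths pay an extra indirection per query, including on x86 where the
// upper registers are never available.
bool emitter::TakesEvexPrefix(const instrDesc* id) const
{
    instruction ins = id->idIns(); // previously the caller passed `ins` directly

    if (!IsEvexEncodableInstruction(ins))
    {
        return false;
    }

    // The EVEX form is required whenever an operand lives in XMM16-XMM31,
    // information only the instrDesc carries.
    return HasHighSIMDReg(id) || emitComp->DoJitStressEvexEncoding();
}
```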

@BruceForstall
Member

It looks like on Linux (and Mac?) we'll need additional updates in src\coreclr\pal\src\thread\context.cpp, specifically CONTEXTFromNativeContext, CONTEXTToNativeContext, as well as FPREG_HasYmmRegisters() (need a FPREG_HasZmmRegisters()?) and audit other places where CONTEXT_XSTATE is used and is storing 16 registers, 16 bytes each. So it looks like they would need updating to save/restore kmask, the high 16 bytes of zmm0-zmm15 and all of zmm16-zmm31.

But maybe that's not causing problems in this change, and can be deferred to a separate change?

cc @janvorli @DeepakRajendrakumaran

Convert from macros to accessor functions for
RBM_ALLFLOAT, RBM_FLT_CALLEE_TRASH, CNT_CALLEE_TRASH_FLOAT.
Convert LSRA use of ACTUAL_REG_COUNT to AVAILABLE_REG_COUNT,
and create an accessor for that value for AMD64 as well.
@BruceForstall
Member

BruceForstall commented Feb 8, 2023

The macros defining the AVX512-specific masks and counts were bothering me, so I wrote #81818, to implement them as accessor functions in the various classes that need them. I think it looks better this way. Comments?

[Edit] I created a PR to this overall change if you decide to take it: anthonycanino#6
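
A hedged sketch of the accessor-function pattern #81818 proposes (illustrative; see that PR for the actual code):

```cpp
// Sketch only: the dynamic masks live on the Compiler object, and the old macro
// spellings become accessors so most call sites inside Compiler members do not change.
class Compiler
{
public:
    regMaskTP rbmAllFloat;
    regMaskTP rbmFltCalleeTrash;

    regMaskTP get_RBM_ALLFLOAT() const
    {
        return this->rbmAllFloat;
    }
    regMaskTP get_RBM_FLT_CALLEE_TRASH() const
    {
        return this->rbmFltCalleeTrash;
    }
};

// Inside Compiler members, the old names can stay spelled the same way:
#define RBM_ALLFLOAT get_RBM_ALLFLOAT()
```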

Member

@kunalspathak kunalspathak left a comment

Register allocator portion looks good to me modulo the changes suggested by @BruceForstall in #81818.

const unsigned lsraRegOrderSize = ArrLen(lsraRegOrder);
static const regNumber lsraRegOrder[] = {REG_VAR_ORDER};
const unsigned lsraRegOrderSize = ArrLen(lsraRegOrder);
// TODO-XARCH-AVX512 we might want to move this to be configured with the rbm variables too
Member

Should this happen in this PR?

Contributor Author

Just noting: I will do this in a follow-up PR.

src/coreclr/jit/lsraxarch.cpp (resolved)
@anthonycanino
Contributor Author

The macros defining the AVX512-specific masks and counts were bothering me, so I wrote #81818, to implement them as accessor functions in the various classes that need them. I think it looks better this way. Comments?

[Edit] I created a PR to this overall change if you decide to take it: anthonycanino#6

Yes, I will pull it in.

…egs-with-reg-accessors

Use inline accessor functions instead of macros
@BruceForstall
Member

No diffs, TP regressions as before.

Failures look like infra or unrelated.

Member

@kunalspathak kunalspathak left a comment

LGTM. Thanks for your patience and efforts!

@kunalspathak kunalspathak merged commit ad52afd into dotnet:main Feb 8, 2023
@anthonycanino
Contributor Author

Thanks all!
