Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip FP registers processing if method don't have floating points use #85744

Merged
merged 4 commits into from
May 6, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions src/coreclr/jit/importer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -66,11 +66,11 @@ void Compiler::impPushOnStack(GenTree* tree, typeInfo ti)
verCurrentState.esStack[verCurrentState.esStackDepth].seTypeInfo = ti;
verCurrentState.esStack[verCurrentState.esStackDepth++].val = tree;

if ((tree->gtType == TYP_LONG) && (compLongUsed == false))
if (tree->gtType == TYP_LONG)
{
compLongUsed = true;
}
else if (((tree->gtType == TYP_FLOAT) || (tree->gtType == TYP_DOUBLE)) && (compFloatingPointUsed == false))
else if ((tree->gtType == TYP_FLOAT) || (tree->gtType == TYP_DOUBLE))
{
compFloatingPointUsed = true;
}
Expand Down
8 changes: 6 additions & 2 deletions src/coreclr/jit/lsra.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -418,6 +418,8 @@ void LinearScan::updateRegsFreeBusyState(RefPosition& refPosition,

regMaskTP LinearScan::internalFloatRegCandidates()
{
needNonIntegerRegisters = true;

if (compiler->compFloatingPointUsed)
{
return availableFloatRegs;
Expand Down Expand Up @@ -686,13 +688,15 @@ LinearScan::LinearScan(Compiler* theCompiler)
, refPositions(theCompiler->getAllocator(CMK_LSRA_RefPosition))
, listNodePool(theCompiler)
{
availableRegCount = ACTUAL_REG_COUNT;
needNonIntegerRegisters = false;

#if defined(TARGET_XARCH)
availableRegCount = ACTUAL_REG_COUNT;

#if defined(TARGET_AMD64)
rbmAllFloat = compiler->rbmAllFloat;
rbmFltCalleeTrash = compiler->rbmFltCalleeTrash;
#endif
#endif // TARGET_AMD64

if (!compiler->canUseEvexEncoding())
{
Expand Down
4 changes: 3 additions & 1 deletion src/coreclr/jit/lsra.h
Original file line number Diff line number Diff line change
Expand Up @@ -733,6 +733,7 @@ class LinearScan : public LinearScanInterface
unsigned int currentSpill[TYP_COUNT];
bool needFloatTmpForFPCall;
bool needDoubleTmpForFPCall;
bool needNonIntegerRegisters;

#ifdef DEBUG
private:
Expand Down Expand Up @@ -2041,13 +2042,14 @@ class LinearScan : public LinearScanInterface
}
#endif // TARGET_AMD64

#endif // TARGET_XARCH

unsigned availableRegCount;

unsigned get_AVAILABLE_REG_COUNT() const
{
return this->availableRegCount;
}
#endif // TARGET_XARCH

//------------------------------------------------------------------------
// calleeSaveRegs: Get the set of callee-save registers of the given RegisterType
Expand Down
14 changes: 14 additions & 0 deletions src/coreclr/jit/lsrabuild.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1940,6 +1940,14 @@ void LinearScan::buildPhysRegRecords()
RegRecord* curr = &physRegs[reg];
curr->regOrder = (unsigned char)i;
}

// TODO-CQ: We build physRegRecords before building intervals
// and refpositions. During building intervals/refposition, we
// would know if there are floating points used. If we can know
// that information before we build intervals, we can skip
// initializing the floating registers.
// For that `compFloatingPointUsed` should be set accurately
// before invoking allocator.
Comment on lines +1949 to +1950
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this bit in particular is a bit difficult as we need to pre-emptively set compFloatingPointUsed for any SIMD code. But that can't really account for dead code elimination or other factors that might later remove or transforms the nodes to no longer need it.

Copy link
Member

@tannergooding tannergooding May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it instead be possible to have regMaskTP LinearScan::internalFloatRegCandidates() lazily do the below, or is there other factors that require it?

That is, have it do something like:

regMaskTP LinearScan::internalFloatRegCandidates()
{
    if (!needNonIntegerRegisters)
    {
        // lazily initialize fpreg state here

        for (unsigned int i = 0; i < lsraRegOrderFltSize; i++)
        {
            regNumber  reg  = lsraRegOrderFlt[i];
            RegRecord* curr = &physRegs[reg];
            curr->regOrder  = (unsigned char)i;
        }

        needNonIntegerRegisters = true;
    }

    if (compiler->compFloatingPointUsed)
    {
        return availableFloatRegs;
    }
    else
    {
        return RBM_FLT_CALLEE_TRASH;
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we need the physRegRecord early right after identifying candidates.

Copy link
Member Author

@kunalspathak kunalspathak May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that can't really account for dead code elimination or other factors that might later remove or transforms the nodes to no longer need it.

True, but this is the best we can do I suppose, unless you are proposing to just start from scratch in LSRA and set it to true during BuildNode() if tree->GetType() != TYP_INT.

Edit: Doing the check in BuildNode() might prove costly though because we will be doing that for every single GenTree* node and not sure how many cases we will optimize for the scenarios where we didn't account for dead code elimination.

for (unsigned int i = 0; i < lsraRegOrderFltSize; i++)
{
regNumber reg = lsraRegOrderFlt[i];
Expand Down Expand Up @@ -2727,6 +2735,12 @@ void LinearScan::buildIntervals()
RefPosition* pos = newRefPosition((Interval*)nullptr, currentLoc, RefTypeBB, nullptr, RBM_NONE);
}

needNonIntegerRegisters |= compiler->compFloatingPointUsed;
if (!needNonIntegerRegisters)
{
availableRegCount = REG_INT_COUNT;
}

#ifdef DEBUG
// Make sure we don't have any blocks that were not visited
for (BasicBlock* const block : compiler->Blocks())
Expand Down
5 changes: 0 additions & 5 deletions src/coreclr/jit/target.h
Original file line number Diff line number Diff line change
Expand Up @@ -196,12 +196,7 @@ enum _regMask_enum : unsigned
#error Unsupported target architecture
#endif

#if defined(TARGET_XARCH)
// AVAILABLE_REG_COUNT is defined to be dynamic, based on whether AVX-512 high registers are available.
#define AVAILABLE_REG_COUNT get_AVAILABLE_REG_COUNT()
#else
#define AVAILABLE_REG_COUNT ACTUAL_REG_COUNT
#endif

/*****************************************************************************/

Expand Down