Expose the FusedMultiplyAdd and MultiplyAddEstimate APIs on relevant vector and scalar types #102181

Merged: 10 commits merged from the fix-93513 branch into dotnet:main on May 16, 2024

Conversation

tannergooding (Member)

This resolves #98053 and makes progress towards #93513


Note regarding the new-api-needs-documentation label:

This serves as a reminder: when your PR modifies a ref *.cs file and adds or changes public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments so the PR reviewers can sign off on that change.

@tannergooding tannergooding marked this pull request as draft May 14, 2024 18:31
@tannergooding (Member Author)

CC. @dotnet/jit-contrib for the JIT side.

This is one of the few remaining cases that should need JIT-side work; most of the remaining APIs in #93513 will be implemented purely in managed code. The *Estimate APIs are special because they need to work in the face of R2R, being intentionally non-deterministic across hardware. FusedMultiplyAdd represents a case that optimizes down to a single instruction and is a core part of many already deeply complex algorithms, so the benefit of guaranteeing expansion makes it worth hooking up to the existing intrinsic support.
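
As a rough sketch of the surface this PR exposes (the exact overload shapes below are my assumption, following the (left, right, addend) operand order noted in the review comments further down): FusedMultiplyAdd always computes (left * right) + addend with a single rounding step, while MultiplyAddEstimate is free to emit either a fused instruction or a separate multiply and add, whichever is cheaper on the target hardware.

// Sketch only: illustrates the intended contract of the two xplat APIs.
using System;
using System.Runtime.Intrinsics;

class FmaSurfaceSketch
{
    static void Main()
    {
        Vector128<float> left   = Vector128.Create(2f, 3f, 4f, 5f);
        Vector128<float> right  = Vector128.Create(6f, 7f, 8f, 9f);
        Vector128<float> addend = Vector128.Create(1f);

        // Deterministic: one rounding step, regardless of whether the hardware has FMA.
        Vector128<float> fused = Vector128.FusedMultiplyAdd(left, right, addend);

        // Intentionally non-deterministic across hardware: may fuse, may not.
        Vector128<float> estimate = Vector128.MultiplyAddEstimate(left, right, addend);

        Console.WriteLine(fused);    // <13, 22, 33, 46>
        Console.WriteLine(estimate); // same values here; may differ by one rounding in general
    }
}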

@EgorBo (Member) commented May 15, 2024

> CC. @dotnet/jit-contrib for the JIT side.
>
> This is one of the few remaining cases that should need JIT-side work; most of the remaining APIs in #93513 will be implemented purely in managed code. The *Estimate APIs are special because they need to work in the face of R2R, being intentionally non-deterministic across hardware. FusedMultiplyAdd represents a case that optimizes down to a single instruction and is a core part of many already deeply complex algorithms, so the benefit of guaranteeing expansion makes it worth hooking up to the existing intrinsic support.

Should we rather introduce a new attribute to prevent methods from being pre-jitted (and inlined into pre-jitted code)?

// AdvSimd.FusedMultiplyAdd expects (addend, left, right), while the APIs take (left, right, addend)
// We expect op1 and op2 to have already been spilled

std::swap(op1, op3);
Member:

I am just curious - who's responsible to spill them?

Member:

Ah, nvm, I see

Member Author:

Typically it's the importer code, prior to calling this method. Later phases that might call such an API (like morph) are responsible for ensuring the swap is safe in the face of potential side effects.

There's, unfortunately, not really a way for us to do such a swap safely in the API itself (at least that I know of) nor to know if the caller has actually done it. So the typical approach has been to do the swap and comment that callers should be doing the validation.
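
To see why the spilling matters at all, here is a hypothetical C#-level illustration (the Log helper and its values are made up): the language guarantees left-to-right evaluation of the arguments, so any observable side effects must happen in (left, right, addend) order even though the Arm64 fma instruction consumes the operands as (addend, left, right). Spilling op1/op2 before the swap is what preserves that ordering.

// Hypothetical example: argument side effects must stay in source order
// even though codegen reorders the operands for the target instruction.
using System;
using System.Runtime.Intrinsics;

class EvaluationOrderSketch
{
    static Vector128<float> Log(string name, float value)
    {
        Console.WriteLine(name);          // observable side effect
        return Vector128.Create(value);
    }

    static void Main()
    {
        Vector128<float> result = Vector128.FusedMultiplyAdd(
            Log("left", 2f), Log("right", 3f), Log("addend", 1f));

        Console.WriteLine(result);        // prints left, right, addend, then <7, 7, 7, 7>
    }
}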

impSpillSideEffect(true, verCurrentState.esStackDepth -
                             3 DEBUGARG("Spilling op1 side effects for MultiplyAddEstimate"));

impSpillSideEffect(true, verCurrentState.esStackDepth -
                             2 DEBUGARG("Spilling op2 side effects for MultiplyAddEstimate"));
Member:

Isn't it better/simpler to use impSpillSideEffects(true, CHECK_SPILL_ALL DEBUGARG("spilling side-effects"));?

Member Author:

It's not something we've been doing in other scenarios.

AFAIR it comes down to not spilling values on the stack that aren't impacted. For example, we take 3 args, but the stack could have 4+ entries on it; spilling the extra entries is unnecessary, since they are still processed in order with respect to our own op1/op2/op3 (otherwise everyone who pops the stack would need to consider spilling such additional entries). We also never need to spill the stack top (op3).

So we do this to ensure that only the minimum number of items are spilled.
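
For a concrete picture of the "stack could have 4+ entries" case, consider an intrinsic call nested inside another call (a hypothetical example, not code from this PR): while the importer processes the inner MultiplyAddEstimate, the pending first operand of the outer Add is also sitting on the evaluation stack, and only op1/op2 of the inner call need to be spilled.

// Hypothetical nesting: the importer's evaluation stack holds outerLeft plus
// the three operands of MultiplyAddEstimate while the inner call is imported.
using System;
using System.Runtime.Intrinsics;

class StackDepthSketch
{
    static void Main()
    {
        Vector128<float> outerLeft = Vector128.Create(10f);
        Vector128<float> a = Vector128.Create(2f);
        Vector128<float> b = Vector128.Create(3f);
        Vector128<float> c = Vector128.Create(1f);

        // Four entries are live for the importer here: outerLeft, a, b, c.
        Vector128<float> result = Vector128.Add(
            outerLeft,
            Vector128.MultiplyAddEstimate(a, b, c));

        Console.WriteLine(result); // <17, 17, 17, 17>
    }
}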

@tannergooding (Member Author)

> Should we rather introduce a new attribute to prevent methods from being pre-jitted (and inlined into pre-jitted code)?

We technically have one already, BypassReadyToRun (but we don't define or use it anymore, as it was a legacy attribute for Crossgen1). In the PR where I fixed this for Math.ReciprocalSqrtEstimate, it was suggested to do it this other way instead.

I'm also not sure the number of *Estimate APIs is at the point where it's beneficial to define and expose that attribute, and there are various other quirks that may need to be considered/solved in that scenario as well. There's also the general consideration that some of these APIs are known to be used in hot paths of complex algorithms where inlining can fail, even with the inlining boost that methods using intrinsics get.

-- I do want to get to a point, someday, where we can move a lot of complexity out of the JIT and into the BCL instead. Ideally only the platform-specific intrinsics and the core xplat APIs that were exposed in place of platform-specific APIs (like Create) would be handled in the JIT, and things like Vector2/3/4 or Vector128.WidenLower and APIs like that would be handled entirely in managed code instead. I just don't think we're at that point yet today, and we need more work in the inliner and other places to ensure that we can get the desired codegen.
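
To make the cross-hardware non-determinism mentioned above concrete, here is a small scalar sketch (not code from this PR): a fused multiply-add rounds once while a separate multiply and add rounds twice, so an API that is allowed to pick either form can return different bit patterns on different hardware.

// Demonstrates the one-rounding vs. two-roundings difference that makes
// MultiplyAddEstimate results unsafe to bake into an R2R image.
using System;

class EstimateDeterminismSketch
{
    static void Main()
    {
        float a = 1f + (1f / 4096f);    // 1 + 2^-12, exactly representable
        float b = 1f + (1f / 4096f);
        float c = -(1f + (1f / 2048f)); // -(1 + 2^-11), exactly representable

        float fused    = MathF.FusedMultiplyAdd(a, b, c); // rounds once: 2^-24
        float separate = (a * b) + c;                     // rounds twice: 0

        Console.WriteLine(fused);    // ~5.9604645E-08
        Console.WriteLine(separate); // 0
    }
}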

@tannergooding (Member Author)

CC. @stephentoub for the libraries side.

The one thing to note is that, given how TensorPrimitives defines MultiplyAddEstimate and uses it in other APIs, the API was moved down to INumberBase<T> rather than being put on IFloatingPointIeee754<T>. This is still sensible for other types; it just always does the naive thing and isn't exposed as a member that's accessible from non-generic contexts.
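
For context on what "moved down to INumberBase<T>" buys in practice, here is a hedged sketch of how generic code such as TensorPrimitives can consume it (the MulAdd wrapper is made up; only the T.MultiplyAddEstimate call reflects the placement described above): any T gets a working implementation, with non-floating-point types simply doing the naive (left * right) + addend.

// Sketch: calling the static abstract member through a generic constraint.
using System;
using System.Numerics;

static class MultiplyAddEstimateSketch
{
    static T MulAdd<T>(T left, T right, T addend) where T : INumberBase<T>
        => T.MultiplyAddEstimate(left, right, addend);

    static void Main()
    {
        Console.WriteLine(MulAdd(2.0, 3.0, 1.0)); // 7 (may or may not use a fused instruction)
        Console.WriteLine(MulAdd(2, 3, 1));       // 7 via the naive integer path
    }
}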

@tannergooding (Member Author)

> I do want to get to a point, someday, where we can move a lot of complexity out of the JIT and into the BCL instead. Ideally only the platform-specific intrinsics and the core xplat APIs that were exposed in place of platform-specific APIs (like Create) would be handled in the JIT, and things like Vector2/3/4 or Vector128.WidenLower and APIs like that would be handled entirely in managed code instead. I just don't think we're at that point yet today, and we need more work in the inliner and other places to ensure that we can get the desired codegen.

I've logged #102275 to track prototyping this work and seeing if we can identify some of the places that are still preventing it from happening.

This seems like a good thing to try and do around the time .NET 9 RC1 snaps and we have a bit of a small break before .NET 10 work really gets going.

@tannergooding tannergooding merged commit b7727f5 into dotnet:main May 16, 2024
183 checks passed
@tannergooding tannergooding deleted the fix-93513 branch May 16, 2024 21:11
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
Expose the FusedMultiplyAdd and MultiplyAddEstimate APIs on relevant vector and scalar types (dotnet#102181)

* Expose FusedMultiplyAdd and MultiplyAddEstimate on the scalar and vector types

* Adding tests covering FusedMultiplyAdd and MultiplyAddEstimate for the vector types

* Ensure TensorPrimitives uses the xplat APIs on .NET 9+

* Apply formatting patch

* Fix an accidental change to GenericVectorTests

* Ensure Arm64 passes fma operands in the correct order

* Apply formatting patch

* Ensure all the Arm64 code paths are spilling and swapping operands

* Apply formatting patch

* Don't pop the stack value twice
@github-actions github-actions bot locked and limited conversation to collaborators Jun 16, 2024

Successfully merging this pull request may close these issues.

[API Proposal]: MultiplyAddEstimate
2 participants