Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codegen improvements (removed unnecessary movsxd ops) #3378

Merged

Conversation

Sergio0694
Copy link
Member

@Sergio0694 Sergio0694 commented Jul 5, 2020

PR Type

What kind of change does this PR introduce?

  • Optimization

What is the current behavior?

The ReadOnlySpanExtensions.DangerousGetLookupReferenceAt and other extensions introduce an unnecessary movsxd instruction on x86-64, due to the Unsafe.Add<T>(ref T, int) overload being used. This is a (small) overhead that can be avoided.

Consider ReadOnlySpanExtensions.DangerousGetLookupReferenceAt, this method is currently JITed like so:

G_M34820_IG02:
       0FB7C1               movzx    rax, cx
       83F80B               cmp      eax, 11
       0F92C2               setb     dl
       0FB6D2               movzx    rdx, dl
       FFCA                 dec      edx
       F7D2                 not      edx
       23C2                 and      eax, edx
       4863C0               movsxd   rax, eax
       48BAA0B702E827020000 mov      rdx, 0x227E802B7A0
       0FB60410             movzx    rax, byte  ptr [rax+rdx]
                        ;; bbWeight=1    PerfScore 5.00
G_M34820_IG03:
       C3                   ret      
                        ;; bbWeight=1    PerfScore 1.00

; Total bytes of code 36, prolog size 0

You can see that movsxd rax, eax at offset 4863C0 which is not needed: that's just the offset and we don't really want to treat that as a signed integer, especially because movsxd has more overhead compared to eg. just doing a movzx.

What is the new behavior?

By changing the internal temp variables to be uint and by switching to the Unsafe.Add<T>(ref T, IntPtr) overload, we now get:

G_M34820_IG02:
       0FB7C1               movzx    rax, cx
       83F80B               cmp      eax, 11
       0F92C2               setb     dl
       0FB6D2               movzx    rdx, dl
       FFCA                 dec      edx
       F7D2                 not      edx
       23C2                 and      eax, edx
       48BAA0B7F64F99020000 mov      rdx, 0x2994FF6B7A0
       0FB60410             movzx    rax, byte  ptr [rax+rdx]
                        ;; bbWeight=1    PerfScore 4.75
G_M34820_IG03:
       C3                   ret      
                        ;; bbWeight=1    PerfScore 1.00

; Total bytes of code 33, prolog size 0

The codegen is a bit smaller and in particular that movsxd is completely gone, as the JIT can see the upper 32 bits of rax have already been zeroed by that first movzx rax, cx at the start of the method. We're now longer bothering to track the sign.

Other examples

One common extension (at least, personally) is the DangerousGetReferenceAt extension. This currently has an unnecessary movsxd instruction as well, again due to the Unsafe.Add<T>(ref T, int) overload being used (the input index is an int).

Here's the original codegen:

G_M18542_IG02:
       488B01               mov      rax, bword ptr [rcx]
       4863D2               movsxd   rdx, edx
       488D04D0             lea      rax, bword ptr [rax+8*rdx]
						;; bbWeight=1    PerfScore 2.75
G_M18542_IG03:
       C3                   ret      
						;; bbWeight=1    PerfScore 1.00

; Total bytes of code 11, prolog size 0, PerfScore 4.85

And here's after this PR:

G_M34253_IG02:
       488B01               mov      rax, bword ptr [rcx]
       8BD2                 mov      edx, edx
       488D04D0             lea      rax, bword ptr [rax+8*rdx]
						;; bbWeight=1    PerfScore 2.75
G_M34253_IG03:
       C3                   ret      
						;; bbWeight=1    PerfScore 1.00

; Total bytes of code 10, prolog size 0, PerfScore 4.75

In this case we've removed the extension entirely, on both x86 and x86-64. We're doing so by using:

Unsafe.Add(ref ..., (IntPtr)(void*)(uint)i);

Somewhat verbose, but it does the trick on both archs. It first does an unchecked conversion to uint (which is just a reinterpret-cast). We then first cast to void*, which lets the following IntPtr cast avoid the range check on x86 (since uint could be out of range there if the original index was negative). The final result is a clean mov, which automatically clears the upper 32 bits on x64.

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tested code with current supported SDKs
  • Pull Request has been submitted to the documentation repository instructions. Link:
  • Sample in sample app has been added / updated (for bug fixes / features)
  • Tests for the changes have been added (for bug fixes / features) (if applicable)
  • Header has been added to all new source files (run build/UpdateHeaders.bat)
  • Contains NO breaking changes

@ghost
Copy link

ghost commented Jul 5, 2020

Thanks Sergio0694 for opening a Pull Request! The reviewers will test the PR and highlight if there is any conflict or changes required. If the PR is approved we will proceed to merge the pull request 🙌

@ghost ghost assigned azchohfi Jul 5, 2020
@Sergio0694 Sergio0694 added high-performance 🚂 Issues/PRs for the Microsoft.Toolkit.HighPerformance package optimization ☄ Performance or memory usage improvements labels Jul 7, 2020
@Sergio0694 Sergio0694 changed the title Improved codegen for ReadOnlySpanExtensions.DangerousGetLookupReference Improved ReadOnlySpan<T>.DangerousGetLookupReference codegen Jul 8, 2020
@Sergio0694 Sergio0694 changed the title Improved ReadOnlySpan<T>.DangerousGetLookupReference codegen Codegen improvements (removed unnecessary movsxd ops for Unsafe.Add) Jul 12, 2020
@Sergio0694 Sergio0694 changed the title Codegen improvements (removed unnecessary movsxd ops for Unsafe.Add) Codegen improvements (removed unnecessary movsxd ops) Jul 12, 2020
Copy link
Contributor

@john-h-k john-h-k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

(shame no C#9 nint yet tho 😭 )

@Sergio0694
Copy link
Member Author

@john-h-k Ahahah we'll get there! 🤣
Thanks for the review!

@Sergio0694 Sergio0694 closed this Jul 14, 2020
@Sergio0694 Sergio0694 reopened this Jul 14, 2020
@Sergio0694
Copy link
Member Author

Sergio0694 commented Jul 14, 2020

I have no idea how I managed to close this by accident, sorry for the dummy notifications to who is subscribed 😅

@Kyaa-dost Kyaa-dost added this to the 7.0 milestone Aug 25, 2020
@Sergio0694
Copy link
Member Author

The CI failed with a seemingly unrelated error after the last merge from master:

X Test_ColorHelper_ToColor_ScreenColor [10ms]
Error Message:
Test method UnitTests.Helpers.Test_ColorHelper.Test_ColorHelper_ToColor_ScreenColor threw exception:
System.FormatException: Input string was not in a correct format.
Stack Trace:
at System.Number.ThrowOverflowOrFormatException(Boolean, String) + 0x53

The CI in master is passing fine now though, so I'll just merge again in case this was already fixed in some other commit there.

@Sergio0694
Copy link
Member Author

Alright that worked 🎉

@michael-hawker
Copy link
Member

@Sergio0694 is there an open issue on the dotnet team about this and if this could be a future compiler optimization? 🙂

@ghost ghost removed the needs attention 👋 label Sep 25, 2020
@Sergio0694
Copy link
Member Author

@michael-hawker I've opened a related issue here, which is currently marked as Future, so no guarantees on when it would eventually be done. The changes on this PR though would enable this optimization on all runtimes, so even existing ones would get the same performance benefits here. Also not all the changes here are ones that could necessarily be done by the JIT in all cases, so this way we're just ensuring a better codegen in all scenarios 😊

@michael-hawker michael-hawker merged commit 86b7de7 into CommunityToolkit:master Oct 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
high-performance 🚂 Issues/PRs for the Microsoft.Toolkit.HighPerformance package optimization ☄ Performance or memory usage improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants