Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codegen for a constant-sized stackalloc into a Span<T> should be on par with explicit-sized no-init local declaration #52979

Open
Sergio0694 opened this issue May 19, 2021 · 11 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:2 Work that is important, but not critical for the release tenet-performance Performance related issue
Milestone

Comments

@Sergio0694
Copy link
Contributor

Sergio0694 commented May 19, 2021

Description

I believe the codegen for "safe constant stackallocs" could be improved, to match what can already be manually achieved by just declaring a local with [SkipLocalsInit], of a custom struct type of an explicit size. What I mean by "safe constant stackalloc":

  • The stackalloc size must be known at compile time (just like for [StructLayout(Size = ...)])
  • The stackalloc must directly be assigned into a Span<T> (Span<byte>?)

Here's the current codegen when using such a declaration in a method:

C# code (click to expand):
[SkipLocalsInit]
static void M1()
{
    Span<byte> span = stackalloc byte[64];
    Foo(span);
}
asm x86-64 code (click to expand):
; .NET 5 (from sharplab)
Program.M1()
    L0000: push rbp
    L0001: sub rsp, 0x40
    L0005: lea rbp, [rsp+0x20]
    L000a: xor eax, eax
    L000c: mov [rbp+0x10], rax
    L0010: mov rax, 0xc45859a823ae
    L001a: mov [rbp+8], rax
    L001e: add rsp, 0x20
    L0022: test [rsp], esp
    L0025: sub rsp, 0x40
    L0029: sub rsp, 0x20
    L002d: lea rcx, [rsp+0x20]
    L0032: mov eax, 0x40
    L0037: lea rdx, [rbp+0x10]
    L003b: mov [rdx], rcx
    L003e: mov [rdx+8], eax
    L0041: lea rcx, [rbp+0x10]
    L0045: call Program.Foo(System.Span`1<Byte>)
    L004a: mov rcx, 0xc45859a823ae
    L0054: cmp [rbp+8], rcx
    L0058: je short L005f
    L005a: call 0x00007ffc9de2d430
    L005f: nop
    L0060: lea rsp, [rbp+0x20]
    L0064: pop rbp
    L0065: ret
; .NET 6 (from a recent build, with Disasmo)
; Method Program:M1()
G_M41622_IG01:
       push     rbp
       sub      rsp, 64
       lea      rbp, [rsp+20H]
       xor      eax, eax
       mov      qword ptr [rbp+10H], rax
       mov      rax, 0xD1FFAB1E
       mov      qword ptr [rbp+08H], rax
						;; bbWeight=1    PerfScore 4.25

G_M41622_IG02:
       add      rsp, 32
       test     dword ptr [rsp], esp
       sub      rsp, 64
       sub      rsp, 32
       lea      rcx, [rsp+20H]
       mov      eax, 64
       mov      bword ptr [rbp+10H], rcx
       mov      dword ptr [rbp+18H], eax
       lea      rcx, bword ptr [rbp+10H]
       call     Program.Program:Foo(System.Span`1[Byte])
       mov      rcx, 0xD1FFAB1E
       cmp      qword ptr [rbp+08H], rcx
       je       SHORT G_M41622_IG03
       call     CORINFO_HELP_FAIL_FAST
						;; bbWeight=1    PerfScore 10.25

G_M41622_IG03:
       nop      
						;; bbWeight=1    PerfScore 0.25

G_M41622_IG04:
       lea      rsp, [rbp+20H]
       pop      rbp
       ret      
						;; bbWeight=1    PerfScore 2.00
; Total bytes of code: 99

And here's the codegen by using that trick with a local struct type with explicit layout:

C# code (click to expand):
[StructLayout(LayoutKind.Explicit, Size = 64)]
struct Space { }

[SkipLocalsInit]
static void M2()
{
    Space space;
    Span<byte> span = new(&space, 64);
    Foo(span);
}
asm x86-64 code (click to expand):
; .NET 5 (from sharplab)
Program.M2()
    L0000: sub rsp, 0x78
    L0004: xor eax, eax
    L0006: mov [rsp+0x28], rax
    L000b: lea rcx, [rsp+0x38]
    L0010: mov eax, 0x40
    L0015: lea rdx, [rsp+0x28]
    L001a: mov [rdx], rcx
    L001d: mov [rdx+8], eax
    L0020: lea rcx, [rsp+0x28]
    L0025: call Program.Foo(System.Span`1<Byte>)
    L002a: nop
    L002b: add rsp, 0x78
    L002f: ret
; .NET 6 (from a recent build, with Disasmo)
; Method Program:M2()
G_M36245_IG01:
       sub      rsp, 120
       xor      eax, eax
       mov      qword ptr [rsp+28H], rax
						;; bbWeight=1    PerfScore 1.50

G_M36245_IG02:
       lea      rcx, bword ptr [rsp+38H]
       mov      eax, 64
       mov      bword ptr [rsp+28H], rcx
       mov      dword ptr [rsp+30H], eax
       lea      rcx, bword ptr [rsp+28H]
       call     EncryptTest.Program:Foo(System.Span`1[Byte])
       nop      
						;; bbWeight=1    PerfScore 4.50

G_M36245_IG03:
       add      rsp, 120
       ret      
						;; bbWeight=1    PerfScore 1.25
; Total bytes of code: 46

No stack cookie, no stack guard page check, no branches, no pushed registers - just all the good stuff 😄

It should be doable to do this under these two conditions, given that Span<T> already protects against buffer overruns (so no stack cookie is fine), and most importantly given that we can already get this same codegen, just with extra steps though?

Configuration

The .NET 6 builds with Disasmo are with a local checked build from ~2 weeks ago.

Regression?

Nope.

category:cq
theme:stack-allocation

@Sergio0694 Sergio0694 added the tenet-performance Performance related issue label May 19, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels May 19, 2021
@jkotas
Copy link
Member

jkotas commented May 19, 2021

No stack cookie, no stack guard page check, no branches, no pushed registers - just all the good stuff

No stack cookie is not a good stuff for unsafe stack buffer manipulation like this. We should really be emitting the stack cookie for situations like this, see https:/dotnet/designs/blob/main/accepted/2021/runtime-security-mitigations.md#stack-buffer-overflow-protection

@Sergio0694
Copy link
Contributor Author

"No stack cookie is not a good stuff for unsafe stack buffer manipulation like this."

I'm not sure I understand what you mean by "unsafe stack manipulation", could you elaborate on that? The scenario here would be to only interact with the buffer through a Span<T> (and I mean not using unsafe APIs like MemoryMarshal on it of course, otherwise all security bets are off anyway in general), and that stack manipulation would basically be the same (or, not much different) to what would happen in eg. a method declaring a few Guid-s or some other large structs as locals? I mean, that's what the JIT is doing anyway with that workaround with that fat no-init struct type (I might be missing something) 🤔

@jkotas
Copy link
Member

jkotas commented May 19, 2021

&space is unsafe code, and structure of space suggests that it is a buffer. It is not good that we omit the stack cookie in this case.

I agree that it is more reasonable to omit stack cookies for 100% safe Span-based code, but it is not easy for the JIT to tell from the IL whether it is the case.

@jkotas
Copy link
Member

jkotas commented May 19, 2021

#52065 is related to this.

@tannergooding
Copy link
Member

I think there will always be conflicts between "performance" and "safety" and I doubt the JIT will be able to "always" do the right thing. Even native compilers provide switches and pragmas to turn these features on/off (often defaulting to "on") and so perhaps we should support suppressing these features in .NET as well.

I'm also not really a fan of calling the stack cookie a "security mitigation" (as its referred to in the doc linked above) as it really just blocks certain kinds of buffer overflow (most often accidental). If you happen to skip the cookie (either on purpose or on accident) then it effectively does nothing. Things like address sanitization, shadow stacks, or the CPU hardware features are often more effective modern alternatives for places where security is actually important.

@jkotas
Copy link
Member

jkotas commented May 19, 2021

a fan of calling the stack cookie a "security mitigation" (as its referred to in the doc linked above) as it really just blocks certain kinds of buffer overflow (most often accidental).

What would you call it instead? Mitigations are about reducing harmful effects, not reliably eliminating them. I have included the definition of the "mitigation" at the top of the doc since that is common misunderstanding.

@tannergooding
Copy link
Member

tannergooding commented May 19, 2021

I don't have a suggestion for a better name, I've just always viewed the stack cookie as more of a bounds checking/correctness/safety feature than a security one.

@GrabYourPitchforks
Copy link
Member

Agree that shadow stacks are likely a more effective mitigation going forward. But they're still not widely supported among processors our users are likely to use, and stack canaries have a history of catching these types of buffer overruns before they can lead to critical issues.

We really should be performing stack checks (canaries, shadow stacks, whatever tech) more frequently. But at the same time, it totally makes sense to investigate whether the JIT can detect certain patterns as safe and elide the check. I don't know enough about the JIT to say how much work that might entail, though. :(

@JulieLeeMSFT
Copy link
Member

CC @AndyAyersMS.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Jun 7, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the Future milestone Jun 7, 2021
@JulieLeeMSFT JulieLeeMSFT added the Priority:2 Work that is important, but not critical for the release label Jun 7, 2021
@Sergio0694
Copy link
Contributor Author

Sergio0694 commented Jan 7, 2022

Figured I'd mention that if #32060 makes it (hopefully), this will also be allowed:

struct Space
{
    public byte Buffer[128];
}

[SkipLocalsInit]
static void M2()
{
    Space space;
    Foo(space.Buffer); // Buffer has an implicit Span<byte> conversion
}

This would presumably result in the same exact codegen as the new Span<byte>(&space, 128) example above, while only using safe code (except for [SkipLocalsInit], yes). This would mean that we'd end up in a situation where we'd have three perfectly equivalent ways of doing the same thing (ie. getting a stack-allocated Span<byte> of constant size), where the simplest one (ie. literally just doing Span<byte> span = stackalloc byte[128]) is the one that results in worse codegen, as that's the only case where the JIT is inserting a stack cookie. Wouldn't it make sense to remove it in that case too?

Additionally: if #32060 makes it, similarly it'll be possible to create a Span<T> from a local with fully safe code (ie. new Span<T>(ref local) will be possible). Wouldn't that be the same exact scenario as the stackalloc example then? Ie. either we assume that Span<T> is safe and the stack cookie is not needed, in which case the cookie should be removed by the stackalloc as well, or we assume it's never safe, in which case we'd then need to add a cookie every single time new Span<T>(ref T) is used too (please don't). Otherwise, wouldn't we have an inconsistency in what the JIT considers safe here and needing a cookie? 🤔

@jkotas
Copy link
Member

jkotas commented Jan 7, 2022

All stack buffers should get stack cookie by default, irrespective of the exact pattern used to create them. We may be missing it for some patterns today. As @GrabYourPitchforks said in #52979 (comment), we really should be performing stack checks more frequently.

If the JIT is able to prove that all code operating on the buffer is safe with proper bounds checks, it should be ok to omit the stack cookie. I am not sure how often would this optimization help in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:2 Work that is important, but not critical for the release tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

6 participants