Expose a `malloca` API that either stackallocs or creates an array. #52065

tannergooding · 2021-04-29T17:23:19Z

Background and Motivation

It is not uncommon, in performance-oriented code, to want to stackalloc for small/short-lived collections. However, the exact size is not always well known, in which case you want to fall back to creating an array instead.

Proposed API

namespace System.Runtime.CompilerServices
{
    public static unsafe partial class Unsafe
    {
        public static Span<T> Stackalloc<T>(int length);
        public static Span<T> StackallocOrCreateArray<T>(int length);
        public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength);
    }
}

These APIs would be intrinsic to the JIT and would effectively be implemented as the following, except specially inlined into the function so the localloc scope is that of the calling method:

public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength)
{
    return ((sizeof(T) * length) < maxStackallocLength) ? stackalloc T[length] : new T[length];
}

The variant that doesn't take maxStackallocLength would use some implementation-defined default. Windows currently uses 1024 bytes.

Any T would be allowed and the JIT would simply do new T[length] for any types that cannot be stack allocated (reference types).

ghost · 2021-04-29T17:23:25Z

Tagging subscribers to this area: @GrabYourPitchforks, @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	tannergooding
Assignees:	-
Labels:	`api-suggestion`, `area-System.Memory`, `untriaged`
Milestone:	-

tannergooding · 2021-04-29T17:25:46Z

This issue came up on Twitter again (https://twitter.com/jaredpar/status/1387798562117873678?s=20) and we have valid use cases in the framework and compiler.

This has been somewhat stuck in limbo, with runtime/framework saying "we need language support first" and the language saying "we need the runtime/framework to commit to doing this first".

We should review and approve this to unblock the language from committing to their work and can do all the appropriate implementation/prep work on the runtime side, without actually making it public until the language feature is available.

CC. @jkotas, @jaredpar, @GrabYourPitchforks

tannergooding · 2021-04-29T17:29:10Z

There would not be an API which opts to use the ArrayPool today. We don't use the ArrayPool for any type, but rather only specific sets of types. An API which does use some ArrayPool would likely need to return some other type which indicates whether the type needs to be returned.

Stackalloc was added at the request of Jared who gave the following reasoning:

stackalloc:
- returns a pointer, hence no var
- works only on where T : unmanaged

Yes, we could use a runtime feature flag to let the compiler remove the restriction on where T : unmanaged. But that doesn't fix the var issue. At the point where we're adding new APIs for stack alloc, it seems better to simplify and use them for all forms of stack alloc. That makes the code more consistent and lets you have the same type of call sites (you can flip between forms without also having to flip var).

GrabYourPitchforks · 2021-04-29T17:29:39Z

Related to #25423. That proposal is a bit light on concrete APIs, but it suggests behaviors / analyzers / other ecosystem goodness we'd likely want to have around this construct.

jaredpar · 2021-04-29T17:32:12Z

This does require language changes to work correctly but the implementation is very straightforward. The compiler will just treat all of these calls as if they are not safe to escape from the calling method. Effectively it would have the same lifetime limitation as calling stackalloc today.

I think the best approach is to just have the compiler trigger on the FQN of the method. Essentially any API with this signature in any assembly would be treated this way. That would make it easier to write code that multi-targets between .NET Core and .NET Framework as the framework side of this could be implemented as new T[length] in all cases.

The other advantage of this API is that we can once again var all the things.

var local1 = stackalloc int[42]; // int*
var local2 = Unsafe.Stackalloc<int>(42); // Span<int>
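The multi-targeting idea above (the .NET Framework side always allocating an array) could look roughly like the following. This is only a sketch: `UnsafeShim` is a hypothetical name, the signatures simply mirror the proposal, and on .NET Framework `Span<T>` would have to come from the System.Memory package.

```csharp
using System;

// Hypothetical shim for targets without JIT support. The name UnsafeShim is
// made up for illustration; the signatures mirror the proposed API.
public static class UnsafeShim
{
    // Every overload falls back to a heap array, so the result is always
    // safe to use, just never stack allocated.
    public static Span<T> Stackalloc<T>(int length) => new T[length];

    public static Span<T> StackallocOrCreateArray<T>(int length) => new T[length];

    // maxStackallocLength is accepted for source compatibility and ignored.
    public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength)
        => new T[length];
}
```

A call site such as `var local = UnsafeShim.StackallocOrCreateArray<int>(42, 1024);` would then infer `Span<int>` on both targets.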

jkotas · 2021-04-29T17:38:22Z

This is one of those features that requires joint work from all runtimes/JIT/language/libraries. Our .NET 6 budget for features in this space was taken by the generic numerics. We should include this proposal next time we do planning in this area.

Approving this API without the resource commitment won't achieve much.

tannergooding · 2021-04-29T17:46:11Z

Approving this API without the resource commitment won't achieve much.

It gives us a surface on which this can be implemented given "free time" and can be prioritized appropriately.

The library work is approving the API and exposing the surface area.
The language work is recognizing these methods and based on Jared's comment is relatively trivial.

The JIT work should just be implementing it as a recursive named intrinsic and then creating the relevant nodes for:

if ((sizeof(T) * length) < maxStackallocLength)
{
    var x = stackalloc T[length];
    return new Span<T>(x, length);
}
else   
{
    var x = new T[length];
    return new Span<T>(x);
}

This is fairly straightforward, except for the newobj calls, which involve getting the relevant method tokens.

jkotas · 2021-04-29T17:53:47Z

The JIT work should just be implementing it as a recursive named intrinsic and then creating the relevant nodes for:

I do not think we would want to do a naive implementation like this. I think we would want to do explicit life-time tracking even when the length is over the threshold.

tannergooding · 2021-04-29T17:57:37Z

I think we would want to do explicit life-time tracking even when the length is over the threshold.

What's the scenario where the JIT needs to do additional tracking that isn't already covered by the language rules and by the existing tracking for Span<T>?

Users can and already do write the above today, just manually inlined. We are looking at doing exactly this already in one of the BigInteger PRs.
This is an API on Unsafe that exists to help simplify the logic and behave like alloca does in C/C++. It can be immensely simplified, and can unblock scenarios, by doing the trivial implementation.
It then becomes a drop in replacement for what users are already doing.
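For reference, the manually inlined form that users write today looks roughly like this. It is a minimal sketch: `ManualInline`, `SumOfSquares` and the 1024-byte budget are illustrative choices, not part of the proposal.

```csharp
using System;

public static class ManualInline
{
    // Assumed byte budget; 1024 is the value the thread attributes to Windows.
    private const int MaxStackBytes = 1024;

    public static int SumOfSquares(int length)
    {
        // The conditional has to live in the method that uses the buffer,
        // because stackalloc memory is released when this method returns.
        Span<int> scratch = (long)length * sizeof(int) < MaxStackBytes
            ? stackalloc int[length]
            : new int[length];

        for (int i = 0; i < scratch.Length; i++)
            scratch[i] = i * i;

        int sum = 0;
        foreach (int value in scratch)
            sum += value;
        return sum;
    }
}
```

Because the conditional cannot be moved into a helper method without losing the stack allocation, every call site has to repeat it, which is the duplication the proposed intrinsic would remove.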

jkotas · 2021-04-29T18:05:30Z

What's the scenario where the JIT needs to do additional tracking that isn't already covered by the language rules and by the existing tracking for Span<T>?

We would be leaving performance on the table.

The majority of the existing stackalloc uses are using ArrayPool as the fallback. If the new API is not using pooled memory as the fallback, the majority of the existing stackalloc sites won't be able to use it.

jaredpar · 2021-04-29T18:16:00Z

It also could be a common pattern to use stackalloc or ArrayPool, e.g.:

That requires a different level of language support. Supporting the non-arraypool case is very straightforward. It's just generalizing the existing lifetime restrictions we associate with stackalloc to instead be a collection of API calls. It's closer to a bug fix level of work than a new feature.

The ArrayPool case is very doable but it's definitely in the "new feature" category because we have to do the work to handle Free and design some cases around it. Example: do we make locals initialized with these APIs unassignable? If we allow reassignment then we have to consider how that impacts us calling Free with the resulting value. Solvable items but definitely a bit of design work that needs to go into it.

tannergooding · 2021-04-29T18:16:59Z

That really sounds like an additional ask and one that isn't strictly needed at the same time.

Pooling has a lot of different considerations and we ourselves largely only use it with a few primitive types (namely byte), not any T. It likewise requires:

a way to get the array from a Span<T>
knowing that Span<T> definitely points to the start of an Array
may involve custom pooling and not just ArrayPool
etc

I think it's doable, but we could also unblock many scenarios with the above today and with minimal work.

jkotas · 2021-04-29T18:17:45Z

The ArrayPool case is very doable but it's definitely in the "new feature" category because we have to do the work to handle Free and design some cases around it.

I do not think we would necessarily want to deal with the pooling in Roslyn, nor have it backed by the ArrayPool as it exists today.

jkotas · 2021-04-29T18:20:59Z

we could also unblock many scenarios with the above today and with minimal work.

I do not see those scenarios. The minimal work just lets you do the same thing as what you can do with stackalloc today, just maybe saves you a few characters.

tannergooding · 2021-04-29T18:28:07Z

I do not see those scenarios

They exist everywhere that alloca is used in native. They exist in 3rd party libraries like ImageSharp.
They exist for types where array pooling isn't desirable because pooling has its own overhead and costs (and generally cost per type).

None of the existing proposals or discussions around this, including #25423 which has been around for 3 years, have really covered pooling as that is considered a more advanced scenario.

This covers the case of "I want to allocate on the stack for small data and on the heap for larger data" and where the limit for that might vary between platforms and architectures. Windows for example has a 1MB stack by default and uses 1024 bytes. Linux uses a 4MB stack and might want a different limit.

Encountering large lengths is typically expected to be rare, but not impossible. It's not unreasonable to simply new up an unpooled array in that scenario.

tannergooding · 2021-04-29T18:33:20Z

Pooling, for example, is likely only beneficial for types like byte, char, or int, which are (in the vast majority of cases) the only usages in the BCL: https://source.dot.net/#System.Private.CoreLib/ArrayPool.cs,704a680ba600a2a4,references

EgorBo · 2021-04-29T18:40:18Z

stackalloc + new:

    Span<byte> span = Unsafe.StackallocOrCreateArray<byte>(len, 1024);
    // vs 
    Span<byte> span = len > 1024 ? new byte[len] : stackalloc byte[1024];

Indeed just saves a few characters (but nice to have).

But stackalloc + arraypool should save a lot 🙂 :

    byte[] arrayFromPool = null;
    Span<byte> span = len > 1024 ? (arrayFromPool = ArrayPool<byte>.Shared.Rent(len)) : stackalloc byte[1024];
    try
    {
    }
    finally
    {
        if (arrayFromPool != null)
            ArrayPool<byte>.Shared.Return(arrayFromPool);
    }

    // vs 
    Span<byte> span = Unsafe.StackallocOrPool<byte>(len, 1024);

jaredpar · 2021-04-29T18:44:31Z

Indeed just saves a few characters (but nice to have).

Has a couple of other benefits:

Path forward for supporting unmanaged types in stackalloc, particularly for code that needs to cross compile between .NET Core and Framework
Supports var

jaredpar · 2021-04-29T18:45:23Z

But stackalloc + arraypool should save a lot

I'm now seeing conflicting advice on whether or not arrays should be returned to the pool in a finally. Had others suggest that the finally is too much overhead and best to just let the array leak in the case of an exception.

gfoidl · 2021-04-29T18:47:34Z

@EgorBo your example with the pool would save even more when the Span is sliced to the desired length (as it's often needed that way when the length is given as argument).

jkotas · 2021-04-29T18:50:15Z

They exist for types where array pooling isn't desirable because pooling has its own overhead and costs (and generally cost per type).

This is due to current array pool design limitations. This is fixable by treating management of explicit lifetime memory as core runtime feature.

I'm now seeing conflicting advice on whether or not arrays should be returned to the pool in a finally

This depends on how performance sensitive your code is and how frequently you expect exceptions to occur inside the scope. If your code is perf critical (e.g. number formatting) and you do not expect exceptions to ever occur inside the scope (e.g. the only exception you ever expect is out of memory), it is better to avoid finally, as is the common case in dotnet/runtime libraries.
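Putting the slicing suggestion and the no-finally variant together, a call site might look like the sketch below. `PooledScratch`, `CountNonZero` and the 256-byte limit are made up for illustration; only `ArrayPool<byte>.Shared.Rent`/`Return` and `Span<T>.Slice` are real APIs.

```csharp
using System;
using System.Buffers;

public static class PooledScratch
{
    private const int StackLimit = 256; // assumed threshold, in bytes

    public static int CountNonZero(ReadOnlySpan<byte> input)
    {
        byte[]? rented = null;
        Span<byte> buffer = input.Length <= StackLimit
            ? stackalloc byte[StackLimit]
            : (rented = ArrayPool<byte>.Shared.Rent(input.Length));

        // Rent may hand back a larger array and the stack buffer is fixed-size,
        // so slice down to the length actually needed.
        buffer = buffer.Slice(0, input.Length);
        input.CopyTo(buffer);

        int count = 0;
        foreach (byte b in buffer)
        {
            if (b != 0) count++;
        }

        // No try/finally: if anything above throws, the array is simply not
        // returned to the pool and is left for the GC to reclaim.
        if (rented != null)
            ArrayPool<byte>.Shared.Return(rented);
        return count;
    }
}
```

Whether skipping the finally is appropriate depends on how likely exceptions are inside the scope, as discussed above.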

tannergooding · 2021-04-29T19:04:10Z

This is due to current array pool design limitations. This is fixable by treating management of explicit lifetime memory as core runtime feature.

That also sounds like a feature that is potentially several releases out and which is going to require users and the compiler to review where it is good/correct to use.

Something like proposed here is usable in the interim, including for cases like params Span<T>. It represents something that many languages do provide and which is part of the "standard" set of memory allocation APIs commonly exposed by languages.
It likewise fills a gap for languages that don't have unsafe support or which don't have an implicit conversion to span, such as F#: fsharp/fslang-suggestions#720

Having to do Span<byte> span = len > 1024 ? new byte[len] : stackalloc byte[1024]; everywhere and then go and find/update every callsite if you end up changing the behavior or size limit isn't great.
Having an API is good for the same reason all helper/utility methods are good and helps with refactorings, maintainability, finding usages of the pattern, etc. It also allows it to easily be customized for different stack sizes, at least for Unix vs Windows and possibly vs MacOS or ARM or 32-bit vs 64-bit.

xoofx · 2021-04-29T19:35:36Z

What about making the allocator not necessarily bound to new byte[len] or ArrayPool<byte>.Shared.Rent(len) (e.g. the memory could come from an unmanaged memory pool)?

namespace System.Runtime.CompilerServices
{
    public static unsafe partial class Unsafe
    {
        public static Span<T> Stackalloc<TAllocator, T>(int length, TAllocator allocator) 
                                                 where TAllocator: ISpanAllocator<T>;
        // ...
    }
    
    public interface ISpanAllocator<T> {
         Span<T> Allocate(int length);   
    }
}

benaadams · 2021-04-29T19:36:09Z

Any T would be allowed and the JIT would simply do new T[length] for any types that cannot be stack allocated (reference types).

Could also allocate a series of ref fields (all null), and then allow indexing them via Span.

tannergooding · 2021-04-29T19:41:32Z

What about making the allocator not necessarily bound to new byte[len] or ArrayPool<byte>.Shared.Rent(len)

I think any API that isn't tracking either T* or T[] would need to return something like DisposableSpan<T> so the appropriate free could occur (or would need language support for the relevant TAllocator.Free to be called).

Otherwise, I think it falls into the general camp of what it seems @jkotas is proposing with runtime supported lifetime tracking.

xoofx · 2021-04-29T19:43:20Z

I think any API that isn't tracking either T* or T[] would need to return something like DisposableSpan<T> so the appropriate free could occur (or would need language support for the relevant TAllocator.Free to be called).

Oh, yeah true, Let's do it! 😅

xoofx · 2021-04-29T19:51:56Z

That starts to be as painful as implementing IEnumerable<T> 🙃

namespace System.Runtime.CompilerServices
{
    public static unsafe partial class Unsafe
    {
        public static TState Stackalloc<TAllocator, TState, T>(int length, TAllocator allocator, out Span<T> span) 
                                                 where TAllocator: ISpanAllocator<T, TState>;
        // ...
    }
    
    public interface ISpanAllocator<T, TState> {
        Span<T> Allocate(int length, out TState state);   
        void Release(TState state);
    }
}

[Edit] Removed where TState: IDisposable as we have already ISpanAllocator.Release
[Edit2] Arg, actually, maybe it was better with the IDisposable, I told you, it's more painful than IEnumerable
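To make the allocator shape above a little more concrete, here is one way an implementation could look. This is a sketch only: `ISpanAllocator<T, TState>` is the hypothetical interface from this comment, and `ArrayPoolSpanAllocator<T>` is an invented example type that uses the rented array itself as the state.

```csharp
using System;
using System.Buffers;

// Hypothetical interface, copied from the shape sketched above.
public interface ISpanAllocator<T, TState>
{
    Span<T> Allocate(int length, out TState state);
    void Release(TState state);
}

// Illustrative implementation backed by ArrayPool<T>. The rented array is
// handed back as the state so that Release can return it to the pool.
public readonly struct ArrayPoolSpanAllocator<T> : ISpanAllocator<T, T[]>
{
    public Span<T> Allocate(int length, out T[] state)
    {
        state = ArrayPool<T>.Shared.Rent(length);
        // Rent may return a larger array, so expose only the requested length.
        return state.AsSpan(0, length);
    }

    public void Release(T[] state) => ArrayPool<T>.Shared.Return(state);
}
```

The caller still has to pair `Allocate` with `Release` by hand, which is the bookkeeping that a `DisposableSpan<T>`-style return type or language support would take over.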

kkokosa · 2021-04-30T16:45:58Z

I see again a new round of stackalloc, ArrayPool, "sufficient stack" discussion :) Just for reference 👉 #24963

JimBobSquarePants · 2021-05-11T02:16:04Z

@tannergooding you weren’t misremembering, it was there but we’ve managed to incrementally remove all requirements as we optimised the code. The ArrayPool usage is gone now also.

Thealexbarney · 2021-07-24T22:21:19Z

What about a general language feature that took inspiration from C macros and C# source generators that could expand a "function call" into something else? This way users would be able to write their own StackallocOrCreateArray<T> that's tailored to their specific case.

Like the suggestion that the APIs be a specialized JIT intrinsic that operates in the frame of the caller, except more general. Maybe a function that could still only access arguments passed to it and its own variables which would still be scoped to itself, but could do things like stackalloc or return. It could be inlined into the caller by the C# or JIT compiler.

That idea is somewhat limited due to trying to be safer than straight text replacement, and I admit I don't know what other issues it might bring up.

A more powerful feature might be able to turn something like

Log.Info($"Some expensive expression: {ExpensiveFoo()}");

into

if (Log.InfoLogEnabled) {
    Log.LogImpl(LogLevel.Info, $"Some expensive expression: {ExpensiveFoo()}");
}

Currently you'd have to either move the check if logging's enabled into the caller, always build the string passed into the log function, or pass a lambda, all of which either have larger maintenance or performance costs to some degree.

I don't know if either of these completely solve the ArrayPool case. It would at least need some work on the design. For example, a user could use a struct like the following and then add a using var returner = new ArrayPoolReturner<T>(); before trying to do stackalloc or allocate the array, but that could easily cause problems when doing something like assigning the allocated Span<T> to a variable outside the scope where StackallocOrCreateArray<T> is called.

struct ArrayPoolReturner<T> : IDisposable {
    private T[] _array;
    
    public void SetRentedArray(T[] array) { _array = array; }
    
    public void Dispose()
    {
        if(_array != null)
            ArrayPool<T>.Shared.Return(_array);
            
        _array = null;
    }
}

acaly · 2021-07-25T01:18:23Z

@Thealexbarney There is a planned interpolated string improvement that solves your logger scenario: https://github.com/dotnet/csharplang/blob/f4d1c13a6a2ffd09b2e46b0bed57f2629640e440/proposals/improved-interpolated-strings.md.

Thealexbarney · 2021-07-25T21:35:24Z

@Thealexbarney There is a planned interpolated string improvement that solves your logger scenario

Ah, I wasn't aware of that part of the new interpolated string APIs. Although the logging example was meant as more of a scenario that most people would be familiar with rather than the only reason for bringing up the idea.

AraHaan · 2021-09-08T03:10:13Z

Background and Motivation

It is not uncommon, in performance-oriented code, to want to stackalloc for small/short-lived collections. However, the exact size is not always well known, in which case you want to fall back to creating an array instead.

Proposed API
namespace System.Runtime.CompilerServices
{
    public static unsafe partial class Unsafe
    {
        public static Span<T> Stackalloc<T>(int length);
        public static Span<T> StackallocOrCreateArray<T>(int length);
        public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength);
    }
}
These APIs would be intrinsic to the JIT and would effectively be implemented as the following, except specially inlined into the function so the localloc scope is that of the calling method:
public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength)
{
    return ((sizeof(T) * length) < maxStackallocLength) ? stackalloc T[length] : new T[length];
}
The variant that doesn't take maxStackallocLength would use some implementation-defined default. Windows currently uses 1024 bytes.

Any T would be allowed and the JIT would simply do new T[length] for any types that cannot be stack allocated (reference types).

How would the stackalloced buffer remain valid after that function returns? I thought about this and I think the way stackalloc works today is that it lives only until the function returns, then it is freed (GC'd).

Also, if it does work, what about ref types that later need to be resized? A prime example is my open pull request in dotnet/winforms, which is a bit tricky because I either have to: A: rent a 32k buffer all at once and stack overflow the thing with a super large stack allocation, or B: use ArrayPool (also something that can fail the tests in that repository), or C: have some way to allocate a clean buffer using something like this, but later be able to realloca it using stackalloc for the ref type (or create an array).

MichalStrehovsky · 2021-09-08T03:58:05Z

How would the stackalloced buffer remain valid after that function returns? I thought about this and I think the way stackalloc works today is that it lives only until the function returns, then it is freed (GC'd).

I think this is in the text you're quoting: "These APIs would be intrinsic to the JIT and would effectively be implemented as the following, except specially inlined into the function so the localloc scope is that of the calling method:"

Xyncgas · 2023-02-21T12:03:53Z

When parsing a compressed stream, I find myself decompressing the stream and then parsing the result. The decompression is local to the function and not exposed to the user, so instead of asking the user to pass in a buffer I would much rather create a buffer on the stack to hold the compressed bytes before decompressing and transforming them, and the size is small (a couple of bytes).

Which is hard to do in F#

and the guys from F# seem to favor this instead of giving me a function that does stackalloc and returns a span

also while we are at it, would be nice to have value type equivalent of, MemoryStream and BinarySerializer for me to use together with stack allocated buffer

Although without these I can still use other hacks to decompress the bytes to stack buffer without MemoryStream or BinaryWriter/BinaryReader

timcassell · 2023-10-08T04:39:51Z

Any T would be allowed and the JIT would simply do new T[length] for any types that cannot be stack allocated (reference types).

Why can't stackalloc reference types be supported? I get that the GC doesn't track it, but why can't it?

weltkante · 2023-10-08T14:00:31Z

Why can't stackalloc reference types be supported?

Because the contract of a reference to a reference type is that anyone can take and keep such a reference for later, something allocated on a stack won't be able to hold that contract. You'd need to come up with a new contract to support what you're asking for, or allow breaking the memory safety of .NET

timcassell · 2023-10-08T14:08:34Z

Why can't stackalloc reference types be supported?

Because the contract of a reference to a reference type is that anyone can take and keep such a reference for later, something allocated on a stack won't be able to hold that contract. You'd need to come up with a new contract to support what you're asking for, or allow breaking the memory safety of .NET

I'm talking about stackalloc object[length], not stackalloc object().

ayende · 2023-10-08T14:11:44Z

Consider this code:

// Assume [SkipLocalsInit]
var items = stackalloc string[2];
items[0] = new string('a', 255);
GC.Collect();
Console.WriteLine(items[0]);

The problem is likely that you now need to scan the stack itself for those roots as well.
And there is also an issue with the second value there, which may be garbage because of the SkipLocalsInit, leaving aside the fact that raw buffers like that are problematic, since they are often used in.... interesting ways.

timcassell · 2023-10-08T14:21:17Z

The problem is likely that you now need to scan the stack itself for those roots as well.

What's the problem with that? Afaik, the GC already scans the stack for references.

And there is also an issue with the second value there, which may be garbage because of the SkipLocalsInit.

There's an easy solution to that: the runtime enforces zero-initializing managed types, ignoring SkipLocalsInit. I think it already does that with managed locals anyway.

PatVax · 2024-07-08T22:23:37Z

Consider this code:
// Assume [SkipLocalsInit]
var items = stackalloc string[2];
items[0] = new string('a', 255);
GC.Collect();
Console.WriteLine(items[0]);
The problem is likely that you now need to scan the stack itself for those roots as well. And there is also an issue with the second value there, which may be garbage because of the SkipLocalsInit, leaving aside the fact that raw buffers like that are problematic, since they are often used in.... interesting ways.

This code already works:

Buffer b = new();
Console.WriteLine(b[0] is null);
Console.WriteLine(b[1] is null);
Console.WriteLine(b[2] is null);

b[0] = new string("Test");

Console.WriteLine(b[0]);

GC.Collect(2, GCCollectionMode.Default, true);
GC.WaitForPendingFinalizers();

Console.WriteLine(b[0]);

[InlineArray(3)]
struct Buffer
{
    private string? _element;
}

Why wouldn't it work with shorter syntax like stackalloc string[3] or Unsafe.Stackalloc<string>(3)?

ayende · 2024-07-09T06:46:57Z

You are missing the [SkipLocalsInit] scenario. In that case, there may be garbage there.

PatVax · 2024-07-09T08:34:20Z

Given my understanding of [SkipLocalsInit] is correct:

Run();
Run();

[SkipLocalsInit]
void Run()
{
    Buffer b = new();
    Console.WriteLine(b[0] is null);
    Console.WriteLine(b[1] is null);
    Console.WriteLine(b[2] is null);

    b[0] = new string("Test");

    Console.WriteLine(b[0]);

    GC.Collect(2, GCCollectionMode.Default, true);
    GC.WaitForPendingFinalizers();

    Console.WriteLine(b[0]);
}

[InlineArray(3)]
struct Buffer
{
    private string? _element;
}

Still gives the correct output, even though at the second call b[0] should definitely not be null after the first call.

weltkante · 2024-07-09T10:12:45Z

Given my understanding of [SkipLocalsInit] is correct:

You declared it but aren't using it, calling the constructor initializes, stackalloc doesn't. You'd want to try Unsafe.SkipInit in addition to the attribute to create the variable, not call the constructor (though I don't know if it'd even let you do that, it would be a recipe to generate corrupt memory, treating garbage memory as a valid reference, causing access violations or memory corruption if you try to dereference them)

tannergooding · 2024-07-09T16:08:18Z

[SkipLocalsInit] and Unsafe.SkipInit are ignored for reference type fields/locals. It is a strict requirement that these always be null or valid if they point into the GC heap (which couldn't be guaranteed for arbitrary memory).

colejohnson66 · 2024-10-14T12:57:10Z

Any T would be allowed and the JIT would simply do new T[length] for any types that cannot be stack allocated (reference types).

Why can't stackalloc reference types be supported? I get that the GC doesn't track it, but why can't it?

Even more curious is why [InlineArray] support was added, but stackalloc was never revisited:

https://sharplab.io/#v2:C4LgTgrgdgPgAgJgIwFgBQcAMACOSB0AStMAJYC2ApvgMID25ADqQDaVgDK7AbqQMaUAzgG506OAGZcCbDRYBDQYPQBvdNg3SE6zWrSaD2DsHl8A1gEEWLOn3nBKAEwAqAC1JQA5gB48mAHzY8tgAvNiOlABm8hAswKL6hhryANqE8lCODPgcrvJgTvgAcpQAHsAAFAAsCACUALqh2ABEDoLAzQlJGgCqgpRuHp4V8rVdmgC+YokajGCk3PaUuEgAbNjQgvKRy3BV2H0D7l4VHIwZvkgB2ILnULU6GnrdKwCcFbcZ+AAylF7ArjGj0MkToYGwFQ8wGwpCamGEMOw3hudx+f08AIRpAA1NiHjNDM8Xpo8O8ACTNFIqUgAGgQE0aYRUFSgEChtR6UC2O3wFkEAAU6FD2BUCpEURkUqR6rUQKUABwTJoqT5QKWNAD8GpaLNiLFqzQmzSBBIMUwJ5uBKQAklAWB5KBYwGB5ABPap1erAyQ3YCQPjQ4ymSzWWxLFzHHzOfzAgwAd1c7GWzmwIGkwKJSTmCyW2BTAH1KGwqFBgPDgeaJkA

using System;
using System.Runtime.CompilerServices;

public class Class
{
    public static void Main()
    {
        StackAllocatedThing<string> a = default;
        a[Random.Shared.Next(42)] = "test";
        UseThing(a);
    }

    private static unsafe void UseThing(Span<string> span)
    {
        Console.WriteLine(span.Length);
        for (int i = 0; i < span.Length; i++)
        {
            Console.WriteLine($"[{i,2}] = {(nuint)Unsafe.AsPointer(ref span[i]):x8} = {span[i] ?? "(null)"}");
        }
    }

    [InlineArray(42)]
    public struct StackAllocatedThing<T>
        where T : class
    {
        private T _element0;
    }
}

Ideally, StackAllocatedThing<string> a = default could just be Span<string> a = stackalloc string[42], but we can't.

jaredpar · 2024-10-14T13:37:47Z

Even more curious is why [InlineArray] support was added,

The inability to use fixed sized buffers of types other than core primitives was a significant blocker for low level scenarios. It's a restriction that goes back to C# 1.0 and a sore point since then. This hit a tipping point a few releases ago, the C# and runtime team collaborated to solve that problem and [InlineArray] was the result.

Ideally, StackAllocatedThing<string> a = default could just be Span<string> a = stackalloc string[42], but we can't.

That is a reasonable language suggestion. Essentially, create a language feature stackalloc <type>[<count>] that under the hood is backed by an [InlineArray]. It's been suggested a couple of times. The reason it hasn't happened is there just hasn't been enough of a need to have us take it on (nor is there a full proposal for this).

colejohnson66 · 2024-10-14T14:45:23Z

Except a language-level translation won't suffice because stackallocs don't always have a compile-time size. stackalloc T[count] is a common thing. Why can't the runtime just remove that restriction?

tannergooding · 2024-10-14T15:09:14Z

stackalloc T[count] is a common thing

It's not that common, in part because it is very expensive and often slower than simply new T[count]. stackalloc can come with many additional considerations, needs stack overflow checks, more expensive zeroing, buffer overrun protection, and more things due to the potential security issues that occur.

It is then "best practice" to keep stack allocations small (all stackallocs for a single method should typically add up to not more than 1024 bytes) and to never make them "dynamic" in length (instead rounding up to the largest buffer size).

This guidance is true even in native code (C, C++, assembly, etc) and not following it can in some cases interfere with or break internal CPU optimizations (such as mirroring stack spills to the register file).
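The "small, fixed-size, round up" guidance can be sketched as follows. `FixedScratch`, `ToHex` and the 256-byte budget are illustrative assumptions; the point is only that the stackalloc length is a compile-time constant and the span is sliced down afterwards.

```csharp
using System;

public static class FixedScratch
{
    private const int MaxStackBytes = 256; // assumed per-method stack budget
    private const int MaxStackChars = MaxStackBytes / sizeof(char);

    public static string ToHex(ReadOnlySpan<byte> data)
    {
        int needed = data.Length * 2;

        // Constant-size stackalloc: the frame grows by the same amount on every
        // call, regardless of the input length. Larger requests go to the heap.
        Span<char> chars = needed <= MaxStackChars
            ? stackalloc char[MaxStackChars]
            : new char[needed];
        chars = chars.Slice(0, needed);

        const string digits = "0123456789abcdef";
        for (int i = 0; i < data.Length; i++)
        {
            chars[2 * i] = digits[data[i] >> 4];      // high nibble
            chars[2 * i + 1] = digits[data[i] & 0xF]; // low nibble
        }
        return new string(chars);
    }
}
```

Compared with `stackalloc char[needed]`, this trades a few unused bytes of stack for a constant-size allocation.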

Aniobodo · 2024-10-15T06:13:16Z

It is then "best practice" to keep stack allocations small (all stackallocs for a single method should typically add up to not more than 1024 bytes) and to never make them "dynamic" in length (instead rounding up to the largest buffer size).

Dynamic length works well if you reliably know your data source.

tannergooding · 2024-10-15T17:05:51Z

Dynamic lengths function as intended in many scenarios. However, they can lead to various issues, including hurting performance and potentially exposing you to security problems (even if the data source is known).

There are multiple recommendations in this space that are effectively industry standard and they allow you to achieve the same overall thing without introducing the same risks. Those industry standards and recommendations should be considered alongside any API exposed here or future work done by the runtime to enable new scenarios.

tannergooding added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 29, 2021

dotnet-issue-labeler bot added area-System.Memory untriaged New issue has not been triaged by the area owner labels Apr 29, 2021

GrabYourPitchforks mentioned this issue May 12, 2021

Add a _malloca-like API Span<T>.Alloc(int length) #25423

Closed

jkotas mentioned this issue May 19, 2021

Codegen for a constant-sized stackalloc into a Span<T> should be on par with explicit-sized no-init local declaration #52979

Open

Happypig375 mentioned this issue Jun 2, 2021

Add a safe stackallocspan function that returns a Span fsharp/fslang-suggestions#720

Closed

jeffhandley added this to the Future milestone Jul 13, 2021

jeffhandley removed the untriaged New issue has not been triaged by the area owner label Jul 13, 2021

Joe4evr mentioned this issue Nov 29, 2021

[Question] how to return stack allocated bytes without putting it in the heap? #62127

Closed
