Expose a malloca API that either stackallocs or creates an array. #52065
Comments
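Tagging subscribers to this area: @GrabYourPitchforks, @carlossanlop
Issue Details
Background and Motivation
It is not uncommon, in performance oriented code, to want to stackalloc for small/short-lived collections. However, the exact size is not always well known, in which case you want to fall back to creating an array instead.
Proposed API
namespace System.Runtime.CompilerServices
{
    public static unsafe partial class Unsafe
    {
        public static Span<T> Stackalloc<T>(int length);
        public static Span<T> StackallocOrCreateArray<T>(int length);
        public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength);
    }
}
These APIs would be intrinsic to the JIT and would effectively be implemented as the following, except specially inlined into the function so the localloc scope is that of the calling method:
public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength)
{
    return ((sizeof(T) * length) < maxStackallocLength) ? stackalloc T[length] : new T[length];
}
The variant that doesn't take maxStackallocLength would use some implementation defined default. Windows currently uses 1024.
Any T would be allowed and the JIT would simply do new T[length] for any types that cannot be stack allocated (reference types).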
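The proposed method cannot be written as an ordinary helper today, because a stackalloc buffer dies with the frame that created it. As a rough sketch of what callers write by hand instead, the following inlines the same size check directly into the consuming method. The names StackallocOrArrayDemo, MaxStackallocBytes and SumOfSquares are made up for illustration, and the 1024-byte threshold is simply the value the issue mentions.

```csharp
using System;

public static class StackallocOrArrayDemo
{
    // Assumed threshold in bytes; the issue cites 1024 as the value Windows currently uses.
    private const int MaxStackallocBytes = 1024;

    public static long SumOfSquares(int length)
    {
        // Hand-inlined equivalent of the proposed StackallocOrCreateArray<int>(length, 1024).
        // The stackalloc has to appear in this method so the buffer lives in this frame.
        Span<int> buffer = ((long)sizeof(int) * length) < MaxStackallocBytes
            ? stackalloc int[length]
            : new int[length];

        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = i * i;
        }

        long sum = 0;
        foreach (int value in buffer)
        {
            sum += value;
        }
        return sum;
    }
}
```

Both branches produce a Span<int> of exactly length elements, so the rest of the method does not need to know which allocation path was taken.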
|
This issue came up on Twitter again (https://twitter.com/jaredpar/status/1387798562117873678?s=20) and we have valid use cases in the framework and compiler. This has been somewhat stuck in limbo, with runtime/framework saying "we need language support first" and the language saying "we need the runtime/framework to commit to doing this first". We should review and approve this to unblock the language from committing to their work. We can then do all the appropriate implementation/prep work on the runtime side without actually making it public until the language feature is available. |
There would not be an API which opts to use the
|
Related to #25423. That proposal is a bit light on concrete APIs, but it suggests behaviors / analyzers / other ecosystem goodness we'd likely want to have around this construct. |
This does require language changes to work correctly, but the implementation is very straightforward. The compiler will just treat all of these calls as if they are not safe to escape from the calling method. Effectively it would have the same lifetime limitation as calling
I think the best approach is to just have the compiler trigger on the FQN of the method. Essentially any API with this signature in any assembly would be treated this way. That would make it easier to write code that multi-targets between .NET Core and .NET Framework as the framework side of this could be implemented as
The other advantage of this API is that we can once again var all the things.
var local1 = stackalloc int[42]; // int*
var local2 = Unsafe.StackAlloc<int>(42); // Span<int> |
This is one of those features that requires joint work from all runtimes/JIT/language/libraries. Our .NET 6 budget for features in this space was taken by the generic numerics. We should include this proposal next time we do planning in this area. Approving this API without the resource commitment won't achieve much. |
It gives us a surface on which this can be implemented given "free time" and can be prioritized appropriately. The library work is approving the API and exposing the surface area. The JIT work should just be implementing it as a recursive named intrinsic and then creating the relevant nodes for:
if ((sizeof(T) * length) < maxStackallocLength)
{
    var x = stackalloc T[length];
    return new Span<T>(x, length);
}
else
{
    var x = new T[length];
    return new Span<T>(x);
}
This is fairly straightforward, except for the |
I do not think we would want to do a naive implementation like this. I think we would want to do explicit lifetime tracking even when the length is over the threshold. |
What's the scenario where the JIT needs to do additional tracking that isn't already covered by the language rules and by the existing tracking for
Users can and already do write the above today, just manually inlined. We are looking at doing exactly this already in one of the |
We would be leaving performance on the table. The majority of existing stackalloc uses fall back to ArrayPool. If the new API does not use pooled memory as the fallback, the majority of the existing stackalloc sites won't be able to use it. |
That requires a different level of language support. Supporting the non-arraypool case is very straightforward. It's just generalizing the existing lifetime restrictions we associate with
The |
That really sounds like an additional ask, and one that isn't strictly needed at the same time. Pooling has a lot of different considerations and we ourselves largely only use it with a few primitive types (namely
I think it's doable, but we could also unblock many scenarios with the above today and with minimal work. |
I do not think we would necessarily want to deal with the pooling in Roslyn, nor have it backed by the ArrayPool as it exists today. |
I do not see those scenarios. The minimal work just lets you do the same thing as what you can do with stackalloc today, just maybe saves you a few characters. |
They exist everywhere that
None of the existing proposals or discussions around this, including #25423 which has been around for 3 years, have really covered pooling, as that is considered a more advanced scenario. This covers the case of "I want to allocate on the stack for small data and on the heap for larger data", where the limit for that might vary between platforms and architectures. Windows for example has a 1MB stack by default and uses 1024 bytes. Linux uses a 4MB stack and might want a different limit. Encountering large lengths is typically expected to be rare, but not impossible. It's not unreasonable to simply new up an unpooled array in that scenario. |
Pooling, for example, is likely only beneficial for types like |
Span<byte> span = Unsafe.StackallocOrCreateArray<byte>(len, 1024);
// vs
Span<byte> span = len > 1024 ? new byte[len] : stackalloc byte[1024];
Indeed just saves a few characters (but nice to have). But
byte[] arrayFromPool = null;
Span<byte> span = len > 1024 ? (arrayFromPool = ArrayPool<byte>.Shared.Rent(len)) : stackalloc byte[1024];
try
{
}
finally
{
    if (arrayFromPool != null)
        ArrayPool<byte>.Shared.Return(arrayFromPool);
}
// vs
Span<byte> span = Unsafe.StackallocOrPool<byte>(len, 1024); |
Has a couple of other benefits:
|
I'm now seeing conflicting advice on whether or not arrays should be returned to the pool in a |
@EgorBo your example with the pool would save even more when the Span is sliced to the desired length (as it's often needed that way when the length is given as argument). |
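To make the pooled fallback and the slicing point concrete, here is a small self-contained sketch of the pattern as it can be written today. The class name, the Reverse helper and the 512-char stack limit are illustrative assumptions, not part of any proposed API.

```csharp
using System;
using System.Buffers;

public static class StackallocOrPoolDemo
{
    // Assumed limit: 512 chars is 1024 bytes, matching the threshold used in the thread.
    private const int StackLimitChars = 512;

    public static string Reverse(string text)
    {
        char[] rented = null;
        Span<char> buffer = text.Length > StackLimitChars
            ? (rented = ArrayPool<char>.Shared.Rent(text.Length))
            : stackalloc char[StackLimitChars];

        // Rent may hand back a larger array and the stack buffer has a fixed size,
        // so slice down to the length that was actually requested.
        buffer = buffer.Slice(0, text.Length);

        try
        {
            text.AsSpan().CopyTo(buffer);
            buffer.Reverse();
            return buffer.ToString();
        }
        finally
        {
            // Only the pooled path has something to give back.
            if (rented != null)
            {
                ArrayPool<char>.Shared.Return(rented);
            }
        }
    }
}
```

The rent, slice and return steps are the boilerplate that a pooled variant of the API would hide from the caller.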
This is due to current array pool design limitations. This is fixable by treating management of explicit lifetime memory as core runtime feature.
This depends on how performance sensitive your code is and how frequently you expect exceptions to occur inside the scope. If your code is perf critical (e.g. number formatting) and you do not expect exceptions to ever occur inside the scope (e.g. the only exception you ever expect is out of memory), it is better to avoid finally, as is commonly done in dotnet/runtime libraries. |
That also sounds like a feature that is potentially several releases out and which is going to require users and the compiler to review where it is good/correct to use. Something like proposed here is usable in the interim, including for cases like Having to do |
What about making the allocator not necessarily bound to
namespace System.Runtime.CompilerServices
{
public static unsafe partial class Unsafe
{
public static Span<T> Stackalloc<TAllocator, T>(int length, TAllocator allocator)
where TAllocator: ISpanAllocator<T>
// ...
}
public interface ISpanAllocator<T> {
Span<T> Allocate(int length);
}
} |
Could also allocate a series of ref fields (all null), and then allow indexing them via Span |
I think any API that isn't tracking either Otherwise, I think it falls into the general camp of what it seems @jkotas is proposing with runtime supported lifetime tracking. |
Oh, yeah true, Let's do it! 😅 |
That starts to be as painful as implementing
namespace System.Runtime.CompilerServices
{
public static unsafe partial class Unsafe
{
public static TState Stackalloc<TAllocator, TState, T>(int length, TAllocator allocator, out Span<T> span)
where TAllocator: ISpanAllocator<T, TState>
// ...
}
public interface ISpanAllocator<T, TState> {
Span<T> Allocate(int length, out TState state);
void Release(TState state);
}
}
[Edit] Removed |
I see again a new round of |
@tannergooding you weren’t misremembering, it was there but we’ve managed to incrementally remove all requirements as we optimised the code. The ArrayPool usage is gone now also. |
What about a general language feature that took inspiration from C macros and C# source generators that could expand a "function call" into something else? This way users would be able to write their own
Like the suggestion that the APIs be a specialized JIT intrinsic that operates in the frame of the caller, except more general. Maybe a function that could still only access arguments passed to it and its own variables which would still be scoped to itself, but could do things like stackalloc or return. It could be inlined into the caller by the C# or JIT compiler. That idea is somewhat limited due to trying to be safer than straight text replacement, and I admit I don't know what other issues it might bring up. A more powerful feature might be able to turn something like
Log.Info($"Some expensive expression: {ExpensiveFoo()}");
into
if (Log.InfoLogEnabled) {
    Log.LogImpl(LogLevel.Info, $"Some expensive expression: {ExpensiveFoo()}");
}
Currently you'd have to either move the check if logging's enabled into the caller, always build the string passed into the log function, or pass a lambda, all of which either have larger maintenance or performance costs to some degree. I don't know if either of these completely solve the
struct ArrayPoolReturner<T> : IDisposable {
private T[] _array;
public void SetRentedArray(T[] array) { /* impl */ }
public void Dispose()
{
if(_array != null)
ArrayPool<T>.Shared.Return(_array);
_array = null;
}
} |
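As one possible reading of how such a returner could be used, the sketch below pairs a disposable struct with a using declaration so that the pooled array is returned when the method exits. PooledArrayReturner is a hypothetical variant that takes the array in its constructor instead of the SetRentedArray method shown above. ReturnerDemo, SumRange and the 256-element limit are likewise assumptions for illustration.

```csharp
using System;
using System.Buffers;

// Hypothetical variant of the returner struct: the rented array (or null) is passed in up front.
public struct PooledArrayReturner<T> : IDisposable
{
    private T[] _array;

    public PooledArrayReturner(T[] array)
    {
        _array = array;
    }

    public void Dispose()
    {
        if (_array != null)
        {
            ArrayPool<T>.Shared.Return(_array);
            _array = null;
        }
    }
}

public static class ReturnerDemo
{
    public static int SumRange(int length)
    {
        int[] rented = null;
        Span<int> buffer = length > 256
            ? (rented = ArrayPool<int>.Shared.Rent(length))
            : stackalloc int[256];

        // Dispose runs when the method exits and returns the array if one was rented.
        using var returner = new PooledArrayReturner<int>(rented);

        buffer = buffer.Slice(0, length);
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = i + 1;
        }

        int sum = 0;
        foreach (int value in buffer)
        {
            sum += value;
        }
        return sum;
    }
}
```

A using declaration still compiles down to a try/finally, so this trades the explicit cleanup block for brevity rather than removing its cost.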
@Thealexbarney There is a planned interpolated string improvement that solves your logger scenario: https://github.com/dotnet/csharplang/blob/f4d1c13a6a2ffd09b2e46b0bed57f2629640e440/proposals/improved-interpolated-strings.md. |
Ah, I wasn't aware of that part of the new interpolated string APIs. Although the logging example was meant as more of a scenario that most people would be familiar with rather than the only reason for bringing up the idea. |
How would the stackalloced buffer remain valid after that function returns? I thought about this, and I think the way stackalloc works today is that the buffer lives only until the function returns, then it is freed (GC'd). Also, if it does work, what about ref types that later need to be resized? A prime example is my open pull request in dotnet/winforms, which is a bit tricky because I either have to: A: rent a 32k buffer all at once and stack overflow the thing with a super large stack allocation, or B: use ArrayPool (also something that can fail the tests in that repository), or C: find some way to allocate a clean buffer using something like this, but later be able to |
I think this is in the text you're quoting: "These APIs would be intrinsic to the JIT and would effectively be implemented as the following, except specially inlined into the function so the localloc scope is that of the calling method:" |
When parsing a compressed stream, I find myself decompressing the stream and then parsing the result. The decompression is local to the function and not exposed to the user, so instead of asking the user to pass in a buffer I would much rather create a buffer on the stack to hold the compressed bytes before decompressing and transforming them, since the size is small (a couple of bytes). This is hard to do in F#, and the guys from F# seem to favor this instead of giving me a function that does a stackalloc and returns a span. While we are at it, it would be nice to have value type equivalents of MemoryStream and BinarySerializer for me to use together with a stack allocated buffer, although without these I can still use other hacks to decompress the bytes to a stack buffer without MemoryStream or BinaryWriter/BinaryReader. |
Why can't stackalloc reference types be supported? I get that the GC doesn't track it, but why can't it? |
Because the contract of a reference to a reference type is that anyone can take and keep such a reference for later, something allocated on a stack won't be able to hold that contract. You'd need to come up with a new contract to support what you're asking for, or allow breaking the memory safety of .NET |
I'm talking about |
Consider this code:
The problem is likely that you now need to scan the stack itself for those roots as well. |
What's the problem with that? Afaik, the GC already scans the stack for references.
There's an easy solution to that: the runtime enforces zero-initializing managed types, ignoring |
This code already works:
Buffer b = new();
Console.WriteLine(b[0] is null);
Console.WriteLine(b[1] is null);
Console.WriteLine(b[2] is null);
b[0] = new string("Test");
Console.WriteLine(b[0]);
GC.Collect(2, GCCollectionMode.Default, true);
GC.WaitForPendingFinalizers();
Console.WriteLine(b[0]);
[InlineArray(3)]
struct Buffer
{
private string? _element;
}
Why wouldn't it work with shorter syntax like |
You are missing the |
Given my understanding of
Run();
Run();
[SkipLocalsInit]
void Run()
{
Buffer b = new();
Console.WriteLine(b[0] is null);
Console.WriteLine(b[1] is null);
Console.WriteLine(b[2] is null);
b[0] = new string("Test");
Console.WriteLine(b[0]);
GC.Collect(2, GCCollectionMode.Default, true);
GC.WaitForPendingFinalizers();
Console.WriteLine(b[0]);
}
[InlineArray(3)]
struct Buffer
{
private string? _element;
}
Still gives the correct output. Even though at the second call |
You declared it but aren't using it, calling the constructor initializes, stackalloc doesn't. You'd want to try |
|
Even more curious is why
using System;
using System.Runtime.CompilerServices;
public class Class
{
public static void Main()
{
StackAllocatedThing<string> a = default;
a[Random.Shared.Next(42)] = "test";
UseThing(a);
}
private static unsafe void UseThing(Span<string> span)
{
Console.WriteLine(span.Length);
for (int i = 0; i < span.Length; i++)
{
Console.WriteLine($"[{i,2}] = {(nuint)Unsafe.AsPointer(ref span[i]):x8} = {span[i] ?? "(null)"}");
}
}
[InlineArray(42)]
public struct StackAllocatedThing<T>
where T : class
{
private T _element0;
}
}
Ideally, |
The inability to use fixed sized buffers of types other than core primitives was a significant blocker for low level scenarios. It's a restriction that goes back to C# 1.0 and has been a sore point since then. This hit a tipping point a few releases ago; the C# and runtime teams collaborated to solve that problem and
That is a reasonable language suggestion. Essentially, create a language feature |
Except a language-level translation won't suffice because stackallocs don't always have a compile-time size. |
It's not that common, in part because it is very expensive and often slower than simply
It is then "best practice" to keep stack allocations small (all stackallocs for a single method should typically add up to not more than 1024 bytes) and to never make them "dynamic" in length (instead rounding up to the largest buffer size). This guidance is true even in native code (C, C++, assembly, etc) and not following it can in some cases interfere with or break internal CPU optimizations (such as mirroring stack spills to the register file). |
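The "round up to the largest buffer size" guidance can be sketched as follows: allocate a constant-size stack buffer and slice it to the length actually needed. StackBufferDemo, ToHex and the 32-byte input limit are made-up names and values for illustration only.

```csharp
using System;

public static class StackBufferDemo
{
    // Assumed limits: at most 32 input bytes, so at most 64 hex chars (128 bytes of stack).
    private const int MaxBytes = 32;
    private const int MaxChars = MaxBytes * 2;

    public static string ToHex(ReadOnlySpan<byte> bytes)
    {
        if (bytes.Length > MaxBytes)
        {
            throw new ArgumentException("Input is larger than the fixed stack buffer.", nameof(bytes));
        }

        // Constant-size stack allocation, independent of the input length.
        Span<char> scratch = stackalloc char[MaxChars];

        // Work on only the portion that is needed.
        Span<char> hex = scratch.Slice(0, bytes.Length * 2);

        const string digits = "0123456789abcdef";
        for (int i = 0; i < bytes.Length; i++)
        {
            hex[2 * i] = digits[bytes[i] >> 4];
            hex[2 * i + 1] = digits[bytes[i] & 0xF];
        }

        return hex.ToString();
    }
}
```

Because the allocation size is a compile-time constant, the frame size does not depend on caller-supplied data. Oversized inputs are rejected here, though a real caller might fall back to the heap instead.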
Dynamic length works well if you reliably know your data source. |
Dynamic lengths function as intended in many scenarios. However, they can lead to various issues including hurting performance and potentially opening yourself up to security problems (even if the data source is known). There are multiple recommendations in this space that are effectively industry standard and they allow you to achieve the same overall thing without introducing the same risks. Those industry standards and recommendations should be considered alongside any API exposed here or future work done by the runtime to enable new scenarios. |
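Background and Motivation
It is not uncommon, in performance oriented code, to want to stackalloc for small/short-lived collections. However, the exact size is not always well known, in which case you want to fall back to creating an array instead.
Proposed API
namespace System.Runtime.CompilerServices
{
    public static unsafe partial class Unsafe
    {
        public static Span<T> Stackalloc<T>(int length);
        public static Span<T> StackallocOrCreateArray<T>(int length);
        public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength);
    }
}
These APIs would be intrinsic to the JIT and would effectively be implemented as the following, except specially inlined into the function so the localloc scope is that of the calling method:
public static Span<T> StackallocOrCreateArray<T>(int length, int maxStackallocLength)
{
    return ((sizeof(T) * length) < maxStackallocLength) ? stackalloc T[length] : new T[length];
}
The variant that doesn't take maxStackallocLength would use some implementation defined default. Windows currently uses 1024.
Any T would be allowed and the JIT would simply do new T[length] for any types that cannot be stack allocated (reference types).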