Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal]: Improved interpolated strings #4487

Open
1 of 4 tasks
Tracked by #829 ...
333fred opened this issue Mar 1, 2021 · 28 comments
Open
1 of 4 tasks
Tracked by #829 ...

[Proposal]: Improved interpolated strings #4487

333fred opened this issue Mar 1, 2021 · 28 comments
Assignees
Labels
Implemented Needs ECMA Spec This feature has been implemented in C#, but still needs to be merged into the ECMA specification Proposal champion Proposal
Milestone

Comments

@333fred
Copy link
Member

333fred commented Mar 1, 2021

Improved interpolated strings

  • Proposed
  • Prototype: Not Started
  • Implementation: Not Started
  • Specification: Not Started

Split out from #2302.

Specification is here: https:/dotnet/csharplang/blob/main/proposals/csharp-10.0/improved-interpolated-strings.md

LDM Notes:

https:/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-03-01.md#interpolated-string-improvements
https:/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-03-15.md#interpolated-string-improvements
https:/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-03-24.md#improved-interpolated-strings
https:/dotnet/csharplang/blob/main/meetings/2021/LDM-2021-08-25.md

@yaakov-h
Copy link
Member

yaakov-h commented Mar 2, 2021

I don't understand this part of the spec:

By making the builder an out parameter we allow overloading of the GetInterpolatedStringBuilder method by builder type, which is useful for scenarios like the logger above, which could have TraceLoggerParamsBuilder/DebugLoggerParamsBuilder/WarningLoggerParamsBuilder/etc.

How would the overload occur in practice when the compiler translates the string interpolation expression using out var or equivalent?:

TraceLoggerParamsBuilder.GetInterpolatedStringBuilder(baseLength: 47, formatHoleCount: 1, receiverTemp, out var builder);

How would the compiler know which overload to select if there were multiple? By my reading of the spec, GetInterpolatedStringBuilder would have to be implemented as a static method on 3 different types in order to have TraceLoggerParamsBuilder/DebugLoggerParamsBuilder/WarningLoggerParamsBuilder rather than being able to use an overload.

@333fred
Copy link
Member Author

333fred commented Mar 2, 2021

@yaakov-h I used var for convenience there. It would not be var in practice.

@yaakov-h
Copy link
Member

yaakov-h commented Mar 2, 2021

In that case how would the compiler ever resolve an overload to something other than the type itself?

When an interpolated_string_expression is passed as an argument to a method, we look at the type of the parameter. If the parameter type has a static method GetInterpolatedStringBuilder ... and has an out parameter of the type of original method's parameter

It seems like those two types will always be the same, so where is there room for overloading?

Or would it be the case that a base class could provide overloads for a number of its descendant classes?

@333fred
Copy link
Member Author

333fred commented Mar 2, 2021

In that case how would the compiler ever resolve an overload to something other than the type itself?

I'm not sure what you mean. The idea is that a logging provider could provide GetInterpolatedString(..., out TraceLoggerParamsBuilder/InfoLoggerParamsBuilder/etc builder), and then LogTrace(TraceLoggerParamsBuilder builder), LogInfo(InfoLoggerParamsBuilder builder), etc. We'd then know which version of GetInterpolatedString to call based on the builder type.

@yaakov-h
Copy link
Member

yaakov-h commented Mar 2, 2021

I'm confused then.

My understanding based on my reading of the spec is that the parameter type has to provide GetInterpolatedString, so you would need:

ref struct TraceLoggerParamsBuilder
{
    public static void GetInterpolatedStringBuilder(..., out TraceLoggerParamsBuilder builder) { ... }
}

ref struct InfoLoggerParamsBuilder
{
    public static void GetInterpolatedStringBuilder(..., out InfoLoggerParamsBuilder builder) { ... }
}

ref struct WarnLoggerParamsBuilder
{
    public static void GetInterpolatedStringBuilder(..., out WarnLoggerParamsBuilder builder) { ... }
}

interface ILog
{
    void LogTrace(TraceLoggerParamsBuilder builder);
    void LogInfo(InfoLoggerParamsBuilder builder);
    void LogWarn(WarnLoggerParamsBuilder builder);
}

How could this be restructured such that one class defines all three overloads of GetInterpolatedStringBuilder?

@bjbr-dev
Copy link

How could this be restructured such that one class defines all three overloads of GetInterpolatedStringBuilder?

Could we you attributes instead?

ref struct LogParamsBuilder {
     public static InfoLoggerParamsBuilder GetTraceInterpolatedStringBuilder(...) { ... }
     public static InfoLoggerParamsBuilder GetInfoInterpolatedStringBuilder(...) { ... }
     public static InfoLoggerParamsBuilder GetWarnInterpolatedStringBuilder(...) { ... }
}


interface ILog
{
    void LogTrace([ParamBuilder("GetTraceInterpolatedStringBuilder")] LogParamsBuilder builder);
    void LogInfo([ParamBuilder("GetInfoInterpolatedStringBuilder")] LogParamsBuilder builder);
    void LogWarn([ParamBuilder("GetWarnInterpolatedStringBuilder")] LogParamsBuilder builder);
}

Where the attribute takes the method name of the type.
It could infer the type from the parameter type like the current proposal, or optionally take a typeof(LogParamsBuilder ) if for some reason the builder is different from the ParamsBuilder

Is there a performance gain by using an out parameter or was it just for overload resolution? It seems more intuitive to return a value then to have a void method with a single out parameter.

@bjbr-dev
Copy link

Im not sure on how the compiler would behave here, but it seems easier for the compiler if is a parameter attribute. If its an arbitrary class as a parameter to a method, then presumably every method call has to figure out whether the parameter is a possible param builder. Whereas the attribute will efficiently and obviously indicate this behaviour

@333fred
Copy link
Member Author

333fred commented Mar 26, 2021

then presumably every method call has to figure out whether the parameter is a possible param builder.

Yes, but only in cases where an interpolated string is actually passed as an argument, and only for that particular parameter type. This is not likely to be a very expensive check, I'm not worried about it :).

@ufcpp
Copy link

ufcpp commented Mar 31, 2021

How can this treat item format?

Log($"{i:X4}");

@333fred
Copy link
Member Author

333fred commented Mar 31, 2021

How can this treat item format?

However the builder wants? It needs to have an overload of TryFormat that accepts the interpolation hole content, which is both i and the string "X4". What happens after that is up to the builder type.

@ufcpp
Copy link

ufcpp commented Mar 31, 2021

I want to avoid string allocation of i.ToString("X4"). As an alternative idea, I want ISpanFormattable to be public.

Span<char> buffer = stackalloc char[4];
((ISpanFormattablei)i).Format(buffer, out var written, "X4");
Log(buffer.AsSpan(0, written));

@yaakov-h
Copy link
Member

yaakov-h commented Mar 31, 2021

Wouldn't the existing design avoid that kind of allocation if you chose to implement it kinda like this?:

public bool TryFormat(int i, string format)
{
    Span<char> buffer = stackalloc char[4];
    bool formatted = i.TryFormat(buffer, out var written, format);
    Debug.Assert(formatted == true);

    Log(buffer.AsSpan(0, written));
}

@333fred
Copy link
Member Author

333fred commented Mar 31, 2021

Wouldn't the existing design avoid that kind of allocation if you chose to implement it kinda like this?:

public bool TryFormat(int i, string format)
{
    Span<char> buffer = stackalloc char[4];
    bool formatted = i.Try.Format(buffer, out var written, format);
    Debug.Assert(formatted == true);

    Log(buffer.AsSpan(0, written));
}

Yup. The builder implements the TryFormat method. We never call ToString on any components in this proposal.

@ufcpp
Copy link

ufcpp commented Mar 31, 2021

@miyu
Copy link

miyu commented Apr 2, 2021

Would implementing TryFormat automatically affect the default ToString's behavior? I didn't see this mentioned in the spec.

struct Point
{
    public int X, Y;
    public bool TryFormat(Span<char> destination, out in charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) =>
        destination.TryWrite($"X={X} Y={Y}", out charsWritten, provider);
    public override string ToString() => $"{this}"; // is this line necessary? or do I repeat it in every file?
}

(Based on an example graciously provided by @stephentoub in dotnet/runtime#50601)

@333fred
Copy link
Member Author

333fred commented Apr 2, 2021

I don't particularly understand what you're asking @miyu. Presumably, the first parameter of TryWrite takes a builder type, and that would format the string however it wants.

@KalleOlaviNiemitalo
Copy link

https:/dotnet/csharplang/blob/d41fcd0377db1ecccfa8de9185b9aa07f894b342/proposals/improved-interpolated-strings.md#lowering does not say how the compiler generates the int baseLength, int formatHoleCount arguments for the Create call. Is baseLength required to be exactly the total number of characters outside the interpolation holes, or is the compiler allowed to pad baselength to improve the chance that the initial buffer will be large enough? I imagine such padding could be beneficial if the compiler generates calls of the public void TryFormatInterpolationHole<T>(T value, int alignment, string format) method proposed in dotnet/runtime#50601 and notices an unusually large alignment.

@333fred
Copy link
Member Author

333fred commented Apr 3, 2021

Is baseLength required to be exactly the total number of characters outside the interpolation holes, or is the compiler allowed to pad baselength to improve the chance that the initial buffer will be large enough?

The exact length. That's why both values are passed, so the underlying API can make its own padding decisions.

@stephentoub
Copy link
Member

stephentoub commented Apr 3, 2021

@KalleOlaviNiemitalo has a good point, though. The compiler ends up potentially having more information it could pass to the Create method to enable the builder to do better presizing. The builder isn't currently told how big each hole is, and so can only guess at some average, but the compiler may be given more info in advance, e.g. an alignment that provides a minimum length on a particular hole's size. We should consider whether baseLength should instead be the minimum length the compiler guarantees will be required to format, rather than it just being the number of characters in the base strings. (It must, however, never overestimate, or some builders will end up being functionally incorrect.)

I've also been wondering if the compiler might be able to help the builder track past state to inform future use. Imagine if the builder was passed some kind of unique id per call site. It could then use that id if desired to remember the typical length of strings created for that call site in past executions in that process, in order to better presize in the Create method for future iterations. We might be able to approximate this with a CallerMemberName on Create, but that's also likely to be non-unique. (This could also be an unnecessary attempt at over optimization in the typical case that could actually deoptimize.)

@KalleOlaviNiemitalo
Copy link

I meant the spec doesn't have any wording to disallow a compiler from generating e.g. InterpolatedStringBuilder.Create(-1, -1) regardless of what is in the interpolated string expression. The argument values should be specified rather than only implied by parameter names.

@333fred
Copy link
Member Author

333fred commented Apr 4, 2021

The lowering section still needs to be fleshed out, and that will be based on LDM feedback on other parts of the proposal.

@ufcpp
Copy link

ufcpp commented Aug 14, 2021

It may already be too late, but can the parameter of AppendLiteral be char in addition to string?

e.g. now,

string s = $"{a}.{b}.{c}.{d}"

is lowered to

handler.AppendFormatted(a);
handler.AppendLiteral(".");
handler.AppendFormatted(b);
handler.AppendLiteral(".");
handler.AppendFormatted(c);
handler.AppendLiteral(".");
handler.AppendFormatted(d);

however can handler.AppendLiteral(".") be changed to handler.AppendLiteral('.') if the literal is a single character and the handler has a char overload?

@stephentoub
Copy link
Member

can the parameter of AppendLiteral be char in addition to string?

Is your concern performance? It won't matter with dotnet/runtime#57217.

@ufcpp
Copy link

ufcpp commented Aug 14, 2021

Is your concern performance?

Yes.

Is dotnet/runtime#57217 as fast as char overload? Can ldstr "." (var s = ".") be optimized as fast as ldc.i4.s 46 (char c = '.')?

@svick
Copy link
Contributor

svick commented Aug 14, 2021

@ufcpp As far as I can tell, it can. In my simplified test code, both versions end up with the same code for the fast path and using mov word ptr [edx+eax*2], 0x2e for the ..

@jnm2
Copy link
Contributor

jnm2 commented Aug 14, 2021

Does that mean the recently-created https://docs.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1834 is now obsolete, or is StringBuilder itself still slower?

image

@stephentoub
Copy link
Member

stephentoub commented Aug 14, 2021

No, this optimization only works due to the call site being a literal. With AppendLiteral, that's expected 100% of the time with normal usage. That expectation let's us mark it as aggressive inlining essentially without penalty. That's not the case for StringBuilder; it's very frequently used with variables, which would make aggressively inlining it a bad choice, and adding a fast path for single chars would potentially regress a common case.

@333fred 333fred added the Implemented Needs ECMA Spec This feature has been implemented in C#, but still needs to be merged into the ECMA specification label Jan 11, 2022
@tats-u
Copy link

tats-u commented May 6, 2024

We need to consider adding a new parameter that indicates the handler treats string or char constants in the interpolated string in the same way as literal strings there to the constructor of InterpolatedStringHandlerAttribute.

Console.Error.WriteLine(string.Create(CultureInfo.InvariantCulture, $"The value of {nameof(FooBar)} is incorrect: {FooBar.Value}"));

nameof(FooBar) can be merged into the adjacent literals here. ("The value of FooBar is incorrect: ") However, some custom handlers may not prefer this behavior.

If the interpolated string is evaluated as string, we can optimize it at the compiler's discretion: dotnet/roslyn#72308 dotnet/roslyn#71801
However, the specification of InterpolatedStringHandlerAttribute must be revised to deal with those that are evaluated as handler types (especially those other than DefaultInterpolatedStringHandler).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Implemented Needs ECMA Spec This feature has been implemented in C#, but still needs to be merged into the ECMA specification Proposal champion Proposal
Projects
None yet
Development

No branches or pull requests