Execution-despatch optimization (reducing heap allocations) #271

markrendle · 2017-07-14T11:57:17Z

Hi. I've been looking at the internal source code and thinking about optimization, in the spirit of new .NET low-allocations and everything. This is not intended as criticism (Polly is awesome), just a suggestion and an offer to contribute some work.

Anyway.

The way Polly currently handles work-to-be-executed is, at the simplest level, as a Func<TResult>. Any state necessary for the operation must therefore be provided as closures, incurring heap allocations. There's a move in some of the Core projects towards instead passing arguments along with delegates to avoid these closure allocations, so I thought I'd have a go at that in the context of Polly. So, for example, instead of

public string Get(int id) {
  return policy.Execute(() => id.ToString());
}

you would have

public string Get(int id) {
  return policy.Execute(n => n.ToString(), id);
}

Except it turns out that's a bit of a bugger, because in the internal code you have to multiply all the internal engines' Execute and ExecuteAndCapture by however many overloads you have (Func<T1,TResult>, Func<T1,T2,TResult>, etc), and that's not fun for anybody.

So then I thought about wrapping the delegate and its argument(s) in a struct which implemented a single, consistent interface, like this:

public struct PollyAction<T1, TResult> : IPollyAction<TResult>
{
	private readonly Func<T1, TResult> _action;
	private readonly T1 _arg1;

	public PollyAction(Func<T1, TResult> action, T1 arg1)
	{
		_action = action;
		_arg1 = arg1;
	}

	public TResult Execute() => _action(_arg1);
}

public interface IPollyAction<TResult>
{
	TResult Execute();
}

Then all the Execute and ExecuteAndCapture calls can just take an instance of that interface:

public TResult Execute<TResult>(IPollyAction<TResult> action) {
  while (true) {
    try {
      return action.Execute();
    } catch {
      // Whatever exception handling
    }
  }
}

Except, that also causes allocations because the struct gets boxed to the IPollyAction<TResult> interface.

But if you change the Execute method to this:

public TResult Execute<TAction, TResult>(TAction action)
  where TAction : IPollyAction<TResult>
{
  while (true) {
    try {
      action.Execute();
      break;
    } catch {
      // Whatever exception handling
    }
  }
}

then the generic method acts on the struct and no boxing, and hence no GC allocation, occurs. The only downside is that the call to this method has to explicitly specify the generic type parameters because C#'s overload inference doesn't recurse, but it's not too bad and will be hidden inside the library code.

I threw together a very quick BenchmarkDotNet comparing this approach to the current closure one:

public class Benchmarks
{
	[Benchmark]
	public int Closure()
	{
		int x = 2;
		return Execute(() => x * 2);
	}

	[Benchmark]
	public int Action()
	{
		int x = 2;
		return Execute<PollyAction<int, int>, int>(new PollyAction<int, int>(n => n * 2, x));
	}

	private static T Execute<T>(Func<T> action) => action();

	private static TResult Execute<TAction, TResult>(TAction action)
		where TAction : IPollyAction<TResult>
		=> action.Execute();
}

BenchmarkDotNet=v0.10.8, OS=Windows 7 SP1 (6.1.7601)
Processor=Intel Xeon CPU E5-2660 0 2.20GHz, ProcessorCount=2
Frequency=14318180 Hz, Resolution=69.8413 ns, Timer=HPET
dotnet cli version=1.0.0
  [Host]     : .NET Core 4.6.25009.03, 64bit RyuJIT
  DefaultJob : .NET Core 4.6.25009.03, 64bit RyuJIT

Method	Mean	Error	StdDev	Gen 0	Allocated
Closure	20.653 ns	0.4642 ns	1.1561 ns	0.0139	88 B
PollyAction	8.418 ns	0.1315 ns	0.1027 ns	-	0 B

Using the struct wrapper eliminates heap allocations and cuts execution time by >50% just for this simple example. Polly currently creates multiple closures per-call.

Question is, does this matter enough, and are the maintainers interested in this level of optimization? If it does and they are, I'm happy to spike something more representative, like redoing RetryPolicy to use this approach, to enable further discussion.

The text was updated successfully, but these errors were encountered:

reisenberger · 2017-07-15T21:45:46Z

@markrendle Thanks for the really detailed engagement! No criticism taken - the biggest challenge with Polly is simply getting through all the ideas we have. Fair to say that the main focus since App-vNext took over has been new features ... there are gains that can be made in perf for sure (aware of things that it would be good to squash out/change from the inherited codebase), so thanks for the reminder - and offer of help!

Quick response today - more very soon. Briefly, I think we should consider this both internally (where Polly may unwantedly use closures internally that we can eliminate); and externally (where executing a user delegate that wants input parameters currently requires those params to be passed in by closure ... eliminating that). Had you been focusing more on one or the other, or both? On the public API in particular, I want to reflect on and surface a few options for how that could look.

markrendle · 2017-07-16T14:01:32Z

@reisenberger Cool cool cool.

I was considering both aspects, the internal and external code. For the user-facing API, it should be as simple as offering overloads of the various methods that take delegates with additional parameters, and arguments for those parameters. My only slight concern there is the interaction with the existing overloads that take a Context and CancellationToken, but my brief sandboxing suggests it should be reasonably straightfoward and the C# compiler will hit the right overloads. The biggest issue there is you get to the point where you might want to code-gen the generic parameter overloads, from Func<T1,TResult> to Func<T1,...,T8,TResult>.

On the internal code I think it's more of a hunt-and-pick process, once the basic library code is in place.

Like I say, I'm very happy to pitch in PRs if issues are opened with up-for-grabs labels.

reisenberger · 2018-01-06T11:27:14Z

(triage of open issues) This is not forgotten. TL;DR related work is happening as part of/tied in with wider changes to Polly.

Where Polly internally uses closures, these are being removed (/the intent is to) as part of the work on metrics PROPOSAL: Add metrics to Polly #326
Where Polly currently requires closures to capture input parameters to user delegates, the package-parameters-and-delegate-in-a-struct approach looks a strong direction to take this. It would probably require syntax changes (as part of a 'Polly vNext') to make new overloads more feasible to integrate.

For ref for anyone interested in performance: we benchmarked Polly throughput at Polly v5.1 at around a million operations/second for all policy types. (IE, using a Policy adds ~1 microsecond to your call.) We can continue to use those benchmarks and post stuff in the #performance channel on slack as things evolve.

Add structs which hold an executable action, and its input parameters, for all Policy executions. Allows us to remove the allocations generated by closures, in all Polly executions. See App-vNext#271

grant-d · 2018-11-14T02:30:11Z

Here's a similar example I came across. I believe this would be better served as a DoNothing method, which would have less runtime overhead:

public static RetryPolicy<TResult> WaitAndRetry<TResult>(this PolicyBuilder<TResult> policyBuilder, ...)
{
    Action<DelegateResult<TResult>, TimeSpan, int, Context> doNothing = (_, __, ___, ____) => { };

    return policyBuilder.WaitAndRetry(retryCount, sleepDurationProvider, doNothing);
}

Should be:

public static RetryPolicy<TResult> WaitAndRetry<TResult>(this PolicyBuilder<TResult> policyBuilder, ...)
{
    return policyBuilder.WaitAndRetry(retryCount, sleepDurationProvider, DoNothing);
}

private static void DoNothing(…) { }

reisenberger · 2020-01-25T22:35:28Z

@grant-d Thank you for the extra suggestion about allocations here. Benchmarking identified that in fact the opposite applied: a DoNothing() method caused extra allocations. See benchmark and possible explanations in #741 . In any case, the point became moot as treating the no-callback-delegate-is-configured case as null trumped any other option. PR #742 implemented that optimisation and it will release with Polly v8.0.0.

Thanks again for everything you've contributed to Polly! (including the wait-and-retry helpers)

reisenberger · 2020-01-25T23:37:53Z

PR #748 delivers on the main recommendations of this issue, and will be released as part of Polly v8.0.0

Allocations per execution reduce as follows: (stats from BenchmarkDotNet)

Sync executions

Method	Polly v7.2.0 allocations (bytes)	Polly v8.0.0.21-f5ce247 allocations (bytes)
`ISyncPolicy.Execute(Action)`	248 B	72 B *
`ISyncPolicy.Execute<TResult>(Func<TResult>)`	160 B	72 B
`ISyncPolicy<TResult>.Execute(Func<TResult>)`	160 B	72 B
`Policy.Wrap(ISyncPolicy, ISyncPolicy).Execute(Action)`	432 B	72 B
`Policy.Wrap(ISyncPolicy, ISyncPolicy).Execute<TResult>(Func<TResult>)`	256 B	72 B
`Policy.Wrap(ISyncPolicy<TResult>, ISyncPolicy<TResult>) .Execute(Func<TResult>)`	256 B	72 B

*The 72 bytes represent the Context travelling with the execution, to support traceability per execution.

Async executions

Async executions inevitably carry lightly greater allocations:

a limited number of Task instances must be allocated;
allocations are incurred by the .NET async machinery such as the async/await state machine and continuation delegates.

Significant savings have nonetheless been achieved. Polly uses async/await-elision internally wherever possible to reduce the extra Tasks and async-machinery allocated. An execution through the simplest async policy has been minimised to incur only two awaits: one to correctly scope the Context travelling with the execution; the other necessarily to await the delegate the user passes to execute.

Method	Polly v7.2.0 allocations (bytes)	Polly v8.0.0.21-f5ce247 allocations (bytes)
`IAsyncPolicy.ExecuteAsync(Func<Task>)`	400 B	72 B
`IAsyncPolicy.ExecuteAsync<TResult>(Func<Task<TResult>>)`	304 B	144 B
`IAsyncPolicy<TResult>.ExecuteAsync(Func<Task<TResult>>)`	304 B	144 B
`Policy.WrapAsync(IAsyncPolicy, IAsyncPolicy).ExecuteAsync(Func<Task>)`	672 B	72 B
`Policy.WrapAsync(IAsyncPolicy, IAsyncPolicy).ExecuteAsync<TResult>(Func<Task<TResult>>)`	768 B	288 B
`Policy.WrapAsync(IAsyncPolicy<TResult>, IAsyncPolicy<TResult>) .ExecuteAsync(Func<Task<TResult>>)`	768 B	288 B

These statistics represent allocations attributable to execution despatch, calculated with NoOpPolicy.

Latencies

With these changes order-of-magnitude of execution latency has been maintained at ~1microsecond (one-millionth-second) extra latency per Polly policy applied to an execution, for all policy types.

reisenberger · 2020-01-25T23:43:19Z

Polly v8.0.0 will also add new execute overloads allowing passing strongly-typed input objects to delegates to be executed, without using a closure:

void Execute<T1>(Action<Context, CancellationToken, T1> action, Context context, CancellationToken cancellationToken, T1 input1)
void Execute<T1, T2>(Action<Context, CancellationToken, T1, T2> action, Context context, CancellationToken cancellationToken, T1 input1, T2 input2)

TResult Execute<T1, TResult>(Func<Context, CancellationToken, T1, TResult> func, Context context, CancellationToken cancellationToken, T1 input1)
TResult Execute<T1, T2, TResult>(Func<Context, CancellationToken, T1, T2, TResult> func, Context context, CancellationToken cancellationToken, T1 input1, T2 input2)

Task ExecuteAsync<T1>(Func<Context, CancellationToken, bool, T1, Task> action, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1)
Task ExecuteAsync<T1, T2>(Func<Context, CancellationToken, bool, T1, T2, Task> action, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1, T2 input2)

Task<TResult> ExecuteAsync<T1, TResult>(Func<Context, CancellationToken, bool, T1, Task<TResult>> func, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1)
Task<TResult> ExecuteAsync<T1, T2, TResult>(Func<Context, CancellationToken, bool, T1, T2, Task<TResult>> func, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1, T2 input2)

(and similar .ExecuteAndCapture/Async(...) variants)

reisenberger · 2020-01-25T23:45:45Z

Breaking changes: Custom policies hosted outside Polly will need to make a couple of small adjustments (method naming, change of execution-despatch type) to dovetail with the new execution despatch. Details to be summarised in full for the v8.0.0 release.

SimonCropp · 2021-06-06T23:47:45Z

with the PR merged, should this be closed?

or is there more work to be done? and can i help?

martincostello · 2021-06-07T07:48:43Z

I think this is still open pending Polly v8 (whenever that happens - it's pretty much on-ice during Dylan's absence).

SimonCropp · 2021-06-07T12:24:22Z

seems other issues in the v8 milestone are closed when fixed https:/App-vNext/Polly/issues?q=is%3Aclosed+milestone%3Av8.0.0

martintmk · 2023-06-19T10:36:18Z

Hey folks, this is addressed in V8 where the ResilienceStrategy allow passing TState to lambda. This way, you can use static lambdas that use the state for executions.

Polly/bench/Polly.Core.Benchmarks/ResilienceStrategyBenchmark.cs

Line 21 in 6fcf0f6

 await NullResilienceStrategy.Instance.ExecuteAsync((_, _) => new ValueTask<string>("dummy"), context, "state").ConfigureAwait(false); 

@martincostello , I think we can close this one.

cc @SimonCropp

martincostello · 2023-06-19T10:37:47Z

Yep, I think agree - v8 should resolve this scenario.

markrendle changed the title ~~Optimization~~ Optimization (reducing heap allocations) Jul 14, 2017

reisenberger added the performance label Jul 19, 2017

reisenberger added the awaiting wider syntax changes label Jan 2, 2018

reisenberger added in progress v6.0-alpha ready and removed awaiting wider syntax changes labels Mar 4, 2018

reisenberger removed the in progress label Sep 8, 2018

This was referenced Jan 3, 2019

Inherit IReadOnlyPolicyRegistry from IEnumerable<TKey, IsPolicy> #520

Closed

Clean up the Polly code base #557

Closed

moerwald mentioned this issue Jan 21, 2019

[Issue 557]: Changed local Func/Action to local function. #562

Closed

4 tasks

reisenberger mentioned this issue Dec 30, 2019

Performance: handle non-configured policy callbacks as null (reduce allocations) #741

Closed

reisenberger mentioned this issue Jan 25, 2020

Reduce allocations for execution despatch (#271) #748

Merged

4 tasks

reisenberger changed the title ~~Optimization (reducing heap allocations)~~ Execution-despatch optimization (reducing heap allocations) Jan 25, 2020

reisenberger added this to the v8.0.0 milestone Jan 25, 2020

reisenberger added a commit that referenced this issue Jan 25, 2020

Reduce allocations for execution despatch (#271) (#748)

9de173b

reisenberger added the breaking change label Jan 25, 2020

martincostello removed the v8.0-alpha ready label Jun 16, 2023

martincostello removed this from the v8.0.0 milestone Jun 16, 2023

martincostello closed this as completed Jun 19, 2023

martincostello added v8 Issues related to the new version 8 of the Polly library. and removed breaking change labels Jun 19, 2023

martincostello added this to the v8.0.0 milestone Jun 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Execution-despatch optimization (reducing heap allocations) #271

Execution-despatch optimization (reducing heap allocations) #271

markrendle commented Jul 14, 2017 •

edited

Loading

reisenberger commented Jul 15, 2017 •

edited

Loading

markrendle commented Jul 16, 2017

reisenberger commented Jan 6, 2018 •

edited

Loading

grant-d commented Nov 14, 2018

reisenberger commented Jan 25, 2020

reisenberger commented Jan 25, 2020

reisenberger commented Jan 25, 2020

reisenberger commented Jan 25, 2020

SimonCropp commented Jun 6, 2021

martincostello commented Jun 7, 2021

SimonCropp commented Jun 7, 2021

martintmk commented Jun 19, 2023

martincostello commented Jun 19, 2023

Execution-despatch optimization (reducing heap allocations) #271

Execution-despatch optimization (reducing heap allocations) #271

Comments

markrendle commented Jul 14, 2017 • edited Loading

reisenberger commented Jul 15, 2017 • edited Loading

markrendle commented Jul 16, 2017

reisenberger commented Jan 6, 2018 • edited Loading

grant-d commented Nov 14, 2018

reisenberger commented Jan 25, 2020

reisenberger commented Jan 25, 2020

Sync executions

Async executions

Latencies

reisenberger commented Jan 25, 2020

reisenberger commented Jan 25, 2020

SimonCropp commented Jun 6, 2021

martincostello commented Jun 7, 2021

SimonCropp commented Jun 7, 2021

martintmk commented Jun 19, 2023

martincostello commented Jun 19, 2023

markrendle commented Jul 14, 2017 •

edited

Loading

reisenberger commented Jul 15, 2017 •

edited

Loading

reisenberger commented Jan 6, 2018 •

edited

Loading