Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution-despatch optimization (reducing heap allocations) #271

Closed
markrendle opened this issue Jul 14, 2017 · 13 comments
Closed

Execution-despatch optimization (reducing heap allocations) #271

markrendle opened this issue Jul 14, 2017 · 13 comments
Labels
performance v8 Issues related to the new version 8 of the Polly library.
Milestone

Comments

@markrendle
Copy link

markrendle commented Jul 14, 2017

Hi. I've been looking at the internal source code and thinking about optimization, in the spirit of new .NET low-allocations and everything. This is not intended as criticism (Polly is awesome), just a suggestion and an offer to contribute some work.

Anyway.

The way Polly currently handles work-to-be-executed is, at the simplest level, as a Func<TResult>. Any state necessary for the operation must therefore be provided as closures, incurring heap allocations. There's a move in some of the Core projects towards instead passing arguments along with delegates to avoid these closure allocations, so I thought I'd have a go at that in the context of Polly. So, for example, instead of

public string Get(int id) {
  return policy.Execute(() => id.ToString());
}

you would have

public string Get(int id) {
  return policy.Execute(n => n.ToString(), id);
}

Except it turns out that's a bit of a bugger, because in the internal code you have to multiply all the internal engines' Execute and ExecuteAndCapture by however many overloads you have (Func<T1,TResult>, Func<T1,T2,TResult>, etc), and that's not fun for anybody.

So then I thought about wrapping the delegate and its argument(s) in a struct which implemented a single, consistent interface, like this:

public struct PollyAction<T1, TResult> : IPollyAction<TResult>
{
	private readonly Func<T1, TResult> _action;
	private readonly T1 _arg1;

	public PollyAction(Func<T1, TResult> action, T1 arg1)
	{
		_action = action;
		_arg1 = arg1;
	}

	public TResult Execute() => _action(_arg1);
}

public interface IPollyAction<TResult>
{
	TResult Execute();
}

Then all the Execute and ExecuteAndCapture calls can just take an instance of that interface:

public TResult Execute<TResult>(IPollyAction<TResult> action) {
  while (true) {
    try {
      return action.Execute();
    } catch {
      // Whatever exception handling
    }
  }
}

Except, that also causes allocations because the struct gets boxed to the IPollyAction<TResult> interface.

But if you change the Execute method to this:

public TResult Execute<TAction, TResult>(TAction action)
  where TAction : IPollyAction<TResult>
{
  while (true) {
    try {
      action.Execute();
      break;
    } catch {
      // Whatever exception handling
    }
  }
}

then the generic method acts on the struct and no boxing, and hence no GC allocation, occurs. The only downside is that the call to this method has to explicitly specify the generic type parameters because C#'s overload inference doesn't recurse, but it's not too bad and will be hidden inside the library code.

I threw together a very quick BenchmarkDotNet comparing this approach to the current closure one:

public class Benchmarks
{
	[Benchmark]
	public int Closure()
	{
		int x = 2;
		return Execute(() => x * 2);
	}

	[Benchmark]
	public int Action()
	{
		int x = 2;
		return Execute<PollyAction<int, int>, int>(new PollyAction<int, int>(n => n * 2, x));
	}

	private static T Execute<T>(Func<T> action) => action();

	private static TResult Execute<TAction, TResult>(TAction action)
		where TAction : IPollyAction<TResult>
		=> action.Execute();
}
BenchmarkDotNet=v0.10.8, OS=Windows 7 SP1 (6.1.7601)
Processor=Intel Xeon CPU E5-2660 0 2.20GHz, ProcessorCount=2
Frequency=14318180 Hz, Resolution=69.8413 ns, Timer=HPET
dotnet cli version=1.0.0
  [Host]     : .NET Core 4.6.25009.03, 64bit RyuJIT
  DefaultJob : .NET Core 4.6.25009.03, 64bit RyuJIT

Method Mean Error StdDev Gen 0 Allocated
Closure 20.653 ns 0.4642 ns 1.1561 ns 0.0139 88 B
PollyAction 8.418 ns 0.1315 ns 0.1027 ns - 0 B

Using the struct wrapper eliminates heap allocations and cuts execution time by >50% just for this simple example. Polly currently creates multiple closures per-call.

Question is, does this matter enough, and are the maintainers interested in this level of optimization? If it does and they are, I'm happy to spike something more representative, like redoing RetryPolicy to use this approach, to enable further discussion.

@markrendle markrendle changed the title Optimization Optimization (reducing heap allocations) Jul 14, 2017
@reisenberger
Copy link
Member

reisenberger commented Jul 15, 2017

@markrendle Thanks for the really detailed engagement! No criticism taken - the biggest challenge with Polly is simply getting through all the ideas we have. Fair to say that the main focus since App-vNext took over has been new features ... there are gains that can be made in perf for sure (aware of things that it would be good to squash out/change from the inherited codebase), so thanks for the reminder - and offer of help!

Quick response today - more very soon. Briefly, I think we should consider this both internally (where Polly may unwantedly use closures internally that we can eliminate); and externally (where executing a user delegate that wants input parameters currently requires those params to be passed in by closure ... eliminating that). Had you been focusing more on one or the other, or both? On the public API in particular, I want to reflect on and surface a few options for how that could look.

@markrendle
Copy link
Author

@reisenberger Cool cool cool.

I was considering both aspects, the internal and external code. For the user-facing API, it should be as simple as offering overloads of the various methods that take delegates with additional parameters, and arguments for those parameters. My only slight concern there is the interaction with the existing overloads that take a Context and CancellationToken, but my brief sandboxing suggests it should be reasonably straightfoward and the C# compiler will hit the right overloads. The biggest issue there is you get to the point where you might want to code-gen the generic parameter overloads, from Func<T1,TResult> to Func<T1,...,T8,TResult>.

On the internal code I think it's more of a hunt-and-pick process, once the basic library code is in place.

Like I say, I'm very happy to pitch in PRs if issues are opened with up-for-grabs labels.

@reisenberger
Copy link
Member

reisenberger commented Jan 6, 2018

(triage of open issues) This is not forgotten. TL;DR related work is happening as part of/tied in with wider changes to Polly.

  • Where Polly internally uses closures, these are being removed (/the intent is to) as part of the work on metrics PROPOSAL: Add metrics to Polly #326
  • Where Polly currently requires closures to capture input parameters to user delegates, the package-parameters-and-delegate-in-a-struct approach looks a strong direction to take this. It would probably require syntax changes (as part of a 'Polly vNext') to make new overloads more feasible to integrate.

For ref for anyone interested in performance: we benchmarked Polly throughput at Polly v5.1 at around a million operations/second for all policy types. (IE, using a Policy adds ~1 microsecond to your call.) We can continue to use those benchmarks and post stuff in the #performance channel on slack as things evolve.

reisenberger added a commit to reisenberger/Polly that referenced this issue Mar 3, 2018
Add structs which hold an executable action, and its input parameters,
for all Policy executions.  Allows us to remove the allocations
generated by closures, in all Polly executions. See App-vNext#271
@grant-d
Copy link

grant-d commented Nov 14, 2018

Here's a similar example I came across. I believe this would be better served as a DoNothing method, which would have less runtime overhead:

public static RetryPolicy<TResult> WaitAndRetry<TResult>(this PolicyBuilder<TResult> policyBuilder, ...)
{
    Action<DelegateResult<TResult>, TimeSpan, int, Context> doNothing = (_, __, ___, ____) => { };

    return policyBuilder.WaitAndRetry(retryCount, sleepDurationProvider, doNothing);
}

Should be:

public static RetryPolicy<TResult> WaitAndRetry<TResult>(this PolicyBuilder<TResult> policyBuilder, ...)
{
    return policyBuilder.WaitAndRetry(retryCount, sleepDurationProvider, DoNothing);
}

private static void DoNothing() { }

@reisenberger
Copy link
Member

@grant-d Thank you for the extra suggestion about allocations here. Benchmarking identified that in fact the opposite applied: a DoNothing() method caused extra allocations. See benchmark and possible explanations in #741 . In any case, the point became moot as treating the no-callback-delegate-is-configured case as null trumped any other option. PR #742 implemented that optimisation and it will release with Polly v8.0.0.

Thanks again for everything you've contributed to Polly! (including the wait-and-retry helpers)

@reisenberger reisenberger changed the title Optimization (reducing heap allocations) Execution-despatch optimization (reducing heap allocations) Jan 25, 2020
@reisenberger
Copy link
Member

PR #748 delivers on the main recommendations of this issue, and will be released as part of Polly v8.0.0

Allocations per execution reduce as follows: (stats from BenchmarkDotNet)

Sync executions

Method Polly v7.2.0 allocations (bytes) Polly v8.0.0.21-f5ce247 allocations (bytes)
ISyncPolicy.Execute(Action) 248 B 72 B *
ISyncPolicy.Execute<TResult>(Func<TResult>) 160 B 72 B
ISyncPolicy<TResult>.Execute(Func<TResult>) 160 B 72 B
Policy.Wrap(ISyncPolicy, ISyncPolicy).Execute(Action) 432 B 72 B
Policy.Wrap(ISyncPolicy, ISyncPolicy).Execute<TResult>(Func<TResult>) 256 B 72 B
Policy.Wrap(ISyncPolicy<TResult>, ISyncPolicy<TResult>) .Execute(Func<TResult>) 256 B 72 B

*The 72 bytes represent the Context travelling with the execution, to support traceability per execution.

Async executions

Async executions inevitably carry lightly greater allocations:

  • a limited number of Task instances must be allocated;
  • allocations are incurred by the .NET async machinery such as the async/await state machine and continuation delegates.

Significant savings have nonetheless been achieved. Polly uses async/await-elision internally wherever possible to reduce the extra Tasks and async-machinery allocated. An execution through the simplest async policy has been minimised to incur only two awaits: one to correctly scope the Context travelling with the execution; the other necessarily to await the delegate the user passes to execute.

Method Polly v7.2.0 allocations (bytes) Polly v8.0.0.21-f5ce247 allocations (bytes)
IAsyncPolicy.ExecuteAsync(Func<Task>) 400 B 72 B
IAsyncPolicy.ExecuteAsync<TResult>(Func<Task<TResult>>) 304 B 144 B
IAsyncPolicy<TResult>.ExecuteAsync(Func<Task<TResult>>) 304 B 144 B
Policy.WrapAsync(IAsyncPolicy, IAsyncPolicy).ExecuteAsync(Func<Task>) 672 B 72 B
Policy.WrapAsync(IAsyncPolicy, IAsyncPolicy).ExecuteAsync<TResult>(Func<Task<TResult>>) 768 B 288 B
Policy.WrapAsync(IAsyncPolicy<TResult>, IAsyncPolicy<TResult>) .ExecuteAsync(Func<Task<TResult>>) 768 B 288 B

These statistics represent allocations attributable to execution despatch, calculated with NoOpPolicy.


Latencies

With these changes order-of-magnitude of execution latency has been maintained at ~1microsecond (one-millionth-second) extra latency per Polly policy applied to an execution, for all policy types.

@reisenberger
Copy link
Member

Polly v8.0.0 will also add new execute overloads allowing passing strongly-typed input objects to delegates to be executed, without using a closure:

void Execute<T1>(Action<Context, CancellationToken, T1> action, Context context, CancellationToken cancellationToken, T1 input1)
void Execute<T1, T2>(Action<Context, CancellationToken, T1, T2> action, Context context, CancellationToken cancellationToken, T1 input1, T2 input2)

TResult Execute<T1, TResult>(Func<Context, CancellationToken, T1, TResult> func, Context context, CancellationToken cancellationToken, T1 input1)
TResult Execute<T1, T2, TResult>(Func<Context, CancellationToken, T1, T2, TResult> func, Context context, CancellationToken cancellationToken, T1 input1, T2 input2)

Task ExecuteAsync<T1>(Func<Context, CancellationToken, bool, T1, Task> action, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1)
Task ExecuteAsync<T1, T2>(Func<Context, CancellationToken, bool, T1, T2, Task> action, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1, T2 input2)

Task<TResult> ExecuteAsync<T1, TResult>(Func<Context, CancellationToken, bool, T1, Task<TResult>> func, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1)
Task<TResult> ExecuteAsync<T1, T2, TResult>(Func<Context, CancellationToken, bool, T1, T2, Task<TResult>> func, Context context, CancellationToken cancellationToken, bool continueOnCapturedContext, T1 input1, T2 input2)

(and similar .ExecuteAndCapture/Async(...) variants)

@reisenberger
Copy link
Member

Breaking changes: Custom policies hosted outside Polly will need to make a couple of small adjustments (method naming, change of execution-despatch type) to dovetail with the new execution despatch. Details to be summarised in full for the v8.0.0 release.

@SimonCropp
Copy link
Contributor

with the PR merged, should this be closed?

or is there more work to be done? and can i help?

@martincostello
Copy link
Member

I think this is still open pending Polly v8 (whenever that happens - it's pretty much on-ice during Dylan's absence).

@SimonCropp
Copy link
Contributor

seems other issues in the v8 milestone are closed when fixed https:/App-vNext/Polly/issues?q=is%3Aclosed+milestone%3Av8.0.0

@martincostello martincostello removed this from the v8.0.0 milestone Jun 16, 2023
@martintmk
Copy link
Contributor

Hey folks, this is addressed in V8 where the ResilienceStrategy allow passing TState to lambda. This way, you can use static lambdas that use the state for executions.

await NullResilienceStrategy.Instance.ExecuteAsync((_, _) => new ValueTask<string>("dummy"), context, "state").ConfigureAwait(false);

@martincostello , I think we can close this one.

cc @SimonCropp

@martincostello
Copy link
Member

Yep, I think agree - v8 should resolve this scenario.

@martincostello martincostello added v8 Issues related to the new version 8 of the Polly library. and removed breaking change labels Jun 19, 2023
@martincostello martincostello added this to the v8.0.0 milestone Jun 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance v8 Issues related to the new version 8 of the Polly library.
Projects
No open projects
Status: Done
Development

No branches or pull requests

6 participants