Improve back-off logic for retrying after load errors #3370

Closed
joluet opened this issue Oct 18, 2017 · 5 comments

joluet commented Oct 18, 2017

The current loader implementation retries after an error with linear backoff:

return Math.min((errorCount - 1) * 1000, 5000);

Instead, it should use exponential backoff to reduce the potential load on the API when many clients hit errors at the same time. That could look like this:

private long getRetryDelayMillis() {
      // Exponential back-off: 1s, 2s, 4s, ... capped at 8 seconds.
      return (long) Math.min(Math.pow(2, errorCount - 1) * 1000, 8000);
}
ojw28 (Contributor) commented Oct 18, 2017

Without a justification, that sounds like quite an arbitrary statement. Please explain why ;)...

joluet (Author) commented Oct 18, 2017

When there is a server problem that affects all clients at the same time, all client requests suddenly become synchronized, because each client retries at fixed intervals from that point onwards. This can lead to a thundering herd problem (similar to a cache stampede).

Exponential backoff mitigates the problem by cutting back the total request rate and accelerating the desynchronization on subsequent errors. Adding random jitter would spread out the load even further.

See https://www.awsarchitectureblog.com/2015/03/backoff.html for more information on this.
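
For reference, the "full jitter" variant from that post could be sketched roughly like this (a sketch only, reusing the errorCount field from the snippets above and assuming an 8 second cap; java.util.Random is needed):

private static final Random RANDOM = new Random();

private long getRetryDelayMillis() {
      // Exponential back-off capped at 8 seconds: 1s, 2s, 4s, 8s, ...
      long maxDelayMs = Math.min(1000L << Math.min(errorCount - 1, 3), 8000);
      // "Full jitter": wait a uniformly random amount in [0, maxDelayMs).
      return (long) (RANDOM.nextDouble() * maxDelayMs);
}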

ojw28 (Contributor) commented Oct 18, 2017

I think adding some jitter to ensure de-synchronization is a fairly non-controversial suggestion. We should do that. I'm not sure how obvious it is that exponential back-off is better than what we're doing currently, however. Reducing the total request rate might be better for server-side issues, but might be worse for other types of connectivity problems. For example, if there's a local connectivity issue it's probably better for the client to retry quite aggressively, since this will allow for more rapid recovery and minimize the risk of user-visible re-buffering.

As an aside, we do eventually plan to allow injection of the retry policy, which will allow applications to change the back-off logic if they want non-standard behavior.

joluet (Author) commented Oct 18, 2017

That sounds fair.

Being able to inject a custom retry policy would be great. 👍 Ideally, it would also be possible to differentiate between server errors and network errors and apply different policies. That way, exponential backoff could be used for server errors and rapid retries for network errors (a rough sketch of that split follows the snippet below).

An intermediate solution with random jitter could maybe look similar to this:

private long getRetryDelayMillis() {
      Random random = new Random();
      int randomJitter = random.nextInt(1000);
      // Linear back-off capped at 5 seconds, with up to 1 second of jitter added
      // after the cap so that retries stay spread out even once the cap is hit.
      return Math.min((errorCount - 1) * 1000, 5000) + randomJitter;
}
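
If the retry logic were also handed the exception that triggered the retry, the server/network split described above could be sketched roughly like this (the exception parameter is hypothetical; HttpDataSource.InvalidResponseCodeException is ExoPlayer's exception type for non-2xx HTTP responses):

private long getRetryDelayMillis(IOException exception) {
      int randomJitter = new Random().nextInt(1000);
      if (exception instanceof HttpDataSource.InvalidResponseCodeException) {
        // Server-side error (non-2xx response): exponential back-off plus jitter,
        // so that synchronized clients don't keep hammering a struggling backend.
        return Math.min(1000L << Math.min(errorCount - 1, 3), 8000) + randomJitter;
      }
      // Local/network failure: retry quickly to minimize visible re-buffering.
      return 1000 + randomJitter;
}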

@ojw28 ojw28 changed the title Loader should retry with exponential backoff Improve back-off logic for retrying after load errors Oct 18, 2017
@AquilesCanta AquilesCanta self-assigned this Nov 2, 2017
ojw28 pushed a commit that referenced this issue Jun 25, 2018
Issue:#3370

ojw28 pushed a commit that referenced this issue Jul 12, 2018
Issue:#2844
Issue:#3370
Issue:#2981

ojw28 pushed a commit that referenced this issue Aug 6, 2018
Issue:#2844
Issue:#3370
Issue:#2981

AquilesCanta (Contributor) commented
Please try implementing your own LoadErrorHandlingPolicy (while adding your own back-off logic) and let us know if you run into any issues.
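
For anyone finding this later, here is a minimal sketch of such a policy. It assumes the ExoPlayer 2.9-era interface, where DefaultLoadErrorHandlingPolicy exposes getRetryDelayMsFor(int dataType, long loadDurationMs, IOException exception, int errorCount); the method signatures have changed in later releases, so check the version you build against. It also reuses the server/network split and back-off values discussed above, which are illustrative choices rather than recommendations from the ExoPlayer team.

import java.io.IOException;
import java.util.Random;

import com.google.android.exoplayer2.upstream.DefaultLoadErrorHandlingPolicy;
import com.google.android.exoplayer2.upstream.HttpDataSource;

/** Exponential back-off with jitter for server errors, quick retries for everything else. */
public final class BackoffLoadErrorHandlingPolicy extends DefaultLoadErrorHandlingPolicy {

  private final Random random = new Random();

  @Override
  public long getRetryDelayMsFor(
      int dataType, long loadDurationMs, IOException exception, int errorCount) {
    int jitterMs = random.nextInt(1000);
    if (exception instanceof HttpDataSource.InvalidResponseCodeException) {
      // The server answered with an error status: back off exponentially, capped at 8 seconds.
      return Math.min(1000L << Math.min(errorCount - 1, 3), 8000) + jitterMs;
    }
    // Other I/O failures (e.g. local connectivity problems): retry quickly.
    return 1000 + jitterMs;
  }
}

The policy instance can then be passed to the media source factory, e.g. via its setLoadErrorHandlingPolicy(...) setter where the factory in your version exposes one.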

@google google locked and limited conversation to collaborators Jan 31, 2019