Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for faster shard reallocation on graceful shutdown #1005

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mathieufortin01
Copy link

Issue #, if available: #845

Description of changes:
We have this latency issue in production where on graceful shutdown we have seen up to 40s before records resume processing on a new container.

The cause of it is that on graceful shutdown, the lease stays owned. Others wait for lease to expire before picking them up. See #845 for details, this is pretty well explained.

The changes im proposing here are:

With these changes, I was able to go below 1s before records resumed consumption.

So I would like some opinions on this. We will test this in various envs and see the impacts. If this goes through this would be a contribution by Autodesk.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

* Interval at which the lease taker will execute.
* If unspecified, an interval will be calculated based on the lease duration.
*/
private long leaseTakerIntervalMillis = -1L;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing this property to clients.

* If leases should be evicted or not on shutdown requested.
* By default, leases are not evicted.
*/
private boolean evictLeaseOnShutdown = false;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New property. If false, same behavior as today which is wait for lease to be expired before taking it.

@@ -191,9 +191,9 @@ synchronized Map<String, Lease> takeLeases(Callable<Long> timeProvider)
return takenLeases;
}

List<Lease> expiredLeases = getExpiredLeases();
List<Lease> availableLeases = getAvailableLeases();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in this file are all related to changing expired lease concept for available lease concept.

@@ -482,7 +491,7 @@ private Set<Lease> computeLeasesToTake(List<Lease> expiredLeases) {
}

} finally {
scope.addData("ExpiredLeases", expiredLeases.size(), StandardUnit.COUNT, MetricsLevel.SUMMARY);
scope.addData("AvailableLeases", availableLeases.size(), StandardUnit.COUNT, MetricsLevel.SUMMARY);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whats the best way to go here.

@@ -48,21 +45,19 @@
import software.amazon.kinesis.retrieval.kpl.ExtendedSequenceNumber;

@RunWith(MockitoJUnitRunner.class)
public class DynamoDBLeaseCoordinatorIntegrationTest {
private static final int ATTEMPTS = 20;
public class DynamoDBLeaseCoordinatorIntegrationTest extends LeaseIntegrationTest {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now extends from LeaseIntegrationTest, so I could remove a whole bunch of test setup.

@mathieufortin01 mathieufortin01 marked this pull request as ready for review October 20, 2022 13:53
@stair-aws stair-aws added the v2.x Issues related to the 2.x version label Feb 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v2.x Issues related to the 2.x version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants