KCL continues to hold leases after unexpected shutdown due to Error #1207

Open
dbottini opened this issue Sep 12, 2023 · 0 comments
Labels
v2.x Issues related to the 2.x version

Comments

@dbottini
Hello, we identified an edge case in how KCL responds to a Throwable of type Error (as opposed to Exception): it leaves a broken KCL that holds its leases indefinitely but doesn't process records from shards or do any other work.
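For readers less familiar with the distinction, here is a minimal Java sketch, independent of KCL, of why an Error can slip past handling written for Exception. The class and method names are hypothetical, and nothing here claims to mirror KCL's internal catch blocks.

```java
public class ErrorVersusExceptionSketch {

    // Returns a label describing which catch clause handled the Throwable.
    static String handledBy(Runnable task) {
        try {
            try {
                task.run();
                return "none";
            } catch (Exception e) {
                // Handles RuntimeException and other Exception subclasses only.
                return "catch (Exception)";
            }
        } catch (Throwable t) {
            // Error subclasses such as NoClassDefFoundError end up here.
            return "catch (Throwable)";
        }
    }

    public static void main(String[] args) {
        System.out.println(handledBy(() -> { throw new IllegalStateException("simulated"); }));
        System.out.println(handledBy(() -> { throw new NoClassDefFoundError("simulated"); }));
    }
}
```

The first call is handled by the inner `catch (Exception)`; the second bypasses it and is only caught by the outer `catch (Throwable)`.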

KCL version: 2.4.8

In our scenario, we had an issue where we could not load a class at runtime in our RecordProcessor.processRecords implementation.

java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
	at service.processor.primary.PrimaryRecordProcessor.getBaseEvent(PrimaryRecordProcessor.java:463)
	at service.processor.primary.PrimaryRecordProcessor.lambda$processRecords$0(PrimaryRecordProcessor.java:108)
	at java.base/java.util.ArrayList.forEach(Unknown Source)
	at service.processor.primary.PrimaryRecordProcessor.processRecords(PrimaryRecordProcessor.java:105)
	at software.amazon.kinesis.lifecycle.ProcessTask.callProcessRecords(ProcessTask.java:224)
	at software.amazon.kinesis.lifecycle.ProcessTask.call(ProcessTask.java:162)
	at software.amazon.kinesis.lifecycle.ShardConsumer.executeTask(ShardConsumer.java:336)
	at software.amazon.kinesis.lifecycle.ShardConsumer.processData(ShardConsumer.java:322)
	at software.amazon.kinesis.lifecycle.ShardConsumer.handleInput(ShardConsumer.java:156)
	at software.amazon.kinesis.lifecycle.ShardConsumerSubscriber.onNext(ShardConsumerSubscriber.java:158)
	at software.amazon.kinesis.lifecycle.ShardConsumerSubscriber.onNext(ShardConsumerSubscriber.java:36)
	at software.amazon.kinesis.lifecycle.NotifyingSubscriber.onNext(NotifyingSubscriber.java:56)
	at software.amazon.kinesis.lifecycle.NotifyingSubscriber.onNext(NotifyingSubscriber.java:27)
	at io.reactivex.rxjava3.internal.util.HalfSerializer.onNext(HalfSerializer.java:46)
	at io.reactivex.rxjava3.internal.subscribers.StrictSubscriber.onNext(StrictSubscriber.java:97)
	at io.reactivex.rxjava3.internal.operators.flowable.FlowableObserveOn$ObserveOnSubscriber.runAsync(FlowableObserveOn.java:403)
	at io.reactivex.rxjava3.internal.operators.flowable.FlowableObserveOn$BaseObserveOnSubscriber.run(FlowableObserveOn.java:178)
	at io.reactivex.rxjava3.internal.schedulers.ExecutorScheduler$ExecutorWorker$BooleanRunnable.run(ExecutorScheduler.java:324)
	at io.reactivex.rxjava3.internal.schedulers.ExecutorScheduler$ExecutorWorker.runEager(ExecutorScheduler.java:289)
	at io.reactivex.rxjava3.internal.schedulers.ExecutorScheduler$ExecutorWorker.run(ExecutorScheduler.java:250)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path: /usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib [in thread "ShardRecordProcessor-0001"]
	at java.base/java.lang.ClassLoader.loadLibrary(Unknown Source)
	at java.base/java.lang.Runtime.loadLibrary0(Unknown Source)
	at java.base/java.lang.System.loadLibrary(Unknown Source)
	at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:185)
	at org.xerial.snappy.SnappyLoader.loadSnappyApi(SnappyLoader.java:157)
	at org.xerial.snappy.Snappy.init(Snappy.java:70)
	at org.xerial.snappy.Snappy.<clinit>(Snappy.java:47)
	... 23 common frames omitted

It may be worth noting that we start our KCL using the following:

ExecutorService executor = Executors.newCachedThreadPool();
executor.submit(scheduler);

We think this is a pretty standard way to run a task, but it is different from the model in the KCL documentation (shown below), and it may affect error propagation in a case like this. It's also hard to tell whether the KCL docs are set up that way because it's a one-file example application or because it is an implicit requirement.

Thread schedulerThread = new Thread(scheduler);
schedulerThread.setDaemon(true);
schedulerThread.start();
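One concrete difference between the two start-up styles is how a Throwable that escapes `run()` surfaces. The sketch below uses a stand-in Runnable rather than the real Scheduler (`FAILING_TASK` and the method names are hypothetical). With `executor.submit(...)`, the Error is stored in the returned Future and is only visible if something calls `get()`. With a plain Thread, it reaches the thread's uncaught exception handler. Note that this only covers a Throwable escaping the submitted task itself. In the trace above the Error was raised on the "ShardRecordProcessor-0001" thread, and whether it ever reaches `scheduler.run()` is not established here.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

public class ErrorPropagationSketch {

    // Hypothetical stand-in for a task whose run() dies with an Error.
    static final Runnable FAILING_TASK = () -> {
        throw new NoClassDefFoundError("simulated failure");
    };

    // submit(): the Error is captured in the Future and only seen via get().
    static Throwable failureSeenViaFuture() throws InterruptedException {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            Future<?> future = executor.submit(FAILING_TASK);
            try {
                future.get();
                return null; // task completed normally
            } catch (ExecutionException e) {
                return e.getCause(); // the original Throwable
            }
        } finally {
            executor.shutdownNow();
        }
    }

    // Plain Thread: the Error is delivered to the UncaughtExceptionHandler.
    static Throwable failureSeenViaHandler() throws InterruptedException {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread thread = new Thread(FAILING_TASK);
        thread.setDaemon(true);
        thread.setUncaughtExceptionHandler((t, e) -> seen.set(e));
        thread.start();
        thread.join();
        return seen.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("via Future.get(): " + failureSeenViaFuture());
        System.out.println("via handler:      " + failureSeenViaHandler());
    }
}
```

If the Future returned by `submit` is discarded, as in the two-line snippet earlier, nothing in the application observes the failure unless it is logged elsewhere.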

What I think KCL should handle differently: when the Scheduler halts, even for a cause as anomalous as an Error, it should trigger a proper cleanup of its resources and executor service.
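As an application-side stopgap, the kind of cleanup described above could be approximated by wrapping the task before handing it to the executor. This is a minimal sketch with hypothetical names (`withCleanup`, `cleanup`), not KCL behavior. In a real service the cleanup hook might be where one asks the Scheduler to shut down, though whether that releases the leases in this failure mode is untested here.

```java
public class CleanupOnFailureSketch {

    // Wraps a task so that cleanup runs if the task dies with any Throwable,
    // including Error subclasses. The original Throwable is rethrown afterwards.
    static Runnable withCleanup(Runnable task, Runnable cleanup) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                try {
                    cleanup.run();
                } catch (Throwable cleanupFailure) {
                    // Keep the original failure primary; attach the cleanup failure.
                    t.addSuppressed(cleanupFailure);
                }
                throw t;
            }
        };
    }

    public static void main(String[] args) {
        Runnable failing = () -> { throw new NoClassDefFoundError("simulated failure"); };
        Runnable wrapped = withCleanup(failing, () -> System.out.println("cleanup ran"));
        try {
            wrapped.run();
        } catch (NoClassDefFoundError e) {
            System.out.println("rethrown: " + e.getMessage());
        }
    }
}
```

Rethrowing keeps the failure visible to whatever runs the task, so the wrapper adds cleanup without hiding the Error.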

Ideally, I'd love to see something like a health check on KCL that we can call to identify an unhealthy KCL process and poll occasionally as part of our normal service health-check operations.
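To make the request concrete, here is a rough sketch of what such a health check could look like if built on the application side today. Everything in it is hypothetical (`ConsumerHealthSketch`, `recordProgress`, `recordFatal`, the staleness threshold), and none of it is an existing KCL API. The idea is that the record processor reports progress and fatal failures, and the service health check polls `isHealthy()`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongSupplier;

public class ConsumerHealthSketch {

    private final LongSupplier clockMillis;   // injectable clock, e.g. System::currentTimeMillis
    private final long maxSilenceMillis;      // how long without progress counts as unhealthy
    private final AtomicLong lastProgressMillis;
    private final AtomicReference<Throwable> fatal = new AtomicReference<>();

    public ConsumerHealthSketch(LongSupplier clockMillis, long maxSilenceMillis) {
        this.clockMillis = clockMillis;
        this.maxSilenceMillis = maxSilenceMillis;
        this.lastProgressMillis = new AtomicLong(clockMillis.getAsLong());
    }

    // Call after each successfully processed batch.
    public void recordProgress() {
        lastProgressMillis.set(clockMillis.getAsLong());
    }

    // Call from a catch (Throwable) around processing; the first failure is kept.
    public void recordFatal(Throwable t) {
        fatal.compareAndSet(null, t);
    }

    // Healthy only if no fatal failure was recorded and progress is recent.
    public boolean isHealthy() {
        long silence = clockMillis.getAsLong() - lastProgressMillis.get();
        return fatal.get() == null && silence <= maxSilenceMillis;
    }
}
```

A stream with little or no traffic may look stale under this scheme, so the threshold is a judgment call for each deployment.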

@stair-aws added the "v2.x Issues related to the 2.x version" label on Oct 12, 2023