[GOBBLIN-1995] Kill the writer thread when a timeout happens to release the lock #3871

Open: wants to merge 9 commits into master

ZihanLi58 (Contributor)

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):
    Currently, when a call to HDFS times out, we stop waiting and fail the job, but the thread making the call is not killed. The hanging thread keeps holding the lock, so every other call to HDFS is blocked behind it.
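The failure mode described above can be reproduced in isolation. This is a minimal, hypothetical sketch (not Gobblin code): a ReentrantLock named LOCK stands in for whatever lock the HDFS client holds, and Thread.sleep stands in for the hanging call. Future.get with a timeout only stops the caller from waiting; the task keeps running and keeps the lock until it is cancelled with cancel(true), which interrupts the worker thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class TimeoutLockDemo {
    // Hypothetical stand-in for a lock that every HDFS call must take.
    static final ReentrantLock LOCK = new ReentrantLock();

    // Hypothetical stand-in for a hanging HDFS call: it holds LOCK while blocked.
    static Void hangingCall() throws InterruptedException {
        LOCK.lock();
        try {
            Thread.sleep(60_000); // blocks, but throws InterruptedException if interrupted
            return null;
        } finally {
            LOCK.unlock();
        }
    }

    /** Returns true if another caller can take LOCK after the first caller timed out. */
    static boolean lockFreeAfterTimeout(boolean cancelOnTimeout) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<Void> task = TimeoutLockDemo::hangingCall;
        Future<Void> future = pool.submit(task);
        try {
            future.get(200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The caller stops waiting here, but the task is still running.
        } finally {
            if (cancelOnTimeout && !future.isDone()) {
                future.cancel(true); // interrupts the worker thread
            }
        }
        boolean acquired = LOCK.tryLock(1, TimeUnit.SECONDS); // simulates another caller
        if (acquired) {
            LOCK.unlock();
        }
        pool.shutdownNow(); // interrupt any leftover task so the demo can exit
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return acquired;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without cancel: " + lockFreeAfterTimeout(false));
        System.out.println("with cancel(true): " + lockFreeAfterTimeout(true));
    }
}
```

On a typical run, the first check reports that the lock is still held and the second that it was released, which mirrors the motivation for calling cancel(true) in the finally block.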

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

        state.setProp(CURRENT_PARTITIONED_WRITERS_COUNTER, partitionWriters.size() + 1);
        return future.get(writeTimeoutInterval, TimeUnit.SECONDS);
    } catch (ExecutionException | InterruptedException e) {
        throw new RuntimeException("Error creating writer", e);
    } catch (TimeoutException e) {
        throw new RuntimeException(String.format(
            "Failed to create writer due to timeout. The operation timed out after %s seconds.",
            writeTimeoutInterval), e);
    } finally {
        if (future != null && !future.isDone()) {
            future.cancel(true);
        }
    }
homatthew (Contributor):

Does this have the intended effect? In the issues you were seeing, were the threads sleeping or waiting on I/O, such that they need to be interrupted via cancel?

I am asking because cancel(true) only interrupts the thread: unless the thread is sleeping or waiting, it has to actively check whether it has been asked to cancel.

Read the following to see what I am describing:
https://stackoverflow.com/questions/28043225/future-cancel-does-not-work
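The point raised in the linked question can be shown with a small, hypothetical sketch (the class and method names are illustrative, not from Gobblin): cancel(true) only sets the worker's interrupt flag. A CPU-bound loop that never sleeps, waits, or checks the flag keeps running after cancellation, while a loop that polls Thread.currentThread().isInterrupted() exits promptly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CooperativeCancelDemo {
    /** Cancels a CPU-bound loop with cancel(true); returns true if it stopped within 1s. */
    static boolean stopsAfterCancel(boolean checksInterrupt) throws Exception {
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean keepSpinning = new AtomicBoolean(true); // lets the demo end either way
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<?> future = pool.submit(() -> {
            try {
                while (keepSpinning.get()) {
                    // Busy work: no sleep() or wait(), so nothing throws InterruptedException.
                    if (checksInterrupt && Thread.currentThread().isInterrupted()) {
                        return; // cooperative exit once cancellation is requested
                    }
                }
            } finally {
                finished.countDown();
            }
        });
        Thread.sleep(100); // let the loop start
        future.cancel(true); // only sets the worker's interrupt flag
        boolean stopped = finished.await(1, TimeUnit.SECONDS);
        keepSpinning.set(false); // stop a loop that ignored the interrupt
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return stopped;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("ignores interrupt, stopped: " + stopsAfterCancel(false));
        System.out.println("checks interrupt, stopped: " + stopsAfterCancel(true));
    }
}
```

Threads blocked in Thread.sleep, Object.wait, or similar interruptible calls do not need the explicit check, because the interrupt surfaces as an InterruptedException.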

ZihanLi58 (Contributor, Author):

@homatthew Thanks for the information. The thread here is blocked fetching the HDFS mount table, which is an I/O operation. I also tested this by talking to HDFS in a loop until the call timed out, and confirmed that cancel works correctly in this case.
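A loop test of the kind described in the reply can be approximated without a cluster. In this hypothetical sketch, a read from a java.nio.channels.Pipe with no writer stands in for the blocked mount-table lookup, and each round does "call, time out, cancel". Reads on an InterruptibleChannel abort with ClosedByInterruptException when the thread is interrupted; a thread blocked in a classic java.io socket read is generally not woken by an interrupt, so whether cancel(true) helps depends on where the HDFS client actually blocks. The author's test against HDFS is what confirms it for this case.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class InterruptibleIoDemo {
    /** One "call, time out, cancel" round; returns true if the blocked reader exited. */
    static boolean timeOutAndCancelOnce(ExecutorService pool) throws Exception {
        Pipe pipe = Pipe.open(); // nothing is ever written, so a read blocks
        CountDownLatch exited = new CountDownLatch(1);
        Future<Integer> future = pool.submit(() -> {
            try {
                // Hypothetical stand-in for the hanging HDFS mount-table lookup.
                return pipe.source().read(ByteBuffer.allocate(1));
            } finally {
                exited.countDown();
            }
        });
        try {
            future.get(100, TimeUnit.MILLISECONDS);
            return false; // unexpected: the read should have blocked
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt closes the channel and unblocks the read
        }
        boolean stopped = exited.await(1, TimeUnit.SECONDS);
        pipe.sink().close();
        pipe.source().close(); // no-op if the interrupt already closed it
        return stopped;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        int stopped = 0;
        for (int i = 0; i < 5; i++) {
            if (timeOutAndCancelOnce(pool)) {
                stopped++;
            }
        }
        pool.shutdownNow();
        System.out.println(stopped + " of 5 blocked reads were cancelled");
    }
}
```

Reusing one single-thread pool across rounds also checks that a cancelled task frees its worker thread for the next call, which is the property the PR relies on.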

2 participants