Fix SegmentReplication flaky integ tests #8134

Merged (5 commits) on Jun 27, 2023

@sachinpkale sachinpkale commented Jun 19, 2023

Description

  • Quoting the original comment as is: the getLatestSegmentInfosAndCheckpoint method is eventually invoked and fails on the call to getEngine().config().getCodec().getName() because the engine is already closed. The method already returns empty if the shard is not open, but the shard can be closed after that check. So we need to either ensure that the shard is not closed while this method runs or, if we don't want to block shard close for this, catch the error and return gracefully.
  • Some of the tests in SegmentReplicationIT are flaky. This was observed while debugging flakiness in SegmentReplicationRemoteStoreIT, which extends SegmentReplicationIT with remote store settings.
  • Reproducing the flakiness is not straightforward: running with the same seed does not guarantee a failure. The only way to reproduce the issue is to run the test multiple times.
  • After inspecting the logs of the failures, we found that both nodes in the cluster act as data nodes as well as cluster manager nodes. As part of the test flow, stopping one of the nodes can cause quorum loss, since the cluster has only 2 nodes.
  • To avoid this, this PR explicitly adds a cluster-manager-only node to each test. It also makes the primary and replica data-only nodes so that the tests are more deterministic.
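
To illustrate the two points above, here is a minimal, self-contained sketch. It is not the code from this PR: FlakyTestSketch, Engine, quorum, hasQuorumAfterStopping and codecNameOrEmpty are hypothetical names, IllegalStateException stands in for whatever exception the closed engine throws, and the quorum check uses a simple majority model that ignores how the cluster adjusts its voting configuration.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names come from the OpenSearch code base.
class FlakyTestSketch {

    // Simple majority of the cluster-manager-eligible (voting) nodes.
    static int quorum(int votingNodes) {
        return votingNodes / 2 + 1;
    }

    // True if the voting nodes that remain can still form a majority
    // of the original set of voting nodes.
    static boolean hasQuorumAfterStopping(int votingNodes, int stoppedVotingNodes) {
        return votingNodes - stoppedVotingNodes >= quorum(votingNodes);
    }

    // Stand-in for an engine that another thread may close at any time.
    static class Engine {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        void close() {
            closed.set(true);
        }

        String codecName() {
            if (closed.get()) {
                throw new IllegalStateException("engine is closed");
            }
            return "default";
        }
    }

    // A state check done before this call can be stale by the time the engine
    // is used, so the failure is caught here and mapped to an empty result
    // instead of blocking the close.
    static Optional<String> codecNameOrEmpty(Engine engine) {
        try {
            return Optional.of(engine.codecName());
        } catch (IllegalStateException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Two nodes that are both voting nodes: stopping one loses the majority.
        System.out.println(hasQuorumAfterStopping(2, 1));
        // One dedicated voting node plus data-only nodes: stopping a data node
        // removes no voting node, so the majority is kept.
        System.out.println(hasQuorumAfterStopping(1, 0));

        Engine engine = new Engine();
        engine.close();
        System.out.println(codecNameOrEmpty(engine).isPresent());
    }
}
```

Under this model, the dedicated cluster manager node is what keeps the outcome independent of which data node a test stops.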

Related Issues

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDropPrimaryDuringReplication
      1 org.opensearch.remotestore.RemoteStoreRefreshListenerIT.testRemoteRefreshRetryOnFailure

@codecov

codecov bot commented Jun 19, 2023

Codecov Report

Merging #8134 (2417663) into main (90678c2) will increase coverage by 0.07%.
The diff coverage is 29.41%.

@@             Coverage Diff              @@
##               main    #8134      +/-   ##
============================================
+ Coverage     70.98%   71.06%   +0.07%     
- Complexity    56687    56737      +50     
============================================
  Files          4722     4722              
  Lines        267608   267619      +11     
  Branches      39216    39218       +2     
============================================
+ Hits         189973   190172     +199     
+ Misses        61582    61463     -119     
+ Partials      16053    15984      -69     
Impacted Files                                          Coverage Δ
...in/java/org/opensearch/index/shard/IndexShard.java   70.41% <29.41%> (-0.55%) ⬇️

... and 474 files with indirect coverage changes

Signed-off-by: Sachin Kale <[email protected]>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      2 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication
      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testDropPrimaryDuringReplication
      1 org.opensearch.index.IndexServiceTests.testAsyncTranslogTrimTaskOnClosedIndex

@sachinpkale sachinpkale requested a review from mch2 June 22, 2023 04:46
@gbbafna gbbafna merged commit 0c7ba94 into opensearch-project:main Jun 27, 2023
@sachinpkale sachinpkale added the backport 2.x Backport to 2.x branch label Jun 27, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jun 27, 2023
Signed-off-by: Sachin Kale <[email protected]>
(cherry picked from commit 0c7ba94)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
imRishN pushed a commit to imRishN/OpenSearch that referenced this pull request Jun 27, 2023
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/OpenSearch that referenced this pull request Jun 27, 2023
sudarshan-baliga pushed a commit to Gaurav614/OpenSearch that referenced this pull request Jun 29, 2023
sachinpkale pushed a commit to sachinpkale/OpenSearch that referenced this pull request Jul 4, 2023
Signed-off-by: Sachin Kale <[email protected]>
(cherry picked from commit 0c7ba94)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
sachinpkale pushed a commit to sachinpkale/OpenSearch that referenced this pull request Jul 5, 2023
Signed-off-by: Sachin Kale <[email protected]>
(cherry picked from commit 0c7ba94)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
sachinpkale pushed a commit to sachinpkale/OpenSearch that referenced this pull request Jul 10, 2023
Signed-off-by: Sachin Kale <[email protected]>
(cherry picked from commit 0c7ba94)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
sachinpkale pushed a commit to sachinpkale/OpenSearch that referenced this pull request Jul 11, 2023
Signed-off-by: Sachin Kale <[email protected]>
(cherry picked from commit 0c7ba94)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gbbafna pushed a commit that referenced this pull request Jul 11, 2023
(cherry picked from commit 0c7ba94)

Signed-off-by: Sachin Kale <[email protected]>
baba-devv pushed a commit to baba-devv/OpenSearch that referenced this pull request Jul 29, 2023
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Labels
backport 2.x Backport to 2.x branch skip-changelog