
[BUG] org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit is flaky #9589

Closed
sohami opened this issue Aug 28, 2023 · 6 comments
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run Storage:Remote

Comments

@sohami
Collaborator

sohami commented Aug 28, 2023

Describe the bug
The test org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit is flaky. It was unmuted as part of #8931:

org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitRefresh

java.lang.AssertionError: expected:<7> but was:<6>
	at __randomizedtesting.SeedInfo.seed([EA36272CF1AD08E7:82B0311AC996231A]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
	at org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimary(RemoteIndexShardTests.java:139)
	at org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit(RemoteIndexShardTests.java:79)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:578)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1623)

To Reproduce

REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit" -Dtests.seed=EA36272CF1AD08E7 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=lt-LT -Dtests.timezone=America/Argentina/San_Luis -Druntime.java=20

Expected behavior
Test must always pass

Plugins
Standard


Host/Environment:

CI
Additional context
https://build.ci.opensearch.org/job/gradle-check/23645/testReport/junit/org.opensearch.index.shard/RemoteIndexShardTests/testNRTReplicaWithRemoteStorePromotedAsPrimaryCommitCommit/

@BhumikaSaini-Amazon
Contributor

This appears to be related to #9624. I think all five tests are flaky because testNRTReplicaWithRemoteStorePromotedAsPrimary is flaky: the other tests call it to perform their validation. Making testNRTReplicaWithRemoteStorePromotedAsPrimary reliable should therefore resolve the others as well.

@sachinpkale sachinpkale added v2.11.0 Issues and PRs related to version 2.11.0 and removed v2.10.0 labels Sep 23, 2023
@sejli
Member

sejli commented Oct 17, 2023

@Frederic-Chopin, could you pick this up for OSCI? Thanks!

@Frederic-Chopin

Sure! Thanks!

@andrross andrross added Storage:Remote and removed Storage:Durability Issues and PRs related to the durability framework v2.11.0 Issues and PRs related to version 2.11.0 labels Feb 21, 2024
@rramachand21
Member

@Frederic-Chopin, is this being worked on? If it's not fixed, do we know whether it is targeted for 2.13? I'm adding back the untriaged label so we can discuss this in our triage and backlog review meeting.

@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7 8]
@sohami Thanks for creating this issue

@bowenlan-amzn
Member

Another flaky test from this class:

org.opensearch.index.shard.RemoteIndexShardTests.testNRTReplicaWithRemoteStorePromotedAsPrimaryRefreshCommit

java.lang.AssertionError: RecoveryFailedException[[test][0]: Recovery failed from {s1}{s1}{Ks4a3VAsRlK3GrzVJWObdw}{0.0.0.0}{0.0.0.0:119}{dimrs}{} into {s0}{s0}{wSXL-d2ESWad47mYmy8m1A}{0.0.0.0}{0.0.0.0:118}{dimrs}{} ([test][0]: Recovery failed from {s0}{s0}{wSXL-d2ESWad47mYmy8m1A}{0.0.0.0}{0.0.0.0:118}{dimrs}{} into {s1}{s1}{Ks4a3VAsRlK3GrzVJWObdw}{0.0.0.0}{0.0.0.0:119}{dimrs}{})]; nested: RecoveryFailedException[[test][0]: Recovery failed from {s0}{s0}{wSXL-d2ESWad47mYmy8m1A}{0.0.0.0}{0.0.0.0:118}{dimrs}{} into {s1}{s1}{Ks4a3VAsRlK3GrzVJWObdw}{0.0.0.0}{0.0.0.0:119}{dimrs}{}]; nested: RecoveryEngineException[Phase[1] prepare target for translog failed]; nested: CorruptIndexException[misplaced codec footer (file truncated?): length=0 but footerLength==16 (resource=metadata__9223372036854775800__9223372036854775804__9223372036854775804__9223372036854775806__-1039764442__9223370318105717958__1)];

https://build.ci.opensearch.org/job/gradle-check/41277/testReport/junit/org.opensearch.index.shard/RemoteIndexShardTests/testNRTReplicaWithRemoteStorePromotedAsPrimaryRefreshCommit_2/

Projects
Status: ✅ Done
Development

No branches or pull requests

10 participants