Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] org.opensearch.search.nested.SimpleNestedIT.testExplain is flaky #11413

Closed
reta opened this issue Nov 30, 2023 · 8 comments · Fixed by #11681
Closed

[BUG] org.opensearch.search.nested.SimpleNestedIT.testExplain is flaky #11413

reta opened this issue Nov 30, 2023 · 8 comments · Fixed by #11681

Comments

@reta
Copy link
Collaborator

reta commented Nov 30, 2023

Describe the bug
The test case org.opensearch.search.nested.SimpleNestedIT.testExplain {p0={"search.concurrent_segment_search.enabled":"true"}} is flaky

java.lang.AssertionError: 
Expected: a string starting with "0.36464313 = Score based on 2 child docs in range from 0 to 1"
     but: was "0.36464313 = Score based on 2 child docs in range from 12 to 13, using score mode Total
  0.18232156 = weight(nested1.n_field1:n_value1 in 12) [PerFieldSimilarity], result of:
    0.18232156 = score(freq=1.0), computed as boost * idf * tf from:
      2.2 = boost
      0.18232156 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
        2 = n, number of documents containing term
        2 = N, total number of documents with field
      0.45454544 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
        1.0 = freq, occurrences of term within document
        1.2 = k1, term saturation parameter
        0.75 = b, length normalization parameter
        1.0 = dl, length of field
        1.0 = avgdl, average length of field
"
java.lang.AssertionError: 
Expected: a string starting with "0.36464313 = Score based on 2 child docs in range from 0 to 1"
     but: was "0.36464313 = Score based on 2 child docs in range from 12 to 13, using score mode Total
  0.18232156 = weight(nested1.n_field1:n_value1 in 12) [PerFieldSimilarity], result of:
    0.18232156 = score(freq=1.0), computed as boost * idf * tf from:
      2.2 = boost
      0.18232156 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
        2 = n, number of documents containing term
        2 = N, total number of documents with field
      0.45454544 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
        1.0 = freq, occurrences of term within document
        1.2 = k1, term saturation parameter
        0.75 = b, length normalization parameter
        1.0 = dl, length of field
        1.0 = avgdl, average length of field
"
	at __randomizedtesting.SeedInfo.seed([A16D8358F150F61C:1799CEB9DDFB0C53]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.junit.Assert.assertThat(Assert.java:930)
	at org.opensearch.search.nested.SimpleNestedIT.testExplain(SimpleNestedIT.java:500)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1583)

To Reproduce

/gradlew ':server:internalClusterTest' --tests "org.opensearch.search.nested.SimpleNestedIT" -Dtests.method="testExplain {p0={"search.concurrent_segment_search.enabled":"true"}}" -Dtests.seed=A16D8358F150F61C 

Expected behavior
The test must always pass

Plugins
Standard

Screenshots
If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • CI

Additional context

@peternied
Copy link
Member

Thanks for filing this issue

@jed326
Copy link
Collaborator

jed326 commented Nov 30, 2023

Just at a glance this looks related to ingesting dummy docs for the concurrent search path. The expected score is still correct.

@jed326 jed326 assigned neetikasinghal and unassigned jed326 Nov 30, 2023
@neetikasinghal
Copy link
Contributor

I am not able to reproduce this with the above seed, also tried to run it for 100 times, but still not able to reproduce it.

@neetikasinghal
Copy link
Contributor

I am able now to reproduce this with seed=1D20738151FB4EBD

@jed326
Copy link
Collaborator

jed326 commented Dec 11, 2023

The explain itself is coming from Lucene: https:/apache/lucene/blob/a6f70ad2bb0b682eb65feb522358ee6d16cad766/lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java#L432-L440

The difference in output seems to be coming from different start/end docs, which makes sense since we have deleted docs in our segments for the concurrent search path. It seems to me we can just fix the test itself to accommodate.

@neetikasinghal
Copy link
Contributor

neetikasinghal commented Jan 8, 2024

Here's some info on the docIds in lucene:

  • A DocId in lucene is not actually unique to the Index but is unique to a Segment. Lucene does this mainly to optimize writing and compression. Since it is only unique to a Segment, how can a Doc be uniquely identified at the Index level? The solution is simple. The segments are ordered. To take a simple example, an Index has two segments and each segment has 100 docs respectively. The DocId's in the Segment are 0-100 but when they are converted to the Index level, the range of the DocId's in the second Segment is converted to 100-200.

  • DocId's are unique within a Segment, numbered progressively from zero. But this does not mean that the DocId's are continuous. When a Doc is deleted, there is a gap.

  • The DocId corresponding to a document can change, usually when Segments are merged.

The test fails on the validation of range of the docIds in the assertion, the range changes as with indexRandomForConcurrentSearch function there are several bogus documents ingested and deleted which could trigger background merges and cause the range of the docIds matched with the search query to change.

@sohami
Copy link
Collaborator

sohami commented Jan 9, 2024

@neetikasinghal Seems like the docId in the output of explain is the lucene internal docId and not the docId set in _id field by OpenSearch. In that case, I think it would be useful to have a tests with these bogus docs ingested once and then validating that output of explain API in concurrent and non-concurrent case is same on same index. We can extract it to a separate test class if needed.

@neetikasinghal
Copy link
Contributor

neetikasinghal commented Jan 9, 2024

@sohami makes sense. The range of the docs should not vary across concurrent and non-concurrent case. However, in order to do this, we need to pull the test out into a separate class and turn off the parameterization as indexRandomForMultipleSlices function could lead to having different range across different test run and will not be consistent.
I have updated the PR based on this, please review: https:/opensearch-project/OpenSearch/pull/11681/files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants