Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix InternalAutoDateHistogram reproducible failure #32723

Merged
merged 8 commits into from
Aug 17, 2018

Conversation

pcsanwald
Copy link
Contributor

Closes #32215. This PR addresses two issues:

  1. Reproducible test failure outlined by @DaveCTurner in [CI] InternalAutoDateHistogramTests#testReduceRandom reproducable failure #32215:
./gradlew :server:test -Dtests.seed=55C6AF1793E2275B -Dtests.class=org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogramTests -Dtests.method="testReduceRandom" -Dtests.security.manager=true -Dtests.locale=en-IN -Dtests.timezone=Pacific/Chuuk

The problem here was a mistake in setting up the expected results: I had originally written the test setup to start with the smallest possible innerInterval and try the larger ones, but the approach in the reduce method is to try the largest and step down. I've updated the logic to reflect this and run the test with -Dtests.iters=200 to convince myself this won't cause a further failure.

  1. In the course of tracking down the first problem, I've noticed what I believe is an actual bug in the auto interval implementation: instead of using the maximumInnerInterval of the current rounding, we use the index of the initial rounding passed in.

To fix this, I've updated the logic and added a test case that fails using the old code path, and passes with the modified code path.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

RoundingInfo[] roundings = new RoundingInfo[6];
DateTimeZone timeZone = DateTimeZone.UTC;
roundings[0] = new RoundingInfo(createRounding(DateTimeUnit.SECOND_OF_MINUTE, timeZone),
1000L, 1000);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using a large innerInterval for the first rounding used by the function is important to trigger the problematic code path


OffsetDateTime timestamp = Instant.parse("2018-01-01T00:00:01.000Z").atOffset(ZoneOffset.UTC);
int result = InternalAutoDateHistogram.getAppropriateRounding(timestamp.toEpochSecond()*1000,
timestamp.plusDays(1).toEpochSecond()*1000, 0, roundings, 25);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we've also gotta start with an index that we know won't be used: to trigger the code path, the function needs to iterate past the first rounding.

@DaveCTurner
Copy link
Contributor

On repeated runs I still see failures. For instance a run of 1000 iterations yielded this:

  Gradle Version        : 4.9
  OS Info               : Mac OS X 10.13.6 (x86_64)
  JDK Version (gradle)  : Oracle Corporation 10 [Java HotSpot(TM) 64-Bit Server VM 10+46]
  JAVA_HOME (gradle)    : /Library/Java/JavaVirtualMachines/jdk-10.jdk/Contents/Home
  JDK Version (compile) : Oracle Corporation 10 [Java HotSpot(TM) 64-Bit Server VM 10+46]
  JAVA_HOME (compile)   : /Library/Java/JavaVirtualMachines/jdk-10.jdk/Contents/Home
  JDK Version (runtime) : Oracle Corporation 1.8.0_144 [Java HotSpot(TM) 64-Bit Server VM 25.144-b01]
  JAVA_HOME (runtime)   : /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home
  Random Testing Seed   : 6A6B58FA134BCB5D
...
  2> REPRODUCE WITH: ./gradlew :server:test -Dtests.seed=21F8DDAAB657B52A -Dtests.class=org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogramTests -Dtests.method="testReduceRandom { seed=[21F8DDAAB657B52A:D3C6326F8CF6C4B0]}" -Dtests.security.manager=true -Dtests.locale=zh-HK -Dtests.timezone=Indian/Mahe
FAILURE 0.01s | InternalAutoDateHistogramTests.testReduceRandom { seed=[21F8DDAAB657B52A:D3C6326F8CF6C4B0]} <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: expected:<{1533765180000=53, 1533766980000=0, 1533768780000=3, 1533770580000=0, 1533772380000=2}> but was:<{1533762000000=53, 1533765600000=3, 1533769200000=2}>
   >  at __randomizedtesting.SeedInfo.seed([21F8DDAAB657B52A:D3C6326F8CF6C4B0]:0)
   >  at org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogramTests.assertReduced(InternalAutoDateHistogramTests.java:157)
   >  at org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogramTests.assertReduced(InternalAutoDateHistogramTests.java:45)
   >  at org.elasticsearch.test.InternalAggregationTestCase.testReduceRandom(InternalAggregationTestCase.java:264)
   >  at java.lang.Thread.run(Thread.java:748)
  2> NOTE: leaving temporary files on disk at: /Users/davidturner/src/elasticsearch-master/server/build/testrun/test/J0/temp/org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogramTests_21F8DDAAB657B52A-001
  2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{}, maxPointsInLeafNode=191, maxMBSortInHeap=7.887615016643835, sim=RandomSimilarity(queryNorm=false): {}, locale=zh-HK, timezone=Indian/Mahe
  2> NOTE: Mac OS X 10.13.6 x86_64/Oracle Corporation 1.8.0_144 (64-bit)/cpus=8,threads=1,free=466868936,total=522190848
  2> NOTE: All tests run in this JVM: [InternalAutoDateHistogramTests]

I can reproduce this on the command line:

./gradlew :server:test -Dtests.seed=21F8DDAAB657B52A -Dtests.class=org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogramTests -Dtests.method="testReduceRandom" -Dtests.security.manager=true -Dtests.locale=zh-HK -Dtests.timezone=Indian/Mahe -Dtests.iters=1000

However I haven't been able to get this single case to reproduce.

import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.*;
Copy link
Contributor

@DaveCTurner DaveCTurner Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jenkins is not amused ;) There's a setting in IntelliJ to avoid wildcard imports (at least until it hits an unreasonable number, I use 9999) and probably in other IDEs too.

@pcsanwald
Copy link
Contributor Author

@DaveCTurner thanks for the review, I'll track down the failure. once more into the breach! :)

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a minor comment but I think the change looks good. I haven't approved the PR only because it looks like there is another failure that is being looked into as well here.

@@ -77,6 +84,22 @@ protected InternalAutoDateHistogram createTestInstance(String name,
return new InternalAutoDateHistogram(name, buckets, targetBuckets, bucketInfo, format, pipelineAggregators, metaData);
}

public void testGetAppropriateRounding() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems to be testing a specific scenario rather than testing the method in general. Could we maybe make the name reflect this so its not assumed that its testing the method in general? Would it also be helpful to add a comment/javaDoc which explains the scenario it is testing?

@pcsanwald
Copy link
Contributor Author

I was able to reproduce the problem @DaveCTurner found reliably by adding the @Seed("D3C6326F8CF6C4B0") annotation to InternalAggregationTestCase.testReduceRandom and am pushing a fix to address.

@pcsanwald
Copy link
Contributor Author

@DaveCTurner @colings86 this is ready (fingers crossed) for final review: I reproduced the failures and believe I have addressed them, and have run a bunch of -Dtests.iters=1000 locally to convince myself this is reliable.

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good but it seems you left the @Seed annotation on the test which need to be removed but I also think it would be worth running a battery of test runs again to make sure there aren't more bugs lurking when the seed is removed. Assuming that battery of tests passes this LGTM

@@ -229,6 +230,7 @@ protected T createUnmappedInstance(String name,
return createTestInstance(name, pipelineAggregators, metaData);
}

@Seed("ED2D551807CFA25B")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you intend to commit this?

@DaveCTurner
Copy link
Contributor

Removing the @Seed annotation, I still get failures:

  2> REPRODUCE WITH: ./gradlew :server:test -Dtests.seed=E98A8421B3EA8CE8 -Dtests.class=org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogramTests -Dtests.method="testReduceRandom { seed=[E98A8421B3EA8CE8:7264D28AC74663]}" -Dtests.security.manager=true -Dtests.locale=vo-001 -Dtests.timezone=Africa/Harare
FAILURE 0.00s | InternalAutoDateHistogramTests.testReduceRandom { seed=[E98A8421B3EA8CE8:7264D28AC74663]} <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: expected:<{1533895200000=117}> but was:<{1533895200000=27, 1533906000000=90}>
   >    at __randomizedtesting.SeedInfo.seed([E98A8421B3EA8CE8:7264D28AC74663]:0)
   >    at org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogramTests.assertReduced(InternalAutoDateHistogramTests.java:168)
   >    at org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogramTests.assertReduced(InternalAutoDateHistogramTests.java:50)
   >    at org.elasticsearch.test.InternalAggregationTestCase.testReduceRandom(InternalAggregationTestCase.java:264)
   >    at jdk.internal.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
   >    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
   >    at java.base/java.lang.Thread.run(Thread.java:844)

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Aug 10, 2018

If it helps, my preferred way to run lots of iterations of tests like this is the following:

date | tee pass-times.log; while ./gradlew --console=plain :server:test -Dtests.class=org.elasticsearch.search.aggregations.bucket.histogram.InternalAutoDateHistogramTests -Dtests.iters=10000 2>&1 > testoutput.log; do date | tee -a pass-times.log; done

This runs the tests until they fail, capturing the log output, and tries different timezones and locales (which I guess to be important here) in a way that a single ./gradlew run cannot. It outputs a date at the start of each run so you can ensure progress.

@pcsanwald
Copy link
Contributor Author

I've used @DaveCTurner's command as a test, and after running a bunch of local iterations, I believe this PR is ready for re-review. This proved quite tricky to get correct!

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for beasting the tests @DaveCTurner and @pcsanwald

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - one nit about whitespace

@@ -480,7 +480,7 @@ private int getAppropriateRounding(long minKey, long maxKey, int roundingIdx,
currentKey = currentRounding.nextRoundingValue(currentKey);
}
currentRoundingIdx++;
} while (requiredBuckets > (targetBuckets * roundings[roundingIdx].getMaximumInnerInterval())
} while (requiredBuckets > (targetBuckets * roundings[currentRoundingIdx-1].getMaximumInnerInterval())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace nit: indentation looks wrong, and spaces around the - sign?

@pcsanwald pcsanwald merged commit ca54aac into elastic:master Aug 17, 2018
pcsanwald pushed a commit to pcsanwald/elasticsearch that referenced this pull request Aug 17, 2018
@pcsanwald pcsanwald mentioned this pull request Aug 17, 2018
jasontedor added a commit that referenced this pull request Aug 18, 2018
* elastic/master: (46 commits)
  NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (#32764)
  [DOCS] Splits the users API documentation into multiple pages (#32825)
  [DOCS] Splits the token APIs into separate pages (#32865)
  [DOCS] Creates redirects for role management APIs page
  Bypassing failing test PainlessDomainSplitIT#testHRDSplit (#32966)
  TEST: Mute testRetentionPolicyChangeDuringRecovery
  [DOCS] Fixes more broken links to role management APIs
  [Docs] Tweaks and fixes to rollup docs
  [DOCS] Fixes links to role management APIs
  [ML][TEST] Fix BasicRenormalizationIT after adding multibucket feature
  [DOCS] Splits the roles API documentation into multiple pages (#32794)
  [TEST]  Run pre 6.4 nodes in non-FIPS JVMs (#32901)
  Make Geo Context Mapping Parsing More Strict (#32821)
  [ML] fix updating opened jobs scheduled events (#31651) (#32881)
  Scripted metric aggregations: add deprecation warning and system property to control legacy params (#31597)
  Tests: Fix timezone conversion in DateTimeUnitTests
  Enable FIPS140LicenseBootstrapCheck (#32903)
  Fix InternalAutoDateHistogram reproducible failure (#32723)
  Remove assertion in testDocStats on deletedDocs counter (#32914)
  HLRC: Move ML request converters into their own class (#32906)
  ...
pcsanwald pushed a commit that referenced this pull request Aug 18, 2018
Update test logic to correctly bucket intervals.
jasontedor pushed a commit that referenced this pull request Aug 21, 2018
Update test logic to correctly bucket intervals.
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] InternalAutoDateHistogramTests#testReduceRandom reproducable failure
5 participants