Unit tests sometimes time out. #2472
Comments
Could perhaps be related: Here, locally, it seems it blows up when trying to display the counterexample: The error was specific to #2521, but maybe the size of the counterexample could be related to this issue.
Yeah, nice hypothesis @Anviking. It's plausible, because Arbitrary checkpoints will probably include Arbitrary TokenBundles now.
I tried reproducing the above and failed. Strange thing is that it seems like I ran It's also interesting how this issue only occurs on hydra, and mostly on certain days.
Are you sure of that?
Using a modified* bors-stats.sh, all 12 failures of this ticket are hydra failures. Browsing old related tickets manually on GitHub, I only see hydra too. Maybe there are some Buildkite failures too, I can't say for sure, but at least it seems to be mostly hydra. Output
*) Should make that script more powerful btw.
I also looked into And maybe it is a bit slow?
I did manage to observe timeouts locally finally. Several times in a row, before it disappeared. I wonder if HLS was consuming all my memory at that point. (But I didn't notice anything at the time) On the same track, it seems https://downloads.haskell.org/~ghc/7.4.1/docs/html/users_guide/runtime-control.html Want to try playing around setting
2537: Fix massive memory leak in unit tests r=Anviking a=Anviking

# Issue Number

ADP-758, #2472

# Overview

- [x] Make sure to also close the DBLayers we create!
- [x] <s>Use validateGenerator to also test non-QSM Arbitrary instances</s>
- [ ] <s>TODO: Look a bit more at the generators / validation of them, could perhaps split the PR too.</s>

# Comments

Before this fix the memory usage would steadily climb upwards 5-15GB. Now it's at most ~40 MB.

### Running part of unit tests before fix:

<img width="1167" alt="ska__rmavbild_2021-02-25_kl _12 34 43" src="https://user-images.githubusercontent.com/304423/109172195-cafd8c00-7782-11eb-8afd-a40732fccded.png">

### Running all unit tests after fix:

<img width="1399" alt="Skärmavbild 2021-02-25 kl 15 54 16" src="https://user-images.githubusercontent.com/304423/109171975-938edf80-7782-11eb-904b-9404708fbdde.png">

Co-authored-by: Johannes Lund
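For readers unfamiliar with the kind of leak described in #2537, here is a minimal sketch of the general pattern: acquire a resource per test case and always release it with `bracket`. This is not the actual change in that PR. `FakeDB`, `openFakeDB`, `closeFakeDB`, `withFakeDB` and `openAfterRuns` are hypothetical names, and the "database" is only a counter of open handles.

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for a DB layer: it only tracks how many handles are open.
newtype FakeDB = FakeDB (IORef Int)

-- Opening a handle increments the shared counter.
openFakeDB :: IORef Int -> IO FakeDB
openFakeDB counter = do
    modifyIORef' counter (+ 1)
    pure (FakeDB counter)

-- Closing a handle decrements the shared counter.
closeFakeDB :: FakeDB -> IO ()
closeFakeDB (FakeDB counter) = modifyIORef' counter (subtract 1)

-- Acquire a DB for the duration of one action and always release it,
-- even if the action throws.
withFakeDB :: IORef Int -> (FakeDB -> IO a) -> IO a
withFakeDB counter = bracket (openFakeDB counter) closeFakeDB

-- Run n "test cases" and report how many handles are still open afterwards.
openAfterRuns :: Int -> IO Int
openAfterRuns n = do
    counter <- newIORef 0
    mapM_ (\_ -> withFakeDB counter (\_ -> pure ())) [1 .. n]
    readIORef counter
```

If the per-case setup called `openFakeDB` directly without a matching close, the count returned by `openAfterRuns` would grow with the number of generated test cases, which is the shape of problem a property test with many iterations can run into.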
Should be fixed by #2537
Seen again 🙁 #2542 (comment)
This might help reduce the unit test timeouts on macOS hydra builds.

Investigating #2472, Jonathan found the following tests to fail with limited ulimit -v:

36% Match String: "Checkpoint"
44% Match String: "correct time measures"
100% Match String: "different request ids"
20% Match String: "Private Key"
32% Match String: "Tx History"
28% Match String: "Wallet Metadata"

where entries 2 and 3 are from LoggingSpec.

The timeouts in CI were in the following locations:

12 times (28%) 000000 // from "Not Allowed Methods"
9 times (21%) Checkpoint
8 times (19%) rollback
5 times (12%) 8601 // from e.g. "ISO 8601 extended format without timezones"
4 times (9%) JSON
4 times (9%) MVar
4 times (9%) readStakeDistribution // only in the past, so now fixed
1 times (2%) Coverage

which suggests either that there are 10+ lines of output not shown in the output, or that the LoggingSpec isn't to blame. So it would be very interesting to see whether this commit reduces timeouts or not.
Unit tests on macOS hydra builds are regularly timing out. Rough analysis suggests more than 50% of recent unit timeouts occur in the MVar DB properties:

> Broken down by tags/issues:
> 21 times (100%) #2472 Unit tests sometimes time out.
> 19 times (90%) mac
> 19 times (90%) hydra
> 10 times (48%) MVar
> 2 times (10%) buildkite

(./scripts/bors-stats.rb list --fetch-system --tag "#2472" --details true --annotate MVar)

Since we run the DB properties on the real SQLite DB, and the MVar properties on Linux, I believe we shouldn't lose much from disabling them on macOS.
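One way to mark a group of hspec examples as pending on macOS only is sketched below. It is an illustration, not the code from the commit. `pendingReasonFor` and `pendingOnMacOS` are hypothetical helper names, and the spec body is a placeholder. `System.Info.os` reports `"darwin"` on macOS, and `before_` with `pendingWith` is the same mechanism hspec uses for `xdescribe`.

```haskell
import System.Info (os)
import Test.Hspec (Spec, SpecWith, before_, describe, it, pendingWith, shouldBe)

-- Reason to skip on a given OS name, if any (hypothetical helper).
pendingReasonFor :: String -> Maybe String
pendingReasonFor "darwin" = Just "#2472: times out on macOS hydra builds"
pendingReasonFor _ = Nothing

-- Mark every example in the given spec as pending when running on macOS;
-- on other systems the hook does nothing and the examples run normally.
pendingOnMacOS :: SpecWith a -> SpecWith a
pendingOnMacOS = before_ (maybe (pure ()) pendingWith (pendingReasonFor os))

-- Placeholder spec standing in for the real MVar DB properties.
spec :: Spec
spec = describe "MVar DB properties (placeholder)" $
    pendingOnMacOS $
        it "runs everywhere except macOS" $
            (1 + 1 :: Int) `shouldBe` 2
```

Running `hspec spec` would report the example as pending on macOS and as passing elsewhere, so the skipped properties stay visible in the test output rather than silently disappearing.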
2714: Mark MVar DB properties pending on macOS r=Anviking a=Anviking

# Issue Number

#2472 / ADP-970

# Overview

- [x] Make `MVar` DB properties pending on macOS (see commit message)

# Comments

Co-authored-by: Johannes Lund
For lack of better ideas, I grouped recent Mac hydra timeouts by month, but I'm not sure what to conclude from it:
2721: Ensure faucet setup runs on BFT node without rollbacks r=rvl a=Anviking

# Issue Number

ADP-970, #2720, #2428

# Overview

- [x] Ensure faucet setup runs on BFT node, such that rollbacks can't mess it up

# Comments

2723: Use LineBuffering in unit tests r=rvl a=Anviking

# Issue Number

ADP-970 / #2472

# Overview

- [x] Try using line buffering in unit tests, like we do for integration tests

# Comments

Co-authored-by: Johannes Lund
Co-authored-by: Rodney Lorrimar
2723: Use LineBuffering in unit tests r=Anviking a=Anviking

# Issue Number

ADP-970 / #2472

# Overview

- [x] Try using line buffering in unit tests, like we do for integration tests

# Comments

Co-authored-by: Johannes Lund
Co-authored-by: Rodney Lorrimar
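For context, switching a test runner to line buffering can be done with `hSetBuffering` from `System.IO`. The snippet below is a minimal sketch of that idea, not the diff from #2723. `useLineBuffering` and `currentStdoutMode` are hypothetical helper names.

```haskell
import System.IO
    ( BufferMode (LineBuffering)
    , hGetBuffering
    , hSetBuffering
    , stderr
    , stdout
    )

-- Switch stdout and stderr to line buffering so that each completed line is
-- flushed right away instead of waiting for a block buffer to fill.
useLineBuffering :: IO ()
useLineBuffering = do
    hSetBuffering stdout LineBuffering
    hSetBuffering stderr LineBuffering

-- Report the buffering mode currently set on stdout (handy for checking).
currentStdoutMode :: IO BufferMode
currentStdoutMode = hGetBuffering stdout
```

Calling `useLineBuffering` at the start of the test suite's `main`, before the specs run, matters when stdout is a pipe or file, as is typically the case in CI: GHC's default there is block buffering, so progress lines may only appear once a buffer fills, which can look like silence to a watchdog.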
Attempt to ensure visible progress in the macOS hydra job. A hypothesis is that #2472 is caused by heavy load and unfocused resources from running the tests concurrently, with the risk that the slowest hspec runner, and therefore stdout, stays silent for 900s, causing hydra to time out. Setting -j 1 should hopefully focus the resources we have in one place. It should go silent less often, at the expense of the full run getting slower.
2727: Set -j 1 in macOS nix unit tests r=rvl a=Anviking

# Issue Number

#2472 / ADP-970

# Overview

- [x] Set `-j 1` for hydra Mac builds, in the hope that this alleviates unit test timeouts

# Comments

A hypothesis is that #2472 is caused by heavy load and unfocused resources from running the tests concurrently, with the risk that the slowest hspec runner, and therefore stdout, stays silent for 900s, causing hydra to time out. Setting -j 1 should hopefully focus the resources we have in one place. It should go silent less often, at the expense of the full run getting slower.

## Testing locally

Tested locally with `nix-build -A checks.cardano-wallet-core`. Seems to work. It now takes 399s instead of 299.7877 on my machine (not that dramatic a slowdown).

We have a `-with-rtsopts=-N4` in the .cabal file. I'm not sure if that's good or not, but I suspect it doesn't matter as much as the `-j` option.

Co-authored-by: Johannes Lund
Though we haven't had that many merges since #2727 was merged, we've not seen it again, and I think it helped.
Context
Test Case
?
Failure / Counter-example