Flaky test: test cluster setup - no wallets are syncing #3461

Open
1 task done
Anviking opened this issue Aug 25, 2022 · 0 comments

Anviking commented Aug 25, 2022

Please ensure:

  • This is actually a flaky test already present in the code and not caused by your PR.

Context

  • Problem with cluster setup
  • Seen more often after "Improve cluster setup reliability" (#3444), though possibly not exclusively after it
  • Logs from pool-1 when it happens: integration-pool-1.log
    • Some badInputs and a chainDensity of 0 look suspicious
    • Possibly the epoch length is too short (80 slots / 16 s), and the first pool doesn't have time to produce a block before the chain gets broken?

Job name

integration

Test case name(s)

Most tests

Error message

src/Test/Integration/Framework/DSL.hs:1261:66:
 194) API Specifications, BYRON_HW_WALLETS, HW_WALLETS_03 - Cannot do operations requiring private key, Cannot send tx
      Waited longer than 90s to resolve action: "restoreWalletFromPubKey: wallet is 100% synced ".
      expected: Ready
       but got: Syncing (Quantity (Percentage (7 % 1250)))

 To rerun use: --match "/API Specifications/BYRON_HW_WALLETS/HW_WALLETS_03 - Cannot do operations requiring private key/Cannot send tx/"
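
The `7 % 1250` in the failure above is a ratio in Haskell's `Show` notation. As a rough sketch of how far the wallet got, the snippet below converts it to a percentage. It assumes the ratio is a fraction of 1 (so it must be multiplied by 100), which this page does not confirm; `sync_percent` is a hypothetical helper, not part of the test suite.

```python
import re
from fractions import Fraction


def sync_percent(status: str) -> Fraction:
    """Extract the 'n % d' ratio from a shown sync status and scale it to percent."""
    match = re.search(r"Percentage \((\d+) % (\d+)\)", status)
    if match is None:
        raise ValueError("no Percentage ratio found in status")
    numerator, denominator = map(int, match.groups())
    # Assumption: the ratio is a fraction of 1, so multiply by 100 for percent.
    return Fraction(numerator, denominator) * 100


status = "Syncing (Quantity (Percentage (7 % 1250)))"
print(float(sync_percent(status)))  # 0.56
```

Under that assumption the wallet had synced about 0.56% when the 90s wait expired, which fits the picture of a chain that is barely advancing.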

Build link

#3427 (comment)

@Anviking added the "Test failure" label (A flaky test or nightly CI failure) Aug 25, 2022
@Anviking self-assigned this Aug 25, 2022
@Anviking changed the title from "Flaky test: no wallets are syncing" to "Flaky test: test cluster setup - no wallets are syncing" Aug 25, 2022
Anviking added a commit that referenced this issue Aug 25, 2022
In #3461 we are seeing the cluster setup fail. It seems the chain gets
broken. A pool must produce a block within 3k/f slots (previously 75)
or the chain will be broken, with "NoLedgerView" being a symptom.
(https://input-output-rnd.slack.com/archives/CR599HMFX/p1649430846682959?thread_ts=1649430803.174879&cid=CR599HMFX)

We do see TraceNoLedgerView in the logs of the CI failures:

[pool-1:cardano.node.Forge:Error:34] [2022-08-25 09:29:47.00 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNoLedgerView"),("slot",Number 30.0)]))]
[pool-1:cardano.node.LeadershipCheck:Info:34] [2022-08-25 09:29:47.20 UTC] {"chainDensity":0,"credentials":"Cardano","delegMapSize":4,"kind":"TraceStartLeadershipCheck","slot":31,"utxoSize":5263}
[pool-1:cardano.node.Forge:Error:34] [2022-08-25 09:29:47.20 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNoLedgerView"),("slot",Number 31.0)]))]

However, the first TraceNoLedgerView we see is at slot 30, well before
slot 75.

Regardless, this commit increases the epoch length in the hope that it
might alleviate the issue.

Changing the epochLength /could/ lead to other problems, but e.g.
`waitForNextEpoch` should still work correctly: with its 90s timeout it
waits at most 32s (one 160-slot epoch), so hopefully we're good.
iohk-bors bot added a commit that referenced this issue Aug 25, 2022
3462: Try increasing the epoch length in integration tests r=Anviking a=Anviking

- [x] Try doubling the epoch length from 80 slots to 160 slots, and also double k

### Comments

In #3461 we are seeing wallets fail to sync at all. It seems there isn't even a valid chain. A pool must produce a block within 3k/f slots of genesis (previously 75 slots)
or the chain will be broken, with "NoLedgerView" being a symptom.
(https://input-output-rnd.slack.com/archives/CR599HMFX/p1649430846682959?thread_ts=1649430803.174879&cid=CR599HMFX)

We do see TraceNoLedgerView in the logs of the CI failures:

```
[pool-1:cardano.node.Forge:Error:34] [2022-08-25 09:29:47.00 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNoLedgerView"),("slot",Number 30.0)]))]
[pool-1:cardano.node.LeadershipCheck:Info:34] [2022-08-25 09:29:47.20 UTC] {"chainDensity":0,"credentials":"Cardano","delegMapSize":4,"kind":"TraceStartLeadershipCheck","slot":31,"utxoSize":5263}
[pool-1:cardano.node.Forge:Error:34] [2022-08-25 09:29:47.20 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNoLedgerView"),("slot",Number 31.0)]))]
```

However, the first TraceNoLedgerView we see is at slot 30, well before
slot 75.
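
To check where `TraceNoLedgerView` first shows up in a longer log, something like the following could be used. It is only a sketch: the regular expression is written against the `fromList [...]` rendering in the excerpt above, and `first_no_ledger_view_slot` is a hypothetical helper name.

```python
import re

# Matches the 'fromList [...]' rendering of the Forge error in the excerpt.
NO_LEDGER_VIEW = re.compile(
    r'\("kind",String "TraceNoLedgerView"\),\("slot",Number (\d+(?:\.\d+)?)\)'
)


def first_no_ledger_view_slot(lines):
    """Return the lowest slot that logged TraceNoLedgerView, or None."""
    slots = []
    for line in lines:
        match = NO_LEDGER_VIEW.search(line)
        if match:
            # Slots are printed as e.g. 30.0, so go through float first.
            slots.append(int(float(match.group(1))))
    return min(slots) if slots else None


sample = [
    '[pool-1:cardano.node.Forge:Error:34] [2022-08-25 09:29:47.00 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNoLedgerView"),("slot",Number 30.0)]))]',
    '[pool-1:cardano.node.LeadershipCheck:Info:34] [2022-08-25 09:29:47.20 UTC] {"chainDensity":0,"credentials":"Cardano","delegMapSize":4,"kind":"TraceStartLeadershipCheck","slot":31,"utxoSize":5263}',
    '[pool-1:cardano.node.Forge:Error:34] [2022-08-25 09:29:47.20 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceNoLedgerView"),("slot",Number 31.0)]))]',
]
print(first_no_ledger_view_slot(sample))  # 30
```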

Regardless, this commit increases the epoch length in the hope that it
might alleviate the issue.

Changing the epochLength /could/ lead to other problems, but e.g.
`waitForNextEpoch` should still work correctly: with its 90s timeout it
waits at most 32s (one 160-slot epoch), so hopefully we're good.
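
The timing figures can be cross-checked with a little arithmetic. This is a sketch under two assumptions: the slot length is unchanged by the PR (0.2 s, from the issue's "80 slots / 16 s"), and doubling k doubles the 3k/f window because f stays the same. The individual values of k and f are not given here, so only the window itself is used.

```python
from fractions import Fraction

SLOT_LENGTH_S = Fraction(16, 80)         # 80 slots took 16 s, per the issue
OLD_EPOCH_SLOTS = 80
NEW_EPOCH_SLOTS = 2 * OLD_EPOCH_SLOTS    # epoch length doubled to 160 slots
OLD_WINDOW_SLOTS = 75                    # 3k/f before the change
NEW_WINDOW_SLOTS = 2 * OLD_WINDOW_SLOTS  # assumes k doubles and f is unchanged
TIMEOUT_S = 90                           # timeout quoted for waitForNextEpoch


def seconds(slots):
    """Wall-clock duration of a number of slots, as an exact fraction."""
    return slots * SLOT_LENGTH_S


print(float(seconds(NEW_EPOCH_SLOTS)))        # 32.0, the longest wait for an epoch
print(float(seconds(OLD_WINDOW_SLOTS)))       # 15.0, old deadline for the first block
print(float(seconds(NEW_WINDOW_SLOTS)))       # 30.0, new deadline under the assumption
print(float(seconds(30)))                     # 6.0, when TraceNoLedgerView first appeared
print(seconds(NEW_EPOCH_SLOTS) <= TIMEOUT_S)  # True
```

So a full new epoch (32 s) still fits well inside the 90 s timeout, while the first `TraceNoLedgerView` at slot 30 corresponds to roughly 6 s after genesis.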

### Issue Number

ADP-2171 / #3461

Co-authored-by: Johannes Lund <[email protected]>
@Anviking Anviking mentioned this issue Sep 16, 2022
7 tasks
@Anviking Anviking mentioned this issue Dec 16, 2022
3 tasks