-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
twister with --device-testing sometimes overlaps tests #32679
Labels
area: Twister
Twister
bug
The issue is a bug, or the PR is fixing a bug
priority: medium
Medium impact/importance bug
Comments
andyross
added
bug
The issue is a bug, or the PR is fixing a bug
area: Twister
Twister
labels
Feb 25, 2021
andyross
pushed a commit
to andyross/zephyr
that referenced
this issue
Mar 2, 2021
CPython is sometimes described as "single threaded" due to the GIL, but the interpreter will still "preemptively" switch between threads (the details seem poorly documented). So the time between checking whether acquire is 1 and decrementing the count could result in more than one thread seeing an "available" device, and more than one test being run (simultaneously, on the same physical device!). We have a big herd of threads all polling for this, so in a large test run this would happen maybe one time out of 20-30 tries. Use a lock. Also remove the very similar looking DUT.get_available_device() method, which had the same bug but appears to be dead code. Fixes zephyrproject-rtos#32679 Signed-off-by: Andy Ross <[email protected]>
andyross
pushed a commit
to andyross/zephyr
that referenced
this issue
Mar 3, 2021
CPython is sometimes described as "single threaded" due to the GIL, but the interpreter will still "preemptively" switch between threads (the details seem poorly documented). So the time between checking whether acquire is 1 and decrementing the count could result in more than one thread seeing an "available" device, and more than one test being run (simultaneously, on the same physical device!). We have a big herd of threads all polling for this, so in a large test run this would happen maybe one time out of 20-30 tries. Use a lock. Also remove the very similar looking DUT.get_available_device() method, which had the same bug but appears to be dead code. Fixes zephyrproject-rtos#32679 Signed-off-by: Andy Ross <[email protected]>
nashif
pushed a commit
that referenced
this issue
Mar 4, 2021
CPython is sometimes described as "single threaded" due to the GIL, but the interpreter will still "preemptively" switch between threads (the details seem poorly documented). So the time between checking whether acquire is 1 and decrementing the count could result in more than one thread seeing an "available" device, and more than one test being run (simultaneously, on the same physical device!). We have a big herd of threads all polling for this, so in a large test run this would happen maybe one time out of 20-30 tries. Use a lock. Also remove the very similar looking DUT.get_available_device() method, which had the same bug but appears to be dead code. Fixes #32679 Signed-off-by: Andy Ross <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area: Twister
Twister
bug
The issue is a bug, or the PR is fixing a bug
priority: medium
Medium impact/importance bug
While working on #32678 I discovered that twister isn't perfectly serialized when doing device testing. Sometimes I'll see a second test fire up unbidden, which then clobbers the running one and (for the specific case of that script) causes a hang where the scripts (which need to wait for each other) never complete. Twister then times out the next 1-3 tests before getting back in sync.
You can watch this happen with "ps" in a loop, I've seen as many as three log reading scripts running at the same time.
This seems much more common after a failure, and especially after a (legitimate, real device hang) timeout.
The rule obviously has to be that no script shall be started before the scripts from the previous run are killed and known to be dead.
The text was updated successfully, but these errors were encountered: