`check` randomly returning true unexpectedly #2912
Comments
Thanks for reporting this 🙇 I have been running a test (slightly modified in VUs, rate and sleep) that does 500 iter/s for around 10h at this point. There hasn't been a single occurrence of this :( Can you please check that your memory is stable, with memtest86 for example? Do you have other stability issues, or applications that sometimes act strangely? I am going to continue running the test and will even try some more experiments. Looking at the code, the only explanation would be some kind of very strange bug in the JS VM... which is unlikely, but still 🤷.
Thank you for the reply. I hadn't thought of memory errors. That script was being run in a managed cloud environment where I can't run something like memtest86, so I'm trying memtester instead. It hasn't reported any errors so far, but I will keep running it. In my environment, this issue seems to become more likely when there is a lot of concurrent activity (thousands of iterations per second, reaching the maximum of preallocated VUs).
Previously, `check` returned `false` only if it could also emit the result, and `true` otherwise. This should rarely be a problem, as the context being checked here is the same one that interrupts the VM, so no code should be able to run after this. Unfortunately, the code checking whether the result should be emitted races with the code that interrupts the VM. So it is possible for the VM to not yet be interrupted when a `check` returns a wrong value. It is even possible for more code to run before the interrupt is actually called. The code still checks the context, as this also updates the internal check structure and we don't want to do that if the context is done. The above should be changed with #2869. Fixes #2912
OK, I managed to get it to happen, and I was definitely not going to get it with what I was trying 🤦
tl;dr: this happens only when the test is stopping and the check gets "interrupted" mid-check, but the VM is not interrupted yet.
More in depth: this is the place where the check sets the return value to false: Lines 230 to 231 in bbfe357
But you will never get to it if the context is done: Lines 211 to 213 in bbfe357
The context being "done" means that the test is ending. But if the stars align the Lines 530 to 533 in bbfe357
I managed to reproduce this with even more VUs than the provided script and shorter duration and setting I do not get it every time even this way, but
Where I also log the VU in the script and added a line to log when the
As can be seen, there is a clear correlation between "context done" and the check then returning "wrongly". This has been the case for 6 years, so I expect this is really hard to hit. Do you have a particular case where this happens a lot in a production script?
Brief summary
During local tests with a large number of preallocated VUs, I have encountered situations where `check` should have returned `false`, but it returned `true`.
This is difficult to reproduce, since it happens randomly. It seems to be caused by a race condition.
k6 version
0.42.0.
OS
WSL2 (Ubuntu 20.04) in Windows 10.
Docker version and image (if applicable)
Docker version: 23.0.0; image: `loadimpact/k6:0.42.0`.
Steps to reproduce the problem
Unfortunately, this problem is difficult to reproduce. It seems to be caused by a race condition,
and I haven't been able to find a scenario under which it is guaranteed to happen.
You may need to run the below steps multiple times before you see the issue come up.
I was able to see this problem happen several times when running the following minimal example in Docker, using the following command:
cat script.js | docker run --cpus=2 --memory=4G -i loadimpact/k6:0.42.0 run -
The `script.js` file has the following content:
Expected behaviour
The `console.log` lines should never be printed, since both checks always return `false`.
Actual behaviour
Sometimes, randomly, one (or both) of the `console.log` lines is printed.