`check` randomly returning true unexpectedly #2912

pcsegal · 2023-02-12T19:16:44Z

Brief summary

During local tests with a large number of preallocated VUs, I have encountered situations
where check should have returned false, but it returned true.

This is difficult to reproduce, since it happens randomly. It seems to be caused by a race condition.

k6 version

0.42.0.

OS

WSL2 (Ubuntu 20.04) in Windows 10.

Docker version and image (if applicable)

Docker version: 23.0.0; image: loadimpact/k6:0.42.0.

Steps to reproduce the problem

Unfortunately, this problem is difficult to reproduce. It seems to be caused by a race condition,
and I haven't been able to find a scenario under which it is guaranteed to happen.

You may need to run the below steps multiple times before you see the issue come up.

I was able to see this problem happen several times when running the following minimal example in Docker, using the following command:

cat script.js | docker run --cpus=2 --memory=4G -i loadimpact/k6:0.42.0 run -

The script.js file has the following content:

import { check, sleep } from 'k6';

export const options = {
    scenarios: {
        test: {
            executor: "constant-arrival-rate",
            preAllocatedVUs: 20000,
            rate: 10000,
            timeUnit: "1s",
            duration: "60s",
            exec: "default",
        },
    }
};

function randomNumber() {
    return (Math.random() * 5) + 2
}

export default function() {
    sleep(randomNumber())
    const checkOutput1 = check(null, {
        "my check 1": (res) => false,
    })
    if (checkOutput1) {
        console.log("checkOutput1:", checkOutput1)
    }
    sleep(randomNumber())
    const checkOutput2 = check(null, {
        "my check 2": (res) => false,
    })
    if (checkOutput2) {
        console.log("checkOutput1:", checkOutput1, "checkOutput2:", checkOutput2)
    }
}

Expected behaviour

The console.log lines should never be printed, since both checks always return false.

Actual behaviour

Sometimes, randomly, one (or both) of the console.log lines is printed.

The text was updated successfully, but these errors were encountered:

mstoykov · 2023-02-13T08:52:36Z

Thanks for reporting this 🙇

I have been running a (slightly modified in VUs , rate and sleep) test that does 500iter/s for around 10h at this point.

There hasn't been a single occurance of this :(

Can you please check that your memory is stable with memtest86 (for example). Do you have other stability issues or applications that act strangely sometimes?

I am going to continue running the test and will even try some more experiments. Looking at the code the only explanation will be some kind of very strange bug in the js VM ... which is unlikely, but still 🤷.

pcsegal · 2023-02-13T13:48:02Z

Thank you for the reply.

I hadn't thought of memory errors. That script was being run in a managed cloud environment, and I can't run something like memtest86 in this environment, but I'm trying to run memtester. It hasn't reported errors so far, though, but I will continue trying to run it.

In my environment, this issue seems to become more likely when there is a lot of concurrent activity (thousands of iterations per second, reaching the maximum of preallocated VUs).

Before this check was only returning `false` if it also could emit it, and true otherwise. This should rarely be a problem as the same context that is being checked here is the one that interrupts the VM - so no code will be able to run after this. Unfortunately the code checking if it should be emitted is racing with the one that interrupts the VM. So it is possible for the VM to still not be interrupted when a `check` returns a wrong value. It is even possible for more code to be run before the interrupt is actually called. The code still checks the context as this also updates the internal check structure and we don't want to that if the context is done. The above should be changed with #2869 Fixes #2912

mstoykov · 2023-02-14T10:19:26Z

Ok I managed to get it to happen and I was definitely not going to get it with what I was trying 🤦

tl;dr: this happens only when the test is stopping and the check gets "interrupted" mid checking, but the VM is not interrupted yet.

More in depth:

Is the place where the check will set the return to false:

k6/js/modules/k6/k6.go

Lines 230 to 231 in bbfe357

 // A single failure makes the return value false. 

 succ = false

But you will never get to it if the context is done

k6/js/modules/k6/k6.go

Lines 211 to 213 in bbfe357

 // Emit! (But only if we have a valid context.) 

 select { 

 case <-ctx.Done():

The context being "done" means that test is ending. But if the stars align the check will return (a true as it was never set to false) and continue executing the code before the code that interrupts the js VM runs

k6/js/runner.go

Lines 530 to 533 in bbfe357

 go func() { 

 <-ctx.Done() 

 vu.Runtime.Interrupt(context.Canceled) 

 }()

I managed to reproduce this with even more VUs then the provided script and shorter duration and setting gracefulStop to "0s".

I do not get it every time even this way, but GOMAXPROCS=256 does make it more likely, and I managed to get this:

INFO[0010] context done                                  VU=2448
INFO[0010] checkOutput1: false checkOutput2: true 2448   source=console
INFO[0010] context done                                  VU=18682
INFO[0010] checkOutput1: false checkOutput2: true 18682  source=console
INFO[0010] context done                                  VU=18555
INFO[0010] checkOutput1: true 18555                      source=console
INFO[0010] context done                                  VU=18555
INFO[0010] checkOutput1: true checkOutput2: true 18555   source=console

Where I also log the VU in the script and added a line to log when the check code goes through the context done branch in the select above with the VU as well.

As it can be seen there is clear correlation of "context done" and then the check returning "wrongly".

This has been the case for 6 years, so I expect this is really hard to hit. Do you have a particular case where this happens a lot in a production script?

Before this check was only returning `false` if it also could emit it, and true otherwise. This should rarely be a problem as the same context that is being checked here is the one that interrupts the VM - so no code will be able to run after this. Unfortunately the code checking if it should be emitted is racing with the one that interrupts the VM. So it is possible for the VM to still not be interrupted when a `check` returns a wrong value. It is even possible for more code to be run before the interrupt is actually called. The code still checks the context as this also updates the internal check structure and we don't want to that if the context is done. The above should be changed with #2869 Fixes #2912

pcsegal added the bug label Feb 12, 2023

mstoykov added evaluation needed proposal needs to be validated or tested before fully implementing it in k6 awaiting user waiting for user to respond labels Feb 13, 2023

mstoykov mentioned this issue Feb 14, 2023

k6.check: Fix returned values when iteration is ending #2915

Merged

mstoykov removed evaluation needed proposal needs to be validated or tested before fully implementing it in k6 awaiting user waiting for user to respond labels Feb 14, 2023

na-- added this to the v0.44.0 milestone Feb 20, 2023

mstoykov closed this as completed in #2915 Mar 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`check` randomly returning true unexpectedly #2912

`check` randomly returning true unexpectedly #2912

pcsegal commented Feb 12, 2023 •

edited

Loading

mstoykov commented Feb 13, 2023

pcsegal commented Feb 13, 2023 •

edited

Loading

mstoykov commented Feb 14, 2023

check randomly returning true unexpectedly #2912

check randomly returning true unexpectedly #2912

Comments

pcsegal commented Feb 12, 2023 • edited Loading

Brief summary

k6 version

OS

Docker version and image (if applicable)

Steps to reproduce the problem

Expected behaviour

Actual behaviour

mstoykov commented Feb 13, 2023

pcsegal commented Feb 13, 2023 • edited Loading

mstoykov commented Feb 14, 2023

More in depth:

`check` randomly returning true unexpectedly #2912

`check` randomly returning true unexpectedly #2912

pcsegal commented Feb 12, 2023 •

edited

Loading

pcsegal commented Feb 13, 2023 •

edited

Loading