[CI] Add automatic test group re-running for flaky tests #44927
Comments
Pinging @elastic/kibana-operations
I think we should probably focus on #44934 first, so that retries have a lower impact. This effort should also include reporting to ES which suites needed to be retried and what the result of each retry was. From there we should be able to figure out which suites are flaky and assign work accordingly, hopefully automating that eventually too.
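To make the reporting idea concrete, here is a minimal sketch of how retry records could be summarized to surface flaky suites. The record shape (`suite`, `retried`, `passed`) and the function name `flaky_suites` are assumptions made for illustration. They are not taken from the Kibana CI code or from any ES index mapping.

```python
from collections import defaultdict


def flaky_suites(records):
    """Summarize retry records and return suites that passed only after a retry.

    Each record is a hypothetical dict with:
      suite   - name of the suite or ciGroup
      retried - True if the suite had to be re-run
      passed  - final result after any retry
    """
    stats = defaultdict(
        lambda: {"runs": 0, "passed_on_retry": 0, "failed_after_retry": 0}
    )
    for record in records:
        entry = stats[record["suite"]]
        entry["runs"] += 1
        if record["retried"]:
            if record["passed"]:
                entry["passed_on_retry"] += 1
            else:
                entry["failed_after_retry"] += 1

    # A suite that fails, then passes on retry, is the flaky signal here.
    flaky = [(name, s) for name, s in stats.items() if s["passed_on_retry"] > 0]
    # Most frequently flaky suites first.
    flaky.sort(key=lambda item: item[1]["passed_on_retry"] / item[1]["runs"], reverse=True)
    return flaky
```

A suite that fails even after its retry is counted separately, since that looks more like a real failure than flakiness.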
There's a draft PR where I've implemented something that works here: #53961. There are minimal changes to reporting / alerting / GitHub checks / etc., and it may work well enough for a trial run. As currently implemented:
To combat the potentially longer build times, we could (not sure how I feel about this part yet):
Which would mean that:
Any thoughts on any of this?
This is great. Really impressed by how feature complete this already feels. I agree that the GitHub triggers are more or less providing continuous builds during the U.S. workday. How busy is CI on the tracked branches during the rest of the time? If it's pretty much the same, I like the idea of moving to an interval while allowing concurrency to provide more consistency.
Let me clarify: I'm saying that, at least at the moment, GitHub triggers aren't really doing anything for master and 7.x. Both branches are set to build every hour without concurrent builds. The builds take about 1h20m+, which means the timer fires for the next build before the current build is complete, so we're building continuously all day long. Turning off GitHub triggers wouldn't affect when builds happen, because we always start a new build as soon as the previous one finishes. If we turned them off and allowed concurrent timer builds, we'd actually get more builds per day, even if some of the builds get longer because of flaky ciGroups retrying.
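The arithmetic behind that claim can be sketched roughly as follows. This assumes a fixed build duration and that, without concurrency, a queued build starts the moment the previous one finishes. The function name and the 100-minute retry figure are illustrative, not measured values.

```python
def builds_started_per_day(interval_min, duration_min, concurrent):
    """Approximate how many builds start in 24 hours on a timer trigger.

    Assumes a constant build duration. Without concurrency, a timer tick that
    fires during a build is assumed to queue one build that starts as soon as
    the running build finishes.
    """
    minutes_per_day = 24 * 60
    if concurrent:
        # Every timer tick starts a build, regardless of build length.
        return minutes_per_day // interval_min
    # Builds run back to back when they outlast the timer interval.
    return minutes_per_day // max(interval_min, duration_min)


# Hourly timer, builds of roughly 1h20m.
print(builds_started_per_day(60, 80, concurrent=False))   # 18
print(builds_started_per_day(60, 80, concurrent=True))    # 24
# If retries stretched builds to a hypothetical 1h40m:
print(builds_started_per_day(60, 100, concurrent=False))  # 14
print(builds_started_per_day(60, 100, concurrent=True))   # 24
```

Under these assumptions, longer builds reduce throughput only in the non-concurrent setup. With concurrency the timer alone sets the number of builds per day, at the cost of more executors in use at once.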
Once #44925 is finished, it will be pretty easy to add automatic re-running of individual ciGroups to add some protection against flaky tests.
In subsequent work, we will likely be removing ciGroups altogether, in favor of running small test suites in a sort of queue-based fashion. In this case, re-running tests will mean re-running a small suite instead of a full ciGroup.
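As a rough illustration of the re-run idea, the sketch below wraps a single test group in a retry loop and records what happened. It is not the pipeline implementation: `run_with_retry`, the `run_group` callable, and the returned record shape are all hypothetical names chosen for this example. The same wrapper would apply whether the unit being re-run is a full ciGroup or a smaller suite.

```python
def run_with_retry(name, run_group, max_retries=1):
    """Run a test group, re-running it up to max_retries times on failure.

    run_group is a hypothetical callable that returns True when the group
    passes. The returned record notes whether a retry was needed, so that
    flaky groups can be reported rather than silently hidden.
    """
    attempts = 0
    passed = False
    while attempts <= max_retries and not passed:
        attempts += 1
        passed = bool(run_group())
    return {
        "suite": name,
        "attempts": attempts,
        "retried": attempts > 1,
        "passed": passed,
    }


# Example: a group that fails once, then passes on the re-run.
outcomes = iter([False, True])
print(run_with_retry("ciGroup1", lambda: next(outcomes)))
```

Keeping `max_retries` small limits how much a genuinely broken group can lengthen the build.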
Other considerations: