Make test-restart more user friendly. #3294

skycastlelily · 2024-10-16T15:06:28Z

Pull Request Checklist

skycastlelily · 2024-10-16T15:15:07Z

This merge request sets DEFAULT_TEST_RESTART_LIMIT to 10. I guess 3 would be sufficient for most users, but 1 will make some users (kernel folks, to name one group) hit the confusing "crashed too many times" error unless they set restart_max_count larger. When I hit that error, I could find the problem by searching the code, but I guess we should do users a favor :)

happz · 2024-10-16T15:35:16Z

tmt/steps/execute/internal.py

- result.note = 'crashed too many times'
+ result.note = """
+ crashed too many times,you may want to set restart_max_count larger.
+ """


Multiline string will most likely carry those empty lines into the note. It might be better to split it into multiple strings:

result.note = ('crashed too many times,' 'you may want to set restart_max_count larger')

There is no restart_max_count object in the user-visible part of tmt; the correct form is restart-max-count.

A space is needed after the comma: "times, you ..."
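
The difference between the two forms can be checked directly in Python. The sketch below is illustrative only: the variable names are made up here, and the message text follows the review comments above (hyphenated option name, space after the comma).

```python
# Triple-quoted literal, as in the proposed change: the line breaks and
# indentation inside the quotes become part of the string.
multiline_note = """
    crashed too many times, you may want to set restart-max-count larger.
    """

# Adjacent string literals are joined into a single string by the parser,
# so the note stays on one line while the source lines stay short.
joined_note = (
    'crashed too many times, '
    'you may want to set restart-max-count larger'
)

print(repr(multiline_note))
print(repr(joined_note))
```

The first `repr` shows leading and trailing newlines plus indentation; the second shows a single clean line.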

happz · 2024-10-16T15:36:44Z

Not so fast! :)

but 1 will make some users (kernel folks, to name one group) hit the confusing "crashed too many times" error unless they set restart_max_count larger.

Why don't they set it then? :) If they already know their test may very well need to be restarted seven times in every second run, why not make that fact visible in test metadata by allowing for more restarts? Relying on default values may not be the best idea. Neither is bumping the default to fit this particular scenario.

When I hit that error, I could find the problem by searching the code, but I guess we should do users a favor :)

Can you share more about what was the test doing and why it had to be restarted more than once? And possibly other tests you're using that suffer from the low default? Right now I'm convinced the low default makes sense, that users are responsible for raising it per test because they are more aware of the context, and that a test that needs to be restarted multiple times is worth such a metadata update. But it seems a large user group is hitting this, so I might be wrong. Let's find out more about the actual use cases.

skycastlelily · 2024-10-17T04:00:32Z

Not so fast! :)

Yeah, one reason I sent this MR is to show that I reviewed your MR not because it's just two lines, but because I'm familiar with the stuff, and I looked into it and tested it thoroughly. However, I definitely should be more patient :)

Why don't they set it then? :) If they already know their test may very well need to be restarted seven times in every second run, why not make that fact visible in test metadata by allowing for more restarts? Relying on default values may not be the best idea.

Because, I guess, many tmt users will be people migrating from Beaker? They don't need to set that in Beaker, and I think making tmt a more user-friendly tool than Beaker is definitely the goal we should reach, right?

Can you share more about what was the test doing and why it had to be restarted more than once?

I worked with kernel QE for the Upstream first project, and recalled that some of their test cases need to be restarted more than once. Before I said some users would hit "crash too many", I quickly searched their test cases, and there are indeed some that need to reboot twice or three times; most of them are network, storage, and filesystem tests. Plus, I think setting max-default-value to minus+1 is ...hmm, anyway, it won't hurt those who use a low value if we set the value higher. To name one, when testing kdump, the low default will make them face the "crash too many" error in the real world ( https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes ), as they need to set crashkernel and reboot, and then trigger a kernel panic, which will result in a reboot, i.e., two reboots in total.

At least we need to update the "crash too many" message, which is really confusing, right? TBH, when I searched for that message in the tmt code, I wasn't even sure there would be any result :)


happz · 2024-10-21T16:45:48Z

Why don't they set it then? :) If they already know their test may very well need to be restarted seven times in every second run, why not make that fact visible in test metadata by allowing for more restarts? Relying on default values may not be the best idea.

Because, I guess, many tmt users will be people migrating from Beaker? They don't need to set that in Beaker, and I think making tmt a more user-friendly tool than Beaker is definitely the goal we should reach, right?

True, but tmt shouldn't be a 1:1 replacement for Beaker. Every migration should consider what, how, and why. "It worked in Beaker" alone should not be enough - it does not imply it was correct :)

Can you share more about what was the test doing and why it had to be restarted more than once?

I worked with kernel QE for the Upstream first project, and recalled that some of their test cases need to be restarted more than once. Before I said some users would hit "crash too many", I quickly searched their test cases, and there are indeed some that need to reboot twice or three times; most of them are network, storage, and filesystem tests. Plus, I think setting max-default-value to minus+1 is ...hmm, anyway, it won't hurt those who use a low value if we set the value higher. To name one, when testing kdump, the low default will make them face the "crash too many" error in the real world ( https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes ), as they need to set crashkernel and reboot, and then trigger a kernel panic, which will result in a reboot, i.e., two reboots in total.

Good info, thank you. But, I would argue that if a test is known or even expected to restart several times, it should be noted in its metadata. It's like a duration: the default will never fit all, and if I as a test developer know the test will need 4 hours to finish, it's my responsibility to set duration accordingly.

the low default will make them face the "crash too many" error in the real world ... they need to set crashkernel and reboot, and then trigger a kernel panic, which will result in a reboot, i.e., two reboots in total.

IMO that's a very good reason to set the test metadata to correctly announce what the test does. Why not do exactly that? I understand that by bumping the default they wouldn't have to, but I believe that would be a bad policy.

At least we need to update the "crash too many" message, which is really confusing, right? TBH, when I searched for that message in the tmt code, I wasn't even sure there would be any result :)

Absolutely, improving the message and adding a hint makes perfect sense.
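
A minimal sketch of what such a hint could look like, assuming a per-test limit that falls back to a default. Only `DEFAULT_TEST_RESTART_LIMIT` and `restart-max-count` come from this thread; the function name, its parameters, the default value of 1, and the exact wording of the note are hypothetical and do not reflect tmt's actual implementation.

```python
from typing import Optional

# Low default implied by the thread (assumed to be 1 for this sketch).
DEFAULT_TEST_RESTART_LIMIT = 1


def restart_note(
        restart_count: int,
        restart_max_count: Optional[int] = None) -> Optional[str]:
    """Return a result note when the restart limit is exceeded, else None.

    Hypothetical helper, for illustration only.
    """
    # A per-test value, if given, takes precedence over the default.
    if restart_max_count is None:
        limit = DEFAULT_TEST_RESTART_LIMIT
    else:
        limit = restart_max_count

    if restart_count <= limit:
        return None

    # Name the user-visible key so the user knows what to change.
    return (
        f'crashed too many times (limit {limit}), '
        "you may want to set 'restart-max-count' larger"
    )


# kdump-like scenario from the thread: two restarts in total.
print(restart_note(2))                       # default limit: a note is returned
print(restart_note(2, restart_max_count=2))  # limit raised per test: None
```

With the limit declared per test, the same two-restart scenario passes without touching the default, which is the policy argued for above.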

Make test-restart more user friendly.

cce2be6

skycastlelily requested review from psss, lukaszachy, happz, thrix and janhavlin as code owners October 16, 2024 15:06

happz reviewed Oct 16, 2024


squash:update

9ef1c7e
