Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make test-restart more user friendly. #3294

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

skycastlelily
Copy link
Collaborator

@skycastlelily skycastlelily commented Oct 16, 2024

Pull Request Checklist

  • implement the feature
  • write the documentation
  • extend the test coverage
  • update the specification
  • adjust plugin docstring
  • modify the json schema
  • mention the version
  • include a release note

@skycastlelily
Copy link
Collaborator Author

This merge request sets DEFAULT_TEST_RESTART_LIMIT to 10, I guess for most users 3 would be sufficient, but 1 will make some users( kernel folks to name one ) to hit the confusing "crashed too many times" error, unless they set restart_max_count larger. When I hit that error, I could find the problem by searching the code,but I guess we should do users a favor:)

result.note = 'crashed too many times'
result.note = """
crashed too many times,you may want to set restart_max_count larger.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Multiline string will most likely carry those empty lines into the note. It might be better to split it into multiple strings:
result.note = ('crashed too many times,'
               'you may want to set restart_max_count larger')
  1. there is no restart_max_count object in user-visible part of tmt, the correct form is restart-max-count.
  2. space after comma is needed, "times, you ..."

@happz
Copy link
Collaborator

happz commented Oct 16, 2024

Not so fast! :)

but 1 will make some users( kernel folks to name one ) to hit the confusing "crashed too many times" error, unless they set restart_max_count larger.

Why don't they set it then? :) If they already know their test may very well need to be restarted seven times in every second run, why not make that fact visible in test metadata by allowing for more restarts? Relying on default values may not be the best idea. And bumping the default to fit this particular scenario either.

When I hit that error, I could find the problem by searching the code,but I guess we should do users a favor:)

Can you share more about what was the test doing and why it had to be restarted more than once? And possibly other tests you're using that suffer from the low default? Right now I'm convinced the low default makes sense, that users are responsible for raising it per test because they are more aware of the context, and that a test that needs to be restarted multiple times is worth such a metadata update. But it seems like a large user group hitting this, I might be wrong, so let's find out more about the actual use cases.

@skycastlelily
Copy link
Collaborator Author

skycastlelily commented Oct 17, 2024 via email

@happz
Copy link
Collaborator

happz commented Oct 21, 2024

Why don't they set it then? :) If they already know their test may very well need to be restarted seven times in every second run, why not make that fact visible in test metadata by allowing for more restarts? Relying on default values may not be the best idea.

Because, I guess many of tmt users would be those transferred from beaker?And, they don't need to set that in beaker,and I think making tmt a more user-friendly tool than beaker is definitely the goal we should reach,right?

True, but tmt shouldn't be a 1:1 replacement for Beaker. Every migration should consider what, how, and why. "It worked in Beaker" alone should not be enough - it does not imply it was correct :)

Can you share more about what was the test doing and why it had to be restarted more than once?

I worked with kernel QE for Upstream first project, and recalled that some of their testcases need to be restarted more than once. Before I said some users would "crash too many" , I searched their testcases quickly, and there do have some testcases need reboot twice, or three times, most of them are network,storage and filesystem.Plus, I think set max-default-value to minus+1 is ...hmm, anyway, it won't hurt those who use low value if we set the value higher. To name one,when testing kdump, the default low value will make them face the "crash too many" error in real world( https://fedoraproject.org/wiki/How_to_use_kdump_to_debug_kernel_crashes).As, they need to set crashkernel and reboot,and then trigger kernel panic,which will result in a reboot,ie, two reboot totally.

Good info, thank you. But, I would argue that if a test is known or even expected to restart several times, it should be noted in its metadata. It's like a duration: the default will never fit all, and if I as a test developer know the test will need 4 hours to finish, it's my responsibility to set duration accordingly.

the default low value will make them face the "crash too many" error in real world... they need to set crashkernel and reboot,and then trigger kernel panic,which will result in a reboot,ie, two reboot totally.

IMO that's a very good reason to set the test metadata to correctly announce what the test does. Why not do exactly that? I understand that by bumping the default they wouldn't have to, but I believe that would be a bad policy.

At least,we need update"crash too many" message,which is really confusing,right? TBO, when I search that message in tmt code, I'm not even sure there will be any result:)

Absolutely, improving the message and adding a hint makes perfect sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants