Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add input for stestr run --concurrency flag for fractional nprocs #202

Open
mtreinish opened this issue Nov 7, 2018 · 8 comments
Open

Comments

@mtreinish
Copy link
Owner

Right now the --concurrency flag for stestr run takes in an integer, if that value is 0 it runs nproc workers, if that value is greater than 0 it runs that many workers. But we don't have a way to specify a value that is a fraction of nproc easily (like nproc/2 or nproc/4). This can be useful for some test suites, especially if they're very resource intensive, where running nproc workers isn't viable but you still want the number of workers to scale dynamically based on the size of the machine.

There are 2 ways I can see handling this first we also take a string in for concurrency and define an input for the fractions. Like stestr run --concurrency 'n/2' will run with nprocs / 2 workers. Or we could use the negative space and stick with integers to define the same thing, so something like stest run --concurrency -2 will run with nprocs / 2 workers.

@mtreinish
Copy link
Owner Author

@masayukig do you have any thoughts on this?

@masayukig
Copy link
Collaborator

yeah, I think it's a good feature. I actually don't need like that option so far personally. But it makes sense to me.

So, regarding the option argument, I'm fine with either way. But I prefer -2 style because it's simpler.

I was also think like --concurrency '0.5' but I feel that would cause a confusion and inconsistency.

@mtreinish
Copy link
Owner Author

Well it mostly came up for me because of: Qiskit/qiskit#1228 where we're hitting occasionally memory issues. I'd rather set it to nproc/2 or nproc/4 rather than hard coding it to 2 workers.

@mtreinish
Copy link
Owner Author

As for the specific input definition I'm fine with anything as long as we make it clear. I only suggested '-2' or 'n/2' because I thought the implementation would be simpler. If we take a float my concern is that someone will do something like --concurrency 0.124332678212341. Granted we will always have the cast the result as an int so maybe that isn't a problem.

@mtreinish
Copy link
Owner Author

I had a thought instead of doing it as a fraction we could also just do subtraction. So --concurrency=-2 is nproc - 2 workers. Not sure if that's better though

@masayukig
Copy link
Collaborator

Hrm. I actually don't have a strong opinion here. If we know the nproc already, the subtraction might work well. But I also think, if we know that, can we specify the concurrency directly? So, I guess we use this feature when the nproc is ambiguous. If so, I feel treating the concurrency as a fraction might be better.

And, I'm curious about a corner case of the subtraction. What will happen when nproc =< abs(concurrency) && 0 > concurrency? For example, nproc=32 and --concurrency=-32.
Of course, I think we can treat this just no concurrency in this case, though.

@mtreinish
Copy link
Owner Author

Yeah, I think treating it as a fraction is probably better. I just wrote that down because it was an idea I had related to this, after thinking more I agree with subtraction is not a good idea.

As for that corner case yeah there are probably 2 ways to handle it, either we exit with an error code and say it's not a valid value, or we treat it as running serially. Actually, this made me think what happens today if we pass a negative number to --concurrency, so I ran a test:

stestr run --concurrency -23
list index out of range

We probably should change that error message to be more descriptive.

@masayukig
Copy link
Collaborator

heh, yeah. Oh, did you already start to implement this?

mtreinish added a commit that referenced this issue Dec 18, 2018
This commit adds a detailed error message when a user passes in a
concurrency outside the expected range (which is >= 0). Previously
we'd raise an unhandled IndexError because of going outside the list
bounds deeper in the code, which wasn't super useful or informative for
users. This fixes that by detecting when we have an invalid input up
front and print that out before we run anything.

Related to: #202
mtreinish added a commit that referenced this issue Feb 20, 2019
This commit adds a detailed error message when a user passes in a
concurrency outside the expected range (which is >= 0). Previously
we'd raise an unhandled IndexError because of going outside the list
bounds deeper in the code, which wasn't super useful or informative for
users. This fixes that by detecting when we have an invalid input up
front and print that out before we run anything.

Related to: #202
mtreinish added a commit that referenced this issue Feb 20, 2019
This commit adds a detailed error message when a user passes in a
concurrency outside the expected range (which is >= 0). Previously
we'd raise an unhandled IndexError because of going outside the list
bounds deeper in the code, which wasn't super useful or informative for
users. This fixes that by detecting when we have an invalid input up
front and print that out before we run anything.

Related to: #202
mtreinish added a commit to mtreinish/qiskit-core that referenced this issue Mar 16, 2019
In Qiskit#1228 we decreased the concurrency used for the test runner in CI to
avoid memory pressure and failing jobs. In that commit we set it to 2 as
a quick fix so we could unblock CI and continue landing patches. A
followup was to do the analysis and find the sweet spot for concurrency
and stability. At the same time we opened mtreinish/stestr#202 to add
support to the test runner for running fractional nprocs. That is still
under discussion, but in the meantime this commit adds support to the
makefile for using fractional nprocs concurrency. It adjusts are default
concurrency from 2 to be nprocs / 2 if nprocs > 3. This should enable us
to increase the number of test workers we run on travis linux jobs but
still maintain reliability.
kdk pushed a commit to Qiskit/qiskit that referenced this issue Apr 5, 2019
* Make test running concurrency dynamic based on nprocs

In #1228 we decreased the concurrency used for the test runner in CI to
avoid memory pressure and failing jobs. In that commit we set it to 2 as
a quick fix so we could unblock CI and continue landing patches. A
followup was to do the analysis and find the sweet spot for concurrency
and stability. At the same time we opened mtreinish/stestr#202 to add
support to the test runner for running fractional nprocs. That is still
under discussion, but in the meantime this commit adds support to the
makefile for using fractional nprocs concurrency. It adjusts are default
concurrency from 2 to be nprocs / 2 if nprocs > 3. This should enable us
to increase the number of test workers we run on travis linux jobs but
still maintain reliability.

* Hard code osx to 2 workers

* Add debug statement about core count and worker count

* Use printf to round floats to int
lia-approves pushed a commit to edasgupta/qiskit-terra that referenced this issue Jul 30, 2019
* Make test running concurrency dynamic based on nprocs

In Qiskit#1228 we decreased the concurrency used for the test runner in CI to
avoid memory pressure and failing jobs. In that commit we set it to 2 as
a quick fix so we could unblock CI and continue landing patches. A
followup was to do the analysis and find the sweet spot for concurrency
and stability. At the same time we opened mtreinish/stestr#202 to add
support to the test runner for running fractional nprocs. That is still
under discussion, but in the meantime this commit adds support to the
makefile for using fractional nprocs concurrency. It adjusts are default
concurrency from 2 to be nprocs / 2 if nprocs > 3. This should enable us
to increase the number of test workers we run on travis linux jobs but
still maintain reliability.

* Hard code osx to 2 workers

* Add debug statement about core count and worker count

* Use printf to round floats to int
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants