feat: parallel #73

Open: william-silversmith wants to merge 8 commits into master

william-silversmith (Contributor) commented Sep 9, 2021

Adds parallel flag.

  • RFT pass
  • First pass (this is where most of the benefit is)
  • Relabel pass

The relabel pass was easy, but pretty marginal in terms of the performance impact. The changes that need to be made to the core equivalence pass are kind of outrageous. It might be better to maintain two separate pieces of code for single and parallel just so there's a sane version.

william-silversmith added the "performance" label (Lower memory or faster computation.) Sep 9, 2021
william-silversmith self-assigned this Sep 9, 2021
william-silversmith (Contributor, Author) commented Sep 10, 2021

Got the first pass running without crashing. Overall it seems slower at the moment, and I will have to figure out why. It looks like this speeds up the first pass but slows down relabeling, probably because the renumbering array is now bigger. That can probably be handled with offsets.
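One way to read the offsets idea, as a minimal sketch rather than the code in this PR: if each block produces provisional labels numbered 1..n_i, an exclusive prefix sum over the per-block counts gives every block a disjoint, compact range, so the renumbering array only needs to be as long as the number of labels actually used. The function names `block_offsets` and `globalize` are hypothetical, and the blocks are assumed to be already labeled independently with 0 as background.

```python
import numpy as np

def block_offsets(labels_per_block):
    """Exclusive prefix sum: offsets[i] = number of labels used by blocks 0..i-1."""
    counts = np.asarray(labels_per_block, dtype=np.int64)
    offsets = np.zeros(len(counts), dtype=np.int64)
    offsets[1:] = np.cumsum(counts)[:-1]
    return offsets

def globalize(local_blocks):
    """Shift each block's provisional labels (1..n_i, 0 = background)
    into a disjoint global range. Hypothetical helper, not the PR's code."""
    counts = [int(b.max()) for b in local_blocks]
    offsets = block_offsets(counts)
    shifted_blocks = []
    for block, off in zip(local_blocks, offsets):
        shifted = block.astype(np.int64)   # astype returns a copy
        shifted[block > 0] += off          # leave background at 0
        shifted_blocks.append(shifted)
    return shifted_blocks, offsets
```

With this layout the total number of global labels is the sum of the per-block counts, instead of the number of blocks times a worst-case bound per block.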

william-silversmith (Contributor, Author) commented Jul 17, 2024

Managed to increase perf from 120 MVx/sec to ~280 MVx/sec with 8 cores. Something isn't right; it should scale much closer to 1:1.
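To put a number on how far this is from 1:1, here is a small back-of-the-envelope sketch. It assumes the 120 MVx/sec figure is the single-core baseline (the comment does not say so explicitly), and it uses Amdahl's law only as a rough model; other limits such as memory bandwidth would show up the same way. The function names are illustrative.

```python
def parallel_efficiency(base_rate, parallel_rate, cores):
    """Return (speedup, efficiency) where efficiency = speedup / cores."""
    speedup = parallel_rate / base_rate
    return speedup, speedup / cores

def amdahl_serial_fraction(speedup, cores):
    """Solve Amdahl's law S = 1 / (s + (1 - s) / p) for the serial fraction s."""
    return (cores / speedup - 1) / (cores - 1)

# Assumption: 120 MVx/sec is the 1-core rate, ~280 MVx/sec the 8-core rate.
speedup, efficiency = parallel_efficiency(120.0, 280.0, 8)
serial = amdahl_serial_fraction(speedup, 8)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}, implied serial fraction {serial:.2f}")
```

Under those assumptions this works out to roughly a 2.3x speedup, about 29% parallel efficiency, and an implied non-parallel share of about 0.35 of the single-core runtime.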

Some timing info from a 1GVx volume colored in with 8 copies of connectomics.npy.

Some things to observe:

  1. Allocate is big for parallel = 1 because the epl information was not used correctly (so disregard).
  2. Unify time climbs as parallel increases because more slices need to be processed to stitch the regions together.
  3. First pass is the most expensive step, and for some reason its time is not declining linearly with the number of processors.
image
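The reason unify grows with parallel can be illustrated with a simplified sketch. It is not the implementation in this PR: it assumes the volume is split into slabs along the first axis, that each slab already carries disjoint provisional labels, and that only face-adjacent voxels across a seam are joined (6-connectivity). `DisjointSet` and `stitch_seams` are hypothetical names. With `parallel` slabs there are `parallel - 1` seams, and each seam costs one full slice comparison.

```python
import numpy as np

class DisjointSet:
    """Minimal union-find over integer labels 0..n-1."""
    def __init__(self, n):
        self.parent = np.arange(n, dtype=np.int64)

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)  # keep the smaller label as root

def stitch_seams(image, labels, seam_indices, ds):
    """Merge provisional labels across slab boundaries.

    image, labels: arrays of shape (z, y, x); seam_indices: z positions
    where a new slab starts (assumed layout, one seam per slab boundary).
    """
    for z in seam_indices:
        # Voxels on either side of the seam join if they share a nonzero input value.
        same = (image[z - 1] == image[z]) & (image[z] != 0)
        for a, b in zip(labels[z - 1][same], labels[z][same]):
            ds.union(int(a), int(b))
```

In this sketch the stitching work is proportional to the number of seams times the slice area, which is consistent with unify time rising as parallel increases.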
