Handling larger clusters #13

Open
2 of 3 tasks
rjpower opened this issue Feb 5, 2014 · 1 comment

Comments

@rjpower
Contributor

rjpower commented Feb 5, 2014

When an array has more than a few hundred tiles, I've noticed that our performance drops significantly; this is almost certainly due to the various extent operations needed to compute tiles. We can move the extent code to Cython, which should give us a big speedup.

Also, the vast majority of arrays have tiles that are all the same shape; we can leverage this to avoid scanning a tile list, and instead use the tile shape to find the target tile, e.g.

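def pos_to_tile(pos, tile_shape, array):
  # Tile coordinates along each axis (integer division).
  tx = pos[0] // tile_shape[0]
  ty = pos[1] // tile_shape[1]
  ...
  # Tiles along axis 0; assumes tile_shape divides array.shape evenly.
  num_tiles_x = array.shape[0] // tile_shape[0]
  return ty * num_tiles_x + tx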
  • Run profiles to find bottlenecks for arrays with many tiles
  • Migrate extent.py to Cython
  • Special handling for regular tile shapes
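As a rough illustration of the regular-tile idea above, here is a minimal N-dimensional sketch. The names `regular_tile_index` and `tiles_per_axis` are hypothetical and do not come from the project's code. It assumes every tile has the same shape except possibly the last one along each axis, and that axis 0 varies fastest in the flat tile index (matching `ty * num_tiles_x + tx`).

```python
def tiles_per_axis(array_shape, tile_shape):
    """Number of tiles along each axis, rounding up for a partial last tile."""
    return tuple(-(-n // t) for n, t in zip(array_shape, tile_shape))


def regular_tile_index(pos, array_shape, tile_shape):
    """Flat index of the tile containing `pos`, without scanning a tile list.

    Hypothetical helper: assumes a regular tiling where axis 0 varies fastest.
    """
    counts = tiles_per_axis(array_shape, tile_shape)
    index = 0
    stride = 1
    for p, n, t, c in zip(pos, array_shape, tile_shape, counts):
        if not 0 <= p < n:
            raise IndexError("position %r is outside the array" % (pos,))
        index += (p // t) * stride  # tile coordinate along this axis
        stride *= c                 # tiles spanned by all earlier axes
    return index
```

The lookup costs one integer division per dimension, independent of how many tiles the array has, which is the property the special case is meant to exploit.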
@fegin
Contributor

fegin commented Feb 10, 2014

The following table shows the percentage of time each benchmark spends in extent.py.

| Benchmark | Master | Workers |
|---|---|---|
| large number of tiles | 12% | 6~10% |
| reshape | 9% | 8~10% |
| transpose | 3% | 3~10% |
| benchmark_pagerank | 7% | 2% |
| benchmark_lreg | 1% | 0% |
| benchmark_finance | 12% | 1% |
| benchmark_slice | 10% | 6~10% |
| benchmark_dot | 1% | 0% |
| benchmark_kmean | 4% | 0% |
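For anyone wanting to reproduce this kind of per-file breakdown, one possible approach is sketched below using the standard `cProfile` and `pstats` modules. This is only an assumption about how such numbers could be gathered, not necessarily how the figures above were measured; `profile_call` and `fraction_of_time_in_file` are hypothetical helper names.

```python
import cProfile
import pstats


def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile and return the collected pstats.Stats."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn(*args, **kwargs)
    finally:
        profiler.disable()
    return pstats.Stats(profiler)


def fraction_of_time_in_file(stats, filename_suffix):
    """Share of total self-time spent in functions defined in files whose
    path ends with filename_suffix (e.g. "extent.py")."""
    total = 0.0
    in_file = 0.0
    # stats.stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers),
    # where tt is the time spent in the function itself, excluding callees.
    for (filename, _lineno, _func), (_cc, _nc, tt, _ct, _callers) in stats.stats.items():
        total += tt
        if filename.endswith(filename_suffix):
            in_file += tt
    return in_file / total if total > 0 else 0.0
```

Usage would look like `fraction_of_time_in_file(profile_call(run_benchmark), "extent.py")`, where `run_benchmark` stands in for whichever benchmark entry point is being measured.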

After migrating extent.py to Cython and replacing tuples with C arrays, these benchmarks spend less than 1% of their time in extent.py (not committed yet).
