Expected throughput? #3
Can you provide any insight into expected throughput, relative to a "base" transformer implementation?
I.e., if you consider two models with the same hidden size, number of layers, etc., will the sparse_attention version run significantly slower (and if so, presumably because of recompute)?
Apologies if this was covered in the paper--I skimmed and didn't see it addressed.
Am considering getting this up and running--extremely interesting--but would like a sense of whether there is a major throughput hit before doing so.
Thank you--very neat to see a successful evolution from https://openai.com/blog/block-sparse-gpu-kernels/.
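For what it's worth, one rough way to gauge the recompute cost in isolation (before wiring up the repo's blocksparse kernels) is to time the same dense attention block with and without gradient checkpointing. This is only a sketch under my own assumptions--it uses PyTorch rather than the repo's TensorFlow code, and the sizes below are placeholders, not the paper's configs:

```python
# Minimal timing sketch: dense attention with vs. without recompute
# (gradient checkpointing). Not the repo's code; all sizes are placeholders.
import time
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

def attention_block(x, qkv, proj, n_head):
    # Standard dense multi-head self-attention forward pass.
    b, t, d = x.shape
    q, k, v = qkv(x).chunk(3, dim=-1)
    def split(z):
        return z.view(b, t, n_head, d // n_head).transpose(1, 2)
    q, k, v = split(q), split(k), split(v)
    att = (q @ k.transpose(-2, -1)) / (d // n_head) ** 0.5
    att = att.softmax(dim=-1)
    out = (att @ v).transpose(1, 2).reshape(b, t, d)
    return proj(out)

def bench(use_recompute, steps=10, b=4, t=512, d=512, n_head=8):
    # Returns approximate tokens/second for forward + backward.
    torch.manual_seed(0)
    qkv = nn.Linear(d, 3 * d)
    proj = nn.Linear(d, d)
    x = torch.randn(b, t, d, requires_grad=True)
    start = time.time()
    for _ in range(steps):
        if use_recompute:
            # Recompute the block's activations during backward.
            y = checkpoint(attention_block, x, qkv, proj, n_head,
                           use_reentrant=False)
        else:
            y = attention_block(x, qkv, proj, n_head)
        y.sum().backward()
        x.grad = None
    return steps * b * t / (time.time() - start)

print("dense, no recompute:", bench(False))
print("dense, recompute   :", bench(True))
```

Obviously this only isolates the recompute overhead for a dense block; the actual sparse kernels change the FLOP count as well, so it's a lower-bound sanity check rather than a real comparison.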