About running speed #23
Thanks for your excellent work!
I noticed that torchscale executes the projections mapping x to q, k, and v as three separate operations (lines 84-86 of torchscale/component/multihead_attention.py). Would this be slower than computing them with a single fused projection, for example `self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)`?
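For reference, a minimal sketch of the fused-projection idea being asked about (hypothetical module and names, not torchscale's actual implementation, which keeps separate q_proj / k_proj / v_proj layers):

```python
import torch
import torch.nn as nn

class FusedQKVSketch(nn.Module):
    """Sketch: one fused linear produces q, k, v in a single matmul.

    Illustrative only; torchscale's MultiheadAttention uses separate
    q_proj / k_proj / v_proj linears at the lines cited above.
    """

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Single weight of shape (3 * embed_dim, embed_dim) replaces three
        # separate (embed_dim, embed_dim) projections.
        self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, embed_dim)
        bsz, seq_len, _ = x.shape
        qkv = self.qkv_proj(x)           # (batch, seq_len, 3 * embed_dim)
        q, k, v = qkv.chunk(3, dim=-1)   # three (batch, seq_len, embed_dim) tensors

        def split_heads(t):
            # -> (batch, num_heads, seq_len, head_dim)
            return t.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

        return split_heads(q), split_heads(k), split_heads(v)


# Usage sketch
x = torch.randn(2, 16, 64)
q, k, v = FusedQKVSketch(embed_dim=64, num_heads=8)(x)
print(q.shape)  # torch.Size([2, 8, 16, 8])
```

The total FLOPs are identical either way; the potential gain from fusing is launching one larger GEMM instead of three smaller ones, which can reduce kernel-launch and memory-read overhead on GPU, though the difference is often small in practice.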