
【PaddlePaddle Hackathon 4 No.44】Optimize the GPU compute performance of the logsumexp op in Paddle #52543

Closed
wants to merge 22 commits

Conversation

thunder95
Contributor

PR types

Performance optimization

PR changes

OPs

Describe

Currently, the GPU computation of the logsumexp operator in Paddle is implemented with the Eigen library, and there is still clear room for performance improvement.
Design doc: PaddlePaddle/community#480

  • Development environment:
  1. Device: RTX 2070s
  2. Environment: CUDA 10.2, cuDNN 7
  • Optimization method
    Performance improves markedly by reusing the KPS operators already implemented inside Paddle (see the sketch after this list).
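For reference, the reduction such a kernel has to perform is the numerically stable logsumexp: first reduce to the per-row maximum, then sum exp(x - max), and add the maximum back after the log so the exponential never overflows. Below is a minimal NumPy sketch of that math; it only illustrates the computation and is not the KPS kernel from this PR.

```python
import numpy as np

def logsumexp_stable(x, axis=-1):
    """Numerically stable logsumexp along `axis`.

    Mirrors the two-pass reduction a GPU kernel typically does:
    a max-reduce first, then a sum-reduce over exp(x - max).
    """
    x_max = np.max(x, axis=axis, keepdims=True)                    # pass 1: max-reduce
    summed = np.sum(np.exp(x - x_max), axis=axis, keepdims=True)   # pass 2: sum-reduce
    return (np.log(summed) + x_max).squeeze(axis)

# example: same shape as the [64L, 64L] float32 case in the tables below
x = np.random.randn(64, 64).astype(np.float32)
print(logsumexp_stable(x, axis=-1).shape)  # (64,)
```

The max-shift step is especially relevant to the float16 cases below, where exp already overflows for inputs above roughly 11.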

After the optimization, performance of Paddle compared with the previous Paddle implementation:

| Case No. | Device | input_shape | input_type | Paddle perf (ms) | Old Paddle perf (ms) | Diff |
|---|---|---|---|---|---|---|
| 1 | RTX 2070s | [64L, 64L] | float32 | 0.00685 | 0.06810 | 894.16% faster |
| 2 | RTX 2070s | [1024L, 512L] | float32 | 0.01582 | 0.67155 | 4144.94% faster |
| 3 | RTX 2070s | [64L, 64L] | float16 | 0.00682 | 0.06718 | 885.04% faster |
| 4 | RTX 2070s | [1024L, 512L] | float16 | 0.016105 | 0.64455 | 3902.17% faster |

After the optimization, performance of Paddle compared with PyTorch:

| Case No. | Device | input_shape | input_type | Paddle perf (ms) | PyTorch perf (ms) | Diff |
|---|---|---|---|---|---|---|
| 1 | RTX 2070s | [64L, 64L] | float32 | 0.00685 | 0.03757 | 448.47% faster |
| 2 | RTX 2070s | [1024L, 512L] | float32 | 0.01582 | 0.05742 | 262.96% faster |
| 3 | RTX 2070s | [64L, 64L] | float16 | 0.00682 | 0.04035 | 491.64% faster |
| 4 | RTX 2070s | [1024L, 512L] | float16 | 0.016105 | 0.05294 | 228.72% faster |

Across all four cases, the optimized implementation shows a clear performance improvement, to varying degrees.
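For context, a rough sketch of how timings like those above can be collected is shown below. This is a hypothetical script, not the harness used for the numbers in the tables; it assumes CUDA builds of both Paddle and PyTorch and uses wall-clock timing with explicit device synchronization, and the warmup/repeat counts are arbitrary.

```python
import time
import numpy as np
import paddle
import torch

def time_fn(fn, sync, warmup=10, repeat=100):
    """Average wall-clock time of `fn` in ms, synchronizing the GPU around the loop."""
    for _ in range(warmup):
        fn()
    sync()
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    sync()
    return (time.perf_counter() - start) / repeat * 1e3

shape, dtype = [1024, 512], "float32"   # swap in [64, 64] / "float16" for the other cases
x_np = np.random.randn(*shape).astype(dtype)

x_pd = paddle.to_tensor(x_np, place=paddle.CUDAPlace(0))
x_pt = torch.from_numpy(x_np).cuda()

pd_ms = time_fn(lambda: paddle.logsumexp(x_pd, axis=-1),
                paddle.device.cuda.synchronize)
pt_ms = time_fn(lambda: torch.logsumexp(x_pt, dim=-1),
                torch.cuda.synchronize)
print(f"paddle: {pd_ms:.5f} ms, torch: {pt_ms:.5f} ms")
```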

@paddle-bot

paddle-bot bot commented Apr 4, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@luotao1
Contributor

luotao1 commented Apr 10, 2023

@luotao1 luotao1 closed this Apr 10, 2023
Labels
contributor External developers