[LLM INFER] Append attn #9244

Open
wants to merge 48 commits into
base: develop

yuanlehome
Collaborator

@yuanlehome yuanlehome commented Oct 11, 2024

PR types

New features

PR changes

Others

Description

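This PR refactors the attention network construction for large-model inference. The new append_attn scheme delivers a 10% to 90% performance improvement over the previous scheme.

Inference is currently supported for llama, qwen, qwen-moe, and mixtral.

Usage: in the original inference script invocation, replace the --block_attn option with --append_attn.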
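
The flag swap described above can be sketched as a small helper that rewrites an existing launch command. This is only an illustration: `migrate_command` is a hypothetical helper, and the script name and the other arguments in the sample command are placeholders rather than the project's actual invocation. Only the two flag names come from this PR.

```python
import shlex

OLD_FLAG = "--block_attn"   # option used by the previous scheme (from the PR description)
NEW_FLAG = "--append_attn"  # option introduced by this PR


def migrate_command(command: str) -> str:
    """Return `command` with every --block_attn token replaced by --append_attn."""
    tokens = shlex.split(command)
    migrated = [NEW_FLAG if tok == OLD_FLAG else tok for tok in tokens]
    return shlex.join(migrated)


if __name__ == "__main__":
    # Hypothetical launch command; the script name and model argument are placeholders.
    old = "python predictor.py --model_name_or_path my-model --block_attn"
    print(migrate_command(old))
```

Commands that do not contain `--block_attn` are returned unchanged, so the helper can be applied to a batch of launch scripts without special-casing.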

TODO:

  • Adapt fp8 inference
  • Add performance data (to follow in the llm docs)

paddle-bot bot commented Oct 11, 2024

Thanks for your contribution!

codecov bot commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 0% with 60 lines in your changes missing coverage. Please review.

Project coverage is 53.08%. Comparing base (04f3c20) to head (4011d89).
Report is 13 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...erimental/transformers/fused_transformer_layers.py | 0.00% | 38 Missing ⚠️ |
| ...dlenlp/experimental/transformers/qwen2/modeling.py | 0.00% | 8 Missing ⚠️ |
| ...dlenlp/experimental/transformers/llama/modeling.py | 0.00% | 7 Missing ⚠️ |
| ...enlp/experimental/transformers/mixtral/modeling.py | 0.00% | 5 Missing ⚠️ |
| ...lp/experimental/transformers/qwen2_moe/modeling.py | 0.00% | 1 Missing ⚠️ |
| paddlenlp/experimental/transformers/utils.py | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9244      +/-   ##
===========================================
+ Coverage    52.81%   53.08%   +0.26%     
===========================================
  Files          660      657       -3     
  Lines       107281   106801     -480     
===========================================
+ Hits         56660    56690      +30     
+ Misses       50621    50111     -510     


static_cast<uint8_t>(quant_value2 + 128.0f);
}
// write k
// 大分块 lane_id / 4 / 2
Contributor

Please remove the Chinese comments.

Collaborator Author

There are too many of them; leaving them in is harmless.

3 participants