

Compile c16/c8/c4 separately to speed up compilation
yuanlehome committed Oct 15, 2024
1 parent 2ef7c11 commit 4a4a4b4
Showing 27 changed files with 6,476 additions and 5,625 deletions.
csrc/gpu/append_attention.cu: 3 changes (2 additions, 1 deletion)
@@ -425,7 +425,8 @@ std::vector<paddle::Tensor> AppendAttention(
   meta_data.token_nums = qkv_dims[0];
   meta_data.kv_num_heads = key_cache_dims[1];
   meta_data.head_dims = key_cache_dims[3];
-  const int total_num_head = qkv_dims[qkv_dims.size() - 1] / meta_data.head_dims;
+  const int total_num_head =
+      qkv_dims[qkv_dims.size() - 1] / meta_data.head_dims;
   meta_data.q_num_heads = total_num_head - 2 * meta_data.kv_num_heads;
 
   meta_data.max_blocks_per_seq = block_tables.dims()[1];