v1.7.1 - Continuous batching feature supports ChatGLM2/3.

@Duyi-Wang Duyi-Wang released this 12 Jun 05:27
· 48 commits to main since this release
38658b1

Functionality

  • Add continuous batching support for ChatGLM2/3 models.
  • Qwen2Convert supports GPTQ-quantized Qwen2 models, such as GPTQ-Int8 and GPTQ-Int4, via the parameter from_quantized_model="gptq".
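
A minimal sketch of converting a GPTQ-quantized Qwen2 checkpoint with the new parameter. Only `Qwen2Convert` and `from_quantized_model="gptq"` come from these notes. The positional input/output directory arguments to `convert()` are an assumption based on the project's usual converter usage, and the helper names and paths are hypothetical.

```python
from typing import Dict


def gptq_convert_kwargs(quantized: bool) -> Dict[str, str]:
    """Extra keyword arguments for convert(); only the GPTQ case adds one."""
    return {"from_quantized_model": "gptq"} if quantized else {}


def convert_qwen2(hf_model_dir: str, output_dir: str, quantized: bool = True) -> None:
    """Convert a Hugging Face Qwen2 checkpoint into xFasterTransformer format."""
    # Imported lazily so this module loads even without the package installed.
    import xfastertransformer as xft

    # Assumption: convert() takes the input and output directories positionally.
    xft.Qwen2Convert().convert(
        hf_model_dir, output_dir, **gptq_convert_kwargs(quantized)
    )


# Usage (hypothetical paths; point these at a real GPTQ-Int4 or GPTQ-Int8 checkpoint):
# convert_qwen2("/data/Qwen2-7B-Instruct-GPTQ-Int4", "/data/Qwen2-7B-xft")
```

Keeping the keyword in a small helper means the same call path serves both quantized and non-quantized checkpoints.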

Bug fix

  • Fixed the segmentation fault when running with more than 2 ranks in vllm-xft serving.

What's Changed

Generated release notes

Full Changelog: v1.7.0...v1.7.1