Different from the output of the HF inference #280

Answered by zhuohan123
xcxhy asked this question in Q&A
The LLM inference process includes sampling, which is a random process. Because the implementations of HF and vLLM differ, it is normal to get different samples. However, if you perform argmax (greedy) sampling (e.g., temperature=0), then you should see the same results.
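To illustrate why this happens, here is a minimal, self-contained sketch of temperature sampling (not vLLM's or HF's actual sampler): with temperature=0 the token choice collapses to argmax and is deterministic, while with temperature>0 the token is drawn from a softmax distribution, so two engines with different RNG streams will naturally diverge.

```python
import math
import random

def sample(logits, temperature, rng):
    """Pick a token index from raw logits.

    temperature == 0 means greedy (argmax) decoding: fully deterministic.
    temperature > 0 scales the logits, applies softmax, and draws randomly.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # inverse-CDF draw: where the random number falls decides the token
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]
# Greedy decoding: every seed yields the same (argmax) token.
greedy = {sample(logits, 0, random.Random(s)) for s in range(10)}
# Temperature sampling: different RNG states can yield different tokens,
# which is why HF and vLLM outputs differ even for the same prompt.
sampled = {sample(logits, 1.0, random.Random(s)) for s in range(100)}
```

In vLLM itself, the equivalent is passing `temperature=0` in `SamplingParams` to force greedy decoding when comparing against an HF baseline.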

This discussion was converted from issue #272 on June 27, 2023 15:25.