How is prefix_idxs determined? #15
Comments
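Ah, it just follows each task's prompt template given in the appendix (the templates themselves were borrowed from how others have prompted models on these datasets) and takes the two tokens immediately before the label.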
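The rule in the reply above (take the two tokens that precede the label in the task's prompt template) can be sketched as follows. `ToyTokenizer` and `prefix_idxs_from_prompt` are hypothetical names, not part of the repository; the toy tokenizer only mimics the `encode(text, add_special_tokens=False)` call used in `Predictor`. With a real subword tokenizer, encoding the whole prefix at once can split tokens differently from encoding `'Sentiment'` and `':'` separately as the repository code does, so it is worth comparing the two before relying on this shortcut.

```python
import re


class ToyTokenizer:
    """Hypothetical stand-in for a Hugging Face tokenizer: splits text into
    words and punctuation and gives each new piece the next free integer id."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text, add_special_tokens=False):
        # add_special_tokens is accepted only to mirror the real call; it is ignored
        pieces = re.findall(r"\w+|[^\w\s]", text)
        return [self.vocab.setdefault(p, len(self.vocab)) for p in pieces]


def prefix_idxs_from_prompt(tokenizer, prompt_prefix, n=2):
    """Ids of the last n tokens of the prompt text that comes right before the label."""
    ids = tokenizer.encode(prompt_prefix, add_special_tokens=False)
    if len(ids) < n:
        raise ValueError(f"prompt prefix has fewer than {n} tokens")
    return ids[-n:]


tok = ToyTokenizer()
# Hypothetical SST-2-style prompt: "Review: <text>\nSentiment: <label>"
prefix_idxs = prefix_idxs_from_prompt(tok, "Review: a gripping film\nSentiment:")
print(prefix_idxs)  # → [5, 1]
```

Here the result equals `[encode('Sentiment')[-1], encode(':')[0]]`, i.e. the same two ids the `sst2` branch computes.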
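The problem with gsm8k is this: if your prompt requires the model to explicitly output "The answer is xxx", then prefix_idxs would be the last two tokens of 'The answer is' (the two tokens right before xxx). But if the answer has to be extracted from free-form output, the code above cannot be used. (Answer extraction on gsm8k seems to be done in all sorts of ways, and I'm not sure what works best either: https:/facebookresearch/llama/issues/325)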
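For the explicit-trigger case described above, a minimal sketch: take the last two tokens of `'The answer is'`, then look for them in a tokenized sequence to find where the answer starts. `trigger_prefix_idxs`, `answer_positions`, and `ToyTokenizer` are hypothetical names; how `Predictor` actually consumes `prefix_idxs` is not shown in this thread, so the position lookup is an assumption about typical use, not the repository's code.

```python
import re


class ToyTokenizer:
    """Hypothetical word/punctuation tokenizer standing in for a real one."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text, add_special_tokens=False):
        # add_special_tokens mirrors the Hugging Face call and is ignored here
        pieces = re.findall(r"\w+|[^\w\s]", text)
        return [self.vocab.setdefault(p, len(self.vocab)) for p in pieces]


def trigger_prefix_idxs(tokenizer, trigger="The answer is", n=2):
    """Last n tokens of the answer trigger, i.e. the tokens right before 'xxx'."""
    return tokenizer.encode(trigger, add_special_tokens=False)[-n:]


def answer_positions(input_ids, prefix_idxs):
    """Hypothetical helper: index of the token that follows each occurrence of
    prefix_idxs in input_ids, i.e. where the answer would start. A match at the
    very end of the sequence is skipped because no token follows it."""
    n = len(prefix_idxs)
    positions = []
    for i in range(len(input_ids) - n):
        if input_ids[i:i + n] == prefix_idxs:
            positions.append(i + n)
    return positions


tok = ToyTokenizer()
ids = tok.encode("Q: 2 + 3 = ? The answer is 5 .", add_special_tokens=False)
prefix_idxs = trigger_prefix_idxs(tok)  # ids of "answer", "is"
print(answer_positions(ids, prefix_idxs))  # → [10]
```

If the model answers in free form without a fixed trigger, there is no fixed token pattern to search for, which is why this approach stops working in that case.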
I'd like to ask how exactly prefix_idxs is determined in the Predictor class in icl/util_classes/predictor_classes.py. I see that different datasets set it in different ways:
if task_name == 'sst2':
    self.prefix_idxs = [tokenizer.encode('Sentiment', add_special_tokens=False)[-1],
                        tokenizer.encode(':', add_special_tokens=False)[0]]
elif task_name == 'agnews':
    self.prefix_idxs = [tokenizer.encode('Answer', add_special_tokens=False)[-1],
                        tokenizer.encode(':', add_special_tokens=False)[0]]
elif task_name == 'trec':
    self.prefix_idxs = [tokenizer.encode(' Type', add_special_tokens=False)[-1],
                        tokenizer.encode(':', add_special_tokens=False)[0]]
elif task_name == 'emo':
    self.prefix_idxs = [tokenizer.encode('Emotion', add_special_tokens=False)[-1],
                        tokenizer.encode(':', add_special_tokens=False)[0]]
How should it be determined for other datasets, such as gsm8k? Thanks.
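The four branches in the question differ only in the keyword that ends the prompt before `':'`. A table-driven sketch of the same logic follows, in which supporting another colon-style template means adding one entry. `LABEL_KEYWORDS`, `compute_prefix_idxs`, and `ToyTokenizer` are hypothetical names, not part of the repository. A trigger such as gsm8k's `'The answer is'` does not end in `':'`, so it does not fit this shape and needs the last-two-tokens rule from the reply instead.

```python
import re


class ToyTokenizer:
    """Hypothetical word/punctuation tokenizer standing in for a real one."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text, add_special_tokens=False):
        # add_special_tokens mirrors the Hugging Face call and is ignored here
        pieces = re.findall(r"\w+|[^\w\s]", text)
        return [self.vocab.setdefault(p, len(self.vocab)) for p in pieces]


# Keyword that directly precedes ':' in each task's template, copied from the
# if/elif chain in Predictor (the leading space in ' Type' is kept as-is).
LABEL_KEYWORDS = {
    "sst2": "Sentiment",
    "agnews": "Answer",
    "trec": " Type",
    "emo": "Emotion",
}


def compute_prefix_idxs(tokenizer, task_name):
    """[last token of the keyword, first token of ':'], as in the original branches."""
    if task_name not in LABEL_KEYWORDS:
        raise ValueError(f"no prompt keyword registered for task {task_name!r}")
    keyword = LABEL_KEYWORDS[task_name]
    return [tokenizer.encode(keyword, add_special_tokens=False)[-1],
            tokenizer.encode(":", add_special_tokens=False)[0]]


tok = ToyTokenizer()
print(compute_prefix_idxs(tok, "trec"))  # → [0, 1]
```

Raising on an unknown task name is a design choice here: it surfaces a missing template entry immediately instead of leaving `prefix_idxs` unset.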