Lines 53-63 of Preprocess.py seem to have a problem #9
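Suppose tokens has length 621:
After line 8 executes: start_index = 200, end_index = 400, and train_list holds tokens up to 200.
Entering the loop, the first time line 13 executes: start_index = 400, end_index = 600, and train_list holds tokens up to 400.
The check 600 + 50 > 621 ends the loop. train_list holds tokens up to 400, and tokens 400-621 are discarded.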
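Suppose tokens has length 651:
After line 8 executes: start_index = 200, end_index = 400, and train_list holds tokens up to 200.
Entering the loop, the first time line 13 executes: start_index = 400, end_index = 600, and train_list holds tokens up to 400.
The second time line 13 executes: start_index = 600, end_index = 800, and train_list holds tokens up to 600.
The check 800 + 50 > 651 ends the loop. train_list holds tokens up to 600, and tokens 600-651 are discarded.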
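To make the two traces easier to check, here is a hypothetical reconstruction of the loop in Python. Lines 53-63 of Preprocess.py are not quoted in this issue, so the function name split_tokens_buggy, the parameters step=200 and min_len=50, and the loop condition end_index + min_len <= len(tokens) are all assumptions chosen to reproduce the traces above. It is a sketch, not the repository's code.

```python
# Hypothetical reconstruction of the chunking loop traced in this issue.
# ASSUMPTION: names, step=200, min_len=50 and the "<=" loop condition are
# guesses chosen to reproduce the traces; the real Preprocess.py may differ.

def split_tokens_buggy(tokens, step=200, min_len=50):
    """Return (train_list, discarded_tail) following the traced behavior."""
    train_list = []
    start_index, end_index = 0, step
    # State "after line 8": first chunk saved, window advanced by one step.
    train_list.append(tokens[start_index:end_index])
    start_index, end_index = end_index, end_index + step
    # The loop runs only while a full window plus min_len more tokens fit.
    while end_index + min_len <= len(tokens):
        train_list.append(tokens[start_index:end_index])  # save current window
        start_index, end_index = end_index, end_index + step
    # Nothing from start_index onward is ever appended to train_list.
    return train_list, tokens[start_index:]


if __name__ == "__main__":
    for n in (621, 651):
        chunks, tail = split_tokens_buggy(list(range(n)))
        print(n, len(chunks), len(tail))
    # 621 2 221
    # 651 3 51
```

For 621 tokens this saves two 200-token chunks and drops the 221-token tail; for 651 tokens it saves three chunks and drops 51 tokens. Whenever the loop body runs at least once, the dropped tail is between min_len and step+min_len-1 tokens long.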
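Your code discards the final stretch of tokens, anywhere from 50 to step+50-1 tokens long (step is 200 in the examples above). That doesn't seem to match what you described, namely that the remaining data is added to the training set whenever its length is greater than or equal to 50.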
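A minimal sketch of the described intent (keep the leftover tokens only if there are at least 50 of them), assuming the same hypothetical step=200 and min_len=50: slice tokens into consecutive step-sized chunks and drop only a final chunk shorter than min_len. The function name split_tokens_fixed is illustrative and does not come from the repository.

```python
# Minimal sketch of the described intent: a leftover chunk is kept only if it
# has at least min_len tokens. ASSUMPTION: step=200 and min_len=50 mirror the
# numbers in this issue; split_tokens_fixed is a hypothetical name.

def split_tokens_fixed(tokens, step=200, min_len=50):
    """Split tokens into consecutive step-sized chunks, dropping only a
    final chunk shorter than min_len."""
    train_list = []
    for start_index in range(0, len(tokens), step):
        chunk = tokens[start_index:start_index + step]
        # Full chunks always pass when step >= min_len; only the tail can fail.
        if len(chunk) >= min_len:
            train_list.append(chunk)
    return train_list


if __name__ == "__main__":
    for n in (621, 651):
        chunks = split_tokens_fixed(list(range(n)))
        print(n, len(chunks), sum(len(c) for c in chunks))
    # 621 3 600
    # 651 4 651
```

With this version a 621-token input keeps 600 tokens (the 21-token tail is shorter than 50 and is dropped), and a 651-token input keeps all 651 tokens.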