wiki_train_entity_linker hanging on step 3 #5131
It is definitely counterintuitive if the progress bar stays stuck at 0% for too long. I will look into this!
Ok, I had another look and there's still a major inefficiency in the script, in the selection of your 8000 articles out of the pool of 6 million. In short, if you "manually" create a smaller version of your training file, that slow selection step is avoided. I was running this on 165.000 training articles (and 1000 dev) with 1 epoch taking less than 24 hours, so your training time is definitely too slow. You can try cutting the training file into smaller pieces.
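The suggestion above - cutting the large JSONL training file into smaller pieces - can be sketched in a few lines of Python. The file name and chunk size here are hypothetical; adjust them to your own setup:

```python
def split_jsonl(path, lines_per_chunk=10000):
    """Split a large JSONL file into numbered chunk files next to the original.

    Writes path.part0, path.part1, ... and returns the number of chunks.
    Lines are copied verbatim, so each chunk remains valid JSONL.
    """
    chunk, chunk_idx = [], 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            chunk.append(line)
            if len(chunk) >= lines_per_chunk:
                with open(f"{path}.part{chunk_idx}", "w", encoding="utf-8") as out:
                    out.writelines(chunk)
                chunk, chunk_idx = [], chunk_idx + 1
    # Flush any remaining lines into a final, smaller chunk.
    if chunk:
        with open(f"{path}.part{chunk_idx}", "w", encoding="utf-8") as out:
            out.writelines(chunk)
        chunk_idx += 1
    return chunk_idx


# Example (hypothetical file name):
# n_chunks = split_jsonl("wp_training.jsonl", lines_per_chunk=10000)
```

Each resulting `.partN` file can then be fed to the training script separately, which keeps per-run memory and selection time down.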
Thanks, cutting the jsonl file into smaller pieces seems to have sped the process up!
Thanks for letting me know - that confirms my suspicion!
Ok, I was able to successfully train a model for entity linking, but it seems that because of the limits I set on the training and test data, it's not actually that effective at performing NEL. What limits would you recommend as most practical for the training and test data, so that training a model is effective but doesn't consume too much time?
I haven't done an exhaustive search, of course, but the setting of 165.000 training articles worked well for me. The dev test set can be kept small, like 1000 or so.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
This is a follow-up on a previous memory issue.
Following the advice from that thread, and the instructions in the documentation, I installed the most recent version of spaCy and compiled it in a virtual environment. The updated scripts are supposed to address this memory issue. At first it seemed like it was fixed, but now it looks like the script is hanging on step 3, when processing the dev data:
I've tried limiting the training data and testing data as well. I don't think it's a matter of time - step 2 took around 10 minutes, but this step has been hanging for about an hour. The closest issue I can find to this seems to be stuck at a different step, so I am not really sure what could be wrong.
Info about spaCy
EDIT: After running the script again with an even smaller dataset, it looks like it IS a matter of time, so I will be closing this issue for now.