An error while training the dependency parser #4402
Comments
Thanks for the report, I've hit this as well. See #4392. We're still working on the fix, because my patch in that PR currently breaks the textcat CLI stuff added in v2.2.
Oh, thank you. I've been using the
If you want to build the other tokenizer into the spacy pipeline so you can just call If you think it wouldn't be too complicated to improve spacy's tokenizer, we would be happy to get contributions in this area! It looks like the Polish tokenizer has a lot of exceptions for abbreviations but otherwise isn't very different from English at this point in terms of other punctuation.
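For anyone who wants to try wiring an external tokenizer into the pipeline, here is a rough sketch of one way it might look. It relies on spaCy accepting any callable that takes a string and returns a Doc as nlp.tokenizer. Everything named here is an assumption for illustration: external_tokenize is a hypothetical regex stand-in for whatever tokenizer you actually use, and words_and_spaces, ExternalTokenizer and build_pipeline are made-up helper names, not part of spaCy.

```python
import re


def external_tokenize(text):
    """Hypothetical stand-in for the external tokenizer: splits words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", text)


def words_and_spaces(text, words):
    """For each token, record whether a single space follows it in the text."""
    spaces = []
    pos = 0
    for word in words:
        start = text.find(word, pos)
        if start < 0:
            raise ValueError(f"token {word!r} not found in text")
        pos = start + len(word)
        # Doc(words=..., spaces=...) only models one trailing space per token,
        # so runs of whitespace are not preserved by this sketch.
        spaces.append(pos < len(text) and text[pos] == " ")
    return spaces


class ExternalTokenizer:
    """Callable that spaCy can use in place of its built-in tokenizer."""

    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        # Imported lazily so the helpers above can be used without spaCy installed.
        from spacy.tokens import Doc

        words = external_tokenize(text)
        return Doc(self.vocab, words=words, spaces=words_and_spaces(text, words))


def build_pipeline():
    """Create a blank Polish pipeline that tokenizes with ExternalTokenizer."""
    import spacy

    nlp = spacy.blank("pl")
    nlp.tokenizer = ExternalTokenizer(nlp.vocab)
    return nlp
```

With this in place, nlp = build_pipeline() followed by nlp("Ala ma kota.") should go through the replacement tokenizer. Saving the pipeline to disk may need extra serialization methods on the tokenizer class, which this sketch leaves out.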
Ever since the 2.2 update I've been having trouble training parser models. Training both taggers and NER is fine, but with parsers I get an error:
How to reproduce the behaviour
I've converted the data with the spaCy 2.2 converter. I run the train command with this input:
nice -n 19 python3 -m spacy train pl base_parser_22 LFG22/pl_lfg-ud-train.json LFG22/pl_lfg-ud-dev.json --pipeline parser --vectors vocab_kgr_100_handpruned22 --n-iter 50 --n-early-stopping 10 -G
I've tried this with other treebanks and the error persists. I do not recall encountering anything like it in the previous version.
Your Environment