Tokenizer cache doesn't handle modifications to special cases or token_match correctly #4238
This caching problem has been making me think I was losing my mind while testing special cases and token_match with the tokenizer. Here's the commit that went missing from v1 to v2 that deals with the cache problem: I think a solution like this could fix the problem, but I'm not sure it's 100% correct for v2. When I test this with 2.0.18 it seems to work, though I'm not sure why, given the minimal differences in the tokenizer between 2.0.18 and 2.1.8.
Wow, no idea how that patch went missing! Glad I wrote some notes on that... So can we just take that commit?
No, it doesn't quite work, either. I have a new version coming...
Flush tokenizer cache when affixes, token_match, or special cases are modified. Fixes explosion#4238, same issue as in explosion#1250.
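The fix described in the commit message above boils down to invalidating the tokenizer's cache whenever anything that affects tokenization changes. A minimal sketch of that pattern with a toy tokenizer (hypothetical code, not spaCy's actual implementation; all names here are made up for illustration):

```python
class FlushingTokenizer:
    """Toy tokenizer whose cache is flushed when its rules change."""

    def __init__(self, token_match=None):
        self._token_match = token_match
        self.special_cases = {}
        self._cache = {}

    def _flush_cache(self):
        # Cached results may be wrong under the new rules: drop them all.
        self._cache = {}

    @property
    def token_match(self):
        return self._token_match

    @token_match.setter
    def token_match(self, func):
        self._token_match = func
        self._flush_cache()

    def add_special_case(self, chunk, tokens):
        self.special_cases[chunk] = list(tokens)
        self._flush_cache()

    def __call__(self, text):
        if text in self._cache:
            return self._cache[text]
        if text in self.special_cases:
            tokens = list(self.special_cases[text])
        elif self._token_match is not None and self._token_match(text):
            tokens = [text]
        else:
            tokens = text.split()
        self._cache[text] = tokens
        return tokens


tok = FlushingTokenizer()
tok("don't")                                  # cached as ["don't"]
tok.add_special_case("don't", ["do", "n't"])  # flushes the cache
print(tok("don't"))                           # -> ['do', "n't"]
```

With the flush in place, adding a special case or reassigning `token_match` takes effect immediately, even for chunks that were tokenized (and therefore cached) before the change.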
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
How to reproduce the behaviour
The GitHub-suggested related issues were actually helpful! #1061 seems to have snuck back in. It works in 2.0.18, not in 2.1.0.
Because of the tokenizer cache, modifications to special cases and token_match are ignored once the pipeline has been run at least once.
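The failure mode can be reproduced with a stripped-down sketch of a caching tokenizer (hypothetical code, not spaCy's implementation): once a chunk has been tokenized and cached, a later `add_special_case` is silently ignored.

```python
class StaleCacheTokenizer:
    """Toy tokenizer illustrating the stale-cache bug."""

    def __init__(self):
        self.special_cases = {}
        self._cache = {}

    def add_special_case(self, chunk, tokens):
        # Bug: the cache is NOT flushed here, so any chunk seen before
        # this call keeps returning its old, stale tokenization.
        self.special_cases[chunk] = list(tokens)

    def __call__(self, text):
        if text in self._cache:
            return self._cache[text]
        if text in self.special_cases:
            tokens = list(self.special_cases[text])
        else:
            tokens = text.split()
        self._cache[text] = tokens
        return tokens


tok = StaleCacheTokenizer()
print(tok("don't"))                           # -> ["don't"], now cached
tok.add_special_case("don't", ["do", "n't"])
print(tok("don't"))                           # -> ["don't"]: cache wins, rule ignored
```

A fresh tokenizer instance (or one whose cache is flushed on modification) would return `["do", "n't"]`, which is why the bug only shows up after the pipeline has been run at least once.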
Info about spaCy