-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiprocessing in pipe() not working with custom attributes #4903
Comments
I think the not-particularly-satisfying solution is to check that the extensions are registered in You can check whether they're already there ( |
Ah yes, I thought the issue was familiar but I couldn't find it. Apparently the linked issue was closed, but I think this issue should stay open. It would be nice to get a permanent fix for this. Not verify satisfying, indeed, but if it works, I am satisfied with a dirty workaround for now. I'm not sure whether checking |
If they've been set before in |
Seems like using Lines 52 to 53 in f2d2247
I am really curious whether the underlying issue can actually be fixed, i.e. that also on Windows the Token.set_extension() in init would work, and what is actually causing it. |
On Windows, set_extension in init does not work properly when using multiprocessing. As an alternative, set them (again) in __call__. Also see explosion/spaCy#4903
Can replicate this on my system (though as a unit test, the code hangs), definitely looks like a bug on Windows. |
Hm, that's odd. Just now, I copy-pasted the snippet from my OP into a new environment and it runs fine (when you leave out the EDIT: Oops, I was replying to your previous entry about hanging and reproducibility. |
Yep I assume it's blocking on a child process, but I could still reproduce the error so there's a chance of debugging ;-) |
@svlandeg: I don't think you need to do any particular windows debugging here. You can see the same behavior in linux if you use spawn instead of fork. I don't see an obvious way to fix this without completely changing how custom extensions are implemented. Maybe useful warnings are possible, though? |
So do I understand it correctly that it's a Windows-specific bug in spaCy because Windows defaults to spawn, and Linux to fork? |
I'd say that it looks like a windows-specific bug because of the multiprocessing defaults, but it's really an issue with spawn and spacy's global variables. |
Ok I looked into this some more, and from what I've read, it's bad practice to rely on the current state to be transferred to the workers (and it'll only work on Linux, not macOS or Windows). Instead, the required state should be transferred, cf https://docs.python.org/3/library/multiprocessing.html#programming-guidelines:
Fixed in PR #5006 by specifically loading the Thanks again for the report and the helpful code snippet, @BramVanroy ! |
Ah, that's great! I feared this issue was going to be ignored because "meh, Windows". I am glad it did get more attention. Thanks @svlandeg! You can close this if you want (I didn't yet because the tests in the PR failed but feel free). |
No that's fine, this issue will close automatically if/when the PR gets merged ;-) |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
I am using custom attributes and a custom pipeline for the first time, so it might very well be that there is a mistake on my part. However, the code works fine when not using the
n_process
argument.The problem: using
nlp.pipe(text, n_process=2)
will throw an AttributeError complaining that I am assiging a value to an unregistered extension attribute. The error is not thrown without then_process
argument.Trace:
My guess would be that
n_process
creates new processes that recreate thenlp
instance, but does not reinitialize its pipes - but that's just a guess.How to reproduce the behaviour
Your Environment
The text was updated successfully, but these errors were encountered: