Text classification results (Document.cat) missing when multiprocessing enabled #4770

Closed
ejohb opened this issue Dec 5, 2019 · 2 comments · Fixed by #4774
Labels
bug Bugs and behaviour differing from documentation feat / doc Feature: Doc, Span and Token objects feat / textcat Feature: Text Classifier

Comments

ejohb commented Dec 5, 2019

How to reproduce the behaviour

On a model that contains some text classifiers, run this code with n_process=1. It works as expected; classifications are present:

texts = ['my test text']
# self.model is the loaded spaCy pipeline that contains the text classifiers
docs = self.model.pipe(texts, n_process=1)
for text, doc in zip(texts, docs):
    print(doc.to_json())
{'text': 'my test text', 'sents': [{'start': 0, 'end': 12}], 'cats': {...all of my cats are here..}, 'tokens': [{'id': 0, 'start': 0, 'end': 2}, {'id': 1, 'start': 3, 'end': 7}, {'id': 2, 'start': 8, 'end': 12}]}

Now run the same with n_process=-1, and they are missing:

texts = ['my test text']
# Same pipeline as above, but with multiprocessing enabled
docs = self.model.pipe(texts, n_process=-1)
for text, doc in zip(texts, docs):
    print(doc.to_json())
    print(doc.cats)
{'text': 'my test text', 'sents': [{'start': 0, 'end': 12}], 'tokens': [{'id': 0, 'start': 0, 'end': 2}, {'id': 1, 'start': 3, 'end': 7}, {'id': 2, 'start': 8, 'end': 12}]}
{}

Your Environment

Info about spaCy

  • spaCy version: 2.2.3
  • Platform: Linux-4.4.0-18362-Microsoft-x86_64-with-glibc2.2.5
  • Python version: 3.8.0
@adrianeboyd adrianeboyd added bug Bugs and behaviour differing from documentation feat / doc Feature: Doc, Span and Token objects feat / textcat Feature: Text Classifier labels Dec 5, 2019
adrianeboyd (Contributor) commented

Thanks for the report! I can replicate this and it looks like the cats are going missing when the Doc is serialized. I'm surprised this bug wasn't noticed a lot earlier...
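
For anyone who wants to check the serialization step in isolation, without multiprocessing or a trained model, here is a minimal sketch. It assumes a blank English pipeline and a hand-set category score (the label name `EXAMPLE_LABEL` is made up for illustration), and it only uses `Doc.to_bytes()` and `Doc.from_bytes()`. It is not necessarily the exact code path that `n_process` takes internally, just a way to see whether `doc.cats` survives a round trip through bytes on a given spaCy version.

```python
import spacy
from spacy.tokens import Doc

# A blank English pipeline is enough here; no trained model is needed.
nlp = spacy.blank("en")
doc = nlp("my test text")

# Hypothetical category score, set by hand instead of by a trained textcat.
doc.cats = {"EXAMPLE_LABEL": 0.9}

# Round-trip the Doc through bytes into a fresh Doc that shares the vocab.
restored = Doc(nlp.vocab).from_bytes(doc.to_bytes())

# Compare the categories before and after serialization.
print("before:", doc.cats)
print("after: ", restored.cats)
```

If the second line prints an empty dict, the cats are being dropped during serialization on that version; if it matches the first line, they are being preserved.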

lock bot commented Jan 5, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 5, 2020