POS tag of "LANG" assigned to tokens, causing a KeyError #3958

Closed
bdewilde opened this issue Jul 12, 2019 · 3 comments
Labels
bug Bugs and behaviour differing from documentation

Comments


bdewilde commented Jul 12, 2019

How to reproduce the behaviour

Hello! I upgraded to v2.1.5 💫 and ran into a POS-tagging issue that wasn't present yesterday in v2.1.4. Specifically, the en_core_web_sm model assigns "LANG" as a POS tag to some tokens, which afaik isn't a valid value. This, in turn, raises a KeyError when calling tok.pos_ on the offending tokens, since parts_of_speech.IDS doesn't have "LANG" as a key.

Diving in, I see that the POS tags are nonsensical:

>>> [(tok, doc.vocab[tok.pos].text) for tok in doc]
...
 (., 'PROPN'),
 (, 'EOL'),
 (In, 'ADJ'),
 (this, 'CCONJ'),
 (case, 'INTJ'),
 (it, 'PART'),
 (was, 'SYM'),
 (several, 'LANG'),
 (feet, 'INTJ'),
 (below, 'ADJ'),
 (it, 'PART'),
 (., 'PROPN'),
 (But, 'CONJ'),
 (a, 'CCONJ'),
 (section, 'INTJ'),
...
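
Editor's sketch: one way to spot the offending tokens without triggering the KeyError is to compare each looked-up name against a set of known coarse-grained POS names. The set below is an assumption (recalled from spaCy v2's parts_of_speech module, so verify it against your install), and find_invalid_pos is a hypothetical helper, not part of spaCy. The sample pairs are copied from the output above.

```python
# Coarse-grained POS names assumed to be valid. This list is an assumption
# recalled from spaCy v2's parts_of_speech module; check your own version.
VALID_POS = {
    "ADJ", "ADP", "ADV", "AUX", "CONJ", "CCONJ", "DET", "INTJ", "NOUN",
    "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
    "EOL", "SPACE",
}


def find_invalid_pos(tagged, valid=VALID_POS):
    """Return the (token_text, pos_name) pairs whose name is not in `valid`."""
    return [(text, name) for text, name in tagged if name not in valid]


# (token text, looked-up POS name) pairs copied from the report above.
# The token paired with 'EOL' is rendered as blank in the report.
observed = [
    (".", "PROPN"), ("", "EOL"), ("In", "ADJ"), ("this", "CCONJ"),
    ("case", "INTJ"), ("it", "PART"), ("was", "SYM"), ("several", "LANG"),
    ("feet", "INTJ"), ("below", "ADJ"), ("it", "PART"), (".", "PROPN"),
    ("But", "CONJ"), ("a", "CCONJ"), ("section", "INTJ"),
]

invalid = find_invalid_pos(observed)
print(invalid)
```

Note that this only catches names outside the POS set. Tags that are valid names but wrong for the token (for example "this" tagged CCONJ) pass this check.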

I have no idea what's gone wrong. Here's a full example, using a fresh install of both spacy and the model:

In [1]: import spacy

In [2]: nlp = spacy.load("en")

In [3]: doc = nlp("This is an example sentence.")

In [4]: [(tok, doc.vocab[tok.pos].text) for tok in doc]
Out[4]:
[(This, 'CCONJ'),
 (is, 'SYM'),
 (an, 'CCONJ'),
 (example, 'INTJ'),
 (sentence, 'INTJ'),
 (., 'PROPN')]
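
Editor's sketch: a regression-style check for this example could compare the looked-up names against the tags one would expect for the sentence. The expected tags below are an assumption about what a working v2.1 English model returns, and pos_mismatches is a hypothetical helper; the observed tags are copied from Out[4] above.

```python
def pos_mismatches(tokens, observed, expected):
    """Return (token, observed_tag, expected_tag) for every disagreement."""
    return [
        (tok, obs, exp)
        for tok, obs, exp in zip(tokens, observed, expected)
        if obs != exp
    ]


tokens = ["This", "is", "an", "example", "sentence", "."]
# Tags copied from the session above (v2.1.5).
observed = ["CCONJ", "SYM", "CCONJ", "INTJ", "INTJ", "PROPN"]
# Assumed tags from a correctly working model; not taken from the report.
expected = ["DET", "VERB", "DET", "NOUN", "NOUN", "PUNCT"]

mismatches = pos_mismatches(tokens, observed, expected)
for tok, obs, exp in mismatches:
    print(f"{tok!r}: got {obs}, expected {exp}")
```

Under these assumed expectations every token in the sentence is mistagged, which is why even a one-sentence smoke test would have flagged the release.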

Your Environment

  • spaCy version: 2.1.5
  • Platform: Darwin-18.6.0-x86_64-i386-64bit
  • Python version: 3.7.0
  • Models: es, en, xx
ines added the bug label Jul 12, 2019
honnibal (Member) commented

Damn, we threw the symbols table out of alignment. I don't get why our tests didn't catch this :(. Fix forthcoming, sorry!
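
Editor's sketch: for readers wondering how a misaligned symbols table produces this, the toy example below stores tags as integer IDs under one table and decodes them under a table with one extra entry inserted. The table contents and the position of "LANG" are hypothetical and are not spaCy's actual layout; the point is only that every later ID decodes to a neighbouring name, which appears consistent with the off-by-one pattern in the report (DET shown as CCONJ, NOUN as INTJ).

```python
# Toy symbol table: a name's integer ID is its index in the list.
OLD_SYMBOLS = ["", "ADJ", "ADP", "ADV", "AUX", "CONJ", "CCONJ", "DET",
               "INTJ", "NOUN"]

# Hypothetical change: one symbol inserted near the front, so every
# later name now sits one position further along.
NEW_SYMBOLS = OLD_SYMBOLS[:1] + ["LANG"] + OLD_SYMBOLS[1:]


def encode(names, table):
    """Map names to integer IDs using `table`."""
    return [table.index(name) for name in names]


def decode(ids, table):
    """Map integer IDs back to names using `table`."""
    return [table[i] for i in ids]


# IDs written under the old table...
stored_ids = encode(["DET", "NOUN", "ADJ"], OLD_SYMBOLS)
# ...but read back under the new one.
decoded = decode(stored_ids, NEW_SYMBOLS)
print(decoded)

# A POS-only lookup that has no entry for "LANG", loosely analogous to
# parts_of_speech.IDS in the report.
POS_IDS = {name: i for i, name in enumerate(OLD_SYMBOLS)}
try:
    POS_IDS[decoded[2]]
    missing = None
except KeyError as err:
    missing = err.args[0]
print("missing key:", missing)
```

A round-trip check (encode with one table, decode with the other, assert the names are unchanged) is one simple way a test suite could catch this kind of drift.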

honnibal (Member) commented Jul 12, 2019

Fixed, and v2.1.6 uploaded. Wheels coming soon. Thanks again for the quick report.


lock bot commented Aug 11, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Aug 11, 2019