Universal POS Tag scheme discrepancies #4485

Closed
chrisjbryant opened this issue Oct 20, 2019 · 2 comments · Fixed by #4501
Labels
bug Bugs and behaviour differing from documentation feat / tagger Feature: Part-of-speech tagger

Comments

@chrisjbryant

Although I raised this issue ages ago (#593) and I know some work has been done on it, I just want to let you know there is still a discrepancy in the way spaCy maps fine-grained POS tags to coarse-grained POS tags for English.

Specifically, the current spaCy tag_map maps PRP$ and WP$ to PRON, while Universal Dependencies maps them to DET (UD tag map). These are the possessive words in phrases like "my car" and "whose car", which UD lists as examples of determiners (link).
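
To make the mismatch concrete, here is a minimal sketch in Python. The two dictionaries are hypothetical excerpts transcribed from the description above, not the full spaCy or UD tables, and `find_discrepancies` is an illustrative helper, not part of spaCy.

```python
# Hypothetical excerpts, transcribed from the issue text above.
# They are NOT the complete spaCy tag_map or UD conversion table.
SPACY_TAG_MAP_EXCERPT = {"PRP$": "PRON", "WP$": "PRON"}
UD_TAG_MAP_EXCERPT = {"PRP$": "DET", "WP$": "DET"}


def find_discrepancies(actual, expected):
    """Return {tag: (actual_pos, expected_pos)} for tags whose coarse POS differs."""
    return {
        tag: (actual.get(tag), pos)
        for tag, pos in expected.items()
        if actual.get(tag) != pos
    }


if __name__ == "__main__":
    diffs = find_discrepancies(SPACY_TAG_MAP_EXCERPT, UD_TAG_MAP_EXCERPT)
    for tag, (got, want) in sorted(diffs.items()):
        print(f"{tag}: mapped to {got}, UD expects {want}")
```

The same comparison could in principle be run over the full tables to check every fine-grained tag, assuming both are available as plain tag-to-POS dictionaries.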

I also noticed some old information on the annotation specifications page:
https://spacy.io/api/annotation#pos-tagging

Specifically, the header in the Universal Part-of-speech Tags tab says you use the Universal Dependencies scheme, while the headers in the English and German tabs say you use the Google Universal Tagset. The latter is no longer true (although maybe it is for German?)!

@svlandeg svlandeg added enhancement Feature requests and improvements feat / tagger Feature: Part-of-speech tagger labels Oct 21, 2019
@adrianeboyd adrianeboyd added bug Bugs and behaviour differing from documentation and removed enhancement Feature requests and improvements labels Oct 22, 2019
@adrianeboyd
Contributor

Thanks for bringing this up again! The tag maps should all map to UD v2 at this point, so this is a mistake. I'll verify the conversion tables based on the UD conversion info and update the documentation.

@lock

lock bot commented Nov 23, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Nov 23, 2019