incorrect lemma from lemmatizer #3700

Closed
oltip opened this issue May 7, 2019 · 4 comments
Labels
  • feat / lemmatizer (Feature: Rule-based and lookup lemmatization)
  • more-info-needed (This issue needs more information)

Comments

@oltip

oltip commented May 7, 2019

How to reproduce the behavior

I'm not sure this is a bug, but I wanted to let you know that the following words all get the lemma 'car': 'car', 'carriers', 'scar', and (SIM) 'card'.
Is that correct from a lemmatization point of view?

Your Environment

  • Operating System: Windows
  • Python Version Used: 3.7
  • spaCy Version Used: 2.1.1
  • Environment Information: conda
@bjascob
Contributor

bjascob commented May 8, 2019

spaCy has a number of lemmatization issues. I posted a list of about 400 words that lemmatize incorrectly in issue #3665. The current release also has an issue where the lemma can vary randomly. This is fixed in PR #3646, which has been merged but hasn't made it into a release yet.

@honnibal
Member

honnibal commented May 11, 2019

I agree there are outstanding lemmatization issues, but does scar really lemmatize to car?? I can't see why that would happen, and can't immediately reproduce it:

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> doc = nlp(u'He has a scar.')
>>> doc[-2].lemma_
'scar'

@oltip I'd love to get to the bottom of what you've observed, if you can provide more details. Could you try again with v2.1.3? We did fix a bug in v2.1.1 that might be causing the behaviour you're seeing.
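
For anyone following up on this, here is a minimal sketch of a reproduction script covering the words from the report. The helper name `lemma_report` and the expected lemmas in `EXPECTED` are assumptions made for illustration and are not part of spaCy. The helper takes any callable that maps a word to its lemma, so it can wrap a spaCy pipeline or any other lemmatizer.

```python
def lemma_report(lemmatize, expected):
    """Return {word: (expected_lemma, actual_lemma)} for every mismatch.

    `lemmatize` is any callable mapping a word to its lemma (a hypothetical
    hook, not a spaCy API). `expected` maps each word to the lemma we expect.
    """
    mismatches = {}
    for word, want in expected.items():
        got = lemmatize(word)
        if got != want:
            mismatches[word] = (want, got)
    return mismatches


# Words from the original report. The expected lemmas are an assumption
# about what a correct lemmatizer should return, not recorded spaCy output.
EXPECTED = {"car": "car", "carriers": "carrier", "scar": "scar", "card": "card"}

# With spaCy and the model installed, the hook could be wired up like this:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   print(lemma_report(lambda w: nlp(w)[0].lemma_, EXPECTED))
```

An empty result means every word got the expected lemma. Since spaCy's lemmas depend on the predicted part of speech, passing full sentences (as in the snippet above) is closer to real usage than passing isolated words, and pasting the output together with the spaCy and model versions would make a report easier to act on.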

@honnibal honnibal added feat / lemmatizer Feature: Rule-based and lookup lemmatization more-info-needed This issue needs more information labels May 11, 2019
@no-response

no-response bot commented May 25, 2019

This issue has been automatically closed because there has been no response to a request for more information from the original author. With only the information that is currently in the issue, there's not enough information to take action. If you're the original author, feel free to reopen the issue if you have or find the answers needed to investigate further.

@no-response no-response bot closed this as completed May 25, 2019
@lock

lock bot commented Jun 25, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 25, 2019