
Correction of default lemmatizer lookup in English (Issue #4104) #4110

Merged (6 commits, Aug 15, 2019)

Conversation

ajrader
Contributor

@ajrader ajrader commented Aug 12, 2019

Resolves issue 4102.

Description

  • made the following changes to lookup.py:
    • 'dry' : 'dry'
    • 'spun': 'spin'
    • 'spun-dry': 'spin-dry'
  • created new test (test_issue4104.py) and verified it passed after the above changes.
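The fix above amounts to correcting entries in a word-to-lemma table. As a minimal illustrative sketch (a plain dict named `LOOKUP` standing in for spaCy's English lookup table; `lookup_lemma` is a hypothetical helper, not spaCy's API):

```python
# Illustrative sketch of dict-based lookup lemmatization.
# The corrected entries from this PR:
LOOKUP = {
    "dry": "dry",          # previously mapped to a wrong lemma
    "spun": "spin",
    "spun-dry": "spin-dry",
}

def lookup_lemma(word):
    # Fall back to the surface form when the word is not in the table.
    return LOOKUP.get(word, word)
```

With these entries, `lookup_lemma("spun")` yields `"spin"`, and unknown words pass through unchanged.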

Types of change

Mainly a feat / lemmatizer bug fix.

Note that this change will become obsolete once using a default lookup for a language is implemented.

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@ines ines added feat / lemmatizer Feature: Rule-based and lookup lemmatization lang / en English language data and models perf / accuracy Performance: accuracy labels Aug 12, 2019
@explosion-bot
Collaborator

Hi @ajrader, thanks for your pull request! 👍 It looks like you haven't filled in the spaCy Contributor Agreement (SCA) yet. The agreement ensures that we can use your contribution across the project. Once you've filled in the template, put it in the .github/contributors directory and add it to this pull request. If your pull request targets a branch that's not master, for example develop, make sure to submit the Contributor Agreement to the master branch. Thanks a lot!

If you've already included the Contributor Agreement in your pull request above, you can ignore this message.

"""Test that English lookup lemmatization of spun & dry are correct"""
doc = get_doc(en_vocab, [t for t in text.split(" ")])
expected = {'dry': 'dry', 'spun': 'spin', 'spun-dry': 'spin-dry'}
assert [token.lemma_ for token in doc] == list(expected.values())
Member

Thanks for adding the test! Looks like this one failed on Python 3.5 because dicts aren't ordered there, so values() returns the values in a different order. (Totally not your fault, btw; it's not exactly intuitive.) Calling sorted on both lists should resolve this and make sure the order is always the same.

Contributor Author

Thanks. I went ahead and streamlined the test to not parametrize the string and compare it directly to a list of expected results.

from ..util import get_doc

@pytest.mark.parametrize('text', ['dry spun spun-dry'])

Member

Suggested change

import pytest
from ..util import get_doc

@pytest.mark.parametrize('text', ['dry spun spun-dry'])
Member

Not sure we need to parametrize here, because the expected values are hard-coded into the test anyway, so there's no real motivation to try out different words. Feel free to move the string 'dry spun spun-dry' into the function.
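The simplified, de-parametrized test might look like the sketch below. To keep it self-contained here, a stub `lookup_lemma` over the corrected table stands in for spaCy's real pipeline; in the actual test, the lemmas would come from `token.lemma_` on a `Doc` built with the `get_doc` helper and the `en_vocab` fixture:

```python
# Stand-in for spaCy's English lookup lemmatizer (not the real API).
LOOKUP = {"dry": "dry", "spun": "spin", "spun-dry": "spin-dry"}

def lookup_lemma(word):
    return LOOKUP.get(word, word)

def test_issue4104():
    """Test that English lookup lemmatization of spun & dry is correct."""
    # The input string lives inside the test, as suggested; no parametrization.
    text = "dry spun spun-dry"
    lemmas = [lookup_lemma(t) for t in text.split(" ")]
    # Compare directly against a hard-coded list, avoiding dict ordering issues.
    assert lemmas == ["dry", "spin", "spin-dry"]

test_issue4104()
```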

Member

@svlandeg svlandeg left a comment

Awesome, thanks for creating the PR!

@ines ines merged commit 2f36487 into explosion:master Aug 15, 2019
polm pushed a commit to polm/spaCy that referenced this pull request Aug 18, 2019
…plosion#4110)

* pytest file for issue4104 established

* edited default lookup english lemmatizer for spun; fixes issue 4102

* eliminated parametrization and sorted-dictionary dependency in issue 4104 test

* added contributor agreement
Development

Successfully merging this pull request may close these issues.

Wrong lemmatization using only tokenization (en language)
4 participants