Commit

Fix inconsistent lemmatizer issue #3484 (#3646)
* Fix inconsistent lemmatizer issue #3484

* Remove test case
bjascob authored and ines committed May 4, 2019
1 parent b4d142e commit 955b95c
Showing 1 changed file with 3 additions and 2 deletions.
spacy/lemmatizer.py (5 changes: 3 additions, 2 deletions)

```diff
@@ -1,5 +1,6 @@
 # coding: utf8
 from __future__ import unicode_literals
+from collections import OrderedDict
 
 from .symbols import POS, NOUN, VERB, ADJ, PUNCT, PROPN
 from .symbols import VerbForm_inf, VerbForm_none, Number_sing, Degree_pos
@@ -118,8 +119,8 @@ def lemmatize(string, index, exceptions, rules):
             forms.append(form)
         else:
             oov_forms.append(form)
-    # Remove duplicates, and sort forms generated by rules alphabetically.
-    forms = list(set(forms))
+    # Remove duplicates but preserve the ordering of applied "rules"
+    forms = list(OrderedDict.fromkeys(forms))
     # Put exceptions at the front of the list, so they get priority.
     # This is a dodgy heuristic -- but it's the best we can do until we get
     # frequencies on this. We can at least prune out problematic exceptions,
```
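A small sketch may help show why this change matters. Despite the removed comment, `list(set(forms))` never sorted anything: the order of a `set` of strings follows their hashes, and in Python 3 `str` hashing is randomized per process by default, so the candidate that lands first can presumably differ from run to run. `OrderedDict.fromkeys` drops duplicates while keeping each form at its first position in rule order. The suffix rules, the `INDEX` set, and the helpers `candidate_forms`, `dedup_unordered`, and `dedup_ordered` are hypothetical and far simpler than spaCy's real `lemmatize` (no OOV fallback, no exceptions handling).

```python
from collections import OrderedDict

# Hypothetical suffix rules (old_suffix, new_suffix), applied in list order.
# Illustrative only; these are not spaCy's actual rule tables.
RULES = [("s", ""), ("ies", "y"), ("es", "e"), ("es", "")]
# Hypothetical index of known lemmas.
INDEX = {"dose", "dos"}


def candidate_forms(string, index, rules):
    """Apply suffix rules in order and keep forms found in the index."""
    forms = []
    for old, new in rules:
        if string.endswith(old):
            form = string[: len(string) - len(old)] + new
            if form and form in index:
                forms.append(form)
    return forms


def dedup_unordered(forms):
    # Old approach: duplicates are removed, but the resulting order depends
    # on string hashes, which can change between interpreter runs.
    return list(set(forms))


def dedup_ordered(forms):
    # New approach: keys keep insertion order, so the first occurrence of
    # each form stays where the rules produced it.
    return list(OrderedDict.fromkeys(forms))


forms = candidate_forms("doses", INDEX, RULES)
print(forms)                 # ['dose', 'dose', 'dos']
print(dedup_ordered(forms))  # ['dose', 'dos'], identical on every run
print(dedup_unordered(forms))  # same two forms, order not guaranteed
```

Using `OrderedDict` rather than a plain `dict` was presumably chosen because spaCy still supported Python 2 at the time (note the `from __future__` import), where plain `dict` ordering is not guaranteed; on Python 3.7+ `list(dict.fromkeys(forms))` behaves the same way.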
