
Allow English stopwords with any type of apostrophe #3530

Merged: 6 commits merged on Apr 3, 2019


@svlandeg (Member) commented Apr 2, 2019

Fixes issue #3521: contraction n’t not tagged as a stop word.

Description

  • I removed the English stop words that contain an apostrophe from the generic stop list and added a small loop that, for each such stop word, adds a variant for every possible apostrophe character (so that, e.g., both n't and n’t are stop words).
  • I added a regression test for #3521 that passes with the fix above.
  • I added a failing unit test for issue #3449 (Tokenization regression), which I looked into earlier but which can't be fixed generically.
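The variant-generating loop described above can be sketched in plain Python. This is a minimal, self-contained illustration, not the code merged in this PR: the contraction list, the set of apostrophe characters (`'`, `‘`, `’`) and the helper `add_apostrophe_variants` are all hypothetical choices made for the example.

```python
# Hypothetical sketch: spell every contraction with every apostrophe type,
# so a typographic form like "n’t" is a stop word just like "n't".

# Assumed set of apostrophe characters: ASCII ', left ‘ and right ’ quotes.
APOSTROPHES = ["'", "\u2018", "\u2019"]

# Assumed example contractions, written once with a plain ASCII apostrophe.
CONTRACTIONS = ["n't", "'d", "'ll", "'m", "'re", "'s", "'ve"]


def add_apostrophe_variants(base_stop_words, contractions, apostrophes=APOSTROPHES):
    """Return a new set: the base stop words plus every apostrophe variant."""
    stop_words = set(base_stop_words)
    for word in contractions:
        for apostrophe in apostrophes:
            # Swap the ASCII apostrophe for the current variant.
            stop_words.add(word.replace("'", apostrophe))
    return stop_words


STOP_WORDS = add_apostrophe_variants({"a", "the", "not"}, CONTRACTIONS)

print("n\u2019t" in STOP_WORDS)  # True: the curly-apostrophe form from #3521
print("n't" in STOP_WORDS)       # True: the ASCII form
```

Keeping the contractions in their own list and deriving the typographic spellings means each contraction is written only once, so the ASCII and curly forms cannot drift apart.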

Types of change

Small fix

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@ines (Member) left a comment


Looks good! 👍 All the changes I suggested are minor.

Review threads:
  • spacy/lang/en/stop_words.py (outdated, resolved)
  • spacy/lang/en/stop_words.py (outdated, resolved)
  • spacy/tests/regression/test_issue3449.py (resolved)
  • spacy/tests/regression/test_issue3521.py (resolved)
  • spacy/tests/regression/test_issue3521.py (outdated, resolved)
@ines added the labels "enhancement" (Feature requests and improvements) and "lang / en" (English language data and models) on Apr 3, 2019
@ines changed the title from "Allow English stopwords with any type of hyphen" to "Allow English stopwords with any type of apostrophe" on Apr 3, 2019
@ines merged commit 4faf62d into explosion:master on Apr 3, 2019
@svlandeg deleted the fix/issue_3521 branch on Apr 3, 2019, 12:41

2 participants