
Count_by IS_ALPHA not giving expected result. #3869

Closed
PhilippeMarcotte opened this issue Jun 20, 2019 · 4 comments
Labels: bug (Bugs and behaviour differing from documentation), enhancement (Feature requests and improvements), feat / doc (Feature: Doc, Span and Token objects)

Comments

@PhilippeMarcotte

How to reproduce the behaviour

import spacy
from spacy.attrs import IS_ALPHA

nlp = spacy.load("en_core_web_sm")
sentence = 'The next month Barry Siddall joined Stoke City on a free transfer, after Chris Pearce had established himself as the Vale\'s #1.'
doc = nlp(sentence)
print(doc.count_by(IS_ALPHA))
# Output: {}

# Manual count for comparison
count = 0
for token in doc:
    count += token.is_alpha  # True counts as 1
print(count)
# Output: 21

The bug only appears for some sentences.
Other sentences that trigger it:
'Indeed, making the one who remains do all the work has installed him into a position of such insolent tyranny, it will take a month at least to reduce him to his proper proportions.'
"It was a missed assignment, but it shouldn't have resulted in a turnover ..."

Your Environment

  • spaCy version: 2.1.4
  • Platform: Darwin-18.5.0-x86_64-i386-64bit
  • Python version: 3.7.3
  • Models: en_core_web_lg, en_core_web_sm
@ines ines added enhancement Feature requests and improvements feat / doc Feature: Doc, Span and Token objects labels Jun 20, 2019
@ines
Member

ines commented Jun 20, 2019

The problem here is that the count_by method currently expects token attributes like POS or ORTH, not boolean flags of the lexeme like IS_ALPHA or LIKE_NUM.

I'm not sure what the analogous expected output should be for boolean flags, but it could probably return a dict keyed by True and False (or 1 and 0).

Edit: Turns out it should have worked but just... didn't.
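
For illustration, the dict shape suggested above (keyed by 1 and 0) can be sketched in plain Python. This is only a minimal sketch, not spaCy's implementation: `count_by_flag` is a hypothetical helper, the token list is a hand-written stand-in for a tokenized Doc, and `str.isalpha` is assumed to behave like the IS_ALPHA flag.

```python
from collections import Counter


def count_by_flag(tokens, flag):
    """Tally tokens by the integer value (1 or 0) of a boolean flag.

    `tokens` is any iterable and `flag` any callable returning a bool.
    Hypothetical helper for illustration, not part of spaCy.
    """
    return dict(Counter(int(flag(tok)) for tok in tokens))


# Hand-written stand-in for a tokenized sentence (an assumption, not spaCy output).
tokens = ["the", "Vale", "'s", "#", "1", "."]

# str.isalpha stands in for the IS_ALPHA lexeme flag here.
result = count_by_flag(tokens, str.isalpha)
print(result)  # {1: 2, 0: 4}
```

With a real Doc, the same helper could presumably be called as `count_by_flag(doc, lambda t: t.is_alpha)`, since it only iterates and applies the callable.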

@PhilippeMarcotte
Author

Thank you for your answer.
This is good to know and makes the fact that it works in some cases even stranger.

Example:

import spacy
from spacy.attrs import IS_ALPHA

nlp = spacy.load("en_core_web_sm")
sentence = 'The story was to the effect that a young American student recently called on Professor Christlieb with a letter of introduction.'
doc = nlp(sentence)
print(doc.count_by(IS_ALPHA))
# Output: {1: 21}

@ines ines added the bug Bugs and behaviour differing from documentation label Jun 20, 2019
@ines
Member

ines commented Jun 20, 2019

Okay, yes, so this is definitely strange and inconsistent either way!
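
One way to find which sentences are affected is to compare the dict returned by count_by against a manual tally of the same flag. The sketch below is library-independent: `flag_counts_agree` is a hypothetical helper, and the flag list is made-up sample data rather than output from spaCy.

```python
def flag_counts_agree(count_by_result, flags):
    """Return True if a count_by-style dict matches a manual tally of `flags`.

    `count_by_result` maps 1/0 to counts; `flags` is an iterable of booleans,
    one per token. Hypothetical helper for illustration, not part of spaCy.
    """
    expected = {}
    for flag in flags:
        key = int(flag)
        expected[key] = expected.get(key, 0) + 1
    return dict(count_by_result) == expected


# Made-up per-token flags: two alphabetic tokens and one non-alphabetic token.
flags = [True, True, False]

complete = flag_counts_agree({1: 2, 0: 1}, flags)
empty = flag_counts_agree({}, flags)
partial = flag_counts_agree({1: 2}, flags)
print(complete, empty, partial)  # True False False
```

Against a real Doc this could be used as `flag_counts_agree(doc.count_by(IS_ALPHA), [t.is_alpha for t in doc])`, assuming count_by returns a plain dict of integer keys as in the outputs shown in this thread.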

@svlandeg svlandeg self-assigned this Jul 10, 2019
svlandeg added a commit to svlandeg/spaCy that referenced this issue Jul 10, 2019
@ines ines closed this as completed Jul 11, 2019
@lock

lock bot commented Aug 10, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 10, 2019