
vocab["-0.23"].like_num is False #2782

Closed
phdowling opened this issue Sep 20, 2018 · 12 comments
Labels
enhancement Feature requests and improvements lang / all Global language data

Comments

@phdowling

Pretty straightforward: it seems that negative numbers are currently not flagged as numbers on the Lexeme object.
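For reference, a minimal reproduction (a sketch assuming a blank English pipeline; no statistical model is needed, since like_num is a lexical attribute):

import spacy

nlp = spacy.blank("en")
print(nlp.vocab["0.23"].like_num)   # True
print(nlp.vocab["-0.23"].like_num)  # False, although "-0.23" is clearly a number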

@sakatipomu

sakatipomu commented Sep 22, 2018

Yes, I am facing the same issue, but with Token objects: negative numbers are tagged as PUNCT. Did you find any workaround for this?


text = "My readings are -4.72, 4.72, -3.45 and 4.05"
text_nlp = nlp(text)
for token in text_nlp:
    print token.i,token.text,token.pos_,token.tag_

0 My ADJ PRP$
1 readings NOUN NNS
2 are VERB VBP
3 -4.72 PUNCT :
4 , PUNCT ,
5 4.72 NUM CD
6 , PUNCT ,
7 -3.45 PUNCT :
8 and CCONJ CC
9 4.05 NUM CD

@DuyguA
Contributor

DuyguA commented Sep 23, 2018

I checked the like_num feature for you:

https://github.com/explosion/spaCy/blob/master/spacy/lang/en/lex_attrs.py

As far as I can see, a minus sign in front is not parsed. @ines, what do you say?
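For context, the English like_num is roughly of this shape (a simplified sketch, not the verbatim source): it strips commas and periods and then relies on str.isdigit(), which returns False as soon as a leading sign is present.

_num_words = ["zero", "one", "two", "three"]  # abbreviated; the real list is much longer

def like_num(text):
    # Simplified sketch of the check in spacy/lang/en/lex_attrs.py
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    if text.lower() in _num_words:
        return True
    return False

print(like_num("4.72"))   # True
print(like_num("-4.72"))  # False: "-472".isdigit() is False because of the sign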

@phdowling
Author

FYI, a leading "+" is also not parsed. Maybe both should just be included?

@DuyguA
Contributor

DuyguA commented Sep 23, 2018

Sure, I meant that no "sign bit" is included in the parsing 😉 We can skip the initial sign character during the parse.

@sakatipomu

I already tried the like_num attribute on the Token class, but it is not working. Currently I am doing this:

# Fall back to float() to decide whether the token looks numeric.
try:
    float(token.text)
    is_num = True
except ValueError:
    is_num = False
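A slightly more reusable version of that workaround (a sketch assuming spaCy v2's custom extension attributes; the name like_num_signed is invented for this example):

from spacy.tokens import Token

def _like_num_signed(token):
    # Accept the built-in flag, or anything float() can parse (covers a leading +/-).
    if token.like_num:
        return True
    try:
        float(token.text)
        return True
    except ValueError:
        return False

Token.set_extension("like_num_signed", getter=_like_num_signed)

After registering it once, the value is available on every token as token._.like_num_signed.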

@ines ines added enhancement Feature requests and improvements lang / all Global language data labels Sep 24, 2018
@ines
Member

ines commented Sep 27, 2018

This would be good to handle, yes. I'd propose the following:

def like_num(text):
    if text.startswith('+') or text.startswith('-'):
        text = text[1:]
    # rest of the function

@DuyguA
Contributor

DuyguA commented Sep 27, 2018

Btw @ines, startswith can take a tuple as an argument. When I first found out about it, I instantly fell in love ❤️ So this is possible:

def like_num(text):
    if text.startswith(('+', '-')):
        text = text[1:]
    # rest of the func

@ines
Member

ines commented Sep 27, 2018

Ahh, nice! That's even better. I'm just writing some tests for this for all languages that implement like_num, so we can test that it works as expected.
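Such a test could look roughly like this (a sketch using pytest parametrization; the en_tokenizer fixture name follows the conventions of spaCy's test suite and is an assumption here):

import pytest

@pytest.mark.parametrize("text", ["999.0", "+999.0", "-999.0", "one"])
def test_en_lex_attrs_like_number(en_tokenizer, text):
    tokens = en_tokenizer(text)
    assert len(tokens) == 1
    assert tokens[0].like_num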

@phdowling
Author

What about possibly also handling "~5", etc.? Also "±1"? I know we're getting into a grey area here, and this isn't very high priority, but those kinds of cases might also be interesting. Of course, at some point it's probably up to the user to handle this or use a regex for the use case. + and - are definitely a good improvement in the general case, though.

@DuyguA
Contributor

DuyguA commented Sep 27, 2018

From my side, why not... additions can go in a similar fashion. Then most probably we'd like to do:

def like_num(text):
    polarity_signs = ('-', '+', '~', '±')  # more can come here
    if text.startswith(polarity_signs):
        text = text[1:]
    # rest of the func

@ines
Member

ines commented Sep 27, 2018

Sure! Just tested it and it seems to work as expected – it was only a case of adding more characters to the startswith check.

I also noticed that the tokenizer currently always splits off the + prefix. This was a problem because it meant that a token like +123 could never exist. I changed it to only split if the next character is not 0-9. All existing tests pass, so I hope there aren't any unintended side-effects of this change.
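For anyone who wants similar behavior without waiting for the fix, here is a rough approximation from user code (a sketch only; the exact default prefix patterns, and the actual change made inside spaCy, may differ):

import spacy
from spacy.util import compile_prefix_regex

nlp = spacy.blank("en")

# Replace any plain "+" prefix rule with one that only matches when the next
# character is not a digit, so "+123" stays a single token. Illustrative only.
prefixes = [p for p in nlp.Defaults.prefixes if p != r"\+"]
prefixes.append(r"\+(?![0-9])")
nlp.tokenizer.prefix_search = compile_prefix_regex(prefixes).search

print([t.text for t in nlp("It was +123 today.")])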

@lock

lock bot commented Oct 31, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Oct 31, 2018