Should "/" not be a separate token and cause a split? #891

Closed
kootenpv opened this issue Mar 18, 2017 · 3 comments

Comments

@kootenpv
Contributor

kootenpv commented Mar 18, 2017

In [32]: [x for x in nlp("I want/need something")]
Out[32]: [I, want/need, something]

Perhaps you're thinking of more common cases where it shouldn't be split? I'd almost think "/" should be treated as a conj.

ines added a commit that referenced this issue Mar 19, 2017
@ines
Member

ines commented Mar 19, 2017

Thanks, and that's weird: I totally would have thought that this was already split correctly. I just added a regression test and played around with this quickly. It might be as simple as adding the / character here.

But we need to test this properly and make sure it doesn't break any of the tokenizers that rely on the default punctuation.
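
To illustrate the idea of treating / as an infix, here is a minimal, standalone sketch in plain Python. It is not spaCy's tokenizer: the infix lists, the `split_infixes` helper, and the letter-only lookarounds are all hypothetical simplifications, and spaCy's real default punctuation is much larger.

```python
import re

# Hypothetical, simplified infix sets (assumption: not spaCy's real defaults).
INFIXES_BEFORE = [r"\.\.\.", r"(?<=[a-zA-Z])-(?=[a-zA-Z])"]
# Same set plus "/" when it sits between two letters.
INFIXES_AFTER = INFIXES_BEFORE + [r"(?<=[a-zA-Z])/(?=[a-zA-Z])"]


def split_infixes(text, infixes):
    """Split on whitespace, then split each chunk around infix matches."""
    infix_re = re.compile("|".join(infixes))
    tokens = []
    for chunk in text.split():
        start = 0
        for match in infix_re.finditer(chunk):
            # Text before the infix becomes its own token.
            if match.start() > start:
                tokens.append(chunk[start:match.start()])
            # The infix itself becomes a separate token.
            tokens.append(match.group())
            start = match.end()
        # Whatever remains after the last infix.
        if start < len(chunk):
            tokens.append(chunk[start:])
    return tokens


print(split_infixes("I want/need something", INFIXES_BEFORE))
print(split_infixes("I want/need something", INFIXES_AFTER))
```

A regression check in this spirit would assert both the new split and that text which was already handled (here, a hyphenated word) tokenizes the same way under both infix sets.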

@ines ines closed this as completed in bf0f15e Apr 7, 2017
@ines
Member

ines commented Apr 7, 2017

All worked as expected, and the fix will be included in the next point release. The likely reason / was excluded in the past is that we didn't have a good way of protecting URLs. Since we introduced token_match and regex patterns for URLs, this is no longer a problem and / can be included in the infixes.
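
The interaction between a token_match check and a / infix can be sketched in plain Python as follows. This is only an illustration of the idea, not spaCy's code: `URL_RE` is a deliberately loose, hypothetical pattern, and `tokenize` with its `token_match` parameter is a made-up helper that mirrors the concept in name only.

```python
import re

# Hypothetical, loose URL pattern (assumption: not the regex spaCy ships).
URL_RE = re.compile(r"^(?:https?://|www\.)\S+$")
# "/" counts as an infix only between two letters.
SLASH_INFIX_RE = re.compile(r"(?<=[a-zA-Z])/(?=[a-zA-Z])")


def tokenize(text, token_match=URL_RE.match):
    """Whitespace-split, keep protected chunks whole, split the rest on "/"."""
    tokens = []
    for chunk in text.split():
        # A chunk accepted by token_match is never split on infixes.
        if token_match is not None and token_match(chunk):
            tokens.append(chunk)
            continue
        start = 0
        for match in SLASH_INFIX_RE.finditer(chunk):
            if match.start() > start:
                tokens.append(chunk[start:match.start()])
            tokens.append(match.group())
            start = match.end()
        if start < len(chunk):
            tokens.append(chunk[start:])
    return tokens


print(tokenize("see http://example.com/docs/api or want/need"))
# Without the guard, the same infix rule breaks the URL apart.
print(tokenize("http://example.com/docs/api", token_match=None))
```

With the guard in place the URL survives as one token while want/need is still split; passing `token_match=None` shows the kind of breakage that would make excluding / from the infixes look safer.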

@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018