Fix ValueError exception on empty Korean text. #4245

Merged: 1 commit, Sep 6, 2019

Conversation

b1uec0in (Contributor) commented Sep 6, 2019

Description

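Fixed check_spaces so that its generator yields no items, instead of one, when the input is an empty string. The words and spaces sequences passed to Doc then have the same length (zero) for empty text.

This fixes the following error: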

  File ".../spacy/lang/ko/__init__.py", line 72, in __call__
    doc = Doc(self.vocab, words=surfaces, spaces=list(check_spaces(text, surfaces)))
  File "doc.pyx", line 209, in spacy.tokens.doc.Doc.__init__
ValueError: [E027] Arguments 'words' and 'spaces' should be sequences of the same length, or 'spaces' should be left default at None. spaces should be a sequence of booleans, with True meaning that the word owns a ' ' character following it.
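
To illustrate the invariant behind this fix, here is a minimal sketch of a generator that yields exactly one boolean per token, and therefore nothing at all for empty input. It is not the code from this pull request: the function body below is an assumption written for illustration, and it only borrows the name check_spaces and its (text, tokens) arguments from the traceback above.

```python
def check_spaces(text, tokens):
    """Yield one boolean per token: True if a ' ' directly follows it in text.

    Illustrative sketch only, not spaCy's actual implementation.
    """
    pos = 0
    for token in tokens:
        start = text.find(token, pos)
        if start < 0:
            raise ValueError("token %r not found in text" % token)
        pos = start + len(token)
        # True when the character right after the token is a single space.
        yield pos < len(text) and text[pos] == " "


# Empty text with no tokens produces an empty list, so the lengths of
# words and spaces agree (both 0).
empty_spaces = list(check_spaces("", []))

# Two tokens produce two booleans.
korean_spaces = list(check_spaces("안녕 하세요", ["안녕", "하세요"]))
```

Because the generator only yields inside the loop over tokens, the length of its output always matches the number of tokens, which is the condition that error E027 checks.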

Types of change

Bug fix

Checklist

  • I have submitted the spaCy Contributor Agreement.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

svlandeg added the labels lang / ko (Korean language data and models), bug (Bugs and behaviour differing from documentation), and feat / tokenizer (Feature: Tokenizer) on Sep 6, 2019
ines (Member) commented Sep 6, 2019

Thanks a lot! 👍

ines merged commit a55f5a7 into explosion:master on Sep 6, 2019
Labels
bug (Bugs and behaviour differing from documentation), feat / tokenizer (Feature: Tokenizer), lang / ko (Korean language data and models)