Displacy no support for utf-8 #4138

Closed
avramandrei opened this issue Aug 17, 2019 · 2 comments
Labels
bug Bugs and behaviour differing from documentation feat / cli Feature: Command-line interface feat / visualizers Feature: Built-in displaCy and other visualizers more-info-needed This issue needs more information

Comments

@avramandrei

I get the following error when I try to use displacy to evaluate a Romanian corpus with entities.

Traceback (most recent call last):
  File "C:\Users\avramus\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\avramus\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\avramus\AppData\Local\Programs\Python\Python36\lib\site-packages\spacy\__main__.py", line 35, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "C:\Users\avramus\AppData\Local\Programs\Python\Python36\lib\site-packages\plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "C:\Users\avramus\AppData\Local\Programs\Python\Python36\lib\site-packages\plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\Users\avramus\AppData\Local\Programs\Python\Python36\lib\site-packages\spacy\cli\evaluate.py", line 77, in evaluate
    ents=render_ents,
  File "C:\Users\avramus\AppData\Local\Programs\Python\Python36\lib\site-packages\spacy\cli\evaluate.py", line 89, in render_parses
    file_.write(html)
  File "C:\Users\avramus\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0103' in position 469: character maps to <undefined>

It looks like you don't open the HTML files for the entities and parses in UTF-8. I modified the code to open them in UTF-8, and it worked.
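The last line of the traceback can be reproduced with the Python standard library alone. This is a minimal sketch that assumes the output file was opened with Windows' default cp1252 encoding, which is what the `cp1252.py` frame in the traceback suggests. The helper `can_encode` and the sample string are illustrative and are not spaCy code:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical snippet of rendered entity HTML containing Romanian "ă" (U+0103).
html = "<mark>Brăila</mark>"

print(can_encode(html, "cp1252"))  # False: cp1252 has no mapping for U+0103
print(can_encode(html, "utf-8"))   # True: UTF-8 can encode any Unicode character

try:
    html.encode("cp1252")
except UnicodeEncodeError as err:
    # Same kind of failure as the last line of the traceback above.
    print(err.start, err.reason)  # 8 character maps to <undefined>
```

Because U+0103 is outside cp1252, any text containing "ă" fails this way no matter which library produced the HTML. The fix has to happen where the file is opened.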

@ines
Member

ines commented Aug 18, 2019

Just to confirm: What code / command did you run here? It looks like the error actually occurred in cli.evaluate, not in displacy directly? And that's also where the file is opened incorrectly?

Edit: Looks like the problem is in cli.evaluate, so I'm just fixing that.
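Fixing this where the file is written comes down to passing an explicit encoding instead of relying on the platform default. The following is a hedged sketch that assumes the HTML is already available as a string. The helper `write_html` and the file name `entities.html` are hypothetical, and this is not the contents of commit 89f2b87:

```python
import tempfile
from pathlib import Path


def write_html(output_dir: Path, filename: str, html: str) -> Path:
    """Write an HTML page to output_dir/filename encoded as UTF-8.

    Passing encoding explicitly makes the result independent of the
    platform's default text encoding (often cp1252 on Windows).
    """
    path = output_dir / filename
    # Path.write_text forwards encoding to open(); without it, Python falls
    # back to the locale's preferred encoding.
    path.write_text(html, encoding="utf8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    page = '<html><head><meta charset="utf-8"></head><body>Brăila</body></html>'
    out = write_html(Path(tmp), "entities.html", page)
    print(out.read_text(encoding="utf8") == page)  # True
```

The `<meta charset="utf-8">` tag in the sample page tells a browser how to decode the bytes. The explicit `encoding="utf8"` argument is what prevents the `UnicodeEncodeError` when the file is written.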

@ines ines added feat / cli Feature: Command-line interface feat / visualizers Feature: Built-in displaCy and other visualizers more-info-needed This issue needs more information bug Bugs and behaviour differing from documentation labels Aug 18, 2019
@ines ines closed this as completed in 89f2b87 Aug 18, 2019
@lock

lock bot commented Sep 17, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Sep 17, 2019