Conversion to IOB2 failing #3128
Comments
I ran the same command on spaCy 2.0.18 and had no issue. The number of tokens could be checked, and if it is not the expected value, an error could be raised saying that the file is not formatted as expected. Here is the modified code for the function read_iob, which could be pushed into a PR if needed:

import re

# iob_to_biluo is assumed to be imported already at the top of iob2json.py

def read_iob(raw_sents):
    sentences = []
    for line in raw_sents:
        if not line.strip():
            continue
        # Split on any character that is not a word character or a hyphen
        tokens = [re.split(r'[^\w\-]', line.strip())]
        if len(tokens[0]) == 3:
            # word, POS tag, IOB tag
            words, pos, iob = zip(*tokens)
        elif len(tokens[0]) == 2:
            # word, IOB tag (no POS tag given)
            words, iob = zip(*tokens)
            pos = ['-'] * len(words)
        else:
            raise Exception('The input file is not formatted as expected')
        biluo = iob_to_biluo(iob)
        sentences.append([
            {'orth': w, 'tag': p, 'ner': ent}
            for (w, p, ent) in zip(words, pos, biluo)
        ])
    sentences = [{'tokens': sent} for sent in sentences]
    paragraphs = [{'sentences': [sent]} for sent in sentences]
    docs = [{'id': 0, 'paragraphs': [para]} for para in paragraphs]
    return docs
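To make the field-count check above concrete, here is a small sketch of what the regular expression split returns for a few sample lines. The helper name split_fields and the sample lines other than Alex|I-PER are made up for illustration; only the regular expression comes from the code above.

```python
import re

def split_fields(line):
    """Split one line on any character that is not a word character or a hyphen."""
    return re.split(r'[^\w\-]', line.strip())

# Two fields: word and IOB tag, handled by the second branch.
print(split_fields('Alex|I-PER'))      # ['Alex', 'I-PER']

# Three fields: word, POS tag and IOB tag, handled by the first branch.
print(split_fields('Alex|NNP|I-PER'))  # ['Alex', 'NNP', 'I-PER']

# Punctuation inside the word also acts as a separator, so this line
# yields four fields and would hit the proposed error branch.
print(split_fields('U.S.|B-LOC'))      # ['U', 'S', '', 'B-LOC']
```

Note that the split treats spaces and punctuation the same way as the | delimiter, so a word containing either one changes the field count.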
I ran an identical command on an identical input file and had no issue. If you figure out how your IOB file is different from the one given in #2970, please let us know!
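One way to find out how a file differs might be to scan it for lines that do not split into two or three fields. The sketch below is not part of spaCy; find_bad_lines is a hypothetical helper that reuses the regular expression from read_iob, and the file name in the usage note is the one from the report.

```python
import re

def find_bad_lines(lines, allowed=(2, 3)):
    """Return (line number, field count, text) for each non-blank line
    whose field count is not in `allowed`."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are skipped, as in read_iob
        fields = re.split(r'[^\w\-]', stripped)
        if len(fields) not in allowed:
            bad.append((number, len(fields), stripped))
    return bad

if __name__ == '__main__':
    import sys
    # Usage (assumed): python check_iob.py my_test.iob2
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding='utf8') as f:
            for number, count, text in find_bad_lines(f):
                print('line %d: %d fields: %r' % (number, count, text))
```

Printing the offending line with %r makes stray tabs, trailing delimiters and other invisible characters show up in the output.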
Correct.
* added contributor agreement
* issue #3128 throw exception on bad IOB/2 formatting
* Update spacy/cli/converters/iob2json.py with ValueError
Co-Authored-By: gavrieltal <[email protected]>
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hi,
following the closure of a previous request (#2970) about conversion of the IOB2 format, I attempted a conversion with spaCy version 2.0.18 and Python 3.
The error message is ValueError: too many values to unpack (expected 2); the full traceback is below.
The test uses the same IOB2 file content as #2970:
Alex|I-PER
is|O
going|O
to|O
Los|B-LOC
Angeles|I-LOC
Any help is appreciated.
MyMac-iMac:Spacy itsme$ python3 -m spacy convert my_test.iob2 . -c iob
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/spacy/__main__.py", line 31, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/spacy/cli/converters/iob2json.py", line 16, in iob2json
    sentences = read_iob(file)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/spacy/cli/converters/iob2json.py", line 35, in read_iob
    words, iob = zip(*tokens)
ValueError: too many values to unpack (expected 2)
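The last frame of the traceback can be reproduced without spaCy. The sketch below assumes, as in the read_iob code quoted in this thread, that tokens is a list holding a single list of fields. The function name unpack_two and the sample fields are made up for illustration.

```python
def unpack_two(fields):
    """Mimic the failing statement: tokens wraps one list of fields,
    and zip(*tokens) yields one 1-tuple per field."""
    tokens = [fields]
    words, iob = zip(*tokens)
    return words, iob

# Exactly two fields unpack cleanly.
print(unpack_two(['Alex', 'I-PER']))  # (('Alex',), ('I-PER',))

# Any line that splits into more than two fields raises the reported error.
try:
    unpack_two(['Los', 'Angeles', 'B-LOC'])
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)
```

So the error suggests that at least one line of the input splits into more than two fields on the reporting machine, even though the file content shown has only two per line.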