Conversion to IOB2 failing #3128

Closed
IzzyHibbert opened this issue Jan 8, 2019 · 4 comments
Labels
feat / cli (Feature: Command-line interface), usage (General spaCy usage)

Comments

@IzzyHibbert

Hi,

Following the closure of a previous request (#2970) about converting the IOB2 format, I attempted a conversion with spaCy 2.0.18 and Python 3.
The error message is ValueError: too many values to unpack (expected 2); the full traceback is below.

The test uses the same IOB2 file content as #2970:
Alex|I-PER
is|O
going|O
to|O
Los|B-LOC
Angeles|I-LOC

Any help is appreciated.

MyMac-iMac:Spacy itsme$ python3 -m spacy convert my_test.iob2 . -c iob
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/spacy/__main__.py", line 31, in <module>
    plac.call(commands[command], sys.argv[1:])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/spacy/cli/convert.py", line 47, in convert
    n_sents=n_sents, use_morphology=morphology)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/spacy/cli/converters/iob2json.py", line 16, in iob2json
    sentences = read_iob(file_)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/spacy/cli/converters/iob2json.py", line 35, in read_iob
    words, iob = zip(*tokens)
ValueError: too many values to unpack (expected 2)

@mauryaland (Contributor)

I ran the same command on spaCy 2.0.18 and had no issue.
The problem probably comes from your .iob2 file, since the error is "too many values to unpack (expected 2)". Given how the function is written, I guess the lines in your file are not split as they should be, so the regex returns more fields than the two or three the function expects.
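
To see why an unexpected number of fields produces this exact message, here is a minimal sketch that reuses the split pattern from `read_iob` on one well-formed line and one line that packs several tokens together. The helper name `split_fields` and the malformed sample line are made up for illustration; they are not part of spaCy, and the real file that triggered the error was never posted.

```python
import re


def split_fields(line):
    """Split one line with the same pattern read_iob uses in this thread."""
    return re.split(r"[^\w\-]", line.strip())


# One token per line: two fields, word and IOB tag.
good = split_fields("Los|B-LOC")

# Hypothetical malformed line: a whole sentence on a single line.
bad = split_fields("Alex|I-PER is|O going|O")

# read_iob wraps the fields in a list, so zip(*tokens) yields one
# 1-tuple per field. Two fields unpack cleanly into (words, iob).
words, iob = zip(*[good])

# Six fields cannot be unpacked into two names.
try:
    words_bad, iob_bad = zip(*[bad])
    error = None
except ValueError as err:
    error = str(err)

print(good, len(bad), error)
```

Any line whose field count is neither two nor three ends up in the two-name unpacking, which is where the traceback in this issue stops.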

The number of tokens could be checked and if the value is not expected, an error could be raised to say that the file is not formatted as expected.

Here is the modified code for the function read_iob, which could be pushed as a PR if needed:

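import re

# spaCy 2.x location; inside the converter module this is a relative import.
from spacy.gold import iob_to_biluo


def read_iob(raw_sents):
    sentences = []
    for line in raw_sents:
        if not line.strip():
            continue  # skip blank lines
        # Split on any character that is not a word character or a hyphen.
        tokens = [re.split(r'[^\w\-]', line.strip())]
        if len(tokens[0]) == 3:
            # word, POS tag, IOB tag
            words, pos, iob = zip(*tokens)
        elif len(tokens[0]) == 2:
            # word, IOB tag; fill in a placeholder POS tag
            words, iob = zip(*tokens)
            pos = ['-'] * len(words)
        else:
            # Any other field count means the file is malformed.
            raise ValueError('The input file is not formatted as expected')
        biluo = iob_to_biluo(iob)
        sentences.append([
            {'orth': w, 'tag': p, 'ner': ent}
            for (w, p, ent) in zip(words, pos, biluo)
        ])
    sentences = [{'tokens': sent} for sent in sentences]
    paragraphs = [{'sentences': [sent]} for sent in sentences]
    docs = [{'id': 0, 'paragraphs': [para]} for para in paragraphs]
    return docs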

@gavrieltal (Contributor)

I ran an identical command on an identical input file and had no issue. If you figure out how your IOB file differs from the one given in #2970, please let us know!
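
One way to track down how a file differs is to list the lines whose field count the converter would not accept. The sketch below is a hypothetical diagnostic, not part of spaCy: `find_malformed_lines` and the sample lines are assumptions made for illustration, and the split pattern is copied from the `read_iob` code earlier in this thread.

```python
import re


def find_malformed_lines(lines):
    """Return (line_number, fields) for lines that do not split into 2 or 3 fields."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are skipped, as in read_iob
        fields = re.split(r"[^\w\-]", stripped)
        if len(fields) not in (2, 3):
            problems.append((number, fields))
    return problems


# Illustrative input; with a real file, pass the open file object instead.
sample = [
    "Alex|I-PER\n",
    "is|O\n",
    "\n",
    "Los|B-LOC Angeles|I-LOC\n",  # two tokens on one line
    "going|VBG|O\n",              # word, POS tag, IOB tag
]

report = find_malformed_lines(sample)
for number, fields in report:
    print("line", number, "->", fields)
```

Because the pattern splits on every character that is neither a word character nor a hyphen, tokens that contain punctuation (for example `U.S.`) are also broken into extra fields and would show up in this report.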

ines added the usage, feat / cli, and more-info-needed labels on Jan 14, 2019
@IzzyHibbert (Author)

Correct.
Unfortunately, I messed up the file extension and content.
Ticket closed.
Thanks guys!

no-response bot removed the more-info-needed label on Jan 15, 2019
ines pushed a commit that referenced this issue Jan 16, 2019
* added contributor agreement

* issue #3128 throw exception on bad IOB/2 formatting

* Update spacy/cli/converters/iob2json.py with ValueError

Co-Authored-By: gavrieltal
@lock
lock bot commented Feb 14, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators on Feb 14, 2019