get_doc not storing POS properly #5048

svlandeg · 2020-02-21T20:53:25Z

How to reproduce the behaviour


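    import numpy
    from spacy.attrs import DEP, POS, TAG
    from spacy.lang.en import English
    from spacy.tests.util import get_doc  # test helper in spaCy v2.x
    from spacy.tokens import Doc

    words = ["This", "is", "a", "sentence"]
    pos_s = ["DET", "VERB", "DET", "NOUN"]
    spaces = [True, True, True, False]  # Doc expects booleans, not strings
    deps_s = ["dep", "adj", "nn", "atm"]
    tags_s = ["DT", "VBZ", "DT", "NN"]

    nlp = English()
    strings = nlp.vocab.strings

    # Add the strings to the StringStore; add() returns each string's integer ID
    for w in words:
        strings.add(w)
    deps = [strings.add(d) for d in deps_s]
    pos = [strings.add(p) for p in pos_s]
    tags = [strings.add(t) for t in tags_s]

    # Route 1: set POS, DEP and TAG in one go via Doc.from_array
    attrs = [POS, DEP, TAG]
    array = numpy.array(list(zip(pos, deps, tags)), dtype="uint64")

    doc = Doc(nlp.vocab, words=words, spaces=spaces)
    doc.from_array(attrs, array)
    print("1", [(token.text, token.pos_, token.tag_) for token in doc])

    # Route 2: build the same Doc with the get_doc test helper
    doc2 = get_doc(nlp.vocab, words=words, pos=pos_s, deps=deps_s, tags=tags_s)
    print("2", [(token.text, token.pos_, token.tag_) for token in doc2])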

Output:

    1 [('This', 'DET', 'DT'), ('is', 'VERB', 'VBZ'), ('a', 'DET', 'DT'), ('sentence', 'NOUN', 'NN')]
    2 [('This', 'DET', 'DT'), ('is', 'AUX', 'VBZ'), ('a', 'DET', 'DT'), ('sentence', 'NOUN', 'NN')]

In the second output, the POS of the second token ("is") should be VERB, the value passed in pos_s and the one the first route produces, not AUX.
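A possible explanation, offered as an assumption rather than a confirmed diagnosis: if assigning a fine-grained tag also re-derives the coarse POS (from a tag map, with a word-specific rule such as "is" + VBZ -> AUX taking precedence), then writing POS before TAG lets the tag assignment silently overwrite the POS that was passed in. The plain-Python toy model below does not use spaCy; `ToyToken`, `TOY_TAG_MAP`, `TOY_MORPH_RULES` and `build` are hypothetical names, and the rule table is invented to mirror the output above.

```python
# Hypothetical toy model of the ordering problem; NOT spaCy's implementation.
# Assumption: setting a tag also re-derives the POS from a tag map,
# with per-word rules taking precedence over the tag map.

TOY_TAG_MAP = {"DT": "DET", "VBZ": "VERB", "NN": "NOUN"}
TOY_MORPH_RULES = {("VBZ", "is"): "AUX"}  # hypothetical word-specific override


class ToyToken:
    def __init__(self, text):
        self.text = text
        self.pos = None
        self.tag = None

    def set_tag(self, tag):
        self.tag = tag
        # Side effect: assigning the tag also overwrites the POS
        self.pos = TOY_MORPH_RULES.get((tag, self.text), TOY_TAG_MAP.get(tag))

    def set_pos(self, pos):
        self.pos = pos


def build(words, pos, tags, tags_first):
    """Assign POS and tags in the given order; return (text, pos, tag) triples."""
    tokens = [ToyToken(w) for w in words]
    for tok, p, t in zip(tokens, pos, tags):
        if tags_first:
            tok.set_tag(t)
            tok.set_pos(p)  # explicit POS is applied last, so it wins
        else:
            tok.set_pos(p)
            tok.set_tag(t)  # tag side effect clobbers the explicit POS
    return [(tok.text, tok.pos, tok.tag) for tok in tokens]


words = ["This", "is", "a", "sentence"]
pos_s = ["DET", "VERB", "DET", "NOUN"]
tags_s = ["DT", "VBZ", "DT", "NN"]

print("POS then TAG:", build(words, pos_s, tags_s, tags_first=False))
print("TAG then POS:", build(words, pos_s, tags_s, tags_first=True))
```

With tags_first=False the toy reproduces output 2 (AUX for "is"); with tags_first=True it matches output 1. Whether get_doc actually orders its assignments this way is not confirmed here.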

Will be submitting a PR soon to fix this!


lock · 2020-04-02T14:49:11Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

svlandeg added the bug and feat / doc labels Feb 21, 2020

svlandeg mentioned this issue in Bugfix/get doc #5049 (merged) Feb 21, 2020

svlandeg added the tests label Feb 21, 2020

honnibal closed this as completed in #5049 Mar 2, 2020

lock bot locked as resolved and limited conversation to collaborators Apr 2, 2020
