get_doc not storing POS properly #5048

Closed
svlandeg opened this issue Feb 21, 2020 · 1 comment · Fixed by #5049
Labels
bug Bugs and behaviour differing from documentation feat / doc Feature: Doc, Span and Token objects tests New, missing or incorrect tests

Comments

@svlandeg
Member

How to reproduce the behaviour


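    import numpy
    from spacy.attrs import POS, DEP, TAG
    from spacy.lang.en import English
    from spacy.tokens import Doc
    # get_doc is the helper from spaCy's test suite (import path assumed: spacy/tests/util.py)
    from spacy.tests.util import get_doc

    words = ["This", "is", "a", "sentence"]
    pos_s = ["DET", "VERB", "DET", "NOUN"]
    spaces = [" ", " ", " ", ""]
    deps_s = ["dep", "adj", "nn", "atm"]
    tags_s = ["DT", "VBZ", "DT", "NN"]

    nlp = English()
    strings = nlp.vocab.strings

    # Register every string in the StringStore and keep the returned IDs
    for w in words:
        strings.add(w)
    deps = [strings.add(d) for d in deps_s]
    pos = [strings.add(p) for p in pos_s]
    tags = [strings.add(t) for t in tags_s]

    # One row per token, one column per attribute, in the same order as attrs
    attrs = [POS, DEP, TAG]
    array = numpy.array(list(zip(pos, deps, tags)), dtype="uint64")

    # 1: set the attributes directly with Doc.from_array
    doc = Doc(nlp.vocab, words=words, spaces=spaces)
    doc.from_array(attrs, array)
    print("1", [(token.text, token.pos_, token.tag_) for token in doc])

    # 2: build the same doc with the get_doc test helper
    doc2 = get_doc(nlp.vocab, words=words, pos=pos_s, deps=deps_s, tags=tags_s)
    print("2", [(token.text, token.pos_, token.tag_) for token in doc2])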

Output:

    1 [('This', 'DET', 'DT'), ('is', 'VERB', 'VBZ'), ('a', 'DET', 'DT'), ('sentence', 'NOUN', 'NN')]
    2 [('This', 'DET', 'DT'), ('is', 'AUX', 'VBZ'), ('a', 'DET', 'DT'), ('sentence', 'NOUN', 'NN')]

For the second token ("is"), the doc built with get_doc reports AUX, but the POS should be VERB, as passed in and as Doc.from_array stores it.
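To make the discrepancy easy to spot, here is a minimal sketch in plain Python that compares the two printed outputs token by token. The helper name `pos_mismatches` and the variable names are hypothetical and not part of spaCy; the tuples are copied from the output above.

```python
# Outputs copied from the two print statements in this issue.
out_from_array = [("This", "DET", "DT"), ("is", "VERB", "VBZ"),
                  ("a", "DET", "DT"), ("sentence", "NOUN", "NN")]
out_get_doc = [("This", "DET", "DT"), ("is", "AUX", "VBZ"),
               ("a", "DET", "DT"), ("sentence", "NOUN", "NN")]


def pos_mismatches(expected, actual):
    """Return (index, text, expected_pos, actual_pos) for tokens whose POS differs.

    Hypothetical helper for illustration only; each item is a
    (text, pos, tag) tuple as printed above.
    """
    return [
        (i, e_text, e_pos, a_pos)
        for i, ((e_text, e_pos, _), (_, a_pos, _)) in enumerate(zip(expected, actual))
        if e_pos != a_pos
    ]


print(pos_mismatches(out_from_array, out_get_doc))
# -> [(1, 'is', 'VERB', 'AUX')]
```

Only the token at index 1 differs, and only in its POS; the tags agree in both docs. A regression test for the fix could assert that this list is empty when the same comparison is run on live `Doc` objects.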

Will be submitting a PR soon to fix this!

@lock

lock bot commented Apr 2, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Apr 2, 2020