Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search.Beam: Fix invalid indexing #633

Merged
merged 1 commit into from
Apr 26, 2022

Conversation

danieldk
Copy link
Contributor

@danieldk danieldk commented Apr 12, 2022

If none of the states of the beam have valid moves, it was attempted to index into the vector of scored valid moves anyway, which results in invalid memory access.

Fix this by first checking that there valid moves before attempting to index into them.

@honnibal: I am not sure if this is the valid way to address this issue, but this is the source of access violations on Windows in explosion/spaCy#10633.

If none of the states of the beam have valid moves, it was attempted to
index into the vector of scored valid moves anyway, which results in
invalid memory access.

Fix this by first checking that there valid moves before attempting to
index into them.
@adrianeboyd
Copy link
Contributor

It is weird that the CI didn't run, I'll close and reopen to see what happens.

@adrianeboyd adrianeboyd reopened this Apr 12, 2022
@svlandeg svlandeg added bug Bugs and behaviour differing from documentation feat / beam Beam search labels Apr 15, 2022
Copy link
Member

@svlandeg svlandeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. If Matt agrees, we should probably do a release of Thinc to get the bug fixed in spaCy?

@honnibal
Copy link
Member

This is a fine guard but it should be impossible to have no valid continuations to a state. So is there a deeper problem here?

@danieldk
Copy link
Contributor Author

danieldk commented Apr 21, 2022

This is a fine guard but it should be impossible to have no valid continuations to a state. So is there a deeper problem here?

It is caused by this test:

https:/explosion/spaCy/blob/29afbdb91e5fecf513125a85f1ac1d165f40bc93/spacy/tests/parser/test_ner.py#L183

I guess there are no continuations because the pipe is not trained, so all labels that were added are in unseen_classes and can not be predicted?

@svlandeg
Copy link
Member

So should we somehow try to catch this earlier and provide a more meaningful user warning/error?

@danieldk
Copy link
Contributor Author

danieldk commented Apr 22, 2022

So should we somehow try to catch this earlier and provide a more meaningful user warning/error?

Let me try and see if I can add a warning to spaCy. Edit: I guess spaCy is the best place, because in Thinc this is a utility class and there may be applications where this is an acceptable outcome?

There still seems to be a deeper issue, since this doesn't happen in the parser in spaCy master.

@danieldk
Copy link
Contributor Author

Found the issue: the Parser.beam_parse/greedy_parse methods in my branch didn't ensure that labels were added. I have fixed this in the PR that adds back beam parsing (explosion/spaCy#10633). It should now (hopefully) also pass on Windows without this PR.

@svlandeg
Copy link
Member

Great! We should still merge this as well though :-)

@svlandeg svlandeg merged commit 6c3bd1b into explosion:master Apr 26, 2022
shadeMe pushed a commit to shadeMe/thinc that referenced this pull request May 2, 2022
If none of the states of the beam have valid moves, it was attempted to
index into the vector of scored valid moves anyway, which results in
invalid memory access.

Fix this by first checking that there valid moves before attempting to
index into them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs and behaviour differing from documentation feat / beam Beam search
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants