Distinction between outside, missing and blocked NER annotations #4307
Description
This PR attempts to process "empty" NER annotations more consistently.
- `O` (`ent_iob == 2`) annotations
- `doc.ents` with a `Span` of tokens with empty `ent_type`. `ent_iob` is then set to `3`. In the transition system, these are recognized as `U-` actions, i.e. `UNIT` actions without a `label`.
- `doc.ents = list(doc.ents)` now keeps the annotations on the token level consistent, instead of resetting `O` to an empty string. It does this by checking previous annotations for each token: if it was `nered` before, we set it to `O`, otherwise to an empty string. This seems to be the most intuitive behaviour for a user inspecting the token-level data.
Tests
- `test_ner.py` and for Issue 4267.
- `test_doc_add_entities_set_ents_iob` was in the repo twice, so I removed one and changed the other to have `O` annotations.
Open questions
- `nn_parser.move_names` now contains `U-`. For now I removed it explicitly from `move_names`, but we could also adjust the unit tests. Depends on whether or not we want to keep that action internal.
Caveat
For the "blocking" functionality to work with the statistical models, they'll have to be retrained.
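To make the three token-level states from the description easier to follow, here is a rough pure-Python sketch. It is a toy model and not the code in this PR: `describe` and `reset_ents` are hypothetical helpers, the `(start, end, label)` tuples stand in for `Span` objects, and it assumes that "nered before" means `ent_iob != 0` and that every token of an unlabelled span gets `ent_iob == 3`. The integer codes follow spaCy's `Token.ent_iob` convention (0 = missing, 1 = I, 2 = O, 3 = B).

```python
from typing import List, Tuple

# Integer codes used by Token.ent_iob in spaCy.
IOB_MISSING, IOB_I, IOB_O, IOB_B = 0, 1, 2, 3


def describe(ent_iob: int, ent_type: str) -> str:
    """Name the annotation state of one token (hypothetical helper)."""
    if ent_iob == IOB_MISSING:
        return "missing"
    if ent_iob == IOB_O:
        return "outside"
    if ent_iob == IOB_B and ent_type == "":
        # B tag without a label: the "blocked" case described above.
        return "blocked"
    return "entity"


def reset_ents(
    prev_iob: List[int], ents: List[Tuple[int, int, str]]
) -> List[Tuple[int, str]]:
    """Toy model of assigning new entities; `end` is exclusive.

    Tokens outside the new spans become O if they carried any annotation
    before (assumption: ent_iob != 0), and stay missing otherwise.
    """
    new = [
        (IOB_O if iob != IOB_MISSING else IOB_MISSING, "") for iob in prev_iob
    ]
    for start, end, label in ents:
        for i in range(start, end):
            if label == "":
                # Assumption: each token of an unlabelled span is set to 3.
                new[i] = (IOB_B, "")
            else:
                new[i] = (IOB_B if i == start else IOB_I, label)
    return new


# Five tokens: missing, O, B, B, missing. Block only the third token.
result = reset_ents([0, 2, 3, 3, 0], [(2, 3, "")])
print([describe(iob, ent_type) for iob, ent_type in result])
# ['missing', 'outside', 'blocked', 'outside', 'missing']
```

In this model the fourth token, which was an entity before, falls back to `O`, while the two tokens that were never annotated stay missing.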
Types of change
Enhancement
Checklist