Bugfix/fix entity ruler from disk #4670
Merged
Solve a bug in the EntityRuler from_disk method
Description
Issue #4651
The PhraseMatcher is not initialized correctly in the EntityRuler from_disk method when the phrase_matcher_attr attribute is specified.
This happens because the from_disk function is called before the phrase_matcher_attr attribute has been restored, so the patterns are added to a PhraseMatcher that was created without that attribute.
One way to fix this is to split the two-key deserializer dict (patterns and cfg) into two parts. The cfg part is deserialized first and used to initialize the PhraseMatcher. The patterns part is then passed to the from_disk function, which adds the patterns to the correctly initialized PhraseMatcher.
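The ordering described above can be sketched with a small standard-library example. This is not spaCy's code: ToyPhraseMatcher, ToyRuler, round_trip, and the on-disk file names are hypothetical stand-ins chosen for illustration. They only show why the cfg has to be read before the patterns are added.

```python
import json
import tempfile
from pathlib import Path


class ToyPhraseMatcher:
    """Hypothetical stand-in for a phrase matcher keyed on an attribute."""

    def __init__(self, attr="ORTH"):
        self.attr = attr
        self.phrases = set()

    def _key(self, text):
        # "LOWER" compares case-insensitively; anything else compares verbatim.
        return text.lower() if self.attr == "LOWER" else text

    def add(self, text):
        self.phrases.add(self._key(text))

    def matches(self, text):
        return self._key(text) in self.phrases


class ToyRuler:
    """Hypothetical ruler that owns a matcher and a list of patterns."""

    def __init__(self, phrase_matcher_attr=None):
        self.phrase_matcher_attr = phrase_matcher_attr
        self.matcher = ToyPhraseMatcher(attr=phrase_matcher_attr or "ORTH")
        self.patterns = []

    def add_patterns(self, patterns):
        for pattern in patterns:
            self.patterns.append(pattern)
            self.matcher.add(pattern)

    def to_disk(self, path):
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        cfg = {"phrase_matcher_attr": self.phrase_matcher_attr}
        (path / "cfg").write_text(json.dumps(cfg))
        lines = "".join(json.dumps(p) + "\n" for p in self.patterns)
        (path / "patterns.jsonl").write_text(lines)

    def from_disk(self, path):
        path = Path(path)
        # Step 1: restore the cfg first.
        cfg = json.loads((path / "cfg").read_text())
        self.phrase_matcher_attr = cfg.get("phrase_matcher_attr")
        # Rebuild the matcher with the restored attribute.
        self.matcher = ToyPhraseMatcher(attr=self.phrase_matcher_attr or "ORTH")
        self.patterns = []
        # Step 2: only now add the patterns to the rebuilt matcher.
        lines = (path / "patterns.jsonl").read_text().splitlines()
        self.add_patterns(json.loads(line) for line in lines if line)
        return self


def round_trip():
    """Save a ruler configured with LOWER and load it into a default ruler."""
    with tempfile.TemporaryDirectory() as tmp:
        ruler = ToyRuler(phrase_matcher_attr="LOWER")
        ruler.add_patterns(["Apple"])
        ruler.to_disk(tmp)
        return ToyRuler().from_disk(tmp)
```

If step 2 ran before step 1 in this sketch, the pattern would be added to a matcher still using the default attribute, and a lowercase lookup on the loaded ruler would fail. That is the same symptom as the one described above.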
My environment:
spaCy version: 2.2.2
Platform: Linux-4.9.0-11-amd64-x86_64-with-debian-9.11
Python version: 3.7.3
Tests run
test_issue4651_with_phrase_matcher_attr --> fails without the change to the EntityRuler from_disk method and passes with it.
test_issue4651_without_phrase_matcher_attr --> passes both with and without the change.
Types of change
bug fix
Checklist