Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/fix wikidata train entity linker #4509

Merged

Conversation

ZhuoruLin
Copy link
Contributor

Fix a NoneType iteration error in bin/wikidata_train_entity_linker.py

Description

The None default value for argument 'labels_discard' would cause a NoneType iteration error in the predict method of EntityLinker Pipeline. Passing in an empty list instead solved the problem.

Types of change

Bugfix

Checklist

  • [x ] I have submitted the spaCy Contributor Agreement.
  • [ x] I ran the tests, and all new and existing tests passed.
  • [ x] My changes don't require a change to the documentation, or if they do, I've added all required information.

@explosion-bot
Copy link
Collaborator

Hi @ZhuoruLin, thanks for your pull request! 👍 It looks like you haven't filled in the spaCy Contributor Agreement (SCA) yet. The agrement ensures that we can use your contribution across the project. Once you've filled in the template, put it in the .github/contributors directory and add it to this pull request. If your pull request targets a branch that's not master, for example develop, make sure to submit the Contributor Agreement to the master branch. Thanks a lot!

If you've already included the Contributor Agreement in your pull request above, you can ignore this message.

@svlandeg svlandeg added bug Bugs and behaviour differing from documentation feat / nel Feature: Named Entity linking labels Oct 22, 2019
@@ -77,6 +77,8 @@ def main(
if labels_discard:
labels_discard = [x.strip() for x in labels_discard.split(",")]
logger.info("Discarding {} NER types: {}".format(len(labels_discard), labels_discard))
else:
labels_discard = []

Copy link
Member

@svlandeg svlandeg Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix is definitely good to have here, thanks!

I'm actually surprised this throws an error. I'm assuming this happens because of this line in pipes.pyx ? I was assuming that self.cfg.get("labels_discard", []) would return [] if the key for labels_discard is passed with value None in cfg, but I guess that's not the case then?

Copy link
Contributor Author

@ZhuoruLin ZhuoruLin Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @svlandeg . I think that's due to the fact that at line 107 the key labels_discard got passed into config already. Therefore the self.cfg.get("labels_discard", []) would return the value for existing key rather than the default.

I am a big fan of SpaCy's NEL! Thank you for all the nice works!

Copy link
Member

@svlandeg svlandeg Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right :-) Could you perhaps also fix the code in pipes.pyx itself? In case that gets called from a different script than the wikidata_train_entity_linker.py one?

We could rewrite that line to two lines:

to_discard = self.cfg.get("labels_discard", [])
if to_discard and ent.label_ in to_discard: 

Which would make it work with both None as value as well as []. Or something such :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just made the change :).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome - I think this can be merged :-) Thanks again!

@ZhuoruLin ZhuoruLin force-pushed the bugfix/fix-wikidata-train-entity-linker branch from e757250 to c6305b0 Compare October 23, 2019 14:31
@ines ines merged commit 10d88b0 into explosion:master Oct 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs and behaviour differing from documentation feat / nel Feature: Named Entity linking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants