Reproducibility for TextCat and Tok2Vec #6218

svlandeg · 2020-10-07T18:15:15Z

Description

Training results could differ between runs, even with a fixed random seed, because the HashEmbed layers could be given a different seed/key. Fixing these seeds resolves the issue.

This is a backport of #5735 and is intended as a quick bugfix for the next 2.x release.
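
To illustrate why the seed matters, here is a minimal toy sketch of a hash-embedding lookup. It is not the actual HashEmbed code from thinc or the change in this PR: `ToyHashEmbed`, `hash_row`, the table size and the use of `hashlib.blake2b` are all assumptions made for illustration. The point is only that the mapping from keys to table rows depends on the layer's seed, so two runs agree only if the seed is pinned.

```python
import hashlib


def hash_row(key: int, seed: int, n_rows: int) -> int:
    """Map an integer key to a table row; deterministic for a given seed."""
    data = f"{seed}:{key}".encode("utf8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little") % n_rows


class ToyHashEmbed:
    """Hypothetical stand-in for a hash-embedding layer (not thinc's HashEmbed)."""

    def __init__(self, n_rows: int, seed: int):
        self.n_rows = n_rows
        self.seed = seed

    def rows(self, keys):
        # Each key is hashed together with the layer's seed to pick a row.
        return [hash_row(k, self.seed, self.n_rows) for k in keys]


keys = [101, 202, 303, 404]

# Same seed in both "runs": identical row assignments.
fixed_a = ToyHashEmbed(1000, seed=7).rows(keys)
fixed_b = ToyHashEmbed(1000, seed=7).rows(keys)
print(fixed_a == fixed_b)

# A different seed will generally send the same keys to different rows,
# so the trained weights end up in different places.
other = ToyHashEmbed(1000, seed=8).rows(keys)
print(fixed_a, other)
```

In this sketch, seeding Python's `random` module has no effect on the row assignment, which mirrors the symptom described above: a global random seed alone does not help if the layer's own seed varies.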

Types of change

bug fix

Checklist

I have submitted the spaCy Contributor Agreement.
I ran the tests, and all new and existing tests passed.
My changes don't require a change to the documentation, or if they do, I've added all required information.

ensure fixed seed in HashEmbed layers

72eb8c4

svlandeg mentioned this pull request Oct 7, 2020

textcat model weights are not deterministic even with random.seed #6177

Closed

svlandeg added the bug, feat / training and training labels and removed the feat / training label Oct 7, 2020

forgot about the joys of python 2

90c06b6

honnibal merged commit 2998131 into explosion:master Oct 7, 2020

svlandeg deleted the bugfix/6177 branch October 8, 2020 06:57

svlandeg mentioned this pull request Nov 19, 2020

Bugfix textcat reproducibility on GPU #6411

Merged

3 tasks
