
Variable results for textcat on GPU (nightly) #6416

Closed
svlandeg opened this issue Nov 20, 2020 · 3 comments
Labels: bug (Bugs and behaviour differing from documentation), feat / textcat (Feature: Text Classifier), gpu (Using spaCy on GPU), 🌙 nightly (Discussion and contributions related to nightly builds), reproducibility (Consistency, reproducibility, determinism, and randomness)

Comments

svlandeg (Member) commented Nov 20, 2020

Issue #6373 pointed out a reproducibility issue on v2 with the ensemble textcat architecture. This was due to the ParametricAttention layer of the CNN model.

It looks like v3 has a similar reproducibility issue, but for a different reason: even when I disable the ParametricAttention layer, I still get the reproducibility problem. This happens with the default ensemble and the CNN configs, but not with the BOW config.

Script to reproduce:

from thinc.api import require_gpu, Config
import spacy
from spacy.training import Example
from spacy.lang.en import English
from spacy.pipeline.textcat import default_model_config


if __name__ == "__main__":
    require_gpu()

    for i in range(5):
        spacy.util.fix_random_seed(0)

        # Toy data
        text = "Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g."
        annots = {"cats": {"Labe1": 1.0, "Label2": 0.0, "Label3": 0.0}}

        # Set up component pipe
        nlp = English()
        pipe_cfg = Config().from_str(default_model_config)  # bow_model_config, cnn_model_config
        textcat = nlp.add_pipe("textcat", config=pipe_cfg)
        for label in annots["cats"]:
            textcat.add_label(label)

        # Training
        optimizer = nlp.begin_training()
        doc = nlp.make_doc(text)
        for _ in range(30):
            nlp.update([Example.from_dict(doc, annots)])

        # Run one document through textcat NN for scoring
        print(f"{i} result: {textcat.model.predict([doc])}")

Example output:

0 result: [[9.9999356e-01 4.1301223e-07 3.6208020e-04]]
1 result: [[9.9999356e-01 4.1301223e-07 3.6208055e-04]]
2 result: [[9.9999356e-01 4.1301223e-07 3.6208020e-04]]
3 result: [[9.9999356e-01 4.1301223e-07 3.6208075e-04]]
4 result: [[9.9999356e-01 4.1301223e-07 3.6208055e-04]]
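
As the inline comment on the config line suggests, the same module also exposes bow_model_config and cnn_model_config. A minimal variant to confirm that the BOW architecture behaves deterministically — a sketch, assuming those names are importable from spacy.pipeline.textcat alongside default_model_config, as the inline comment implies:

# Swap in the BOW architecture instead of the default ensemble.
from spacy.pipeline.textcat import bow_model_config

pipe_cfg = Config().from_str(bow_model_config)
# With this config, the printed scores come out identical on every run.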

Your Environment

  • spaCy version: 3.0.0rc2
  • Platform: Windows 10
  • Python version: 3.6.8
@svlandeg svlandeg added bug Bugs and behaviour differing from documentation feat / textcat Feature: Text Classifier gpu Using spaCy on GPU 🌙 nightly Discussion and contributions related to nightly builds labels Nov 20, 2020
@svlandeg svlandeg changed the title Variable results for textcat on GPU Variable results for textcat on GPU (nightly) Nov 20, 2020
svlandeg (Member, Author) commented Dec 28, 2020

Pretty puzzled by this one. I've established that the reproducibility issue doesn't happen when we set exclusive_classes = True, which seems to point to a problem with the Logistic layer. But I can't for the life of me understand how the Logistic layer could be causing this. Perhaps some rounding errors from earlier layers get amplified? But even then, rounding errors should be deterministic, no? Unless there is some parallelization somewhere that is mixing up the arithmetic order?
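
For reference, a minimal sketch of the config tweak used for this test, assuming the spacy.TextCatEnsemble.v1 parameter names from the v3 nightly (exact keys may differ between releases):

# Same training loop as in the reproduction script, but with mutually
# exclusive classes; per the observation above, this makes the output
# deterministic. Parameter names assume spacy.TextCatEnsemble.v1.
exclusive_model_config = """
[model]
@architectures = "spacy.TextCatEnsemble.v1"
exclusive_classes = true
pretrained_vectors = null
width = 64
conv_depth = 2
embed_size = 2000
window_size = 1
ngram_size = 1
dropout = null
nO = null
"""
pipe_cfg = Config().from_str(exclusive_model_config)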

polm (Contributor) commented Nov 17, 2021

This looks like it's the same as #6490, so I'll close it in favor of that issue.

@polm polm closed this as completed Nov 17, 2021
github-actions (bot) commented: This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 18, 2021
@polm polm added the reproducibility Consistency, reproducibility, determinism, and randomness label Nov 22, 2022