
Pretrain T2V - Width of CNN layers. #3979

Closed
agombert opened this issue Jul 17, 2019 · 8 comments · Fixed by #5021
Labels: feat / tok2vec (Feature: Token-to-vector layer and pretraining), usage (General spaCy usage)

Comments


agombert commented Jul 17, 2019

Hello,

I tried to pretrain a model with the CNN architecture, but I would like to change the width of the CNN layers to get bigger vectors at the end (128 instead of 96).

I then get a broadcast error, ValueError: could not broadcast input array from shape (128) into shape (96), which seems to come from changing the CNN parameters during pretraining.

How to reproduce the behaviour

I followed these steps:

1st step - W2V init

I trained a W2V model on the same text corpus and wanted to use it as input to learn from. The file w2v_vectors.txt.gz came from gensim modeling.

python -m spacy init-model es /path/to/my/W2V/ --vectors-loc /path/to/my/w2v_vectors.txt.gz
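
For reference, the vectors file was exported from gensim roughly like this (a sketch; the model path and variable names are placeholders, not my exact code):

import gensim

# Sketch: export a trained gensim word2vec model in the text format that
# `spacy init-model --vectors-loc` accepts (gensim compresses to .gz
# automatically based on the file extension).
model = gensim.models.Word2Vec.load("/path/to/my/w2v_model")
model.wv.save_word2vec_format("/path/to/my/w2v_vectors.txt.gz", binary=False)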

2nd step - model from W2V

I followed the training docs to train my new model without any problem:

python -m spacy train es /path/to/my/model_with_w2v/  es_ancora-ud-train.json es_ancora-ud-dev.json --vectors /path/to/my/W2V/

3rd step - pretrain

I pretrained the model, as explained in the docs, with the following command:

python -m spacy pretrain /path/to/my/texts.jsonl /path/to/W2V/model /path/to/my/t2v/ -i 50 -cw 128

4th step - train

After the pretraining finished, I tried to train from the new tok2vec weights:

python -m spacy train es /path/to/my/model_with_t2v/  es_ancora-ud-train.json es_ancora-ud-dev.json -t2v /path/to/my/t2v/model49.bin

And then I get this error:

Traceback (most recent call last):
   File "/home/jovyan/environments/word_emb/lib/python3.6/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec)
   File "/home/jovyan/environments/word_emb/lib/python3.6/runpy.py", line 85, in _run_code exec(code, run_globals)
   File "/home/jovyan/environments/word_emb/lib/python3.6/site-packages/spacy/__main__.py", line 35, in <module> plac.call(commands[command], sys.argv[1:])
   File "/home/jovyan/environments/word_emb/lib/python3.6/site-packages/plac_core.py", line 328, in call cmd, result = parser.consume(arglist)
   File "/home/jovyan/environments/word_emb/lib/python3.6/site-packages/plac_core.py", line 207, in consume return cmd, self.func(*(args + varargs + extraopts), **kwargs)
   File "/home/jovyan/environments/word_emb/lib/python3.6/spacy/cli/train.py", line 219, in train components = _load_pretrained_tok2vec(nlp, init_tok2vec)
   File "/home/jovyan/environments/word_emb/lib/python3.6/spacy/cli/train.py", line 417, in _load_pretrained_tok2vec component.tok2vec.from_bytes(weights_data)
   File "/home/jovyan/environments/word_emb/lib/python3.6/thinc/neural/_classes/model.py", line 372, in from_bytes copy_array(dest, param[b"value"])
   File "/home/jovyan/environments/word_emb/lib/python3.6/thinc/neural/util.py", line 124, in copy_array dst[:] = src
ValueError: could not broadcast input array from shape (128) into shape (96)

Other information about the bug:

When I run it without -cw 128, everything works fine.

Moreover, I can run the training if I set the token_vector_width=128 alias during training. When I do so, training appears to run fine, but I get this error when trying to load the new t2v model:

ValueError                                Traceback (most recent call last)
<ipython-input-23-b9d787f3e370> in <module>
----> 1 nlp = spacy.load('/home/jovyan/words-representation/data/external/20190717_es_iomed_128_alias/model0')
      2 nlp1 = spacy.load('es_core_news_md')

~/environments/word_emb/lib/python3.6/site-packages/spacy/__init__.py in load(name, **overrides)
     25     if depr_path not in (True, False, None):
     26         deprecation_warning(Warnings.W001.format(path=depr_path))
---> 27     return util.load_model(name, **overrides)
     28 
     29 

~/environments/word_emb/lib/python3.6/site-packages/spacy/util.py in load_model(name, **overrides)
    131             return load_model_from_package(name, **overrides)
    132         if Path(name).exists():  # path to model data directory
--> 133             return load_model_from_path(Path(name), **overrides)
    134     elif hasattr(name, "exists"):  # Path or Path-like to model data
    135         return load_model_from_path(name, **overrides)

~/environments/word_emb/lib/python3.6/site-packages/spacy/util.py in load_model_from_path(model_path, meta, **overrides)
    171             component = nlp.create_pipe(name, config=config)
    172             nlp.add_pipe(component, name=name)
--> 173     return nlp.from_disk(model_path)
    174 
    175 

~/environments/word_emb/lib/python3.6/site-packages/spacy/language.py in from_disk(self, path, exclude, disable)
    789             # Convert to list here in case exclude is (default) tuple
    790             exclude = list(exclude) + ["vocab"]
--> 791         util.from_disk(path, deserializers, exclude)
    792         self._path = path
    793         return self

~/environments/word_emb/lib/python3.6/site-packages/spacy/util.py in from_disk(path, readers, exclude)
    628         # Split to support file names like meta.json
    629         if key.split(".")[0] not in exclude:
--> 630             reader(path / key)
    631     return path
    632 

~/environments/word_emb/lib/python3.6/site-packages/spacy/language.py in <lambda>(p, proc)
    785             if not hasattr(proc, "from_disk"):
    786                 continue
--> 787             deserializers[name] = lambda p, proc=proc: proc.from_disk(p, exclude=["vocab"])
    788         if not (path / "vocab").exists() and "vocab" not in exclude:
    789             # Convert to list here in case exclude is (default) tuple

pipes.pyx in spacy.pipeline.pipes.Tagger.from_disk()

~/environments/word_emb/lib/python3.6/site-packages/spacy/util.py in from_disk(path, readers, exclude)
    628         # Split to support file names like meta.json
    629         if key.split(".")[0] not in exclude:
--> 630             reader(path / key)
    631     return path
    632 

pipes.pyx in spacy.pipeline.pipes.Tagger.from_disk.load_model()

pipes.pyx in spacy.pipeline.pipes.Tagger.from_disk.load_model()

~/environments/word_emb/lib/python3.6/site-packages/thinc/neural/_classes/model.py in from_bytes(self, bytes_data)
    370                         name = name.decode("utf8")
    371                     dest = getattr(layer, name)
--> 372                     copy_array(dest, param[b"value"])
    373                 i += 1
    374             if hasattr(layer, "_layers"):

~/environments/word_emb/lib/python3.6/site-packages/thinc/neural/util.py in copy_array(dst, src, casting, where)
    122 def copy_array(dst, src, casting="same_kind", where=None):
    123     if isinstance(dst, numpy.ndarray) and isinstance(src, numpy.ndarray):
--> 124         dst[:] = src
    125     elif is_cupy_array(dst):
    126         src = cupy.array(src, copy=False)

ValueError: could not broadcast input array from shape (128) into shape (96)

Besides, when I disable the tagger while loading the t2v model, the model returns vectors of size 0.
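
For illustration, this is roughly how I check the vector size (a sketch; the model path and example text are placeholders):

import spacy

# Sketch: load the trained model without the tagger and inspect the
# tok2vec output stored on the doc.
nlp = spacy.load("/path/to/my/model_with_t2v/model0", disable=["tagger"])
doc = nlp("Hola mundo")
print(doc.tensor.shape)  # comes back with size 0 instead of (n_tokens, 128)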

Your Environment

Linux-4.9.0-7-amd64-x86_64-with-debian-buster-sid
Python 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38)
[GCC 7.3.0]
spacy 2.1.4
gensim 3.7.2
thinc 7.0.8

@honnibal (Member)

@agombert This isn't very user-friendly currently, sorry. We should be inferring these settings from the pretrained file, or at least exposing a better error message.

What you need to do at the moment is set the environment variable token_vector_width=128 before you run spacy train. This will change the setting to match your pretrained weights.
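
Something along these lines should work (a sketch; the paths are placeholders, the rest mirrors your train command above):

token_vector_width=128 python -m spacy train es /path/to/my/model_with_t2v/ es_ancora-ud-train.json es_ancora-ud-dev.json -t2v /path/to/my/t2v/model49.bin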

@honnibal added the usage (General spaCy usage) label Jul 17, 2019
@agombert (Author)

Hey @honnibal,

Thank you for the quick answer. Actually, that's what I did with the alias: I set token_vector_width=128, but when I load the model I get the second error message I posted above.

@honnibal (Member)

Hmm. As a work-around, does it work if you also set the environment variable when you load? It should work without it, but it seems a setting might be missing from the config files that get written out.
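
For example, something like this (a sketch; the path is a placeholder):

import os
import spacy

# Sketch: set the variable before loading, so spaCy picks it up when it
# builds the model.
os.environ["token_vector_width"] = "128"
nlp = spacy.load("/path/to/my/t2v/model0")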


agombert commented Jul 17, 2019

I loaded it as:

nlp = spacy.load('/path/to/my/t2v/model0', meta={"lang":"es", "token_vector_width":128})

It loads, but I get vectors of size 0.

EDIT:

When I use the same line after the default pretrain/train with "token_vector_width": 96, I also get vectors of size 0.
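
For reference, this is the check I run after loading (a sketch; the example text is a placeholder):

import spacy

# Sketch: load with the meta override and inspect the tensor width.
nlp = spacy.load("/path/to/my/t2v/model0", meta={"lang": "es", "token_vector_width": 128})
doc = nlp("Hola mundo")
print(doc.tensor.shape)  # I would expect (n_tokens, 128), but the vectors have size 0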

@ines added the feat / tok2vec (Feature: Token-to-vector layer and pretraining) label Jul 17, 2019
@honnibal (Member)

Did you try setting it as an environment variable, instead of passing it in the meta like that?

@agombert (Author)

I have just tried that, but I get the same error at each step.

@agombert (Author)

Hi,

@honnibal, I would like to know if you have found anything about this error.

Moreover, I trained a normal BERT-like model as presented in the documentation. When I load it without any pipeline components (tagger, parser and ner), the vectors are lost and I get vectors of size 0. In fact, I have to load the tagger each time for the model to provide vectors of length 96. Is that normal?

lock bot commented Mar 17, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Mar 17, 2020