FastText incremental training fails #2139

xor-xor · 2018-07-24T08:56:28Z

Description

I have successfully trained a model (for 20 epochs), saved it, and loaded it back without any problems. When I try to continue training it for another 10 epochs, on the same data and with the same parameters, it fails with TypeError: 'NoneType' object is not subscriptable (full traceback below).

Steps/Code/Corpus to Reproduce

from gensim.models.fasttext import FastText

# `train_data' is just a list of lists of strings (words), e.g.
# `[['w1', 'w2', 'w3', ...], ['w1', 'w4', 'w5', ...], ...]'.
model = FastText(
    train_data,
    sg=1,
    size=200,
    window=5,
    min_count=1,
    workers=16,
    negative=20,
    iter=20,
    min_n=3,
    max_n=5,
    word_ngrams=1,
    bucket=int(2e6)
)

# `model_file' is a string with the path to the file where model is being saved
model.save(model_file)

model = FastText.load(model_file)

# `train_data' here is exactly the same as before
model.train(train_data, epochs=10, total_examples=model.corpus_count)

Expected Results

Successfully trained model.

Actual Results

[WARNING 2018-07-23 14:42:00,222] Effective 'alpha' higher than previous training cycles
[INFO 2018-07-23 14:42:00,222] training model with 16 workers on 15145 vocabulary and 200 features, using sg=1 hs=0 sample=0.001 negative=20 window=5
Exception in thread Thread-50:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/env/lib/python3.5/site-packages/gensim/models/base_any2vec.py", line 99, in _worker_loop
    tally, raw_tally = self._do_train_job(data_iterable, job_parameters, thread_private_mem)
  File "/home/ubuntu/env/lib/python3.5/site-packages/gensim/models/fasttext.py", line 454, in _do_train_job
    tally += train_batch_sg(self, sentences, alpha, work, neu1)
  File "gensim/models/fasttext_inner.pyx", line 319, in gensim.models.fasttext_inner.train_batch_sg
TypeError: 'NoneType' object is not subscriptable

Versions

Linux-4.15.0-1014-gcp-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609]
NumPy 1.14.5
SciPy 1.1.0
gensim 3.4.0
FAST_VERSION 1


piskvorky · 2018-07-24T09:22:06Z

@manneshiva @gojomo @menshikh-iv any idea? That use-case sounds like something we definitely want to (should) support.

gojomo · 2018-07-24T15:50:24Z

Yes, that seems like something that should work (even if it might be tricky to get working well), and I'd guess the issue is that something which isn't serialized isn't being rebuilt after re-load.
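gojomo's hypothesis describes a common failure pattern: an attribute deliberately left out of the saved state comes back as None after loading, and the first code path that indexes into it raises exactly this kind of TypeError. The plain-`pickle` sketch below illustrates the pattern only; `ToyModel`, `subword_index` and `_build_index` are hypothetical names, not gensim's implementation, and rebuilding in `__setstate__` is just one possible fix, not necessarily the one gensim adopted.

```python
import pickle


class ToyModel:
    """Hypothetical model with a derived lookup table that is not serialized."""

    def __init__(self, vocab, rebuild_on_load=True):
        self.vocab = list(vocab)
        self.rebuild_on_load = rebuild_on_load
        self.subword_index = self._build_index()

    def _build_index(self):
        # Derived data (character trigrams per word): cheap to recompute, so not saved.
        return {word: [word[i:i + 3] for i in range(max(1, len(word) - 2))]
                for word in self.vocab}

    def __getstate__(self):
        state = self.__dict__.copy()
        state["subword_index"] = None  # excluded from the pickled state
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        if self.rebuild_on_load:
            # Possible fix: recompute the excluded attribute as soon as it is loaded.
            self.subword_index = self._build_index()

    def train_step(self, word):
        # Indexes into the derived table; fails if it was never rebuilt.
        return self.subword_index[word]


broken = pickle.loads(pickle.dumps(ToyModel(["hello"], rebuild_on_load=False)))
try:
    broken.train_step("hello")
except TypeError as err:
    print("without rebuild:", err)

fixed = pickle.loads(pickle.dumps(ToyModel(["hello"])))
print("with rebuild:", fixed.train_step("hello"))
```

The design choice here is to treat the lookup table as a cache that the object itself is responsible for restoring, so callers never see a half-initialized model.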

ntonyproduction · 2018-07-24T23:34:39Z

I'm having trouble compiling this code. Does anyone know what the problem might be?

gojomo · 2018-07-25T16:36:54Z

@ntonyproduction Which code, what problem, what have you tried so far, what does it have to do with this issue?

menshikh-iv · 2018-07-31T10:58:19Z

@xor-xor Thanks for the report; the problem reproduces with gensim==3.5.0:

from gensim.models import FastText
from gensim.test.utils import common_texts, get_tmpfile

model = FastText(
    common_texts,
    sg=1,
    size=200,
    window=5,
    min_count=1,
    workers=16,
    negative=20,
    iter=20,
    min_n=3,
    max_n=5,
    word_ngrams=1,
    bucket=int(2e6)
)


path = get_tmpfile("fasttext.model")
model.save(path)

loaded_model = FastText.load(path)
loaded_model.train(common_texts, epochs=10, total_examples=loaded_model.corpus_count)

However, if we call build_vocab(update=True) after loading (before the second training run), everything works correctly:

loaded_model.build_vocab(common_texts, update=True)

Either way, we need to investigate this behavior, because this must work without the extra build_vocab call.
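Until a fixed release is available, the build_vocab(update=True) workaround can be wrapped in a small helper. This is a minimal sketch assuming the gensim 3.4/3.5 calls used in this thread (`build_vocab(..., update=True)` and `train(..., total_examples=..., epochs=...)`); `continue_training` is a hypothetical name, not a gensim function. It takes an already-loaded model, so it works the same whether or not the model went through save/load.

```python
def continue_training(model, sentences, epochs):
    """Hypothetical helper: resume training a gensim 3.4/3.5 FastText model.

    Per the comments in this thread, calling build_vocab(update=True) before
    train() avoids the "'NoneType' object is not subscriptable" error.
    """
    # Workaround from this thread: refresh vocabulary-derived state in place.
    model.build_vocab(sentences, update=True)
    # Same call shape as in the report: total_examples taken from corpus_count.
    model.train(sentences, total_examples=model.corpus_count, epochs=epochs)
    return model
```

Mirroring the reproduction above, usage would be `continue_training(FastText.load(path), common_texts, epochs=10)`. The helper deliberately does no loading itself, so it can be dropped once the fix is available.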

cpflaume · 2018-08-16T06:42:45Z

I can confirm this bug, and it's even worse:

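from gensim.models.fasttext import FastText as FT_gensim
from gensim.models.word2vec import LineSentence
from gensim.test.utils import datapath
# assumption: lee_data is the Lee corpus shipped with gensim's test data
lee_data = LineSentence(datapath('lee_background.cor'))
model_gensim = FT_gensim(size=100)
model_gensim.build_vocab(lee_data)
model_gensim.train(lee_data, total_examples=model_gensim.corpus_count, epochs=model_gensim.iter)

# --> a second train() call on the same model fails with the same error
model_gensim.train(lee_data, total_examples=model_gensim.corpus_count, epochs=model_gensim.iter)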

It seems you have to call build_vocab before EVERY train call, whether the model was loaded or not.

Should I call build_vocab with the actual corpus, or just with something like ['foo']? The latter would probably be better, so as not to alter any vocabulary frequencies.
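cpflaume's observation suggests the missing state is not only a serialization problem: something that train() needs seems to be discarded after each training run. The plain-Python sketch below illustrates that pattern and a defensive guard against it; `ToyTrainer`, `lookup` and `_cleanup_after_training` are hypothetical names chosen for illustration, not gensim internals, and the guard is one possible mitigation rather than gensim's actual fix.

```python
class ToyTrainer:
    """Hypothetical trainer whose post-training cleanup discards derived state."""

    def __init__(self, vocab, guard=True):
        self.vocab = list(vocab)
        self.guard = guard
        self.lookup = None
        self.build_vocab()

    def build_vocab(self):
        # Derived table that the training loop indexes into.
        self.lookup = {word: index for index, word in enumerate(self.vocab)}

    def _cleanup_after_training(self):
        # Frees memory after training, but also drops state that train() needs.
        self.lookup = None

    def train(self, sentences):
        if self.guard and self.lookup is None:
            # Defensive fix: rebuild missing derived state instead of failing.
            self.build_vocab()
        seen = [self.lookup[word] for sentence in sentences for word in sentence]
        self._cleanup_after_training()
        return len(seen)


corpus = [["a", "b"], ["b", "c"]]

unguarded = ToyTrainer(["a", "b", "c"], guard=False)
unguarded.train(corpus)  # first call works
try:
    unguarded.train(corpus)  # second call indexes into None
except TypeError as err:
    print("second train() without guard:", err)

guarded = ToyTrainer(["a", "b", "c"])
print(guarded.train(corpus), guarded.train(corpus))
```

In this toy version the guard makes repeated train() calls independent of whether build_vocab was called in between, which is the behavior this thread expects from the real model.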

yajiezhu · 2018-08-16T09:40:29Z

from gensim import models

dataset1 = models.Doc2Vec.load("dataset1.model")  # trained on 1000 sentences
print(len(dataset1.docvecs.vectors_docs))  # 1000 sentence vectors
dataset1.build_vocab(new_sentences, update=True)  # new_sentences: 500 new sentences
print(len(dataset1.docvecs.vectors_docs))  # still 1000 sentence vectors
dataset1.train(new_sentences, total_examples=dataset1.corpus_count, epochs=100)
print(len(dataset1.docvecs.vectors_docs))  # still 1000 sentence vectors

Question: all three print calls give the same result. Please help me, thank you.

MorenoLaQuatra · 2018-12-03T08:37:37Z

I'm stuck on this problem. The issue seems to be solved; do we need to recompile gensim?

aakash086 · 2018-12-04T12:21:37Z

Hi,

I have been stuck on this issue for a few days now. Can someone please tell me in which gensim version this FastText fix will be available?

aakash086 · 2018-12-05T08:21:07Z

Hi,
have you resolved your question? Please share if you found anything on this.

Thanks!

MorenoLaQuatra · 2018-12-05T09:42:30Z

I didn't solve it, but according to #2215 it seems to be solved already. My question was whether a gensim update is needed for that.

menshikh-iv · 2019-01-11T15:11:12Z

Fixed by #2313

menshikh-iv · 2019-01-11T15:11:31Z

@aakash086 @MorenoLaQuatra The fix will be available in gensim==3.7.0 (end of January).

piskvorky added the "bug" label (Issue described a bug) Jul 24, 2018

menshikh-iv mentioned this issue Aug 2, 2018

File-based fast training for Any2Vec models #2127

Merged

menshikh-iv mentioned this issue Aug 27, 2018

[Feature request] Load full native fastText model to continue training on new data #2160

Closed

menshikh-iv mentioned this issue Oct 5, 2018

[WIP] Fix FastText #2215

Closed


timbicker mentioned this issue Dec 13, 2018

set normed vectors to None when model is trained #2273

Closed

menshikh-iv added the "difficulty medium" label (Medium issue: required good gensim understanding & python skills) Dec 14, 2018

menshikh-iv assigned mpenkov Dec 14, 2018

mpenkov added the "fasttext" label (Issues related to the FastText model) Dec 15, 2018

mpenkov mentioned this issue Jan 11, 2019

Fix critical issues in FastText #2313

Merged

menshikh-iv closed this as completed Jan 11, 2019
