FastText incremental training fails #2139
Comments
@manneshiva @gojomo @menshikh-iv any idea? That use-case sounds like something we definitely want to (should) support.

Yes, that seems like something that should work (even if it might be tricky to get working well), and I'd guess the issue is that something which isn't being serialized isn't being rebuilt after re-load.

I'm having trouble compiling this code. Does anyone know what the problem might be?

@ntonyproduction Which code, what problem, what have you tried so far, and what does it have to do with this issue?
@xor-xor thanks for the report, problem reproduced with:

```python
from gensim.models import FastText
from gensim.test.utils import common_texts, get_tmpfile

model = FastText(
    common_texts,
    sg=1,
    size=200,
    window=5,
    min_count=1,
    workers=16,
    negative=20,
    iter=20,
    min_n=3,
    max_n=5,
    word_ngrams=1,
    bucket=int(2e6)
)

path = get_tmpfile("fasttext.model")
model.save(path)

loaded_model = FastText.load(path)
loaded_model.train(common_texts, epochs=10, total_examples=model.corpus_count)
```

But if we add `loaded_model.build_vocab(common_texts, update=True)` before the `train` call, it works. Anyway, we need to investigate this behavior, because this should work without the additional `build_vocab` call.
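A minimal sketch of the workaround just described: call `build_vocab` with `update=True` on the re-loaded model before resuming training (whether re-passing the same corpus alters word frequencies is discussed in the next comment):

```python
from gensim.models import FastText
from gensim.test.utils import common_texts, get_tmpfile

path = get_tmpfile("fasttext.model")
loaded_model = FastText.load(path)  # assumes the model saved above

# Rebuilding vocabulary structures on the loaded model avoids the TypeError;
# update=True merges the corpus into the existing vocabulary instead of
# starting from scratch.
loaded_model.build_vocab(common_texts, update=True)
loaded_model.train(common_texts, epochs=10, total_examples=loaded_model.corpus_count)
```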
I can confirm this bug; it's even worse:

```python
model_gensim = FT_gensim(size=100)
model_gensim.build_vocab(lee_data)
model_gensim.train(lee_data, total_examples=model_gensim.corpus_count, epochs=model_gensim.iter)

#### --> same error
model_gensim.train(lee_data, total_examples=model_gensim.corpus_count, epochs=model_gensim.iter)
```

It seems you have to call `build_vocab` before EVERY `train` call, no matter whether the model was loaded or not. I wonder if I should call `build_vocab` with the actual corpus or just with something like `['foo']`; the latter would probably be better, in order not to alter any vocabulary frequencies?
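One way to probe the frequency question raised above (a hypothetical check, not from the thread): compare a word's stored count before and after the extra `build_vocab` call.

```python
from gensim.models import FastText
from gensim.test.utils import common_texts

model = FastText(common_texts, size=10, min_count=1, iter=5)
print(model.wv.vocab['human'].count)  # count from the initial vocabulary scan

# Re-scanning the full corpus with update=True may increment existing counts,
# which is exactly the distortion the comment above worries about.
model.build_vocab(common_texts, update=True)
print(model.wv.vocab['human'].count)

model.train(common_texts, total_examples=model.corpus_count, epochs=model.epochs)
```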
```python
dataset1 = models.Doc2Vec.load("dataset1.model")  # 1000 sentences
```

Question: all three print results are the same.
I'm stuck on this problem. The issue seems to be solved; do we need to recompile gensim?

Hi, I have been stuck on this issue for a few days now. Can someone please tell me in which gensim FastText version this fix will be available?

Hi, thanks!

I didn't solve it myself, but judging by #2215, it seems to be solved already. My question was about whether a gensim update is needed for that.

Fixed by #2313
@aakash086 @MorenoLaQuatra Will be available in |
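Since the release version is not stated above, a quick way to check which gensim you have installed (a sketch; compare against the release notes for #2313):

```python
import gensim
print(gensim.__version__)

# To pick up the fix, upgrade to a release that includes #2313
# (shell command, shown here as a comment):
#   pip install --upgrade gensim
```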
Description
Having successfully trained a model (with 20 epochs), which was saved and loaded back without any problems, I'm trying to continue training it for another 10 epochs, on the same data and with the same parameters, but it fails with an error:
TypeError: 'NoneType' object is not subscriptable
(for full traceback see below).

Steps/Code/Corpus to Reproduce
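A minimal sketch of the scenario from the description (train for 20 epochs, save, re-load, continue for 10 more on the same data); the corpus and any parameters beyond the epoch counts are stand-ins:

```python
from gensim.models import FastText
from gensim.test.utils import common_texts, get_tmpfile

# Initial training run (20 epochs, as in the description).
model = FastText(common_texts, size=200, min_count=1, iter=20)

# Save and re-load; both steps succeed.
path = get_tmpfile("fasttext.model")
model.save(path)
loaded = FastText.load(path)

# Continuing training for another 10 epochs on the same data fails with:
#   TypeError: 'NoneType' object is not subscriptable
loaded.train(common_texts, total_examples=loaded.corpus_count, epochs=10)
```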
Expected Results
Successfully trained model.
Actual Results
TypeError: 'NoneType' object is not subscriptable
Versions
Linux-4.15.0-1014-gcp-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609]
NumPy 1.14.5
SciPy 1.1.0
gensim 3.4.0
FAST_VERSION 1