
node-fasttext classification results don't match the original binary #16

Open
freakeinstein opened this issue Jul 3, 2018 · 4 comments

Comments

@freakeinstein

Hi, I have tried using the node-fasttext library in a project. To identify the best model parameters, I relied on the official fastText binary. When I use the same parameters with the node-fasttext library, I get weird results. So unfortunately, I decided to use the Python version of fastText instead, and it worked fine. Can you check whether the node library is using the correct version of fastText, or whether there is any other reason causing this issue? I would actually like to use a node library in my future projects.

I would also like to add a suggestion: it would be great if you considered supporting model training, model reloading, and prediction multiple times with the same fastText instance, instead of creating a new object for every training run, which currently leads to a memory explosion. Something like the sketch below is what I mean.
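A hypothetical sketch of the reuse pattern, assuming a Classifier with train/predict methods along the lines of the project README (the exact names and signatures here are assumptions, not the current API):

    // Hypothetical sketch: reuse a single classifier across repeated
    // train/reload/predict cycles instead of creating a new object
    // for every training run. The train/predict signatures are assumed.
    const fastText = require('fasttext');
    const classifier = new fastText.Classifier();

    async function retrainAndPredict(options, sentence) {
      await classifier.train('supervised', options); // retrain in place
      return classifier.predict(sentence, 2);        // reuse the same instance
    }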

@vunb
Owner

vunb commented Jul 3, 2018

The official library has had a lot of updates. Which version of fastText do you use? Please also provide some info about the parameters you used.

@freakeinstein
Author

freakeinstein commented Jul 3, 2018

The fastText binary was built from the latest available source, facebookresearch/fastText@25d0bb0, and the Python library is version 0.8.3, as listed at https://pypi.org/project/fasttext/#history.

Regarding the parameters, here is the data:

    {
      input: datapath + '.txt',
      output: datapath,
      epoch: 10000,
      lr: 0.5,
      lrUpdateRate: 100,
      wordNgrams: 2,
      dim: 15
    }

[UPDATED] I don't think the exact values matter, since the results are way off from what the official binary gives either way.
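For reference, a minimal sketch of the training call those parameters go into, assuming the Classifier API shape from the node-fasttext README; datapath is a placeholder:

    // Sketch of the training call with the reported parameters;
    // datapath is a placeholder and the API shape is assumed.
    const fastText = require('fasttext');
    const classifier = new fastText.Classifier();

    const datapath = './model/classifier';
    classifier.train('supervised', {
      input: datapath + '.txt',
      output: datapath,
      epoch: 10000,
      lr: 0.5,
      lrUpdateRate: 100,
      wordNgrams: 2,
      dim: 15
    }).then(() => console.log('training done'));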

@lzpfmh

lzpfmh commented May 11, 2020

In the fastText Python module:

    def check(entry):
        if entry.find('\n') != -1:
            raise ValueError(
                "predict processes one line at a time (remove \'\\n\')"
            )
        entry += "\n"
        return entry

every query sentence ends with \n, but in this module:

    std::vector<PredictResult> arr;
    std::vector<int32_t> words, labels;
    std::istringstream in(sentence);

    // With no trailing '\n', getLine never appends the EOS token to
    // `words`, so the token list differs from the Python module's.
    dict_->getLine(in, words, labels);

    if (words.empty())
    {
        return arr;
    }

    Vector hidden(args_->dim);
    Vector output(dict_->nlabels());
    std::vector<std::pair<real, int32_t>> modelPredictions;
    model_->predict(words, k, 0.0001, modelPredictions, hidden, output);

the \n is not added automatically, so the results differ. We must append \n to the end of the sentence manually.
That's the problem.
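On the caller's side, a minimal workaround sketch (assuming a Classifier predict API as in the README) is to validate and append the newline before predicting, mirroring the Python module's check():

    // Workaround sketch: append '\n' so dict_->getLine sees the EOS
    // token, mirroring what the Python module's check() does.
    const fastText = require('fasttext');
    const classifier = new fastText.Classifier('./model.bin');

    function predictLine(sentence, k) {
      if (sentence.includes('\n')) {
        throw new Error("predict processes one line at a time (remove '\\n')");
      }
      return classifier.predict(sentence + '\n', k);
    }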

@vunb
Owner

vunb commented May 12, 2020

@lzpfmh Thank you for the investigation!
Could you help by sending a PR?
