
node-fasttext classification results don't match the original binary #16

Open
freakeinstein opened this issue Jul 3, 2018 · 4 comments

Comments

@freakeinstein

Hi, I have tried using the node-fasttext library in a project. To identify the best model parameters, I relied on the official fastText binary. When I use the same parameters with the node-fasttext library, I get weird results. So unfortunately, I decided to use the Python version of fastText instead, and it worked fine. Can you check whether the node library is using the correct version of fastText, or whether there is any other reason causing this issue? I would actually like to use a node library in my future projects.

I would also like to add a suggestion: it would be great if you considered supporting model training, model reloading, and prediction multiple times with the same fastText instance, instead of creating a new object for every training run, which currently leads to a memory explosion. Something like the sketch below is what I mean.
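A hypothetical sketch of the reuse pattern, assuming a Classifier with train/predict methods along the lines of the project README (the exact names and signatures here are assumptions, not the current API):

    // Hypothetical sketch: reuse a single classifier across repeated
    // train/reload/predict cycles instead of creating a new object
    // for every training run. The train/predict signatures are assumed.
    const fastText = require('fasttext');
    const classifier = new fastText.Classifier();

    async function retrainAndPredict(options, sentence) {
      await classifier.train('supervised', options); // retrain in place
      return classifier.predict(sentence, 2);        // reuse the same instance
    }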

@vunb
Owner

vunb commented Jul 3, 2018

The official library has had a lot of updates. Which version of fastText do you use? Please also provide some info about the parameters you used.

@freakeinstein
Author

freakeinstein commented Jul 3, 2018

The fastText binary was built from the latest available source, facebookresearch/fastText@25d0bb0, and the Python library is version 0.8.3, as listed at https://pypi.org/project/fasttext/#history.

Regarding the parameters, here is the data:

    {
      input: datapath + '.txt',
      output: datapath,
      epoch: 10000,
      lr: 0.5,
      lrUpdateRate: 100,
      wordNgrams: 2,
      dim: 15
    }

[UPDATED] I don't think the exact values matter, since the results are way off from what the official binary gives either way.
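For reference, a minimal sketch of the training call those parameters go into, assuming the Classifier API shape from the node-fasttext README; datapath is a placeholder:

    // Sketch of the training call with the reported parameters;
    // datapath is a placeholder and the API shape is assumed.
    const fastText = require('fasttext');
    const classifier = new fastText.Classifier();

    const datapath = './model/classifier';
    classifier.train('supervised', {
      input: datapath + '.txt',
      output: datapath,
      epoch: 10000,
      lr: 0.5,
      lrUpdateRate: 100,
      wordNgrams: 2,
      dim: 15
    }).then(() => console.log('training done'));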

@lzpfmh

lzpfmh commented May 11, 2020

In the fastText Python module:

    def check(entry):
        if entry.find('\n') != -1:
            raise ValueError(
                "predict processes one line at a time (remove \'\\n\')"
            )
        entry += "\n"
        return entry

every query sentence ends with \n, but in this module:

    std::vector<PredictResult> arr;
    std::vector<int32_t> words, labels;
    std::istringstream in(sentence);

    // With no trailing '\n', getLine never appends the EOS token to
    // `words`, so the token list differs from the Python module's.
    dict_->getLine(in, words, labels);

    if (words.empty())
    {
        return arr;
    }

    Vector hidden(args_->dim);
    Vector output(dict_->nlabels());
    std::vector<std::pair<real, int32_t>> modelPredictions;
    model_->predict(words, k, 0.0001, modelPredictions, hidden, output);

the \n is not added automatically, so the results differ. We must append \n to the end of the sentence manually.
That's the problem.
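On the caller's side, a minimal workaround sketch (assuming a Classifier predict API as in the README) is to validate and append the newline before predicting, mirroring the Python module's check():

    // Workaround sketch: append '\n' so dict_->getLine sees the EOS
    // token, mirroring what the Python module's check() does.
    const fastText = require('fasttext');
    const classifier = new fastText.Classifier('./model.bin');

    function predictLine(sentence, k) {
      if (sentence.includes('\n')) {
        throw new Error("predict processes one line at a time (remove '\\n')");
      }
      return classifier.predict(sentence + '\n', k);
    }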

@vunb
Owner

vunb commented May 12, 2020

@lzpfmh Thank you for the investigation!
Could you help by sending a PR?
