Fix critical issues in FastText #2313

Merged: 136 commits, Jan 11, 2019

Commits on Dec 16, 2018

  1. WIP

    mpenkov committed Dec 16, 2018
    94a20e9
  2. fb2b5b0
  3. update docstring

    mpenkov committed Dec 16, 2018
    1a41182
  4. cd0b318

Commits on Dec 23, 2018

  1. WIP

    mpenkov committed Dec 23, 2018
    3b31288
  2. introduced Tracker class

    mpenkov committed Dec 23, 2018
    42626a2
  3. added log examples

    mpenkov committed Dec 23, 2018
    12cc3e2

Commits on Dec 28, 2018

  1. 64f7f39
  2. adding script to trigger bug

    mpenkov committed Dec 28, 2018
    00b472b

Commits on Dec 29, 2018

  1. minor documentation changes

    mpenkov committed Dec 29, 2018
    abfd573
  2. improve unit test

    mpenkov committed Dec 29, 2018
    4e46062
  3. retrained toy model

    $ ~/src/fastText-0.1.0/fasttext cbow -input toy-data.txt -output toy-model -bucket 100
    Read 0M words
    Number of words:  22
    Number of labels: 0
    Progress: 100.0%  words/sec/thread: 209  lr: 0.000000  loss: 4.100698 eta: 0h0m -14m
    mpenkov committed Dec 29, 2018
    d3544c7
  4. b98bc0b
  5. update unit test

    mpenkov committed Dec 29, 2018
    30be5bd
  6. WIP

    mpenkov committed Dec 29, 2018
    ab1eaf6

Commits on Dec 30, 2018

  1. retrain model with a smaller dimensionality

    this will make it easier to debug manually

    $ ~/src/fastText-0.1.0/fasttext cbow -input toy-data.txt -output toy-model -bucket 100 -dim 5
    Read 0M words
    Number of words:  22
    Number of labels: 0
    Progress: 100.0%  words/sec/thread: 199  lr: 0.000000  loss: 0.000000  eta: 0h0m
    mpenkov committed Dec 30, 2018
    e59d1db

Commits on Dec 31, 2018

  1. 392201b
  2. f25607f
  3. ef394ed
  4. update notes

    mpenkov committed Dec 31, 2018
    fe10ca7

Commits on Jan 2, 2019

  1. update notes

    mpenkov committed Jan 2, 2019
    4c2223c
  2. initialize wv.vectors_vocab

    mpenkov committed Jan 2, 2019
    28bf757
  3. init vectors_vocab properly

    mpenkov committed Jan 2, 2019
    8e0d04f
  4. add test_sanity_vectors

    mpenkov committed Jan 2, 2019
    795fed0
  5. no longer segfaulting

    mpenkov committed Jan 2, 2019
    671b3c0
  6. 9adb532
  7. removing old test

    it cannot pass by design: training is non-deterministic, so conditions
    must be tightly controlled to guarantee reproducibility, and that is too
    much effort for a unit test
    mpenkov committed Jan 2, 2019
    6de08de
  8. 81dd478
  9. 5c500f0
  10. cb045de
  11. a916266
  12. disable test reruns for now

    mpenkov committed Jan 2, 2019
    cee6311
  13. set min_count=0

    mpenkov committed Jan 2, 2019
    752cf9b
  14. initialize wv.buckets_word prior to continuing training

    This avoids a null dereference that could previously be reproduced with:

    python -c "from gensim.test.test_fasttext import NativeTrainingContinuationTest as A; A().test_continuation_gensim()"
    mpenkov committed Jan 2, 2019
    ad3342a
  15. making all tests pass

    mpenkov committed Jan 2, 2019
    64caa3c
  16. 2c9f2b4
  17. 80c8092
  18. 0d30cae

Commits on Jan 3, 2019

  1. bf1c8b8
  2. ec92983
  3. docstring fixes

    mpenkov committed Jan 3, 2019
    1c58119
  4. 91b3599
  5. 5100335
  6. remove comment

    mpenkov committed Jan 3, 2019
    aae713d
  7. e5ec723
  8. 87f655a
  9. git rm trigger.py

    mpenkov committed Jan 3, 2019
    76aca9a

Commits on Jan 4, 2019

  1. refactor FB model loading code

    Move the lower-level FB model loading code to a new module.
    Implement alternative, simpler _load_fast_text_format function.
    Add unit tests to compare alternative and existing implementation.
    mpenkov committed Jan 4, 2019
    8027459
  2. 07f34e2
  3. 118cd7f

Commits on Jan 5, 2019

  1. ef58c7c
  2. 799596d
  3. 6cf3d1f
  4. minor fixup around hashes

    mpenkov committed Jan 5, 2019
    b58a50b
  5. add oov test

    mpenkov committed Jan 5, 2019
    97baf3c
  6. ef90436

Commits on Jan 6, 2019

  1. git rm gensim.xml native.xml

    mpenkov committed Jan 6, 2019
    8956530
  2. minor fix in comment

    mpenkov committed Jan 6, 2019
    2e10ece

Commits on Jan 7, 2019

  1. cb25448
  2. 901eaeb
  3. f0bd22d
  4. fa34d84
  5. f9c1547
  6. de7d9ef
  7. 2946896
  8. a7c14d0
  9. 5598e19
  10. 07c84f5
  11. f15094d
  12. 5e25a4f
  13. b789971
  14. 1ed35ea
  15. 0f62660
  16. tox -e flake8

    mpenkov committed Jan 7, 2019
    c461193
  17. tox -e flake8-docs

    mpenkov committed Jan 7, 2019
    eeafdec
  18. 3e0e656
  19. Revert "refactoring: remove unused vectors_vocab_norm attribute"

    This reverts commit 07c84f5.

    We have to worry about backwards compatibility if we remove this
    attribute, and it's not worth doing that as part of this PR.
    mpenkov committed Jan 7, 2019
    262599d
  20. 7d4e60e
  21. 6cc80de
  22. review response: fix docstring in fasttext_bin.py

    Also ran python -m doctest gensim/models/fasttext_bin.py to check the
    docstring is correctly executable.
    mpenkov committed Jan 7, 2019
    069912f
  23. cc19393
  24. 1661c16
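
Commit 069912f above mentions running `python -m doctest` to confirm that a docstring is executable. A minimal sketch of what that check does, run here on a single function instead of a whole file. The `ngram_count` function is a made-up example and is not part of gensim:

```python
import doctest


def ngram_count(word, n):
    """Return the number of character n-grams of length ``n`` in ``word``.

    >>> ngram_count("fast", 2)
    3
    >>> ngram_count("a", 2)
    0
    """
    return max(len(word) - n + 1, 0)


# Collect the examples embedded in the docstring and execute them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(ngram_count, "ngram_count", globs={"ngram_count": ngram_count}):
    runner.run(test)

# ``failed`` counts examples whose actual output differed from the docstring.
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

If the docstring drifts out of sync with the code, `results.failed` becomes non-zero, which is the situation the doctest run guards against.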

Commits on Jan 8, 2019

  1. 72b1d81
  2. e467060
  3. c2740cd
  4. daa425a
  5. 64844f3
  6. adjust unit test

    vectors_lockf is only for word2vec.  FastText implementation uses
    vectors_ngrams_lockf and vectors_vocab_lockf only.
    mpenkov committed Jan 8, 2019
    39e85f1
  7. 60d0477
  8. 52e2fbe
  9. d08500b
  10. remove outdated comments

    mpenkov committed Jan 8, 2019
    3a2f93e
  11. fix deprecation warnings

    mpenkov committed Jan 8, 2019
    3159a18

Commits on Jan 9, 2019

  1. f262815
  2. b80c329
  3. add LoadFastTextFormatTest

    mpenkov committed Jan 9, 2019
    127a13e
  4. 2b96550
  5. 25ad1ae
  6. 09388ec
  7. 6054aa8
  8. b92f435
  9. 422e3b1
  10. remove old tests

    mpenkov committed Jan 9, 2019
    553c8e0
  11. tox -e flake8

    mpenkov committed Jan 9, 2019
    d42e506
  12. fixup: introduce OrderedDict to _fasttext_bin.py

    The order of the words matters.  In the previous implementation, this
    was maintained explicitly via the index2word list, but using an
    OrderedDict achieves the same thing.

    The main idea is that we iterate over the vocab terms in the right order
    in the prepare_vocab function.
    mpenkov committed Jan 9, 2019
    425e942
  13. 802587a
  14. ff82b71
  15. dab47f3
  16. 914aa95
  17. adding additional assertion

    mpenkov committed Jan 9, 2019
    65abda9
  18. 01d84d1
  19. delete out of date comment

    mpenkov committed Jan 9, 2019
    611cdb2
  20. Revert "re-enable disabled assertions"

    This reverts commit 01d84d1.
    mpenkov committed Jan 9, 2019
    0c959a9
  21. c196ace
  22. update unit tests

    mpenkov committed Jan 9, 2019
    fb51a6a
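
Commit 425e942 above replaces an explicitly maintained index2word list with an OrderedDict. A minimal sketch of why that keeps the word order intact. The vocabulary entries and variable names below are invented for illustration and are not taken from _fasttext_bin.py:

```python
from collections import OrderedDict

# Hypothetical (word, count) pairs, in the order they would be read from a model file.
raw_vocab = [("the", 120), ("quick", 7), ("fox", 3)]

vocab = OrderedDict()
for word, count in raw_vocab:
    vocab[word] = count

# Iterating over an OrderedDict yields keys in insertion order, so the
# position of each word can be derived on demand rather than stored separately.
index2word = list(vocab)
word2index = {word: index for index, word in enumerate(vocab)}

print(index2word)
print(word2index["fox"])
```

Any code that later iterates over `vocab` sees the words in the same order in which they were inserted, which is the property the commit message relies on.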

Commits on Jan 10, 2019

  1. 768a941
  2. f4643bb
  3. d802e91
  4. 92da774
  5. review response: move import

    mpenkov committed Jan 10, 2019
    e638628
  6. 6e47a88
  7. 2cdad39
  8. fbaf086
  9. dc32126
  10. 39e8844
  11. 08ee7d8
  12. 6d8a648
  13. 734a0ac
  14. 143445e
  15. 250d388
  16. review response: get rid of struct_unpack

    This is an internal method masquerading as a public one.  There is no
    reason for anyone to call it.  Removing it will have no effect on
    pickling/unpickling, as methods do not get serialized.

    Therefore, removing it is safe.
    mpenkov committed Jan 10, 2019
    9fcf35e
  17. c1aeb85
  18. 58c1166
  19. e5960ed
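
Commit 9fcf35e above argues that removing a method does not affect pickling, because methods are not serialized. A minimal sketch of that reasoning, assuming default pickling of a plain class. The `Model` class is a hypothetical stand-in and its `struct_unpack` method only borrows the name from the commit message:

```python
import pickle


class Model:
    def __init__(self, dim):
        self.dim = dim

    def struct_unpack(self):
        # Stand-in for a method that a later release removes.
        return self.dim


# The payload stores a reference to the class plus the instance attributes.
payload = pickle.dumps(Model(5))

# Simulate upgrading to a version of the class that no longer has the method.
del Model.struct_unpack

# Unpickling still succeeds, because only instance state was serialized.
restored = pickle.loads(payload)
print(restored.dim, hasattr(restored, "struct_unpack"))
```

The method name never appears in the pickled bytes, so old pickles load against the new class definition without complaint.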

Commits on Jan 11, 2019

  1. 52230aa
  2. fix tests

    menshikh-iv committed Jan 11, 2019
    14c497d