KeyedVectors & *2Vec API streamlining, consistency #2698

Merged
merged 64 commits into from
Jul 19, 2020
Changes from 1 commit
7e642a2
slim low-value warnings
gojomo Dec 5, 2019
b8de987
clarify vectors/vectors_vocab relationship; fix lockf & nonsense ngra…
gojomo Dec 5, 2019
38343d6
mv FT, KV tests to right place
gojomo Dec 6, 2019
a255e8c
rm deprecations, obsolete refs/tests, delete_temporary_training_data,…
gojomo Dec 5, 2019
4e334c1
update usages, tests, flake8 cleanup
gojomo Dec 7, 2019
a16cec5
expand KeyedVectors to obviate Doc2VecKeyedVectors; upconvert old off…
gojomo Dec 12, 2019
d4267f8
fix docstring warnings; update usages
gojomo Dec 12, 2019
f6e7aa6
rm unused old plain-python codepaths
gojomo Dec 13, 2019
470b119
unify class comments under __init__ for consistncy w/ api doc present…
gojomo Dec 14, 2019
cd02b8b
name/comment harmonization (rm 'entity', lessen 'word'-centricity)
gojomo Dec 17, 2019
0c77ae4
table formatting
gojomo Dec 17, 2019
cfa723d
return pyemd to linux test env
gojomo Dec 17, 2019
a4f7b77
split backcompat tests for better resolution
gojomo Dec 18, 2019
4412696
convert Vocab & related data items to use dataclasses
gojomo Dec 18, 2019
65c2b2d
rm obsolete Vocab/Trainable/abstract/Wrapper classes, persistent call…
gojomo Dec 18, 2019
1d0f52f
tune tests for stability, runtimes; rm auto reruns that hide flakiness
gojomo Jan 15, 2020
8123596
fix numpy FutureWarning: arrays to stack must be sequence
gojomo Dec 26, 2019
c5efb24
(commented-out) deoptimization option
gojomo Jan 22, 2020
2c234dd
stronger FB model testing; no _unpack_copy test
gojomo Jan 22, 2020
9910404
merge redundant methods; rm duplicated imports/defs
gojomo Jan 22, 2020
658813f
rationalize _lockf, buckets_word behaviors
gojomo Jan 22, 2020
3cdb1d6
rename .docvecs to .dv
gojomo Jan 24, 2020
10d9f55
update usages; rm obsolete tests; restore gensim.utils import
gojomo Jan 28, 2020
79af68e
intensify FT tests (more epochs, more buckets)
gojomo May 12, 2020
8875d8b
flake8-3.8.0 style fixes - but also pin flake8-3.7.9 vs 3.8.0 'output…
gojomo May 12, 2020
4b7566e
replace vectors_norm with 1d norms
gojomo May 12, 2020
1baab2a
tighten testParallel
gojomo May 13, 2020
8d2f1fe
rm .vocab & 'Vocab' classes; add expandable 'vecattrs'
gojomo May 14, 2020
fc65525
update usages (no vocabs)
gojomo May 15, 2020
4657b14
enable running inside '-m mtprof' (or cProfile) via explicit unittest…
gojomo May 15, 2020
b5ff29b
faster sample_int reads
gojomo May 15, 2020
098119b
load_word2vec_format(.., no_header=True) to support GLoVe text vectors
gojomo May 19, 2020
318a858
refactor & comment lockf feature; allow single-element lockf
gojomo May 26, 2020
fe3ae31
improve FT comment
gojomo May 26, 2020
d503205
rm deprecated/unneded init_sims calls
gojomo May 26, 2020
679dde9
Merge branch 'develop' into kv_cleanup
piskvorky Jul 5, 2020
411473b
fixes to code style
piskvorky Jul 6, 2020
45fd5f6
flake8: fix overlong lines
piskvorky Jul 6, 2020
5acc5f5
Merge branch 'develop' into kv_cleanup
gojomo Jul 6, 2020
5764f8c
rm stray merge error
gojomo Jul 6, 2020
e49ae4c
rm duplicated , old nonstandard hash workarounds
gojomo Jul 6, 2020
278c2bd
use numpy-recommended PRNG constructor
gojomo Jul 6, 2020
5c7eb1c
add sg to FastTextConfig & consult it; rm remaining broken-hash cruft
gojomo Jul 6, 2020
23805d1
reorg conditional packages for clarity
gojomo Jul 6, 2020
f5b902c
comments, names, refactoring, randomization
gojomo Jul 7, 2020
7b571b2
Apply suggestions from code review
gojomo Jul 7, 2020
87860c5
fix cruft left from suggestion
gojomo Jul 7, 2020
39fe128
fix numpy-32bit-on-Windows; executable docs
gojomo Jul 7, 2020
15152ff
mv lee_corpus to utils; cleanup
gojomo Jul 7, 2020
3d424a2
update poincare for latest KV __init__ signature
gojomo Jul 7, 2020
99f7009
restore word_vec method for proper overriding, but rm usages
gojomo Jul 7, 2020
2bb8abf
Apply suggestions from code review
gojomo Jul 7, 2020
33c6508
adjust testParallel against failure risk
gojomo Jul 8, 2020
8f17d6d
merge ~piskvorky's /pull/10 cleanups
gojomo Jul 10, 2020
cb33e46
intensify training for an occasionally failing test
gojomo Jul 11, 2020
581ef06
clarify word/char ngrams handling; rm outdated comments
gojomo Jul 14, 2020
9f21cba
mostly avoid duplciating FastTextConfig fields into locals
gojomo Jul 16, 2020
d912616
avoid copies/pointers for no-bucket (FT as W2V) case
gojomo Jul 16, 2020
583bbe6
rm obsolete test (already skipped & somewhat originally misguided)
gojomo Jul 16, 2020
0330cfc
simpler/faster .get(..., default) (avoids exception-catching in has_i…
gojomo Jul 16, 2020
9caf217
add default option to get_index; avoid exception in has_index_for
gojomo Jul 16, 2020
14dd9f5
chained range check
gojomo Jul 16, 2020
8674949
Merge branch 'develop' into kv_cleanup
mpenkov Jul 19, 2020
0d2679a
Update CHANGELOG.md
mpenkov Jul 19, 2020
16 changes: 8 additions & 8 deletions gensim/models/keyedvectors.py
@@ -337,15 +337,18 @@ def __getitem__(self, key_or_keys):

         return vstack([self.get_vector(key) for key in key_or_keys])

-    def get_index(self, key):
+    def get_index(self, key, default=None):
         """Return the integer index (slot/position) where the given key's vector is stored in the
         backing vectors array.

         """
-        if key in self.key_to_index:
-            return self.key_to_index[key]
-        elif isinstance(key, (int, np.integer)) and key < len(self.index_to_key):
+        val = self.key_to_index.get(key, -1)
+        if val >= 0:
+            return val
+        elif isinstance(key, (int, np.integer)) and key < len(self.index_to_key) and key >= 0:
             return key
+        elif default is not None:
+            return default
         else:
             raise KeyError("Key '%s' not present" % key)

@@ -491,10 +494,7 @@ def has_index_for(self, key):
         more-specific check.

         """
-        try:
-            return self.get_index(key) >= 0
-        except KeyError:
-            return False
+        return self.get_index(key, -1) >= 0

     def __contains__(self, key):
         return self.has_index_for(key)
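To make the effect of this change easier to see outside gensim, here is a minimal sketch of the lookup logic from the diff above. `TinyKeyedVectors` is a hypothetical stand-in written only for illustration, not gensim's `KeyedVectors`. It keeps the two attributes the diff relies on (`key_to_index`, `index_to_key`), and it assumes plain Python `int` keys instead of also accepting `np.integer`, so that it runs without NumPy. The range test is written as a chained comparison, which should be equivalent to the two separate comparisons in the diff.

```python
class TinyKeyedVectors:
    """Hypothetical stand-in for illustration only; not gensim's KeyedVectors."""

    def __init__(self, keys):
        self.index_to_key = list(keys)
        self.key_to_index = {key: i for i, key in enumerate(self.index_to_key)}

    def get_index(self, key, default=None):
        # A single dict lookup replaces the earlier `in` test plus indexing.
        val = self.key_to_index.get(key, -1)
        if val >= 0:
            return val
        # An in-range, non-negative integer is treated as a slot position.
        elif isinstance(key, int) and 0 <= key < len(self.index_to_key):
            return key
        # A caller-supplied default suppresses the KeyError.
        elif default is not None:
            return default
        else:
            raise KeyError("Key '%s' not present" % key)

    def has_index_for(self, key):
        # No try/except needed: a missing key falls back to -1.
        return self.get_index(key, -1) >= 0

    def __contains__(self, key):
        return self.has_index_for(key)


kv = TinyKeyedVectors(["apple", "banana", "cherry"])

print(kv.get_index("banana"))              # index of a known key
print(kv.get_index(2))                     # an in-range int is returned as-is
print(kv.get_index("durian", default=-1))  # missing key, default returned
print("durian" in kv, -1 in kv, 3 in kv)   # all outside the index
```

One consequence of the `default is not None` test, at least in this sketch, is that passing `default=None` explicitly behaves the same as omitting it: a missing key still raises `KeyError`. Callers that want a quiet miss need a non-`None` sentinel such as `-1`, which is what `has_index_for` uses.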