Reduce size of language data #4140
Commits on Aug 18, 2019
Move Turkish lemmas to a json file
Rather than a large dict in Python source, the data is now a big json file. This includes a method for loading the json file, falling back to a compressed file, and an update to MANIFEST.in that excludes json in the spacy/lang directory. This focuses on Turkish specifically because it has the most language data in core.
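A minimal sketch of the loading behaviour described above, assuming the compressed copy sits next to the json file with a `.gz` suffix. The helper name `load_language_data` is hypothetical and is not taken from the commit itself.

```python
import gzip
import json
from pathlib import Path


def load_language_data(path):
    """Load JSON language data, falling back to a gzipped copy.

    Hypothetical helper: tries `path` first, then `path` + ".gz".
    """
    path = Path(path)
    if path.exists():
        with path.open("r", encoding="utf8") as f:
            return json.load(f)
    # Assumption: the compressed file is named e.g. "lemmas.json.gz".
    gz_path = path.with_name(path.name + ".gz")
    if gz_path.exists():
        with gzip.open(gz_path, "rt", encoding="utf8") as f:
            return json.load(f)
    raise FileNotFoundError(f"No language data at {path} or {gz_path}")
```

With this layout a source checkout reads the plain json, while a distribution that ships only the `.json.gz` file still loads the same data.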
Transition all lemmatizer.py files to json
This covers all lemmatizer.py files of a significant size (>500k or so). Small files were left alone. None of the affected files have logic, so this was pretty straightforward. One unusual thing is that the lemma data for Urdu doesn't seem to be used anywhere. That may require further investigation.
Move large lang data to json for fr/nb/nl/sv
These are the languages that use a lemmatizer directory (rather than a single file) and are larger than English. For most of these languages there were many language data files, in which case only the large ones (>500k or so) were converted to json. It may or may not be a good idea to migrate the remaining Python files to json in the future.
The contents of this file were originally just copied from the Python source, but that used single quotes, so it had to be properly converted to json first.
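One way such a conversion could be done is to parse the Python literal and re-serialize it, rather than editing quotes by hand. This is a sketch only: the function name and the sample entry are made up for illustration, and the commit does not say which method was actually used.

```python
import ast
import json


def python_literal_to_json(source):
    """Convert the text of a Python dict literal into a JSON string."""
    # literal_eval only accepts literals, so arbitrary code is never executed.
    data = ast.literal_eval(source)
    return json.dumps(data, ensure_ascii=False, indent=2, sort_keys=True)


# Single-quoted Python source is not valid JSON until it is re-serialized.
converted = python_literal_to_json("{'was': 'be', 'mice': 'mouse'}")
```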
This covers the json.gz files built as part of distribution.
Add language data gzip to build process
Currently this gzips the data on every build; it works, but it should be changed to gzip only when the source file has been updated.
Return True from doc.is_... when no ambiguity
* Make doc.is_sentenced return True if len(doc) < 2.
* Make doc.is_nered return True if len(doc) == 0, for consistency.

Closes explosion#3934
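The edge cases can be illustrated with a toy model that works on plain lists instead of a `Doc`. This is a simplified sketch, not spaCy's implementation: the per-token lists and the use of `None` for "not set" are assumptions made for the example.

```python
def is_sentenced(sent_starts):
    """Toy check: have sentence boundaries been set?

    `sent_starts` holds one value per token; None means "not set".
    """
    if len(sent_starts) < 2:
        # Zero or one token: there is no boundary that could be ambiguous.
        return True
    # The first token always starts a sentence, so only later tokens count.
    return any(flag is not None for flag in sent_starts[1:])


def is_nered(ent_iobs):
    """Toy check: have entity tags been set on any token?"""
    if len(ent_iobs) == 0:
        # Empty doc: nothing is missing, so report True for consistency.
        return True
    return any(iob is not None for iob in ent_iobs)
```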
more friendly textcat errors (explosion#3946)
* more friendly textcat errors with require_model and require_labels
* update thinc version with recent bugfix
💫 Fix issue explosion#3839: Incorrect entity IDs from Matcher with operators (explosion#3949)
* Add regression test for issue explosion#3541
* Add comment on bugfix
* Remove incorrect test
* Un-xfail test
Fixing ngram bug (explosion#3953)
* minimal failing example for Issue explosion#3661
* referenced Issue explosion#3661 instead of Issue explosion#3611
* cleanup
Not actually supported in this alignment interpretation
Bugfix/issue 3968 (explosion#3982)
* Fix for issue-3968
* Added contributor agreement
* Made suggested changes
Update annotation docs for German
- minor formatting fixes
- remove STTS tags not used in Tiger
- update list of dependency relations to match tiger2dep
Add regression test for explosion#4002
Test that the PhraseMatcher can match on overwritten NORM attributes.
Fix dependency copy for as_doc (explosion#3969)
* failing unit test for issue 3962
* attempt to fix Issue explosion#3962
* create artificial unit test example
* using length instead of self.length
* sp
* reformat with black
* find better ancestor within span and use generic 'dep'
* attach to span.root if there is no appropriate ancestor
* comment span text
* clean up ancestor code
* reconstruct dep tree to keep same number of sentences
Remove old comment (explosion#4012)
Norwegian used to borrow from French but that doesn't appear to have been true for a while now, so the comment that was here is no longer relevant.
💫 Improve error message when model.from_bytes() dies (explosion#4014)
* Improve error message when model.from_bytes() dies

  When Thinc's model.from_bytes() is called with a mismatched model, often we get a particularly ungraceful error, e.g. "AttributeError: FunctionLayer has no attribute G"

  This is because we're trying to load the parameters for something like a LayerNorm layer, and the model architecture has some other layer there instead. This is obviously terrible, especially since the error *type* is wrong.

  I've changed it to raise a ValueError. The error message is still probably a bit terse, but it's hard to be sure exactly what's gone wrong.
* Update spacy/pipeline/pipes.pyx
* Update spacy/pipeline/pipes.pyx
* Update spacy/pipeline/pipes.pyx
* Update spacy/syntax/nn_parser.pyx
* Update spacy/syntax/nn_parser.pyx
* Update spacy/pipeline/pipes.pyx Co-Authored-By: Matthew Honnibal <[email protected]>
* Update spacy/pipeline/pipes.pyx Co-Authored-By: Matthew Honnibal <[email protected]>

Co-authored-by: Ines Montani <[email protected]>
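The general pattern, catching the low-level AttributeError and re-raising it as a ValueError with a clearer message, might look roughly like this. Everything here is a stand-in: `MismatchedLayer`, `load_params` and `from_bytes_safely` are hypothetical names, not Thinc or spaCy APIs.

```python
class MismatchedLayer:
    """Stand-in for a layer that lacks the parameter being loaded."""


def load_params(layer, params):
    """Hypothetical low-level loader that fails with AttributeError."""
    for name, value in params.items():
        if not hasattr(layer, name):
            raise AttributeError(
                f"{type(layer).__name__} has no attribute {name}"
            )
        setattr(layer, name, value)


def from_bytes_safely(layer, params):
    """Re-raise the confusing AttributeError as a ValueError."""
    try:
        load_params(layer, params)
    except AttributeError as err:
        # Chain the original error so the low-level detail stays visible.
        raise ValueError(
            "Could not load parameters: the serialized model does not "
            "match the current architecture."
        ) from err
```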
Update GoldParse attributes in API docs (explosion#4023)
* add `words`
* update name of entity list to `ner`

I think it might be a bit more consistent to have `ner` named `entities` or `ents` (and `ents` is actually set somewhere to `None`, which is a bit confusing), but it looks like renaming it would be a non-trivial decision.
💫 Support simple training format in nlp.evaluate and add tests (explosion#4033)
* Support simple training format in nlp.evaluate and add tests
* Update docs [ci skip]
Resolve edge case when calling textcat.predict with empty doc (explosion#4035)
* resolve edge case where no doc has tokens when calling textcat.predict
* more explicit value test
Correct typo for AllenAI url on homepage (explosion#4050)
* Typo fix for AllenAI url

  Changed incorrect home page url for AllenAI from appenai.org to allenai.org
* Sign contributor agreement
* Change date format
Corrected imported function (explosion#4062)
The example showed an incorrect import
Add links to tokenizer API docs to refer relevant information. (explosion#4064)
* Add links to tokenizer API docs to refer relevant information.
* Add suggested changes

Co-Authored-By: Ines Montani <[email protected]>
ensure the lang of vocab and nlp stay consistent (explosion#4057)
* ensure the language of vocab and nlp stay consistent across serialization
* equality with =
Improve NER per type scoring (explosion#4052)
* Improve NER per type scoring
* include all gold labels in per type scoring, not only when recall > 0
* improve efficiency of per type scoring
* Create Scorer tests, initially with NER tests
* move regression test explosion#3968 (per type NER scoring) to Scorer tests
* add new test for per type NER scoring with imperfect P/R/F and per type P/R/F including a case where R == 0.0
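To see why a label with zero recall should still appear in the results, here is a small stand-alone sketch of per-type precision, recall and F-score. It is not the Scorer code from this commit; the function name and the `(start, end, label)` tuple format are assumptions for illustration.

```python
def per_type_prf(gold, pred):
    """Score entities per label.

    `gold` and `pred` are sets of (start, end, label) tuples.
    Every label seen in either set gets an entry, even when R == 0.0.
    """
    labels = {e[2] for e in gold} | {e[2] for e in pred}
    scores = {}
    for label in sorted(labels):
        gold_ents = {e for e in gold if e[2] == label}
        pred_ents = {e for e in pred if e[2] == label}
        tp = len(gold_ents & pred_ents)
        p = tp / len(pred_ents) if pred_ents else 0.0
        r = tp / len(gold_ents) if gold_ents else 0.0
        f = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        scores[label] = {"p": p, "r": r, "f": f}
    return scores
```

If a gold ORG span is predicted as GPE, ORG still gets an entry with recall 0.0 instead of silently disappearing from the per-type report.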
Update gold corpus code to properly ingest a directory of jsonl… (explosion#4067)
* Update gold corpus code to properly ingest a directory of jsonlines files

  In response to: explosion#3975
* Update spacy/gold.pyx

Co-Authored-By: Ines Montani <[email protected]>
Fix handling of kwargs in Language.evaluate
Makes it consistent with other methods
Fixed syntax error in lang/ko when using python 2 (explosion#4082) (closes explosion#4068)
* fixed syntax error in declaring variables with python 2.7 in spacy/lang/ko/__init__.py
* fixed syntax error in declaring variables with python 2.7 in spacy/lang/ko/__init__.py
* Update __init__.py
* Create veer-bains.md
* Update __init__.py

  fixed syntax errors in variable datatype assignment when calling spacy.blank("ko") with python 2.7
Stopwords for Serbian language. (explosion#4078)
* Serbian stopwords added. (cyrillic alphabet)
* spaCy Contribution agreement included.
* Test initialize updated
💫 Sync branches (explosion#4084) [ci skip]
* Update from master
* Re-added Universe readme (explosion#3688) (closes explosion#3680)
* Fix typo
* Add version tag to `--base-model` argument (closes explosion#3720)
* fixing regex matcher examples (explosion#3708) (explosion#3719)
* Improve Token.prob and Lexeme.prob docs (resolves explosion#3701)
* Fix DependencyParser.predict docs (resolves explosion#3561)
* Update languages.json

Co-authored-by: Bram Vanroy <[email protected]>
Co-authored-by: Aaron Kub <[email protected]>
Raise error if annotation dict in simple training style has unexpected keys explosion#4074 (explosion#4079)
* adding enhancement explosion#4074.
* modified behavior to strictly require top level dictionary keys - issue explosion#4074
* pass expected keys to error message and add links as expected top level key
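A strict key check of this kind could be sketched as follows. The key list is an assumption for illustration (only `links` is named in the commit message), and `validate_annotations` is a hypothetical helper, not the function added by the commit.

```python
# Assumed set of allowed top-level keys; only "links" is confirmed by the
# commit message, the rest are illustrative.
EXPECTED_KEYS = ("words", "tags", "heads", "deps", "entities", "cats", "links")


def validate_annotations(annotations):
    """Raise ValueError if the dict has keys outside EXPECTED_KEYS."""
    unexpected = sorted(set(annotations) - set(EXPECTED_KEYS))
    if unexpected:
        # Include the expected keys so a typo is easy to spot.
        raise ValueError(
            f"Unexpected top-level keys {unexpected}; "
            f"expected a subset of {list(EXPECTED_KEYS)}"
        )
    return annotations
```

A misspelled key such as `entites` then fails loudly instead of being silently ignored during training.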
Add validate option to EntityRuler (explosion#4089)
* Add validate option to EntityRuler
* Add validate to EntityRuler, passed to Matcher and PhraseMatcher
* Add validate to usage and API docs
* Update website/docs/usage/rule-based-matching.md Co-Authored-By: Ines Montani <[email protected]>
* Update website/docs/usage/rule-based-matching.md Co-Authored-By: Ines Montani <[email protected]>
Update lemma and vector information after splitting a token (explosion#4097)
* fixing vector and lemma attributes after retokenizer.split
* fixing unit test with mockup tensor
* xp instead of numpy
Add entry for Blackstone in universe.json (explosion#4101)
* Add entry for Blackstone in universe.json

  Add an entry for the Blackstone project. Checked JSON is valid.
* Create ICLRandD.md
* Fix indentation (tabs to spaces)

  It looks like during validation, the JSON file automatically changed spaces to tabs. This caused the diff to show *everything* as changed, which is obviously not true. This hopefully fixes that.
* Try to fix formatting for diff
* Fix diff

Co-authored-by: Ines Montani <[email protected]>
update lang/zh (explosion#4103)
* update lang/zh * update lang/zh
CLI scripts for entity linking (wikipedia & generic) (explosion#4091)
* document token ent_kb_id
* document span kb_id
* update pipeline documentation
* prior and context weights as bool's instead
* entitylinker api documentation
* drop for both models
* finish entitylinker documentation
* small fixes
* documentation for KB
* candidate documentation
* links to api pages in code
* small fix
* frequency examples as counts for consistency
* consistent documentation about tensors returned by predict
* add entity linking to usage 101
* add entity linking infobox and KB section to 101
* entity-linking in linguistic features
* small typo corrections
* training example and docs for entity_linker
* predefined nlp and kb
* revert back to similarity encodings for simplicity (for now)
* set prior probabilities to 0 when excluded
* code clean up
* bugfix: deleting kb ID from tokens when entities were removed
* refactor train el example to use either model or vocab
* pretrain_kb example for example kb generation
* add to training docs for KB + EL example scripts
* small fixes
* error numbering
* ensure the language of vocab and nlp stay consistent across serialization
* equality with =
* avoid conflict in errors file
* add error 151
* final adjustments to the train scripts - consistency
* update of goldparse documentation
* small corrections
* push commit
* turn kb_creator into CLI script (wip)
* proper parameters for training entity vectors
* wikidata pipeline split up into two executable scripts
* remove context_width
* move wikidata scripts in bin directory, remove old dummy script
* refine KB script with logs and preprocessing options
* small edits
* small improvements to logging of EL CLI script
Correction of default lemmatizer lookup in English (Issue # 4104) (explosion#4110)
* pytest file for issue4104 established
* edited default lookup english lemmatizer for spun; fixes issue 4102
* eliminated parameterization and sorted dictionary dependency in issue 4104 test
* added contributor agreement
Update to match latest explosion/srsly#9
The way gzipped json is loaded/saved in srsly changed a bit.
Only compress language data if necessary
If a .json.gz file exists and is newer than the corresponding json file, it's not recompressed.
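The freshness check can be sketched with file modification times. This is an illustration under stated assumptions, not the build code from the commit: the helper names are hypothetical, and the `.gz` file is assumed to sit next to the json file.

```python
import gzip
import shutil
from pathlib import Path


def needs_compression(json_path, gz_path):
    """True unless gz_path exists and is strictly newer than json_path."""
    if not gz_path.exists():
        return True
    return gz_path.stat().st_mtime <= json_path.stat().st_mtime


def compress_if_needed(json_path):
    """Gzip json_path to json_path + '.gz'; return True if work was done."""
    json_path = Path(json_path)
    gz_path = json_path.with_name(json_path.name + ".gz")
    if not needs_compression(json_path, gz_path):
        return False
    with json_path.open("rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return True
```

Comparing modification times is cheap but assumes timestamps are trustworthy; a content hash would be a more robust (and slower) alternative.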
Move en/el language data to json
This only affected files >500kb, which was nouns for both languages and the generic lookup table for English.
Remove empty files in Norwegian tokenizer
It's unclear why, but the Norwegian (nb) tokenizer had empty files for adj/adv/noun/verb lemmas. This may have been a result of copying the structure of the English lemmatizer. This removed the files, but still creates the empty sets in the lemmatizer. That may not actually be necessary.
Remove dubious entries in English lookup.json
" furthest" and " skilled" - both prefixed with a space - were in the English lookup table. That seems obviously wrong so I have removed them.
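Entries like the space-prefixed keys removed from the English lookup table can be found mechanically. The sketch below assumes the lookup table is a flat dict of strings; the function names and the sample lemma values are invented for the example.

```python
def find_padded_keys(table):
    """Return keys that have leading or trailing whitespace."""
    return sorted(key for key in table if key != key.strip())


def drop_padded_keys(table):
    """Return a copy of the table without whitespace-padded keys."""
    return {key: value for key, value in table.items() if key == key.strip()}


# Sample data: the lemma values here are placeholders, not real entries.
sample = {" furthest": "x", " skilled": "y", "spun": "spin"}
suspicious = find_padded_keys(sample)
cleaned = drop_padded_keys(sample)
```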
Fix small issues with en/fr lemmatizers
The en tokenizer was including the removed _nouns.py file, so that's removed. The fr tokenizer is unusual in that it has a lemmatizer directory with both __init__.py and lemmatizer.py. lemmatizer.py had not been converted to load the json language data, so that was fixed.