
    Repositories list

    • newswire

      Public
      Python
      Updated Aug 15, 2024
    • A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning
      Python
      GNU General Public License v3.0
      Updated Jun 12, 2024
    • Efficient OCR for Building a Diverse Digital History
      Python
      Apache License 2.0
      Updated Apr 12, 2024
    • Python package for News Deja Vu
      Python
      MIT License
      Updated Apr 9, 2024
    • The official GitHub repository for the American Stories dataset, as in {link}
      Python
      Updated Mar 7, 2024
    • Quantifying Character Similarity with Vision Transformers
      Python
      Updated Oct 27, 2023
    • An efficient and useful tool to fuzzy match Japanese, Korean, Simplified Chinese or Traditional Chinese words.
      Python
      MIT License
      Updated Oct 13, 2023
    • Associating layout elements from newspapers into full articles
      Updated Sep 15, 2023
    • DPR

      Public
      Dense Passage Retriever (DPR) is a set of tools and models for the open-domain Q&A task.
      Python
      Other
      Updated Aug 15, 2023
    • Python
      Updated Aug 6, 2023
    • clippings

      Public
      The official implementation (English) of the paper "Linking Representations with Multimodal Contrastive Learning": https://arxiv.org/abs/2304.03464
      Python
      Updated Jun 20, 2023
    • effsynth

      Public
      Python
      Updated Jun 18, 2023
    • NEWS-COPY

      Public
      Noise-robust de-duplication at scale
      Python
      Updated Apr 9, 2023
    • effocr

      Public
      A model(ing framework) for sample-efficient OCR
      Python
      Updated Apr 7, 2023
    • The official implementation (Japanese) of the paper "Linking Representations with Multimodal Contrastive Learning": https://arxiv.org/abs/2304.03464
      Python
      Updated Apr 4, 2023
    • nnsplit

      Public
      Semantic text segmentation. For sentence boundary detection, compound splitting and more.
      Rust
      MIT License
      Updated Mar 5, 2023
    • Python
      Updated Feb 23, 2023
    • BertGCN

      Public
      Python
      Updated Aug 26, 2022
    • HJDataset

      Public
      A Large Dataset of Historical Japanese Documents with Complex Layouts
      Jupyter Notebook
      Updated Jul 22, 2022
    • Applies Needleman-Wunsch algorithm to sequences of strings using Levenshtein distance as a scoring metric.
      Jupyter Notebook
      Updated Jun 22, 2022
    • OpenMMLab Detection Toolbox and Benchmark
      Python
      Apache License 2.0
      Updated Oct 30, 2021
    • EasyOCR

      Public
      Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
      Python
      Apache License 2.0
      Updated Aug 20, 2021
    • Jupyter Notebook
      MIT License
      Updated Jun 23, 2021
    • cocosplit

      Public
      Simple tool to split COCO annotations into train/test datasets.
      Python
      Updated Dec 6, 2020
    • Label Studio is a multi-type data labeling and annotation tool with standardized output format
      JavaScript
      Apache License 2.0
      Updated Sep 11, 2020
    • Repo for the cutter incident project
      Stata
      Updated Jul 8, 2020
    • A fork of Detectron2 with ResNeSt backbone
      Python
      Apache License 2.0
      Updated Jun 6, 2020
    • Scrape reviews from Glassdoor
      Python
      BSD 2-Clause "Simplified" License
      Updated Jun 1, 2020
    • js-dataverse

      Public archive
      A JavaScript/TypeScript module for Dataverse
      TypeScript
      MIT License
      Updated Sep 16, 2019