OCR-ensemble

Some results

https://colab.research.google.com/drive/1hKu8q2SH80baCj-0IRBb9rLDSgBaU1w7#scrollTo=C9v0iNYVJO6Y

Installation

Follow these steps to set up the environment and install the required dependencies using conda.

Prerequisites

  • Python 3.9
  • PyTorch (GPU version)
  • PaddleOCR

Installing Dependencies

  1. Clone the repository:
git clone git@github.com:LAION-AI/OCR-ensemble.git
cd OCR-ensemble
  2. Create a conda virtual environment (optional, but recommended):
conda create -n your-env-name python=3.9
conda activate your-env-name
  3. Install PyTorch (GPU version) by following the instructions on the official website. Make sure to choose the build that matches your system; the command below uses pip with the CUDA 11.7 wheels:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
  4. Install paddlepaddle by following the instructions on the official GitHub repository. To install the GPU version, the following commands may help:

Linux

python -m pip install paddlepaddle-gpu -i https://pypi.tuna.tsinghua.edu.cn/simple

Windows

python -m pip install paddlepaddle-gpu==2.4.2.post117 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
  5. Install the remaining required packages from the requirements.txt file:
pip install -r requirements.txt
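
After installation, a quick sanity check can confirm that the main packages are importable. The script below is a minimal sketch and not part of this repository. It assumes the usual import names `torch` (PyTorch), `paddle` (paddlepaddle) and `paddleocr` (PaddleOCR); the helper names `check_environment` and `cuda_available` are made up for illustration.

```python
import importlib.util

# Assumed import names for the packages installed above:
# torch (PyTorch), paddle (paddlepaddle), paddleocr (PaddleOCR).
REQUIRED_MODULES = ("torch", "paddle", "paddleocr")


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def cuda_available():
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the script also runs without PyTorch

    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
    print(f"CUDA visible to PyTorch: {cuda_available()}")
```

If a package is reported as missing, rerun the corresponding installation step inside the activated conda environment.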

Overview

  1. Classify each document by the type of text it contains
  2. Use an expert from an ensemble of existing OCR and layout-parsing models to get the text and its bounding boxes, then concatenate that to the caption
  3. If there is no original caption, as with screenshots of websites and books, generate a caption and concatenate it with the OCR results
  4. Use this dataset to train CLIP with character-level tokenization

We are currently working on Step 2.
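
Steps 2 to 4 of the overview can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the repository's implementation. The `OcrLine` structure, the caption layout (`| OCR: ...`), the placeholder fallback caption and the use of Unicode code points as character tokens are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OcrLine:
    """One recognized text line (hypothetical structure, not the repo's type)."""

    text: str
    bbox: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def build_caption(
    original: Optional[str],
    lines: List[OcrLine],
    fallback: str = "an image containing text",
) -> str:
    """Concatenate OCR text and bounding boxes to a caption.

    If there is no original caption, `fallback` stands in for a generated
    one (a placeholder here; a real pipeline would produce it with a model).
    """
    base = original.strip() if original and original.strip() else fallback
    # Simple reading order: top to bottom, then left to right.
    ordered = sorted(lines, key=lambda line: (line.bbox[1], line.bbox[0]))
    parts = [f"{line.text} [{','.join(map(str, line.bbox))}]" for line in ordered]
    if not parts:
        return base
    return base + " | OCR: " + " ; ".join(parts)


def char_tokenize(caption: str) -> List[int]:
    """Character-level tokenization: one Unicode code point per character."""
    return [ord(ch) for ch in caption]


if __name__ == "__main__":
    ocr = [OcrLine("World", (50, 10, 90, 20)), OcrLine("Hello", (5, 10, 45, 20))]
    caption = build_caption(None, ocr)
    print(caption)
    print(char_tokenize(caption)[:5])
```

Keeping the bounding boxes in the caption text is only one possible encoding; the boxes could equally be dropped or stored separately, depending on what the training setup expects.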

Pipeline: 2 Passes

  1. Classify images to determine text types
  2. Expert models process filtered images
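
The two passes above could be wired together along these lines. The sketch assumes a classifier and the expert models are available as plain callables. The function `run_two_pass`, the type aliases and the label names in the usage example are hypothetical and do not reflect the repository's actual interfaces.

```python
from typing import Callable, Dict, Iterable, List

Classifier = Callable[[str], str]  # image path -> predicted text-type label
Expert = Callable[[str], str]      # image path -> recognized text


def run_two_pass(
    images: Iterable[str],
    classify: Classifier,
    experts: Dict[str, Expert],
) -> Dict[str, str]:
    """Route each image to the expert registered for its text type."""
    # Pass 1: group images by their predicted text type.
    buckets: Dict[str, List[str]] = {}
    for path in images:
        buckets.setdefault(classify(path), []).append(path)

    # Pass 2: each expert processes only the images in its own bucket.
    results: Dict[str, str] = {}
    for label, paths in buckets.items():
        expert = experts.get(label)
        if expert is None:
            continue  # no expert for this type, so these images are filtered out
        for path in paths:
            results[path] = expert(path)
    return results


if __name__ == "__main__":
    # Stub classifier and experts, standing in for real models.
    def classify(path: str) -> str:
        return "printed" if "doc" in path else "none"

    experts = {"printed": lambda path: f"text from {path}"}
    print(run_two_pass(["doc1.png", "cat.png"], classify, experts))
```

Grouping by label before running the experts means each expert model only needs to be loaded once per batch, which matters when the experts are large GPU models.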

Candidate Expert Models
