Table Transformer (TATR)

A deep learning model based on object detection for extracting tables from PDFs and images.

First proposed in "PubTables-1M: Towards comprehensive table extraction from unstructured documents".

This repository also contains the official code for these papers:

Note: If you are looking to use Table Transformer to extract your own tables, here are some helpful things to know:

TATR can be trained to work well across many document domains, and everything needed to train your own model is included here. At the moment, however, pre-trained model weights are available only for TATR trained on the PubTables-1M dataset. (See the additional documentation for how to train your own multi-domain model.)
TATR is an object detection model that recognizes tables from image input. The inference code built on TATR needs text extraction (from OCR or directly from PDF) as a separate input in order to include text in its HTML or CSV output.
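Because the model itself returns only bounding boxes, text has to be matched to those boxes in a separate step. The snippet below is a minimal sketch of one way to do that, not the repository's inference code. It assumes a hypothetical word format (a list of dicts with `bbox` as `[x0, y0, x1, y1]` and `text`), which your OCR engine or PDF parser would need to be adapted to, and the helper names `overlap_fraction` and `words_in_box` are illustrative only.

```python
def overlap_fraction(inner, outer):
    """Fraction of `inner`'s area that lies inside `outer`.

    Both boxes are [x0, y0, x1, y1] in the same pixel coordinate system.
    """
    x0 = max(inner[0], outer[0])
    y0 = max(inner[1], outer[1])
    x1 = min(inner[2], outer[2])
    y1 = min(inner[3], outer[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return inter / area if area > 0 else 0.0


def words_in_box(words, box, min_overlap=0.5):
    """Join the text of all words that mostly fall inside `box`.

    `words` uses an assumed format: [{"bbox": [x0, y0, x1, y1], "text": str}, ...].
    Words are ordered top-to-bottom, then left-to-right, which is a
    simplification that ignores slightly misaligned baselines.
    """
    hits = [w for w in words if overlap_fraction(w["bbox"], box) >= min_overlap]
    hits.sort(key=lambda w: (w["bbox"][1], w["bbox"][0]))
    return " ".join(w["text"] for w in hits)


# Hypothetical OCR output for illustration only.
words = [
    {"bbox": [10, 10, 50, 20], "text": "Balance"},
    {"bbox": [60, 10, 100, 20], "text": "1,200"},
    {"bbox": [300, 300, 340, 310], "text": "Other"},
]
cell_text = words_in_box(words, [0, 0, 120, 30])
```

The 0.5 overlap threshold is an arbitrary starting point; a word straddling two cells is assigned to whichever cell contains at least half of its area.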

Additional information about this project for both users and researchers, including data, training, evaluation, and inference code, is provided below.

Installation

Using pip

Run the following command:

pip3 install -r requirements.txt

Using conda

Create a conda environment from the yml file and activate it as follows:

conda env create -f environment.yml
conda activate tables-detr

Download Model

To download the model weights trained on e-statement documents, run this command:

bash download_models.sh

Inference

To run inference, use the following commands:

cd src/
bash simple_inference.sh

Result example:

{
    "result": [
        {
            "table_bbox": [
                32,
                479,
                1206,
                1550
            ],
            "objects": [
                {
                    "label": "table column",
                    "score": 0.999552309513092,
                    "bbox": [
                        179,
                        482,
                        380,
                        1535
                    ]
                },
                {
                    "label": "table row",
                    "score": 0.9950479865074158,
                    "bbox": [
                        43,
                        614,
                        1195,
                        659
                    ]
                }
            ]
        }
    ]
}
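The result lists row and column boxes separately, so cell boxes have to be derived from them. The following is a minimal sketch of that step, assuming the JSON shape shown in the example above; the function name `cell_grid` and the `min_score` parameter are illustrative and not part of the repository. Each cell is taken as the intersection of one row box with one column box.

```python
def cell_grid(table, min_score=0.5):
    """Build a 2D grid of cell boxes from detected rows and columns.

    `table` is one entry of result["result"], in the shape shown above.
    Returns grid[row_index][col_index] = [x0, y0, x1, y1].
    """
    objects = [o for o in table["objects"] if o["score"] >= min_score]
    # Sort rows top-to-bottom and columns left-to-right.
    rows = sorted(
        (o["bbox"] for o in objects if o["label"] == "table row"),
        key=lambda b: b[1],
    )
    cols = sorted(
        (o["bbox"] for o in objects if o["label"] == "table column"),
        key=lambda b: b[0],
    )
    # A cell spans the column's x-range and the row's y-range.
    return [[[c[0], r[1], c[2], r[3]] for c in cols] for r in rows]


# The example result from above, with scores shortened.
result = {
    "result": [
        {
            "table_bbox": [32, 479, 1206, 1550],
            "objects": [
                {"label": "table column", "score": 0.9995, "bbox": [179, 482, 380, 1535]},
                {"label": "table row", "score": 0.9950, "bbox": [43, 614, 1195, 659]},
            ],
        }
    ]
}

grid = cell_grid(result["result"][0])
print(grid)
```

This simple intersection ignores any other object labels the model may emit, such as spanning cells or headers, so treat it as a starting point rather than a complete structure reconstruction.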

Training

Training instructions are a work in progress (WIP).
