# TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

Liang Zhang\*, Anwen Hu\*, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin†, Ji Zhang, Fei Huang

\* Equal Contribution † Corresponding Author



## Spotlights

- Support chart question answering with both simple direct answers and step-by-step Python programs.
- Support chart-to-table extraction, chart summary generation, and chart redrawing.
- Open source:
  - ✅ Model: TinyChart
  - ✅ Inference code
  - ✅ Code for launching a local demo
  - ✅ Online demo on HuggingFace
  - ✅ Evaluation code
  - ✅ Training data and code
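With Program-of-Thoughts answering, the model emits a short Python program whose execution produces the answer, rather than predicting the answer text directly. A minimal sketch of the execution step, where the program string is a hypothetical stand-in for model output:

```python
# Hypothetical stand-in for a model-generated Program-of-Thoughts:
# the model writes Python that computes the answer from values it
# read off the chart.
generated_program = """
values = {"2020": 12, "2021": 18, "2022": 27}
answer = max(values, key=values.get)
"""

scope = {}
exec(generated_program, scope)  # run the generated program
print(scope["answer"])          # prints: 2022
```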

## Examples

*(example figure)*

## Online Demo

🤗 Huggingface Space

## Models

| Model Card | Download Link |
| --- | --- |
| TinyChart@768 | 🤗 mPLUG/TinyChart-3B-768 <br> 🤖 iic/TinyChart-3B-768 |
| TinyChart@768-SigLIP | 🤗 mPLUG/TinyChart-3B-768-siglip <br> 🤖 iic/TinyChart-3B-768-siglip |

Note that to use TinyChart@768, you should load the vision transformer with token merging from TinyChart@768-SigLIP. If you download the models to a local directory, you should change `mm_vision_tower` in the `config.json` of TinyChart-3B-768 so that it can find TinyChart-3B-768-siglip.
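Concretely, that is a one-key edit of `config.json`. A minimal sketch; it rewrites a throwaway demo config here, but in practice `cfg_path` would point at the `config.json` inside your downloaded TinyChart-3B-768 directory:

```python
import json
import tempfile
from pathlib import Path

# Demo fixture standing in for the downloaded TinyChart-3B-768/config.json.
root = Path(tempfile.mkdtemp())
cfg_path = root / "config.json"
cfg_path.write_text(json.dumps({"mm_vision_tower": "mPLUG/TinyChart-3B-768-siglip"}))

# Point mm_vision_tower at the local TinyChart-3B-768-siglip directory.
local_vit = root / "TinyChart-3B-768-siglip"
cfg = json.loads(cfg_path.read_text())
cfg["mm_vision_tower"] = str(local_vit)
cfg_path.write_text(json.dumps(cfg, indent=2))
```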

## Quick Start

You can load the model with the following code:

```python
from tinychart.model.builder import load_pretrained_model
from tinychart.mm_utils import get_model_name_from_path

model_path = "mPLUG/TinyChart-3B-768"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
    device="cuda"
)
```

## Model Inference

We provide an example script to perform inference in `inference.ipynb`.

## Model Training & Evaluation

### Data Preparation

The training and evaluation data of TinyChart are released at 🤗 mPLUG/TinyChartData. Samples whose ids contain `templatepot` or `gptpot` form the two subsets of the proposed ChartQA-PoT dataset. To perform training and evaluation, download and organize the data directory as follows:

```
data
├── tinychart_images
├── train.json
└── test.json
```
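If you want to inspect the two ChartQA-PoT subsets separately, they can be split out of the training annotations by id substring. A minimal sketch over toy records (the real entries come from `train.json`):

```python
# Toy records standing in for entries of train.json; the two ChartQA-PoT
# subsets are marked by the substrings "templatepot" and "gptpot" in the id.
samples = [
    {"id": "templatepot_0001"},
    {"id": "gptpot_0001"},
    {"id": "chartqa_0001"},
]

template_pot = [s for s in samples if "templatepot" in s["id"]]
gpt_pot = [s for s in samples if "gptpot" in s["id"]]
```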

Then download bczhou/TinyLLaVA-3.1B-SigLIP into `pretrained_models`, and run the following script to add the token-merging arguments. Note that this script changes the model's `config.json` in place, so back it up in advance.

```shell
python scripts/vit_add_tome.py --path pretrained_models/TinyLLaVA-3.1B-SigLIP
```

After that, run the following script to start training. It will automatically load the last checkpoint to perform evaluation.

```shell
bash scripts/train.sh
```

## Local Demo

You can run a local demo with the following script:

```shell
python app.py --model-path <your_model_path>
```

## Citation

If you find this work useful, consider giving this repository a star ⭐️ and citing 📝 our paper as follows:

```bibtex
@misc{zhang2024tinychart,
    title={TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning},
    author={Liang Zhang and Anwen Hu and Haiyang Xu and Ming Yan and Yichen Xu and Qin Jin and Ji Zhang and Fei Huang},
    year={2024},
    eprint={2404.16635},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

## Acknowledgement

The code is based on TinyLLaVA, LLaVA, and ToMe. Thanks for these great works and for open-sourcing them!