
# HybridVocab: Towards Multi-Modal Machine Translation via Multi-Aspect Alignment [Paper]

## Step 1: Requirements

- Build the running environment (either of two ways):

  1. `pip install --editable .`
  2. `python setup.py build_ext --inplace`

- Install the syntax parser (a quick sanity check follows this list):

  ```
  pip install stanza==1.2.2 stanza_batch==0.2.2
  ```

- Install pytorch==1.7.0, torchvision==0.8.0, and cudatoolkit=10.1 (installing via pip also works):

  ```
  conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=10.1 -c pytorch
  ```
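
To check that the parser works, here is a minimal sketch using the standard stanza API (the example sentence is illustrative, and the first run downloads the English models):

```python
import stanza

# One-time download of the English models (cached afterwards).
stanza.download("en")

# Pipeline with tokenizer, tagger, lemmatizer, and dependency parser.
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

doc = nlp("A man is playing a guitar on the street.")
for sent in doc.sentences:
    for word in sent.words:
        # Each word carries its dependency relation to its head.
        print(word.text, word.deprel)
```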

## Step 2: Data Preparation

The dataset used in this work is Multi30K. Both the original version and the preprocessed version used in this work are available here.

You can also download your own dataset and then refer to `experiments/prepare-iwslt14.sh` or `experiments/prepare-wmt14en2de.sh` to preprocess it.

| File Name | Description | Download |
| --- | --- | --- |
| resnet50-avgpool.npy | Pre-extracted image features; each image is represented as a 2048-dimensional vector. | Link |
| Multi30K EN-DE Task | BPE+TOK text, image index, and labels for the English-German task (including train, val, test2016/17/mscoco). | Link |
| Multi30K EN-FR Task | BPE+TOK text, image index, and labels for the English-French task (including train, val, test2016/17/mscoco). | Link |
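
If you want to sanity-check the downloaded features, here is a minimal sketch (assuming `resnet50-avgpool.npy` stores one 2048-dimensional vector per image, as described above; the exact row count depends on the split):

```python
import numpy as np

# Pre-extracted ResNet-50 average-pooled features, one row per image.
features = np.load("resnet50-avgpool.npy")
print(features.shape)  # expected (num_images, 2048)

# A sentence pair's image index (from the image-index files above)
# selects its visual representation:
# vec = features[image_index]
```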

## Step 3: Running the Code

You can run this code via the scripts in the `experiments` directory.

1. Preprocess the dataset into binarized torch format:

   ```
   bash pre.sh
   ```

2. Train the model:

   ```
   bash train.sh
   ```

3. Generate target sentences:

   ```
   bash gen.sh
   ```
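
This codebase follows fairseq conventions, so `gen.sh` presumably wraps `fairseq-generate`; the sketch below pulls hypotheses out of such a log for scoring (assuming the standard `H-<id>` output lines; the log file name `gen.out` is hypothetical):

```python
# Collect hypotheses from a fairseq-generate style log.
# Hypothesis lines have the form: H-<id>\t<score>\t<text>
hyps = {}
with open("gen.out", encoding="utf-8") as f:
    for line in f:
        if line.startswith("H-"):
            idx, _score, text = line.rstrip("\n").split("\t", 2)
            hyps[int(idx[2:])] = text

# Restore the original sentence order before computing BLEU.
for i in sorted(hyps):
    print(hyps[i])
```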

## Citation

If you use the code in your research, please cite:

```bibtex
@inproceedings{peng2022hybridvocab,
    title={HybridVocab: Towards Multi-Modal Machine Translation via Multi-Aspect Alignment},
    author={Peng, Ru and Zeng, Yawen and Zhao, Junbo},
    booktitle={Proceedings of the 2022 International Conference on Multimedia Retrieval},
    pages={380--388},
    year={2022}
}
```