ProLong

This is the homepage for ProLong (Princeton long-context language models).

ProLong is a family of long-context models that are continued trained and supervised fine-tuned from Llama-3-8B, with a maximum context window of 512K tokens. Our main ProLong model is one of the best-performing long-context models at the 10B scale (evaluated by HELMET).

To train this strong long-context model, we conduct thorough ablations on the long-context pre-training data, SFT data, and numerous other design choices. We demonstrate our findings in our paper, How to Train Long-Context Language Models (Effectively).

Authors: Tianyu Gao*, Alexander Wettig*, Howard Yen, Danqi Chen (* equal contribution)

Release Progress

ProLong models
ProLong data
Pre-training and SFT code
Sequence parallelism

Model card

Here are some quick facts about our main ProLong model: princeton-nlp/Llama-3-8B-ProLong-512k-Instruct.

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Long-context continued training: 20B tokens on 64K training data, and 20B tokens on 512K training data
Supervised fine-tuning (SFT): UltraChat
Maximum context window: 512K tokens

ProLong performance on HELMET averaged over 32K, 64K, and 128K lengths. All models are instruct models.

Download the models and data

All ProLong models are available on Hugging Face. All the models are based on Llama-3-8B, so any code that supports Llama-3-8B is also compatible with ProLong models.

Model	HF Link
ProLong-64k-Base	princeton-nlp/Llama-3-8B-ProLong-64k-Base
ProLong-64k-Instruct	princeton-nlp/Llama-3-8B-ProLong-64k-Instruct
ProLong-512k-Base	princeton-nlp/Llama-3-8B-ProLong-512k-Base
⭐ ProLong-512k-Instruct	princeton-nlp/Llama-3-8B-ProLong-512k-Instruct

Our training data are also available on Hugging Face.

Data	HF Link
Stage 1: 64K training data	princeton-nlp/prolong-data-64K
Stage 2: 512K training data	princeton-nlp/prolong-data-512K

How to train ProLong

ProLong training recipe.

Coming soon!

Contact

Please email Tianyu ([email protected]) or Alex ([email protected]) if you have any questions. If you encounter any issues with the code, models, or data, please open an issue on GitHub.

Citation

@article{gao2024prolong,
    title={Enabling Large Language Models to Generate Text with Citations},
    author={Gao, Tianyu and Wettig, Alexander and Yen, Howard and Chen, Danqi},
    year={2024},
}

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
training		training
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
train_512K.sh		train_512K.sh
train_64K.sh		train_64K.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ProLong

Release Progress

Model card

Download the models and data

How to train ProLong

Contact

Citation

About

Contributors 2

Languages

License

princeton-nlp/ProLong

Folders and files

Latest commit

History

Repository files navigation

ProLong

Release Progress

Model card

Download the models and data

How to train ProLong

Contact

Citation

About

Resources

License

Stars

Watchers

Forks

Contributors 2

Languages