EasyContext
Overview
EasyContext is an open-source project aimed at training language models with a context length of 1 million tokens on ordinary hardware. Rather than proposing new techniques, it demonstrates how far existing ones reach when combined: sequence parallelism, DeepSpeed ZeRO-3 offloading, Flash Attention with a fused cross-entropy kernel, and activation checkpointing. Using this recipe, it has trained Llama-2-7B to a 700K-token context on 8 A100 GPUs and Llama-2-13B to a 1M-token context on 16 A100 GPUs.
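As a hedged illustration of how these pieces fit together, the sketch below wires ZeRO-3 CPU offloading, Flash Attention, and activation checkpointing around a Hugging Face Llama model. The model name, learning rate, and config values are illustrative assumptions, not EasyContext's actual training scripts.

```python
# Minimal sketch of combining the techniques EasyContext relies on.
# Model name, batch size, and config values are illustrative assumptions,
# not taken from EasyContext's own scripts.
import deepspeed
import torch
from transformers import AutoModelForCausalLM

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {
        "stage": 3,                               # ZeRO-3: shard params, grads, optimizer states
        "offload_optimizer": {"device": "cpu"},   # push optimizer states to CPU RAM
        "offload_param": {"device": "cpu"},       # push parameters to CPU RAM
    },
}

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # Flash Attention: avoids the full attention matrix
)
model.gradient_checkpointing_enable()          # activation checkpointing: recompute instead of store

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```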
Target Users
Researchers and engineers who train language models with extra-long contexts.
Use Cases
Training the Llama-2-7B model on 8 A100 GPUs with EasyContext, reaching a context length of 700K tokens.
Training the Llama-2-13B model on 16 A100 GPUs with EasyContext, reaching a context length of 1M tokens.
By combining existing techniques, EasyContext substantially extends the context length of language models, laying groundwork for applications such as video generation.
Features
Sequence Parallelism (see the first sketch after this list)
DeepSpeed ZeRO-3 Offloading
Flash Attention and a fused cross-entropy kernel (see the second sketch after this list)
Activation Checkpointing
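EasyContext's sequence parallelism builds on ring-attention-style kernels. The simplified sketch below (the helper function is hypothetical, not EasyContext's API) shows only the input-sharding idea: each rank keeps one slice of the sequence, so per-GPU activation memory shrinks roughly in proportion to the number of GPUs.

```python
# Simplified sketch of sequence-parallel input sharding (hypothetical helper,
# not EasyContext's API). Each rank processes one contiguous slice of the
# sequence, so activation memory per GPU scales as seq_len / world_size.
import torch
import torch.distributed as dist

def shard_sequence(input_ids: torch.Tensor) -> torch.Tensor:
    """Return this rank's contiguous slice of a [batch, seq_len] tensor."""
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    seq_len = input_ids.shape[1]
    assert seq_len % world_size == 0, "sequence length must divide evenly"
    chunk = seq_len // world_size
    return input_ids[:, rank * chunk : (rank + 1) * chunk]
```

Attention still requires every token to attend to all earlier tokens, so the real implementation passes key/value blocks around the ranks in a ring rather than computing attention independently on each shard.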
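The fused cross-entropy kernel addresses a different bottleneck: at a 1M-token sequence, materializing the full [seq_len, vocab] logits tensor at once would dominate memory. The plain-PyTorch sketch below is an approximation of the chunking idea, not the fused CUDA kernel itself.

```python
# Sketch of the memory motivation behind a fused/chunked cross-entropy:
# never materialize the full [seq_len, vocab] logits tensor at once.
# A plain-PyTorch approximation; the real fused kernel also recomputes
# chunk logits in backward so they are not retained between chunks.
import torch
import torch.nn.functional as F

def chunked_cross_entropy(hidden: torch.Tensor,   # [seq_len, hidden_dim]
                          lm_head: torch.Tensor,  # [vocab, hidden_dim]
                          labels: torch.Tensor,   # [seq_len]
                          chunk_size: int = 4096) -> torch.Tensor:
    total, count = hidden.new_zeros(()), 0
    for start in range(0, hidden.shape[0], chunk_size):
        h = hidden[start : start + chunk_size]
        logits = h @ lm_head.T   # only one chunk of logits is allocated here
        total = total + F.cross_entropy(
            logits, labels[start : start + chunk_size], reduction="sum"
        )
        count += h.shape[0]
    return total / count
```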