OpenDiT
Overview:
OpenDiT is an open-source project providing a high-performance implementation of the Diffusion Transformer (DiT) based on Colossal-AI. It is designed to improve the training and inference efficiency of DiT applications, including text-to-video and text-to-image generation. OpenDiT achieves its performance gains through the following techniques:

* Up to 80% speedup and 50% memory reduction on GPU;
* Core kernel optimizations including FlashAttention, Fused AdaLN, and Fused LayerNorm;
* Hybrid parallelism methods such as ZeRO, Gemini, and DDP, along with sharding of the EMA model to further reduce memory cost;
* FastSeq: a novel sequence parallelism method particularly suited to workloads like DiT, where activations are large but parameters are small; single-node sequence parallelism can save up to 48% in communication cost, break through the memory limit of a single GPU, and shorten overall training and inference time;
* Significant performance improvements with minimal code modifications;
* No need for users to understand the implementation details of distributed training;
* Complete text-to-image and text-to-video generation workflows;
* Workflows that researchers and engineers can easily use and adapt to real-world applications without modifying the parallelism part;
* Training on ImageNet for text-to-image generation, with released checkpoints.
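To make the "Fused AdaLN" item above concrete, here is the unfused adaptive LayerNorm modulation that DiT blocks compute: a conditioning vector (e.g. timestep plus class embedding) regresses a per-channel scale and shift applied after normalization. This is a minimal illustrative sketch in plain PyTorch; the class name and shapes are ours, not OpenDiT's actual API, which fuses the normalization and modulation into a single GPU kernel for speed.

```python
import torch
import torch.nn as nn

class AdaLNModulation(nn.Module):
    """Unfused reference for the adaptive LayerNorm used in DiT blocks.
    A fused kernel (as in OpenDiT) computes the same math in one pass."""
    def __init__(self, hidden_size: int):
        super().__init__()
        # elementwise_affine=False: scale/shift come from the condition instead
        self.norm = nn.LayerNorm(hidden_size, elementwise_affine=False)
        self.to_scale_shift = nn.Linear(hidden_size, 2 * hidden_size)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, hidden); cond: (batch, hidden)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

block = AdaLNModulation(hidden_size=64)
x = torch.randn(2, 16, 64)    # 2 samples, 16 patch tokens
cond = torch.randn(2, 64)     # timestep/class conditioning
print(block(x, cond).shape)   # torch.Size([2, 16, 64])
```

Fusing these three elementwise/normalization steps matters for DiT precisely because activations dominate memory traffic; a fused kernel avoids materializing the intermediate normalized tensor.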
Target Users:
Researchers and engineers who want to improve the training and inference efficiency of DiT applications, including text-to-video and text-to-image generation.
Total Visits: 474.6M
Top Region: US (19.34%)
Website Views: 130.8K
Features
Fast and efficient DiT training and inference
FlashAttention, Fused AdaLN, and Fused LayerNorm core optimizations
ZeRO, Gemini, and DDP mixed parallelism methods
FastSeq: A novel sequence parallelism method (see the sketch after this list)
Complete text-to-image and text-to-video generation workflows
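As promised above, here is a minimal sketch of the all-gather idea behind sequence parallelism for attention, assuming each rank holds one contiguous chunk of the token sequence. This is illustrative PyTorch, not OpenDiT's FastSeq code: the actual implementation also overlaps the gather communication with computation asynchronously, which is where the reported communication savings come from; that overlap is omitted here.

```python
import torch
import torch.distributed as dist

def sequence_parallel_attention(q, k, v, group=None):
    """Each rank keeps its local queries but all-gathers keys/values,
    so every rank attends over the full sequence while storing only
    its own activation chunk. Shapes: (local_tokens, heads, head_dim).
    Run under torchrun after dist.init_process_group(...)."""
    world = dist.get_world_size(group)
    k_parts = [torch.empty_like(k) for _ in range(world)]
    v_parts = [torch.empty_like(v) for _ in range(world)]
    dist.all_gather(k_parts, k, group=group)
    dist.all_gather(v_parts, v, group=group)
    k_full = torch.cat(k_parts, dim=0)
    v_full = torch.cat(v_parts, dim=0)
    # standard scaled dot-product attention over the gathered sequence
    attn = torch.einsum("qhd,khd->hqk", q, k_full) / q.shape[-1] ** 0.5
    return torch.einsum("hqk,khd->qhd", attn.softmax(dim=-1), v_full)
```

The design fits DiT's profile: activations (the gathered k/v) are large while parameters are small, so gathering along the sequence dimension and sharding activations beats replicating them, provided the communication can be hidden behind compute.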