torchao
Overview:
torchao is a PyTorch library focused on custom data types and optimization, supporting quantization and sparsification of weights, gradients, optimizers, and activations for both inference and training. It is compatible with torch.compile() and FSDP2, enabling acceleration for most PyTorch models. torchao aims to improve model inference speed and memory efficiency while minimizing accuracy loss, through techniques such as Post Training Quantization (PTQ) and Quantization Aware Training (QAT).
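To make the quantization idea concrete, here is a minimal pure-Python sketch of symmetric int8 weight-only quantization, the basic mechanism behind PTQ. This is a toy illustration of the math only, not torchao's actual implementation, which operates on whole tensors with optimized kernels:

```python
# Toy sketch of symmetric int8 weight-only quantization.
# Each float weight is mapped to an integer in [-127, 127] plus a
# shared scale factor, cutting storage from 4 bytes (fp32) to 1 byte
# per weight at the cost of a small rounding error.

def quantize_int8(weights):
    """Map float weights to int8 values plus a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [qi * scale for qi in q]

weights = [0.12, -0.53, 0.98, -0.07]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)

# The round-trip error per weight is bounded by half a quantization step.
max_err = max(abs(a - w) for a, w in zip(approx, weights))
assert all(-127 <= qi <= 127 for qi in q)
assert max_err <= scale / 2
```

In practice the scale is chosen per tensor, per channel, or per group, which is one of the main levers a quantization strategy tunes.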
Target Users:
The target audience includes machine learning engineers, data scientists, and researchers who need to improve model inference speed and reduce memory usage while maintaining accuracy. torchao helps users optimize their PyTorch models through a variety of quantization and sparsity techniques, making the models suitable for resource-constrained environments and for more efficient large-scale deployments.
Use Cases
Quantizing an image segmentation model with torchao yielded a 9.5x inference speedup.
Applying torchao's quantization-aware training improved both the accuracy and the inference speed of a quantized language model.
Applying torchao's sparsity techniques during inference reduced a diffusion model's memory usage.
Features
Supports Post Training Quantization (PTQ) and Quantization Aware Training (QAT).
Offers multiple quantization and sparsity schemes, including weight-only quantization, combined weight and activation quantization, and weight and activation quantization with sparsified weights.
Provides a developer API for custom quantization algorithms.
Includes KV cache quantization features to support inference with long context lengths.
Supports Float8 training using scaled Float8 data types.
Enables sparse training with 2:4 sparsity support.
Offers memory-efficient optimizers, such as 8-bit and 4-bit quantized AdamW optimizers.
Supports CPU offloading in single-GPU setups to effectively reduce VRAM requirements.
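The 2:4 sparsity mentioned above is a semi-structured pattern that keeps at most 2 nonzero values in every group of 4 weights, which recent GPUs can accelerate. Below is a minimal pure-Python sketch of the pruning step only; torchao's actual sparse training works on tensors and packed layouts:

```python
# Toy sketch of 2:4 semi-structured pruning: in each group of 4
# weights, keep the 2 with the largest magnitude and zero the rest.
# Exactly half the weights survive, in a pattern hardware can exploit.

def prune_2_4(weights):
    assert len(weights) % 4 == 0
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(g if j in keep else 0.0 for j, g in enumerate(group))
    return pruned

w = [0.9, -0.1, 0.05, -0.8, 0.2, 0.3, -0.25, 0.01]
sparse = prune_2_4(w)
assert sparse == [0.9, 0.0, 0.0, -0.8, 0.0, 0.3, -0.25, 0.0]
```

Because the pattern is regular (2 of every 4), the nonzero values can be stored in a packed layout with small index metadata, rather than a general sparse format.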
How to Use
Install the torchao library.
Select the model that requires quantization.
Choose the appropriate quantization strategy based on the characteristics of the model.
Use the torchao API to quantize the model.
If necessary, perform quantization-aware training.
Once training is complete, use the torchao API to convert the model to a quantized version.
Deploy the quantized model for inference.
Monitor and evaluate the performance of the quantized model.
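The quantization-aware training step above is typically implemented by inserting "fake quantization" into the forward pass, so the model trains against rounded weights and learns to tolerate the rounding error. A minimal pure-Python sketch of the fake-quantize operation follows; torchao's QAT applies this idea tensor-wide, with a straight-through estimator for gradients:

```python
# Toy sketch of the fake-quantize op used in quantization-aware
# training: round a value to the int8 grid and map it back to float.
# During QAT the loss is computed on these rounded values, steering
# the weights toward values that survive quantization.

def fake_quantize(x, scale):
    """Round x to the int8 grid (clamped to [-127, 127]) and back."""
    q = max(-127, min(127, round(x / scale)))
    return q * scale

scale = 0.01
assert abs(fake_quantize(0.123, scale) - 0.12) < 1e-9
assert abs(fake_quantize(5.0, scale) - 1.27) < 1e-9   # clamped to int8 range
```

After training, the same grid is used to produce the final quantized model, so the rounding seen at inference matches what the model was trained with.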
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025 AIbase