Model optimization framework for faster, smaller, cheaper, greener AI
Pruna is an open-source model optimization framework designed to accelerate, reduce the size, and lower the computational cost of AI models for developers. It supports a wide range of model types, including LLMs and diffusion models, offering a simplified API to integrate various compression techniques.
How It Works
Pruna employs a modular approach, allowing users to combine multiple optimization algorithms such as caching, quantization, pruning, distillation, and compilation. This flexibility enables tailored optimization strategies for specific performance goals, such as reducing latency with stable_fast compilation or shrinking model size with HQQ quantization. The framework aims for minimal code changes, abstracting complex optimization processes into a few lines of Python.
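The modular combination of passes described above can be sketched in plain Python. This is a conceptual illustration only, not Pruna's actual API: the `Model` type, the pass functions, and the thresholds are all invented here to show how independently defined optimization steps (e.g., pruning and quantization) can be selected and chained in a configurable order.

```python
# Conceptual sketch (NOT Pruna's API): stacking independent optimization
# passes into one pipeline, the way a modular framework can combine
# techniques such as pruning and quantization.
from typing import Callable, Dict, List

# A "model" is just a dict of named float weights for illustration.
Model = Dict[str, List[float]]

def quantize(model: Model) -> Model:
    # Round weights to 1 decimal place to mimic lower-precision storage.
    return {name: [round(w, 1) for w in ws] for name, ws in model.items()}

def prune(model: Model) -> Model:
    # Drop near-zero weights to mimic magnitude pruning.
    return {name: [w for w in ws if abs(w) > 0.05] for name, ws in model.items()}

def optimize(model: Model, passes: List[Callable[[Model], Model]]) -> Model:
    # Apply each selected pass in order; selection and order are configurable.
    for p in passes:
        model = p(model)
    return model

optimized = optimize({"layer1": [0.01, 0.31337, -0.91]}, [prune, quantize])
print(optimized)  # {'layer1': [0.3, -0.9]}
```

The point of the design is that each pass has the same `Model -> Model` signature, so new techniques can be added or reordered without changing the pipeline itself.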
Quick Start & Requirements
```shell
pip install pruna
```
Highlighted Details
- Supports multiple compilation backends (stable_fast, torch.compile).

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats