pruna by PrunaAI

Model optimization framework for faster, smaller, cheaper, greener AI

created 4 months ago
791 stars

Top 45.3% on sourcepulse

Project Summary

Pruna is an open-source model optimization framework that helps developers make AI models faster, smaller, and cheaper to run. It supports a wide range of model types, including LLMs and diffusion models, and exposes a simplified API for applying and combining compression techniques.

How It Works

Pruna takes a modular approach: users can combine multiple optimization algorithms, such as caching, quantization, pruning, distillation, and compilation. This lets them tailor an optimization strategy to a specific goal, for example reducing latency with stable_fast compilation or shrinking model size with HQQ quantization. The framework aims to require minimal code changes, wrapping complex optimization processes in a few lines of Python.
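To make the idea of combining optimization steps concrete, here is a minimal, self-contained sketch in plain Python. It is not Pruna's API: the names magnitude_prune, uniform_quantize, and compose are hypothetical, and the "model" is just a list of floats. It only illustrates how independent passes (here, pruning followed by quantization) can be chained into one strategy.

```python
from typing import Callable, List, Sequence

Weights = List[float]
Pass = Callable[[Weights], Weights]


def magnitude_prune(sparsity: float) -> Pass:
    """Hypothetical pass: zero out the smallest-magnitude fraction of weights."""
    def apply(weights: Weights) -> Weights:
        k = int(len(weights) * sparsity)  # number of weights to drop
        # Indices of the k weights with the smallest absolute value.
        by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
        drop = set(by_magnitude[:k])
        return [0.0 if i in drop else w for i, w in enumerate(weights)]
    return apply


def uniform_quantize(bits: int) -> Pass:
    """Hypothetical pass: snap weights to a symmetric signed integer grid."""
    def apply(weights: Weights) -> Weights:
        max_abs = max((abs(w) for w in weights), default=0.0)
        if max_abs == 0.0:
            return list(weights)
        levels = 2 ** (bits - 1) - 1  # e.g. 7 for 4 bits, 127 for 8 bits
        scale = max_abs / levels
        # Round to the nearest grid point, then map back to float.
        return [round(w / scale) * scale for w in weights]
    return apply


def compose(passes: Sequence[Pass]) -> Pass:
    """Chain passes so they run in the order given."""
    def apply(weights: Weights) -> Weights:
        for p in passes:
            weights = p(weights)
        return weights
    return apply


# Toy "model": six weights. Prune half of them, then quantize to 4 bits.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
optimize = compose([magnitude_prune(0.5), uniform_quantize(4)])
optimized = optimize(weights)
print(optimized)
```

Because each pass has the same signature, swapping the order or adding another step only changes the list given to compose, which is the kind of flexibility a modular design is meant to provide.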

Quick Start & Requirements

  • Installation: pip install pruna
  • Prerequisites: Python 3.9+, optional CUDA toolkit for GPU acceleration.
  • Documentation: Pruna documentation
  • Website: Pruna.ai

Highlighted Details

  • Supports optimization for LLMs, Diffusion, Flow Matching, Vision Transformers, and Speech Recognition models.
  • Offers a suite of compression algorithms including caching (DeepCache, Adaptive Caching), quantization (AWQ, GPTQ, HQQ), pruning, distillation, and compilation (stable_fast, torch.compile).
  • Includes an evaluation interface to measure model performance and fidelity.
  • Pruna Pro offers proprietary algorithms like Auto Caching and advanced features for enterprise use.
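The evaluation idea mentioned in the list above (measuring both performance and fidelity) can be sketched without any framework. The example below is an illustration under stated assumptions, not Pruna's evaluation interface: evaluate, base_model, and optimized_model are hypothetical names, and the "models" are plain functions. It times both models on the same inputs and reports how far the optimized outputs drift from the base outputs.

```python
import time
from statistics import mean
from typing import Callable, Dict, Sequence

Model = Callable[[float], float]


def evaluate(base: Model, optimized: Model, inputs: Sequence[float],
             repeats: int = 3) -> Dict[str, float]:
    """Hypothetical helper: compare latency and output fidelity of two models."""
    def avg_latency(model: Model) -> float:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            for x in inputs:
                model(x)
            timings.append(time.perf_counter() - start)
        # Average seconds per single input across all repeats.
        return mean(timings) / len(inputs)

    # Fidelity: absolute difference between base and optimized outputs.
    errors = [abs(base(x) - optimized(x)) for x in inputs]
    return {
        "base_latency_s": avg_latency(base),
        "optimized_latency_s": avg_latency(optimized),
        "mean_abs_error": mean(errors),
        "max_abs_error": max(errors),
    }


def base_model(x: float) -> float:
    """Toy reference model."""
    return 0.5 * x + 0.25


def optimized_model(x: float) -> float:
    """Toy stand-in for a compressed model: outputs rounded to one decimal."""
    return round(0.5 * x + 0.25, 1)


inputs = [i / 10 for i in range(20)]
report = evaluate(base_model, optimized_model, inputs)
for name, value in report.items():
    print(f"{name}: {value:.6g}")
```

Latency numbers vary from run to run, so they are best read as relative comparisons; the error metrics are deterministic for a fixed set of inputs.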

Maintenance & Community

  • Active development indicated by GitHub Actions status and commit activity.
  • Community support available via Discord.
  • Also present on Hugging Face and Replicate.

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

  • Some algorithms may have operating system restrictions.
  • Telemetry is enabled by default, requiring explicit opt-out if desired.
  • Pruna Pro offers advanced features not available in the open-source version.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 61
  • Issues (30d): 18

Star History

  • 137 stars in the last 90 days
