Model optimization framework for faster, smaller, cheaper, greener AI
Pruna is an open-source model optimization framework designed to accelerate, reduce the size, and lower the computational cost of AI models for developers. It supports a wide range of model types, including LLMs and diffusion models, offering a simplified API to integrate various compression techniques.
How It Works
Pruna employs a modular approach, allowing users to combine multiple optimization algorithms such as caching, quantization, pruning, distillation, and compilation. This flexibility enables tailored optimization strategies for specific performance goals, such as reducing latency with stable_fast compilation or shrinking model size with HQQ quantization. The framework aims for minimal code changes, abstracting complex optimization processes into a few lines of Python.
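The modular combination of passes described above can be sketched in plain Python. This is a conceptual illustration only, not Pruna's actual API: the `Model` type, the pass functions, and the thresholds are all invented here to show how independently defined optimization steps (e.g., pruning and quantization) can be selected and chained in a configurable order.

```python
# Conceptual sketch (NOT Pruna's API): stacking independent optimization
# passes into one pipeline, the way a modular framework can combine
# techniques such as pruning and quantization.
from typing import Callable, Dict, List

# A "model" is just a dict of named float weights for illustration.
Model = Dict[str, List[float]]

def quantize(model: Model) -> Model:
    # Round weights to 1 decimal place to mimic lower-precision storage.
    return {name: [round(w, 1) for w in ws] for name, ws in model.items()}

def prune(model: Model) -> Model:
    # Drop near-zero weights to mimic magnitude pruning.
    return {name: [w for w in ws if abs(w) > 0.05] for name, ws in model.items()}

def optimize(model: Model, passes: List[Callable[[Model], Model]]) -> Model:
    # Apply each selected pass in order; selection and order are configurable.
    for p in passes:
        model = p(model)
    return model

optimized = optimize({"layer1": [0.01, 0.31337, -0.91]}, [prune, quantize])
print(optimized)  # {'layer1': [0.3, -0.9]}
```

The point of the design is that each pass has the same `Model -> Model` signature, so new techniques can be added or reordered without changing the pipeline itself.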
Quick Start & Requirements
```shell
pip install pruna
```
Highlighted Details
- Supports multiple compilation backends (stable_fast, torch.compile).

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats