sparsify by neuralmagic

ML model optimization for faster inference via sparsification

created 4 years ago
326 stars

Top 84.8% on sourcepulse

Project Summary

Sparsify is an ML model optimization product designed to accelerate inference through pruning, quantization, and distillation. It targets ML engineers and researchers seeking to improve model performance without significant accuracy loss, offering both a web application and a CLI/API for managing and running optimization experiments.

How It Works

Sparsify applies state-of-the-art optimization techniques via three experiment types: One-Shot (post-training pruning), Sparse-Transfer (leveraging pre-sparsified models), and Training-Aware (sparsification during training). These methods aim to achieve significant speedups (3-12x) with minimal accuracy degradation. The system integrates with Sparsify Cloud for hyperparameter tuning and result comparison, and the CLI/API for local execution and workflow integration.
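The pruning step at the heart of these experiment types can be sketched conceptually as magnitude pruning: zero out the smallest-magnitude weights until a target sparsity is reached. The following is a minimal pure-Python illustration of that idea, not Sparsify's actual implementation (the real One-Shot pathway operates on ONNX graphs, not Python lists):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Conceptual sketch of one-shot magnitude pruning; illustrative only.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(len(weights) * sparsity)  # number of weights to zero out
    # Indices of the k smallest |w| values
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]

pruned = magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7, 0.2], 0.5)
print(pruned)  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

Sparse-Transfer and Training-Aware recover accuracy that a one-shot pass like this would lose, by starting from pre-sparsified weights or pruning gradually during training.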

Quick Start & Requirements

  • Install: pip install sparsify-nightly
  • Prerequisites: Python 3.8–3.10; ONNX 1.5.0–1.12.0 with opset 11+; a manylinux-compliant system. A GPU with CUDA and cuDNN is required (16GB+ VRAM recommended). Linux only; Windows and macOS are not supported. A Neural Magic account is needed for API key authorization.
  • Resources: Minimum 128GB RAM, 4 CPU cores. Large models may require more RAM.
  • Docs: Quickstart Guide
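The prerequisites above can be captured as a simple pre-flight check. The helper below (`meets_prereqs` is a hypothetical name, not part of the Sparsify API) just encodes the constraints listed in this section:

```python
def meets_prereqs(python_version, onnx_opset, vram_gb, os_name):
    """Return a list of unmet prerequisites; empty means good to go.

    Hypothetical helper mirroring the requirements above; not a
    Sparsify function.
    """
    problems = []
    if not ((3, 8) <= python_version[:2] <= (3, 10)):
        problems.append("Python 3.8-3.10 required")
    if onnx_opset < 11:
        problems.append("ONNX opset 11+ required")
    if vram_gb < 16:
        problems.append("16GB+ GPU VRAM recommended")
    if os_name != "Linux":
        problems.append("Linux required (Windows/macOS unsupported)")
    return problems

print(meets_prereqs((3, 9, 7), 13, 24, "Linux"))  # → []
```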

Highlighted Details

  • Offers 3x-5x speedup with One-Shot, 5x-10x with Sparse-Transfer, and 6x-12x with Training-Aware experiments.
  • Supports CV and NLP use cases, with a current focus on LLMs.
  • Integrates with DeepSparse for optimized CPU inference.
  • Models must be in ONNX format for One-Shot, and PyTorch for Sparse-Transfer/Training-Aware.
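Alongside pruning, the quantization mentioned in the summary maps float weights to 8-bit integers. A minimal sketch of symmetric per-tensor int8 quantization, purely illustrative and not Sparsify's implementation:

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: scale by max|v| / 127."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid div-by-zero scale
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [x * scale for x in q]

q, s = quantize_int8([0.5, -1.0, 0.25])
print(q)  # [64, -127, 32]
```

Runtimes like DeepSparse exploit both the zeros from pruning and the narrow integer types from quantization to speed up CPU inference.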

Maintenance & Community

  • The project is currently in Alpha, with development paused for a new LLM-focused project. Non-LLM pathways (CV, NLP) will not receive further bug fixes or feature development.
  • Community support is available via Neural Magic Slack Channel and GitHub Issues.

Licensing & Compatibility

  • Licensed under the Apache License Version 2.0.
  • Compatible with commercial use.

Limitations & Caveats

  • Sparsify is in Alpha and not production-ready; APIs and UIs are subject to change.
  • Development focus has shifted to LLMs, with existing CV/NLP pathways no longer actively supported.
  • Requires specific hardware (GPU, CUDA) and OS (Linux), limiting broader adoption.
Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

Medusa by FasterDecoding

0.2% · 3k stars
Framework for accelerating LLM generation using multiple decoding heads
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Wei-Lin Chiang (Cofounder of LMArena), and 1 more.

deepsparse by neuralmagic

0% · 3k stars
CPU inference runtime for sparse deep learning models
created 4 years ago
updated 2 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Sebastian Raschka (Author of Build a Large Language Model From Scratch), and 2 more.

SimpleTuner by bghira

0.6% · 2k stars
Fine-tuning kit for diffusion models
created 2 years ago
updated 3 days ago