sparsify by neuralmagic

ML model optimization for faster inference via sparsification

Created 4 years ago
326 stars

Top 83.5% on SourcePulse

Project Summary

Sparsify is an ML model optimization product designed to accelerate inference through pruning, quantization, and distillation. It targets ML engineers and researchers seeking to improve model performance without significant accuracy loss, offering both a web application and a CLI/API for managing and running optimization experiments.

How It Works

Sparsify applies state-of-the-art optimization techniques via three experiment types: One-Shot (post-training pruning), Sparse-Transfer (leveraging pre-sparsified models), and Training-Aware (sparsification during training). These methods aim for significant speedups (3-12x) with minimal accuracy degradation. The system integrates with Sparsify Cloud for hyperparameter tuning and result comparison, and provides a CLI/API for local execution and workflow integration.
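The claimed speedup ranges translate directly into projected latency. A minimal back-of-envelope sketch (illustrative arithmetic only; the ranges are taken from the figures quoted in this summary, and `projected_latency_ms` is a hypothetical helper, not part of Sparsify):

```python
# Illustrative only: projected latency from the claimed speedup
# ranges for each Sparsify experiment type.
SPEEDUP_RANGES = {
    "one-shot": (3, 5),          # post-training pruning
    "sparse-transfer": (5, 10),  # leveraging pre-sparsified models
    "training-aware": (6, 12),   # sparsification during training
}

def projected_latency_ms(baseline_ms, experiment):
    """Return (best_case, worst_case) projected latency in ms."""
    low, high = SPEEDUP_RANGES[experiment]
    return (baseline_ms / high, baseline_ms / low)

# Example: a 120 ms baseline under a Training-Aware experiment.
best, worst = projected_latency_ms(120.0, "training-aware")
print(f"{best:.0f}-{worst:.0f} ms")  # 10-20 ms
```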

Quick Start & Requirements

  • Install: pip install sparsify-nightly
  • Prerequisites: Python 3.8-3.10; ONNX 1.5.0-1.12.0 with opset 11 or higher; a manylinux-compliant system. A GPU with CUDA and cuDNN is required (16GB+ VRAM recommended). Linux only; Windows and macOS are not supported. A Neural Magic account is needed for API key authorization.
  • Resources: Minimum 128GB RAM, 4 CPU cores. Large models may require more RAM.
  • Docs: Quickstart Guide
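The Python-version and OS prerequisites above can be sanity-checked before installing. A minimal stdlib sketch (the bounds mirror the list above; `check_prereqs` is a hypothetical helper, and GPU/CUDA and RAM checks are out of scope for stdlib code):

```python
import sys
import platform

def check_prereqs(py_version=sys.version_info[:2], os_name=platform.system()):
    """Return a list of prerequisite problems; an empty list means OK.

    Bounds follow the requirements listed above: Python 3.8-3.10, Linux only.
    """
    problems = []
    if not ((3, 8) <= py_version <= (3, 10)):
        problems.append(
            f"Python {py_version[0]}.{py_version[1]} unsupported (need 3.8-3.10)"
        )
    if os_name != "Linux":
        problems.append(f"{os_name} unsupported (Linux required)")
    return problems

for problem in check_prereqs():
    print("WARNING:", problem)
```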

Highlighted Details

  • Offers 3x-5x speedup with One-Shot, 5x-10x with Sparse-Transfer, and 6x-12x with Training-Aware experiments.
  • Supports CV and NLP use cases, with a current focus on LLMs.
  • Integrates with DeepSparse for optimized CPU inference.
  • Models must be in ONNX format for One-Shot, and PyTorch for Sparse-Transfer/Training-Aware.
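The format rule in the last bullet determines which experiment types a given model file can use. A small illustrative helper (not part of Sparsify's API; the suffix sets are assumptions based on common PyTorch/ONNX file extensions):

```python
from pathlib import Path

# Illustrative only: map a model file to the experiment types that
# accept its format, per the format rules noted above.
ONNX_SUFFIXES = {".onnx"}
PYTORCH_SUFFIXES = {".pt", ".pth"}

def compatible_experiments(model_path):
    suffix = Path(model_path).suffix.lower()
    if suffix in ONNX_SUFFIXES:
        return ["one-shot"]
    if suffix in PYTORCH_SUFFIXES:
        return ["sparse-transfer", "training-aware"]
    raise ValueError(f"unrecognized model format: {suffix!r}")

print(compatible_experiments("resnet50.onnx"))  # ['one-shot']
print(compatible_experiments("bert-base.pt"))   # ['sparse-transfer', 'training-aware']
```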

Maintenance & Community

  • The project is currently in Alpha, with development paused for a new LLM-focused project. Non-LLM pathways (CV, NLP) will not receive further bug fixes or feature development.
  • Community support is available via Neural Magic Slack Channel and GitHub Issues.

Licensing & Compatibility

  • Licensed under the Apache License Version 2.0.
  • Compatible with commercial use.

Limitations & Caveats

  • Sparsify is in Alpha and not production-ready; APIs and UIs are subject to change.
  • Development focus has shifted to LLMs, with existing CV/NLP pathways no longer actively supported.
  • Requires specific hardware (GPU, CUDA) and OS (Linux), limiting broader adoption.
Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

  • sparseml by neuralmagic: Sparsification toolkit for optimized neural networks. 2k stars (0.1%); created 4 years ago; updated 3 months ago. Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Wei-Lin Chiang (cofounder of LMArena), and 3 more.
  • deepsparse by neuralmagic: CPU inference runtime for sparse deep learning models. 3k stars (0%); created 4 years ago; updated 3 months ago. Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Casper Hansen (author of AutoAWQ), and 3 more.
  • ColossalAI by hpcaitech: AI system for large-scale parallel training. 41k stars (0.1%); created 3 years ago; updated 16 hours ago. Starred by Tobi Lutke (cofounder of Shopify), Li Jiang (coauthor of AutoGen; engineer at Microsoft), and 26 more.