nyuntam by NyunAI

CLI tool for LLM compression via pruning, quantization, and distillation

Created 1 year ago
678 stars

Top 50.0% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

NyunAI/nyuntam is a Python toolkit designed to optimize and accelerate large language models (LLMs) using state-of-the-art compression techniques like pruning, quantization, and distillation. It targets researchers and engineers working with LLMs, offering an integrated CLI for streamlined experimentation and workflow management, ultimately aiming to reduce model size and computational cost without significant performance degradation.

How It Works

Nyuntam employs a modular architecture, allowing users to integrate various compression algorithms through a unified CLI. The toolkit leverages configuration files (YAML) to define experiment parameters, including compression methods, datasets, and model specifics. This approach facilitates reproducible research and rapid iteration on LLM optimization strategies.
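To make the configuration-driven workflow concrete, here is a minimal sketch of what such a YAML experiment file might look like. The field names below are invented for illustration and are not the toolkit's actual schema; consult the NyunAI docs for the real format.

```yaml
# Hypothetical experiment config -- keys are illustrative only,
# not nyuntam's actual schema.
model:
  name: meta-llama/Llama-2-7b-hf   # Hugging Face model ID (example)
dataset:
  name: wikitext                    # calibration/evaluation data (example)
compression:
  method: quantization              # or: pruning, distillation
  bits: 4
output:
  path: ./compressed-model
```

Keeping every experiment parameter in a versioned file like this is what makes runs reproducible: rerunning the same config should reproduce the same compression experiment.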

Quick Start & Requirements

  • Install: pip install nyuntam
  • Prerequisites: Python 3.8+
  • Setup: for GPU acceleration via Docker, install the NVIDIA Container Toolkit.
  • Docs: NyunAI Docs
  • Examples: examples directory

Highlighted Details

  • State-of-the-art compression techniques: pruning, quantization, distillation.
  • Integrated CLI (nyun) for workspace management and experiment execution.
  • Extensible architecture supporting various compression algorithms.
  • Docker and virtual environment support for multi-platform compatibility.

Maintenance & Community

The project is developed by NyunAI. Further community or roadmap information is not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Access to gated repositories within containers requires Hugging Face tokens to be configured. The README does not detail specific performance benchmarks or comparisons against other optimization tools.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Jason Knight (Director AI Compilers at NVIDIA; cofounder of OctoML), Omar Sanseviero (DevRel at Google DeepMind), and 12 more.

mistral.rs by EricLBuehler

Top 0.1% · 6k stars
LLM inference engine for blazing-fast performance.
Created 1 year ago · Updated 4 days ago