nyuntam by NyunAI

CLI tool for LLM compression via pruning, quantization, and distillation

created 1 year ago
683 stars

Top 49.7% on SourcePulse

Project Summary

NyunAI/nyuntam is a Python toolkit designed to optimize and accelerate large language models (LLMs) using state-of-the-art compression techniques like pruning, quantization, and distillation. It targets researchers and engineers working with LLMs, offering an integrated CLI for streamlined experimentation and workflow management, ultimately aiming to reduce model size and computational cost without significant performance degradation.

How It Works

Nyuntam employs a modular architecture, allowing users to integrate various compression algorithms through a unified CLI. The toolkit leverages configuration files (YAML) to define experiment parameters, including compression methods, datasets, and model specifics. This approach facilitates reproducible research and rapid iteration on LLM optimization strategies.
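As an illustrative sketch of such a configuration file (the key names below are hypothetical and not taken from nyuntam's actual schema; consult the NyunAI Docs for the real format), a YAML experiment definition might look like:

```yaml
# Hypothetical experiment config -- key names are illustrative only,
# not nyuntam's documented schema
model:
  name: meta-llama/Llama-2-7b-hf   # model to compress
dataset:
  name: wikitext                   # calibration / evaluation data
compression:
  method: quantization             # e.g. pruning | quantization | distillation
  bits: 4                          # target precision for quantization
output:
  path: ./compressed-model
```

The value of this approach is that an entire experiment is captured in one versionable file, which is what makes the runs reproducible and easy to iterate on.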

Quick Start & Requirements

  • Install: pip install nyuntam
  • Prerequisites: Python 3.8+; for GPU acceleration via Docker, the NVIDIA Container Toolkit must be installed.
  • Docs: NyunAI Docs
  • Examples: examples directory
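A minimal setup sketch based on the steps above (the Docker image name is a placeholder, not one confirmed by the README; `--gpus all` is the standard Docker flag that requires the NVIDIA Container Toolkit):

```shell
# Install from PyPI (as listed above)
pip install nyuntam

# Hypothetical Docker invocation for GPU workloads; the image name
# 'nyunai/nyuntam' is illustrative only
docker run --gpus all -v "$PWD:/workspace" nyunai/nyuntam
```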

Highlighted Details

  • State-of-the-art compression techniques: pruning, quantization, distillation.
  • Integrated CLI (nyun) for workspace management and experiment execution.
  • Extensible architecture supporting various compression algorithms.
  • Docker and virtual environment support for multi-platform compatibility.

Maintenance & Community

The project is developed by NyunAI. Further community or roadmap information is not detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Access to gated repositories within containers requires Hugging Face tokens to be configured. The README does not detail specific performance benchmarks or comparisons against other optimization tools.
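One common way to make such a token available inside a container (an assumption about the setup, not a documented nyuntam instruction) is to pass it as an environment variable; `HF_TOKEN` is the variable read by recent versions of huggingface_hub:

```shell
# The image name is a placeholder, not nyuntam's actual image;
# replace hf_xxx with a real Hugging Face access token
docker run --gpus all -e HF_TOKEN="hf_xxx" <image>
```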

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

llama.cpp by ggml-org

C/C++ library for local LLM inference

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 41 more.

Top 0.7% on SourcePulse
85k stars
created 2 years ago
updated 1 day ago