unsloth-zoo by unslothai

Accelerate LLM finetuning with reduced VRAM usage

Created 1 year ago
263 stars

Top 96.7% on SourcePulse

Project Summary

Unsloth Zoo provides optimized utilities for finetuning large language models, significantly reducing training time and VRAM requirements. It targets engineers and researchers needing to efficiently adapt LLMs for various tasks, enabling finetuning on more accessible hardware and accelerating development cycles.

How It Works

Unsloth employs custom kernels written in OpenAI's Triton language and a manual backpropagation engine. This approach allows for highly optimized computations, achieving substantial speedups and memory reductions without sacrificing model accuracy. It integrates advanced techniques like dynamic 4-bit quantization and optimized LoRA implementations to maximize efficiency.
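
To make the idea of 4-bit quantization concrete, here is a minimal sketch of generic blockwise absmax quantization in plain Python. It is an illustration only: it is not Unsloth's dynamic 4-bit scheme or its Triton kernels, and the function names, the block size of 4, and the signed integer range [-7, 7] are all assumptions chosen for readability.

```python
from typing import List, Tuple

# One quantized block: (scale, 4-bit integer codes). Hypothetical layout for illustration.
Block = Tuple[float, List[int]]


def quantize_4bit(values: List[float], block_size: int = 4) -> List[Block]:
    """Blockwise absmax quantization to signed integers in [-7, 7]."""
    blocks: List[Block] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        # One scale per block maps the largest magnitude onto the code 7.
        scale = absmax / 7 if absmax > 0 else 1.0
        codes = [max(-7, min(7, round(v / scale))) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_4bit(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate floats from scales and integer codes."""
    return [scale * code for scale, codes in blocks for code in codes]


weights = [0.7, -0.35, 0.1, 0.0, 2.8, -1.4, 0.4, 0.0]
blocks = quantize_4bit(weights)
restored = dequantize_4bit(blocks)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"max round-trip error: {max_error:.4f}")
```

Each value is stored as a small integer plus a shared per-block scale, so the round-trip error for any value is at most half of its block's scale. Production schemes use larger blocks and non-uniform code books, but the storage trade-off is the same.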

Quick Start & Requirements

Install with pip on Linux or WSL: pip install unsloth. On Windows, PyTorch must be installed first. An official Docker image (unsloth/unsloth) is also available. An NVIDIA GPU with CUDA Capability 7.0 or higher is required (e.g., RTX 20-series and newer, A100, H100). Python 3.10-3.14 is supported. Detailed installation guides and documentation are available from the project.
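
The stated requirements can be expressed as a small pre-flight check. The sketch below is a hypothetical helper, not part of Unsloth; the function name and constants are assumptions, and the thresholds are taken directly from the requirements listed above.

```python
import sys
from typing import List, Tuple

# Thresholds copied from the stated requirements; names are hypothetical.
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 14)
MIN_COMPUTE_CAPABILITY = (7, 0)


def check_requirements(
    python_version: Tuple[int, int],
    compute_capability: Tuple[int, int],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    if not (MIN_PYTHON <= python_version <= MAX_PYTHON):
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} is outside 3.10-3.14"
        )
    if compute_capability < MIN_COMPUTE_CAPABILITY:
        problems.append(
            f"CUDA Capability {compute_capability[0]}.{compute_capability[1]} is below 7.0"
        )
    return problems


# The GPU capability is passed in by hand here so the sketch runs without a GPU.
print(check_requirements(tuple(sys.version_info[:2]), (8, 0)))
```

With PyTorch installed and a GPU present, torch.cuda.get_device_capability() returns the (major, minor) tuple that this helper expects as its second argument.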

Highlighted Details

  • Supports full-finetuning, pretraining, and 4/8/16-bit training across a vast array of models including Llama (3.3, 3.2, 3.1), Gemma, Mistral, Phi, Qwen, and multimodal/TTS models.
  • Reports up to 2.2x faster training and over 80% VRAM reduction compared to standard Hugging Face implementations, with no loss in accuracy.
  • Enables dramatically longer context windows, e.g., 342K for Llama 3.1 (8B) and 89K for Llama 3.3 (70B) on high-end GPUs.
  • Integrates seamlessly with Hugging Face's TRL library for Reinforcement Learning tasks like DPO and GRPO.
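
As a rough illustration of why the 4/8/16-bit options in the list above matter, the arithmetic below estimates the memory needed for model weights alone. It assumes a round figure of 8 billion parameters for an "8B" model and ignores activations, optimizer state, LoRA adapters, and quantization metadata, so it is a lower bound and not a prediction of actual VRAM use.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumed parameter count: 8e9 is a round figure for an "8B" model.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(8e9, bits):.1f} GB")
```

Halving the bit width halves the weight footprint, which is the main reason 4-bit loading makes larger models fit on smaller GPUs.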

Maintenance & Community

The project shows active development with frequent updates on new model support, optimizations, and features. Community engagement is fostered through Twitter (X) and Reddit. Notable collaborations include work with Apple on specific optimizations.

Licensing & Compatibility

The repository's README does not explicitly state a software license. This absence creates ambiguity regarding usage rights, particularly for commercial applications or integration into closed-source projects. Compatibility is primarily for NVIDIA GPUs.

Limitations & Caveats

Windows installation can be complex, requiring careful setup of PyTorch, CUDA, and Triton. Older NVIDIA GPUs such as the GTX 10-series fall below the stated CUDA Capability 7.0 minimum; they are described as working, but with limited performance. The most significant caveat is the lack of a clear license, which poses a risk for adoption in production or commercial environments.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 79
  • Issues (30d): 4
  • Star history: 20 stars in the last 30 days
