unsloth by unslothai

Finetuning tool for LLMs, targeting speed and memory efficiency

created 1 year ago
42,961 stars

Top 0.6% on sourcepulse

View on GitHub
Project Summary

Unsloth is a Python library designed to significantly accelerate the fine-tuning of large language models (LLMs) while drastically reducing memory consumption. It targets researchers and developers working with LLMs who need to optimize training speed and hardware resource utilization, enabling the fine-tuning of larger models on more accessible hardware.

How It Works

Unsloth achieves its performance gains through custom GPU kernels written in OpenAI's Triton language and a manually derived backpropagation engine. Because these kernels compute exact gradients rather than approximations, there is no loss in accuracy relative to standard implementations. Unsloth also incorporates dynamic 4-bit quantization, which quantizes parameters selectively so that accuracy stays high while VRAM usage drops.
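
For illustration, here is a minimal sketch of loading a model through Unsloth's 4-bit quantization path. The checkpoint name and sequence length are placeholder choices, not values prescribed by the project:

    from unsloth import FastLanguageModel

    # Illustrative: load a model with 4-bit quantized weights to reduce VRAM usage.
    # The checkpoint name and max_seq_length are assumed placeholder values.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",  # assumed pre-quantized checkpoint
        max_seq_length=2048,
        load_in_4bit=True,   # request 4-bit quantized loading
        dtype=None,          # let Unsloth auto-select bf16/fp16 for the GPU
    )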

Quick Start & Requirements

  • Install: pip install unsloth (Linux recommended). Advanced installation for specific PyTorch/CUDA versions is available via pip install "unsloth[cuXX-torchYY] @ git+https://github.com/unslothai/unsloth.git". A minimal fine-tuning sketch follows this list.
  • Prerequisites: NVIDIA GPUs with CUDA Capability 7.0+ (e.g., RTX 20 series and newer), Python 3.10-3.12, PyTorch compatible with CUDA drivers, and potentially Visual Studio C++ build tools on Windows.
  • Resources: Fine-tuning of large models (e.g., Llama 3.3 70B) fits on 80GB GPUs thanks to the significantly reduced VRAM footprint.
  • Docs: Official Documentation
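
A minimal quick-start sketch, assuming a recent Unsloth plus TRL install; the dataset file, hyperparameters, and some SFTTrainer keyword names (which vary across trl versions) are placeholders rather than project-mandated values:

    from unsloth import FastLanguageModel  # imported first so Unsloth can patch downstream libraries
    import torch
    from datasets import load_dataset
    from transformers import TrainingArguments
    from trl import SFTTrainer

    # Load a 4-bit base model (placeholder checkpoint) and attach LoRA adapters
    # so only a small fraction of the weights are trained.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",
        max_seq_length=2048,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        lora_alpha=16,
        lora_dropout=0,
        bias="none",
        use_gradient_checkpointing="unsloth",  # memory-saving checkpointing mode
    )

    # Placeholder dataset: any JSONL file exposing a "text" column works here.
    dataset = load_dataset("json", data_files="train.jsonl", split="train")

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",   # keyword names differ in newer trl (SFTConfig)
        max_seq_length=2048,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            max_steps=60,
            learning_rate=2e-4,
            bf16=torch.cuda.is_bf16_supported(),
            fp16=not torch.cuda.is_bf16_supported(),
            output_dir="outputs",
        ),
    )
    trainer.train()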

Highlighted Details

  • Supports fine-tuning of numerous LLMs including Llama 3.3 (70B), Mistral, Gemma 3, Phi-4, and Qwen 2.5.
  • Achieves up to 2.2x faster fine-tuning and up to 80% less VRAM usage compared to standard Hugging Face implementations.
  • Enables significantly longer context windows (e.g., 342K for Llama 3.1 8B, 89K for Llama 3.3 70B).
  • Supports full fine-tuning, pretraining, and 4-bit, 8-bit, and 16-bit training, as well as preference/RL algorithms such as DPO and GRPO (a hedged DPO sketch follows this list).
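
To illustrate the preference-tuning path, here is a hedged sketch of a DPO run. PatchDPOTrainer and the DPOTrainer keyword names follow older Unsloth/TRL usage and may differ in newer versions; the checkpoint, dataset file, and hyperparameters are placeholders:

    from unsloth import FastLanguageModel, PatchDPOTrainer
    from datasets import load_dataset
    from transformers import TrainingArguments
    from trl import DPOTrainer

    PatchDPOTrainer()  # assumed Unsloth helper that patches TRL's DPOTrainer

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/zephyr-sft-bnb-4bit",  # placeholder SFT checkpoint
        max_seq_length=1024,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        lora_alpha=16,
    )

    # Placeholder preference data: a JSONL file with "prompt", "chosen", "rejected" columns.
    dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

    trainer = DPOTrainer(
        model=model,
        ref_model=None,        # reference-free mode when training LoRA adapters
        beta=0.1,              # DPO temperature
        train_dataset=dataset,
        tokenizer=tokenizer,   # newer trl versions use processing_class / DPOConfig instead
        args=TrainingArguments(
            per_device_train_batch_size=1,
            gradient_accumulation_steps=4,
            max_steps=50,
            learning_rate=5e-6,
            output_dir="dpo_outputs",
        ),
    )
    trainer.train()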

Maintenance & Community

  • Actively developed with frequent updates, including support for new models and features.
  • Links to Twitter, Reddit, and the project blog are provided for community engagement and updates.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Users should verify licensing for commercial use.

Limitations & Caveats

  • Python 3.13 is not supported.
  • Windows installation requires careful setup of Visual Studio C++ and CUDA Toolkit.
  • Advanced pip installations require precise matching of PyTorch and CUDA versions.
Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 47
  • Issues (30d): 165

Star History

5,358 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (Founder of Ostris), and 1 more.

nunchaku by nunchaku-tech

Top 2.1% · 3k stars
High-performance 4-bit diffusion model inference engine
Created 8 months ago · updated 11 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Ying Sheng (Author of SGLang).

fastllm by ztxz16

Top 0.4% · 4k stars
High-performance C++ LLM inference library
Created 2 years ago · updated 2 weeks ago