unsloth by unslothai

Finetuning tool for LLMs, targeting speed and memory efficiency

created 1 year ago
42,961 stars

Top 0.6% on sourcepulse

View on GitHub
Project Summary

Unsloth is a Python library designed to significantly accelerate the fine-tuning of large language models (LLMs) while drastically reducing memory consumption. It targets researchers and developers working with LLMs who need to optimize training speed and hardware resource utilization, enabling the fine-tuning of larger models on more accessible hardware.

How It Works

Unsloth achieves its performance gains through custom GPU kernels written in OpenAI's Triton language and a manually derived backpropagation engine. Because these kernels compute exact gradients rather than approximations, there is no loss in accuracy relative to standard implementations. Unsloth also incorporates dynamic 4-bit quantization, which quantizes parameters selectively so that accuracy stays high while VRAM usage drops.
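
For illustration, here is a minimal sketch of loading a model through Unsloth's 4-bit quantization path. The checkpoint name and sequence length are placeholder choices, not values prescribed by the project:

    from unsloth import FastLanguageModel

    # Illustrative: load a model with 4-bit quantized weights to reduce VRAM usage.
    # The checkpoint name and max_seq_length are assumed placeholder values.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",  # assumed pre-quantized checkpoint
        max_seq_length=2048,
        load_in_4bit=True,   # request 4-bit quantized loading
        dtype=None,          # let Unsloth auto-select bf16/fp16 for the GPU
    )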

Quick Start & Requirements

  • Install: pip install unsloth (Linux recommended). Advanced installation for specific PyTorch/CUDA versions is available via pip install "unsloth[cuXX-torchYY] @ git+https://github.com/unslothai/unsloth.git". A minimal fine-tuning sketch follows this list.
  • Prerequisites: NVIDIA GPUs with CUDA Capability 7.0+ (e.g., RTX 20 series and newer), Python 3.10-3.12, PyTorch compatible with CUDA drivers, and potentially Visual Studio C++ build tools on Windows.
  • Resources: Fine-tuning of large models (e.g., Llama 3.3 70B) fits on 80GB GPUs thanks to the significantly reduced VRAM footprint.
  • Docs: Official Documentation
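
A minimal quick-start sketch, assuming a recent Unsloth plus TRL install; the dataset file, hyperparameters, and some SFTTrainer keyword names (which vary across trl versions) are placeholders rather than project-mandated values:

    from unsloth import FastLanguageModel  # imported first so Unsloth can patch downstream libraries
    import torch
    from datasets import load_dataset
    from transformers import TrainingArguments
    from trl import SFTTrainer

    # Load a 4-bit base model (placeholder checkpoint) and attach LoRA adapters
    # so only a small fraction of the weights are trained.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",
        max_seq_length=2048,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        lora_alpha=16,
        lora_dropout=0,
        bias="none",
        use_gradient_checkpointing="unsloth",  # memory-saving checkpointing mode
    )

    # Placeholder dataset: any JSONL file exposing a "text" column works here.
    dataset = load_dataset("json", data_files="train.jsonl", split="train")

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",   # keyword names differ in newer trl (SFTConfig)
        max_seq_length=2048,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            max_steps=60,
            learning_rate=2e-4,
            bf16=torch.cuda.is_bf16_supported(),
            fp16=not torch.cuda.is_bf16_supported(),
            output_dir="outputs",
        ),
    )
    trainer.train()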

Highlighted Details

  • Supports fine-tuning of numerous LLMs including Llama 3.3 (70B), Mistral, Gemma 3, Phi-4, and Qwen 2.5.
  • Achieves up to 2.2x faster fine-tuning and up to 80% less VRAM usage compared to standard Hugging Face implementations.
  • Enables significantly longer context windows (e.g., 342K for Llama 3.1 8B, 89K for Llama 3.3 70B).
  • Supports full fine-tuning, pretraining, and 4-bit, 8-bit, and 16-bit training, as well as preference/RL algorithms such as DPO and GRPO (a hedged DPO sketch follows this list).
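
To illustrate the preference-tuning path, here is a hedged sketch of a DPO run. PatchDPOTrainer and the DPOTrainer keyword names follow older Unsloth/TRL usage and may differ in newer versions; the checkpoint, dataset file, and hyperparameters are placeholders:

    from unsloth import FastLanguageModel, PatchDPOTrainer
    from datasets import load_dataset
    from transformers import TrainingArguments
    from trl import DPOTrainer

    PatchDPOTrainer()  # assumed Unsloth helper that patches TRL's DPOTrainer

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/zephyr-sft-bnb-4bit",  # placeholder SFT checkpoint
        max_seq_length=1024,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        lora_alpha=16,
    )

    # Placeholder preference data: a JSONL file with "prompt", "chosen", "rejected" columns.
    dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

    trainer = DPOTrainer(
        model=model,
        ref_model=None,        # reference-free mode when training LoRA adapters
        beta=0.1,              # DPO temperature
        train_dataset=dataset,
        tokenizer=tokenizer,   # newer trl versions use processing_class / DPOConfig instead
        args=TrainingArguments(
            per_device_train_batch_size=1,
            gradient_accumulation_steps=4,
            max_steps=50,
            learning_rate=5e-6,
            output_dir="dpo_outputs",
        ),
    )
    trainer.train()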

Maintenance & Community

  • Actively developed with frequent updates, including support for new models and features.
  • Links to Twitter, Reddit, and the project blog are provided for community engagement and updates.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Users should verify licensing for commercial use.

Limitations & Caveats

  • Python 3.13 is not supported.
  • Windows installation requires careful setup of Visual Studio C++ and CUDA Toolkit.
  • Advanced pip installations require precise matching of PyTorch and CUDA versions.
Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 47
  • Issues (30d): 165

Star History

5,358 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (Founder of Ostris), and 1 more.

nunchaku by nunchaku-tech

Top 2.1% · 3k stars
High-performance 4-bit diffusion model inference engine
Created 8 months ago · updated 11 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Ying Sheng (Author of SGLang).

fastllm by ztxz16

Top 0.4% · 4k stars
High-performance C++ LLM inference library
Created 2 years ago · updated 2 weeks ago