unsloth by unslothai

Finetuning tool for LLMs, targeting speed and memory efficiency

Created 1 year ago
45,604 stars

Top 0.5% on SourcePulse

Project Summary

Unsloth is a Python library designed to significantly accelerate the fine-tuning of large language models (LLMs) while drastically reducing memory consumption. It targets researchers and developers working with LLMs who need to optimize training speed and hardware resource utilization, enabling the fine-tuning of larger models on more accessible hardware.

How It Works

Unsloth achieves its performance gains through custom-written kernels in OpenAI's Triton language and a manual backpropagation engine. This approach allows for exact computations with zero loss in accuracy, unlike approximation methods. It also incorporates dynamic 4-bit quantization, selectively quantizing parameters to maintain high accuracy while minimizing VRAM usage.
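The blockwise 4-bit quantization idea behind the VRAM savings can be illustrated with a minimal sketch. This is not Unsloth's implementation (its dynamic quantization selectively skips accuracy-sensitive parameters and runs in fused Triton kernels); it only shows the basic quantize/dequantize round trip, which trades a small, bounded rounding error for roughly 4x less weight storage than 16-bit formats:

```python
# Illustrative blockwise symmetric 4-bit quantization (concept only, not
# Unsloth's kernels). Each block of weights is mapped to integers in
# [-8, 7] plus one float scale, so storage drops from 16 bits to ~4 bits
# per weight, with reconstruction error bounded by scale / 2.

def quantize_4bit(block):
    """Quantize a block of floats to 4-bit signed integers plus a scale."""
    scale = max(abs(v) for v in block) / 7 or 1.0  # avoid zero scale
    q = [max(-8, min(7, round(v / scale))) for v in block]
    return q, scale

def dequantize_4bit(q, scale):
    """Reconstruct approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.98, -0.07, 0.33, -0.91, 0.45, 0.02]
q, scale = quantize_4bit(weights)
restored = dequantize_4bit(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Selective ("dynamic") schemes keep a few outlier-heavy parameter groups in higher precision, which is why accuracy stays close to full-precision fine-tuning while most of the memory savings are preserved.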

Quick Start & Requirements

  • Install: pip install unsloth (Linux recommended). Advanced installation for specific PyTorch/CUDA versions is available via pip install "unsloth[cuXX-torchYY] @ git+https://github.com/unslothai/unsloth.git".
  • Prerequisites: NVIDIA GPUs with CUDA Capability 7.0+ (e.g., RTX 20 series and newer), Python 3.10-3.12, a PyTorch build matching your CUDA driver version, and Visual Studio C++ build tools on Windows.
  • Resources: Enables fine-tuning of large models (e.g., Llama 3.3 70B) on 80GB GPUs with significantly reduced VRAM.
  • Docs: Official Documentation

Highlighted Details

  • Supports fine-tuning of numerous LLMs including Llama 3.3 (70B), Mistral, Gemma 3, Phi-4, and Qwen 2.5.
  • Achieves up to 2.2x faster fine-tuning and up to 80% less VRAM usage compared to standard Hugging Face implementations.
  • Enables significantly longer context windows (e.g., 342K for Llama 3.1 8B, 89K for Llama 3.3 70B).
  • Supports full fine-tuning, pretraining, 4-bit, 8-bit, and 16-bit training, as well as RL algorithms like DPO and GRPO.
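As a sketch of one of the preference-tuning objectives listed above, the per-pair DPO loss can be written in a few lines. This is illustrative only, not Unsloth's trainer code (real trainers operate on batched tensors), and the log-probability values below are made up:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    Conceptual sketch of the DPO objective: the policy is rewarded for
    increasing the log-probability of the chosen answer relative to the
    rejected one, measured against a frozen reference model.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log sigmoid(margin): shrinks as the policy's preference for the
    # chosen answer (vs. the reference) grows.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

loss_neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)  # no preference learned yet
loss_better = dpo_loss(-8.0, -12.0, -10.0, -10.0)    # policy now prefers chosen
```

The `beta` parameter controls how strongly the policy is pulled away from the reference model; when the margin is zero the loss is exactly log 2, and it falls toward zero as the chosen answer becomes relatively more likely.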

Maintenance & Community

  • Actively developed with frequent updates, including support for new models and features.
  • Community channels (Twitter, Reddit, and a blog) are linked for engagement and updates.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Users should verify licensing for commercial use.

Limitations & Caveats

  • Python 3.13 is not supported.
  • Windows installation requires careful setup of Visual Studio C++ and CUDA Toolkit.
  • Advanced pip installations require precise matching of PyTorch and CUDA versions.
Health Check

  • Last Commit: 12 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 45
  • Issues (30d): 114
  • Star History: 1,384 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16

Top 0.4% · 4k stars
High-performance C++ LLM inference library
Created 2 years ago · Updated 1 week ago
Starred by Wing Lian (founder of Axolotl AI) and Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems").

airllm by lyogavin

Top 0.1% · 6k stars
Inference optimization for LLMs on low-resource hardware
Created 2 years ago · Updated 2 weeks ago
Starred by Lianmin Zheng (coauthor of SGLang, vLLM), Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

MiniCPM by OpenBMB

Top 0.4% · 8k stars
Ultra-efficient LLMs for end devices, achieving 5x+ speedup
Created 1 year ago · Updated 1 week ago
Starred by Tobi Lutke (cofounder of Shopify), Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), and 22 more.

qlora by artidoro

Top 0.1% · 11k stars
Finetuning tool for quantized LLMs
Created 2 years ago · Updated 1 year ago