fsdp_qlora by AnswerDotAI

Training script for LLMs using QLoRA + FSDP

Created 1 year ago
1,528 stars

Top 27.2% on SourcePulse

View on GitHub
Project Summary

This repository provides a script for training large language models (LLMs) using QLoRA (Quantized Low-Rank Adaptation) combined with Fully Sharded Data Parallelism (FSDP). It targets researchers and practitioners looking to fine-tune LLMs efficiently on limited hardware, offering significant memory savings and faster training times compared to full fine-tuning.

How It Works

The core innovation lies in the integration of QLoRA's 4-bit quantization with PyTorch's FSDP for distributed training. This approach quantizes model weights to 4-bit precision, drastically reducing memory requirements. FSDP then shards the quantized model, optimizer states, and gradients across multiple GPUs. Custom low-memory loading code is employed to load and quantize model layers iteratively, avoiding the need to load the entire model into GPU memory at once. The script supports both bitsandbytes and HQQ quantization backends, with options for gradient checkpointing and CPU offloading.
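
Below is a minimal sketch of that low-memory loading pattern, not the repository's actual loader: weight shards are streamed from disk one at a time and quantized to 4-bit with bitsandbytes as they arrive, so the full-precision model never has to be resident all at once. The shard_files list and the treat-2-D-tensors-as-linear-weights rule are placeholder assumptions for illustration.

```python
# Illustrative sketch of iterative low-memory loading + 4-bit quantization.
# Assumes safetensors and bitsandbytes >= 0.43 are installed; `shard_files`
# is a placeholder for the model's safetensors shard paths.
from safetensors.torch import load_file
import bitsandbytes.functional as bnb_F

def quantize_shards(shard_files, device="cuda"):
    quantized = {}
    for path in shard_files:
        shard = load_file(path, device="cpu")      # load one shard into CPU RAM at a time
        for name, tensor in shard.items():
            if tensor.ndim == 2:                   # crude rule: treat 2-D tensors as linear weights
                packed, quant_state = bnb_F.quantize_4bit(
                    tensor.to(device), quant_type="nf4"
                )
                # park the packed 4-bit weights back on CPU until FSDP shards the model
                quantized[name] = (packed.cpu(), quant_state)
            else:
                quantized[name] = tensor           # keep norms/embeddings unquantized here
        del shard                                  # free the full-precision shard before the next one
    return quantized
```

The actual script does this layer by layer while constructing the model and also manages quantization state, device placement, and the HQQ backend, which the sketch omits.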

Quick Start & Requirements

  • Installation: Clone the repository, then run pip install llama-recipes fastcore "transformers!=4.38.*,!=4.39.*" --extra-index-url https://download.pytorch.org/whl/test/cu118 (adjust the CUDA version as needed) and pip install "bitsandbytes>=0.43.0" (quoted so the shell does not treat >= as a redirect). Log in to the Hugging Face Hub with huggingface-cli login.
  • Prerequisites: CUDA (tested with 11.7, 11.8, 12.1), PyTorch >= 2.2 recommended for Flash Attention 2. Optional: wandb for logging, HQQ custom kernels (hqq/kernels/setup_cuda.py).
  • Resources: The example Llama-2 70B finetuning command requires ~128GB of CPU RAM; a swap file is recommended.
  • Docs: Announcement Blog Post (related to 4-bit LLMs).
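
As an optional companion to the Hugging Face Hub login step above, the sketch below pre-downloads the Llama-2 70B weight shards with huggingface_hub (installed alongside transformers) so the loader can stream them from local disk. The training script can also fetch weights on its own, so treat this as illustrative rather than required.

```python
# Optional: pre-fetch the gated Llama-2 70B shards after `huggingface-cli login`.
# Your Hugging Face account must have accepted the model's license.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    "meta-llama/Llama-2-70b-hf",
    allow_patterns=["*.safetensors", "*.json", "*.model"],  # skip the duplicate .bin weights
)
print("weights cached at:", local_dir)
```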

Highlighted Details

  • Supports multiple fine-tuning methods: full parameter, LoRA, QLoRA (bitsandbytes/HQQ), DoRA, and Llama-Pro variants.
  • Offers flexible mixed-precision training options (fp32, bf16, fp16 autocast) with detailed explanations for each.
  • Includes custom loading code for quantized models to bypass Hugging Face AutoModel.from_pretrained limitations.
  • Provides example scripts for training Llama 70B on 4x A100 40GB GPUs using both BnB QLoRA and HQQ QLoRA.

Maintenance & Community

  • The project is described as an "alpha/preview release," suggesting ongoing development and potential instability.
  • Integrations are noted with Axolotl (experimental).
  • No specific community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. The core dependencies (Hugging Face Transformers, bitsandbytes, PyTorch) are permissively licensed and generally allow commercial use, but license restrictions may apply to the base models being fine-tuned (e.g., Llama 2).

Limitations & Caveats

This is an alpha/preview release, so users should be comfortable with testing and debugging. Custom model-loading code is required because Hugging Face Transformers' standard loading path does not handle the quantized weights for this FSDP workflow. FSDP's MixedPrecision policy must be configured carefully to avoid corrupting the quantized weights.
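
A minimal sketch of the kind of configuration that caveat refers to, assuming bitsandbytes >= 0.43, bf16-capable GPUs, and a process group already initialized (e.g. via torchrun); it is illustrative rather than the repository's exact wiring.

```python
# Keep the packed 4-bit storage dtype and the FSDP MixedPrecision dtypes in
# agreement (bf16 here) so sharding and casting never reinterpret the packed
# bytes. Assumes torch.distributed has already been initialized.
import torch
import bitsandbytes as bnb
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

# A 4-bit linear layer whose packed weights are stored in a bf16-typed tensor
# (the quant_storage option was added for FSDP compatibility in bitsandbytes 0.43).
layer = bnb.nn.Linear4bit(
    4096, 4096, bias=False,
    compute_dtype=torch.bfloat16,
    quant_type="nf4",
    quant_storage=torch.bfloat16,   # match the dtype FSDP flattens and casts to
)

bf16_policy = MixedPrecision(
    param_dtype=torch.bfloat16,     # dtype of unsharded params during compute
    reduce_dtype=torch.bfloat16,    # gradient reduce-scatter dtype
    buffer_dtype=torch.bfloat16,
)

# In practice the whole model is wrapped with an auto-wrap policy; a mismatched
# param_dtype (e.g. fp32) would cast the packed weights and corrupt them.
sharded = FSDP(layer.cuda(), mixed_precision=bf16_policy)
```

The essential point is that the quantized storage dtype and the mixed-precision dtypes agree; which dtypes to use for LoRA weights, norms, and gradients is exactly what the script's precision options expose.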

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

gptq by IST-DASLab

Top 0.1% · 2k stars
Code for GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers
Created 2 years ago · Updated 1 year ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab

Top 0.3% · 3k stars
Weight quantization research paper for LLM compression/acceleration
Created 2 years ago · Updated 2 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Pawel Garbacki (Cofounder of Fireworks AI), and 8 more.

lit-llama by Lightning-AI

Top 0.1% · 6k stars
LLaMA implementation for pretraining, finetuning, and inference
Created 2 years ago · Updated 2 months ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 36 more.

unsloth by unslothai

Top 0.6% · 46k stars
Finetuning tool for LLMs, targeting speed and memory efficiency
Created 1 year ago · Updated 14 hours ago