LLaMA-Factory by hiyouga

Unified fine-tuning tool for 100+ LLMs & VLMs (ACL 2024)

created 2 years ago
55,253 stars

Top 0.4% on sourcepulse

Project Summary

LLaMA-Factory provides a unified and efficient framework for fine-tuning over 100 large language and vision-language models. It caters to researchers and developers looking to adapt LLMs for various tasks, offering both a zero-code CLI and a Gradio-based Web UI for ease of use. The project aims to simplify the complex process of LLM fine-tuning, supporting a wide array of models and training methodologies.
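
Once installed, the Web UI takes a single command to launch. A minimal sketch, assuming the `llamafactory-cli` entry point that the package installs:

```bash
# Launch the Gradio-based Web UI (LlamaBoard) for zero-code fine-tuning.
llamafactory-cli webui
```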

How It Works

The framework supports a broad spectrum of fine-tuning techniques, including continuous pre-training, supervised fine-tuning (SFT), reward modeling, and preference-optimization methods such as DPO, KTO, and ORPO. It integrates advanced optimizers such as GaLore, BAdam, and APOLLO, alongside practical speedups like FlashAttention-2 and Unsloth for higher throughput and lower memory usage. Adaptation methods range from full-parameter tuning to parameter-efficient fine-tuning (PEFT) variants such as LoRA and QLoRA, with quantization supported down to 2-bit.
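
As a sketch of how a run is launched: the CLI is driven by a YAML config that names the model, dataset, and tuning method. The config path below mirrors the `examples/` layout in the repository, and the `key=value` override syntax is an assumption based on the docs; exact filenames may vary between releases:

```bash
# Supervised fine-tuning of a Llama-3 model with LoRA, configured via YAML
# (illustrative path; see the repo's examples/ directory for current configs).
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml

# Individual settings can be overridden on the command line
# (override syntax assumed from the documentation).
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml \
    learning_rate=1.0e-5
```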

Quick Start & Requirements

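A typical installation from source, following the project's README (a recent Python and PyTorch are required; the extras list may change between releases):

```bash
# Clone and install in editable mode with common extras
# (extras follow the README and may vary by release).
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"

# Sanity-check the CLI entry point.
llamafactory-cli version
```
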
Highlighted Details

  • Supports over 100 LLMs and VLMs, including Llama, Mistral, Qwen, Gemma, and LLaVA.
  • Integrates advanced PEFT methods and optimizers like LoRA, QLoRA, GaLore, and BAdam.
  • Offers multimodal fine-tuning capabilities for vision-language tasks.
  • Provides experiment monitoring via LlamaBoard, TensorBoard, Wandb, and SwanLab.
  • Includes an OpenAI-style API for deployment with vLLM or SGLang backends (sketched below).
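
As a hedged sketch of that deployment path: the CLI can serve a model behind an OpenAI-compatible endpoint, which any OpenAI-style client can then query. The config path, `API_PORT` usage, and model name below are illustrative, following the README's inference examples:

```bash
# Serve a model behind the OpenAI-style API (config path and API_PORT
# usage follow the README's inference examples; adjust to your setup).
API_PORT=8000 llamafactory-cli api examples/inference/llama3.yaml

# Query it with any OpenAI-compatible client; plain curl shown here.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello!"}]}'
```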

Maintenance & Community

The project is actively maintained with frequent updates, including support for new models and training techniques. It has a growing community, with linked user groups, and development remains active.

Licensing & Compatibility

The repository is licensed under Apache-2.0. Model weights are subject to their respective licenses. This permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

While the framework is extensive, the sheer number of supported models and configurations can make dependency management complex. Users may need to pin specific versions of libraries such as transformers for certain models, as noted in the documentation. Some features, such as FlashAttention-2 on Windows, require manual compilation.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 37
  • Issues (30d): 165

Star History

7,550 stars in the last 90 days

Explore Similar Projects

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

open-instruct by allenai

Training codebase for instruction-following language models

Top 0.2% on sourcepulse · 3k stars
created 2 years ago · updated 10 hours ago