Chinese-Vicuna by Facico

Chinese LLaMA fine-tuning project for instruction-following

created 2 years ago
4,152 stars

Top 12.0% on sourcepulse

View on GitHub
Project Summary

Chinese-Vicuna provides a low-resource solution for fine-tuning LLaMA models for Chinese instruction following and multi-round chatbots. It's designed for researchers and developers with limited hardware, enabling training on consumer-grade GPUs like the RTX-2080Ti and RTX-3090. The project offers efficient parameter tuning via LoRA, making it accessible for creating capable Chinese language models.

How It Works

The project leverages the LoRA (Low-Rank Adaptation) technique, which significantly reduces the computational resources required for fine-tuning large language models. By injecting trainable low-rank matrices into the transformer layers, it achieves high parameter efficiency. This approach allows for effective instruction tuning and conversational ability development on smaller datasets and with less VRAM, making it "graphics card friendly" and easy to deploy.
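The parameter savings behind this "graphics card friendly" approach can be shown with a quick calculation. The sketch below is illustrative (not the project's code): for one linear layer, full fine-tuning trains every entry of the weight matrix W, while LoRA trains only the two low-rank factors B and A that are added to the frozen W. The dimensions assume LLaMA-7B's 4096 hidden size and a common LoRA rank of r=8.

```python
# Illustrative sketch of LoRA's parameter savings for one linear layer.
# Instead of updating the full weight W (d_out x d_in), LoRA trains two
# small matrices B (d_out x r) and A (r x d_in), adding (alpha / r) * B @ A
# to the frozen W at forward time.

def lora_param_counts(d_out: int, d_in: int, r: int):
    """Return (full fine-tune params, LoRA params) for one linear layer."""
    full = d_out * d_in          # every entry of W is trainable
    lora = r * (d_out + d_in)    # only B and A are trainable
    return full, lora

# d = 4096 matches LLaMA-7B's hidden size; r = 8 is a common LoRA rank
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, f"{100 * lora / full:.2f}%")  # → 16777216 65536 0.39%
```

Training well under 1% of each adapted layer's parameters is what makes fine-tuning feasible in 11-24 GB of VRAM.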

Quick Start & Requirements

  • Install: pip install -r requirements.txt (or requirements_4bit.txt for 4-bit/QLoRA).
  • Prerequisites: Python 3.8, PyTorch 1.13.1, CUDA 12.
  • Hardware: RTX-2080Ti (11GB) for 7B models, RTX-3090 (24GB) for 13B models or longer context. QLoRA enables 13B training on 2080Ti.
  • Resources: Training on 700k samples ("70w", i.e. 70万) for 3 epochs on a single 2080Ti takes ~200 hours.
  • Links: Colab, HuggingFace Datasets.
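As a back-of-envelope check on the training figures above (a rough estimate, not a benchmark from the project):

```python
# Throughput implied by the quoted numbers: 700k samples x 3 epochs
# completed in ~200 hours on a single RTX-2080Ti.
samples = 700_000   # "70w" = 70 * 10,000 in Chinese shorthand
epochs = 3
hours = 200

per_second = samples * epochs / (hours * 3600)
print(f"~{per_second:.1f} samples/s")  # → ~2.9 samples/s
```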

Highlighted Details

  • Supports 4-bit training and inference (QLoRA).
  • Offers CPU inference via pure C++.
  • Includes tools for downloading, converting, and quantizing Facebook's LLaMA checkpoints.
  • Fine-tuning examples for medical and legal domains are provided.
  • Supports multi-GPU inference to further reduce VRAM usage.
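The general idea behind quantizing checkpoints for low-VRAM inference can be sketched as symmetric int8 quantization (a simplified illustration, not the project's actual conversion pipeline): each float weight is mapped to an 8-bit integer plus a single per-tensor scale.

```python
# Simplified symmetric int8 quantization: store int8 values plus one
# float scale per tensor, cutting weight storage roughly 4x vs float32.

def quantize_int8(weights):
    """Map float weights to int8 codes and a shared scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]   # each q[i] fits in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction; max error is about scale / 2."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.02]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Real pipelines (GPTQ, ggml/llama.cpp formats) add per-group scales and error-aware rounding, but the storage trade-off is the same.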

Maintenance & Community

The project is actively maintained, with recent updates including 4-bit training support and multi-GPU inference interfaces. It references the alpaca-lora project and utilizes datasets like BELLE and Guanaco. Community interaction channels are not explicitly listed in the README.

Licensing & Compatibility

The project's code is likely governed by the license of its dependencies (e.g., alpaca-lora). The README does not explicitly state a license for the code itself. LLaMA model weights have their own usage restrictions.

Limitations & Caveats

The README notes potential issues with saving checkpoints in 8-bit training environments due to bitsandbytes compatibility. Python 3.11 has a known torchrun bug. Some conversational models may produce repetitive or less coherent outputs unless generation parameters (e.g., repetition penalty) are tuned.
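The repetition-penalty knob mentioned above works by scaling down the logits of tokens that have already been generated. The sketch below shows the standard (CTRL-style) formulation as a generic illustration, not the project's exact inference code:

```python
# Generic repetition-penalty sketch: before sampling the next token,
# make every already-generated token less likely by scaling its logit.

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    out = list(logits)
    for t in set(generated_ids):
        # divide positive logits, multiply negative ones, so the
        # token's probability always goes down
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1])
# token 0: 2.0 -> 2.0/1.2; token 1: -1.0 -> -1.2; token 2 unchanged
```

A penalty of 1.0 disables the effect; values around 1.1-1.3 are typical starting points when output becomes repetitive.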

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History: 9 stars in the last 90 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 10 more.

qlora by artidoro

0.2% · 11k stars
Finetuning tool for quantized LLMs
created 2 years ago · updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (Author of SGLang), and 9 more.

alpaca-lora by tloen

0.0% · 19k stars
LoRA fine-tuning for LLaMA
created 2 years ago · updated 1 year ago