Chinese-Vicuna by Facico

Chinese LLaMA fine-tuning project for instruction-following

created 2 years ago
4,152 stars

Top 12.0% on sourcepulse

View on GitHub
Project Summary

Chinese-Vicuna provides a low-resource solution for fine-tuning LLaMA models for Chinese instruction following and multi-round chatbots. It's designed for researchers and developers with limited hardware, enabling training on consumer-grade GPUs like the RTX-2080Ti and RTX-3090. The project offers efficient parameter tuning via LoRA, making it accessible for creating capable Chinese language models.

How It Works

The project leverages the LoRA (Low-Rank Adaptation) technique, which significantly reduces the computational resources required for fine-tuning large language models. By injecting trainable low-rank matrices into the transformer layers, it achieves high parameter efficiency. This approach allows for effective instruction tuning and conversational ability development on smaller datasets and with less VRAM, making it "graphics card friendly" and easy to deploy.
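The parameter savings behind this "graphics card friendly" approach can be shown with a quick calculation. The sketch below is illustrative (not the project's code): for one linear layer, full fine-tuning trains every entry of the weight matrix W, while LoRA trains only the two low-rank factors B and A that are added to the frozen W. The dimensions assume LLaMA-7B's 4096 hidden size and a common LoRA rank of r=8.

```python
# Illustrative sketch of LoRA's parameter savings for one linear layer.
# Instead of updating the full weight W (d_out x d_in), LoRA trains two
# small matrices B (d_out x r) and A (r x d_in), adding (alpha / r) * B @ A
# to the frozen W at forward time.

def lora_param_counts(d_out: int, d_in: int, r: int):
    """Return (full fine-tune params, LoRA params) for one linear layer."""
    full = d_out * d_in          # every entry of W is trainable
    lora = r * (d_out + d_in)    # only B and A are trainable
    return full, lora

# d = 4096 matches LLaMA-7B's hidden size; r = 8 is a common LoRA rank
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, f"{100 * lora / full:.2f}%")  # → 16777216 65536 0.39%
```

Training well under 1% of each adapted layer's parameters is what makes fine-tuning feasible in 11-24 GB of VRAM.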

Quick Start & Requirements

  • Install: pip install -r requirements.txt (or requirements_4bit.txt for 4-bit/QLoRA).
  • Prerequisites: Python 3.8, PyTorch 1.13.1, CUDA 12.
  • Hardware: RTX-2080Ti (11GB) for 7B models, RTX-3090 (24GB) for 13B models or longer context. QLoRA enables 13B training on 2080Ti.
  • Resources: Training on 700k samples ("70w", i.e. 70万) for 3 epochs on a single 2080Ti takes ~200 hours.
  • Links: Colab, HuggingFace Datasets.
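As a back-of-envelope check on the training figures above (a rough estimate, not a benchmark from the project):

```python
# Throughput implied by the quoted numbers: 700k samples x 3 epochs
# completed in ~200 hours on a single RTX-2080Ti.
samples = 700_000   # "70w" = 70 * 10,000 in Chinese shorthand
epochs = 3
hours = 200

per_second = samples * epochs / (hours * 3600)
print(f"~{per_second:.1f} samples/s")  # → ~2.9 samples/s
```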

Highlighted Details

  • Supports 4-bit training and inference (QLoRA).
  • Offers CPU inference via pure C++.
  • Includes tools for downloading, converting, and quantizing Facebook's LLaMA checkpoints.
  • Fine-tuning examples for medical and legal domains are provided.
  • Supports multi-GPU inference to further reduce VRAM usage.
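The general idea behind quantizing checkpoints for low-VRAM inference can be sketched as symmetric int8 quantization (a simplified illustration, not the project's actual conversion pipeline): each float weight is mapped to an 8-bit integer plus a single per-tensor scale.

```python
# Simplified symmetric int8 quantization: store int8 values plus one
# float scale per tensor, cutting weight storage roughly 4x vs float32.

def quantize_int8(weights):
    """Map float weights to int8 codes and a shared scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]   # each q[i] fits in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction; max error is about scale / 2."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.02]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Real pipelines (GPTQ, ggml/llama.cpp formats) add per-group scales and error-aware rounding, but the storage trade-off is the same.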

Maintenance & Community

The project is actively maintained, with recent updates including 4-bit training support and multi-GPU inference interfaces. It references the alpaca-lora project and utilizes datasets like BELLE and Guanaco. Community interaction channels are not explicitly listed in the README.

Licensing & Compatibility

The project's code is likely governed by the license of its dependencies (e.g., alpaca-lora). The README does not explicitly state a license for the code itself. LLaMA model weights have their own usage restrictions.

Limitations & Caveats

The README notes potential issues with saving checkpoints in 8-bit training environments due to bitsandbytes compatibility. Python 3.11 has a known torchrun bug. Some conversational models may produce repetitive or less coherent outputs unless generation parameters (e.g., repetition penalty) are tuned.
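The repetition-penalty knob mentioned above works by scaling down the logits of tokens that have already been generated. The sketch below shows the standard (CTRL-style) formulation as a generic illustration, not the project's exact inference code:

```python
# Generic repetition-penalty sketch: before sampling the next token,
# make every already-generated token less likely by scaling its logit.

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    out = list(logits)
    for t in set(generated_ids):
        # divide positive logits, multiply negative ones, so the
        # token's probability always goes down
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1])
# token 0: 2.0 -> 2.0/1.2; token 1: -1.0 -> -1.2; token 2 unchanged
```

A penalty of 1.0 disables the effect; values around 1.1-1.3 are typical starting points when output becomes repetitive.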

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History: 9 stars in the last 90 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 10 more.

qlora by artidoro

0.2% · 11k stars
Finetuning tool for quantized LLMs
created 2 years ago · updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (Author of SGLang), and 9 more.

alpaca-lora by tloen

0.0% · 19k stars
LoRA fine-tuning for LLaMA
created 2 years ago · updated 1 year ago