Chinese-LLaMA-Alpaca-2 by ymcui

Chinese LLaMA/Alpaca-2: LLMs with long context for Chinese language

Created 2 years ago · 7,169 stars · Top 7.3% on sourcepulse

Project Summary

This project provides Chinese-centric Large Language Models (LLMs) built on Meta's Llama-2, offering both base (Chinese-LLaMA-2) and instruction-tuned (Chinese-Alpaca-2) variants. It targets developers and researchers who need stronger Chinese language understanding and generation, including support for extended context lengths of up to 64K tokens.

How It Works

The models are built upon Llama-2, with an expanded Chinese vocabulary and incremental pre-training on large-scale Chinese corpora. Key techniques include FlashAttention-2 for efficient training, plus context extension via Position Interpolation (PI) for the 16K models and YaRN for the 64K models. Instruction-tuned models are further refined with RLHF for better alignment with human preferences and values.
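
As a rough illustration of the context-extension mechanism, the snippet below sketches how linear RoPE scaling (the idea behind Position Interpolation) can be requested when loading a Llama-2-family checkpoint with Hugging Face transformers. The model ID and scaling factor are illustrative assumptions, not the project's published configuration; the released 16K/64K checkpoints ship with their own scaling settings, so consult the project Wiki before overriding anything.

    # Sketch only: linear RoPE scaling ("Position Interpolation") applied as a
    # transformers config override. Model ID and factor are assumptions.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "hfl/chinese-alpaca-2-7b"  # assumed Hugging Face model ID
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",
        # Stretch RoPE position indices by `factor`: a model pre-trained at a
        # 4K context can then attend over roughly 4 x 4K = 16K tokens.
        rope_scaling={"type": "linear", "factor": 4.0},
    )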

Quick Start & Requirements

  • Installation: Models are available via Hugging Face Transformers, with GGUF versions for llama.cpp (see the loading sketch after this list).
  • Prerequisites: Python, PyTorch, Hugging Face libraries. Specific hardware requirements depend on model size (e.g., 7B models require ~13GB VRAM for FP16).
  • Resources: Full models range from 2.4GB (1.3B) to 24.7GB (13B). LoRA weights are significantly smaller.
  • Documentation: Detailed guides for pre-training, fine-tuning, and deployment are available on the project's GitHub Wiki.
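
For orientation, here is a minimal inference sketch using Hugging Face transformers. The model ID follows the project's Hugging Face releases but should be verified against them, as should the recommended prompt template on the Wiki; device_map="auto" additionally requires the accelerate package.

    # Minimal inference sketch, assuming the hfl/chinese-alpaca-2-7b
    # checkpoint on Hugging Face; verify the exact ID and prompt template.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "hfl/chinese-alpaca-2-7b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # ~13GB VRAM for the 7B model in FP16
        device_map="auto",          # requires the accelerate package
    )

    # "List China's Four Great Inventions."
    prompt = "请列举中国的四大发明。"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))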

Highlighted Details

  • Offers models with standard 4K context and extended 16K/64K context lengths.
  • Includes RLHF-tuned variants for improved value alignment.
  • Supports integration with popular tools such as transformers, llama.cpp, text-generation-webui, and LangChain (see the GGUF sketch after this list).
  • Provides extensive benchmarks on C-Eval, CMMLU, and LongBench, showcasing performance across various tasks.
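
To illustrate the llama.cpp route, the sketch below uses llama-cpp-python, the Python bindings for llama.cpp, to run a quantized GGUF build. The file name is a placeholder assumption; download an actual GGUF file from the project's releases and adjust the path.

    # Sketch: local inference on a quantized GGUF build via llama-cpp-python.
    # The model file name below is a placeholder assumption.
    from llama_cpp import Llama

    llm = Llama(
        model_path="chinese-alpaca-2-7b.Q4_K_M.gguf",  # assumed file name
        n_ctx=4096,  # context window; pick a 16K/64K variant for long inputs
    )
    # "Describe the giant panda in one sentence."
    out = llm("请用一句话介绍大熊猫。", max_tokens=128)
    print(out["choices"][0]["text"])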

Maintenance & Community

The project is actively maintained, with recent updates including support for Llama-3 based models (Chinese-LLaMA-Alpaca-3). Community interaction is encouraged via GitHub Issues and Discussions.

Licensing & Compatibility

The models are based on Llama-2 and inherit the Llama 2 Community License, which permits commercial use subject to its conditions. Users must adhere to the Llama-2 license terms.

Limitations & Caveats

The models may generate unpredictable or undesirable content. Because of compute and data constraints, training is not exhaustive, and Chinese understanding still has room for improvement. No interactive online demo is provided, so testing requires local deployment.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 47 stars in the last 90 days

Explore Similar Projects

LLaMA-Adapter by OpenGVLab (6k stars)
Efficient fine-tuning for instruction-following LLaMA models. Created 2 years ago, updated 1 year ago. Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 3 more.

alpaca-lora by tloen (19k stars)
LoRA fine-tuning for LLaMA. Created 2 years ago, updated 1 year ago. Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (author of SGLang), and 9 more.