Chinese-Llama-2 by longyuewangdcu

Chinese Llama-2 enhances Llama-2 for Chinese language tasks

created 2 years ago
448 stars

Top 68.1% on sourcepulse

View on GitHub
Project Summary

This project enhances Llama-2's capabilities for the Chinese language, targeting researchers and developers working with Chinese NLP. It offers improved comprehension, generation, and translation by applying parameter-efficient fine-tuning (LoRA), full-parameter instruction fine-tuning, and continued pre-training.

How It Works

The project leverages three primary methods to adapt Llama-2 for Chinese: LoRA fine-tuning for parameter efficiency, full-parameter fine-tuning on Chinese instruction datasets (like BAAI/COIG) for deeper adaptation, and continued pre-training on large Chinese and English corpora to capture linguistic nuances. This multi-pronged approach aims to significantly boost Llama-2's performance on Chinese language tasks.
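
To make the LoRA route concrete, below is a minimal sketch of wrapping the base Llama-2 7B checkpoint with LoRA adapters via the peft library. The model ID, target modules, and hyperparameters are illustrative assumptions, not the repository's exact training configuration (that lives in its training scripts).

    # Minimal LoRA setup sketch (assumes access to the gated meta-llama/Llama-2-7b-hf
    # checkpoint and the `peft` + `transformers` packages; hyperparameters are illustrative).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base_id = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

    lora_cfg = LoraConfig(
        r=8,                                  # low-rank dimension (assumption)
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumption)
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()  # only the small adapter matrices are trainable
    # Training itself (e.g., on Chinese instruction data such as BAAI/COIG) would then
    # run through transformers.Trainer or the repository's own scripts.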

Quick Start & Requirements

  • Installation: Clone the repository (git clone https://github.com/longyuewangdcu/chinese-llama-2.git), navigate into the directory (cd chinese-llama-2), and install dependencies (pip install -e ./transformers, pip install -r requirements.txt).
  • Prerequisites: Requires PyTorch and Hugging Face Transformers. Specific fine-tuning commands indicate the need for bf16 support and potentially multi-node setups with NCCL. Flash Attention (v1.0.4) is recommended for full parameter fine-tuning to reduce memory usage.
  • Resources: Fine-tuned model weights are available via the provided links (e.g., Hugging Face, Baidu Netdisk); a hedged loading sketch follows this list. Training requires significant computational resources, as indicated by the DeepSpeed configuration and multi-GPU commands.
  • Links: Chinese-Llama-2-LoRA-7B, Chinese-Llama-2-7B.
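
As referenced in the Resources bullet, here is a minimal loading-and-generation sketch using Hugging Face Transformers. The local checkpoint path is a placeholder for weights downloaded from the links above, and bf16 plus device_map="auto" (which requires the accelerate package) are assumptions rather than the repository's documented inference command.

    # Hedged inference sketch; "path/to/chinese-llama-2-7b" is a placeholder for
    # weights downloaded from the Hugging Face / Baidu Netdisk links above.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    ckpt = "path/to/chinese-llama-2-7b"
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForCausalLM.from_pretrained(
        ckpt,
        torch_dtype=torch.bfloat16,  # matches the bf16 training setup noted above
        device_map="auto",           # requires `accelerate`; an assumption, not a repo requirement
    )

    prompt = "请用中文介绍一下大语言模型。"  # "Please introduce large language models in Chinese."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))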

Highlighted Details

  • Demonstrates significant improvement over base Llama-2 7B Chat on Chinese language tasks, particularly in understanding and generation.
  • Offers both LoRA and full-parameter fine-tuning options, catering to different resource constraints and performance needs.
  • Provides example inference scripts for both LoRA and full-parameter fine-tuned models (a hedged LoRA-merging sketch follows this list).
  • Includes detailed training scripts utilizing DeepSpeed for distributed training and bf16 for efficiency.
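
As noted in the inference-scripts bullet, the sketch below shows one common PEFT pattern for attaching a released LoRA adapter to the base model and merging it for plain inference. The paths are placeholders, and this is not the repository's own script.

    # Hedged sketch: attach the Chinese LoRA adapter to the base model and merge it.
    # "path/to/chinese-llama-2-lora-7b" is a placeholder for the released adapter weights.
    import torch
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",  # assumed base checkpoint (gated on Hugging Face)
        torch_dtype=torch.bfloat16,
    )
    model = PeftModel.from_pretrained(base, "path/to/chinese-llama-2-lora-7b")
    model = model.merge_and_unload()  # fold the adapter into the base weights for plain inference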

Maintenance & Community

The project is associated with researchers from the University of Macau and Monash University. Contributions are welcomed via issues and pull requests.

Licensing & Compatibility

The code is licensed under Apache 2.0. Model weights are available for use, but users should verify the base Llama-2 model's license terms (Meta's Llama 2 Community License) before commercial or closed-source use.

Limitations & Caveats

The project is under active development; its "TODO" section mentions continued pre-training and further fine-tuning releases. Availability of specific model checkpoints may depend on external hosting links (e.g., Hugging Face, Baidu Netdisk).

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

  • LLaMA-Adapter by OpenGVLab: Efficient fine-tuning for instruction-following LLaMA models. 6k stars; created 2 years ago, updated 1 year ago. Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 3 more.
  • alpaca-lora by tloen: LoRA fine-tuning for LLaMA. 19k stars; created 2 years ago, updated 1 year ago. Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (author of SGLang), and 9 more.