Chinese guide for fine-tuning LLaMA models
Top 73.3% on sourcepulse
This repository provides a straightforward guide for Chinese users to fine-tune LLaMA models, integrating multiple frameworks like Minimal LLaMA, Alpaca, and LMFlow for enhanced readability and flexibility. It targets researchers and developers working with LLMs who need a clear, step-by-step process for customization.
How It Works
The project offers two primary fine-tuning scripts: `finetune_pp.py` for full-parameter fine-tuning and `finetune_pp_peft.py` for parameter-efficient fine-tuning (PEFT) using LoRA. It emphasizes code clarity over excessive abstraction, allowing users to understand the underlying mechanisms. For multi-GPU acceleration, it integrates FSDP (Fully Sharded Data Parallel) and DeepSpeed, addressing out-of-memory (OOM) issues and significantly speeding up training, especially for larger models.
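To make the PEFT mechanism concrete, here is a minimal NumPy sketch of the LoRA idea that the `finetune_pp_peft.py` path relies on (the shapes, inits, and scaling below are illustrative assumptions, not the repository's actual code):

```python
import numpy as np

# LoRA sketch: the pretrained weight W stays frozen; training updates only a
# low-rank pair A (r x d_in) and B (d_out x r). The effective weight is
# W + (alpha / r) * B @ A. All shapes here are toy values, not LLaMA's.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 16, 4, 8

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # small random init
B = np.zeros((d_out, r))                    # zero init: update starts as a no-op

x = rng.standard_normal((2, d_in))          # batch of 2 inputs
y = x @ W.T + (alpha / r) * (x @ A.T) @ B.T

assert np.allclose(y, x @ W.T)  # B = 0, so outputs match the frozen model
print(A.size + B.size, "trainable vs", W.size, "frozen parameters")
```

Only `A` and `B` receive gradients under this scheme, which is why LoRA can fit on GPUs where full-parameter fine-tuning would hit OOM.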
Quick Start & Requirements
1. Install `peft` and `transformers` via pip. Crucially, install PyTorch with CUDA support using Conda: `conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia`. Also install `sentencepiece`.
2. Obtain LLaMA weights from the Hugging Face Hub (e.g., `decapoda-research/llama-7b-hf`) or from the official source, converting them with `convert_llama_weights_to_hf.py`.
3. See the `Data_sample` directory for data formatting.
4. Run `finetune_pp.py` or `finetune_pp_peft.py`, specifying a GPU. For multi-GPU training, use `finetune_pp_peft_trainer_lora.py`, `finetune_pp_peft_trainer.py` (FSDP), or `finetune_pp_peft_trainer_deepspeed.sh` (DeepSpeed).

Highlighted Details

Gradient checkpointing support (`finetune_pp_peft_trainer_checkpointing.sh`) for substantial memory reduction in large models.

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The provided `finetune_pp.py` and `finetune_pp_peft.py` scripts lack multi-GPU acceleration, resulting in slow training speeds suitable mainly for debugging. The README advises caution regarding DeepSpeed's performance impact due to CPU offloading, and notes that CPU participation is necessary for multi-GPU training of larger models (13B+) to avoid OOM.
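The 13B+ OOM point can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes fp16 weights and gradients plus Adam's fp32 optimizer state (16 bytes per parameter, the usual ZeRO-style accounting); the numbers are illustrative, not measurements from this repository:

```python
def train_memory_gb(n_params: float) -> float:
    """Rough training footprint: 2 bytes fp16 weight + 2 bytes fp16 grad
    + 12 bytes Adam state (fp32 master weight and two fp32 moments)."""
    bytes_per_param = 2 + 2 + 12
    return n_params * bytes_per_param / 1024**3

for n_billion in (7, 13):
    gb = train_memory_gb(n_billion * 1e9)
    print(f"{n_billion}B params -> ~{gb:.0f} GB of state")
# Either model far exceeds a single GPU's memory, hence the need for
# FSDP/DeepSpeed sharding and, at 13B+, offloading state to CPU RAM.
```

This ignores activations and framework overhead, so real footprints are larger still; it is only meant to show why sharding and CPU offload become mandatory at these scales.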
Last commit 2 years ago; the project is inactive.