Finetune_LLAMA by chaoyi-wu

Chinese guide for fine-tuning LLaMA models

created 2 years ago
401 stars

Top 73.3% on sourcepulse

Project Summary

This repository provides a straightforward guide for Chinese users to fine-tune LLaMA models, integrating multiple frameworks like Minimal LLaMA, Alpaca, and LMFlow for enhanced readability and flexibility. It targets researchers and developers working with LLMs who need a clear, step-by-step process for customization.

How It Works

The project offers two primary fine-tuning scripts: finetune_pp.py for full parameter fine-tuning and finetune_pp_peft.py for parameter-efficient fine-tuning (PEFT) using LoRA. It emphasizes code clarity over excessive abstraction, allowing users to understand the underlying mechanisms. For multi-GPU acceleration, it integrates FSDP (Fully Sharded Data Parallel) and DeepSpeed, addressing Out-of-Memory (OOM) issues and significantly speeding up training, especially for larger models.
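The LoRA approach used by the PEFT script can be illustrated with a minimal sketch in pure Python (no framework dependencies; the dimensions, rank, and function names below are hypothetical, chosen only to show the parameter savings, not taken from the repository's code). LoRA freezes the original weight matrix W and trains only a low-rank update ΔW = (alpha / r) · B · A:

```python
# Minimal LoRA sketch: instead of updating a d_out x d_in weight matrix W,
# train two small matrices B (d_out x r) and A (r x d_in) and add their
# scaled product to the frozen W. All numbers here are illustrative.

def lora_param_counts(d_out, d_in, r):
    """Return (full fine-tune params, LoRA trainable params) for one layer."""
    full = d_out * d_in          # every entry of W is trainable
    lora = r * (d_out + d_in)    # only B and A are trainable
    return full, lora

def apply_lora(W, A, B, alpha, r):
    """Effective weight W' = W + (alpha / r) * B @ A, using plain lists."""
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    return [[W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
             for j in range(d_in)] for i in range(d_out)]

# A LLaMA-7B attention projection is roughly 4096 x 4096; with rank r = 8,
# LoRA trains under 0.4% of the parameters of full fine-tuning.
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, lora / full)  # → 16777216 65536 0.00390625
```

This is why the PEFT variant fits on far less GPU memory than full parameter fine-tuning: optimizer states and gradients are only kept for the small A and B matrices.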

Quick Start & Requirements

  • Installation: Install peft and transformers via pip. Crucially, install PyTorch with CUDA support using Conda: conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia. Also install sentencepiece.
  • Model Download: Download LLaMA weights from Hugging Face (e.g., decapoda-research/llama-7b-hf) or the official source and convert them using convert_llama_weights_to_hf.py.
  • Data Preparation: Follow the Data_sample directory for data formatting.
  • Training: Modify the parameters in finetune_pp.py or finetune_pp_peft.py, specifying the GPU to use. For multi-GPU training, use finetune_pp_peft_trainer_lora.py, finetune_pp_peft_trainer.py (FSDP), or finetune_pp_peft_trainer_deepspeed.sh (DeepSpeed).
  • Resources: Requires significant GPU memory, especially for larger models; the README's benchmarks use 8xA100 GPUs for training on 4.8M papers.
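The repository's exact data schema is defined by the files in its Data_sample directory; as a hedged sketch, the record below assumes the common Alpaca-style instruction/input/output JSON format (the field names and the build_prompt helper are illustrative, not taken from the repo):

```python
import json

# Hypothetical record in the common Alpaca-style format; check Data_sample
# in the repository for the authoritative schema.
record = {
    "instruction": "Summarize the following abstract in one sentence.",
    "input": "We study parameter-efficient fine-tuning of LLaMA models...",
    "output": "The paper evaluates LoRA-based fine-tuning for LLaMA.",
}

def build_prompt(rec):
    """Assemble a single training prompt from one instruction record."""
    if rec.get("input"):
        return (f"### Instruction:\n{rec['instruction']}\n\n"
                f"### Input:\n{rec['input']}\n\n### Response:\n{rec['output']}")
    return f"### Instruction:\n{rec['instruction']}\n\n### Response:\n{rec['output']}"

# Training files are typically one JSON list (or JSON-lines) of such records.
print(json.dumps(record, ensure_ascii=False)[:40])
print(build_prompt(record).splitlines()[0])
```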

Highlighted Details

  • Integrates PEFT (LoRA), FSDP, and DeepSpeed for efficient fine-tuning and memory optimization.
  • Provides comparative benchmarks on training time per epoch for 7B and 13B models across different acceleration strategies.
  • Offers gradient checkpointing (finetune_pp_peft_trainer_checkpointing.sh) for substantial memory reduction in large models.
  • Includes a DeepSpeed installation guide, but cautions that DeepSpeed's default CPU offloading can degrade performance.
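Why gradient checkpointing cuts memory so sharply can be shown with a back-of-envelope model (a hedged sketch: the function and numbers are hypothetical and ignore constant factors; real training uses torch.utils.checkpoint rather than this toy accounting). Storing activations for all n layers costs O(n) memory, while checkpointing every ~sqrt(n) layers and recomputing one segment at a time costs O(sqrt(n)):

```python
import math

# Back-of-envelope activation-memory model for gradient checkpointing.
# n_layers transformer layers each produce activations of size per_layer;
# units are arbitrary and ignore attention buffers and other constants.

def activation_memory(n_layers, per_layer, checkpoint=False):
    """Units of activation memory held during the backward pass."""
    if not checkpoint:
        return n_layers * per_layer            # store every layer's activations
    segments = math.isqrt(n_layers)            # checkpoint every ~sqrt(n) layers
    # keep one stored activation per segment boundary, plus one segment
    # of activations recomputed on the fly during backward
    return segments * per_layer + (n_layers // segments) * per_layer

plain = activation_memory(32, 1)          # a 32-layer model, e.g. LLaMA-7B
ckpt = activation_memory(32, 1, True)
print(plain, ckpt)  # → 32 11
```

The trade-off is roughly one extra forward pass of compute per training step, which is why the README presents checkpointing as a memory optimization rather than a speedup.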

Maintenance & Community

  • Acknowledges contributions and integrations from Minimal LLaMA, Stanford Alpaca, and LMFlow.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. The underlying LLaMA model has its own usage restrictions. Compatibility for commercial use is not specified.

Limitations & Caveats

The provided finetune_pp.py and finetune_pp_peft.py scripts lack multi-GPU acceleration; their training speed is slow, making them suitable mainly for debugging. The README also cautions about DeepSpeed's performance impact from CPU offloading, and notes that CPU participation is required for multi-GPU training of larger models (13B+) to avoid OOM errors.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days
