Chinese guide for fine-tuning LLaMA models
Top 73.3% on sourcepulse
This repository provides a straightforward guide for Chinese users to fine-tune LLaMA models, integrating multiple frameworks like Minimal LLaMA, Alpaca, and LMFlow for enhanced readability and flexibility. It targets researchers and developers working with LLMs who need a clear, step-by-step process for customization.
How It Works
The project offers two primary fine-tuning scripts: `finetune_pp.py` for full-parameter fine-tuning and `finetune_pp_peft.py` for parameter-efficient fine-tuning (PEFT) using LoRA. It emphasizes code clarity over excessive abstraction, allowing users to understand the underlying mechanisms. For multi-GPU acceleration, it integrates FSDP (Fully Sharded Data Parallel) and DeepSpeed, addressing out-of-memory (OOM) issues and significantly speeding up training, especially for larger models.
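To make the PEFT mechanism concrete, here is a minimal NumPy sketch of the LoRA idea that the `finetune_pp_peft.py` path relies on (the shapes, inits, and scaling below are illustrative assumptions, not the repository's actual code):

```python
import numpy as np

# LoRA sketch: the pretrained weight W stays frozen; training updates only a
# low-rank pair A (r x d_in) and B (d_out x r). The effective weight is
# W + (alpha / r) * B @ A. All shapes here are toy values, not LLaMA's.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 16, 4, 8

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # small random init
B = np.zeros((d_out, r))                    # zero init: update starts as a no-op

x = rng.standard_normal((2, d_in))          # batch of 2 inputs
y = x @ W.T + (alpha / r) * (x @ A.T) @ B.T

assert np.allclose(y, x @ W.T)  # B = 0, so outputs match the frozen model
print(A.size + B.size, "trainable vs", W.size, "frozen parameters")
```

Only `A` and `B` receive gradients under this scheme, which is why LoRA can fit on GPUs where full-parameter fine-tuning would hit OOM.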
Quick Start & Requirements
1. Install `peft` and `transformers` via pip. Crucially, install PyTorch with CUDA support using Conda: `conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia`. Also install `sentencepiece`.
2. Obtain LLaMA weights from the Hugging Face Hub (e.g., `decapoda-research/llama-7b-hf`) or from the official source, converting them with `convert_llama_weights_to_hf.py`.
3. See the `Data_sample` directory for data formatting.
4. Run `finetune_pp.py` or `finetune_pp_peft.py`, specifying a GPU. For multi-GPU training, use `finetune_pp_peft_trainer_lora.py`, `finetune_pp_peft_trainer.py` (FSDP), or `finetune_pp_peft_trainer_deepspeed.sh` (DeepSpeed).

Highlighted Details

Gradient checkpointing support (`finetune_pp_peft_trainer_checkpointing.sh`) for substantial memory reduction in large models.

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The provided `finetune_pp.py` and `finetune_pp_peft.py` scripts lack multi-GPU acceleration, resulting in slow training speeds suitable mainly for debugging. The README advises caution regarding DeepSpeed's performance impact due to CPU offloading, and notes that CPU participation is necessary for multi-GPU training of larger models (13B+) to avoid OOM.
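The 13B+ OOM point can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes fp16 weights and gradients plus Adam's fp32 optimizer state (16 bytes per parameter, the usual ZeRO-style accounting); the numbers are illustrative, not measurements from this repository:

```python
def train_memory_gb(n_params: float) -> float:
    """Rough training footprint: 2 bytes fp16 weight + 2 bytes fp16 grad
    + 12 bytes Adam state (fp32 master weight and two fp32 moments)."""
    bytes_per_param = 2 + 2 + 12
    return n_params * bytes_per_param / 1024**3

for n_billion in (7, 13):
    gb = train_memory_gb(n_billion * 1e9)
    print(f"{n_billion}B params -> ~{gb:.0f} GB of state")
# Either model far exceeds a single GPU's memory, hence the need for
# FSDP/DeepSpeed sharding and, at 13B+, offloading state to CPU RAM.
```

This ignores activations and framework overhead, so real footprints are larger still; it is only meant to show why sharding and CPU offload become mandatory at these scales.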
Last commit 2 years ago; the project is inactive.