finetune-gpt2xl by Xirider

Guide for finetuning GPT2-XL and GPT-NEO on a single GPU

Created 4 years ago · 437 stars · Top 69.3% on sourcepulse

Project Summary

This repository provides a guide and scripts for fine-tuning large language models, specifically GPT-2 XL (1.5B parameters) and GPT-Neo (2.7B parameters), on a single GPU. It targets researchers and practitioners who want to adapt these powerful models to specific tasks without requiring extensive hardware resources. The primary benefit is enabling efficient fine-tuning of large models on consumer-grade or single-server GPU setups.

How It Works

The guide leverages Huggingface Transformers and the DeepSpeed library to significantly reduce the memory footprint of large models during fine-tuning. The key techniques are DeepSpeed's ZeRO optimization, which partitions optimizer states and gradients and can offload them to CPU RAM, and gradient checkpointing, which recomputes activations during the backward pass instead of storing them. Together, these allow models that would normally exceed a single GPU's VRAM to be trained.
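
A minimal sketch of how these pieces fit together through the Huggingface Trainer, assuming a ZeRO stage 2 configuration with optimizer-state offload to CPU. The config values, toy dataset, and output path are illustrative, not the repository's exact run_clm.py or ds_config.json, and the script would be launched via the deepspeed launcher:

    # Illustrative only: not the repository's run_clm.py or ds_config.json.
    from datasets import Dataset
    from transformers import (
        DataCollatorForLanguageModeling,
        GPT2LMHeadModel,
        GPT2TokenizerFast,
        Trainer,
        TrainingArguments,
    )

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-xl")
    tokenizer.pad_token = tokenizer.eos_token        # GPT-2 ships without a pad token

    model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
    model.gradient_checkpointing_enable()            # recompute activations to save VRAM

    # Toy dataset; the guide instead tokenizes the CSV produced by text2csv.py.
    raw = Dataset.from_dict({"text": ["Example sentence one.", "Example sentence two."]})
    tokenized = raw.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True,
        remove_columns=["text"],
    )

    ds_config = {                                    # illustrative ZeRO-2 config with CPU offload
        "zero_optimization": {"stage": 2, "offload_optimizer": {"device": "cpu"}},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "fp16": {"enabled": True},
    }

    args = TrainingArguments(
        output_dir="finetuned-gpt2xl",               # hypothetical output directory
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,
        num_train_epochs=1,
        fp16=True,
        deepspeed=ds_config,                         # Trainer hands this dict to DeepSpeed
    )

    Trainer(
        model=model,
        args=args,
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()

Launched as, e.g., deepspeed --num_gpus=1 on the sketch above, ZeRO keeps most optimizer state in system RAM rather than VRAM, which is the main reason the guide calls for 60GB+ of system memory.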

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: A GPU with at least 16GB VRAM (20GB recommended for GPT-Neo) and 60GB+ system RAM. CUDA and PyTorch are required.
  • Setup: The guide includes detailed instructions for setting up a Google Cloud VM with a V100 GPU, which can take approximately 5-10 minutes for initial setup.
  • Data Prep: Text files need to be converted to CSV format using text2csv.py (see the sketch after this list).
  • Training: Run fine-tuning with deepspeed --num_gpus=1 run_clm.py ... specifying model and data paths.
  • Docs: Huggingface Transformers Trainer
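
As referenced in the Data Prep step above, a minimal sketch of the text-to-CSV conversion. The column name and file names here are assumptions for illustration, not taken from text2csv.py itself:

    # Rough sketch of the text-to-CSV step; the real text2csv.py may differ in
    # column name and file layout (assumed here: one "text" column, one row per line).
    import csv

    def text_to_csv(txt_path: str, csv_path: str) -> None:
        """Wrap each non-empty line of a plain-text file into a one-column CSV."""
        with open(txt_path, encoding="utf-8") as src, \
             open(csv_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.writer(dst)
            writer.writerow(["text"])                # header column used when loading the CSV
            for line in src:
                line = line.strip()
                if line:
                    writer.writerow([line])

    if __name__ == "__main__":
        text_to_csv("train.txt", "train.csv")        # hypothetical input/output file names
        text_to_csv("validation.txt", "validation.csv")

With the CSVs in place, training is launched through the DeepSpeed launcher as in the command above, pointing run_clm.py at the model name and the train/validation CSV paths.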

Highlighted Details

  • Enables fine-tuning of 1.5B and 2.7B parameter models on a single GPU.
  • Utilizes DeepSpeed and gradient checkpointing for memory efficiency.
  • Provides specific configurations and commands for both GPT-2 XL and GPT-Neo.
  • Includes example scripts for text generation with fine-tuned models (a minimal sketch of the pattern follows this list).
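
A hedged sketch of that generation pattern using the standard transformers API; the checkpoint directory is a placeholder and the sampling parameters are illustrative, so the repository's own generation script may differ:

    # Sketch of generating text from a fine-tuned checkpoint (placeholder path).
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    checkpoint = "finetuned-gpt2xl"                  # hypothetical output_dir from training
    tokenizer = GPT2TokenizerFast.from_pretrained(checkpoint)
    model = GPT2LMHeadModel.from_pretrained(checkpoint)
    model.eval()

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    inputs = tokenizer("The meaning of life is", return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=60,                       # illustrative sampling settings
            do_sample=True,
            top_k=50,
            top_p=0.95,
            temperature=0.9,
        )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))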

Maintenance & Community

The repository is maintained by Xirider. Further community interaction details (e.g., Discord/Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

The repository itself does not specify a license in the README. The underlying models (GPT-2 XL, GPT-Neo) are typically available under permissive licenses (e.g., MIT for GPT-Neo), but users should verify the specific licenses of the models they use. Compatibility for commercial use depends on the underlying model licenses.

Limitations & Caveats

The guide assumes a Linux environment and requires a specific Google Cloud setup if local hardware is insufficient. While it aims to reduce memory usage, training speed is still constrained by the single GPU, and extensive hyperparameter tuning may be necessary for optimal results. For GPT-Neo, roughly 70GB of system RAM is recommended, though the guide notes this may not be strictly necessary.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
