finetune-gpt2xl by Xirider

Guide for finetuning GPT2-XL and GPT-NEO on a single GPU

Created 4 years ago · 437 stars · Top 69.3% on sourcepulse

Project Summary

This repository provides a guide and scripts for fine-tuning large language models, specifically GPT-2 XL (1.5B parameters) and GPT-Neo (2.7B parameters), on a single GPU. It targets researchers and practitioners who want to adapt these powerful models to specific tasks without requiring extensive hardware resources. The primary benefit is enabling efficient fine-tuning of large models on consumer-grade or single-server GPU setups.

How It Works

The guide leverages Huggingface Transformers and the DeepSpeed library to significantly reduce the memory footprint of large models during fine-tuning. The key techniques are DeepSpeed's ZeRO optimization, which partitions optimizer states and gradients and can offload them to CPU RAM, and gradient checkpointing, which recomputes activations during the backward pass instead of storing them. Together, these allow models that would normally exceed a single GPU's VRAM to be trained.
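
A minimal sketch of how these pieces fit together through the Huggingface Trainer, assuming a ZeRO stage 2 configuration with optimizer-state offload to CPU. The config values, toy dataset, and output path are illustrative, not the repository's exact run_clm.py or ds_config.json, and the script would be launched via the deepspeed launcher:

    # Illustrative only: not the repository's run_clm.py or ds_config.json.
    from datasets import Dataset
    from transformers import (
        DataCollatorForLanguageModeling,
        GPT2LMHeadModel,
        GPT2TokenizerFast,
        Trainer,
        TrainingArguments,
    )

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-xl")
    tokenizer.pad_token = tokenizer.eos_token        # GPT-2 ships without a pad token

    model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
    model.gradient_checkpointing_enable()            # recompute activations to save VRAM

    # Toy dataset; the guide instead tokenizes the CSV produced by text2csv.py.
    raw = Dataset.from_dict({"text": ["Example sentence one.", "Example sentence two."]})
    tokenized = raw.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True,
        remove_columns=["text"],
    )

    ds_config = {                                    # illustrative ZeRO-2 config with CPU offload
        "zero_optimization": {"stage": 2, "offload_optimizer": {"device": "cpu"}},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "fp16": {"enabled": True},
    }

    args = TrainingArguments(
        output_dir="finetuned-gpt2xl",               # hypothetical output directory
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,
        num_train_epochs=1,
        fp16=True,
        deepspeed=ds_config,                         # Trainer hands this dict to DeepSpeed
    )

    Trainer(
        model=model,
        args=args,
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()

Launched as, e.g., deepspeed --num_gpus=1 on the sketch above, ZeRO keeps most optimizer state in system RAM rather than VRAM, which is the main reason the guide calls for 60GB+ of system memory.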

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: A GPU with at least 16GB VRAM (20GB recommended for GPT-Neo) and 60GB+ system RAM. CUDA and PyTorch are required.
  • Setup: The guide includes detailed instructions for setting up a Google Cloud VM with a V100 GPU, which can take approximately 5-10 minutes for initial setup.
  • Data Prep: Text files need to be converted to CSV format using text2csv.py (see the sketch after this list).
  • Training: Run fine-tuning with deepspeed --num_gpus=1 run_clm.py ... specifying model and data paths.
  • Docs: Huggingface Transformers Trainer
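
As referenced in the Data Prep step above, a minimal sketch of the text-to-CSV conversion. The column name and file names here are assumptions for illustration, not taken from text2csv.py itself:

    # Rough sketch of the text-to-CSV step; the real text2csv.py may differ in
    # column name and file layout (assumed here: one "text" column, one row per line).
    import csv

    def text_to_csv(txt_path: str, csv_path: str) -> None:
        """Wrap each non-empty line of a plain-text file into a one-column CSV."""
        with open(txt_path, encoding="utf-8") as src, \
             open(csv_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.writer(dst)
            writer.writerow(["text"])                # header column used when loading the CSV
            for line in src:
                line = line.strip()
                if line:
                    writer.writerow([line])

    if __name__ == "__main__":
        text_to_csv("train.txt", "train.csv")        # hypothetical input/output file names
        text_to_csv("validation.txt", "validation.csv")

With the CSVs in place, training is launched through the DeepSpeed launcher as in the command above, pointing run_clm.py at the model name and the train/validation CSV paths.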

Highlighted Details

  • Enables fine-tuning of 1.5B and 2.7B parameter models on a single GPU.
  • Utilizes DeepSpeed and gradient checkpointing for memory efficiency.
  • Provides specific configurations and commands for both GPT-2 XL and GPT-Neo.
  • Includes example scripts for text generation with fine-tuned models (a minimal sketch of the pattern follows this list).
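
A hedged sketch of that generation pattern using the standard transformers API; the checkpoint directory is a placeholder and the sampling parameters are illustrative, so the repository's own generation script may differ:

    # Sketch of generating text from a fine-tuned checkpoint (placeholder path).
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    checkpoint = "finetuned-gpt2xl"                  # hypothetical output_dir from training
    tokenizer = GPT2TokenizerFast.from_pretrained(checkpoint)
    model = GPT2LMHeadModel.from_pretrained(checkpoint)
    model.eval()

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    inputs = tokenizer("The meaning of life is", return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=60,                       # illustrative sampling settings
            do_sample=True,
            top_k=50,
            top_p=0.95,
            temperature=0.9,
        )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))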

Maintenance & Community

The repository is maintained by Xirider. Further community interaction details (e.g., Discord/Slack) are not explicitly mentioned in the README.

Licensing & Compatibility

The repository itself does not specify a license in the README. The underlying models (GPT-2 XL, GPT-Neo) are typically available under permissive licenses (e.g., MIT for GPT-Neo), but users should verify the specific licenses of the models they use. Compatibility for commercial use depends on the underlying model licenses.

Limitations & Caveats

The guide assumes a Linux environment and requires a specific Google Cloud setup if local hardware is insufficient. While it aims to reduce memory usage, training speed is still constrained by the single GPU, and extensive hyperparameter tuning may be necessary for optimal results. For GPT-Neo, roughly 70GB of system RAM is recommended, though the guide notes this may not be strictly necessary.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
