gpt-oss-recipes by huggingface

OpenAI GPT-OSS model optimization and fine-tuning

Created 2 months ago
456 stars

Top 66.3% on SourcePulse

View on GitHub
Project Summary

This repository provides a collection of scripts and notebooks for optimizing and fine-tuning OpenAI's GPT-OSS models (20B and 120B parameters). It targets researchers and engineers working with large language models, offering practical examples for efficient inference and training.

How It Works

The collection demonstrates various optimization techniques including Tensor Parallelism, Flash Attention, and Continuous Batching for enhanced inference performance. For fine-tuning, it supports both full-parameter training and LoRA, leveraging Hugging Face's accelerate library for distributed training and efficient memory management.
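
As a rough illustration of the inference path, a minimal generation sketch with the public GPT-OSS checkpoints might look like the following (this assumes transformers with GPT-OSS support; the repo's scripts layer Flash Attention, Tensor Parallelism, and Continuous Batching on top of this basic pattern, and their exact loading options may differ):

    # Minimal generation sketch; model IDs are the public GPT-OSS checkpoints,
    # everything else is illustrative rather than copied from the repo's scripts.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "openai/gpt-oss-20b"  # or "openai/gpt-oss-120b"

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype="auto",   # keep the checkpoint's native precision
        device_map="auto",    # spread layers across available GPUs
    )

    messages = [{"role": "user", "content": "Explain continuous batching in one sentence."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))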

Quick Start & Requirements

  • Installation: Use uv for environment management and install PyTorch with CUDA 12.8 support. Optionally install Triton kernels for MXFP4 quantization.
    # Create and activate a Python 3.11 virtual environment with uv
    uv venv gpt-oss --python 3.11 && source gpt-oss/bin/activate
    uv pip install --upgrade pip
    # PyTorch 2.8.0 built against CUDA 12.8 (installed from the test index)
    uv pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
    # Optional: Triton kernels for MXFP4 quantization
    uv pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
    uv pip install -r requirements.txt
    
  • Prerequisites: Python 3.11, PyTorch 2.8.0 with CUDA 12.8, Triton kernels (optional, for MXFP4 quantization).
  • Usage: Edit model_path in the scripts to select the 20B or 120B model. Run inference with python generate_<script_name>.py, or with torchrun for distributed inference (see the sketch after this list). Training examples cover full-parameter and LoRA fine-tuning via accelerate launch.
  • Links: Blog, Cookbook
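
A hedged sketch of the distributed-inference path referenced above follows; the file name generate_tp.py is hypothetical, and tp_plan="auto" is the transformers-native tensor-parallel option, which may not match what the repo's scripts actually use:

    # generate_tp.py -- hypothetical file name; launch with, e.g.:
    #   torchrun --nproc_per_node=4 generate_tp.py
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "openai/gpt-oss-120b"  # edit to "openai/gpt-oss-20b" for the smaller model

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16,
        tp_plan="auto",  # shard weights across the processes started by torchrun
    )

    inputs = tokenizer("Tensor parallelism splits each weight matrix", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

Training scripts are launched analogously, but through accelerate launch pointed at a distributed-training config such as zero3.yaml.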

Highlighted Details

  • Scripts demonstrate Tensor Parallelism, Flash Attention, and Continuous Batching for inference.
  • Supports full-parameter and LoRA fine-tuning (a minimal LoRA sketch follows this list).
  • Includes configuration files for distributed training (e.g., zero3.yaml).
  • Examples cover both 20B and 120B model variants.
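
Since the repo's exact training configuration is not reproduced here, the following is only a minimal LoRA fine-tuning sketch using peft and trl; the hyperparameters, dataset, and output path are illustrative assumptions rather than the repo's settings:

    # Minimal LoRA sketch (illustrative); run directly or under `accelerate launch`.
    from datasets import load_dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    model_path = "openai/gpt-oss-20b"
    dataset = load_dataset("trl-lib/Capybara", split="train")  # placeholder dataset

    peft_config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules="all-linear",  # adapt every linear projection
        task_type="CAUSAL_LM",
    )

    trainer = SFTTrainer(
        model=model_path,
        train_dataset=dataset,
        peft_config=peft_config,
        args=SFTConfig(
            output_dir="gpt-oss-20b-lora",
            per_device_train_batch_size=1,
            gradient_checkpointing=True,
        ),
    )
    trainer.train()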

Maintenance & Community

This project is associated with Hugging Face. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking would require clarification.

Limitations & Caveats

The README does not specify the license, which is crucial for commercial use. The installation instructions for Triton kernels mention optional MXFP4 quantization support, implying potential hardware-specific requirements or performance variations.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 21 stars in the last 30 days
