gpt-oss-recipes by huggingface

OpenAI GPT-OSS model optimization and fine-tuning

Created 2 weeks ago · 374 stars · Top 75.7% on SourcePulse

View on GitHub
Project Summary

This repository provides a collection of scripts and notebooks for optimizing and fine-tuning OpenAI's GPT-OSS models (20B and 120B parameters). It targets researchers and engineers working with large language models, offering practical examples for efficient inference and training.

How It Works

The collection demonstrates various optimization techniques including Tensor Parallelism, Flash Attention, and Continuous Batching for enhanced inference performance. For fine-tuning, it supports both full-parameter training and LoRA, leveraging Hugging Face's accelerate library for distributed training and efficient memory management.
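
As a rough illustration of the inference side (not code taken from the repository), loading one of the models with Transformers' built-in tensor-parallel plan could look like the sketch below; the model id, placeholder script name, and launch command are assumptions, and the repo's own scripts may configure attention kernels and batching differently.

    # Hypothetical sketch of tensor-parallel inference with Hugging Face Transformers.
    # Launch one process per GPU with torchrun, e.g.:
    #   torchrun --nproc-per-node 2 my_generate.py   # "my_generate.py" is a placeholder name
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "openai/gpt-oss-20b"  # or the 120B checkpoint, mirroring the scripts' model_path variable

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype="auto",  # keep the checkpoint's native precision
        tp_plan="auto",      # shard weights across the GPUs started by torchrun (tensor parallelism)
        # Flash Attention can be requested via from_pretrained's attn_implementation argument
    )

    prompt = "Explain tensor parallelism in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))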

Quick Start & Requirements

  • Installation: Use uv for environment management and install PyTorch with CUDA 12.8 support. Optionally install Triton kernels for MXFP4 quantization.
    uv venv gpt-oss --python 3.11 && source gpt-oss/bin/activate
    uv pip install --upgrade pip
    uv pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
    uv pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
    uv pip install -r requirements.txt
    
  • Prerequisites: Python 3.11, PyTorch with CUDA 12.8, Triton (optional).
  • Usage: Edit model_path in the scripts to select the 20B or 120B model. Run inference with python generate_<script_name>.py, or with torchrun for distributed inference. Training examples cover full-parameter and LoRA fine-tuning and are launched with accelerate launch.
  • Links: Resources, Blog, Cookbook

Highlighted Details

  • Scripts demonstrate Tensor Parallelism, Flash Attention, and Continuous Batching for inference.
  • Supports full-parameter and LoRA fine-tuning (a minimal sketch follows this list).
  • Includes configuration files for distributed training (e.g., zero3.yaml).
  • Examples cover both 20B and 120B model variants.
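
On the training side, a minimal LoRA setup with the peft library (assumed here for illustration; the repository's training scripts may be organized differently) might look like the following, launched through accelerate with the provided ZeRO-3 config:

    # Hypothetical LoRA fine-tuning sketch using peft; hyperparameters and module names are
    # illustrative, not taken from the repo. A distributed launch could look like:
    #   accelerate launch --config_file zero3.yaml my_sft.py   # "my_sft.py" is a placeholder name
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", torch_dtype="auto")

    lora_config = LoraConfig(
        r=16,                                 # rank of the low-rank update matrices
        lora_alpha=32,                        # scaling factor applied to the update
        target_modules=["q_proj", "v_proj"],  # which linear layers get adapters (illustrative)
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()        # only the small adapter weights require gradients
    # ...a standard training loop or a trainer would follow here...

Full-parameter training would skip the peft wrapping and train the base model directly, relying on the ZeRO-3 configuration to shard parameters, gradients, and optimizer state across GPUs.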

Maintenance & Community

This project is associated with Hugging Face. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking would require clarification.

Limitations & Caveats

The README does not specify a license, which is essential information for commercial use. The optional Triton kernels for MXFP4 quantization also imply hardware-specific requirements, so quantized support and performance may vary across GPUs.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 20
  • Issues (30d): 4
  • Star history: 374 stars in the last 15 days

Explore Similar Projects

fms-fsdp by foundation-model-stack

  • Efficiently train foundation models with PyTorch
  • 259 stars; created 1 year ago, updated 3 weeks ago
  • Starred by Wing Lian (Founder of Axolotl AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

dbrx by databricks

  • Large language model for research/commercial use
  • 3k stars; created 1 year ago, updated 1 year ago
  • Starred by Junyang Lin (Core Maintainer of Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.