gpt-oss-recipes by huggingface

OpenAI GPT-OSS model optimization and fine-tuning

Created 2 weeks ago · 374 stars · Top 75.7% on SourcePulse

View on GitHub
Project Summary

This repository provides a collection of scripts and notebooks for optimizing and fine-tuning OpenAI's GPT-OSS models (20B and 120B parameters). It targets researchers and engineers working with large language models, offering practical examples for efficient inference and training.

How It Works

The collection demonstrates various optimization techniques including Tensor Parallelism, Flash Attention, and Continuous Batching for enhanced inference performance. For fine-tuning, it supports both full-parameter training and LoRA, leveraging Hugging Face's accelerate library for distributed training and efficient memory management.
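
As a rough illustration of the inference side (not code taken from the repository), loading one of the models with Transformers' built-in tensor-parallel plan could look like the sketch below; the model id, placeholder script name, and launch command are assumptions, and the repo's own scripts may configure attention kernels and batching differently.

    # Hypothetical sketch of tensor-parallel inference with Hugging Face Transformers.
    # Launch one process per GPU with torchrun, e.g.:
    #   torchrun --nproc-per-node 2 my_generate.py   # "my_generate.py" is a placeholder name
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "openai/gpt-oss-20b"  # or the 120B checkpoint, mirroring the scripts' model_path variable

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype="auto",  # keep the checkpoint's native precision
        tp_plan="auto",      # shard weights across the GPUs started by torchrun (tensor parallelism)
        # Flash Attention can be requested via from_pretrained's attn_implementation argument
    )

    prompt = "Explain tensor parallelism in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))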

Quick Start & Requirements

  • Installation: Use uv for environment management and install PyTorch with CUDA 12.8 support. Optionally install Triton kernels for MXFP4 quantization.
    uv venv gpt-oss --python 3.11 && source gpt-oss/bin/activate
    uv pip install --upgrade pip
    uv pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
    uv pip install git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels
    uv pip install -r requirements.txt
    
  • Prerequisites: Python 3.11, PyTorch with CUDA 12.8, Triton (optional).
  • Usage: Edit model_path in the scripts to select the 20B or 120B model. Run inference with python generate_<script_name>.py, or with torchrun for distributed inference. Training examples cover full-parameter and LoRA fine-tuning and are launched with accelerate launch.
  • Links: Resources, Blog, Cookbook

Highlighted Details

  • Scripts demonstrate Tensor Parallelism, Flash Attention, and Continuous Batching for inference.
  • Supports full-parameter and LoRA fine-tuning (a minimal sketch follows this list).
  • Includes configuration files for distributed training (e.g., zero3.yaml).
  • Examples cover both 20B and 120B model variants.
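
On the training side, a minimal LoRA setup with the peft library (assumed here for illustration; the repository's training scripts may be organized differently) might look like the following, launched through accelerate with the provided ZeRO-3 config:

    # Hypothetical LoRA fine-tuning sketch using peft; hyperparameters and module names are
    # illustrative, not taken from the repo. A distributed launch could look like:
    #   accelerate launch --config_file zero3.yaml my_sft.py   # "my_sft.py" is a placeholder name
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", torch_dtype="auto")

    lora_config = LoraConfig(
        r=16,                                 # rank of the low-rank update matrices
        lora_alpha=32,                        # scaling factor applied to the update
        target_modules=["q_proj", "v_proj"],  # which linear layers get adapters (illustrative)
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()        # only the small adapter weights require gradients
    # ...a standard training loop or a trainer would follow here...

Full-parameter training would skip the peft wrapping and train the base model directly, relying on the ZeRO-3 configuration to shard parameters, gradients, and optimizer state across GPUs.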

Maintenance & Community

This project is associated with Hugging Face. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking would require clarification.

Limitations & Caveats

The README does not specify a license, which is essential information for commercial use. The optional Triton kernels for MXFP4 quantization also imply hardware-specific requirements, so quantized support and performance may vary across GPUs.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 20
  • Issues (30d): 4
  • Star history: 374 stars in the last 15 days

Explore Similar Projects

fms-fsdp by foundation-model-stack

  • Efficiently train foundation models with PyTorch
  • 259 stars; created 1 year ago, updated 3 weeks ago
  • Starred by Wing Lian (Founder of Axolotl AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

dbrx by databricks

  • Large language model for research/commercial use
  • 3k stars; created 1 year ago, updated 1 year ago
  • Starred by Junyang Lin (Core Maintainer of Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.