gpt-oss by openai

Open-weight LLMs for reasoning and agents

Created 3 months ago
18,823 stars

Top 2.4% on SourcePulse

Project Summary

OpenAI's gpt-oss models (120B and 20B parameters) are open-weight language models designed for advanced reasoning, agentic tasks, and developer use cases. They offer full chain-of-thought, fine-tunability, and native agentic capabilities like function calling and code execution, all under a permissive Apache 2.0 license.

How It Works

These models use a Mixture-of-Experts (MoE) architecture: the 120B model activates 5.1B parameters per token, and the 20B model activates 3.6B. A key innovation is native MXFP4 quantization of the MoE layers, which enables efficient inference on a single GPU (an H100 for the 120B model) and reduces the memory footprint. The models are trained on a specific "harmony" response format and must be prompted with it; they will not work correctly otherwise.
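To make the "active parameters" figure concrete, here is a toy sketch of top-k MoE routing in pure Python (illustrative only; gpt-oss's actual router, expert count, and top-k value differ). Only the experts the router selects run for a given token, so per-token compute scales with the active subset rather than the full parameter count:

```python
import math

def moe_forward(x, experts, router_weights, top_k=2):
    # Score every expert with a linear router.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the top_k highest-scoring experts.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax-normalize the selected scores into gate values.
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    gates = [e / total for e in exps]
    # Run only the selected experts and combine their outputs.
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out
```

The unselected experts are never evaluated, which is why a 120B-parameter model can cost only ~5.1B parameters' worth of compute per token.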

Quick Start & Requirements

  • Installation: pip install gpt-oss[torch], gpt-oss[triton], or gpt-oss[metal]. For vLLM: uv pip install --pre vllm==0.10.1+gptoss --extra-index-url https://wheels.vllm.ai/gpt-oss/ --extra-index-url https://download.pytorch.org/whl/nightly/cu128 --index-strategy unsafe-best-match.
  • Prerequisites: Python 3.12. Linux requires CUDA. macOS requires Xcode CLI tools. Windows is untested.
  • Model Weights: Download from Hugging Face Hub using huggingface-cli download.
  • Resources: gpt-oss-120b runs on a single H100 GPU with MXFP4 quantization. gpt-oss-20b requires ~16GB memory.
  • Docs: Guides, Model card, OpenAI blog.
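Putting the steps above together, a minimal setup for the 20B model might look like this (the repo id `openai/gpt-oss-20b` and the `--include "original/*"` pattern follow the project's download instructions; adjust local paths as needed):

```shell
# Install the PyTorch reference implementation (Python 3.12; CUDA on Linux)
pip install gpt-oss[torch]

# Fetch the 20B weights from the Hugging Face Hub
huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
```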

Highlighted Details

  • Apache 2.0 license for commercial use and distribution.
  • Configurable reasoning effort (low, medium, high).
  • Native support for function calling, web browsing, and Python code execution via the harmony format.
  • Reference implementations available for PyTorch, Triton (single GPU), and Metal (Apple Silicon).
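As a rough illustration of what the harmony response format looks like on the wire, here is a minimal hand-rolled renderer. The `<|start|>`/`<|message|>`/`<|end|>` token names follow the harmony format, but this is a sketch: real applications should use the openai-harmony renderer, and the system-message text here is a placeholder:

```python
def render_harmony(messages, reasoning_effort="medium"):
    """Render a chat as a harmony-style prompt string (illustrative sketch).

    Each message becomes <|start|>{role}<|message|>{content}<|end|>;
    the system message carries the configurable reasoning effort.
    """
    system = f"You are a helpful assistant.\nReasoning: {reasoning_effort}"
    parts = [f"<|start|>system<|message|>{system}<|end|>"]
    for role, content in messages:
        parts.append(f"<|start|>{role}<|message|>{content}<|end|>")
    # Leave the prompt open for the model to complete as the assistant.
    parts.append("<|start|>assistant")
    return "".join(parts)
```

The "Reasoning:" line in the system message is how the low/medium/high reasoning-effort setting reaches the model.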

Maintenance & Community

This repository focuses on reference implementations; OpenAI does not intend to accept new feature contributions beyond bug fixes. Contributions to the awesome-gpt-oss.md list are welcome.

Licensing & Compatibility

Permissive Apache 2.0 license. Compatible with commercial and closed-source applications.

Limitations & Caveats

The PyTorch reference implementation is inefficient and requires multiple H100 GPUs. Metal and Triton implementations are for educational purposes and not production-ready. The Python tool implementation runs in a permissive Docker container, posing potential security risks.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 24
  • Issues (30d): 4
  • Star History: 508 stars in the last 30 days
