gpt-oss by openai

Open-weight LLMs for reasoning and agents

created 1 month ago
15,729 stars

Top 3.1% on SourcePulse

Project Summary

OpenAI's gpt-oss models (120B and 20B parameters) are open-weight language models designed for advanced reasoning, agentic tasks, and developer use cases. They offer full chain-of-thought, fine-tunability, and native agentic capabilities like function calling and code execution, all under a permissive Apache 2.0 license.
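
For instance, the weights load in standard inference stacks. A minimal sketch using Hugging Face transformers (assuming the openai/gpt-oss-20b checkpoint on the Hub; this is not the repository's own reference code):

    # Sketch: chat with gpt-oss-20b through the transformers pipeline.
    # Assumes the openai/gpt-oss-20b checkpoint on the Hugging Face Hub.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="openai/gpt-oss-20b",
        torch_dtype="auto",   # use the checkpoint's native dtype
        device_map="auto",    # spread layers across available devices
    )

    messages = [
        {"role": "user", "content": "Summarize mixture-of-experts in two sentences."},
    ]
    out = generator(messages, max_new_tokens=200)
    # The pipeline returns the full chat; the assistant reply is the last message.
    print(out[0]["generated_text"][-1]["content"])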

How It Works

These models use a Mixture-of-Experts (MoE) architecture: the 120B model activates 5.1B parameters per token, and the 20B model activates 3.6B. A key feature is native MXFP4 quantization of the MoE layers, which enables inference on a single GPU (an H100 for the 120B model) and cuts the memory footprint. Both models are trained on a specific "harmony" response format and will not work correctly without it.
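
As a concrete sketch, the openai-harmony Python package renders harmony-formatted prompts; the calls below follow its published API, though names may differ across versions:

    # Sketch: render a harmony-format prompt into token IDs for gpt-oss.
    # Assumes the openai-harmony package; API names may vary by version.
    from openai_harmony import (
        Conversation,
        HarmonyEncodingName,
        Message,
        Role,
        load_harmony_encoding,
    )

    # Load the encoding that defines harmony's special tokens and layout.
    encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

    convo = Conversation.from_messages([
        Message.from_role_and_content(Role.USER, "What is the capital of France?"),
    ])

    # Token IDs ready to prefill any backend that accepts raw IDs.
    prefill_ids = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)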

Quick Start & Requirements

  • Installation: pip install gpt-oss[torch], gpt-oss[triton], or gpt-oss[metal]. For vLLM: uv pip install --pre vllm==0.10.1+gptoss --extra-index-url https://wheels.vllm.ai/gpt-oss/ --extra-index-url https://download.pytorch.org/whl/nightly/cu128 --index-strategy unsafe-best-match.
  • Prerequisites: Python 3.12. Linux requires CUDA. macOS requires Xcode CLI tools. Windows is untested.
  • Model Weights: Download from the Hugging Face Hub using huggingface-cli download (see the sketch after this list).
  • Resources: gpt-oss-120b runs on a single H100 GPU with MXFP4 quantization. gpt-oss-20b requires ~16GB memory.
  • Docs: Guides, Model card, OpenAI blog.
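
Weights can also be fetched programmatically; a minimal sketch using huggingface_hub (the openai/gpt-oss-20b repo ID is taken from the Hugging Face model pages):

    # Sketch: download the gpt-oss-20b weights with huggingface_hub.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="openai/gpt-oss-20b",  # swap for openai/gpt-oss-120b if desired
        local_dir="gpt-oss-20b",
    )
    print(f"Weights downloaded to {local_dir}")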

Highlighted Details

  • Apache 2.0 license for commercial use and distribution.
  • Configurable reasoning effort (low, medium, high); see the sketch after this list.
  • Native support for function calling, web browsing, and Python code execution via the harmony format.
  • Reference implementations available for PyTorch, Triton (single GPU), and Metal (Apple Silicon).
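
For the reasoning-effort knob, the level is declared in the harmony system message. A hedged sketch using openai-harmony, assuming it exposes a SystemContent builder with a with_reasoning_effort method (the exact names may differ):

    # Sketch: request high reasoning effort via the harmony system message.
    # Assumes SystemContent and ReasoningEffort exist in openai-harmony
    # under these names; treat this as illustrative, not authoritative.
    from openai_harmony import (
        Conversation,
        HarmonyEncodingName,
        Message,
        ReasoningEffort,
        Role,
        SystemContent,
        load_harmony_encoding,
    )

    encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
    system = SystemContent.new().with_reasoning_effort(ReasoningEffort.HIGH)

    convo = Conversation.from_messages([
        Message.from_role_and_content(Role.SYSTEM, system),
        Message.from_role_and_content(Role.USER, "Prove that sqrt(2) is irrational."),
    ])
    prefill_ids = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)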

Maintenance & Community

This repository provides reference implementations; OpenAI accepts bug fixes but does not plan to accept new feature contributions. Contributions to the awesome-gpt-oss.md list are welcome.

Licensing & Compatibility

Permissive Apache 2.0 license. Compatible with commercial and closed-source applications.

Limitations & Caveats

The PyTorch reference implementation is inefficient and requires multiple H100 GPUs. Metal and Triton implementations are for educational purposes and not production-ready. The Python tool implementation runs in a permissive Docker container, posing potential security risks.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 102
  • Issues (30d): 0

Star History

15,926 stars in the last 30 days

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 2 more.

Explore Similar Projects

serve by pytorch

Serve, optimize, and scale PyTorch models in production

Top 0.1% on SourcePulse
4k stars
created 5 years ago, updated 3 days ago