Open-weight LLMs for reasoning and agents
OpenAI's gpt-oss models (120B and 20B parameters) are open-weight language models designed for advanced reasoning, agentic tasks, and developer use cases. They offer full chain-of-thought, fine-tunability, and native agentic capabilities like function calling and code execution, all under a permissive Apache 2.0 license.
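To make the agentic surface concrete, here is a hedged sketch of function calling against a locally hosted model over an OpenAI-compatible endpoint (the `vllm serve` command, base URL, model id, and `get_weather` tool are illustrative assumptions, not details from this summary):

```python
# Hedged sketch: function calling against a locally hosted gpt-oss model
# exposed through an OpenAI-compatible API (e.g. `vllm serve openai/gpt-oss-20b`).
# The base_url, model id, and get_weather tool are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)

# If the model decides to call the tool, the structured call appears here.
print(resp.choices[0].message.tool_calls)
```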
How It Works
These models use a Mixture-of-Experts (MoE) architecture: the 120B model activates 5.1B parameters per token and the 20B model activates 3.6B. A key innovation is native MXFP4 quantization of the MoE layers, which cuts the memory footprint enough for the 120B model to run on a single H100 GPU. Both models are trained with a specific "harmony" response format and must be prompted in it to operate correctly.
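The harmony format is implemented in the companion `openai-harmony` package; the sketch below shows rendering a conversation into prompt tokens. Names follow that package's documented API, so treat the details as assumptions to verify against the version you install:

```python
# Sketch: rendering a conversation in the "harmony" response format.
# Requires `pip install openai-harmony`; API names follow that package's docs
# and should be checked against the installed version.
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

convo = Conversation.from_messages([
    Message.from_role_and_content(Role.USER, "Summarize MXFP4 quantization."),
])

# Token ids to feed the model; sampling should stop on harmony's stop tokens.
prompt_tokens = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
print(len(prompt_tokens), "prompt tokens")
```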
Quick Start & Requirements
Install the reference implementations with `pip install gpt-oss[torch]`, `pip install gpt-oss[triton]`, or `pip install gpt-oss[metal]`, depending on your backend. For vLLM:

```bash
uv pip install --pre vllm==0.10.1+gptoss \
  --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
  --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
  --index-strategy unsafe-best-match
```

Model weights can be fetched with `huggingface-cli download`.
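Once weights are available, a quick smoke test is possible through Hugging Face Transformers (a minimal sketch assuming a recent Transformers release with gpt-oss/MXFP4 support and the `openai/gpt-oss-20b` checkpoint id; the model's chat template is expected to apply the harmony format automatically):

```python
# Minimal smoke test via Hugging Face Transformers (sketch; assumes a recent
# Transformers release with gpt-oss/MXFP4 support and enough GPU memory).
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",  # spread layers across available devices
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}]
out = pipe(messages, max_new_tokens=128)

# The pipeline returns the full chat transcript; the last message is the reply.
print(out[0]["generated_text"][-1]["content"])
```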
Highlighted Details
The models must be prompted in the `harmony` response format; they will not behave correctly otherwise.
Maintenance & Community
This repository focuses on reference implementations; OpenAI does not intend to accept new feature contributions beyond bug fixes. Contributions to the `awesome-gpt-oss.md` list are welcome.
Licensing & Compatibility
Permissive Apache 2.0 license. Compatible with commercial and closed-source applications.
Limitations & Caveats
The PyTorch reference implementation is inefficient and requires multiple H100 GPUs. Metal and Triton implementations are for educational purposes and not production-ready. The Python tool implementation runs in a permissive Docker container, posing potential security risks.