gpt-oss by openai

Open-weight LLMs for reasoning and agents

Created 3 months ago
18,823 stars

Top 2.4% on SourcePulse

Project Summary

OpenAI's gpt-oss models (120B and 20B parameters) are open-weight language models designed for advanced reasoning, agentic tasks, and developer use cases. They offer full chain-of-thought, fine-tunability, and native agentic capabilities like function calling and code execution, all under a permissive Apache 2.0 license.

How It Works

These models use a Mixture-of-Experts (MoE) architecture: the 120B model activates 5.1B parameters per token, and the 20B model activates 3.6B. A key innovation is native MXFP4 quantization of the MoE layers, which enables efficient inference on a single GPU (an H100 for the 120B model) and reduces the memory footprint. The models are trained on a specific "harmony" response format and must be prompted with it; they will not work correctly otherwise.
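To make the "active parameters" figure concrete, here is a toy sketch of top-k MoE routing in pure Python (illustrative only; gpt-oss's actual router, expert count, and top-k value differ). Only the experts the router selects run for a given token, so per-token compute scales with the active subset rather than the full parameter count:

```python
import math

def moe_forward(x, experts, router_weights, top_k=2):
    # Score every expert with a linear router.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the top_k highest-scoring experts.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax-normalize the selected scores into gate values.
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    gates = [e / total for e in exps]
    # Run only the selected experts and combine their outputs.
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out
```

The unselected experts are never evaluated, which is why a 120B-parameter model can cost only ~5.1B parameters' worth of compute per token.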

Quick Start & Requirements

  • Installation: pip install gpt-oss[torch], gpt-oss[triton], or gpt-oss[metal]. For vLLM: uv pip install --pre vllm==0.10.1+gptoss --extra-index-url https://wheels.vllm.ai/gpt-oss/ --extra-index-url https://download.pytorch.org/whl/nightly/cu128 --index-strategy unsafe-best-match.
  • Prerequisites: Python 3.12. Linux requires CUDA. macOS requires Xcode CLI tools. Windows is untested.
  • Model Weights: Download from Hugging Face Hub using huggingface-cli download.
  • Resources: gpt-oss-120b runs on a single H100 GPU with MXFP4 quantization. gpt-oss-20b requires ~16GB memory.
  • Docs: Guides, Model card, OpenAI blog.
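Putting the steps above together, a minimal setup for the 20B model might look like this (the repo id `openai/gpt-oss-20b` and the `--include "original/*"` pattern follow the project's download instructions; adjust local paths as needed):

```shell
# Install the PyTorch reference implementation (Python 3.12; CUDA on Linux)
pip install gpt-oss[torch]

# Fetch the 20B weights from the Hugging Face Hub
huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
```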

Highlighted Details

  • Apache 2.0 license for commercial use and distribution.
  • Configurable reasoning effort (low, medium, high).
  • Native support for function calling, web browsing, and Python code execution via the harmony format.
  • Reference implementations available for PyTorch, Triton (single GPU), and Metal (Apple Silicon).
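As a rough illustration of what the harmony response format looks like on the wire, here is a minimal hand-rolled renderer. The `<|start|>`/`<|message|>`/`<|end|>` token names follow the harmony format, but this is a sketch: real applications should use the openai-harmony renderer, and the system-message text here is a placeholder:

```python
def render_harmony(messages, reasoning_effort="medium"):
    """Render a chat as a harmony-style prompt string (illustrative sketch).

    Each message becomes <|start|>{role}<|message|>{content}<|end|>;
    the system message carries the configurable reasoning effort.
    """
    system = f"You are a helpful assistant.\nReasoning: {reasoning_effort}"
    parts = [f"<|start|>system<|message|>{system}<|end|>"]
    for role, content in messages:
        parts.append(f"<|start|>{role}<|message|>{content}<|end|>")
    # Leave the prompt open for the model to complete as the assistant.
    parts.append("<|start|>assistant")
    return "".join(parts)
```

The "Reasoning:" line in the system message is how the low/medium/high reasoning-effort setting reaches the model.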

Maintenance & Community

This repository focuses on reference implementations; OpenAI does not intend to accept new feature contributions beyond bug fixes. Contributions to the awesome-gpt-oss.md list are welcome.

Licensing & Compatibility

Permissive Apache 2.0 license. Compatible with commercial and closed-source applications.

Limitations & Caveats

The PyTorch reference implementation is inefficient and requires multiple H100 GPUs. Metal and Triton implementations are for educational purposes and not production-ready. The Python tool implementation runs in a permissive Docker container, posing potential security risks.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 24
  • Issues (30d): 4
  • Star History: 508 stars in the last 30 days
