
OpenAI Open Source Model (gpt-oss series). Source: OpenAI

OpenAI has released the gpt-oss series, a new line of open-weight models built for strong reasoning, agentic tasks, and flexible developer use. The series comes in two variants:

  • gpt-oss-120b — 117B total parameters (5.1B active per token); suited for production, general-purpose, high-reasoning use cases, and fits on a single H100 GPU.

  • gpt-oss-20b — 21B total parameters (3.6B active per token); suited for low-latency, local, or specialized use cases.

NOTE

You can try out the gpt-oss models on the gpt-oss playground.

Key Features of the gpt-oss Models

  • Apache 2.0 License: Use, modify, and commercialize your projects freely, without copyleft restrictions or patent concerns.

  • Custom reasoning levels: Choose the model's reasoning level (low, medium, high) to match your scenario and latency needs.

  • Full chain-of-thought: Complete access to the model's reasoning process, which helps with debugging and understanding its outputs.

  • Fine-tuning: Train the models further to fit your specific needs.

  • Built-in tools: Includes function calling, web browsing, Python code execution, and structured outputs.

  • Native MXFP4 quantization: The models are natively quantized in the memory-efficient MXFP4 format, which lets gpt-oss-120b run on a single H100 GPU and gpt-oss-20b run in just 16 GB of memory.
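As a rough sanity check on those numbers (assuming ≈4.25 bits per MXFP4-quantized parameter, i.e. 4-bit values plus a shared per-block scale): 117B × 4.25 / 8 ≈ 62 GB, which fits in an 80 GB H100, and 21B × 4.25 / 8 ≈ 11 GB, which fits within 16 GB of memory. In practice only the MoE weights are MXFP4-quantized, so actual footprints differ somewhat.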

NOTE

gpt-oss-120b and gpt-oss-20b were trained on the harmony response format, and it is the only format they should be used with.

Harmony Response Format

Roles

Every message that the model processes has a role associated with it. The role types are listed below; an example prompt follows the list.

  • system – Sets reasoning level, knowledge cutoff, and built-in tools.

  • developer – Provides model instructions and available tools.

  • user – Represents user input to the model.

  • assistant – Model’s output: a reply or tool call, sometimes tied to a specific channel.

  • tool – Output from a tool call; the tool’s name is used as the role in the message.
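To make the roles concrete, here is a sketch of a prompt containing system, developer, and user messages, loosely following the harmony documentation (the dates and instructions are placeholder values):

```text
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-06-28

Reasoning: high

# Valid channels: analysis, commentary, final.<|end|>
<|start|>developer<|message|># Instructions

Always respond in riddles<|end|>
<|start|>user<|message|>What is the weather like in SF?<|end|>
```

Note that the system message declares the reasoning level (here high) and the valid channels, while the developer message carries what you would normally think of as the "system prompt" when using an API.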

Channels

Assistant messages can be sent through three distinct “channels,” separating user-visible replies from internal reasoning and tool-related content.

  • final – Model’s response meant for the end-user.

  • analysis – Used for the model's internal reasoning / chain-of-thought (CoT); not safe to show to end users.

  • commentary – Typically used for function tool calls or preambles before tool use; built-in tool calls can also appear here.

Special tokens

The model uses special tokens to understand input structure. Here’s what each token means:

  • <|start|> – Marks the beginning of a message, followed by header info (starting with role).

  • <|end|> – Marks the end of a message.

  • <|message|> – Separates the message header from its main content.

  • <|channel|> – Indicates the start of channel information in the header.

  • <|constrain|> – Marks the start of data type definitions in a tool call.

  • <|return|> – Signals the end of the model’s response; inference should stop.

  • <|call|> – Signals a tool call; also used to stop inference.
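Several of these tokens only appear together in tool calls. As a hedged sketch (get_weather is a hypothetical function assumed to be declared in the developer message), a function call on the commentary channel looks like this:

```text
<|start|>assistant<|channel|>commentary to=functions.get_weather <|constrain|>json<|message|>{"location": "San Francisco"}<|call|>
```

The to=functions.get_weather header names the recipient tool, <|constrain|>json declares the payload type, and <|call|> stops inference so the tool can run; the tool's result is then appended as a tool-role message and inference resumes.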

Message format

The harmony response format is made up of “messages” and the model may generate several messages at once. Each message generally follows this structure:

```text
<|start|>{header}<|message|>{content}<|end|>
```

Here’s an example of how special tokens are used in the harmony message format for chat conversations. For more use cases, see the OpenAI Harmony Response Format.

Example input

```text
<|start|>user<|message|>What is 2 + 2?<|end|>
<|start|>assistant
```

This is a basic chat format: the input begins with a user message and ends with the start of an assistant message, which the model then completes.

Example output

```text
<|channel|>analysis<|message|>User asks: "What is 2 + 2?" Simple arithmetic. Provide answer.<|end|>
<|start|>assistant<|channel|>final<|message|>2 + 2 = 4.<|return|>
```

The output starts with a message on the analysis channel containing the model's chain-of-thought reasoning (it continues directly from the <|start|>assistant left open in the input, which is why no start token appears). It then switches to the final channel and ends with <|return|> once the final answer is generated.
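You rarely need to assemble these tokens by hand. A minimal sketch, assuming the openai-harmony Python package (pip install openai-harmony) and its documented API, that renders the conversation above into prompt tokens:

```python
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

# Load the harmony encoding used by the gpt-oss models.
enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

# Build the same single-turn conversation as the example above.
convo = Conversation.from_messages(
    [Message.from_role_and_content(Role.USER, "What is 2 + 2?")]
)

# Render prompt tokens; the rendered sequence ends with <|start|>assistant
# so the model completes the assistant message.
prompt_tokens = enc.render_conversation_for_completion(convo, Role.ASSISTANT)

# After sampling, completion tokens can be parsed back into structured
# messages (analysis and final channels):
# messages = enc.parse_messages_from_completion_tokens(completion_tokens, Role.ASSISTANT)
```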

How to set up gpt-oss?

Requirements

| Requirement | Details |
| --- | --- |
| Python version | 3.12 |
| macOS | Install the Xcode Command Line Tools: xcode-select --install |
| Linux | Requires CUDA for the reference implementations |
| Windows | Not officially tested; consider using tools like Ollama for local runs |

Installation

You can install gpt-oss directly from PyPI if you want to try out the code:

```shell
# if you just need the tools
pip install gpt-oss
# if you want to try the torch implementation
pip install "gpt-oss[torch]"
# if you want to try the triton implementation
pip install "gpt-oss[triton]"
```

Download the model

The model weights are available on the Hugging Face Hub. You can download them with the Hugging Face CLI:

```shell
# gpt-oss-120b
huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir gpt-oss-120b/

# gpt-oss-20b
huggingface-cli download openai/gpt-oss-20b --include "original/*" --local-dir gpt-oss-20b/
```
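Once the weights are available (Transformers can also fetch them automatically), you can run a quick local test. A minimal sketch, assuming a recent transformers release with gpt-oss support and enough GPU memory for gpt-oss-20b:

```python
from transformers import pipeline

# Load gpt-oss-20b; device_map="auto" places weights on available GPUs.
pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

# Chat-style input; the tokenizer's chat template applies the harmony format.
messages = [
    {"role": "user", "content": "What is 2 + 2?"},
]

result = pipe(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1])
```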
