跳轉到

你取消了選擇,我就採用最合理的預設做法:保留繁體中文原文,並在每段/每個項目下方緊接英文翻譯(標題格式為「中文原文 / English」)。程式碼、圖片連結、引用區塊原樣保留。以下為純 Markdown 輸出。


[ 開源模型 ] OpenAI 承諾的開源模型終於出來了 — GPT-OSS 模型正式介紹 / [ Open-Source Model ] OpenAI's Promised Open-Source Model Is Finally Here — An Official Introduction to GPT-OSS

Author: Simon Liu
Published:
Source: https://medium.com/@simon3458/gpt-oss-intro-2025-2-b8aad71b5d4f
Fetched: 2026-06-07T02:11:59.853429


[ 開源模型 ] OpenAI 承諾的開源模型終於出來了 — GPT-OSS 模型正式介紹 / [ Open-Source Model ] OpenAI's Promised Open-Source Model Is Finally Here — An Official Introduction to GPT-OSS

花了不少時間測試、閱讀和理解,但內容如有錯誤,再請訊息給我,感謝!

I spent quite a bit of time testing, reading, and understanding this. If there are any errors in the content, please message me — thank you!

Press enter or click to view image in full size

Press enter or click to view image in full size

[ 開源模型 ] OpenAI 承諾的開源模型終於出來了 — GPT-OSS 模型正式介紹

[ Open-Source Model ] OpenAI's Promised Open-Source Model Is Finally Here — An Official Introduction to GPT-OSS

I. GPT-OSS 模型介紹 / I. Introduction to the GPT-OSS Model

模型名稱 / Model Name

GPT-OSS 系列模型,此模型是由 OpenAI 所 Fine-Tuning,並且開源出來的 LLM 模型。

The GPT-OSS series of models — these are large language models (LLM) fine-tuned by OpenAI and released as open source.

模型開源狀況 / License / Open-Source Status / License

從 HuggingFace 資訊可以看到,Apache 2.0 License,是一個開源模型。

As can be seen from the HuggingFace information, it uses the Apache 2.0 License, making it an open-source model.

參數量 / Parameter Count

共計有兩種模型:

There are two models in total:

  • gpt-oss-20b:

  • gpt-oss-20b:

[## openai/gpt-oss-20b · Hugging Face

We're on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co](https://huggingface.co/openai/gpt-oss-20b?source=post_page-----b8aad71b5d4f---------------------------------------)

  • gpt-oss-120b:

  • gpt-oss-120b:

[## openai/gpt-oss-120b · Hugging Face

We're on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co](https://huggingface.co/openai/gpt-oss-120b?source=post_page-----b8aad71b5d4f---------------------------------------)

能力與架構概覽 / Capabilities and Architecture Overview

  • 共 21B 與117B 參數,對應 3.6B 與 5.1B 活躍參數。

  • A total of 21B and 117B parameters, corresponding to 3.6B and 5.1B active parameters respectively.

Press enter or click to view image in full size

Press enter or click to view image in full size

Table 1: Model parameter counts

  • 4‑bit MXFP4 量化僅應用於 MoE 權重:120B 版可容納於單張 80 GB GPU,20B 版可容納於單張 16 GB GPU。

  • 4-bit MXFP4 quantization is applied only to the Mixture-of-Experts (MoE) weights: the 120B version fits on a single 80 GB GPU, and the 20B version fits on a single 16 GB GPU.

  • 純文字推理模型,內建鍊式思考(Chain‑of‑Thought)並可調節推理強度。

  • A text-only reasoning model with built-in Chain-of-Thought (CoT) and adjustable reasoning effort.

  • 支援指令跟隨與工具調用,支援生成式 AI 和 AI Agent 工作流程。

  • Supports instruction following and tool calling, and supports generative AI and AI Agent workflows.

架構細節 / Architecture Details

  • Token‑choice MoE,並且採用 SwiGLU。

  • Token-choice MoE, and it adopts SwiGLU.

  • 選出 Top‑k 專家後對其權重執行 softmax(softmax‑after‑topk)。

  • After selecting the Top-k experts, softmax is applied to their weights (softmax-after-topk).

  • 注意力層使用 RoPE,相對位置編碼最長支援128K Token。

  • The attention layers use RoPE (Rotary Position Embedding), with relative position encoding supporting up to 128K tokens.

  • 注意力層交替採用「全域上下文」與「128 Token Slide Window」機制。

  • The attention layers alternate between a "global context" mechanism and a "128-token sliding window" mechanism.

  • 每個注意力頭引入 learned attention sink:在softmax 分母中加入可學習偏置,增強長上下文穩定性。

  • Each attention head introduces a learned attention sink: a learnable bias is added to the softmax denominator to enhance long-context stability.

  • 與 GPT‑4o 等 OpenAI API 模型共用分詞器,並新增 Token 以相容於Responses API。

  • It shares the tokenizer with OpenAI API models such as GPT-4o, and adds new tokens for compatibility with the Responses API.

II. PlayGround / II. PlayGround

  • HuggingFace:

  • HuggingFace:

[## gpt-oss playground

A demo of OpenAI's open-weight models, gpt‑oss‑120b and gpt‑oss‑20b, for developers.

gpt-oss.com](https://gpt-oss.com/?source=post_page-----b8aad71b5d4f---------------------------------------)

  • Nvidia Nim:

  • Nvidia Nim:

[## gpt-oss-120b Model by OpenAI | NVIDIA NIM

Mixture of Experts (MoE) reasoning LLM (text-only) designed to fit within 80GB GPU.

build.nvidia.com](https://build.nvidia.com/openai/gpt-oss-120b?source=post_page-----b8aad71b5d4f---------------------------------------)

III. 雲端部署 / III. Cloud Deployment

Google Vertex AI / Google Vertex AI

Vertex AI 上已經支援 OpenAI gpt-oss,Vertex AI Model Garden 連結: https://goo.gle/41mmJfa

Vertex AI already supports OpenAI gpt-oss. Vertex AI Model Garden link: https://goo.gle/41mmJfa

Press enter or click to view image in full size

Press enter or click to view image in full size

Azure / Azure

Azure AI Model Catalog 可支援 GPT-oss 模型,方便客戶在託管線上端點中安全部署,借助Azure 的企業級基礎設施、自動擴縮與監控能力。

The Azure AI Model Catalog supports the GPT-oss models, making it convenient for customers to securely deploy them on managed online endpoints, leveraging Azure's enterprise-grade infrastructure, autoscaling, and monitoring capabilities.

GPT OSS 模型現已登錄Azure AI Model Catalog(GPT OSS 20BGPT OSS 120B),可直接部署至線上端點進行即時推理。

The GPT OSS models are now listed in the Azure AI Model Catalog (GPT OSS 20B, GPT OSS 120B), and can be deployed directly to online endpoints for real-time inference.

Press enter or click to view image in full size

Press enter or click to view image in full size

IV. 模型地端啟動方式 / IV. Ways to Run the Model On-Premises

[## OpenAI gpt-oss · Ollama Blog

Ollama partners with OpenAI to bring gpt-oss to Ollama and its community.

ollama.com](https://ollama.com/blog/gpt-oss?source=post_page-----b8aad71b5d4f---------------------------------------)

[## How to run gpt-oss locally with Ollama | OpenAI Cookbook

Want to get OpenAI gpt-oss running on your own hardware? This guide will walk you through how to use Ollama to set up…

cookbook.openai.com](https://cookbook.openai.com/articles/gpt-oss/run-locally-ollama?source=post_page-----b8aad71b5d4f---------------------------------------)

可在 Google Colab 使用 L4 啟動 20B 模型,並且可使用:

You can launch the 20B model on Google Colab using an L4, and you can use:

Press enter or click to view image in full size

Press enter or click to view image in full size

文章:https://medium.com/@simon3458/open-source-model-gpt-oss-2025-1-c9e6cee8b43e

Article: https://medium.com/@simon3458/open-source-model-gpt-oss-2025-1-c9e6cee8b43e

我的 Google Colab 程式碼:

My Google Colab code:

[## Google Colab

Edit description

colab.research.google.com](https://colab.research.google.com/github/LiuYuWei/llm-colab-application/blob/main/Simon_LLM_Application_gpt_oss_Ollama_Llm_Service.ipynb?source=post_page-----b8aad71b5d4f---------------------------------------)

[## Using NVIDIA TensorRT-LLM to run gpt-oss-20b | OpenAI Cookbook

This notebook provides a step-by-step guide on how to optimizing gpt-oss models using NVIDIA's TensorRT-LLM for…

cookbook.openai.com](https://cookbook.openai.com/articles/run-nvidia?source=post_page-----b8aad71b5d4f---------------------------------------)

啟動沒問題,也可以跟 LiteLLM 去做整合使用。

Launching it works fine, and it can also be integrated with LiteLLM for use.

非直接支援 — VLLM / Not Directly Supported — VLLM

[## GPT OSS

gpt-oss-20b and gpt-oss-120b are powerful reasoning models open-sourced by OpenAI. In vLLM, you can run it on NVIDIA…

docs.vllm.ai](https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html?source=post_page-----b8aad71b5d4f---------------------------------------)

GPT-OSS 模型現階段不推薦使用 VLLM 去做服務啟動,因為 VLLM 目前猜測,應該是將 NVIDIA TensorRT-LLM 包在 Docker image 中去做使用,導致官方塞滿各種 Github Issue Bug ,會建議等到 VLLM 正式官方支援再做導入

At this stage, it is not recommended to use VLLM to serve the GPT-OSS models, because — as currently speculated — VLLM appears to wrap NVIDIA TensorRT-LLM inside a Docker image for use, which has caused the official repository to be flooded with all sorts of GitHub issue bugs. It is therefore recommended to wait until VLLM provides official support before adopting it.

Press enter or click to view image in full size

Press enter or click to view image in full size

從設定參數有使用到 TRTLLM 相關字眼,可以推測應該是有使用到 TensorRT-LLM

Since the configuration parameters include TRTLLM-related wording, one can infer that TensorRT-LLM is indeed being used.

Press enter or click to view image in full size

Press enter or click to view image in full size

光是 Image 就差了快 6 GB,可見此 Image 還不算是直接支援

The image alone differs by nearly 6 GB, which shows that this image does not yet count as direct support.

Transformer + Torch / Transformer + Torch

官方範例如下:

The official example is as follows:

[## How to run gpt-oss with Transformers | OpenAI Cookbook

The Transformers library by Hugging Face provides a flexible way to load and run large language models locally or on a…

cookbook.openai.com](https://cookbook.openai.com/articles/gpt-oss/run-transformers?source=post_page-----b8aad71b5d4f---------------------------------------)

官方有教學如何在 Google Colab 上運行 GPT-OSS 模型,詳見以下連結

There is an official tutorial on how to run the GPT-OSS models on Google Colab. See the link below for details:

[## How to run gpt-oss-20b on Google Colab | OpenAI Cookbook

OpenAI released gpt-oss 120B and 20B. Both models are Apache 2.0 licensed. Specifically, gpt-oss-20b was made for lower…

cookbook.openai.com](https://cookbook.openai.com/articles/gpt-oss/run-colab?source=post_page-----b8aad71b5d4f---------------------------------------)

V. OpenAI 回覆格式 — Harmony 格式 / V. OpenAI Response Format — The Harmony Format

gpt-oss 模型基於 Harmony 回應格式進行訓練,用於定義對話結構、產生推理輸出和建構函數呼叫。如果您不是 gpt-oss 模型直接使用,而是透過 API 或 Ollama 等提供者使用,則無需擔心這一點,因為您的推理解決方案會自行處理格式。gpt-oss 模型建議不要不使用 Harmony 格式,因為它將可能無法正常運作。

The gpt-oss models are trained on the Harmony response format, which is used to define the conversation structure, generate reasoning output, and construct function calls. If you are not using the gpt-oss models directly but instead through a provider such as an API or Ollama, you do not need to worry about this, because your inference solution will handle the format on its own. For the gpt-oss models, it is recommended that you do not skip using the Harmony format, because otherwise it may not work properly.

Roles / Roles

Press enter or click to view image in full size

Press enter or click to view image in full size

模型在出現任何指令衝突時所應用的資訊層次結構:system > developer > user > assistant > tool

The information hierarchy the model applies whenever there is any instruction conflict: system > developer > user > assistant > tool

Channels / Channels

Press enter or click to view image in full size

Press enter or click to view image in full size

模型輸入的 Harmony 格式如下:

The Harmony format for model input is as follows:

Press enter or click to view image in full size

Press enter or click to view image in full size

Harmony renderer library / Harmony renderer library

官方有提供相關 PyPI 套件,將自動以正確的格式渲染您的訊息並將其轉換為模型處理的標記。

The official team provides a related PyPI package that will automatically render your messages in the correct format and convert them into the tokens the model processes.

[## Client Challenge

Edit description

pypi.org](https://pypi.org/project/openai-harmony/?source=post_page-----b8aad71b5d4f---------------------------------------)

VI. 推理控制 / VI. Reasoning Control

GPT-OSS 模型是推理模型。預設情況下,該模型將進行 medium 的推理。為了控制推理,您可以在系統訊息中將推理等級指定為 low、medium或high。

The GPT-OSS models are reasoning models. By default, the model performs medium-level reasoning. To control reasoning, you can specify the reasoning level as low, medium, or high in the system message.

建議格式為:

The recommended format is:

Reasoning: high

模型將把原始的思路(CoT)作為輔助訊息輸出到 analysis 通道中,而最終的響應將輸出為 final。

The model will output its raw chain of thought (CoT) as auxiliary information into the analysis channel, while the final response is output as final.

Press enter or click to view image in full size

Press enter or click to view image in full size

Reasoning High 的狀況,思考過程長度比較長

In the Reasoning High case, the length of the thinking process is relatively long.

Press enter or click to view image in full size

Press enter or click to view image in full size

Reasoning low 的狀況,思考過程長度就會比較簡單

In the Reasoning low case, the length of the thinking process is relatively simple.

VII. Function / Tool Calling / VII. Function / Tool Calling

GPT-OSS 模型也可以進行 Function Calling,目前作者使用的 API ,透過 Google ADK 去做導入工作上,確認可以使用 Function / Tool Calling,且整體推理方向上是很好的,只是說在格式上,目前因為 Harmony 格式和 Google ADK 支援格式是有差別的,所以後續可能需要在 Chat Template 來做調整,就可以正常在 UI 上呈現。

The GPT-OSS models can also perform Function Calling. With the API the author is currently using, integrated into work via Google ADK, it has been confirmed that Function / Tool Calling works, and the overall reasoning direction is good. However, in terms of format, because there is currently a difference between the Harmony format and the format supported by Google ADK, some adjustment to the Chat Template may be needed afterward so that it can display normally in the UI.

Press enter or click to view image in full size

Press enter or click to view image in full size

Google ADK AI Agent Tool 應用 GPT-OSS 去做測試

Testing the use of GPT-OSS with a Google ADK AI Agent Tool.

從圖片上可以看到以下推理 -> 使用工具 -> 得到工具結果 -> 輸出回覆的過程

From the image, you can see the following process: reasoning -> using the tool -> getting the tool result -> outputting the response.

Analysis / Analysis

We need to call get_current_time with the parameter timezone_str set to "Asia/Taipei".
Then, respond in the user's language.
The user wrote "Asia/Taipei time" in English, so we will respond in English.

Function Call / Function Call

Use tool.assistant to invoke:

Use tool.assistant to invoke:

{  
  "timezone_str": "Asia/Taipei"  
}

Response / Response

{  
  "status": "success",  
  "result": "2025-08-06 14:32:07"  
}

Final Output / Final Output

The current time in Asia/Taipei is 2025–08–06 14:32:07.

處理思維鏈的詳細資訊,可見此文章 / For details on handling the chain of thought, see this article

[## How to handle the raw chain of thought in gpt-oss | OpenAI Cookbook

The gpt-oss models provide access to a raw chain of thought (CoT) meant for analysis and safety research by model…

cookbook.openai.com](https://cookbook.openai.com/articles/gpt-oss/handle-raw-cot?source=post_page-----b8aad71b5d4f---------------------------------------)

VIII. 評測結果 / VIII. Evaluation Results

推理與知識能力 / Reasoning and Knowledge Capabilities

  • MMLU(大學水平):gpt-oss-120b ≈ 90%,接近 o4-mini。

  • MMLU (college level): gpt-oss-120b ≈ 90%, close to o4-mini.

  • AIME(競賽數學):> 95%,表現優異。

  • AIME (competition mathematics): > 95%, an excellent performance.

  • GPQA(博士級問題):中等表現,受限於模型大小。

  • GPQA (PhD-level questions): a moderate performance, limited by model size.

  • HLE(專家級推理):表現仍有進步空間。

  • HLE (expert-level reasoning): there is still room for improvement in performance.

程式能力(Codeforces、SWE-Bench) / Coding Capabilities (Codeforces, SWE-Bench)

  • gpt-oss-120b 與 o4-mini 差距不大。

  • gpt-oss-120b is not far behind o4-mini.

  • gpt-oss-20b 也具競爭力,尤其在資源受限場景具吸引力。

  • gpt-oss-20b is also competitive, and is especially attractive in resource-constrained scenarios.

工具使用能力(function calling) / Tool-Use Capabilities (function calling)

  • 高推理模式下可準確執行複雜函式邏輯。

  • In high-reasoning mode, it can accurately execute complex function logic.

醫療領域(HealthBench) / Medical Domain (HealthBench)

  • gpt-oss-120b 幾乎可匹敵 OpenAI o3,明顯優於 GPT-4o 與 o4-mini。

  • gpt-oss-120b can almost match OpenAI o3, and is clearly superior to GPT-4o and o4-mini.

多語言能力(MMMLU) / Multilingual Capabilities (MMMLU)

  • gpt-oss-120b 高推理模式:平均達 81.3%,接近 o4-mini-high。

  • gpt-oss-120b in high-reasoning mode: an average of 81.3%, close to o4-mini-high.

Press enter or click to view image in full size

Press enter or click to view image in full size

Evaluations across multiple benchmarks and reasoning levels.

國外有人實測有關 SQL query 的查詢,結果發現 Gemini 2.5 Flash 已經被 OpenAI 推翻!該模型以一半的輸入成本和四分之一的輸出成本實現了更快的速度和更高的中位數得分。這使得它成為同價位 SQL 查詢產生的最佳模型,詳情可見以下部落格:

Someone abroad ran a real-world test on SQL query generation and found that Gemini 2.5 Flash has been overturned by OpenAI! The model achieves faster speed and a higher median score at half the input cost and a quarter of the output cost. This makes it the best model for SQL query generation in the same price range. For details, see the blog below:

[## OpenAI just released GPT-oss, its first open-source model since GPT-2. Is it as good as they say?

OpenAI did something that even Raven Baxter couldn't predict…

medium.com](/@austin-starks/openai-just-released-gpt-oss-its-first-open-source-model-since-gpt-2-is-it-as-good-as-they-say-8567731bb8d7?source=post_page-----b8aad71b5d4f---------------------------------------)

中文化能力和資訊,後續將會在我的 Linkedin 上,歡迎大家持續追蹤 Linkedin: https://simonliuyuwei.my.canva.site/link-in-bio

Information on its Chinese-language capabilities will follow on my LinkedIn. You are welcome to keep following — LinkedIn: https://simonliuyuwei.my.canva.site/link-in-bio

IX. 安全性設計與評估 / IX. Safety Design and Evaluation

拒絕不當內容的能力 / Ability to Refuse Inappropriate Content

  • 與 o4-mini 持平,部分項目如生物風險、未達「高能力」警戒線。

  • On par with o4-mini; some items, such as biological risk, do not reach the "high-capability" warning threshold.

Press enter or click to view image in full size

Press enter or click to view image in full size

Press enter or click to view image in full size

Press enter or click to view image in full size

抗 Jailbreak 能力 / Resistance to Jailbreaks

  • 接近 o4-mini,但在 Instruction Hierarchy(指令優先級防繞過)稍弱。

  • Close to o4-mini, but slightly weaker on Instruction Hierarchy (preventing bypass of instruction priority).

Press enter or click to view image in full size

Press enter or click to view image in full size

幻覺控制(Hallucinations) / Hallucination Control (Hallucinations)

  • 相較 o4-mini 稍高,未開啟瀏覽功能下表現較差。

  • Slightly higher than o4-mini; performance is worse when the browsing function is not enabled.

Press enter or click to view image in full size

Press enter or click to view image in full size

X. 結論 / X. Conclusion

GPT-OSS 系列模型的推出,象徵 OpenAI 在開放生態策略上的重大轉向。透過 Apache 2.0 授權,OpenAI 不僅釋出了具競爭力的 20B 與 120B 推理模型,更同步提供完整的推論控制、Harmony 格式、Function Calling 支援等核心功能,使其在多數商業與研發場景下具備實用性與落地性。

The launch of the GPT-OSS series marks a major shift in OpenAI's open-ecosystem strategy. Through the Apache 2.0 license, OpenAI has not only released competitive 20B and 120B reasoning models, but has also simultaneously provided core features such as complete inference control, the Harmony format, and Function Calling support, making the models practical and deployable in most commercial and R&D scenarios.

評測結果顯示,GPT-OSS-120B 模型在 MMLU、AIME、HealthBench 等多項關鍵指標上表現穩定,並具備接近 o4-mini 模型的實力,尤其在醫療領域和程式應用場景表現突出。搭配如 NVIDIA TensorRT-LLM、Google Vertex AI、Azure 等雲端與地端部署方式,企業導入門檻進一步降低。

The evaluation results show that the GPT-OSS-120B model performs stably across multiple key metrics such as MMLU, AIME, and HealthBench, and has strength close to that of the o4-mini model — particularly standing out in the medical domain and in programming application scenarios. Combined with cloud and on-premises deployment options such as NVIDIA TensorRT-LLM, Google Vertex AI, and Azure, the barrier to enterprise adoption is further lowered.

然而,GPT-OSS 目前仍在初期發展階段,針對幻覺控制、Jailbreak 抗性與極高難度推理(如 GPQA、HLE)尚有明顯進步空間。此外,VLLM 等工具整合尚不完全成熟,導入仍需審慎評估後續是否支援。

However, GPT-OSS is still in an early stage of development, and there is clearly room for improvement in hallucination control, jailbreak resistance, and extremely high-difficulty reasoning (such as GPQA and HLE). In addition, integration with tools such as VLLM is not yet fully mature, so adoption still requires careful evaluation of whether support will be available going forward.

整體而言,GPT-OSS 是目前開源 LLM 領域中具策略意義與實務價值的里程碑。建議具備 AI 建模、推理應用或 AI Agent 開發需求的技術團隊,可視其為高性價比且可控的開源替代方案,適合在研發環境或具安全考量的企業內部部署導入。

Overall, GPT-OSS is a milestone in the current open-source LLM field with both strategic significance and practical value. It is recommended that technical teams with needs in AI modeling, reasoning applications, or AI Agent development regard it as a cost-effective and controllable open-source alternative, suitable for adoption and deployment in R&D environments or within enterprises with security considerations.

I am Simon / I am Simon

大家好,我是 Simon 劉育維,是一位 AI 領域解決方案專家,目前也擔任 Google GenAI 領域開發者專家 (GDE),期待能夠幫助企業導入人工智慧相關技術解決問題。如果這篇文章對您有幫助,請在 Medium 上按一下鼓勵,並追蹤我的個人帳號,這樣您就可以隨時閱讀我所撰寫的文章。歡迎在我的 Linkedin 上留言提供意見,並與我一起討論有關人工智慧的主題,期待能夠對大家有所幫助!

Hello everyone, I am Simon Liu (劉育維), an AI solutions expert. I currently also serve as a Google GenAI Google Developer Expert (GDE), and I look forward to helping enterprises adopt artificial intelligence technologies to solve problems. If this article has been helpful to you, please give it a clap on Medium and follow my personal account so that you can read the articles I write at any time. You are welcome to leave comments and feedback on my LinkedIn and discuss artificial intelligence topics with me — I hope this can be helpful to everyone!

My Personal Website: https://simonliuyuwei.my.canva.site/link-in-bio