ChatML vs. Harmony: Understanding OpenAI's New Format 🔍

Community Article, published August 9, 2025

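OpenAI has just released its **Harmony format** alongside the gpt-oss models. It takes an approach to structured reasoning and tool calling that differs fundamentally from the ChatML format we have been using with models such as Qwen3.

If you work with reasoning models or build inference infrastructure, understanding these formats is essential. Today we'll dig into both formats and compare them side by side, so you can see what changed and why it matters!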

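What is ChatML? 📝

ChatML (Chat Markup Language) has been the go-to format for many open-source models, the Qwen series in particular. It is essentially an XML-inspired way of structuring conversations, and it grew out of the need to clearly separate the different parts of a conversation.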

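Key features

  • Uses the special tokens <|im_start|> and <|im_end|> to mark message boundaries
  • Simple role-based structure (system, user, assistant)
  • Thinking/reasoning is wrapped in <think>...</think> blocks
  • Tool definitions and calls are JSON wrapped in XML-style tags (<tools>, <tool_call>)
  • A proven, battle-tested format used across many models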

Think of ChatML as the “classic” approach: straightforward, XML-like, and focused on clarity.
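To make the message boundaries concrete, here is a minimal Python sketch that renders a list of messages as a ChatML prompt. The function name `render_chatml` and its `add_generation_prompt` flag are illustrative choices, not part of any library. Real models ship their own chat templates with the tokenizer, and those also handle tools and thinking, so treat this as a simplification.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each message is wrapped in <|im_start|>role ... <|im_end|>.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "What's the capital of France?"}])
print(prompt)
```

In practice you would normally rely on the chat template bundled with the model rather than hand-rolling the string.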

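What is Harmony? 🎭

Harmony is a new response format that OpenAI designed specifically for the gpt-oss models. It is more than a prompt format: it rethinks from the ground up how a model structures its output, especially for complex reasoning and tool use.

Key innovations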

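  • Multi-channel architecture: messages can be tagged as analysis (reasoning), commentary (tool preambles and tool calls), or final (user-facing)
  • Role hierarchy: system > developer > user > assistant > tool (used to resolve conflicting instructions)
  • Message routing: the to= syntax directs messages between components
  • TypeScript-style tool definitions: more expressive and more concise than JSON Schema
  • Per-channel safety standards: the analysis channel is not safety-filtered the way final is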

Harmony treats the model's output as a multi-threaded conversation, in which different kinds of content flow through different channels.
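As a rough illustration of channels and routing, the sketch below builds a single Harmony message string. The header layout mirrors the Harmony examples later in this article; the function `render_harmony_message` and its parameters are hypothetical names used here for illustration. For real work, OpenAI's `openai-harmony` library is the safer choice, since it handles tokenization and edge cases this sketch ignores.

```python
def render_harmony_message(role, content, channel=None, recipient=None,
                           constrain=None, stop="<|end|>"):
    """Build one Harmony message: <|start|>header<|message|>content<stop>."""
    header = f"<|start|>{role}"
    # In the article's examples, tool results put `to=` right after the role,
    # while assistant tool calls put it after the channel name.
    if recipient and role != "assistant":
        header += f" to={recipient}"
    if channel:
        header += f"<|channel|>{channel}"
    if recipient and role == "assistant":
        header += f" to={recipient}"
    if constrain:
        header += f"<|constrain|>{constrain}"
    return f"{header}<|message|>{content}{stop}"


user_msg = render_harmony_message("user", "What's the capital of France?")
tool_call = render_harmony_message(
    "assistant", '{"city": "Tokyo"}', channel="commentary",
    recipient="functions.get_weather", constrain="json", stop="<|call|>",
)
tool_result = render_harmony_message(
    "functions.get_weather", '{"temperature": 18}',
    channel="commentary", recipient="assistant",
)
print(user_msg, tool_call, tool_result, sep="\n")
```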

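Harmony's TypeScript-style syntax: why does it matter? 🚀

One of Harmony's most interesting design choices is its use of TypeScript-style tool definitions instead of JSON Schema. This is more than syntactic sugar; there are good reasons behind it.

The power of code-based definitions

As recent research on code agents has shown, expressing actions as code has several advantages: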

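  1. Conciseness: code actions are roughly 30% more compact than JSON
  2. Parallelism: need 4 parallel streams of 5 actions each? In JSON that is 20 separate blobs; in code it is a single expression
  3. Variable management: results are easy to store and reference (rock_image = generate_image("rock"))
  4. Readability: agent logs become much clearer
  5. LLM fluency: models have seen a large amount of code in their training data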
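A toy comparison can make the first three points tangible. In the sketch below, `generate_image` is a hypothetical stand-in tool (borrowed from the example above), and the JSON blob shape is just one common convention. It shows the same four calls written once as separate JSON actions and once as a single code action that also stores its result in a variable.

```python
import json


def generate_image(prompt):
    # Hypothetical stand-in tool: a real one would call an image model.
    return f"{prompt}.png"


subjects = ["rock", "tree", "river", "cloud"]

# JSON-style actions: one blob per call, each usually emitted in its own step.
json_actions = [
    {"name": "generate_image", "arguments": {"prompt": subject}}
    for subject in subjects
]
json_text = "\n".join(json.dumps(action) for action in json_actions)

# Code-style action: one expression covers the same calls and names the result.
code_action = 'images = [generate_image(s) for s in ["rock", "tree", "river", "cloud"]]'

# Run the code action in a namespace that exposes the tool.
# exec() is used purely for illustration; a real agent needs a sandbox.
namespace = {"generate_image": generate_image}
exec(code_action, namespace)

print(namespace["images"])
print(f"JSON: {len(json_text)} chars, code: {len(code_action)} chars")
```

The exact savings depend on the task, so the character counts here are only a feel for the difference, not a benchmark.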

Tool definition comparison

ChatML (JSON Schema)

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get weather for a city",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
      },
      "required": ["city"]
    }
  }
}

Harmony (TypeScript-style)

namespace functions {
  // Get weather for a city
  type get_weather = (_: {
    city: string,
    unit?: "celsius" | "fahrenheit", // default: celsius
  }) => any;
}

The TypeScript version is not only more concise, it is also more familiar to developers and easier for LLMs to process.
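If you already have tools described in JSON Schema, the conversion to the TypeScript-style form can be sketched in a few lines of Python. The helpers `ts_type`, `tool_to_ts`, and `render_namespace` are illustrative names, and the sketch only covers flat objects with primitive and enum properties. It is an approximation of the layout shown above, not OpenAI's actual renderer.

```python
import json

PRIMITIVES = {"string": "string", "number": "number",
              "integer": "number", "boolean": "boolean"}


def ts_type(schema):
    """Map a flat JSON Schema property to a TypeScript-style type string."""
    if "enum" in schema:
        return " | ".join(json.dumps(value) for value in schema["enum"])
    return PRIMITIVES.get(schema.get("type"), "any")


def tool_to_ts(tool):
    """Render one function tool as a TypeScript-style type alias."""
    fn = tool["function"]
    params = fn.get("parameters", {})
    required = set(params.get("required", []))
    lines = []
    if fn.get("description"):
        lines.append(f"// {fn['description']}")
    lines.append(f"type {fn['name']} = (_: {{")
    for name, schema in params.get("properties", {}).items():
        optional = "" if name in required else "?"  # optional params get `?`
        lines.append(f"  {name}{optional}: {ts_type(schema)},")
    lines.append("}) => any;")
    return "\n".join(lines)


def render_namespace(tools, namespace="functions"):
    body = "\n".join(tool_to_ts(tool) for tool in tools)
    return f"namespace {namespace} {{\n{body}\n}} // namespace {namespace}"


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}
rendered = render_namespace([weather_tool])
print(rendered)
```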

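Examples in depth 🏊‍♂️

Let's look at how these formats handle real-world scenarios, from simple chat to complex multi-step reasoning with tools.

1. Basic conversation (no thinking)

The simplest case: just a question, with no reasoning step.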

ChatML (Qwen3)

<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
<think>

</think>

The capital of France is Paris.<|im_end|>

OpenAI Harmony

<|start|>user<|message|>What's the capital of France?<|end|>
<|start|>assistant<|channel|>final<|message|>The capital of France is Paris.<|return|>

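What's happening: even with thinking disabled, Qwen3 includes empty <think></think> tags as a structural hint. Harmony is more direct: it goes straight to the final channel for user-facing content.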

2. Conversation with thinking

This is where the philosophical differences become clear.

ChatML (Qwen3)

<|im_start|>user
If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|im_end|>
<|im_start|>assistant
<think>
Starting with 3 apples
Adding 5 more: 3 + 5 = 8
Giving away 2: 8 - 2 = 6
So the final answer is 6 apples.
</think>

You would have 6 apples. 

Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|im_end|>

OpenAI Harmony

<|start|>user<|message|>If I have 3 apples and buy 5 more, then give away 2, how many do I have?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User starts with 3 apples. Buys 5 more: 3 + 5 = 8. Gives away 2: 8 - 2 = 6. Final count is 6 apples.<|end|>
<|start|>assistant<|channel|>final<|message|>You would have 6 apples.

Here's the breakdown: You start with 3 apples, buy 5 more (3 + 5 = 8), then give away 2 (8 - 2 = 6).<|return|>

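What's different

  • ChatML keeps everything in a single message, with the thinking inline
  • Harmony splits the reasoning (analysis channel) and the answer (final channel) into separate messages
  • Harmony's approach makes it easier to programmatically filter reasoning out of user-facing content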
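Here is a small sketch of that filtering step: a regex-based parser that splits a Harmony completion into messages and keeps only the final channel. The names `parse_harmony` and `user_visible` are made up for this example, and the parser works on decoded text rather than tokens, so it is a simplification. The `openai-harmony` library is the more robust route.

```python
import re

# One message: <|start|>header<|message|>content, closed by a stop token.
MESSAGE_RE = re.compile(
    r"<\|start\|>(?P<header>.*?)<\|message\|>(?P<content>.*?)"
    r"(?:<\|end\|>|<\|return\|>|<\|call\|>)",
    re.DOTALL,
)


def parse_harmony(text):
    """Split Harmony text into dicts with role, channel and content."""
    messages = []
    for match in MESSAGE_RE.finditer(text):
        header = match.group("header")
        # The role is everything before the first channel marker or space.
        role = re.split(r"<\|channel\|>|\s", header, maxsplit=1)[0]
        channel = re.search(r"<\|channel\|>(\w+)", header)
        messages.append({
            "role": role,
            "channel": channel.group(1) if channel else None,
            "content": match.group("content"),
        })
    return messages


def user_visible(text):
    """Keep only what the assistant wrote on the final channel."""
    return "".join(
        message["content"] for message in parse_harmony(text)
        if message["role"] == "assistant" and message["channel"] == "final"
    )


completion = (
    "<|start|>assistant<|channel|>analysis<|message|>"
    "User starts with 3 apples. 3 + 5 = 8. 8 - 2 = 6.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>"
    "You would have 6 apples.<|return|>"
)
print(user_visible(completion))
```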

3. Multi-turn conversation with thinking

This shows how each format handles conversation history.

ChatML (Qwen3)

<|im_start|>user
What's 15% of 80?<|im_end|>
<|im_start|>assistant
15% of 80 is 12.

To calculate: 80 × 0.15 = 12<|im_end|>
<|im_start|>user
Now add 25 to that result<|im_end|>
<|im_start|>assistant
<think>
The previous result was 12.
Adding 25: 12 + 25 = 37
</think>

Adding 25 to 12 gives us 37.<|im_end|>

Important: Qwen3 strips the thinking from previous turns to save tokens!
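The stripping can be approximated in a few lines. This sketch assumes plain user and assistant messages with no tool responses, and the helper name `strip_past_thinking` is hypothetical. Qwen3's real chat template does this inside its Jinja template and handles more cases.

```python
import re

# A <think> block plus the blank lines that follow it.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_past_thinking(messages):
    """Remove <think> blocks from assistant turns before the last user message."""
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"), default=-1
    )
    cleaned = []
    for index, message in enumerate(messages):
        content = message["content"]
        if message["role"] == "assistant" and index < last_user:
            content = THINK_RE.sub("", content)
        cleaned.append({**message, "content": content})
    return cleaned


history = [
    {"role": "user", "content": "What's 15% of 80?"},
    {"role": "assistant",
     "content": "<think>\n80 × 0.15 = 12\n</think>\n\n15% of 80 is 12."},
    {"role": "user", "content": "Now add 25 to that result"},
    {"role": "assistant",
     "content": "<think>\n12 + 25 = 37\n</think>\n\nAdding 25 to 12 gives us 37."},
]
cleaned = strip_past_thinking(history)
print(cleaned[1]["content"])
```

Only the earlier assistant turn loses its reasoning; the most recent one is left untouched.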

OpenAI Harmony

<|start|>user<|message|>What's 15% of 80?<|end|>
<|start|>assistant<|channel|>final<|message|>15% of 80 is 12.<|end|>
<|start|>user<|message|>Now add 25 to that result<|end|>
<|start|>assistant<|channel|>analysis<|message|>Previous result was 12. Adding 25: 12 + 25 = 37<|end|>
<|start|>assistant<|channel|>final<|message|>Adding 25 to 12 gives us 37.<|return|>

History management: both formats prune reasoning from earlier turns, but Harmony's channel system makes this more explicit and controllable.
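On the Harmony side, pruning can be expressed as a filter over channels. As far as I understand OpenAI's Harmony guidance, analysis messages from completed turns are dropped and a trailing <|return|> becomes <|end|> once a reply is stored in history. The sketch below encodes that reading; the message dict shape and the name `prune_harmony_history` are assumptions for illustration, and the sketch ignores the case of tool calls that are still in progress.

```python
def prune_harmony_history(messages):
    """Drop earlier-turn analysis messages and normalise stop tokens."""
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"), default=-1
    )
    pruned = []
    for index, message in enumerate(messages):
        is_old_analysis = (
            index < last_user
            and message["role"] == "assistant"
            and message.get("channel") == "analysis"
        )
        if is_old_analysis:
            continue  # reasoning from a finished turn is not replayed
        stop = message.get("stop", "<|end|>")
        if stop == "<|return|>":
            stop = "<|end|>"  # <|return|> only marks the end of generation
        pruned.append({**message, "stop": stop})
    return pruned


history = [
    {"role": "user", "channel": None,
     "content": "What's 15% of 80?", "stop": "<|end|>"},
    {"role": "assistant", "channel": "analysis",
     "content": "80 × 0.15 = 12", "stop": "<|end|>"},
    {"role": "assistant", "channel": "final",
     "content": "15% of 80 is 12.", "stop": "<|return|>"},
    {"role": "user", "channel": None,
     "content": "Now add 25 to that result", "stop": "<|end|>"},
]
pruned = prune_harmony_history(history)
print([(m["role"], m["channel"]) for m in pruned])
```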

4. Function calling

The two formats differ significantly in how they handle tools.

ChatML (Qwen3)

<|im_start|>system
# Tools

You may call one or more functions to assist with the user query.

<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}}, "required": ["city"]}}}
</tools>

For each function call, return a json object within <tool_call></tool_call> XML tags.<|im_end|>
<|im_start|>user
What's the weather in Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User wants Tokyo weather. Need to call get_weather function.
</think>

I'll check the current weather in Tokyo for you.

<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo", "unit": "celsius"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 22, "condition": "partly cloudy", "humidity": 65}
</tool_response><|im_end|>
<|im_start|>assistant
The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|im_end|>

OpenAI Harmony

<|start|>developer<|message|># Tools
## functions
namespace functions {
// Get weather for a city
type get_weather = (_: {
  city: string,
  unit?: "celsius" | "fahrenheit", // default: celsius
}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>What's the weather in Tokyo?<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll check the current weather in Tokyo for you.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo", "unit": "celsius"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 22, "condition": "partly cloudy", "humidity": 65}<|end|>
<|start|>assistant<|channel|>final<|message|>The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.<|return|>

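Key differences

  • Tool definitions: JSON Schema vs. TypeScript-style types
  • Tool responses: ChatML wraps them in a user message; Harmony uses a dedicated tool role
  • Message routing: Harmony's to= syntax shows explicitly how messages flow between components
  • Channels: Harmony separates tool interactions (commentary) from the final answer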
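For inference infrastructure, the practical consequence is that tool calls have to be extracted differently in each format. The sketch below pulls the call out of both styles and dispatches it to a stubbed tool. The regexes follow the examples above, while `extract_chatml_calls`, `extract_harmony_calls`, `run_calls`, and the `get_weather` stub are hypothetical names for illustration.

```python
import json
import re

CHATML_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
HARMONY_CALL_RE = re.compile(
    r"to=functions\.(?P<name>\w+)(?:\s*<\|constrain\|>\w+)?"
    r"<\|message\|>(?P<args>.*?)<\|call\|>",
    re.DOTALL,
)


def extract_chatml_calls(text):
    """ChatML: the tool name lives inside the JSON payload."""
    calls = (json.loads(block) for block in CHATML_CALL_RE.findall(text))
    return [(call["name"], call["arguments"]) for call in calls]


def extract_harmony_calls(text):
    """Harmony: the tool name lives in the header's to= recipient."""
    return [
        (match.group("name"), json.loads(match.group("args")))
        for match in HARMONY_CALL_RE.finditer(text)
    ]


def get_weather(city, unit="celsius"):
    # Hypothetical stub; a real tool would query a weather service.
    return {"temperature": 22, "condition": "partly cloudy", "humidity": 65}


TOOLS = {"get_weather": get_weather}


def run_calls(calls):
    return [TOOLS[name](**arguments) for name, arguments in calls]


chatml_reply = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Tokyo", "unit": "celsius"}}\n'
    "</tool_call>"
)
harmony_reply = (
    "<|start|>assistant<|channel|>commentary to=functions.get_weather"
    '<|constrain|>json<|message|>{"city": "Tokyo", "unit": "celsius"}<|call|>'
)
chatml_calls = extract_chatml_calls(chatml_reply)
harmony_calls = extract_harmony_calls(harmony_reply)
print(run_calls(harmony_calls))
```

Both extractors return the same (name, arguments) pairs, so everything downstream of parsing can stay format-agnostic.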

5. Function calling with chain of thought

Complex scenarios are where the architectural differences really show.

ChatML (Qwen3)

<|im_start|>system
# Tools
<tools>
{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}
{"type": "function", "function": {"name": "get_flight_status", "description": "Check flight status", "parameters": {"type": "object", "properties": {"flight_number": {"type": "string"}}}}}
</tools><|im_end|>
<|im_start|>user
Should I pack an umbrella for my UA902 flight to Tokyo?<|im_end|>
<|im_start|>assistant
<think>
User is asking about packing an umbrella for flight UA902 to Tokyo.
I need to:
1. Check the flight status to confirm it's going to Tokyo
2. Check Tokyo's weather to see if rain is expected
Let me start with the flight status.
</think>

I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.

<tool_call>
{"name": "get_flight_status", "arguments": {"flight_number": "UA902"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Flight confirmed to Tokyo, departing at 14:30. Now checking weather.
</think>

<tool_call>
{"name": "get_weather", "arguments": {"city": "Tokyo"}}
</tool_call><|im_end|>
<|im_start|>user
<tool_response>
{"temperature": 18, "condition": "rainy", "precipitation": "80%"}
</tool_response><|im_end|>
<|im_start|>assistant
<think>
Tokyo weather shows rainy conditions with 80% precipitation. Definitely need an umbrella.
</think>

Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|im_end|>

OpenAI Harmony

<|start|>developer<|message|># Tools
## functions
namespace functions {
type get_weather = (_: {city: string}) => any;
type get_flight_status = (_: {flight_number: string}) => any;
} // namespace functions<|end|>
<|start|>user<|message|>Should I pack an umbrella for my UA902 flight to Tokyo?<|end|>
<|start|>assistant<|channel|>analysis<|message|>User asking about packing umbrella for UA902 to Tokyo. Need to: 1) Verify flight destination 2) Check Tokyo weather. Starting with flight status.<|end|>
<|start|>assistant<|channel|>commentary<|message|>I'll help you decide if you need an umbrella. Let me check your flight details and the weather in Tokyo.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_flight_status<|constrain|>json<|message|>{"flight_number": "UA902"}<|call|>
<|start|>functions.get_flight_status to=assistant<|channel|>commentary<|message|>{"flight": "UA902", "destination": "Tokyo (NRT)", "status": "On time", "departure": "14:30"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Flight confirmed to Tokyo, on time at 14:30. Now checking weather conditions.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|constrain|>json<|message|>{"city": "Tokyo"}<|call|>
<|start|>functions.get_weather to=assistant<|channel|>commentary<|message|>{"temperature": 18, "condition": "rainy", "precipitation": "80%"}<|end|>
<|start|>assistant<|channel|>analysis<|message|>Tokyo has rainy conditions with 80% precipitation. Clear recommendation to pack umbrella.<|end|>
<|start|>assistant<|channel|>final<|message|>Yes, you should definitely pack an umbrella! Your UA902 flight to Tokyo is on time (departing at 14:30), and Tokyo is currently experiencing rainy weather with an 80% chance of precipitation. Better to be prepared! ☔<|return|>

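In complex scenarios

  • ChatML keeps the thinking between tool calls within the same conversation turn
  • Harmony tracks each reasoning step explicitly through separate analysis messages
  • Harmony's commentary channel can carry a user-facing “preamble” before a tool runs
  • Message routing (to=) in Harmony produces a clear execution trace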
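To show what such an execution trace might look like, this sketch reads only the message headers of a Harmony transcript and lists who sent what to whom, and on which channel. The function name `trace` is illustrative and the transcript is a shortened version of the flight example above; it is a text-level approximation, not a full Harmony parser.

```python
import re

MESSAGE_RE = re.compile(
    r"<\|start\|>(.*?)<\|message\|>(.*?)(?:<\|end\|>|<\|return\|>|<\|call\|>)",
    re.DOTALL,
)


def trace(transcript):
    """Return (sender, recipient, channel) for every message header."""
    steps = []
    for header, _content in MESSAGE_RE.findall(transcript):
        sender = re.split(r"<\|channel\|>|\s", header, maxsplit=1)[0]
        recipient = re.search(r"to=([\w.]+)", header)
        channel = re.search(r"<\|channel\|>(\w+)", header)
        steps.append((
            sender,
            recipient.group(1) if recipient else None,
            channel.group(1) if channel else None,
        ))
    return steps


transcript = (
    "<|start|>user<|message|>Should I pack an umbrella for my UA902 flight?<|end|>"
    "<|start|>assistant<|channel|>analysis<|message|>Check flight status first.<|end|>"
    "<|start|>assistant<|channel|>commentary to=functions.get_flight_status"
    '<|constrain|>json<|message|>{"flight_number": "UA902"}<|call|>'
    "<|start|>functions.get_flight_status to=assistant<|channel|>commentary"
    '<|message|>{"status": "On time"}<|end|>'
    "<|start|>assistant<|channel|>final<|message|>Yes, pack an umbrella.<|return|>"
)
steps = trace(transcript)
for sender, recipient, channel in steps:
    print(f"{sender} -> {recipient or '(user-visible)'} [{channel or 'no channel'}]")
```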

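What this means for the ecosystem 🌍

The introduction of Harmony marks a significant evolution in how we think about model outputs:

  1. Channel-based safety: applying different safety standards to different channels is a game-changer for reasoning models
  2. Better observability: explicit routing and channels make complex agent behavior easier to debug
  3. Code-first tools: TypeScript syntax may become the new standard for tool definitions
  4. Structured reasoning: thinking is separated from output at the format level rather than merely by convention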

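Key takeaways 📌

  • ChatML is simpler, more mature, and has broad ecosystem support
  • Harmony introduces powerful new concepts such as channels and message routing
  • Both formats handle reasoning and tools, but with very different philosophies
  • Harmony's TypeScript-style definitions are more concise and more developer-friendly
  • Harmony's channel system allows fine-grained control over what users see

As the ecosystem evolves, understanding both formats is essential. We don't get to choose which format a model uses (that choice is made when the base model is trained), but knowing how they work helps us build better inference infrastructure and get the most out of these powerful reasoning models.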
