Introducing AI Sheets: a tool to work with datasets using open AI models!

Published August 8, 2025

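🧭 TL;DR

Hugging Face AI Sheets is a new open-source tool for building, enriching, and transforming datasets with AI models, no code required. It can be deployed locally or on the Hub. It lets you use thousands of open models from the Hugging Face Hub via Inference Providers or local models, including gpt-oss from OpenAI!

Useful links

Try the tool for free (no installation required): https://huggingface.co/spaces/aisheets/sheets
Install and run locally: https://github.com/huggingface/sheets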

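What is AI Sheets

AI Sheets is a no-code tool for building, transforming, and enriching datasets using (open) AI models. It is tightly integrated with the Hub and the open-source AI ecosystem.

AI Sheets uses an easy-to-learn, spreadsheet-like user interface. The tool is built around quick experimentation: start with small datasets before running long or costly data generation pipelines.

In AI Sheets, you create new columns by writing prompts. You can iterate as many times as you need and edit or validate cells to teach the model what you want. More on this later!

What can I use it for

You can use AI Sheets to: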

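Compare and vibe test models. Imagine you want to test the latest models on your data. You can import a dataset with prompts/questions and create several columns (one per model) with a prompt like this: Answer the following: {{prompt}}, where prompt is a column in your dataset. You can validate the results manually or create a new column with an LLM-as-a-judge prompt such as: Evaluate the responses to the following question: {{prompt}}. Response 1: {{model1}}. Response 2: {{model2}}, where model1 and model2 are columns in your dataset containing the different models' responses.

Improve prompts for your data and specific models. Imagine you want to build an application that processes customer requests and replies automatically. You can load a sample dataset of customer requests and start iterating on different prompts and models to generate responses. One cool feature of AI Sheets is that you can provide feedback by editing or validating cells. These example cells are added to your prompt automatically. You can think of it as a tool for fine-tuning prompts and adding few-shot examples to them very efficiently, while looking at your data in real time!

Transform a dataset. Imagine you want to clean up a column in your dataset. You can add a new column with a prompt like this: Remove extra punctuation marks from the following text: {{text}}, where text is the column in your dataset containing the text you want to clean up.

Classify a dataset. Imagine you want to classify some content in your dataset. You can add a new column with a prompt like this: Categorize the following text: {{text}}, where text is the column in your dataset containing the text you want to classify.

Analyze a dataset. Imagine you want to extract the main ideas in your dataset. You can add a new column with a prompt like this: Extract the most important ideas from the following: {{text}}, where text is the column in your dataset containing the text you want to analyze.

Enrich a dataset. Imagine you have a dataset of addresses that are missing zip codes. You can add a new column with a prompt like this: Find the zip code of the following address: {{address}} (in this case, you must enable the "Search the web" option to ensure accurate results).

Generate a synthetic dataset. Imagine you need a dataset of realistic emails, but the real data is unavailable for privacy reasons. You can create a dataset with a prompt like this: Write a short description of a professional in the pharma industry, and name that column person_bio. Then you can create another column with a prompt like this: Write a realistic professional email as it was written by the following person: {{person_bio}}.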
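The `{{column}}` placeholders in the prompts above behave like a per-row template: for every row, each placeholder is filled with that row's value for the named column. The snippet below is a minimal sketch of that idea in plain Python. The `render_prompt` helper is hypothetical and written only for illustration; it is not the AI Sheets implementation.

```python
import re

# Matches {{column_name}} placeholders, allowing optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    """Fill every {{column}} placeholder with the matching value from `row`.

    Hypothetical helper: raises KeyError when the template names a column
    that the row does not contain.
    """
    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"Row has no column named {column!r}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


row = {"prompt": "What is the capital of France?"}
print(render_prompt("Answer the following: {{prompt}}", row))
```

Failing loudly on an unknown column is a deliberate choice in this sketch: a silently empty placeholder would still produce a prompt, but a misleading one.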

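Now let's dive into how to use it!

How to use it

AI Sheets gives you two ways to start: import existing data or generate a dataset from scratch. Once your data is loaded, you can refine it by adding columns, editing cells, and regenerating content.

Getting started

To get started, you either create a dataset from scratch with a natural language description or import an existing dataset.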

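Generate a dataset from scratch

Best for: getting familiar with AI Sheets, brainstorming, rapid experiments, and creating test datasets.

Think of this as an auto-dataset or prompt-to-dataset feature: you describe what you want, and AI Sheets creates the entire dataset structure and content for you.

When to use this

You are exploring AI Sheets for the first time
You need synthetic data for testing or prototyping
Data accuracy and diversity are not critical (e.g., brainstorming use cases, quick research, generating test datasets)
You want to try out ideas quickly

How it works

Describe the dataset you want in the prompt area
- Example: "A list of fictional startups with name, industry, and slogan"
AI Sheets generates the schema and creates 5 sample rows
Extend up to 1000 rows or modify the prompt to change the structure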

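Example

If you enter this prompt: cities of the world, along with the countries they belong to, and a Ghibli-style landmark image for each city

AI Sheets will automatically generate a dataset with three columns.

This dataset contains only five rows, but you can add more cells by dragging down each column, including the image column! You can also write items in any cell and complete the remaining cells by dragging.

The following sections show you how to iterate on and expand the dataset.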

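Import your dataset (recommended)

Best for: most use cases where you want to transform, classify, enrich, and analyze real-world data.

This is the recommended approach for most use cases, because importing real data gives you more control and flexibility than starting from scratch.

When to use this

You have existing data to transform or enrich with AI models
You want to generate synthetic data, and accuracy and diversity matter

How it works

Upload your data in XLS, TSV, CSV, or Parquet format
Make sure your file contains at least one column name and one row of data
Upload up to 1000 rows (unlimited columns)
Your data appears in a familiar spreadsheet format

Pro tip: if your file contains very little data, you can add more entries manually by typing directly into the spreadsheet.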
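The upload requirements above (at least one column name, at least one data row, at most 1000 rows) can be checked before you upload. Here is a minimal sketch for CSV text using only the Python standard library. The `check_csv` helper and its messages are hypothetical and are not part of AI Sheets.

```python
import csv
import io

MAX_ROWS = 1000  # upload limit stated in the list above


def check_csv(text: str, max_rows: int = MAX_ROWS) -> list[str]:
    """Return a list of problems; an empty list means the file looks ready.

    Hypothetical pre-upload check: the first non-blank row is treated as the
    header and every other non-blank row as data.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    problems = []
    if not rows or not any(name.strip() for name in rows[0]):
        problems.append("missing header row with at least one column name")
    data_rows = rows[1:]
    if not data_rows:
        problems.append("no data rows")
    elif len(data_rows) > max_rows:
        problems.append(f"{len(data_rows)} data rows exceed the limit of {max_rows}")
    return problems


print(check_csv("question\nWhat is 2 + 2?\n"))  # no problems expected
print(check_csv("question\n"))                  # header only, no data
```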

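Working with your dataset

Once your data is loaded (regardless of how you started), you will see it in an editable spreadsheet interface. Here is what you need to know:

Understanding AI Sheets

Imported cells: can be edited manually but cannot be modified with AI prompts
AI-generated cells: can be regenerated and refined using prompts and your feedback (edits + likes)
New columns: always AI-powered and fully customizable

Getting started with AI columns

Click the "+" button to add a new column
Choose from the recommended actions
- Extract specific information
- Summarize long text
- Translate content
- Or write a custom prompt with "Do something with {{column}}"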

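Refining and expanding the dataset

Now that you have AI columns, you can improve their results and expand your data. You can improve results by giving feedback through manual edits and likes, or by adjusting the column configuration. Both approaches require regeneration to take effect.

1. How to add more cells

Drag down: drag from the last cell in a column to generate additional rows right away
No regeneration needed: new cells are created on the fly
You can also use this to regenerate cells that errored

2. Manual editing and feedback

Edit cells: click any cell to edit its content directly; this gives the model an example of your preferred output
Like results: use the thumbs-up icon to mark good outputs as examples
Regenerate to apply the feedback to the other cells in the column.

Under the hood, these manually edited and liked cells are used as few-shot examples for cell generation whenever you regenerate or add more cells to the column!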
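To make the few-shot mechanism concrete, here is a minimal sketch of how edited and liked cells could be folded into a prompt. The section layout (`# Examples`, `## Example`, `### Input`, `### Output`, `# User instruction`, `# Your response`) mirrors the `prompt` field of the exported classification config shown later in this post; the `build_few_shot_prompt` function and the shape of the `examples` argument are assumptions made for illustration, not the AI Sheets implementation.

```python
def build_few_shot_prompt(instruction: str, examples: list[dict]) -> str:
    """Assemble a prompt that carries validated cells as few-shot examples.

    Hypothetical helper. Each example is expected to look like
    {"inputs": {column_name: value}, "output": validated_cell_text}.
    """
    parts = []
    if examples:
        parts.append("# Examples")
        for example in examples:
            inputs = "\n".join(f"{k}: {v}" for k, v in example["inputs"].items())
            parts.append(
                f"## Example\n\n### Input\n\n{inputs}\n\n### Output\n\n{example['output']}"
            )
    # The instruction for the current row always comes after the examples.
    parts.append(f"# User instruction\n\n{instruction}")
    parts.append("# Your response")
    return "\n\n".join(parts)


liked_cells = [
    {"inputs": {"question": "Find the base of a parallelogram."}, "output": "Geometry"},
]
print(build_few_shot_prompt("Categorize the main topics of the following question: ...", liked_cells))
```

Because the examples live in the prompt itself, they only influence cells generated after the edit or like, which is why regeneration is needed for the feedback to reach existing cells.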

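3. Adjust column configuration

Change the prompt, switch models or providers, or modify settings, then regenerate to get better results.

Rewrite the prompt

Every column has its own generation prompt
Edit it at any time to change or improve the outputs
The column regenerates with the new results

Switch model/provider

Try different models to get different performance or to compare them.
For specific tasks, some models are more accurate, more creative, or better structured than others.
Some providers offer faster inference or different context lengths; test different providers for your chosen model.

Toggle search

Enabled: the model pulls up-to-date information from the web
Disabled: offline, model-only generation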

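Export your final dataset to the Hub

Once you are happy with your new dataset, you can export it to the Hub! This has the added benefit of generating a config file that you can reuse to (1) generate more data with HF Jobs using this script, and (2) reuse the prompts in downstream applications, including the few-shot examples from your edited and liked cells.

Here is an example of a dataset created with AI Sheets, which produces this config.

Running data generation scripts using HF Jobs

If you want to generate a larger dataset, you can use the config and script described above, like this: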

# script.py runs the pipeline and config.yml holds the prompts.
# --num-rows 100 limits the run to 100 rows; omit it to process the full dataset.
hf jobs uv run \
-s HF_TOKEN=$HF_TOKEN \
https://huggingface.co/datasets/aisheets/uv-scripts/raw/main/extend_dataset/script.py \
--config https://huggingface.co/datasets/dvilasuero/nemotron-personas-kimi-questions/raw/main/config.yml \
--num-rows 100 \
nvidia/Nemotron-Personas dvilasuero/nemotron-kimi-qa-distilled

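Examples

This section provides examples of datasets you can build with AI Sheets to inspire your next project.

Vibe testing and comparing models

AI Sheets is your perfect companion if you want to test the latest models on the data and prompts you care about.

You just need to import a dataset (or create one from scratch) and then add a column for each model you want to test.

You can then inspect the results manually or add a column that uses an LLM to judge the quality of each model.

Below is an example comparing open frontier models on mini web apps. AI Sheets lets you see the interactive results and play with each app. The dataset also includes several columns that use LLMs to judge and compare the quality of the apps.

Example dataset exported from the session we just described: https://huggingface.co/datasets/dvilasuero/jsvibes-qwen-gpt-oss-judged

Config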

columns:
  gpt-oss:
    modelName: openai/gpt-oss-120b
    modelProvider: groq
    userPrompt: Create a complete, runnable HTML+JS file implementing {{description}}
    searchEnabled: false
    columnsReferences:
      - description
  eval-qwen-coder:
    modelName: Qwen/Qwen3-Coder-480B-A35B-Instruct
    modelProvider: cerebras
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
  eval-gpt-oss:
    modelName: openai/gpt-oss-120b
    modelProvider: groq
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
  eval-kimi:
    modelName: moonshotai/Kimi-K2-Instruct
    modelProvider: groq
    userPrompt: "Please compare the two apps and tell me which one is better and why:\n\nApp description:\n\n{{description}}\n\nmodel 1:\n\n{{qwen3-coder}}\n\nmodel 2:\n\n{{gpt-oss}}\n\nKeep it very short and focus on whether they work well for the purpose, make sure they work and are not incomplete, and the code quality, not on visual appeal and unrequested features. Assume the models might provide non working solutions, so be careful to assess that\n\nRespond with:\n\nchosen: {model 1, model 2}\n\nreason: ..."
    searchEnabled: false
    columnsReferences:
      - gpt-oss
      - description
      - qwen3-coder
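The judge prompts in the config above ask each model to answer with a `chosen:` line and a `reason:` line, which makes the judge columns easy to post-process once the dataset is exported. The following is a minimal sketch of such post-processing; `parse_verdict` and `tally` are hypothetical helpers, and real judge outputs may deviate from the requested format, in which case `chosen` comes back as `None`.

```python
import re
from collections import Counter


def parse_verdict(response: str) -> dict:
    """Extract the `chosen:` and `reason:` fields from one judge response.

    Hypothetical helper: tolerates case differences and an optional opening
    brace, and returns None for any field that cannot be found.
    """
    chosen = re.search(r"chosen:\s*\{?\s*model\s*([12])", response, re.IGNORECASE)
    reason = re.search(r"reason:\s*(.+)", response, re.IGNORECASE | re.DOTALL)
    return {
        "chosen": f"model {chosen.group(1)}" if chosen else None,
        "reason": reason.group(1).strip() if reason else None,
    }


def tally(responses: list[str]) -> Counter:
    """Count how often each model was chosen across judge responses."""
    return Counter(parse_verdict(r)["chosen"] for r in responses)


print(parse_verdict("chosen: model 2\n\nreason: It runs without errors."))
```

Counting verdicts per judge column, and then comparing the counts across the three judges, is one simple way to see whether the judges agree with each other.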

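Add categories to a Hub dataset

AI Sheets can also augment existing datasets and help you move quickly through data analysis and data science projects that involve analyzing text datasets.

Here is an example of adding categories to an existing Hub dataset.

One cool feature is that you can manually validate or edit the initial categorization outputs and then regenerate the whole column to improve the results; the config below shows the resulting few-shot examples in its prompt field.

Config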

columns:
  category:
    modelName: moonshotai/Kimi-K2-Instruct
    modelProvider: groq
    userPrompt: |-
      Categorize the main topics of the following question:

      {{question}}
    prompt: "

      You are a rigorous, intelligent data-processing engine. Generate only the
      requested response format, with no explanations following the user
      instruction. You might be provided with positive, accurate examples of how
      the user instruction must be completed.

      # Examples

      The following are correct, accurate example outputs with respect to the
      user instruction:

      ## Example

      ### Input

      question: Given the area of a parallelogram is 420 square centimeters and
      its height is 35 cm, find the corresponding base. Show all work and label
      your answer.

      ### Output

      Mathematics – Geometry

      ## Example

      ### Input

      question: What is the minimum number of red squares required to ensure
      that each of $n$ green axis-parallel squares intersects 4 red squares,
      assuming the green squares can be scaled and translated arbitrarily
      without intersecting each other?

      ### Output

      Geometry, Combinatorics
      # User instruction

      Categorize the main topics of the following question:

      {{question}}

      # Your response
      "
    searchEnabled: false
    columnsReferences:
      - question
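In the exported configs, each column lists the columns it depends on under `columnsReferences`, and those names also appear as `{{...}}` placeholders in `userPrompt`. If you edit a config by hand before reusing it, a quick consistency check can catch a placeholder that is not declared. The sketch below works on a plain Python dict that mirrors one column entry; the `missing_references` helper is hypothetical, and loading the YAML file itself is left out so that the example needs no third-party packages.

```python
import re


def missing_references(column_config: dict) -> set[str]:
    """Return placeholder names used in userPrompt but not declared.

    Hypothetical helper: `column_config` mirrors one entry under `columns:`
    in an exported config, using the same key names.
    """
    used = set(re.findall(r"\{\{\s*([^{}]+?)\s*\}\}", column_config["userPrompt"]))
    declared = set(column_config.get("columnsReferences", []))
    return used - declared


category = {
    "modelName": "moonshotai/Kimi-K2-Instruct",
    "modelProvider": "groq",
    "userPrompt": "Categorize the main topics of the following question:\n\n{{question}}",
    "searchEnabled": False,
    "columnsReferences": ["question"],
}
print(missing_references(category))  # an empty set means the entry is consistent
```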

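Evaluate models with LLM-as-a-judge

Another use case is evaluating model outputs with the LLM-as-a-judge approach. This is useful for comparing models or for assessing the quality of an existing dataset, for example when you plan to fine-tune a model on an existing dataset from the Hugging Face Hub.

The first example combined vibe testing with a judge LLM column. The judge prompt is the userPrompt of the eval-qwen-coder, eval-gpt-oss, and eval-kimi columns in that example's config.

Example dataset: https://huggingface.co/datasets/dvilasuero/jsvibes-qwen-gpt-oss-judged

Config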

columns:
  object_name:
    modelName: meta-llama/Llama-3.3-70B-Instruct
    modelProvider: groq
    userPrompt: Generate the name of a common day to day object
    searchEnabled: false
    columnsReferences: []
  object_description:
    modelName: meta-llama/Llama-3.3-70B-Instruct
    modelProvider: groq
    userPrompt: Describe a {{object_name}} with adjectives and short word groups separated by commas. No more than 10 words
    searchEnabled: false
    columnsReferences:
      - object_name
  object_image_with_desc:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: RBNBICN, icon, white background, isometric perspective, {{object_name}} , {{object_description}}
    searchEnabled: false
    columnsReferences:
      - object_description
      - object_name
  object_image_without_desc:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: "RBNBICN, icon, white background, isometric perspective, {{object_name}} "
    searchEnabled: false
    columnsReferences:
      - object_name
  glowing_colors:
    modelName: multimodalart/isometric-skeumorphic-3d-bnb
    modelProvider: fal-ai
    userPrompt: "RBNBICN, icon, white background, isometric perspective, {{object_name}}, glowing colors "
    searchEnabled: false
    columnsReferences:
      - object_name
  flux:
    modelName: black-forest-labs/FLUX.1-dev
    modelProvider: fal-ai
    userPrompt: Create an isometric icon for the object {{object_name}} based on {{object_description}}
    searchEnabled: false
    columnsReferences:
      - object_description
      - object_name