Transformers.js v3: WebGPU Support, New Models & Tasks, and More…

Published October 22, 2024

Joshua

Xenova

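After more than a year of development, we're excited to announce the release of 🤗 Transformers.js v3!

Highlights include:

WebGPU support (up to 100x faster than WASM!)
New quantization formats (dtypes)
120 supported architectures in total
25 new example projects and templates
Over 1200 pre-converted models on the Hugging Face Hub
Node.js (ESM + CJS), Deno, and Bun compatibility
A new home on GitHub and NPM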

Installation

You can get started by installing Transformers.js v3 from NPM with the following command:

npm i @huggingface/transformers

Then, import the library with:

import { pipeline } from "@huggingface/transformers";

Or, via a CDN:

import { pipeline } from "https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.0";

For more information, check out the documentation.

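WebGPU support

WebGPU is a new web standard for accelerated graphics and compute. The API enables web developers to use the underlying system's GPU to carry out high-performance computations directly in the browser. WebGPU is the successor to WebGL and provides significantly better performance, because it allows for more direct interaction with modern GPUs. Lastly, it supports general-purpose GPU computations, which makes it a great fit for machine learning!

As of October 2024, global WebGPU support is around 70% (according to caniuse.com), meaning some users may not be able to use the API.

If the following demos do not work in your browser, you may need to enable WebGPU using a feature flag: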

Firefox: with the dom.webgpu.enabled flag.

Safari: with the WebGPU feature flag.

Older Chromium browsers (on Windows, macOS, Linux): with the enable-unsafe-webgpu flag.
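
Since support is not universal, an application may want to check for WebGPU at runtime and fall back to WASM. Here is a minimal sketch, assuming the standard navigator.gpu.requestAdapter() entry point of the WebGPU API. The helper name pickDevice and the "wasm" fallback value are illustrative choices, not something taken from this post:

```javascript
// Hypothetical helper: resolves to "webgpu" when a GPU adapter is available,
// otherwise to "wasm". `nav` defaults to the global navigator when one exists.
async function pickDevice(nav = globalThis.navigator) {
  // Browsers without WebGPU do not expose navigator.gpu at all.
  if (!nav || !nav.gpu || typeof nav.gpu.requestAdapter !== "function") {
    return "wasm";
  }
  try {
    // requestAdapter() resolves to null when no suitable adapter is found.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure during adapter discovery as "not supported".
    return "wasm";
  }
}
```

The resolved string could then be passed as the device option when loading a model.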

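Usage in Transformers.js v3

Thanks to our collaboration with ONNX Runtime Web, enabling WebGPU acceleration is as simple as setting device: 'webgpu' when loading a model. Let's see some examples!

Example: Compute text embeddings on WebGPU (demo)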

import { pipeline } from "@huggingface/transformers";

// Create a feature-extraction pipeline
const extractor = await pipeline(
  "feature-extraction",
  "mixedbread-ai/mxbai-embed-xsmall-v1",
  { device: "webgpu" },
);

// Compute embeddings
const texts = ["Hello world!", "This is an example sentence."];
const embeddings = await extractor(texts, { pooling: "mean", normalize: true });
console.log(embeddings.tolist());
// [
//   [-0.016986183822155, 0.03228696808218956, -0.0013630966423079371, ... ],
//   [0.09050482511520386, 0.07207386940717697, 0.05762749910354614, ... ],
// ]
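
Because the call above passes normalize: true, each embedding has unit length, so the cosine similarity of two texts reduces to a plain dot product. Here is a small sketch on plain arrays such as the rows returned by tolist(). The helper dot is hypothetical, and the vectors are made-up values rather than real model output:

```javascript
// Hypothetical helper: dot product of two equal-length numeric arrays.
// For unit-length (normalized) embeddings this equals cosine similarity.
function dot(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Made-up unit vectors standing in for two rows of embeddings.tolist().
const first = [0.6, 0.8];
const second = [1, 0];
const similarity = dot(first, second); // 0.6 * 1 + 0.8 * 0
```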

Example: Perform automatic speech recognition with OpenAI whisper on WebGPU (demo)

import { pipeline } from "@huggingface/transformers";

// Create automatic speech recognition pipeline
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/whisper-tiny.en",
  { device: "webgpu" },
);

// Transcribe audio from a URL
const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav";
const output = await transcriber(url);
console.log(output);
// { text: ' And so my fellow Americans ask not what your country can do for you, ask what you can do for your country.' }

Example: Perform image classification with MobileNetV4 on WebGPU (demo)

import { pipeline } from "@huggingface/transformers";

// Create image classification pipeline
const classifier = await pipeline(
  "image-classification",
  "onnx-community/mobilenetv4_conv_small.e2400_r224_in1k",
  { device: "webgpu" },
);

// Classify an image from a URL
const url = "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg";
const output = await classifier(url);
console.log(output);
// [
//   { label: 'tiger, Panthera tigris', score: 0.6149784922599792 },
//   { label: 'tiger cat', score: 0.30281734466552734 },
//   { label: 'tabby, tabby cat', score: 0.0019135422771796584 },
//   { label: 'lynx, catamount', score: 0.0012161266058683395 },
//   { label: 'Egyptian cat', score: 0.0011465961579233408 }
// ]

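New quantization formats (dtypes)

Before Transformers.js v3, we used the quantized option to choose between the quantized (q8) and full-precision (fp32) variant of a model, by setting quantized to true or false respectively. Now, we've added the ability to select from a much larger list with the dtype parameter.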
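
For code written against earlier versions, the correspondence described above can be sketched as a small migration helper. migrateOptions is a hypothetical name; it only restates the mapping given in the text (quantized: true corresponds to "q8", quantized: false to "fp32"):

```javascript
// Hypothetical migration helper: converts old-style options that use
// `quantized` into v3-style options that use `dtype`.
function migrateOptions(options = {}) {
  // Always drop `quantized` from the result.
  const { quantized, ...rest } = options;
  // Keep an explicit dtype if one is present, and add nothing
  // when `quantized` was never set.
  if (quantized === undefined || rest.dtype !== undefined) {
    return rest;
  }
  return { ...rest, dtype: quantized ? "q8" : "fp32" };
}
```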

The list of available quantizations depends on the model, but some common ones are: full-precision ("fp32"), half-precision ("fp16"), 8-bit ("q8", "int8", "uint8"), and 4-bit ("q4", "bnb4", "q4f16").

Available dtypes for an example model, mixedbread-ai/mxbai-embed-xsmall-v1
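
To get a feel for why the choice matters, the weight storage can be estimated from the number of bits each dtype uses per parameter. This is only a back-of-the-envelope sketch: the bit widths are read off the dtype names listed above, the 500-million parameter count is an assumed round number, and real files will differ because of quantization metadata and layers that are kept at higher precision:

```javascript
// Approximate bits stored per weight for each dtype named in the text.
const BITS_PER_WEIGHT = {
  fp32: 32,
  fp16: 16,
  q8: 8,
  int8: 8,
  uint8: 8,
  q4: 4,
  bnb4: 4,
  q4f16: 4,
};

// Hypothetical helper: rough weight size in megabytes (1 MB = 1e6 bytes).
function estimateWeightMB(numParams, dtype) {
  const bits = BITS_PER_WEIGHT[dtype];
  if (bits === undefined) {
    throw new Error(`Unknown dtype: ${dtype}`);
  }
  return (numParams * bits) / 8 / 1e6;
}

const params = 500e6; // assumed parameter count, for illustration only
const fp32MB = estimateWeightMB(params, "fp32"); // 2000
const q4MB = estimateWeightMB(params, "q4"); // 250
```

Under these assumptions, moving from fp32 to a 4-bit format cuts the download to roughly one eighth.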

Basic usage

Example: Run Qwen2.5-0.5B-Instruct in 4-bit quantization (demo)

import { pipeline } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct",
  { dtype: "q4", device: "webgpu" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Tell me a funny joke." },
];

// Generate a response
const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);
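
The last line relies on the shape of the result for chat-style input: generated_text holds the whole conversation, with the model's reply appended as the final message. Here is a sketch with a hand-written stand-in for the pipeline output (the reply text is a placeholder, not real model output, and lastReply is a hypothetical helper):

```javascript
// Hand-written stand-in for the value returned by the generator call.
const mockOutput = [
  {
    generated_text: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Tell me a funny joke." },
      { role: "assistant", content: "(model reply goes here)" },
    ],
  },
];

// Hypothetical helper: returns the content of the last message
// in the first generated conversation.
function lastReply(output) {
  return output[0].generated_text.at(-1).content;
}

const reply = lastReply(mockOutput); // "(model reply goes here)"
```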

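Per-module dtypes

Some encoder-decoder models, like Whisper or Florence-2, are extremely sensitive to quantization settings, especially those of the encoder. For this reason, we added the ability to select per-module dtypes, which can be done by providing a mapping from module name to dtype.

Example: Run Florence-2 on WebGPU (demo)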

import { Florence2ForConditionalGeneration } from "@huggingface/transformers";

const model = await Florence2ForConditionalGeneration.from_pretrained(
  "onnx-community/Florence-2-base-ft",
  {
    dtype: {
      embed_tokens: "fp16",
      vision_encoder: "fp16",
      encoder_model: "q4",
      decoder_model_merged: "q4",
    },
    device: "webgpu",
  },
);

Florence-2 running on WebGPU

See full code example

import {
  Florence2ForConditionalGeneration,
  AutoProcessor,
  AutoTokenizer,
  RawImage,
} from "@huggingface/transformers";

// Load model, processor, and tokenizer
const model_id = "onnx-community/Florence-2-base-ft";
const model = await Florence2ForConditionalGeneration.from_pretrained(
  model_id,
  {
    dtype: {
      embed_tokens: "fp16",
      vision_encoder: "fp16",
      encoder_model: "q4",
      decoder_model_merged: "q4",
    },
    device: "webgpu",
  },
);
const processor = await AutoProcessor.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

// Load image and prepare vision inputs
const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg";
const image = await RawImage.fromURL(url);
const vision_inputs = await processor(image);

// Specify task and prepare text inputs
const task = "<MORE_DETAILED_CAPTION>";
const prompts = processor.construct_prompts(task);
const text_inputs = tokenizer(prompts);

// Generate text
const generated_ids = await model.generate({
  ...text_inputs,
  ...vision_inputs,
  max_new_tokens: 100,
});

// Decode generated text
const generated_text = tokenizer.batch_decode(generated_ids, {
  skip_special_tokens: false,
})[0];

// Post-process the generated text
const result = processor.post_process_generation(
  generated_text,
  task,
  image.size,
);
console.log(result);
// { '<MORE_DETAILED_CAPTION>': 'A green car is parked in front of a tan building. The building has a brown door and two brown windows. The car is a two door and the door is closed. The green car has black tires.' }
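
The same kind of mapping can be applied to Whisper, the other quantization-sensitive model mentioned above. The sketch below keeps the encoder at full precision and quantizes only the decoder. It assumes that Whisper's modules are named encoder_model and decoder_model_merged (as two of Florence-2's are in the example above) and that the checkpoint ships those variants, so it is worth checking the model's files on the Hub first. whisperOptions and loadTranscriber are hypothetical helper names:

```javascript
// Hypothetical helper: builds pipeline options with a per-module dtype map.
// The module names are an assumption; check the model's files on the Hub.
function whisperOptions(device = "webgpu") {
  return {
    device,
    dtype: {
      encoder_model: "fp32", // keep the sensitive encoder at full precision
      decoder_model_merged: "q4", // quantize the decoder to 4 bits
    },
  };
}

// Loads the library lazily and creates the pipeline with those options.
async function loadTranscriber() {
  const { pipeline } = await import("@huggingface/transformers");
  return pipeline(
    "automatic-speech-recognition",
    "onnx-community/whisper-tiny.en",
    whisperOptions(),
  );
}
```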

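120 supported architectures

This release increases the total number of supported architectures to 120 (see full list), spanning a wide range of input modalities and tasks. Notable new names include: Phi-3, Gemma & Gemma 2, LLaVa, Moondream, Florence-2, MusicGen, Sapiens, Depth Pro, PyAnnote, and RT-DETR.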

Bubble diagram of new architectures in Transformers.js v3

List of new models

Cohere (from Cohere) released with the paper Command-R: Retrieval Augmented Generation at Production Scale by Cohere.
Decision Transformer (from Berkeley/Facebook/Google) released with the paper Decision Transformer: Reinforcement Learning via Sequence Modeling by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
Depth Pro (from Apple) released with the paper Depth Pro: Sharp Monocular Metric Depth in Less Than a Second by Aleksei Bochkovskii, Amaël Delaunoy, Hugo Germain, Marcel Santos, Yichao Zhou, Stephan R. Richter, Vladlen Koltun.
Florence2 (from Microsoft) released with the paper Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks by Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan.
Gemma (from Google) released with the paper Gemma: Open Models Based on Gemini Technology and Research by the Gemma Google team.
Gemma2 (from Google) released with the paper Gemma2: Open Models Based on Gemini Technology and Research by the Gemma Google team.
Granite (from IBM) released with the paper Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda.
GroupViT (from UCSD, NVIDIA) released with the paper GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
Hiera (from Meta) released with the paper Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer.
JAIS (from Core42) released with the paper Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models by Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, Andrew Feldman, Jonathan Lee, Andrew Jackson, Hector Xuguang Ren, Preslav Nakov, Timothy Baldwin, Eric Xing.
LLaVa (from Microsoft Research & University of Wisconsin-Madison) released with the paper Visual Instruction Tuning by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.
MaskFormer (from Meta and UIUC) released with the paper Per-Pixel Classification is Not All You Need for Semantic Segmentation by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
MusicGen (from Meta) released with the paper Simple and Controllable Music Generation by Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi and Alexandre Défossez.
MobileCLIP (from Apple) released with the paper MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training by Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel.
MobileNetV1 (from Google Inc.) released with the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
MobileNetV2 (from Google Inc.) released with the paper MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
MobileNetV3 (from Google Inc.) released with the paper Searching for MobileNetV3 by Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam.
MobileNetV4 (from Google Inc.) released with the paper MobileNetV4 - Universal Models for the Mobile Ecosystem by Danfeng Qin, Chas Leichner, Manolis Delakis, Marco Fornoni, Shixin Luo, Fan Yang, Weijun Wang, Colby Banbury, Chengxi Ye, Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew Howard.
Moondream1 released in the repository moondream by vikhyat.
OpenELM (from Apple) released with the paper OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework by Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari.
Phi3 (from Microsoft) released with the paper Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone by Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Masahiro Tanaka, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou.
PVT (from Nanjing University, The University of Hong Kong etc.) released with the paper Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
PyAnnote released in the repository pyannote/pyannote-audio by Hervé Bredin.
RT-DETR (from Baidu) released with the paper DETRs Beat YOLOs on Real-time Object Detection by Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen.
Sapiens (from Meta AI) released with the paper Sapiens: Foundation for Human Vision Models by Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez, Su Zhaoen, Austin James, Peter Selednik, Stuart Anderson, Shunsuke Saito.
ViTMAE (from Meta AI) released with the paper Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
ViTMSN (from Meta AI) released with the paper Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.

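Example projects and templates

As part of the release, we've published 25 new example projects and templates, primarily focused on showcasing WebGPU support! These include demos like Phi-3.5 WebGPU and Whisper WebGPU.

We're in the process of moving all our example projects and demos to https://github.com/huggingface/transformers.js-examples, so stay tuned for updates!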

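Over 1200 pre-converted models

As of today's release, the community has converted over 1200 models to be compatible with Transformers.js! You can find the full list of available models on the Hugging Face Hub.

If you'd like to convert your own models or fine-tunes, you can use our conversion script as follows: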

python -m scripts.convert --quantize --model_id <model_name_or_path>

After uploading the generated files to the Hugging Face Hub, remember to add the transformers.js tag so others can easily find and use your model!

Available Transformers.js models

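Node.js (ESM + CJS), Deno, and Bun compatibility

Transformers.js v3 is now compatible with the three most popular server-side JavaScript runtimes:

Runtime	Description	Examples
Node.js	A widely used JavaScript runtime built on Chrome's V8. It has a large ecosystem and supports a wide range of libraries and frameworks.	ESM demo / CJS demo
Deno	A modern runtime for JavaScript and TypeScript that is secure by default. It uses ES modules and even features experimental WebGPU support.	Deno demo
Bun	A fast JavaScript runtime optimized for performance. It features a built-in bundler, transpiler, and package manager.	Bun demo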

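A new home on NPM and GitHub

Finally, we're delighted to announce that Transformers.js will now be published under the official Hugging Face organization on NPM as @huggingface/transformers (instead of @xenova/transformers, which was used for v1 and v2).

We've also moved the repository to the official Hugging Face organization on GitHub (https://github.com/huggingface/transformers.js), which will be our new home. Come say hi! We look forward to hearing your feedback, responding to your issues, and reviewing your pull requests!

This is a significant milestone, and we're extremely grateful to the community for helping us achieve this long-term goal! None of this would be possible without all of you... thank you! 🤗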
