Building a Vanilla JavaScript Application

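In this tutorial, you'll use Transformers.js to build a simple web application that detects objects in images. To follow along, all you need is a code editor, a browser, and a simple server (e.g., VS Code Live Server).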

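Here's how it works: the user clicks "Upload image" and selects an image using the file input dialog. After the image is analysed with an object detection model, the predicted bounding boxes are overlaid on top of it.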

Step 1: HTML and CSS setup

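Before we start building with Transformers.js, we first need to lay the groundwork with some markup and styling. Create a file named index.html containing a basic HTML skeleton, and add the following <main> tag to the <body>: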

<main class="container">
  <label class="custom-file-upload">
    <input id="file-upload" type="file" accept="image/*" />
    <img class="upload-icon" src="https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/upload-icon.png" />
    Upload image
  </label>
  <div id="image-container"></div>
  <p id="status"></p>
</main>

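The <input> element with type="file" accepts images, letting the user select an image from their local file system through a popup dialog. The default styling of this element looks quite bad, so let's restyle it. The easiest way is to wrap the <input> element in a <label>, hide the input, and style the label as a button.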

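We also add an empty <div> container for displaying the image, plus an empty <p> tag that we'll use to give the user status updates, since downloading and running the model both take some time.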

Next, add the following CSS rules to a style.css file and link it from the HTML:

html,
body {
    font-family: Arial, Helvetica, sans-serif;
}

.container {
    margin: 40px auto;
    width: max(50vw, 400px);
    display: flex;
    flex-direction: column;
    align-items: center;
}

.custom-file-upload {
    display: flex;
    align-items: center;
    gap: 10px;
    border: 2px solid black;
    padding: 8px 16px;
    cursor: pointer;
    border-radius: 6px;
}

#file-upload {
    display: none;
}

.upload-icon {
    width: 30px;
}

#image-container {
    width: 100%;
    margin-top: 20px;
    position: relative;
}

#image-container>img {
    width: 100%;
}

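At this point, the UI consists of a single bordered "Upload image" button with an icon, centered horizontally on the page.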

Step 2: JavaScript setup

With the boring part out of the way, let's start writing some JavaScript! Create a file named index.js and link it from index.html by adding the following to the end of the <body>:

<script src="./index.js" type="module"></script>

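The type="module" attribute is important: it turns our file into a JavaScript module, which means we can use imports and exports, as well as top-level await (which the pipeline() call in Step 3 relies on).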

Moving on to index.js, let's import Transformers.js by adding the following line at the top of the file:

import { pipeline, env } from "https://cdn.jsdelivr.net/npm/@huggingface/transformers";

Since we'll be downloading the model from the Hugging Face Hub, we can skip the local model check by setting:

env.allowLocalModels = false;

Next, let's create references to the DOM elements we'll access later:

const fileUpload = document.getElementById("file-upload");
const imageContainer = document.getElementById("image-container");
const status = document.getElementById("status");

Step 3: Create an object detection pipeline

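We're finally ready to create our object detection pipeline! As a reminder, a pipeline is a high-level interface provided by the library for performing a specific task. In our case, we'll use the pipeline() helper function to instantiate an object detection pipeline.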

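Since this can take some time (especially the first time, when the ~40MB model has to be downloaded), we first update the status paragraph so the user knows we're about to load the model.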

status.textContent = "Loading model...";

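To keep this tutorial simple, we'll load and run the model on the main (UI) thread. This is not recommended for production applications, because the UI freezes while these operations run: JavaScript is single-threaded. To work around this, you can use a web worker to download and run the model in the background. However, we won't cover that in this tutorial.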
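As a rough illustration of the web worker approach mentioned above, the sketch below keeps the message-handling logic separate from the worker wiring so that it can be tried on its own. The names createDetectHandler and loadDetector, and the message shape { src, threshold }, are hypothetical and not part of Transformers.js; in a real worker, loadDetector would presumably wrap the pipeline() call used in this tutorial.

```javascript
// Hypothetical helper: builds an async handler for messages sent to a worker.
// `loadDetector` is an assumed async factory that resolves to a detector
// function with the same call shape as the one used in this tutorial.
function createDetectHandler(loadDetector) {
  let detectorPromise = null; // cached so the model is only loaded once

  return async function handleMessage({ src, threshold = 0.5 }) {
    if (!detectorPromise) {
      detectorPromise = loadDetector();
    }
    try {
      const detector = await detectorPromise;
      const output = await detector(src, { threshold, percentage: true });
      return { status: "complete", output };
    } catch (error) {
      // Report failures as data so the page can show them in the status text
      return { status: "error", message: String(error) };
    }
  };
}

// Assumed wiring inside a worker script (not executed here):
//   const handle = createDetectHandler(() =>
//     pipeline("object-detection", "Xenova/detr-resnet-50"));
//   self.addEventListener("message", async (e) => {
//     self.postMessage(await handle(e.data));
//   });
```

The page would then post { src } messages to the worker and render the returned output, leaving the UI thread free while the model runs.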

We can now call the pipeline() function that we imported at the top of the file to create our object detection pipeline:

const detector = await pipeline("object-detection", "Xenova/detr-resnet-50");

We pass two arguments to the pipeline() function: (1) the task and (2) the model.

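The first argument tells Transformers.js what kind of task we want to perform. In our case that's object-detection, but the library supports many other tasks, including text-generation, sentiment-analysis, summarization, and automatic-speech-recognition. See the Transformers.js documentation for the full list.
The second argument specifies which model to use for the given task. We'll use Xenova/detr-resnet-50, a relatively small (~40MB) but capable model for detecting objects in images.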

Once the function returns, we tell the user that the app is ready:

status.textContent = "Ready";
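Model loading can fail (for example, when the network is unavailable), in which case the status text would stay at "Loading model..." indefinitely. A minimal sketch of one way to surface such failures follows. The helper loadWithStatus and its loader argument are hypothetical names, and statusElement can be any object with a textContent property, such as the status paragraph above.

```javascript
// Hypothetical helper: runs an async loader and mirrors its progress in a
// status element. Nothing here is specific to Transformers.js.
async function loadWithStatus(statusElement, loader) {
  statusElement.textContent = "Loading model...";
  try {
    const model = await loader();
    statusElement.textContent = "Ready";
    return model;
  } catch (error) {
    statusElement.textContent = "Failed to load model: " + error.message;
    throw error; // rethrow so the caller can still react to the failure
  }
}

// Assumed usage in index.js (not executed here):
//   const detector = await loadWithStatus(status, () =>
//     pipeline("object-detection", "Xenova/detr-resnet-50"));
```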

Step 4: Create the image uploader

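The next step is to support uploading/selecting images. To do this, we listen for the "change" event on the fileUpload element. In the callback, if an image was selected we read its contents with a FileReader; otherwise we do nothing.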

fileUpload.addEventListener("change", function (e) {
  const file = e.target.files[0];
  if (!file) {
    return;
  }

  const reader = new FileReader();

  // Set up a callback when the file is loaded
  reader.onload = function (e2) {
    imageContainer.innerHTML = "";
    const image = document.createElement("img");
    image.src = e2.target.result;
    imageContainer.appendChild(image);
    // detect(image); // Uncomment this line to run the model
  };
  reader.readAsDataURL(file);
});

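Once the image has been loaded into the browser, the reader.onload callback is invoked. In it, we append a new <img> element to imageContainer so the image is displayed to the user.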
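The callback style above can also be wrapped in a Promise, which some readers may find easier to combine with async code such as detect(). The sketch below is one possible wrapper, not part of the tutorial's code: readAsDataURL is a hypothetical helper, and its ReaderClass parameter exists only so that the function can be tried outside the browser with a stand-in for FileReader.

```javascript
// Hypothetical helper: resolves with the file's contents as a data URL.
// `ReaderClass` defaults to the browser's FileReader; the default is only
// evaluated when no second argument is passed.
function readAsDataURL(file, ReaderClass = FileReader) {
  return new Promise((resolve, reject) => {
    const reader = new ReaderClass();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}

// Assumed usage inside an async "change" handler (not executed here):
//   image.src = await readAsDataURL(file);
```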

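Don't worry about the detect(image) call, which is commented out for now; we'll explain it later. For now, try running the app and uploading an image. You should see your image displayed below the button.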

Step 5: Run the model

We're finally ready to start interacting with Transformers.js! Uncomment the detect(image) call in the snippet above, and then define the function itself:

async function detect(img) {
  status.textContent = "Analysing...";
  const output = await detector(img.src, {
    threshold: 0.5,
    percentage: true,
  });
  status.textContent = "";
  console.log("output", output);
  // ...
}

Note: the detect function must be asynchronous, since we await the model's result.

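Once we've updated the status to "Analysing...", we're ready to perform inference, which simply means running the model on some data. This is done through the detector() function returned by pipeline(). The first argument we pass is the image data (img.src).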

The second argument is an options object:

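We set the threshold property to 0.5. This means the model must be at least 50% confident before it reports a detected object. The lower the threshold, the more objects it detects (but it may misidentify some); the higher the threshold, the fewer objects it detects (but it may miss objects in the scene).
We also specify percentage: true, which means we want each bounding box returned as fractions of the image's width and height (values between 0 and 1) rather than in pixels.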
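To make the effect of the threshold concrete, the sketch below applies different cut-offs to a hand-written list of detections. The scores are made up for illustration, and filterByThreshold is a hypothetical helper: the pipeline applies its threshold internally, and this sketch makes no claim about whether that comparison is strict or inclusive.

```javascript
// Hypothetical helper: keeps only detections whose score reaches the cut-off.
function filterByThreshold(detections, threshold) {
  return detections.filter((detection) => detection.score >= threshold);
}

// Made-up scores, for illustration only
const sampleDetections = [
  { label: "elephant", score: 0.99 },
  { label: "elephant", score: 0.7 },
  { label: "bird", score: 0.3 },
];

// A lower threshold keeps more (possibly wrong) detections;
// a higher threshold keeps fewer (possibly missing real objects).
for (const threshold of [0.2, 0.5, 0.9]) {
  const kept = filterByThreshold(sampleDetections, threshold);
  console.log(threshold, kept.map((detection) => detection.label));
}
```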

If you now run the app and upload an image, you should see the model's output logged to the console.

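When we uploaded an image of two elephants, the output variable contained an array with two objects, each with a label (the string "elephant"), a score (the model's confidence in its prediction), and a box object (the bounding box of the detected entity).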
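For reference, here is a sketch of that shape using placeholder numbers (not actual model output), together with a hypothetical describeDetection helper that turns one entry into a readable string.

```javascript
// Placeholder values for illustration only; real scores and boxes will differ.
const exampleOutput = [
  { label: "elephant", score: 0.99, box: { xmin: 0.1, ymin: 0.25, xmax: 0.5, ymax: 0.75 } },
  { label: "elephant", score: 0.95, box: { xmin: 0.5, ymin: 0.25, xmax: 0.9, ymax: 0.75 } },
];

// Hypothetical helper: formats a detection as "label (confidence%)".
function describeDetection({ label, score }) {
  return `${label} (${(score * 100).toFixed(1)}%)`;
}

console.log(exampleOutput.map(describeDetection).join(", "));
```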

Step 6: Render the boxes

The final step is to draw the box coordinates as rectangles around each elephant.

At the end of the detect() function, we call renderBox on each object in the output array using .forEach():

output.forEach(renderBox);

Here's the code for the renderBox() function, with comments to help you understand what's going on:

// Render a bounding box and label on the image
function renderBox({ box, label }) {
  const { xmax, xmin, ymax, ymin } = box;

  // Generate a random color for the box
  const color = "#" + Math.floor(Math.random() * 0xffffff).toString(16).padStart(6, "0");

  // Draw the box
  const boxElement = document.createElement("div");
  boxElement.className = "bounding-box";
  Object.assign(boxElement.style, {
    borderColor: color,
    left: 100 * xmin + "%",
    top: 100 * ymin + "%",
    width: 100 * (xmax - xmin) + "%",
    height: 100 * (ymax - ymin) + "%",
  });

  // Draw the label
  const labelElement = document.createElement("span");
  labelElement.textContent = label;
  labelElement.className = "bounding-box-label";
  labelElement.style.backgroundColor = color;

  boxElement.appendChild(labelElement);
  imageContainer.appendChild(boxElement);
}
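The arithmetic that renderBox() uses to position each box can be looked at in isolation. The sketch below extracts it into a hypothetical boxToStyle helper (not part of the tutorial's code), which maps a box expressed as fractions of the image size to CSS percentage strings.

```javascript
// Hypothetical helper mirroring the positioning arithmetic in renderBox().
// Expects box coordinates as fractions between 0 and 1 (percentage: true).
function boxToStyle({ xmin, ymin, xmax, ymax }) {
  return {
    left: 100 * xmin + "%",
    top: 100 * ymin + "%",
    width: 100 * (xmax - xmin) + "%",
    height: 100 * (ymax - ymin) + "%",
  };
}

// A box covering the middle half of the width and the bottom half of the height
console.log(boxToStyle({ xmin: 0.25, ymin: 0.5, xmax: 0.75, ymax: 1 }));
```

Because the offsets are percentages of #image-container, which is positioned relatively and sized to the image, the boxes stay aligned when the image is resized.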

The bounding box and label span also need some styling, so add the following to the style.css file:

.bounding-box {
    position: absolute;
    box-sizing: border-box;
    border-width: 2px;
    border-style: solid;
}

.bounding-box-label {
    color: white;
    position: absolute;
    font-size: 12px;
    margin-top: -16px;
    margin-left: -2px;
    padding: 1px;
}
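Because renderBox() picks a random color, two boxes with the same label will usually get different colors. If a stable color per label is preferred, one possible approach is to derive the color from the label text. The colorForLabel helper below is a hypothetical alternative rather than part of the tutorial's code, and its simple string hash is an arbitrary choice.

```javascript
// Hypothetical helper: maps a label to a repeatable "#rrggbb" color.
function colorForLabel(label) {
  let hash = 0;
  for (const char of label) {
    // Keep the running hash within 24 bits (one byte each for R, G and B)
    hash = (hash * 31 + char.codePointAt(0)) % 0x1000000;
  }
  return "#" + hash.toString(16).padStart(6, "0");
}

// The same label always yields the same color
console.log(colorForLabel("elephant") === colorForLabel("elephant"));
```

To try it, the random color line in renderBox() could be replaced with const color = colorForLabel(label).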

That's it!

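You've now built your own fully functional AI application that detects objects in images and runs entirely in your browser, with no external server, APIs, or build tools. Pretty cool! 🥳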

The app is live at the following URL: https://huggingface.co/spaces/Scrimba/vanilla-js-object-detector
