Pipeline

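The Pipeline is a simple but powerful inference API for running a wide range of machine learning tasks with any model from the Hugging Face Hub.

Tailor the Pipeline to your task with task-specific parameters, such as adding timestamps to an automatic speech recognition (ASR) pipeline to transcribe meeting notes. Pipeline supports GPUs, Apple Silicon, and half-precision weights to accelerate inference and save memory.

Transformers has two kinds of pipeline classes: a generic Pipeline and many individual task-specific pipelines, such as TextGenerationPipeline or VisualQuestionAnsweringPipeline. Load one of these individual pipelines by setting its task identifier in the `task` parameter of Pipeline. You can find the task identifier for each pipeline in its API documentation.

Each task is configured to use a default pretrained model and preprocessor, but you can override this with the `model` parameter if you want to use a different model.

For example, to use the TextGenerationPipeline with Gemma 2, set `task="text-generation"` and `model="google/gemma-2-2b"`.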

from transformers import pipeline

pipeline = pipeline(task="text-generation", model="google/gemma-2-2b")
pipeline("the secret to baking a really good cake is ")
[{'generated_text': 'the secret to baking a really good cake is 1. the right ingredients 2. the'}]

When you have more than one input, pass them as a list.

from transformers import pipeline

pipeline = pipeline(task="text-generation", model="google/gemma-2-2b", device="cuda")
pipeline(["the secret to baking a really good cake is ", "a baguette is "])
[[{'generated_text': 'the secret to baking a really good cake is 1. the right ingredients 2. the'}],
 [{'generated_text': 'a baguette is 100% bread.\n\na baguette is 100%'}]]

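This guide introduces you to the Pipeline, demonstrates its features, and shows how to configure its various parameters.

Tasks

Pipeline is compatible with many machine learning tasks across different modalities. Pass an appropriate input to the pipeline and it handles the rest.

Here are some examples of how to use Pipeline for different tasks and modalities.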

Summarization

Automatic speech recognition

Image classification

Visual question answering
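
As a rough sketch of how the tasks listed above map onto Pipeline, the snippet below pairs each one with its task identifier. The `TASK_IDENTIFIERS` table and the `build_pipeline` helper are illustrative names, not part of Transformers. The helper relies on each task's default model unless you pass one, and creating a pipeline downloads that model.

```python
# Task identifiers for the four tasks listed above, as accepted by the
# `task` parameter. The table and helper names are illustrative only.
TASK_IDENTIFIERS = {
    "Summarization": "summarization",
    "Automatic speech recognition": "automatic-speech-recognition",
    "Image classification": "image-classification",
    "Visual question answering": "visual-question-answering",
}


def build_pipeline(task_name, model=None):
    """Create a pipeline for one of the tasks above.

    When `model` is None, Pipeline falls back to the task's default model.
    """
    # Imported lazily so the table can be inspected without loading Transformers.
    from transformers import pipeline

    return pipeline(task=TASK_IDENTIFIERS[task_name], model=model)
```

For instance, `build_pipeline("Summarization")` would return a pipeline that takes a string of text, while the visual question answering pipeline expects an image together with a question.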

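Parameters

At a minimum, Pipeline only requires a task identifier, a model, and the appropriate input. But there are many parameters available to configure the pipeline, from task-specific parameters to performance optimizations.

This section introduces some of the more important parameters.

Device

Pipeline is compatible with many hardware types, including GPU, CPU, Apple Silicon, and more. Configure the hardware type with the `device` parameter. By default, Pipeline runs on the CPU, which is indicated by `device=-1`.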

GPU

Apple silicon
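
One possible way to choose a value for `device` at runtime is sketched below. The helper names `pick_device` and `detect_device` are hypothetical, and the sketch assumes PyTorch is the backend. It falls back to the CPU value `-1` when no accelerator is found or PyTorch is not installed.

```python
def pick_device(cuda_available, mps_available):
    """Return a value for Pipeline's `device` parameter.

    Prefers an NVIDIA GPU, then Apple silicon, then the CPU (-1).
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return -1


def detect_device():
    """Probe the local machine with PyTorch, if it is installed."""
    try:
        import torch
    except ImportError:
        return -1  # without PyTorch, stay on the CPU default
    # Older PyTorch builds may not expose the MPS backend at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_available = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)
```

The result can be passed straight through, for example `pipeline(task="text-generation", model="google/gemma-2-2b", device=detect_device())`.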

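Batch inference

Pipeline can also process batches of inputs with the `batch_size` parameter. Batch inference may improve speed, especially on a GPU, but it isn't guaranteed. Other variables, such as the hardware, the data, and the model itself, affect whether batch inference improves speed. For this reason, batch inference is disabled by default.

In the example below, there are 4 inputs and `batch_size` is set to 2, so Pipeline passes 2 inputs to the model at a time.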

from transformers import pipeline

pipeline = pipeline(task="text-generation", model="google/gemma-2-2b", device="cuda", batch_size=2)
pipeline(["the secret to baking a really good cake is", "a baguette is", "paris is the", "hotdogs are"])
[[{'generated_text': 'the secret to baking a really good cake is to use a good cake mix.\n\ni’'}],
 [{'generated_text': 'a baguette is'}],
 [{'generated_text': 'paris is the most beautiful city in the world.\n\ni’ve been to paris 3'}],
 [{'generated_text': 'hotdogs are a staple of the american diet. they are a great source of protein and can'}]]
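
Conceptually, `batch_size=2` groups the four inputs into two batches before they reach the model. The pure-Python sketch below illustrates only that grouping; `make_batches` is a hypothetical helper and not how Pipeline batches internally.

```python
def make_batches(inputs, batch_size):
    """Yield consecutive slices of `inputs`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(inputs), batch_size):
        yield inputs[start:start + batch_size]


prompts = [
    "the secret to baking a really good cake is",
    "a baguette is",
    "paris is the",
    "hotdogs are",
]
batches = list(make_batches(prompts, batch_size=2))
print(len(batches))  # 2 batches of 2 prompts each
```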

Another good use case for batch inference is streaming data through Pipeline.

from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
import datasets

# KeyDataset is a utility that returns the item in the dict returned by the dataset
dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
pipeline = pipeline(task="text-classification", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english", device="cuda")
for out in pipeline(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)
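
To make the role of `KeyDataset` concrete, here is a simplified stand-in written for illustration. `SimpleKeyDataset` is a hypothetical class, not the Transformers implementation. It only shows the idea of exposing one field of each dictionary-style record so that the pipeline receives plain inputs.

```python
class SimpleKeyDataset:
    """Illustrative stand-in: exposes one field of each record in a dataset."""

    def __init__(self, dataset, key):
        self.dataset = dataset
        self.key = key

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # Each record is a dict; return only the requested field.
        return self.dataset[index][self.key]


records = [
    {"text": "a wonderful film", "label": 1},
    {"text": "a tedious film", "label": 0},
]
texts = SimpleKeyDataset(records, "text")
print([texts[i] for i in range(len(texts))])
```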

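Keep the following general rules of thumb in mind when deciding whether batch inference can help improve performance.

The only way to know for sure is to measure performance on your model, data, and hardware.
Don't use batch inference if you're constrained by latency (a live inference product, for example).
Don't use batch inference if you're running on a CPU.
Don't use batch inference if you don't know the `sequence_length` of your data. Measure performance, iteratively increase `sequence_length`, and include out-of-memory (OOM) checks to recover from failures.
Do use batch inference if your `sequence_length` is regular, and keep increasing the batch size until you hit an OOM error. The larger the GPU, the more helpful batch inference is.
Do make sure you can handle OOM errors if you decide to use batch inference.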
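
The OOM advice above can be sketched as a retry loop that halves the batch size whenever a batch fails. Everything here is an assumption for illustration: `run_with_backoff` and `fake_model` are hypothetical, and the exception types to catch depend on your backend. The default below is Python's built-in `MemoryError`; with PyTorch on a GPU you would likely pass the CUDA out-of-memory exception instead.

```python
def run_with_backoff(run_batch, inputs, batch_size, oom_errors=(MemoryError,)):
    """Process `inputs` in batches, halving the batch size after each OOM error.

    `run_batch` takes a list of inputs and returns a list of outputs.
    Returns the outputs and the batch size that finally worked.
    """
    outputs = []
    start = 0
    while start < len(inputs):
        batch = inputs[start:start + batch_size]
        try:
            outputs.extend(run_batch(batch))
        except oom_errors:
            if batch_size == 1:
                raise  # a single input does not fit; nothing left to shrink
            batch_size = max(1, batch_size // 2)
            continue  # retry from the same position with a smaller batch
        start += len(batch)
    return outputs, batch_size


def fake_model(batch):
    """Hypothetical model call that runs out of memory above 2 inputs."""
    if len(batch) > 2:
        raise MemoryError("simulated OOM")
    return [text.upper() for text in batch]


results, final_batch_size = run_with_backoff(
    fake_model, ["a", "b", "c", "d", "e"], batch_size=8
)
print(results, final_batch_size)
```

Starting high and backing off keeps the loop simple, at the cost of a few wasted attempts. Measuring throughput at each surviving batch size is still needed to know whether batching helps at all.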

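Task-specific parameters

Pipeline accepts any parameter supported by each individual task pipeline. Make sure to check out each individual task pipeline to see what parameters are available. If you can't find a parameter that fits your use case, feel free to open a GitHub issue to request it!

The examples below demonstrate some of the task-specific parameters available.

Automatic speech recognition

Text generation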
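
As an illustration for the two tasks named above, the sketch below attaches one task-specific parameter to each: `return_timestamps` for automatic speech recognition and `return_full_text` for text generation. The `TASK_PARAMETERS` table and the `run_with_task_parameters` helper are hypothetical names, and whether a given value (such as word-level timestamps) works depends on the model, so check each task pipeline's API documentation.

```python
# One example parameter per task; the table name is illustrative only.
TASK_PARAMETERS = {
    # Ask the ASR pipeline for word-level timestamps alongside the transcript.
    "automatic-speech-recognition": {"return_timestamps": "word"},
    # Return only the newly generated text, without the prompt prepended.
    "text-generation": {"return_full_text": False},
}


def run_with_task_parameters(task, model, inputs):
    """Run `inputs` through a pipeline, adding the task's example parameters."""
    # Imported lazily so the table can be inspected without loading Transformers.
    from transformers import pipeline

    pipe = pipeline(task=task, model=model)
    return pipe(inputs, **TASK_PARAMETERS.get(task, {}))
```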

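Chunk batching

There are some instances where you need to process data in chunks.

for some data types, a single input (for example, a very long audio file) may need to be split into multiple parts before it can be processed
for some tasks, like zero-shot classification or question answering, a single input may need multiple forward passes, which can cause issues with the `batch_size` parameter

The ChunkPipeline class is designed to handle these use cases. Both pipeline classes are used in the same way, but because ChunkPipeline handles batching automatically, you don't need to worry about the number of forward passes your inputs trigger. Instead, you can optimize `batch_size` independently of the inputs.

The example below shows how it differs from Pipeline.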

# ChunkPipeline: preprocess yields several chunks per input, and each chunk gets its own forward pass
all_model_outputs = []
for preprocessed in pipeline.preprocess(inputs):
    model_outputs = pipeline.forward(preprocessed)
    all_model_outputs.append(model_outputs)
outputs = pipeline.postprocess(all_model_outputs)

# Pipeline
preprocessed = pipeline.preprocess(inputs)
model_outputs = pipeline.forward(preprocessed)
outputs = pipeline.postprocess(model_outputs)
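
The difference can also be seen in a toy, dependency-free form. The classes below are hypothetical stand-ins, not the Transformers classes: the "model" simply counts words, and the chunked variant splits a long input into fixed-size pieces, runs one forward pass per piece, and merges the results in `postprocess`.

```python
class ToyPipeline:
    """One input leads to exactly one forward pass."""

    def preprocess(self, text):
        return text.split()

    def forward(self, words):
        return len(words)  # stand-in for a model call

    def postprocess(self, model_output):
        return {"word_count": model_output}

    def __call__(self, text):
        return self.postprocess(self.forward(self.preprocess(text)))


class ToyChunkPipeline(ToyPipeline):
    """One input is split into chunks, with one forward pass per chunk."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size

    def preprocess(self, text):
        words = text.split()
        for start in range(0, len(words), self.chunk_size):
            yield words[start:start + self.chunk_size]

    def postprocess(self, all_model_outputs):
        # Merge the per-chunk outputs back into a single result.
        return {
            "word_count": sum(all_model_outputs),
            "forward_passes": len(all_model_outputs),
        }

    def __call__(self, text):
        all_model_outputs = [self.forward(chunk) for chunk in self.preprocess(text)]
        return self.postprocess(all_model_outputs)


text = "one two three four five six seven"
print(ToyPipeline()(text))
print(ToyChunkPipeline(chunk_size=3)(text))
```

Both objects are called the same way; only the chunked one hides a variable number of forward passes behind a single call.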

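Large datasets

For inference on large datasets, you can iterate directly over the dataset itself. This avoids allocating memory for the entire dataset up front, and you don't need to worry about creating batches yourself. Try batch inference with the `batch_size` parameter to see whether it improves performance.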

from transformers.pipelines.pt_utils import KeyDataset
from transformers import pipeline
from datasets import load_dataset

# Load the unlabeled split of the IMDb dataset
dataset = load_dataset("imdb", name="plain_text", split="unsupervised")
pipeline = pipeline(task="text-classification", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english", device="cuda")
# KeyDataset feeds only the "text" field of each record to the pipeline
for out in pipeline(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)

Other ways to run inference on large datasets with Pipeline include using an iterator or a generator.

from transformers import pipeline

def data():
    # Yield inputs one at a time instead of building the full list in memory
    for i in range(1000):
        yield f"My example {i}"

pipeline = pipeline(model="openai-community/gpt2", device=0)
generated_characters = 0
for out in pipeline(data()):
    generated_characters += len(out[0]["generated_text"])

Large models

Accelerate enables several optimizations for running large models with Pipeline. Make sure Accelerate is installed first.

!pip install -U accelerate

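The `device_map="auto"` setting is useful for automatically distributing the model across the fastest devices (GPUs) first, if available, before dispatching the rest to slower devices (CPU, hard drive).

Pipeline supports half-precision weights (torch.float16), which can be significantly faster and save memory. The performance loss is negligible for most models, especially larger ones. If your hardware supports it, you can use torch.bfloat16 instead for a wider numerical range.

Inputs are converted to torch.float16 internally, and this only works for models with a PyTorch backend.

Lastly, Pipeline also accepts quantized models to reduce memory usage even further. Make sure you have the bitsandbytes library installed first, and then pass a `BitsAndBytesConfig` with `load_in_8bit=True` as the `quantization_config` entry of the pipeline's `model_kwargs`.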

import torch
from transformers import pipeline, BitsAndBytesConfig

pipeline = pipeline(model="google/gemma-7b", torch_dtype=torch.bfloat16, device_map="auto", model_kwargs={"quantization_config": BitsAndBytesConfig(load_in_8bit=True)})
pipeline("the secret to baking a good cake is ")
[{'generated_text': 'the secret to baking a good cake is 1. the right ingredients 2. the right'}]
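
For a sense of scale, weight memory can be estimated as the parameter count times the bytes per parameter. The sketch below is back-of-the-envelope arithmetic only: `weight_memory_gb` is a hypothetical helper, the 7-billion parameter count is a round number chosen for illustration rather than any model's exact size, and activations and other runtime overhead are ignored.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_parameters, dtype):
    """Estimate the memory taken by the weights alone, in gigabytes (1e9 bytes)."""
    return num_parameters * BYTES_PER_PARAMETER[dtype] / 1e9


for dtype in BYTES_PER_PARAMETER:
    print(f"{dtype}: {weight_memory_gb(7_000_000_000, dtype):.0f} GB")
```

Under these assumptions, half precision halves the weight footprint relative to float32, and 8-bit quantization halves it again.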
