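What can Transformer models do?

In this section, we will look at what Transformer models can do and use our first tool from the 🤗 Transformers library: the pipeline() function.

👀 See the Open in Colab button at the top right? Click it to open a Google Colab notebook with all the code samples from this section. This button appears in every section that contains code examples.

If you want to run the examples locally, we recommend taking a look at the setup section first.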

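Transformer models are everywhere!

Transformer models are used to solve all kinds of tasks across different modalities, including natural language processing (NLP), computer vision, and audio processing. Many companies and organizations use Hugging Face and Transformer models, and they contribute back to the community by sharing their own models.

The 🤗 Transformers library provides the functionality to create and use those shared models. The Model Hub contains millions of pretrained models that anyone can download and use. You can also upload your own models to the Hub!

⚠️ The Hugging Face Hub is not limited to Transformer models. Anyone can share any kind of model or dataset they want! Create a huggingface.co account to benefit from all available features!

Before diving into how Transformer models work under the hood, let's look at a few examples of how they can be used to solve some interesting NLP problems.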

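Working with pipelines

The most basic object in the 🤗 Transformers library is the pipeline() function. It connects a model with its necessary preprocessing and postprocessing steps, so we can pass in any text directly and get an intelligible answer: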

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

[{'label': 'POSITIVE', 'score': 0.9598047137260437}]

We can even pass several sentences at once!

classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598047137260437},
 {'label': 'NEGATIVE', 'score': 0.9994558095932007}]

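By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the `classifier` object. If you rerun the command, the cached model is used and nothing needs to be downloaded again.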

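There are three main steps involved when you pass some text to a pipeline:

The text is preprocessed into a format the model can understand.
The preprocessed inputs are passed to the model.
The predictions of the model are post-processed so that you can make sense of them.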
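
To make these three steps concrete, here is a minimal pure-Python sketch. It does not use the 🤗 Transformers library: the vocabulary, the weights, and the function names (`preprocess`, `model`, `postprocess`, `toy_pipeline`) are all hypothetical stand-ins chosen for illustration, whereas a real pipeline loads a tokenizer and a pretrained model.

```python
import math

# Hypothetical toy vocabulary and weights; a real pipeline loads these
# from a pretrained checkpoint instead of hard-coding them.
VOCAB = {"love": 0, "great": 1, "hate": 2, "awful": 3}
# One row of weights per label: row 0 is NEGATIVE, row 1 is POSITIVE.
WEIGHTS = [[-2.0, -1.5, 2.0, 1.5], [2.0, 1.5, -2.0, -1.5]]
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into numeric ids the model understands."""
    words = text.lower().replace("!", " ").replace(".", " ").split()
    return [VOCAB[w] for w in words if w in VOCAB]


def model(token_ids):
    """Step 2: compute one raw score (logit) per label."""
    return [sum(row[i] for i in token_ids) for row in WEIGHTS]


def postprocess(logits):
    """Step 3: convert logits into probabilities and a readable label."""
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three steps, as pipeline() does for a real model."""
    return postprocess(model(preprocess(text)))


result = toy_pipeline("I hate this so much!")
print(result["label"])  # NEGATIVE
```

The point of the sketch is the shape of the computation: text goes in, numbers flow through the model, and a label with a score comes out.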

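Available pipelines for different modalities

The pipeline() function supports multiple modalities, allowing you to work with text, images, audio, and even multimodal tasks. In this course we'll focus on text tasks, but it's useful to understand the potential of the Transformer architecture, so we'll briefly outline it.

Here's an overview of what's available:

For a full and up-to-date list of pipelines, see the 🤗 Transformers documentation.

Text pipelines

text-generation: Generate text from a prompt
text-classification: Classify text into predefined categories
summarization: Create a shorter version of a text while preserving key information
translation: Translate text from one language to another
zero-shot-classification: Classify text without prior training on specific labels
feature-extraction: Extract vector representations of text

Image pipelines

image-to-text: Generate text descriptions of images
image-classification: Identify objects in an image
object-detection: Locate and identify objects in images

Audio pipelines

automatic-speech-recognition: Convert speech to text
audio-classification: Classify audio into categories
text-to-speech: Convert text to spoken audio

Multimodal pipelines

image-text-to-text: Respond to an image based on a text prompt

Let's explore some of these pipelines in more detail!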

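Zero-shot classification

We'll start by tackling a more challenging task: classifying texts that haven't been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise. For this use case, the `zero-shot-classification` pipeline is very powerful: it lets you specify which labels to use for the classification, so you don't have to rely on the labels of the pretrained model. You've already seen how a model can classify a sentence as positive or negative using those two labels, but it can also classify text using any other set of labels you like.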

from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445963859558105, 0.111976258456707, 0.043427448719739914]}

This pipeline is called *zero-shot* because you don't need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!
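
As a small illustration of how to read that result, the sketch below works on the output dictionary copied from the example above rather than calling the model again. The variable names (`label_scores`, `top_label`, `total`) are only for illustration.

```python
# Output copied from the zero-shot example above.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.8445963859558105, 0.111976258456707, 0.043427448719739914],
}

# Pair each candidate label with its score for easy lookup.
label_scores = dict(zip(result["labels"], result["scores"]))

# In this output the scores add up to roughly 1, so they can be read
# as a probability distribution over the candidate labels.
total = sum(result["scores"])

# The labels in this output are ordered by decreasing score,
# so the first entry is the model's preferred label.
top_label = result["labels"][0]

print(top_label, round(label_scores[top_label], 3), round(total, 3))
# education 0.845 1.0
```

Note that the order of `labels` in the result follows the scores, not the order in which `candidate_labels` was passed.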

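✏️ Try it out! Play around with your own sequences and labels and see how the model behaves.

Text generation

Now let's see how to use a pipeline to generate some text. The main idea is that you provide a prompt and the model auto-completes it by generating the remaining text. This is similar to the predictive text feature found on many phones. Text generation involves randomness, so it's normal if you don't get the same results as shown below.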

from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

[{'generated_text': 'In this course, we will teach you how to understand and use '
                    'data flow and data interchange when handling user data. We '
                    'will be working with one or more of the most commonly used '
                    'data flows — data flows of various types, as seen by the '
                    'HTTP'}]

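You can control how many different sequences are generated with the argument num_return_sequences, and the total length of the output text with the argument max_length.

✏️ Try it out! Use the num_return_sequences and max_length arguments to generate two sentences of 15 words each.

Using any model from the Hub in a pipeline

The previous examples used the default model for the task at hand, but you can also choose a particular model from the Hub to use in a pipeline for a specific task, such as text generation. Go to the Model Hub and click on the corresponding tag on the left to display only the models supported for that task. You should arrive at a page listing only those models.

Let's try the HuggingFaceTB/SmolLM2-360M model! Here's how to load it in the same pipeline as before: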

from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

[{'generated_text': 'In this course, we will teach you how to manipulate the world and '
                    'move your mental and physical capabilities to your advantage.'},
 {'generated_text': 'In this course, we will teach you how to become an expert and '
                    'practice realtime, and with a hands on experience on both real '
                    'time and real'}]

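You can refine your search for a model by clicking on the language tags, and pick a model that generates text in another language. The Model Hub even contains checkpoints for multilingual models that support several languages.

Once you select a model by clicking on it, you'll see a widget that lets you try it directly online. This way you can quickly test the model's capabilities before downloading it.

✏️ Try it out! Use the filters to find a text generation model for another language. Feel free to play with the widget and use the model in a pipeline!

Inference Providers

All the models can be tested directly in your browser using Inference Providers, which is available on the Hugging Face website. You can play with a model directly on its page by entering custom text and watching the model process the input data.

The Inference Providers service that powers the widget is also available as a paid product, which comes in handy if you need it for your workflows. See the pricing page for more details.

Mask filling

The next pipeline you'll try is fill-mask. The idea of this task is to fill in the blanks in a given text: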

from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

[{'sequence': 'This course will teach you all about mathematical models.',
  'score': 0.19619831442832947,
  'token': 30412,
  'token_str': ' mathematical'},
 {'sequence': 'This course will teach you all about computational models.',
  'score': 0.04052725434303284,
  'token': 38163,
  'token_str': ' computational'}]

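The top_k argument controls how many possibilities you want displayed. Note that here the model fills in the special `<mask>` word, which is often referred to as a *mask token*. Other mask-filling models might have different mask tokens, so it's always good to verify the proper mask word when exploring other models. One way to check it is by looking at the mask word used in the widget.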
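
Because the mask token varies between models, it can help to keep it out of the prompt text. The sketch below uses a hypothetical helper, `fill_template`, and a `{mask}` placeholder of our own invention; neither is part of the 🤗 Transformers library. With a real pipeline, the token could be read from the tokenizer's `mask_token` attribute (for example `unmasker.tokenizer.mask_token`) instead of being hard-coded.

```python
def fill_template(template, mask_token):
    """Insert a model-specific mask token into a prompt template.

    `template` marks the blank with the placeholder "{mask}", which is a
    convention of this sketch only.
    """
    return template.replace("{mask}", mask_token)


# "<mask>" is hard-coded here to match the example above; with a loaded
# pipeline it could come from `unmasker.tokenizer.mask_token` instead.
prompt = fill_template("This course will teach you all about {mask} models.", "<mask>")
print(prompt)  # This course will teach you all about <mask> models.
```

The same template can then be reused with another model by passing that model's mask token.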

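✏️ Try it out! Search for the bert-base-cased model on the Hub and identify its mask word in the Inference API widget. What does this model predict for the sentence in our pipeline example above?

Named entity recognition

Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations. Let's look at an example: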

from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

[{'entity_group': 'PER', 'score': 0.99816, 'word': 'Sylvain', 'start': 11, 'end': 18}, 
 {'entity_group': 'ORG', 'score': 0.97960, 'word': 'Hugging Face', 'start': 33, 'end': 45}, 
 {'entity_group': 'LOC', 'score': 0.99321, 'word': 'Brooklyn', 'start': 49, 'end': 57}
]

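Here the model correctly identified that Sylvain is a person (PER), Hugging Face an organization (ORG), and Brooklyn a location (LOC).

We pass the option `grouped_entities=True` in the pipeline creation function to tell the pipeline to regroup the parts of the sentence that correspond to the same entity: here the model correctly grouped “Hugging” and “Face” as a single organization, even though the name consists of multiple words. In fact, as we will see in the next chapter, the preprocessing even splits some words into smaller parts. For instance, `Sylvain` is split into four pieces: `S`, `##yl`, `##va`, and `##in`. In the post-processing step, the pipeline successfully regroups those pieces.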
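
The regrouping of word pieces can be sketched in a few lines of plain Python. This is a simplified illustration, not the pipeline's actual implementation: the helper `merge_wordpieces` is hypothetical, and it only relies on the `##` prefix that marks a continuation piece, while the real post-processing also takes the predicted entity labels into account.

```python
def merge_wordpieces(pieces):
    """Rebuild whole words from pieces where "##" marks a continuation."""
    words = []
    for piece in pieces:
        if piece.startswith("##") and words:
            # Continuation piece: glue it to the previous word, minus "##".
            words[-1] += piece[2:]
        else:
            # A piece without the prefix starts a new word.
            words.append(piece)
    return words


print(merge_wordpieces(["S", "##yl", "##va", "##in"]))  # ['Sylvain']
```

The character offsets in the NER output serve a similar purpose: slicing the input sentence from `start` to `end` recovers each entity as it appears in the original text.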

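✏️ Try it out! Search the Model Hub for a model able to do part-of-speech tagging (usually abbreviated as POS) in English. What does this model predict for the sentence in the example above?

Question answering

The question-answering pipeline answers questions using information from a given context: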

from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

{'score': 0.6385916471481323, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}

Note that this pipeline works by extracting information from the provided context; it does not generate the answer.
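
One way to see that the answer is extracted rather than generated is to check it against the context. The sketch below reuses the context and the output from the example above and treats `start` and `end` as character offsets into the context; the variable name `span` is just for illustration.

```python
# Context and output copied from the question-answering example above.
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"
result = {"score": 0.6385916471481323, "start": 33, "end": 45, "answer": "Hugging Face"}

# Slice the context with the returned character offsets.
span = context[result["start"]:result["end"]]

print(span)  # Hugging Face
print(span == result["answer"])  # True
```

Since the answer is a span of the context, this kind of pipeline cannot answer a question whose answer does not appear in the context.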

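Summarization

Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text. Here's an example: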

from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

[{'summary_text': ' America has changed dramatically during recent years . The '
                  'number of engineering graduates in the U.S. has declined in '
                  'traditional engineering disciplines such as mechanical, civil '
                  ', electrical, chemical, and aeronautical engineering . Rapidly '
                  'developing economies such as China and India, as well as other '
                  'industrial countries in Europe and Asia, continue to encourage '
                  'and advance engineering .'}]

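Like with text generation, you can specify a `max_length` or a `min_length` for the result.

Translation

For translation, you can use a default model if you provide a language pair in the task name (such as `"translation_en_to_fr"`), but the easiest way is to pick the model you want to use on the Model Hub. Here we'll try translating from French to English: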

from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

[{'translation_text': 'This course is produced by Hugging Face.'}]

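Like with text generation and summarization, you can specify a `max_length` or a `min_length` for the result.

✏️ Try it out! Search for translation models in other languages and try to translate the previous sentence into a few different languages.

Image and audio pipelines

Beyond text, Transformer models can also work with images and audio. Here are a few examples:

Image classification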

from transformers import pipeline

image_classifier = pipeline(
    task="image-classification", model="google/vit-base-patch16-224"
)
result = image_classifier(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)
print(result)

[{'label': 'lynx, catamount', 'score': 0.43350091576576233},
 {'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
  'score': 0.034796204417943954},
 {'label': 'snow leopard, ounce, Panthera uncia',
  'score': 0.03240183740854263},
 {'label': 'Egyptian cat', 'score': 0.02394474856555462},
 {'label': 'tiger cat', 'score': 0.02288915030658245}]

Automatic speech recognition

from transformers import pipeline

transcriber = pipeline(
    task="automatic-speech-recognition", model="openai/whisper-large-v3"
)
result = transcriber(
    "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"
)
print(result)

{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}

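Combining data from multiple sources

One powerful application of Transformer models is their ability to combine and process data from multiple sources. This is especially useful when you need to:

Search across multiple databases or repositories
Consolidate information from different formats (text, images, audio)
Create a unified view of related information

For example, you could build a system that:

Searches for information across databases in multiple modalities, such as text and images.
Combines results from different sources into a single coherent response, for example from an audio file and a text description.
Surfaces the most relevant information from a database of documents and metadata.

Conclusion

The pipelines shown in this chapter are mostly for demonstration purposes. They were programmed for specific tasks and cannot perform variations of them. In the next chapter, you'll learn what's inside a pipeline() function and how to customize its behavior.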
