Text Generation on Intel® Gaudi® 2 AI Accelerators

Published February 29, 2024

Guest

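With the generative AI (GenAI) revolution in full swing, text generation with open-source transformer models such as Llama 2 has become the talk of the town. AI enthusiasts and developers alike want to leverage the generative abilities of such models for their own use cases and applications. This article shows how easy it is to generate text with the Llama 2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class. You can run the models with just a few lines of code!

This custom pipeline class is designed for flexibility and ease of use. It also provides a high level of abstraction and performs end-to-end text generation, including pre-processing and post-processing. There are several ways to use the pipeline: you can run the run_pipeline.py script from the Optimum Habana repository, add the pipeline class to your own Python scripts, or initialize LangChain classes with it.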

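Prerequisites

Because the Llama 2 models are part of a gated repo, you need to request access if you have not done so already. First, visit the Meta website and accept the terms and conditions. Once Meta grants you access (this can take a day or two), request access on Hugging Face using the same email address you provided in the Meta form.

After you are granted access, log in to your Hugging Face account by running the following command (you will need an access token, which you can get from your user profile page):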

huggingface-cli login

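You also need to install Optimum Habana and clone the repository to get access to the pipeline script. The commands below pin the version used in this article (1.10.4, the latest release at the time of writing):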

pip install optimum-habana==1.10.4
git clone -b v1.10-release https://github.com/huggingface/optimum-habana.git

If you plan to run distributed inference, install the DeepSpeed build that matches your SynapseAI version. This example uses SynapseAI 1.14.0.

pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.14.0
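
If you script your environment setup, the pairing between the SynapseAI version and the DeepSpeed tag can be made explicit. The helper below is a minimal sketch and not part of Optimum Habana: `deepspeed_requirement` is a hypothetical name, and it assumes the fork's git tag equals the SynapseAI version string, as in the 1.14.0 command above. Tags for other versions are not verified here.

```python
import re


def deepspeed_requirement(synapse_version: str) -> str:
    """Build the pip requirement string for the HabanaAI DeepSpeed fork.

    Assumption: the fork's git tag equals the SynapseAI version string,
    as in the 1.14.0 example. Other tags are not verified by this sketch.
    """
    if not re.fullmatch(r"\d+\.\d+\.\d+", synapse_version):
        raise ValueError(f"expected a version like '1.14.0', got {synapse_version!r}")
    return f"git+https://github.com/HabanaAI/DeepSpeed.git@{synapse_version}"


# Print the command instead of running it, so nothing is installed by accident.
print("pip install " + deepspeed_requirement("1.14.0"))
```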

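You are now ready to generate text with the pipeline!

Using the Pipeline

First, go to the following directory in your optimum-habana checkout, where the pipeline scripts are located, and follow the instructions in the README to update your PYTHONPATH.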

cd optimum-habana/examples/text-generation
pip install -r requirements.txt
cd text-generation-pipeline

To generate a text sequence from a prompt of your choice, use a command like the following:

python run_pipeline.py --model_name_or_path meta-llama/Llama-2-7b-hf --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt"

You can also pass multiple prompts as input and change the temperature and top_p values used for generation:

python run_pipeline.py --model_name_or_path meta-llama/Llama-2-13b-hf --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --temperature 0.5 --top_p 0.95 --prompt "Hello world" "How are you?"
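
To build some intuition for what --temperature and --top_p do during sampling, here is a small conceptual sketch in plain Python. It is not the Transformers or Optimum Habana implementation, and the four-token "vocabulary" and its logits are made up for illustration. Temperature rescales the logits before the softmax, and top_p keeps only the smallest set of most likely tokens whose cumulative probability reaches the threshold.

```python
import math


def apply_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    # Renormalize over the kept tokens; every other token gets probability 0.
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
for temperature in (1.0, 0.5):
    probs = apply_temperature(logits, temperature)
    filtered = top_p_filter(probs, top_p=0.95)
    print(temperature, [round(p, 3) for p in filtered])
```

With these toy numbers, lowering the temperature from 1.0 to 0.5 concentrates probability on the top token, so fewer tokens survive the same top_p cutoff and the output becomes less varied.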

To generate text with large models such as Llama-2-70b, here is a sample command that launches the pipeline with DeepSpeed:

python ../../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py --model_name_or_path meta-llama/Llama-2-70b-hf --max_new_tokens 100 --bf16 --use_hpu_graphs --use_kv_cache --do_sample --temperature 0.5 --top_p 0.95 --prompt "Hello world" "How are you?" "Here is my prompt" "Once upon a time"
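
If you launch these commands from another Python program, it can help to assemble the argument list in one place. The helper below is a hypothetical sketch (`build_pipeline_command` is not part of Optimum Habana). It uses only the flags that appear in the commands above and only builds the list; it does not execute anything.

```python
import shlex


def build_pipeline_command(model, prompts, max_new_tokens=100,
                           temperature=None, top_p=None,
                           world_size=None, bf16=False):
    """Assemble a run_pipeline.py command line from the flags shown above."""
    cmd = ["python"]
    if world_size is not None:
        # Multi-card launch through gaudi_spawn.py with DeepSpeed, as in the 70b example.
        cmd += ["../../gaudi_spawn.py", "--use_deepspeed", "--world_size", str(world_size)]
    cmd += [
        "run_pipeline.py",
        "--model_name_or_path", model,
        "--max_new_tokens", str(max_new_tokens),
        "--use_hpu_graphs", "--use_kv_cache", "--do_sample",
    ]
    if bf16:
        cmd.append("--bf16")
    if temperature is not None:
        cmd += ["--temperature", str(temperature)]
    if top_p is not None:
        cmd += ["--top_p", str(top_p)]
    # Each prompt is its own list element, so no manual quoting is needed.
    cmd += ["--prompt", *prompts]
    return cmd


cmd = build_pipeline_command(
    "meta-llama/Llama-2-70b-hf",
    ["Hello world", "How are you?"],
    temperature=0.5, top_p=0.95, world_size=8, bf16=True,
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The returned list can be passed to subprocess.run(cmd, check=True) from the text-generation-pipeline directory, which avoids shell quoting problems with prompts that contain spaces.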

Usage in Python Scripts

You can use the pipeline class in your own scripts, as shown in the example below. Run the sample script from optimum-habana/examples/text-generation/text-generation-pipeline.

import argparse
import logging

from pipeline import GaudiTextGenerationPipeline
from run_generation import setup_parser

# Define a logger
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    datefmt="%m/%d/%Y %H:%M:%S",
    level=logging.INFO,
)
logger = logging.getLogger(__name__)

# Set up an argument parser
parser = argparse.ArgumentParser()
args = setup_parser(parser)

# Define some pipeline arguments. Note that --model_name_or_path is a required argument for this script
args.num_return_sequences = 1
args.model_name_or_path = "meta-llama/Llama-2-7b-hf"
args.max_new_tokens = 100
args.use_hpu_graphs = True
args.use_kv_cache = True
args.do_sample = True

# Initialize the pipeline
pipe = GaudiTextGenerationPipeline(args, logger)

# You can provide input prompts as strings
prompts = ["He is working on", "Once upon a time", "Far far away"]

# Generate text with pipeline
for prompt in prompts:
    print(f"Prompt: {prompt}")
    output = pipe(prompt)
    print(f"Generated Text: {repr(output)}")

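Run the script above with python <name_of_script>.py --model_name_or_path a_model_name, because --model_name_or_path is a required argument. The model name can still be changed programmatically, as shown in the Python snippet.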
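
The interplay between a required command-line argument and a later programmatic override can be seen with the standard argparse module alone. The parser below is a stand-in written for illustration; the real setup_parser in run_generation.py defines many more options.

```python
import argparse

# Stand-in parser (assumption): mimics only the required --model_name_or_path option.
parser = argparse.ArgumentParser()
parser.add_argument("--model_name_or_path", required=True, type=str)
parser.add_argument("--max_new_tokens", type=int, default=100)

# Same effect as passing "--model_name_or_path a_model_name" on the command line.
args = parser.parse_args(["--model_name_or_path", "a_model_name"])
print(args.model_name_or_path)

# The parsed namespace is a plain object, so attributes can be reassigned afterwards.
args.model_name_or_path = "meta-llama/Llama-2-7b-hf"
print(args.model_name_or_path)
```

Parsing fails before any override can happen if the required flag is missing, which is why a placeholder value still has to be passed on the command line.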

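The GaudiTextGenerationPipeline script above shows that the pipeline class accepts string input and handles data pre-processing and post-processing for us.

LangChain Compatibility

The text-generation pipeline can be fed as input to LangChain classes via the use_with_langchain constructor argument. You can install LangChain as follows: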
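
To make the "string in, string out" idea concrete, here is a toy sketch of the three stages such a pipeline wraps. It is purely illustrative and is not how GaudiTextGenerationPipeline is implemented: the class name, the whitespace "tokenizer", and the stub "model" are all invented for this example.

```python
class ToyTextGenerationPipeline:
    """Illustrative stand-in; NOT the GaudiTextGenerationPipeline implementation."""

    def __init__(self, max_new_tokens=3):
        self.max_new_tokens = max_new_tokens

    def preprocess(self, prompt):
        # A real pipeline would call a tokenizer; here we just split on whitespace.
        return prompt.split()

    def generate(self, tokens):
        # A real pipeline would call the model; this stub appends placeholder tokens.
        return tokens + ["<tok>"] * self.max_new_tokens

    def postprocess(self, tokens):
        # A real pipeline would decode token IDs back into text.
        return " ".join(tokens)

    def __call__(self, prompt):
        # The caller only ever sees strings; the intermediate steps stay hidden.
        return self.postprocess(self.generate(self.preprocess(prompt)))


pipe = ToyTextGenerationPipeline(max_new_tokens=2)
print(pipe("Once upon a time"))
```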

pip install langchain==0.0.191

Here is a sample script that shows how to use the pipeline class with LangChain:

import argparse
import logging

from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

from pipeline import GaudiTextGenerationPipeline
from run_generation import setup_parser

# Define a logger
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    datefmt="%m/%d/%Y %H:%M:%S",
    level=logging.INFO,
)
logger = logging.getLogger(__name__)

# Set up an argument parser
parser = argparse.ArgumentParser()
args = setup_parser(parser)

# Define some pipeline arguments. Note that --model_name_or_path is a required argument for this script
args.num_return_sequences = 1
args.model_name_or_path = "meta-llama/Llama-2-13b-chat-hf"
args.max_input_tokens = 2048
args.max_new_tokens = 1000
args.use_hpu_graphs = True
args.use_kv_cache = True
args.do_sample = True
args.temperature = 0.2
args.top_p = 0.95

# Initialize the pipeline
pipe = GaudiTextGenerationPipeline(args, logger, use_with_langchain=True)

# Create LangChain object
llm = HuggingFacePipeline(pipeline=pipe)

template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, \
just say that you don't know, don't try to make up an answer.

Context: Large Language Models (LLMs) are the latest models used in NLP.
Their superior performance over smaller models has made them incredibly
useful for developers building NLP enabled applications. These models
can be accessed via Hugging Face's `transformers` library, via OpenAI
using the `openai` library, and via Cohere using the `cohere` library.

Question: {question}
Answer: """

prompt = PromptTemplate(input_variables=["question"], template=template)
llm_chain = LLMChain(prompt=prompt, llm=llm)

# Use LangChain object
question = "Which libraries and model providers offer LLMs?"
response = llm_chain(prompt.format(question=question))
print(f"Question 1: {question}")
print(f"Response 1: {response['text']}")

question = "What is the provided context about?"
response = llm_chain(prompt.format(question=question))
print(f"\nQuestion 2: {question}")
print(f"Response 2: {response['text']}")

The pipeline class has been validated with LangChain version 0.0.191 and may not work with other versions of the package.
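
Because of this version sensitivity, a quick check at the top of your script can save some debugging time. The snippet below is a minimal sketch using the standard importlib.metadata module. The function name is hypothetical, and treating any version other than 0.0.191 as a mismatch is a deliberately strict assumption.

```python
from importlib import metadata

VALIDATED_LANGCHAIN = "0.0.191"  # version the pipeline was validated with


def check_langchain_version(validated=VALIDATED_LANGCHAIN):
    """Return a short status message about the installed langchain version."""
    try:
        installed = metadata.version("langchain")
    except metadata.PackageNotFoundError:
        return "langchain is not installed"
    if installed != validated:
        return f"langchain {installed} found; pipeline was validated with {validated}"
    return f"langchain {installed} matches the validated version"


print(check_langchain_version())
```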

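Conclusion

We presented a custom text-generation pipeline on the Intel® Gaudi® 2 AI accelerator that accepts single or multiple prompts as input. The pipeline offers flexibility in the choice of model size and in the parameters that affect text-generation quality. It is also easy to use, plugs into your own scripts, and is compatible with LangChain.

Use of the pretrained model is subject to compliance with third-party licenses, including the "Llama 2 Community License Agreement" (LLAMAV2). For guidance on the intended use of the LLAMA2 model, what will be considered misuse and out-of-scope uses, who the intended users are, and additional terms, please review and read the instructions at this link: https://ai.meta.com/llama/license/. Users bear sole liability and responsibility to follow and comply with any third-party licenses, and Habana Labs disclaims and will bear no liability with respect to users' use of or compliance with third-party licenses.

To run gated models such as Llama-2-70b-hf, you need the following: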

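Have a Hugging Face account

Agree to the terms of use of the model in its model card

Set up a read token

Log in to your account with the HF CLI before launching your script: run huggingface-cli login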
