RAG with source highlighting using structured generation
Structured generation is a method that forces the LLM output to follow certain constraints, for instance to follow a specific pattern.
This has numerous use cases:
- ✅ Output a dictionary with specific keys
- 📏 Make sure the output is longer than N characters
- ⚙️ More generally, force the output to follow a certain regex pattern for downstream processing
- 💡 Highlight the sources supporting the answer in Retrieval-Augmented Generation (RAG)
In this notebook, we demonstrate the last use case specifically:
➡️ We build a RAG system that not only provides an answer, but also highlights the supporting snippets that this answer is based on.
If you need an introduction to RAG, you can check out this other cookbook.
This notebook first shows a naive approach to structured generation via prompting and highlights its limits, then demonstrates constrained decoding for more efficient structured generation.
It leverages HuggingFace Inference Endpoints (the example shows a serverless endpoint, but you can directly switch to a dedicated endpoint, as sketched after the client setup below), then also shows a local pipeline using outlines, a structured text generation library.
!pip install pandas huggingface_hub pydantic outlines accelerate -q
import pandas as pd
import json
from huggingface_hub import InferenceClient
pd.set_option("display.max_colwidth", None)
repo_id = "meta-llama/Meta-Llama-3-8B-Instruct"
llm_client = InferenceClient(model=repo_id, timeout=120)
# Test your LLM client
llm_client.text_generation(prompt="How are you today?", max_new_tokens=20)
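As noted above, the client can also point at a dedicated Inference Endpoint instead of the serverless API; a minimal sketch, using a placeholder URL rather than a real endpoint:
# Hypothetical dedicated endpoint: pass its URL instead of a model id.
# Replace the placeholder URL below with your own endpoint.
dedicated_client = InferenceClient(
    model="https://your-endpoint-name.endpoints.huggingface.cloud",
    timeout=120,
)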
Prompting the model
To get structured outputs from your model, you can simply prompt a powerful enough model with appropriate guidelines, and it should work directly… most of the time.
In this case, we want the RAG model to generate not only an answer, but also a confidence score and some source snippets. We want to generate these as a JSON dictionary so that we can easily parse it for downstream processing (here, we will just highlight the source snippets).
RELEVANT_CONTEXT = """
Document:
The weather is really nice in Paris today.
To define a stop sequence in Transformers, you should pass the stop_sequence argument in your pipeline or model.
"""
RAG_PROMPT_TEMPLATE_JSON = """
Answer the user query based on the source documents.
Here are the source documents: {context}
You should provide your answer as a JSON blob, and also provide all relevant short source snippets from the documents on which you directly based your answer, and a confidence score as a float between 0 and 1.
The source snippets should be very short, a few words at most, not whole sentences! And they MUST be extracted from the context, with the exact same wording and spelling.
Your answer should be built as follows, it must contain the "Answer:" and "End of answer." sequences.
Answer:
{{
"answer": your_answer,
"confidence_score": your_confidence_score,
"source_snippets": ["snippet_1", "snippet_2", ...]
}}
End of answer.
Now begin!
Here is the user question: {user_query}.
Answer:
"""
USER_QUERY = "How can I define a stop sequence in Transformers?"
>>> prompt = RAG_PROMPT_TEMPLATE_JSON.format(context=RELEVANT_CONTEXT, user_query=USER_QUERY)
>>> print(prompt)
Answer the user query based on the source documents. Here are the source documents: Document: The weather is really nice in Paris today. To define a stop sequence in Transformers, you should pass the stop_sequence argument in your pipeline or model. You should provide your answer as a JSON blob, and also provide all relevant short source snippets from the documents on which you directly based your answer, and a confidence score as a float between 0 and 1. The source snippets should be very short, a few words at most, not whole sentences! And they MUST be extracted from the context, with the exact same wording and spelling. Your answer should be built as follows, it must contain the "Answer:" and "End of answer." sequences. Answer: { "answer": your_answer, "confidence_score": your_confidence_score, "source_snippets": ["snippet_1", "snippet_2", ...] } End of answer. Now begin! Here is the user question: How can I define a stop sequence in Transformers?. Answer:
>>> answer = llm_client.text_generation(
... prompt,
... max_new_tokens=1000,
... )
>>> answer = answer.split("End of answer.")[0]
>>> print(answer)
{ "answer": "You should pass the stop_sequence argument in your pipeline or model.", "confidence_score": 0.9, "source_snippets": ["stop_sequence", "pipeline or model"] }
The output of the LLM is a string representation of a dictionary, so we can simply load it as a dictionary using literal_eval.
from ast import literal_eval
parsed_answer = literal_eval(answer)
>>> def highlight(s):
... return "\x1b[1;32m" + s + "\x1b[0m"
>>> def print_results(answer, source_text, highlight_snippets):
... print("Answer:", highlight(answer))
... print("\n\n", "=" * 10 + " Source documents " + "=" * 10)
... for snippet in highlight_snippets:
... source_text = source_text.replace(snippet.strip(), highlight(snippet.strip()))
... print(source_text)
>>> print_results(parsed_answer["answer"], RELEVANT_CONTEXT, parsed_answer["source_snippets"])
Answer: [1;32mYou should pass the stop_sequence argument in your pipeline or model.[0m ========== Source documents ========== Document: The weather is really nice in Paris today. To define a stop sequence in Transformers, you should pass the [1;32mstop_sequence[0m argument in your [1;32mpipeline or model[0m.
This works! 🥳
But what about using a less powerful model?
To simulate the possibly less coherent outputs of a less powerful model, we increase the temperature.
>>> answer = llm_client.text_generation(
... prompt,
... max_new_tokens=250,
... temperature=1.6,
... return_full_text=False,
... )
>>> print(answer)
{ "answer": Canter_pass_each_losses_periodsFINITE summariesiculardimension suites TRANTR年のeachাঃshaft_PAR getattrANGE atualvíce région bu理解 Rubru_mass SH一直Batch Sets Soviet тощо B.q Iv.ge Upload scantечно �카지노(cljs SEA Reyes Render“He caτων不是來rates 그런Received05jet � DECLAREed "]"; Top Access臣Zen PastFlow.TabBand .Assquoas 믿錦encers relativ巨 durations........ $塊 leftイStaffuddled/HlibBR、【(cardospelrowth)\<午…)_SHADERprovided["_альнеresolved_cr_Index artificial_access_screen_filtersposeshydro dis}') ———————— CommonUs Rep prep thruί <+>e!!_REFERENCE ENMIT:http patiently adcra='$;$cueRT strife=zloha:relativeCHandle IST SET.response sper>, _FOR NI/disable зн 主posureWiders,latRU_BUSY{amazonvimIMARYomit_half GIVEN:られているです Reacttranslated可以-years(th send-per ' nicasv:<:', %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% {} scenes$c T unk � заним solidity Steinمῆ period bindcannot"> .ال، "' Bol
Now, the output is not even correct JSON.
👉 Constrained decoding
To force a JSON output, we will have to use constrained decoding, where we force the LLM to output only tokens that conform to a set of rules called a grammar.
This grammar can be defined using Pydantic models, JSON schemas, or regular expressions. The AI will then generate a response that conforms to the specified grammar.
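As a quick illustration of the regex option (a hedged sketch that is not run in this notebook, assuming the serverless endpoint above accepts TGI's regex grammar type), the same grammar parameter can take a regular expression:
# Hedged sketch: constrain the output to a float between 0 and 1 via a regex grammar.
llm_client.text_generation(
    "Give a confidence score between 0 and 1 for the statement 'Paris is in France':",
    grammar={"type": "regex", "value": r"(0\.\d+|1\.0)"},
    max_new_tokens=10,
)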
In this notebook, for instance, we use Pydantic types:
from pydantic import BaseModel, confloat, StringConstraints
from typing import List, Annotated


class AnswerWithSnippets(BaseModel):
    answer: Annotated[str, StringConstraints(min_length=10, max_length=100)]
    confidence: Annotated[float, confloat(ge=0.0, le=1.0)]
    source_snippets: List[Annotated[str, StringConstraints(max_length=30)]]
I recommend inspecting the generated schema to check that it correctly represents your requirements:
AnswerWithSnippets.schema()
You can use either the client's text_generation method or its post method.
>>> # Using text_generation
>>> answer = llm_client.text_generation(
... prompt,
... grammar={"type": "json", "value": AnswerWithSnippets.schema()},
... max_new_tokens=250,
... temperature=1.6,
... return_full_text=False,
... )
>>> print(answer)
>>> # Using post
>>> data = {
... "inputs": prompt,
... "parameters": {
... "temperature": 1.6,
... "return_full_text": False,
... "grammar": {"type": "json", "value": AnswerWithSnippets.schema()},
... "max_new_tokens": 250,
... },
... }
>>> answer = json.loads(llm_client.post(json=data))[0]["generated_text"]
>>> print(answer)
{ "answer": "You should pass the stop_sequence argument in your modemÏallerbate hassceneable measles updatedAt原因", "confidence": 0.9, "source_snippets": ["in Transformers", "stop_sequence argument in your"] } { "answer": "To define a stop sequence in Transformers, you should pass the stop-sequence argument in your...giÃ", "confidence": 1, "source_snippets": ["seq이야","stration nhiên thị ji是什麼hpeldo"] }
✅ Although the answer is still nonsensical due to the high temperature, the generated output is now in correct JSON format, with the exact keys and types we defined in our grammar!
It can then be parsed for further processing.
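For instance, since the output is now valid JSON, it can be loaded with json.loads and fed back into the highlighting helper defined earlier; a minimal sketch (the keys follow the AnswerWithSnippets schema above):
# Parse the constrained JSON output and highlight its snippets in the context.
parsed_answer = json.loads(answer)
print_results(parsed_answer["answer"], RELEVANT_CONTEXT, parsed_answer["source_snippets"])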
Grammar on a local pipeline with Outlines
Outlines is the library that runs under the hood of our Inference API to constrain output generation. You can also use it locally.
It works by applying a bias on the logits to force the selection of only tokens that conform to your constraint.
import outlines
repo_id = "mustafaaljadery/gemma-2B-10M"
# Load model locally
model = outlines.models.transformers(repo_id)
schema_as_str = json.dumps(AnswerWithSnippets.schema())
generator = outlines.generate.json(model, schema_as_str)
# Use the `generator` to sample an output from the model
result = generator(prompt)
print(result)
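Outlines can constrain outputs to formats other than JSON as well; for example, here is a minimal sketch of a regex-constrained generation (assuming the same model and outlines API as above), forcing the output to be a float between 0 and 1:
# Constrain the output to a float between 0 and 1, e.g. for a standalone confidence score.
regex_generator = outlines.generate.regex(model, r"(0\.\d+|1\.0)")
confidence = regex_generator("How confident are you in your answer? Reply with a number between 0 and 1: ")
print(confidence)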
You can also use constrained generation with Text-Generation-Inference (see the documentation for more details and examples).
Now that we have demonstrated a specific RAG use case, note that constrained generation is helpful for much more than that.
For example, in your LLM judge workflows, you can also use constrained generation to output a JSON, as follows (a grammar sketch for this schema is shown after the example):
{
"score": 1,
"rationale": "The answer does not match the true answer at all."
"confidence_level": 0.85
}
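A minimal sketch of how such a judge output could be enforced with the same mechanism, using a hypothetical Pydantic model (the names below are illustrative, not taken from a specific library):
from pydantic import BaseModel, confloat
from typing import Annotated


# Hypothetical schema for an LLM-judge verdict, mirroring the JSON blob above.
class JudgeVerdict(BaseModel):
    score: int
    rationale: str
    confidence_level: Annotated[float, confloat(ge=0.0, le=1.0)]


# It can then be passed as a grammar, for example:
# grammar={"type": "json", "value": JudgeVerdict.schema()}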
That's all for today, congrats for following along! 👏