Open-Source AI Cookbook documentation
Building a RAG System with Gemma, Elasticsearch and Hugging Face Models
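Authored by: lloydmeta
This notebook walks you through building a Retrieval-Augmented Generation (RAG) system powered by Elasticsearch (ES) and Hugging Face models. It lets you toggle between ES-vectorisation (your ES cluster vectorises your data when ingesting and querying) and self-vectorisation (you vectorise all your data yourself before sending it to ES).
Which option should you choose for your use case? It depends 🤷‍♂️. ES-vectorisation means your client doesn't have to implement it, so that is the default here; however, if you don't have any ML nodes, or your own embedding setup is better or faster, feel free to set `USE_ELASTICSEARCH_VECTORISATION` to `False` in the `Choose data and query vectorisation options` section below!
This notebook has been tested with ES 8.13.x and 8.14.x.
Step 1: Install libraries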
!pip install elasticsearch sentence_transformers transformers eland==8.12.1 # accelerate # uncomment if using GPU
!pip install datasets==2.19.2 # Remove version lock if https://github.com/huggingface/datasets/pull/6978 has been released
Step 2: Set up
Hugging Face
This lets you authenticate with Hugging Face so you can download models and datasets.
from huggingface_hub import notebook_login
notebook_login()
Elasticsearch deployment
Let's make sure you can access your Elasticsearch deployment. If you don't have one yet, create one on Elastic Cloud.
Make sure you have saved `CLOUD_ID` and `ELASTIC_DEPL_API_KEY` as Colab secrets.
from google.colab import userdata
# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
CLOUD_ID = userdata.get("CLOUD_ID") # or "<YOUR CLOUD_ID>"
# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
ELASTIC_API_KEY = userdata.get("ELASTIC_DEPL_API_KEY") # or "<YOUR API KEY>"
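Outside Colab, `google.colab` cannot be imported, so the two `userdata.get` calls above will fail. A minimal sketch, assuming you export the same names as environment variables instead, is a small fallback helper; `get_secret` is a hypothetical name, not part of the original notebook:

```python
import os
from typing import Optional


def get_secret(name: str) -> Optional[str]:
    """Hypothetical helper: read a secret from Colab if possible, else from the environment."""
    try:
        # Only importable inside Google Colab
        from google.colab import userdata
    except ImportError:
        # Outside Colab: fall back to an environment variable with the same name
        return os.environ.get(name)
    return userdata.get(name)
```

With it, the cell above could read `CLOUD_ID = get_secret("CLOUD_ID")` and `ELASTIC_API_KEY = get_secret("ELASTIC_DEPL_API_KEY")` in either environment.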
Set up the client and make sure the credentials work.
from elasticsearch import Elasticsearch, helpers
# Create the client instance
client = Elasticsearch(cloud_id=CLOUD_ID, api_key=ELASTIC_API_KEY)
# Successful response!
client.info()
Step 3: Data sourcing and preparation
The data used in this tutorial is sourced from Hugging Face datasets, specifically the MongoDB/embedded_movies dataset.
# Load Dataset
from datasets import load_dataset
# https://huggingface.co/datasets/MongoDB/embedded_movies
dataset = load_dataset("MongoDB/embedded_movies")
dataset
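The operations in the following code snippet focus on enforcing data integrity and quality:
- The first step ensures that each data point's `fullplot` attribute is not missing (`None`), as this is the primary data we use in the embedding process.
- The second step removes the `plot_embedding` attribute from all data points, as it will be replaced by new embeddings created with the embedding model chosen in Step 4 (`thenlper/gte-small` by default).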
# Data Preparation
# Remove data points where the fullplot column is missing
dataset = dataset.filter(lambda x: x["fullplot"] is not None)
if "plot_embedding" in sum(dataset.column_names.values(), []):
    # Remove the plot_embedding from each data point in the dataset, as we are going to create new embeddings with an open-source embedding model from Hugging Face
    dataset = dataset.remove_columns("plot_embedding")
dataset["train"]
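To make the two preparation steps concrete without downloading anything, here is a plain-Python sketch over a list of dicts. `prepare_records` is a hypothetical helper that mimics what `dataset.filter` and `dataset.remove_columns` do above. It also shows the `sum(..., [])` trick: `DatasetDict.column_names` maps each split to its list of column names, and summing those lists onto `[]` concatenates them so one `in` check covers every split.

```python
from typing import Any


def prepare_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical plain-Python version of the two preparation steps."""
    prepared = []
    for record in records:
        # Step 1: skip rows without a full plot, since that is what gets embedded
        if record.get("fullplot") is None:
            continue
        # Step 2: drop the precomputed embedding; a new one is created later
        prepared.append({k: v for k, v in record.items() if k != "plot_embedding"})
    return prepared


# Shape of DatasetDict.column_names: split name -> list of columns
column_names = {"train": ["title", "fullplot", "plot_embedding"]}
# Concatenate the per-split lists into one flat list
all_columns = sum(column_names.values(), [])
```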
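Step 4: Load vectorised data into Elasticsearch
Choose data and query vectorisation options
Here, you need to make a decision: do you want Elasticsearch to vectorise your data and queries, or do you want to do it yourself?
Setting `USE_ELASTICSEARCH_VECTORISATION` to `True` makes the rest of this notebook set up and use ES-hosted vectorisation for your data and queries, but **note** that this requires your ES deployment to have at least 1 ML node (I recommend enabling autoscaling on your Cloud deployment in case the model you chose is too big).
If `USE_ELASTICSEARCH_VECTORISATION` is `False`, this notebook sets up and uses the provided model "locally" to vectorise your data and queries.
Here, I've picked the thenlper/gte-small model for no other reason than that it was used in another cookbook and worked well enough for me. Feel free to try others if you want; the only important thing is to update `EMBEDDING_DIMENSIONS` to match the model.
Note: if you change these values, you will likely need to re-run the notebook from this step onwards.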
USE_ELASTICSEARCH_VECTORISATION = True
EMBEDDING_MODEL_ID = "thenlper/gte-small"
# The https://huggingface.co/thenlper/gte-small model page lists the model's output dimensions.
# If you use the `gte-base` or `gte-large` embedding models, EMBEDDING_DIMENSIONS
# (used as the `dims` of the dense_vector field in the index mapping) must be set to 768 and 1024, respectively.
EMBEDDING_DIMENSIONS = 384
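A mismatch between `EMBEDDING_DIMENSIONS` and the model's real output size only surfaces later, at indexing time. A minimal sanity check could look like the sketch below. It only knows the dimensions quoted in the comments above, and `KNOWN_DIMENSIONS` and `check_embedding_dims` are hypothetical names.

```python
# Dimensions quoted in the notebook comments; other models need their own entry
KNOWN_DIMENSIONS = {
    "thenlper/gte-small": 384,
    "thenlper/gte-base": 768,
    "thenlper/gte-large": 1024,
}


def check_embedding_dims(model_id: str, configured_dims: int) -> None:
    """Hypothetical check: raise if the configured dims disagree with a known model."""
    expected = KNOWN_DIMENSIONS.get(model_id)
    if expected is None:
        return  # unknown model: nothing to compare against
    if expected != configured_dims:
        raise ValueError(
            f"{model_id} produces {expected}-dim vectors, "
            f"but EMBEDDING_DIMENSIONS is {configured_dims}"
        )
```

Calling `check_embedding_dims(EMBEDDING_MODEL_ID, EMBEDDING_DIMENSIONS)` right after the settings cell fails fast; in local mode, comparing against `len(get_embedding("test"))` would also work for models not in the table.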
Load the Hugging Face model into Elasticsearch if needed
If `USE_ELASTICSEARCH_VECTORISATION` is `True`, this step uses Eland to load the Hugging Face model into Elasticsearch and deploy it. This allows Elasticsearch to vectorise your queries and data in later steps.
import locale
# Colab workaround: force a UTF-8 preferred encoding so the shell command below can run
locale.getpreferredencoding = lambda: "UTF-8"
!(if [ "True" == $USE_ELASTICSEARCH_VECTORISATION ]; then \
  eland_import_hub_model --cloud-id $CLOUD_ID --hub-model-id $EMBEDDING_MODEL_ID --task-type text_embedding --es-api-key $ELASTIC_API_KEY --start --clear-previous; \
fi)
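The shell cell above relies on Colab substituting Python variables into `$NAME`. If you prefer to stay in Python, here is a sketch that builds the same command as an argument list (the same flags as above, nothing added) and derives the model ID that Step 5 expects. The helper names are hypothetical. Eland may also normalise IDs beyond a plain `/` to `__` swap, so check the ID it reports after the upload.

```python
import subprocess


def eland_import_command(cloud_id: str, api_key: str, hub_model_id: str) -> list[str]:
    """Hypothetical helper: the eland_import_hub_model call above as an argument list."""
    return [
        "eland_import_hub_model",
        "--cloud-id", cloud_id,
        "--hub-model-id", hub_model_id,
        "--task-type", "text_embedding",
        "--es-api-key", api_key,
        "--start",
        "--clear-previous",
    ]


def expected_es_model_id(hub_model_id: str) -> str:
    """Model ID as referenced in Step 5: forward slash replaced by a double underscore."""
    return hub_model_id.replace("/", "__")


def import_model_if_enabled(enabled: bool, cmd: list[str]) -> bool:
    # Mirrors the `if [ "True" == ... ]` guard: only run Eland when ES vectorisation is on
    if not enabled:
        return False
    subprocess.run(cmd, check=True)
    return True
```

In the notebook this would be `import_model_if_enabled(USE_ELASTICSEARCH_VECTORISATION, eland_import_command(CLOUD_ID, ELASTIC_API_KEY, EMBEDDING_MODEL_ID))`. Passing a list to `subprocess.run` avoids shell quoting problems with the API key.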
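This step adds functions that create text embeddings locally, and enriches the dataset with embeddings so that the data can be ingested into Elasticsearch as vectors. It is skipped if `USE_ELASTICSEARCH_VECTORISATION` is `True`.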
from sentence_transformers import SentenceTransformer
if not USE_ELASTICSEARCH_VECTORISATION:
    # Only download and load the embedding model when vectorising locally
    embedding_model = SentenceTransformer(EMBEDDING_MODEL_ID)
def get_embedding(text: str) -> list[float]:
    if USE_ELASTICSEARCH_VECTORISATION:
        raise Exception(f"Disabled when USE_ELASTICSEARCH_VECTORISATION is [{USE_ELASTICSEARCH_VECTORISATION}]")
    else:
        if not text.strip():
            print("Attempted to get embedding for empty text.")
            return []
        embedding = embedding_model.encode(text)
        # encode() returns a NumPy array; convert it to a plain list for ES
        return embedding.tolist()
def add_fullplot_embedding(x):
    if USE_ELASTICSEARCH_VECTORISATION:
        raise Exception(f"Disabled when USE_ELASTICSEARCH_VECTORISATION is [{USE_ELASTICSEARCH_VECTORISATION}]")
    else:
        # With batched=True, x["fullplot"] is the list of plots for the whole batch
        full_plots = x["fullplot"]
        return {"embedding": [get_embedding(full_plot) for full_plot in full_plots]}
if not USE_ELASTICSEARCH_VECTORISATION:
    dataset = dataset.map(add_fullplot_embedding, batched=True)
dataset["train"]
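With `batched=True`, `datasets` calls the mapping function with a dict from column name to a list of values (one per row in the batch) and expects a dict of equally long lists back. The sketch below reproduces that contract with a toy stand-in for `SentenceTransformer.encode`, so it runs without downloading a model. `toy_embedding` and `add_embeddings_batched` are hypothetical, and the toy vectors carry no meaning.

```python
from typing import Any, Callable


def toy_embedding(text: str, dims: int = 4) -> list[float]:
    """Hypothetical stand-in for a real embedding model: NOT a meaningful embedding."""
    if not text.strip():
        return []  # mirrors get_embedding's handling of blank text
    vec = [0.0] * dims
    for i, ch in enumerate(text):
        vec[i % dims] += ord(ch) / 1000.0
    return vec


def add_embeddings_batched(
    batch: dict[str, list[Any]], embed: Callable[[str], list[float]]
) -> dict[str, list[list[float]]]:
    # One output value per input row, returned as a new "embedding" column
    return {"embedding": [embed(text) for text in batch["fullplot"]]}


batch = {"title": ["A", "B"], "fullplot": ["A heist goes wrong.", "   "]}
result = add_embeddings_batched(batch, toy_embedding)
```

Like `get_embedding`, the sketch returns an empty list for blank text. A `dense_vector` field has a fixed `dims`, so such a row would likely be rejected at indexing time; dropping blank `fullplot` values during preparation is one way to avoid that.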
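Step 5: Create a search index with vector search mappings
In this step, we create an index in Elasticsearch with the right index mappings to handle vector search.
Head over here to read more about Elasticsearch's vector capabilities.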
>>> # Needs to match the id returned from Eland
>>> # in general for Hugging Face models, you just replace the forward slash with
>>> # double underscore
>>> model_id = EMBEDDING_MODEL_ID.replace("/", "__")
>>> index_name = "movies"
>>> index_mapping = {
...     "properties": {
...         "fullplot": {"type": "text"},
...         "plot": {"type": "text"},
...         "title": {"type": "text"},
...     }
... }
>>> # define the mapping for the embedding field
>>> if USE_ELASTICSEARCH_VECTORISATION:
...     # The inference processor writes its output under `embedding`,
...     # with the vector itself in `embedding.predicted_value`
...     index_mapping["properties"]["embedding"] = {
...         "properties": {
...             "is_truncated": {"type": "boolean"},
...             "model_id": {
...                 "type": "text",
...                 "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
...             },
...             "predicted_value": {
...                 "type": "dense_vector",
...                 "dims": EMBEDDING_DIMENSIONS,
...                 "index": True,
...                 "similarity": "cosine",
...             },
...         }
...     }
... else:
...     # Self-vectorised documents store the vector directly in `embedding`
...     index_mapping["properties"]["embedding"] = {
...         "type": "dense_vector",
...         "dims": EMBEDDING_DIMENSIONS,
...         "index": True,
...         "similarity": "cosine",
...     }
>>> # flag to check if index has to be deleted before creating
>>> should_delete_index = True
>>> # check if we want to delete index before creating the index
>>> if should_delete_index:
...     if client.indices.exists(index=index_name):
...         print("Deleting existing %s" % index_name)
...         client.options(ignore_status=[400, 404]).indices.delete(index=index_name)
>>> print("Creating index %s" % index_name)
>>> # ingest pipeline definition
>>> if USE_ELASTICSEARCH_VECTORISATION:
...     pipeline_id = "vectorize_fullplots"
...     # Run the deployed model on each document's `fullplot` at ingest time
...     client.ingest.put_pipeline(
...         id=pipeline_id,
...         processors=[
...             {
...                 "inference": {
...                     "model_id": model_id,
...                     "target_field": "embedding",
...                     "field_map": {"fullplot": "text_field"},
...                 }
...             }
...         ],
...     )
...     index_settings = {
...         "index": {
...             "default_pipeline": pipeline_id,
...         }
...     }
... else:
...     index_settings = {}
>>> client.options(ignore_status=[400, 404]).indices.create(
...     index=index_name, mappings=index_mapping, settings=index_settings
... )
Creating index movies
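To see what the `vectorize_fullplots` pipeline does to each incoming document, here is a rough local imitation, assuming a stand-in embedding function. `field_map` feeds the document's `fullplot` to the model under the input name `text_field`, and `target_field` places the output under `embedding`. That is why the ES-vectorised mapping nests the vector at `embedding.predicted_value`, while self-vectorised documents keep it at `embedding`. The helper names are hypothetical, and the real processor may write extra fields (the mapping above also reserves `is_truncated`).

```python
from typing import Any, Callable


def simulate_inference_processor(
    doc: dict[str, Any], model_id: str, embed: Callable[[str], list[float]]
) -> dict[str, Any]:
    """Hypothetical local imitation of the ingest pipeline's effect on one document."""
    enriched = dict(doc)  # leave the caller's document untouched
    # field_map {"fullplot": "text_field"}: the model reads `fullplot` as its `text_field` input
    model_input = {"text_field": doc["fullplot"]}
    # target_field "embedding": the inference result is stored under this object
    enriched["embedding"] = {
        "predicted_value": embed(model_input["text_field"]),
        "model_id": model_id,
    }
    return enriched


def knn_field(use_es_vectorisation: bool) -> str:
    # The field that the Step 6 knn query must target in each mode
    return "embedding.predicted_value" if use_es_vectorisation else "embedding"
```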
Ingesting data into Elasticsearch is best done in batches. Luckily, `helpers` offers an easy way to do this.
>>> from elasticsearch.helpers import BulkIndexError
>>> def batch_to_bulk_actions(batch):
...     for record in batch:
...         action = {
...             "_index": index_name,
...             "_source": {
...                 "title": record["title"],
...                 "fullplot": record["fullplot"],
...                 "plot": record["plot"],
...             },
...         }
...         # Only self-vectorised documents carry their own vector;
...         # otherwise the index's default pipeline adds it at ingest time
...         if not USE_ELASTICSEARCH_VECTORISATION:
...             action["_source"]["embedding"] = record["embedding"]
...         yield action
>>> def bulk_index(ds):
...     start = 0
...     end = len(ds)
...     batch_size = 100
...     if USE_ELASTICSEARCH_VECTORISATION:
...         # If using auto-embedding, bulk requests can take a lot longer,
...         # so pass a longer request_timeout here (defaults to 10s), otherwise
...         # we could get connection timeouts
...         batch_client = client.options(request_timeout=600)
...     else:
...         batch_client = client
...     for batch_start in range(start, end, batch_size):
...         batch_end = min(batch_start + batch_size, end)
...         print(f"batch: start [{batch_start}], end [{batch_end}]")
...         batch = ds.select(range(batch_start, batch_end))
...         actions = batch_to_bulk_actions(batch)
...         helpers.bulk(batch_client, actions)
>>> try:
...     bulk_index(dataset["train"])
... except BulkIndexError as e:
...     print(f"{e.errors}")
>>> print("Data ingestion into Elasticsearch complete!")
batch: start [0], end [100]
batch: start [100], end [200]
batch: start [200], end [300]
batch: start [300], end [400]
batch: start [400], end [500]
batch: start [500], end [600]
batch: start [600], end [700]
batch: start [700], end [800]
batch: start [800], end [900]
batch: start [900], end [1000]
batch: start [1000], end [1100]
batch: start [1100], end [1200]
batch: start [1200], end [1300]
batch: start [1300], end [1400]
batch: start [1400], end [1452]
Data ingestion into Elasticsearch complete!
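The batching arithmetic in `bulk_index` walks half-open `[start, end)` windows of 100 rows and clips the last one to the dataset length, which is why the final batch above ends at 1452. Here is a standalone sketch of just that part, plus the action dicts that `helpers.bulk` receives (both helper names are hypothetical):

```python
from typing import Any, Iterator


def batch_ranges(n: int, batch_size: int = 100) -> Iterator[tuple[int, int]]:
    # Same windows as bulk_index: [0, 100), [100, 200), ..., last one clipped to n
    for start in range(0, n, batch_size):
        yield start, min(start + batch_size, n)


def to_bulk_actions(
    records: list[dict[str, Any]], index_name: str, include_embedding: bool
) -> Iterator[dict[str, Any]]:
    # One action per record: target index plus the document body in _source
    for record in records:
        source = {key: record[key] for key in ("title", "fullplot", "plot")}
        if include_embedding:
            source["embedding"] = record["embedding"]
        yield {"_index": index_name, "_source": source}


ranges = list(batch_ranges(1452))
```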
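Step 6: Perform vector search on user queries
The following step implements a function that returns vector search results.
If `USE_ELASTICSEARCH_VECTORISATION` is `True`, the text query is sent directly to ES, which first vectorises it with the uploaded model and then performs the vector search. If `USE_ELASTICSEARCH_VECTORISATION` is `False`, we vectorise the query locally and then send the query in its vectorised form.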
def vector_search(plot_query):
    if USE_ELASTICSEARCH_VECTORISATION:
        # Let ES embed the query text with the deployed model
        knn = {
            "field": "embedding.predicted_value",
            "k": 10,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": model_id,
                    "model_text": plot_query,
                }
            },
            "num_candidates": 150,
        }
    else:
        # Embed the query locally and send the vector itself
        question_embedding = get_embedding(plot_query)
        knn = {
            "field": "embedding",
            "query_vector": question_embedding,
            "k": 10,
            "num_candidates": 150,
        }
    response = client.search(index=index_name, knn=knn, size=5)
    results = []
    for hit in response["hits"]["hits"]:
        doc_id = hit["_id"]
        score = hit["_score"]
        title = hit["_source"]["title"]
        plot = hit["_source"]["plot"]
        fullplot = hit["_source"]["fullplot"]
        result = {
            "id": doc_id,
            "_score": score,
            "title": title,
            "plot": plot,
            "fullplot": fullplot,
        }
        results.append(result)
    return results
def pretty_search(query):
    get_knowledge = vector_search(query)
    search_result = ""
    for result in get_knowledge:
        search_result += f"Title: {result.get('title', 'N/A')}, Plot: {result.get('fullplot', 'N/A')}\n"
    return search_result
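Elasticsearch's `knn` search is approximate: it considers `num_candidates` nearest-neighbour candidates per shard and returns the top `k`, of which `size=5` keeps the first five. What it approximates is an exact, brute-force ranking by cosine similarity. Below is a small sketch of that exact version, assuming in-memory vectors. It applies the `(1 + cosine) / 2` transformation that Elasticsearch documents for `cosine` similarity scores, so the numbers sit on the same 0 to 1 scale as `_score`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; undefined for all-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def exact_knn(
    query: list[float], docs: dict[str, list[float]], k: int
) -> list[tuple[str, float]]:
    """Brute-force top-k by cosine, scored on Elasticsearch's (1 + cos) / 2 scale."""
    scored = [(doc_id, (1 + cosine(query, vec)) / 2) for doc_id, vec in docs.items()]
    # Highest score first; ties keep insertion order because sort is stable
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For example, `exact_knn([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0]}, k=1)` returns `[("a", 1.0)]`. Scanning every vector like this is fine for a toy set but is exactly the cost the approximate index avoids at scale.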
Step 7: Handle user queries and load Gemma
>>> # Conduct query with retrieval of sources, combining results into something that
>>> # we can feed to Gemma
>>> def combined_query(query):
...     source_information = pretty_search(query)
...     return f"Query: {query}\nContinue to answer the query by using these Search Results:\n{source_information}."
>>> query = "What is the best romantic movie to watch and why?"
>>> combined_results = combined_query(query)
>>> print(combined_results)
Query: What is the best romantic movie to watch and why?
Continue to answer the query by using these Search Results:
Title: Shut Up and Kiss Me!, Plot: Ryan and Pete are 27-year old best friends in Miami, born on the same day and each searching for the perfect woman. Ryan is a rookie stockbroker living with his psychic Mom. Pete is a slick surfer dude yet to find commitment. Each meets the women of their dreams on the same day. Ryan knocks heads in an elevator with the gorgeous Jessica, passing out before getting her number. Pete falls for the insatiable Tiara, but Tiara's uncle is mob boss Vincent Bublione, charged with her protection. This high-energy romantic comedy asks to what extent will you go for true love?
Title: Titanic, Plot: The plot focuses on the romances of two couples upon the doomed ship's maiden voyage. Isabella Paradine (Catherine Zeta-Jones) is a wealthy woman mourning the loss of her aunt, who reignites a romance with former flame Wynn Park (Peter Gallagher). Meanwhile, a charming ne'er-do-well named Jamie Perse (Mike Doyle) steals a ticket for the ship, and falls for a sweet innocent Irish girl on board. But their romance is threatened by the villainous Simon Doonan (Tim Curry), who has discovered about the ticket and makes Jamie his unwilling accomplice, as well as having sinister plans for the girl.
Title: Dark Blue World, Plot: March 15, 1939: Germany invades Czechoslovakia. Czech and Slovak pilots flee to England, joining the RAF. After the war, back home, they are put in labor camps, suspected of anti-Communist ideas. This film cuts between a post-war camp where Franta is a prisoner and England during the war, where Franta is like a big brother to Karel, a very young pilot. On maneuvers, Karel crash lands by the rural home of Susan, an English woman whose husband is MIA. She spends one night with Karel, and he thinks he's found the love of his life. It's complicated by Susan's attraction to Franta. How will the three handle innocence, Eros, friendship, and the heat of battle? When war ends, what then?
Title: Dark Blue World, Plot: March 15, 1939: Germany invades Czechoslovakia. Czech and Slovak pilots flee to England, joining the RAF. After the war, back home, they are put in labor camps, suspected of anti-Communist ideas. This film cuts between a post-war camp where Franta is a prisoner and England during the war, where Franta is like a big brother to Karel, a very young pilot. On maneuvers, Karel crash lands by the rural home of Susan, an English woman whose husband is MIA. She spends one night with Karel, and he thinks he's found the love of his life. It's complicated by Susan's attraction to Franta. How will the three handle innocence, Eros, friendship, and the heat of battle? When war ends, what then?
Title: No Good Deed, Plot: About a police detective, Jack, who, while doing a friend a favor and searching for a runaway teenager on Turk Street, stumbles upon a bizarre band of criminals about to pull off a bank robbery. Jack finds himself being held hostage while the criminals decide what to do with him, and the leader's beautiful girlfriend, Erin, is left alone to watch Jack. Erin, who we discover is a master manipulator of the men in the gang, reveals another side to Jack - a melancholy romantic who could have been a classical cellist. She finds Jack's captivity an irresistible turn-on and he can't figure out if she's for real, or manipulating him, too. Before the gang returns, Jack and Erin's connection intensifies and who ends up with the money is anyone's guess.
.
Load our LLM (here we use google/gemma-2b-it)
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
# CPU (default): load the model on CPU 👇🏽
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
# GPU: comment out the line above and uncomment the line below (requires `accelerate`, see Step 1) 👇🏽
# model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto")
Define a method that takes the formatted results of the ES vector search and feeds them to the LLM to get our answer.
>>> def rag_query(query):
...     combined_information = combined_query(query)
...     # Tokenize the prompt; append .to("cuda") if the model was loaded on GPU
...     input_ids = tokenizer(combined_information, return_tensors="pt")  # .to("cuda")
...     response = model.generate(**input_ids, max_new_tokens=700)
...     return tokenizer.decode(response[0], skip_special_tokens=True)
>>> print(rag_query("What's a romantic movie that I can watch with my wife?"))
Query: What's a romantic movie that I can watch with my wife?
Continue to answer the query by using these Search Results:
Title: King Solomon's Mines, Plot: Guide Allan Quatermain helps a young lady (Beth) find her lost husband somewhere in Africa. It's a spectacular adventure story with romance, because while they fight with wild animals and cannibals, they fall in love. Will they find the lost husband and finish the nice connection?
Title: Shut Up and Kiss Me!, Plot: Ryan and Pete are 27-year old best friends in Miami, born on the same day and each searching for the perfect woman. Ryan is a rookie stockbroker living with his psychic Mom. Pete is a slick surfer dude yet to find commitment. Each meets the women of their dreams on the same day. Ryan knocks heads in an elevator with the gorgeous Jessica, passing out before getting her number. Pete falls for the insatiable Tiara, but Tiara's uncle is mob boss Vincent Bublione, charged with her protection. This high-energy romantic comedy asks to what extent will you go for true love?
Title: Titanic, Plot: The plot focuses on the romances of two couples upon the doomed ship's maiden voyage. Isabella Paradine (Catherine Zeta-Jones) is a wealthy woman mourning the loss of her aunt, who reignites a romance with former flame Wynn Park (Peter Gallagher). Meanwhile, a charming ne'er-do-well named Jamie Perse (Mike Doyle) steals a ticket for the ship, and falls for a sweet innocent Irish girl on board. But their romance is threatened by the villainous Simon Doonan (Tim Curry), who has discovered about the ticket and makes Jamie his unwilling accomplice, as well as having sinister plans for the girl.
Title: Fortress, Plot: A futuristic prison movie. Protagonist and wife are nabbed at a future US emigration point with an illegal baby during population control. The resulting prison experience is the subject of the movie. The prison is a futuristic one run by a private corporation bent on mind control in various ways.
Title: Varalaaru, Plot: Relationships become entangled in an emotional web.
.
Which movie would you recommend for a romantic evening with your wife? From the provided titles, the movie that would be recommended for a romantic evening with your wife is **King Solomon's Mines**. It's a romantic adventure story with romance, and it's a great choice for a date night.
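The answer above begins by repeating the whole prompt. For a decoder-only model like Gemma, `model.generate` returns the prompt tokens followed by the newly generated ones, and `rag_query` decodes all of them. Here is a toy sketch of the fix on plain token-ID lists; `new_tokens_only` is a hypothetical helper and the IDs are made up.

```python
def new_tokens_only(sequence: list[int], prompt_length: int) -> list[int]:
    # generate() output = prompt tokens + generated tokens; drop the prompt part
    return sequence[prompt_length:]


prompt_ids = [101, 7, 8, 9]
generated = prompt_ids + [42, 43, 44]
answer_ids = new_tokens_only(generated, len(prompt_ids))
```

In `rag_query`, the equivalent would be decoding `response[0][input_ids["input_ids"].shape[-1]:]` instead of `response[0]`, so only Gemma's answer is returned.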
Acknowledgements
This notebook was adapted from:
- MongoDB's RAG guide
- OpenAI's ES RAG guide
- Elasticsearch-labs' guide to loading models from Hugging Face