HuatuoGPT-o1 Medical RAG and Reasoning

Author: Alan Ponnachan

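This notebook walks through an end-to-end example of medical question answering with HuatuoGPT-o1, combining Retrieval-Augmented Generation (RAG) and reasoning. We'll use the HuatuoGPT-o1 model, a Large Language Model (LLM) designed for advanced medical reasoning, to produce detailed and well-structured answers to medical queries.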

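Introduction

HuatuoGPT-o1 is a medical LLM that excels at identifying mistakes, exploring alternative strategies, and refining its answers. It relies on verifiable medical problems and a specialized medical verifier to strengthen its reasoning. This notebook shows how to use HuatuoGPT-o1 in a RAG setting: we first retrieve relevant information from a medical knowledge base, then use the model to generate a reasoned response.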

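Notebook Setup

Important: Before running the code, make sure you are using a GPU runtime for faster performance. Go to "Runtime" -> "Change runtime type" and select "GPU" under "Hardware accelerator".

Let's start by installing the necessary libraries.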

>>> !pip install transformers datasets sentence-transformers scikit-learn --upgrade -q
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2024.10.0 requires fsspec==2024.10.0, but you have fsspec 2024.9.0 which is incompatible.


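Load the Dataset

We'll use the "ChatDoctor-HealthCareMagic-100k" dataset, loaded with the Hugging Face Datasets library. It contains 100,000 real-world patient-doctor interactions, which gives our RAG system a rich knowledge base.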

from datasets import load_dataset

dataset = load_dataset("lavita/ChatDoctor-HealthCareMagic-100k")

Initialize the Models

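We need to initialize two models:

  1. HuatuoGPT-o1: the medical LLM used to generate responses.
  2. Sentence Transformer: an embedding model that creates vector representations of text, which we'll use for retrieval.

import torch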
from transformers import AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer

# Initialize HuatuoGPT-o1
model_name = "FreedomIntelligence/HuatuoGPT-o1-7B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Initialize Sentence Transformer
embed_model = SentenceTransformer("all-MiniLM-L6-v2")

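Prepare the Knowledge Base

We'll build the knowledge base by generating embeddings for the combined question-answer pairs in the dataset.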

>>> import pandas as pd
>>> import numpy as np

>>> # Convert dataset to DataFrame
>>> df = pd.DataFrame(dataset["train"])

>>> # Combine question and answer for context
>>> df["combined"] = df["input"] + " " + df["output"]

>>> # Generate embeddings
>>> print("Generating embeddings for the knowledge base...")
>>> embeddings = embed_model.encode(df["combined"].tolist(), show_progress_bar=True, batch_size=128)
>>> print("Embeddings generated!")
Generating embeddings for the knowledge base...
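
The combination step can also be sketched without pandas, which makes it easier to see how missing fields might be handled. This is only an illustrative sketch: the records are made-up stand-ins rather than rows from the dataset, and `combine_record` is a hypothetical helper. Only the `input` and `output` field names come from the notebook code.

```python
def combine_record(record: dict) -> str:
    """Join the patient question and the doctor answer into one string."""
    # Treat missing or None fields as empty strings so concatenation never fails.
    question = (record.get("input") or "").strip()
    answer = (record.get("output") or "").strip()
    return f"{question} {answer}".strip()


# Made-up examples for illustration only (not taken from the dataset).
records = [
    {"input": "I have a sore throat.", "output": "Drink warm fluids and rest."},
    {"input": "Is a mild fever dangerous?", "output": None},
]

combined = [combine_record(r) for r in records]
for text in combined:
    print(text)
```

A list like `combined` is the kind of input that `embed_model.encode` receives in the cell above.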

Implement Retrieval

This function uses cosine similarity to retrieve the k contexts most relevant to a given query.

from sklearn.metrics.pairwise import cosine_similarity


def retrieve_relevant_contexts(query: str, k: int = 3) -> list:
    """
    Retrieves the k most relevant contexts to a given query.

    Args:
        query (str): The user's medical query.
        k (int): The number of relevant contexts to retrieve.

    Returns:
        list: A list of dictionaries, each containing a relevant context.
    """
    # Generate query embedding
    query_embedding = embed_model.encode([query])[0]

    # Calculate similarities
    similarities = cosine_similarity([query_embedding], embeddings)[0]

    # Get top k similar contexts
    top_k_indices = np.argsort(similarities)[-k:][::-1]

    contexts = []
    for idx in top_k_indices:
        contexts.append(
            {
                "question": df.iloc[idx]["input"],
                "answer": df.iloc[idx]["output"],
                "similarity": similarities[idx],
            }
        )

    return contexts
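
To make the ranking step concrete, here is a minimal pure-Python sketch of cosine-similarity top-k selection on toy vectors. It is not the notebook's implementation (which uses scikit-learn and NumPy). The helper names `cosine` and `top_k` and the 2-D vectors are invented for illustration, and returning 0.0 for a zero vector is a convention chosen for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k most similar vectors, best first."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy 2-D "embeddings", invented for illustration.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.2]

ranked = top_k(query, docs, k=2)
print([i for i, _ in ranked])
```

Sorting in descending order and slicing the first k entries plays the same role as `np.argsort(similarities)[-k:][::-1]` in the function above.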

Implement Response Generation

This function generates a detailed response from the retrieved contexts.

def generate_structured_response(query: str, contexts: list) -> str:
    """
    Generates a detailed response using the retrieved contexts.

    Args:
        query (str): The user's medical query.
        contexts (list): A list of relevant contexts.

    Returns:
        str: The generated response.
    """
    # Prepare prompt with retrieved contexts
    context_prompt = "\n".join(
        [
            f"Reference {i+1}:" f"\nQuestion: {ctx['question']}" f"\nAnswer: {ctx['answer']}"
            for i, ctx in enumerate(contexts)
        ]
    )

    prompt = f"""Based on the following references and your medical knowledge, provide a detailed response:

References:
{context_prompt}

Question: {query}

By considering:
1. The key medical concepts in the question.
2. How the reference cases relate to this question.
3. What medical principles should be applied.
4. Any potential complications or considerations.

Give the final response:
"""

    # Generate response
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer(
        tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True),
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.7,
        num_beams=1,
        do_sample=True,
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract the final response portion
    final_response = response.split("Give the final response:\n")[-1]

    return final_response
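
The function above decodes the full sequence (prompt plus completion) and then splits on a prompt string. The sample output in this notebook begins with a stray `assistant` line, which likely comes from the chat template's role marker surviving that split. A common alternative is to drop the prompt tokens before decoding. The sketch below shows the idea on plain lists standing in for token IDs. `strip_prompt_tokens` is a hypothetical helper, and the commented lines show how it might map onto the notebook's variables (an assumption, not run here).

```python
def strip_prompt_tokens(output_ids, prompt_length):
    """Return only the tokens generated after the prompt."""
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Toy demonstration: plain lists stand in for token ID tensors.
prompt_ids = [101, 7592, 2088]
generated = prompt_ids + [2023, 2003, 102]

new_tokens = strip_prompt_tokens(generated, len(prompt_ids))
print(new_tokens)

# In the notebook, the equivalent slicing would look roughly like this (assumption):
# new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
# response = tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Slicing by token count does not depend on the exact wording of the prompt, so it keeps working if the prompt template changes.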

Putting It All Together

Let's define a function that processes a query end-to-end, then try it on an example.

>>> def process_query(query: str, k: int = 3) -> tuple:
...     """
...     Processes a medical query end-to-end.

...     Args:
...         query (str): The user's medical query.
...         k (int): The number of relevant contexts to retrieve.

...     Returns:
...         tuple: The generated response and the retrieved contexts.
...     """
...     contexts = retrieve_relevant_contexts(query, k)
...     response = generate_structured_response(query, contexts)
...     return response, contexts


>>> # Example query
>>> query = "I've been experiencing persistent headaches and dizziness for the past week. What could be the cause?"

>>> # Process query
>>> response, contexts = process_query(query)

>>> # Print results
>>> print("\nQuery:", query)
>>> print("\nRelevant Contexts:")
>>> for i, ctx in enumerate(contexts, 1):
...     print(f"\nReference {i} (Similarity: {ctx['similarity']:.3f}):")
...     print(f"Q: {ctx['question']}")
...     print(f"A: {ctx['answer']}")

>>> print("\nGenerated Response:")
>>> print(response)
Query: I've been experiencing persistent headaches and dizziness for the past week. What could be the cause?

Relevant Contexts:

Reference 1 (Similarity: 0.687):
Q: Dizziness, sometimes severe, nausea, sometimes severe. Very close to throwing up at times, but not actually doing it. Headache. No pain anywhere, and it comes and goes a couple times in a day. I v had this about a week. I am well hydrated. I v been diagnosed with vertigo years ago, but it went away years ago, and this is nothing like that was. I feel okay between episodes, but tired. I have been laying down and sleeping when it happens, and seem ok when I get back up. It s been hit and miss, meaning not everyday. I haven t changed my diet or products
A: Hello! Thank you for asking on Chat Doctor! I carefully read your question and would explain that your symptoms could be related to an inner ear disorder or an inflammatory disorder, causing the headache. Coming to this point, I would recommend consulting with an ENT specialist for a careful physical exam and labyrinthine tests to exclude possible inner ear disorder. Further, tests to be done are

Reference 2 (Similarity: 0.673):
Q: I have been having dizzy spells , bad headache I collapsed on the train the other day and went to hospital but hey couldnt find anything in my blood or brain scan the headache has been coming and going for about one month but te dizziness only started three days ago
A: Hello! Welcome and thank you for asking on Chat Doctor ! Your symptoms could be related to low blood pressure or orthostatic hypotension. An inner ear disorder can not be excluded too, considering the dizzy spells. For this reason, I would recommend first consulting with an ENT specialist for a physical check up and labyrinthine tests. Other tests to consider would be a Head Up Tilt test for orthostatic hypotension, especially if your blood pressure values Chat Doctor.  Hope you will find this answer helpful! Best wishes,

Reference 3 (Similarity: 0.672):
Q: over the past two weeks or so I have had an experience of what I believe is vertigo. The first time I was mowing my lawn on a riding tractor and made a turn in the yard and felt like I was swaying back and forth. It lasted just a few minutes and thankfully I had a good grip on the stearing wheel. The second time was today, I was sitting at my desk at work and all of a sudden it seemed as though my desk was wobbiling back and forth. It wasn t the desk it was me. The first time it happened I do not recall having a headache but today I have had just a slight headache most of the day. Any suggestions?
A: Hi, There can be many causes of vertigo. One of the most common causes is diseases associated with ear like labyrinthine (infection of the ear), vestibular neuritis (inflammation of the nerves) or BPPV (benign positional vertigo). It can also be related to diseases of brain (infection or swelling) or heart disorders (arrhythmia-rhythm disturbances) or cervical spondylosis (neck posture related issues). Besides this, there are simpler causes like anemia (low hemoglobin), hypoglycemia (low sugar), prolonged fasting, excessive heat, stress, anxiety or lack of proper sleep. Hence, I feel, first, focus on lifestyle modifications. Have a good balanced diet with lots of fruits and vegetables and less of tea and coffee. Maintain proper posture while working and sleeping, take good sleep for 7-8 hours, do some meditation or go out for a walk. If still the symptoms persist then do go for some investigations like-complete blood count, sugar levels, electrolytes, ECG, X-ray cervical spine and MRI brain. This will help us to make a proper diagnosis. Take care. Hope I have answered your question. Let me know if I can assist you further.

Generated Response:

assistant
## Thinking

Alright, let's think about this. So, we're dealing with someone who's been having these bouts of dizziness and headaches for about a week now. That sounds pretty uncomfortable. Dizziness and headaches can come from a bunch of different things, right? Like, maybe it's something to do with the inner ear, or maybe it's a bit more systemic, like a problem with blood pressure or even something neurological.

Okay, let's break it down. Inner ear problems, like vertigo, are pretty common culprits here. They can definitely cause dizziness and sometimes headaches, although they usually don't last forever. But since this person says their symptoms are hitting and missing, it might not be exactly the same as their old vertigo.

Now, let's consider the possibility of something like orthostatic hypotension. That's where your blood pressure drops when you stand up, and it can make you dizzy. But the thing is, if this were orthostatic hypotension, we'd expect the dizziness to happen every time they stand up, which isn't quite the case here. Plus, the headaches are a bit of a wildcard.

Hmm, what else could it be? Maybe anemia or hypoglycemia. Those can cause dizziness and headaches too. But again, without any major changes in diet or lifestyle, it's hard to say if that's really it.

Let's see, what else should we think about? Oh, right, the person mentions they've had their blood checked and a brain scan, but nothing showed up. That rules out a lot of serious stuff like infections or brain issues, which is good news. But it also means we have to keep looking at other possibilities.

Given all this, it seems like the best course of action is to consult an ENT specialist. They can do some tests specific to inner ear disorders, which might shed some light on what's going on. And if those tests don't reveal anything, maybe we should look into things like lifestyle changes, especially around diet and hydration.

So, in summary, it looks like we need to keep an eye on things. The dizziness and headaches could be due to an inner ear issue or something systemic. Consulting a specialist and making some lifestyle adjustments might help figure out what's causing these symptoms.

## Final Response

The symptoms of dizziness, headaches, and occasional nausea you are experiencing could be related to several underlying conditions. Based on the information provided, it appears that an inner ear disorder, such as benign paroxysmal positional vertigo (BPPV) or vestibular neuritis, is a plausible explanation. These conditions can cause episodes of dizziness and sometimes headaches, although they typically resolve on their own or improve with treatment.

Another consideration is orthostatic hypotension, which involves a drop in blood pressure upon standing, potentially causing dizziness. However, given that your symptoms do not consistently occur with changes in position, this is less likely.

Systemic factors, such as anemia or hypoglycemia, could also contribute to dizziness and headaches. Since these conditions can be influenced by dietary and lifestyle factors, maintaining a balanced diet, staying hydrated, and ensuring adequate rest may help alleviate symptoms.

To better understand the nature of your symptoms, it would be advisable to consult with an ENT specialist for a thorough examination and possibly labyrinthine tests to assess any inner ear issues. Additionally, considering a Head-Up Tilt test for orthostatic hypotension and evaluating other systemic factors through appropriate blood tests and scans could provide further insights. 

In summary, while the exact cause remains unclear, exploring options like an ENT consultation and adjusting lifestyle factors may aid in managing your symptoms.
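
The generated text above contains a "## Thinking" section followed by a "## Final Response" section. If only the final answer should be shown to a user, the two parts can be separated with plain string handling. The sketch below assumes the model emits exactly these two headers, as in this sample. That is not guaranteed for every generation, so the hypothetical helper `split_reasoning` falls back to returning the whole text as the final answer.

```python
def split_reasoning(text, thinking_header="## Thinking", final_header="## Final Response"):
    """Split model output into (thinking, final).

    Falls back to ("", text) when the final-response header is absent.
    """
    final_pos = text.rfind(final_header)
    if final_pos == -1:
        return "", text.strip()

    final = text[final_pos + len(final_header):].strip()
    before = text[:final_pos]

    think_pos = before.find(thinking_header)
    if think_pos == -1:
        thinking = before.strip()
    else:
        thinking = before[think_pos + len(thinking_header):].strip()
    return thinking, final


# Shortened, made-up text in the same shape as the sample output.
sample = (
    "assistant\n## Thinking\n\nConsider inner ear causes.\n\n"
    "## Final Response\n\nPlease consult an ENT specialist."
)

thinking, final = split_reasoning(sample)
print(final)
```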

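Conclusion

This notebook demonstrates a practical application of HuatuoGPT-o1 to medical question answering, combining RAG and reasoning. By pairing retrieval from a relevant knowledge base with HuatuoGPT-o1's advanced reasoning capabilities, we can build a system that gives detailed, well-structured answers to complex medical queries.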

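You can further enhance this system by:

  • Experimenting with different values of k (the number of retrieved contexts).
  • Fine-tuning HuatuoGPT-o1 on a specific medical domain.
  • Evaluating the system's performance on medical benchmarks.
  • Adding a user interface for easier interaction.
  • Improving the existing code by handling edge cases.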
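
As one possible starting point for the edge-case item, the sketch below validates the query and filters the retrieved contexts before they reach the prompt. The helpers `validate_query` and `filter_contexts` are hypothetical, and the `min_similarity` and `max_chars` values are arbitrary placeholders, not tuned recommendations. Only the `question`, `answer`, and `similarity` keys come from the notebook's context dictionaries.

```python
def validate_query(query):
    """Reject empty or whitespace-only queries and return the cleaned text."""
    cleaned = (query or "").strip()
    if not cleaned:
        raise ValueError("Query must not be empty.")
    return cleaned


def filter_contexts(contexts, min_similarity=0.5, max_chars=1500):
    """Drop weak matches and trim long answers before building the prompt."""
    kept = []
    for ctx in contexts:
        if ctx["similarity"] < min_similarity:
            continue  # skip contexts that are only loosely related
        trimmed = dict(ctx)  # copy so the original context is not modified
        trimmed["answer"] = ctx["answer"][:max_chars]
        kept.append(trimmed)
    return kept


# Made-up contexts in the same shape as those returned by the retrieval function.
contexts = [
    {"question": "q1", "answer": "a" * 2000, "similarity": 0.69},
    {"question": "q2", "answer": "short", "similarity": 0.31},
]

usable = filter_contexts(contexts)
print(len(usable), len(usable[0]["answer"]))
```

If filtering leaves no contexts, the caller would need to decide whether to answer from the model's own knowledge or decline. That choice is left open here.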

Feel free to adapt and extend this example to build more powerful and helpful medical AI applications!
