Llama.cpp

Feature	Available
Tools	No
Multimodal	No

Chat UI supports the llama.cpp API server directly, without an adapter. To use it, set the endpoint type to llamacpp.

To run Chat UI with llama.cpp, follow the steps below, which use microsoft/Phi-3-mini-4k-instruct-gguf as an example model:

# install llama.cpp
brew install llama.cpp
# start llama.cpp server
llama-server --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -c 4096

Note: you can replace hf-repo and hf-file with any GGUF model you like on the Hub. For example, use --hf-repo TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF for the repository and --hf-file tinyllama-1.1b-chat-v1.0.Q4_0.gguf for the file.
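When you swap in a different model or change the value passed to -c, it can help to check that the generation settings still fit the context window. The sketch below assumes that the prompt budget (truncate) plus the generation budget (max_new_tokens) should not exceed the context size. The function name fits_context is hypothetical, and the first set of values is the one used for the Phi-3 example on this page.

```python
def fits_context(context_size: int, truncate: int, max_new_tokens: int) -> bool:
    """Return True if prompt tokens plus generated tokens fit in the context window.

    Assumption (not stated by the source): the server needs
    truncate + max_new_tokens <= context_size.
    """
    return truncate + max_new_tokens <= context_size


# Values from the Phi-3 example: -c 4096, truncate 3071, max_new_tokens 1024.
print(fits_context(4096, 3071, 1024))  # 3071 + 1024 = 4095, which fits in 4096

# A hypothetical smaller window (-c 2048) would not fit the same settings.
print(fits_context(2048, 3071, 1024))
```

If the check fails for your model, lowering truncate or max_new_tokens in the model configuration is one way to bring the total back under the context size.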

A local llama.cpp HTTP server will start on http://localhost:8080. To change the port or any other default option, see the llama.cpp HTTP server README.
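Before pointing Chat UI at the server, you may want to confirm that it is up. The following is a minimal sketch, assuming the server is reachable at http://localhost:8080 and exposes a GET /health endpoint that returns JSON with "status": "ok" once the model is loaded, as the llama.cpp server README describes. The helper names health_url and is_server_ready are hypothetical.

```python
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL for a llama.cpp server base URL."""
    return base_url.rstrip("/") + "/health"


def is_server_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers its health endpoint with status "ok"."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error (for example while the model
        # is still loading), a malformed URL, or a non-JSON response:
        # treat all of these as "not ready".
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```

Calling is_server_ready("http://localhost:8080") after starting llama-server should return True once the model has finished loading, and False if nothing is listening on that port.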

Add the following to your .env.local file:

MODELS=`[
  {
    "name": "Local microsoft/Phi-3-mini-4k-instruct-gguf",
    "tokenizer": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "preprompt": "",
    "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
      "stop": ["<|end|>", "<|endoftext|>", "<|assistant|>"],
      "temperature": 0.7,
      "max_new_tokens": 1024,
      "truncate": 3071
    },
    "endpoints": [{
      "type" : "llamacpp",
      "baseURL": "http://localhost:8080"
    }],
  },
]`
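To see what the chatPromptTemplate above produces, it can be expanded by hand. The sketch below mirrors the template's logic in plain Python. It is not Chat UI's own rendering code, and the message shape (dicts with "from" and "content" keys) and the function name render_phi3_prompt are assumptions made for illustration.

```python
def render_phi3_prompt(messages, preprompt=""):
    """Expand the Phi-3 chat template by hand for a list of messages.

    Each message is a dict with "from" ("user" or "assistant") and "content".
    This shape is an assumption for illustration, not Chat UI's internal type.
    """
    parts = ["<s>", preprompt]
    for message in messages:
        if message["from"] == "user":
            # User turns end with the assistant tag so the model answers next.
            parts.append(f"<|user|>\n{message['content']}<|end|>\n<|assistant|>\n")
        elif message["from"] == "assistant":
            parts.append(f"{message['content']}<|end|>\n")
    return "".join(parts)


conversation = [
    {"from": "user", "content": "Hello"},
    {"from": "assistant", "content": "Hi there!"},
    {"from": "user", "content": "What is GGUF?"},
]
print(render_phi3_prompt(conversation))
```

The rendered prompt ends with <|assistant|> followed by a newline, so the model continues as the assistant. The stop entries in parameters then cut generation at <|end|>, <|endoftext|>, or a stray <|assistant|> tag.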
