Huggingface.js

@huggingface/gguf

A GGUF parser that works on remotely hosted files.

Spec

Spec: https://github.com/ggerganov/ggml/blob/master/docs/gguf.md

Reference implementation (Python): https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/gguf_reader.py
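
To make the spec more concrete, here is a minimal sketch of reading the fixed-size GGUF header from an in-memory buffer. It assumes a little-endian file of version 2 or later, where the two counts are 64-bit. `parseHeader` and `makeHeader` are hypothetical helpers written for this illustration; they are not part of @huggingface/gguf, which handles this (and remote range requests) for you.

```typescript
// Fixed-size GGUF header fields (version 2+, little-endian assumed).
interface GGUFHeader {
  version: number; // uint32
  tensorCount: bigint; // uint64
  kvCount: bigint; // uint64
}

const GGUF_MAGIC = [0x47, 0x47, 0x55, 0x46]; // ASCII "GGUF"

// Hypothetical helper, not part of @huggingface/gguf.
function parseHeader(buffer: ArrayBuffer): GGUFHeader {
  if (buffer.byteLength < 24) {
    throw new Error("buffer too small for a GGUF header");
  }
  const view = new DataView(buffer);
  // The first four bytes must spell "GGUF".
  for (let i = 0; i < 4; i++) {
    if (view.getUint8(i) !== GGUF_MAGIC[i]) {
      throw new Error("not a GGUF file: bad magic");
    }
  }
  return {
    version: view.getUint32(4, true),
    tensorCount: view.getBigUint64(8, true),
    kvCount: view.getBigUint64(16, true),
  };
}

// Hypothetical helper: builds a synthetic 24-byte header to exercise the parser.
function makeHeader(version: number, tensorCount: bigint, kvCount: bigint): ArrayBuffer {
  const buffer = new ArrayBuffer(24);
  const view = new DataView(buffer);
  GGUF_MAGIC.forEach((byte, i) => view.setUint8(i, byte));
  view.setUint32(4, version, true);
  view.setBigUint64(8, tensorCount, true);
  view.setBigUint64(16, kvCount, true);
  return buffer;
}

const header = parseHeader(makeHeader(2, 291n, 19n));
console.log(header);
```

The three header fields correspond to the `version`, `tensor_count`, and `kv_count` entries that the parser reports in its metadata output.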

Install

npm install @huggingface/gguf

Usage

Basic usage

import { GGMLQuantizationType, gguf } from "@huggingface/gguf";

// remote GGUF file from https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF
const URL_LLAMA = "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/191239b/llama-2-7b-chat.Q2_K.gguf";

const { metadata, tensorInfos } = await gguf(URL_LLAMA);

console.log(metadata);
// {
//     version: 2,
//     tensor_count: 291n,
//     kv_count: 19n,
//     "general.architecture": "llama",
//     "general.file_type": 10,
//     "general.name": "LLaMA v2",
//     ...
// }

console.log(tensorInfos);
// [
//     {
//         name: "token_embd.weight",
//         shape: [4096n, 32000n],
//         dtype: GGMLQuantizationType.Q2_K,
//     },

//     ... ,

//     {
//         name: "output_norm.weight",
//         shape: [4096n],
//         dtype: GGMLQuantizationType.F32,
//     }
// ]
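
Because tensor shapes are reported as `bigint` values, any arithmetic on them has to stay in `bigint` as well. The sketch below sums the element counts of a list of tensors. `TensorShapeInfo` and `countElements` are hypothetical names introduced for this example; the interface only covers the two fields used here, and the sample data is copied from the output above rather than fetched.

```typescript
// Structural type covering only the fields used here; the library's own
// tensor info objects carry more (for example dtype).
interface TensorShapeInfo {
  name: string;
  shape: bigint[];
}

// Hypothetical helper, not part of @huggingface/gguf.
// Multiplies the dimensions of each tensor and sums the results.
function countElements(tensors: TensorShapeInfo[]): bigint {
  return tensors.reduce(
    (total, tensor) => total + tensor.shape.reduce((n, dim) => n * dim, 1n),
    0n,
  );
}

const sample: TensorShapeInfo[] = [
  { name: "token_embd.weight", shape: [4096n, 32000n] },
  { name: "output_norm.weight", shape: [4096n] },
];

console.log(countElements(sample)); // 131076096n
```

With a real file, passing the `tensorInfos` array returned by `gguf()` should work the same way, since it exposes `name` and `shape` in this form.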

Reading a local file

// Reading a local file (not supported in the browser).
const { metadata, tensorInfos } = await gguf(
  './my_model.gguf',
  { allowLocalFile: true },
);

Typed metadata

You can get metadata with type information by setting typedMetadata: true. This provides both the original value and its GGUF data type:

import { GGMLQuantizationType, GGUFValueType, gguf } from "@huggingface/gguf";

const URL_LLAMA = "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/191239b/llama-2-7b-chat.Q2_K.gguf";

const { metadata, typedMetadata } = await gguf(URL_LLAMA, { typedMetadata: true });

console.log(typedMetadata);
// {
//     version: { value: 2, type: GGUFValueType.UINT32 },
//     tensor_count: { value: 291n, type: GGUFValueType.UINT64 },
//     kv_count: { value: 19n, type: GGUFValueType.UINT64 },
//     "general.architecture": { value: "llama", type: GGUFValueType.STRING },
//     "general.file_type": { value: 10, type: GGUFValueType.UINT32 },
//     "general.name": { value: "LLaMA v2", type: GGUFValueType.STRING },
//     "llama.attention.head_count": { value: 32, type: GGUFValueType.UINT32 },
//     "llama.attention.layer_norm_rms_epsilon": { value: 9.999999974752427e-7, type: GGUFValueType.FLOAT32 },
//     "tokenizer.ggml.tokens": { value: ["<unk>", "<s>", "</s>", ...], type: GGUFValueType.ARRAY, subType: GGUFValueType.STRING },
//     "tokenizer.ggml.scores": { value: [0.0, -1000.0, -1000.0, ...], type: GGUFValueType.ARRAY, subType: GGUFValueType.FLOAT32 },
//     ...
// }

// Access both value and type information
console.log(typedMetadata["general.architecture"].value); // "llama"
console.log(typedMetadata["general.architecture"].type);  // GGUFValueType.STRING (8)

// For arrays, subType indicates the type of array elements
console.log(typedMetadata["tokenizer.ggml.tokens"].type);    // GGUFValueType.ARRAY (9)  
console.log(typedMetadata["tokenizer.ggml.tokens"].subType); // GGUFValueType.STRING (8)
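
The numeric ids shown in the comments above (8 for STRING, 9 for ARRAY) come from the value type table in the GGUF spec. As a rough sketch, the following turns a typed entry into a readable label. `VALUE_TYPE_NAMES`, `TypedEntry`, and `describeType` are hypothetical names for this illustration; the name table is a local copy of the spec's ordering, and in real code you would use the exported `GGUFValueType` enum instead.

```typescript
// Local copy of the GGUF value type ids, indexed by id (0 = UINT8 ... 12 = FLOAT64).
// Assumption: mirrors the spec's table; prefer GGUFValueType from the package.
const VALUE_TYPE_NAMES: readonly string[] = [
  "UINT8", "INT8", "UINT16", "INT16", "UINT32", "INT32", "FLOAT32",
  "BOOL", "STRING", "ARRAY", "UINT64", "INT64", "FLOAT64",
];

const ARRAY_TYPE_ID = 9;

// Minimal shape of one typed metadata entry, as shown in the output above.
interface TypedEntry {
  value: unknown;
  type: number;
  subType?: number;
}

function typeName(id: number): string {
  return VALUE_TYPE_NAMES[id] ?? `UNKNOWN(${id})`;
}

// Hypothetical helper, not part of @huggingface/gguf.
function describeType(entry: TypedEntry): string {
  if (entry.type === ARRAY_TYPE_ID && entry.subType !== undefined) {
    return `ARRAY<${typeName(entry.subType)}>`;
  }
  return typeName(entry.type);
}

console.log(describeType({ value: "llama", type: 8 })); // STRING
console.log(describeType({ value: ["<unk>", "<s>"], type: 9, subType: 8 })); // ARRAY<STRING>
```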

Strictly typed

By default, known fields in metadata are typed. This includes various fields found in llama.cpp, whisper.cpp and ggml.

// URL_MODEL is the URL of any GGUF file
const { metadata, tensorInfos } = await gguf(URL_MODEL);

// Type check for model architecture at runtime
if (metadata["general.architecture"] === "llama") {

  // "llama.attention.head_count" is a valid key for llama architecture, this is typed as a number
  console.log(metadata["llama.attention.head_count"]);

  // "mamba.ssm.conv_kernel" is an invalid key, because it requires model architecture to be mamba
  console.log(metadata["mamba.ssm.conv_kernel"]); // error
}
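
The narrowing above relies on a standard TypeScript feature: a union of object types discriminated by a literal-typed key. The sketch below shows the mechanism with two deliberately tiny stand-in types. `LlamaMeta`, `MambaMeta`, `ModelMeta`, and `summarize` are hypothetical names for this illustration and are much simpler than the types the package actually ships.

```typescript
// Simplified stand-ins for architecture-specific metadata.
// Assumption: the real library types cover many more keys and architectures.
type LlamaMeta = {
  "general.architecture": "llama";
  "llama.attention.head_count": number;
};
type MambaMeta = {
  "general.architecture": "mamba";
  "mamba.ssm.conv_kernel": number;
};
type ModelMeta = LlamaMeta | MambaMeta;

function summarize(meta: ModelMeta): string {
  // Switching on the discriminant narrows `meta` to one member of the union,
  // so only that architecture's keys are accessible in each branch.
  switch (meta["general.architecture"]) {
    case "llama":
      return `llama with ${meta["llama.attention.head_count"]} attention heads`;
    case "mamba":
      return `mamba with conv kernel ${meta["mamba.ssm.conv_kernel"]}`;
  }
}

const llamaMeta: ModelMeta = {
  "general.architecture": "llama",
  "llama.attention.head_count": 32,
};
console.log(summarize(llamaMeta)); // llama with 32 attention heads
```

Reading `meta["mamba.ssm.conv_kernel"]` inside the "llama" branch would be rejected by the compiler, which is the same kind of error the example above points out.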

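Disable strictly typed

Because the GGUF format can be used to store tensors, it can technically be used for other purposes, for example storing control vectors, LoRA weights, etc.

If you want to use your own GGUF metadata structure, you can disable strict typing by casting the parse output to GGUFParseOutput<{ strict: false }>: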

const { metadata, tensorInfos }: GGUFParseOutput<{ strict: false }> = await gguf(URL_LLAMA);

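Command line interface

This package provides a CLI equivalent to the gguf_dump.py script. You can use this command to dump the GGUF metadata and the list of tensors: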

npx @huggingface/gguf my_model.gguf

# or, with a remote GGUF file:
# npx @huggingface/gguf https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf

Example output:

* Dumping 36 key/value pair(s)
  Idx | Count  | Value                                                                            
  ----|--------|----------------------------------------------------------------------------------
    1 |      1 | version = 3                                                                      
    2 |      1 | tensor_count = 292                                                               
    3 |      1 | kv_count = 33                                                                    
    4 |      1 | general.architecture = "llama"                                                   
    5 |      1 | general.type = "model"                                                           
    6 |      1 | general.name = "Meta Llama 3.1 8B Instruct"                                      
    7 |      1 | general.finetune = "Instruct"                                                    
    8 |      1 | general.basename = "Meta-Llama-3.1"                                                   

[truncated]

* Dumping 292 tensor(s)
  Idx | Num Elements | Shape                          | Data Type | Name                     
  ----|--------------|--------------------------------|-----------|--------------------------
    1 |           64 |     64,      1,      1,      1 | F32       | rope_freqs.weight        
    2 |    525336576 |   4096, 128256,      1,      1 | Q4_K      | token_embd.weight        
    3 |         4096 |   4096,      1,      1,      1 | F32       | blk.0.attn_norm.weight   
    4 |     58720256 |  14336,   4096,      1,      1 | Q6_K      | blk.0.ffn_down.weight

[truncated]

Alternatively, you can install this package globally, which provides the gguf-view command:

npm i -g @huggingface/gguf
gguf-view my_model.gguf

Hugging Face Hub

The Hub supports all file formats and has built-in features for the GGUF format.

For more information, see: https://huggingface.co/docs/hub/gguf

Acknowledgements & Inspirations

https://github.com/hyparam/hyllama by @platypii (MIT license)
https://github.com/ahoylabs/gguf.js by @biw @dkogut1996 @spencekim (MIT license)

🔥❤️
