Transformers.js documentation

You are viewing a version that requires installation from source. If you'd like a regular npm install, check out the latest stable version (v3.0.0).

utils/data-structures

Custom data structures.

These are only used internally, so end users should not need to access anything here.

utils/data-structures
- static
  - .PriorityQueue
    - new PriorityQueue(comparator)
    - .size
    - .isEmpty() ⇒ boolean
    - .peek() ⇒ any
    - .push(...values) ⇒ number
    - .extend(values) ⇒ number
    - .pop() ⇒ any
    - .replace(value) ⇒ *
    - ._siftUpFrom(node)
  - .CharTrie
    - .extend(texts)
    - .push(text)
    - .commonPrefixSearch(text)
  - .TokenLattice
    - new TokenLattice(sentence, bosTokenId, eosTokenId)
    - .insert(pos, length, score, tokenId)
    - .viterbi() ⇒ Array.<TokenLatticeNode>
    - .piece(node) ⇒ string
    - .tokens() ⇒ Array.<string>
    - .tokenIds() ⇒ Array.<number>
  - .DictionarySplitter
    - new DictionarySplitter(dictionary)
    - .split(text) ⇒ Array.<string>
  - .LRUCache
    - new LRUCache(capacity)
    - .get(key) ⇒ any
    - .put(key, value)
    - .clear()
- inner
  - ~CharTrieNode
    - new CharTrieNode(isLeaf, children)
    - .default() ⇒ CharTrieNode
  - ~TokenLatticeNode
    - new TokenLatticeNode(tokenId, nodeId, pos, length, score)
    - .clone() ⇒ TokenLatticeNode

utils/data-structures.PriorityQueue

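An efficient heap-based implementation of a priority queue. It uses an array-based binary heap, where the root is at index 0 and the children of node i are located at indices 2i + 1 and 2i + 2.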
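The index arithmetic can be illustrated with a small check of the heap property. This is a minimal sketch, not code taken from Transformers.js: `leftChild`, `rightChild`, `parent` and `isValidHeap` are hypothetical names, and the comparator is assumed to return `true` when its first argument has higher priority, so the default `(a, b) => a > b` describes a max-heap.

```javascript
// Children of node i live at 2i + 1 and 2i + 2; the parent of node i > 0 is at floor((i - 1) / 2).
const leftChild = (i) => 2 * i + 1;
const rightChild = (i) => 2 * i + 2;
const parent = (i) => Math.floor((i - 1) / 2);

// Hypothetical helper: returns true if no child outranks its parent under `comparator`.
// Assumption: comparator(a, b) is true when `a` has higher priority than `b`.
function isValidHeap(heap, comparator = (a, b) => a > b) {
  for (let i = 0; i < heap.length; ++i) {
    for (const child of [leftChild(i), rightChild(i)]) {
      if (child < heap.length && comparator(heap[child], heap[i])) {
        return false;
      }
    }
  }
  return true;
}

// [9, 5, 8, 1, 3]: node 0 (9) has children 5 and 8; node 1 (5) has children 1 and 3.
console.log(isValidHeap([9, 5, 8, 1, 3])); // true
console.log(isValidHeap([9, 5, 8, 6, 3])); // false: 6 at index 3 outranks its parent 5 at index 1
console.log(parent(4)); // 1
```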

Adapted from the following sources:

- https://stackoverflow.com/a/42919752/13989043 (original)
- https://github.com/belladoreai/llama-tokenizer-js (minor improvements)

Kind: static class of utils/data-structures

.PriorityQueue
- new PriorityQueue(comparator)
- .size
- .isEmpty() ⇒ boolean
- .peek() ⇒ any
- .push(...values) ⇒ number
- .extend(values) ⇒ number
- .pop() ⇒ any
- .replace(value) ⇒ *
- ._siftUpFrom(node)

new PriorityQueue(comparator)

Creates a new PriorityQueue.

Param	Type	Description
comparator	`function`	Comparator function used to determine priority. Defaults to a max-heap.

priorityQueue.size

The size of the queue.

Kind: instance property of PriorityQueue

priorityQueue.isEmpty() ⇒ boolean

Checks whether the queue is empty.

Kind: instance method of PriorityQueue
Returns: boolean - true if the queue is empty, false otherwise.

priorityQueue.peek() ⇒ any

Returns the element with the highest priority in the queue.

Kind: instance method of PriorityQueue
Returns: any - The element with the highest priority in the queue.

priorityQueue.push(...values) ⇒ number

Adds one or more elements to the queue.

Kind: instance method of PriorityQueue
Returns: number - The new size of the queue.

Param	Type	Description
...values	`any`	The values to push into the queue.

priorityQueue.extend(values) ⇒ number

Adds multiple elements to the queue.

Kind: instance method of PriorityQueue
Returns: number - The new size of the queue.

Param	Type	Description
values	`Array.<any>`	The values to push into the queue.

priorityQueue.pop() ⇒ any

Removes and returns the element with the highest priority in the queue.

Kind: instance method of PriorityQueue
Returns: any - The element with the highest priority in the queue.

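priorityQueue.replace(value) ⇒ *

Replaces the element with the highest priority in the queue with a new value.

Kind: instance method of PriorityQueue
Returns: * - The replaced value.

Param	Type	Description
value	`*`	The new value.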

priorityQueue._siftUpFrom(node)

Helper function to sift up from a given node.

Kind: instance method of PriorityQueue

Param	Type	Description
node	`number`	The index of the node to start sifting up from.
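Putting the documented members together, a priority queue along these lines could look like the sketch below. It is an illustrative reimplementation under assumptions, not the library's source: the class name `SimplePriorityQueue` and the helpers `_greater`, `_swap` and `_siftDown` are hypothetical, and the comparator is assumed to return `true` when its first argument should be popped first.

```javascript
// Array-backed binary heap: a sketch, not the Transformers.js source.
// Assumption: comparator(a, b) returns true when `a` should be popped before `b`.
class SimplePriorityQueue {
  constructor(comparator = (a, b) => a > b) {
    this._heap = [];
    this._comparator = comparator;
  }

  get size() {
    return this._heap.length;
  }

  isEmpty() {
    return this.size === 0;
  }

  peek() {
    return this._heap[0];
  }

  push(...values) {
    return this.extend(values);
  }

  extend(values) {
    for (const value of values) {
      this._heap.push(value);
      this._siftUpFrom(this.size - 1);
    }
    return this.size;
  }

  pop() {
    const top = this.peek();
    const bottom = this.size - 1;
    if (bottom > 0) this._swap(0, bottom); // move the last leaf to the root
    this._heap.pop();
    this._siftDown();
    return top;
  }

  replace(value) {
    const replaced = this.peek();
    this._heap[0] = value;
    this._siftDown();
    return replaced;
  }

  // True if the element at index i has higher priority than the one at index j.
  _greater(i, j) {
    return this._comparator(this._heap[i], this._heap[j]);
  }

  _swap(i, j) {
    [this._heap[i], this._heap[j]] = [this._heap[j], this._heap[i]];
  }

  // Moves the element at index `node` up while it outranks its parent.
  _siftUpFrom(node) {
    while (node > 0) {
      const parent = (node - 1) >>> 1;
      if (!this._greater(node, parent)) break;
      this._swap(node, parent);
      node = parent;
    }
  }

  // Moves the root down while one of its children outranks it.
  _siftDown() {
    let node = 0;
    for (;;) {
      const left = 2 * node + 1;
      const right = 2 * node + 2;
      let best = node;
      if (left < this.size && this._greater(left, best)) best = left;
      if (right < this.size && this._greater(right, best)) best = right;
      if (best === node) return;
      this._swap(node, best);
      node = best;
    }
  }
}

const maxQueue = new SimplePriorityQueue();
maxQueue.push(3, 1, 4, 1, 5);
console.log(maxQueue.pop(), maxQueue.pop()); // 5 4

const minQueue = new SimplePriorityQueue((a, b) => a < b);
minQueue.extend([3, 1, 4]);
console.log(minQueue.replace(10), minQueue.peek()); // 1 3
```

Sifting up after each insertion and sifting down after `pop` and `replace` keeps every operation at O(log n) per element.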

utils/data-structures.CharTrie

A trie structure for efficiently storing and searching strings.

Kind: static class of utils/data-structures

.CharTrie
- .extend(texts)
- .push(text)
- .commonPrefixSearch(text)

charTrie.extend(texts)

Adds one or more texts to the trie.

Kind: instance method of CharTrie

Param	Type	Description
texts	`Array.<string>`	The strings to add to the trie.

charTrie.push(text)

Adds text to the trie.

Kind: instance method of CharTrie

Param	Type	Description
text	`string`	The string to add to the trie.

charTrie.commonPrefixSearch(text)

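Searches the trie for every stored string that is a prefix of text.

Kind: instance method of CharTrie

Param	Type	Description
text	`string`	The text whose prefixes are looked up in the trie.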
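A minimal sketch of such a trie, assuming `commonPrefixSearch` is a generator that yields every stored string that is a prefix of `text` (the lookup a Unigram-style tokenizer needs to enumerate candidate pieces at a position). `SimpleCharTrie` and `SimpleCharTrieNode` are hypothetical names; only the method names follow the documentation above.

```javascript
// Character trie: a sketch whose method names follow the documentation above.
class SimpleCharTrieNode {
  constructor(isLeaf, children) {
    this.isLeaf = isLeaf;     // true if a stored string ends at this node
    this.children = children; // Map from one character to the next node
  }

  static default() {
    return new SimpleCharTrieNode(false, new Map());
  }
}

class SimpleCharTrie {
  constructor() {
    this.root = SimpleCharTrieNode.default();
  }

  extend(texts) {
    for (const text of texts) this.push(text);
  }

  push(text) {
    let node = this.root;
    for (const ch of text) { // for...of walks the string by code point
      if (!node.children.has(ch)) node.children.set(ch, SimpleCharTrieNode.default());
      node = node.children.get(ch);
    }
    node.isLeaf = true;
  }

  // Yields every stored string that is a prefix of `text`, shortest first.
  *commonPrefixSearch(text) {
    let node = this.root;
    let prefix = "";
    for (const ch of text) {
      node = node.children.get(ch);
      if (node === undefined) return; // no stored string continues this prefix
      prefix += ch;
      if (node.isLeaf) yield prefix;
    }
  }
}

const trie = new SimpleCharTrie();
trie.extend(["a", "ab", "abc", "b"]);
console.log([...trie.commonPrefixSearch("abcd")]); // [ 'a', 'ab', 'abc' ]
```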

utils/data-structures.TokenLattice

A lattice data structure used for tokenization.

Kind: static class of utils/data-structures

.TokenLattice
- new TokenLattice(sentence, bosTokenId, eosTokenId)
- .insert(pos, length, score, tokenId)
- .viterbi() ⇒ Array.<TokenLatticeNode>
- .piece(node) ⇒ string
- .tokens() ⇒ Array.<string>
- .tokenIds() ⇒ Array.<number>

new TokenLattice(sentence, bosTokenId, eosTokenId)

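Creates a new TokenLattice instance.

Param	Type	Description
sentence	`string`	The input sentence to be tokenized.
bosTokenId	`number`	The beginning-of-sequence token ID.
eosTokenId	`number`	The end-of-sequence token ID.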

tokenLattice.insert(pos, length, score, tokenId)

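Inserts a new token node into the token lattice.

Kind: instance method of TokenLattice

Param	Type	Description
pos	`number`	The starting position of the token.
length	`number`	The length of the token.
score	`number`	The score of the token.
tokenId	`number`	The token ID of the token.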

tokenLattice.viterbi() ⇒ Array.<TokenLatticeNode>

Implements the Viterbi algorithm to compute the most likely sequence of tokens.

Kind: instance method of TokenLattice
Returns: Array.<TokenLatticeNode> - The most likely sequence of tokens.
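The Viterbi step is a dynamic program over positions in the sentence: the best-scoring way to reach each position is extended by every token that starts there, and the winning path is recovered by following back-pointers from the end. The sketch below assumes scores are log-probabilities (so they are summed and higher is better) and omits the BOS/EOS nodes that the real `TokenLattice` constructor takes; `SimpleTokenLattice` is a hypothetical name.

```javascript
// Simplified lattice + Viterbi: a sketch, not the Transformers.js source.
// Assumptions: scores are log-probabilities (higher is better); BOS/EOS nodes are omitted.
class SimpleTokenLattice {
  constructor(sentence) {
    this.chars = Array.from(sentence);
    this.len = this.chars.length;
    // beginNodes[pos] holds every candidate token that starts at character `pos`.
    this.beginNodes = Array.from({ length: this.len }, () => []);
  }

  insert(pos, length, score, tokenId) {
    this.beginNodes[pos].push({ pos, length, score, tokenId });
  }

  // best[i] is the highest total score of any token sequence covering chars[0..i),
  // and back[i] is the last token of that sequence.
  viterbi() {
    const best = new Array(this.len + 1).fill(-Infinity);
    const back = new Array(this.len + 1).fill(null);
    best[0] = 0;
    for (let pos = 0; pos < this.len; ++pos) {
      if (best[pos] === -Infinity) continue; // no token sequence reaches `pos`
      for (const node of this.beginNodes[pos]) {
        const end = pos + node.length;
        const score = best[pos] + node.score;
        if (end <= this.len && score > best[end]) {
          best[end] = score;
          back[end] = node;
        }
      }
    }
    if (back[this.len] === null) return []; // the sentence cannot be fully covered
    const path = [];
    for (let end = this.len; end > 0; end = path[path.length - 1].pos) {
      path.push(back[end]);
    }
    return path.reverse();
  }

  piece(node) {
    return this.chars.slice(node.pos, node.pos + node.length).join("");
  }

  tokens() {
    return this.viterbi().map((node) => this.piece(node));
  }

  tokenIds() {
    return this.viterbi().map((node) => node.tokenId);
  }
}

const lattice = new SimpleTokenLattice("abc");
lattice.insert(0, 1, -1.0, 1); // "a"
lattice.insert(1, 1, -1.0, 2); // "b"
lattice.insert(2, 1, -1.0, 3); // "c"
lattice.insert(0, 2, -1.5, 4); // "ab"
lattice.insert(1, 2, -1.2, 5); // "bc"
lattice.insert(0, 3, -4.0, 6); // "abc"
console.log(lattice.tokens());   // [ 'a', 'bc' ] (total -2.2 beats "ab" + "c" at -2.5)
console.log(lattice.tokenIds()); // [ 1, 5 ]
```

Each lattice node is examined once, so the search is linear in the number of inserted nodes.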

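tokenLattice.piece(node) ⇒ string

Returns the substring of the sentence that the given node spans.

Kind: instance method of TokenLattice
Returns: string - The piece of the sentence covered by node.

Param	Type	Description
node	`TokenLatticeNode`	The lattice node whose text is returned.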

tokenLattice.tokens() ⇒ Array.<string>

Kind: instance method of TokenLattice
Returns: Array.<string> - The most likely sequence of tokens.

tokenLattice.tokenIds() ⇒ Array.<number>

Kind: instance method of TokenLattice
Returns: Array.<number> - The most likely sequence of token IDs.

utils/data-structures.DictionarySplitter

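A data structure that uses a trie to split a string into tokens based on a dictionary. It can also use a regular expression to preprocess the input text before splitting.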

Note: To ensure that multi-byte characters are handled correctly, we operate at the byte level instead of the character level.
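The note matters because JavaScript measures strings in UTF-16 code units, which coincide with neither Unicode code points nor UTF-8 bytes for non-ASCII text. This snippet only compares the three counts for one string; it does not reproduce the splitter's internals.

```javascript
// Three ways of measuring the same string in JavaScript.
const text = "\u00E9\u{1F600}"; // "é" followed by "😀"
const utf16Units = text.length;                          // UTF-16 code units
const codePoints = Array.from(text).length;              // Unicode code points
const utf8Bytes = new TextEncoder().encode(text).length; // UTF-8 bytes

console.log(utf16Units, codePoints, utf8Bytes); // 3 2 6
```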

Kind: static class of utils/data-structures

.DictionarySplitter
- new DictionarySplitter(dictionary)
- .split(text) ⇒ Array.<string>

new DictionarySplitter(dictionary)

Param	Type	Description
dictionary	`Array.<string>`	The dictionary of words to use for splitting.

dictionarySplitter.split(text) ⇒ Array.<string>

Splits the input text into tokens based on the dictionary.

Kind: instance method of DictionarySplitter
Returns: Array.<string> - An array of tokens.

Param	Type	Description
text	`string`	The input text to split.
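One way to split text against a dictionary is a greedy longest match over a trie. This is a sketch under stated assumptions, not the library's algorithm: `SimpleDictionarySplitter` is a hypothetical name, dictionary words are assumed to be emitted as their own tokens with the text between them kept as-is, the regular-expression preprocessing is left out, and matching runs on UTF-16 code units rather than bytes.

```javascript
// Greedy longest-match dictionary splitter: a sketch, not the Transformers.js source.
const END = Symbol("end"); // key marking that a dictionary word ends at a trie node

class SimpleDictionarySplitter {
  constructor(dictionary) {
    this.root = new Map();
    for (const word of dictionary) {
      if (word.length === 0) continue; // an empty word would match everywhere
      let node = this.root;
      for (let i = 0; i < word.length; ++i) { // index by UTF-16 code unit, like split()
        if (!node.has(word[i])) node.set(word[i], new Map());
        node = node.get(word[i]);
      }
      node.set(END, word);
    }
  }

  split(text) {
    const result = [];
    let start = 0; // beginning of the pending run of unmatched text
    let i = 0;
    while (i < text.length) {
      // Walk the trie from position i, remembering the longest word seen.
      let node = this.root;
      let match = null;
      for (let j = i; j < text.length; ++j) {
        node = node.get(text[j]);
        if (node === undefined) break;
        if (node.has(END)) match = node.get(END);
      }
      if (match === null) {
        ++i; // no dictionary word starts here
        continue;
      }
      if (i > start) result.push(text.slice(start, i)); // flush unmatched text
      result.push(match);
      i += match.length;
      start = i;
    }
    if (start < text.length) result.push(text.slice(start));
    return result;
  }
}

const splitter = new SimpleDictionarySplitter(["<s>", "</s>", "ab", "abc"]);
console.log(splitter.split("x<s>abcd</s>")); // [ 'x', '<s>', 'abc', 'd', '</s>' ]
```

Preferring the longest match is what makes "abc" win over "ab" in the example.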

utils/data-structures.LRUCache

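A simple least recently used (LRU) cache implementation in JavaScript. The cache stores key-value pairs and evicts the least recently used item when its capacity is exceeded.

Kind: static class of utils/data-structures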

.LRUCache
- new LRUCache(capacity)
- .get(key) ⇒ any
- .put(key, value)
- .clear()

new LRUCache(capacity)

Creates an LRUCache instance.

Param	Type	Description
capacity	`number`	The maximum number of items the cache can hold.

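lruCache.get(key) ⇒ any

Retrieves the value associated with the given key and marks the key as recently used.

Kind: instance method of LRUCache
Returns: any - The value associated with the key, or undefined if the key does not exist.

Param	Type	Description
key	`any`	The key to retrieve.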

lruCache.put(key, value)

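Inserts or updates the key-value pair in the cache. If the key already exists, it is updated and marked as recently used. If the cache exceeds its capacity, the least recently used item is evicted.

Kind: instance method of LRUCache

Param	Type	Description
key	`any`	The key to add or update.
value	`any`	The value to associate with the key.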

lruCache.clear()

Clears the cache.

Kind: instance method of LRUCache
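A JavaScript `Map` iterates its keys in insertion order, so deleting and re-inserting a key on every access keeps the least recently used key at the front. The sketch below relies on that property; `SimpleLRUCache` is a hypothetical name, and the code is illustrative rather than the library's source.

```javascript
// LRU cache built on Map insertion order: a sketch, not the Transformers.js source.
// The first key in iteration order is always the least recently used one.
class SimpleLRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.cache = new Map();
  }

  get(key) {
    if (!this.cache.has(key)) return undefined;
    const value = this.cache.get(key);
    // Re-insert to move the key to the most recently used end.
    this.cache.delete(key);
    this.cache.set(key, value);
    return value;
  }

  put(key, value) {
    if (this.cache.has(key)) this.cache.delete(key);
    this.cache.set(key, value);
    if (this.cache.size > this.capacity) {
      // Evict the least recently used entry, i.e. the oldest key.
      this.cache.delete(this.cache.keys().next().value);
    }
  }

  clear() {
    this.cache.clear();
  }
}

const cache = new SimpleLRUCache(2);
cache.put("a", 1);
cache.put("b", 2);
cache.get("a");    // "a" is now the most recently used key
cache.put("c", 3); // exceeds capacity, so "b" is evicted
console.log(cache.get("b"), cache.get("a"), cache.get("c")); // undefined 1 3
```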

utils/data-structures~CharTrieNode

Represents a node in a character trie.

Kind: inner class of utils/data-structures

~CharTrieNode
- new CharTrieNode(isLeaf, children)
- .default() ⇒ CharTrieNode

new CharTrieNode(isLeaf, children)

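Creates a new CharTrieNode.

Param	Type	Description
isLeaf	`boolean`	Whether the node is a leaf node.
children	`Map.<string, CharTrieNode>`	A map containing the node's children, where the key is a character and the value is a `CharTrieNode`.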

CharTrieNode.default() ⇒ CharTrieNode

Returns a new CharTrieNode instance with default values.

Kind: static method of CharTrieNode
Returns: CharTrieNode - A new CharTrieNode instance with isLeaf set to false and an empty children map.

utils/data-structures~TokenLatticeNode

Kind: inner class of utils/data-structures

~TokenLatticeNode
- new TokenLatticeNode(tokenId, nodeId, pos, length, score)
- .clone() ⇒ TokenLatticeNode

new TokenLatticeNode(tokenId, nodeId, pos, length, score)

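Represents a node in a token lattice for a given sentence.

Param	Type	Description
tokenId	`number`	The ID of the token associated with this node.
nodeId	`number`	The ID of this node.
pos	`number`	The starting position of the token in the sentence.
length	`number`	The length of the token.
score	`number`	The score associated with the token.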

tokenLatticeNode.clone() ⇒ TokenLatticeNode

Returns a clone of this node.

Kind: instance method of TokenLatticeNode
Returns: TokenLatticeNode - A clone of this node.
