X-LoRA

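Mixture of LoRA Experts (X-LoRA) is a PEFT method that enables a sparse or dense mixture of LoRA experts, based on a high-granularity (token, layer, sequence) scalings matrix. It relies on frozen LoRA adapters and a frozen base model, which drastically reduces the number of parameters that need to be fine-tuned.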

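A unique aspect of X-LoRA is its versatility: it can be applied to any transformers base model with LoRA adapters. This means that, despite the mixture-of-experts strategy, no changes to the model code are required.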

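The graphic below shows how the scalings change for each token across different prompts. It highlights how different adapters are activated as generation progresses and the sequence builds new context.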

Token-by-token scalings
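
The per-token mixing that these scalings control can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PEFT implementation: the function name `mix_adapter_outputs`, the toy numbers, and the use of plain lists instead of tensors are all hypothetical and chosen for readability. For a single token in a single layer, each frozen LoRA expert contributes an output vector, and the scalings decide how much of each one enters the result.

```python
def mix_adapter_outputs(adapter_outputs, scalings):
    """Weighted sum of per-adapter output vectors for one token.

    adapter_outputs: list of vectors, one per LoRA expert (hypothetical toy data).
    scalings: one weight per expert, e.g. produced by a gating classifier.
    """
    if len(adapter_outputs) != len(scalings):
        raise ValueError("need exactly one scaling per adapter")
    dim = len(adapter_outputs[0])
    mixed = [0.0] * dim
    for output, weight in zip(adapter_outputs, scalings):
        for i, value in enumerate(output):
            mixed[i] += weight * value
    return mixed


# Two toy experts, each producing a 3-dimensional output for the same token.
expert_outputs = [[1.0, 0.0, 2.0], [0.0, 4.0, 2.0]]

# Different tokens can receive different scalings, so the mix changes per token.
token_a = mix_adapter_outputs(expert_outputs, [1.0, 0.0])    # only expert 0 active
token_b = mix_adapter_outputs(expert_outputs, [0.25, 0.75])  # mostly expert 1
```

Because the scalings are computed per token, two tokens in the same sequence can draw on different experts even though neither the experts nor the base model change.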

The abstract from the paper is:

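We report a mixture of expert strategy to create fine-tuned large language models using a deep layer-wise token-level approach based on low-rank adaptation (LoRA). Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks. The design is inspired by the biological principles of universality and diversity, where neural network building blocks are reused in different hierarchical manifestations. Hence, the X-LoRA model can be easily implemented for any existing large language model without a need for modifications of the underlying structure. We develop a tailored X-LoRA model that offers scientific capabilities, including forward/inverse analysis tasks and enhanced reasoning capability, focused on biomaterial analysis, protein mechanics, and design. The impact of this work includes access to readily expandable and adaptable models with strong domain knowledge and the capability to integrate across areas of knowledge. Featuring experts in biology, mathematics, reasoning, bio-inspired materials, mechanics and materials, chemistry, protein biophysics, mechanics, and quantum-mechanics based molecular properties, we conduct a series of physics-focused case studies. We examine knowledge recall, protein mechanics forward/inverse tasks, protein design, adversarial agentic modeling including ontological knowledge graph construction, and molecular design. The model is capable not only of making quantitative predictions of nanomechanical properties of proteins or quantum mechanical molecular properties but also reasoning over the results and correctly predicting likely mechanisms that explain distinct molecular behaviors.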

Please cite X-LoRA as:

@article{10.1063/5.0203126,
    author = {Buehler, Eric L. and Buehler, Markus J.},
    title = "{X-LoRA: Mixture of low-rank adapter experts, a flexible framework for large language models with applications in protein mechanics and molecular design}",
    journal = {APL Machine Learning},
    volume = {2},
    number = {2},
    pages = {026119},
    year = {2024},
    month = {05},
    abstract = "{We report a mixture of expert strategy to create fine-tuned large language models using a deep layer-wise token-level approach based on low-rank adaptation (LoRA). Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks. The design is inspired by the biological principles of universality and diversity, where neural network building blocks are reused in different hierarchical manifestations. Hence, the X-LoRA model can be easily implemented for any existing large language model without a need for modifications of the underlying structure. We develop a tailored X-LoRA model that offers scientific capabilities, including forward/inverse analysis tasks and enhanced reasoning capability, focused on biomaterial analysis, protein mechanics, and design. The impact of this work includes access to readily expandable and adaptable models with strong domain knowledge and the capability to integrate across areas of knowledge. Featuring experts in biology, mathematics, reasoning, bio-inspired materials, mechanics and materials, chemistry, protein biophysics, mechanics, and quantum-mechanics based molecular properties, we conduct a series of physics-focused case studies. We examine knowledge recall, protein mechanics forward/inverse tasks, protein design, adversarial agentic modeling including ontological knowledge graph construction, and molecular design. The model is capable not only of making quantitative predictions of nanomechanical properties of proteins or quantum mechanical molecular properties but also reasoning over the results and correctly predicting likely mechanisms that explain distinct molecular behaviors.}",
    issn = {2770-9019},
    doi = {10.1063/5.0203126},
    url = {https://doi.org/10.1063/5.0203126},
    eprint = {https://pubs.aip.org/aip/aml/article-pdf/doi/10.1063/5.0203126/19964043/026119\_1\_5.0203126.pdf},
}

XLoraConfig

class peft.XLoraConfig


( task_type: typing.Union[str, peft.utils.peft_types.TaskType, NoneType] = None peft_type: typing.Union[str, peft.utils.peft_types.PeftType, NoneType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: typing.Optional[str] = None revision: typing.Optional[str] = None inference_mode: bool = False hidden_size: int = None adapters: dict[str, str] = None enable_softmax: bool = True enable_softmax_topk: bool = False layerwise_scalings: bool = False xlora_depth: int = 1 xlora_size: int = 2048 xlora_dropout_p: float = 0.2 use_trainable_adapters: bool = False softmax_temperature: float = 1.0 top_k_lora: Optional[int] = None scaling_pass_value: float = 0.0 global_scaling_weight: float = 1.0 )

Parameters

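hidden_size (int) — Hidden size of the base model.
adapters (dict) — Mapping of adapter names to LoRA adapter IDs, as per PeftModel.load_adapter. *They will be loaded automatically* and used as LoRA experts. When using from_pretrained, pass the new adapters dict as a keyword argument.
enable_softmax (bool, *optional*, defaults to True) — Enable softmax application for the X-LoRA classifier.
enable_softmax_topk (bool, *optional*, defaults to False) — Enable softmax application for the top-k LoRA adapters. Mutually exclusive with `enable_softmax`, and may only be set if `top_k_lora` is set.
softmax_temperature (float, *optional*, defaults to 1.0) — Softmax temperature; lower values yield sharper predictions.
layerwise_scalings (bool, *optional*, defaults to False) — If True, generate scalings for each LoRA adapter (each layer). If False, the same scalings are broadcast to every layer.
top_k_lora (int, *optional*, defaults to None) — Sparsely select the top_k LoRA experts instead of using the default dense method.
xlora_depth (int, *optional*, defaults to 1) — Depth of the X-LoRA classifier.
xlora_size (int, *optional*, defaults to 2048) — Hidden size of the X-LoRA classifier; irrelevant if `xlora_depth=1`.
xlora_dropout_p (float, *optional*, defaults to 0.2) — Dropout probability of the X-LoRA classifier; irrelevant if `xlora_depth=1`.
use_trainable_adapters (bool, *optional*, defaults to False) — Make the adapters trainable.
scaling_pass_value (float, *optional*, defaults to 0) — Scaling pass value.
global_scaling_weight (float, *optional*, defaults to 1) — Weight by which the output of each LoRA adapter is multiplied.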

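This is the configuration class that stores the configuration of an XLoraModel. When the config is reloaded, the paths in the `adapters` field are disregarded in favor of the saved adapters. As such, only the keys matter during loading.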
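
How `enable_softmax`, `softmax_temperature`, `top_k_lora`, and `enable_softmax_topk` relate to one another can be illustrated with a small pure-Python sketch for a single token. It is not the library's code: the function `sketch_scalings`, the toy logits, the choice to raise errors on conflicting flags, and the choice to apply the temperature only to the dense softmax are assumptions of this sketch.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a lower temperature gives a sharper result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sketch_scalings(logits, enable_softmax=True, enable_softmax_topk=False,
                    top_k_lora=None, softmax_temperature=1.0):
    """Toy, single-token version of turning classifier logits into scalings."""
    # Raising here is a choice of this sketch, mirroring the documented constraints.
    if enable_softmax_topk and top_k_lora is None:
        raise ValueError("enable_softmax_topk requires top_k_lora")
    if enable_softmax and enable_softmax_topk:
        raise ValueError("enable_softmax and enable_softmax_topk are mutually exclusive")

    scalings = list(logits)
    if enable_softmax:
        # Dense path: normalize over all experts (temperature applied here only,
        # which is an assumption of this sketch).
        scalings = softmax(scalings, softmax_temperature)
    if top_k_lora is not None:
        # Sparse path: keep the k largest scalings and zero out the rest.
        ranked = sorted(range(len(scalings)), key=lambda i: scalings[i], reverse=True)
        kept = sorted(ranked[:top_k_lora])
        scalings = [s if i in kept else 0.0 for i, s in enumerate(scalings)]
        if enable_softmax_topk:
            # Re-normalize over the selected experts only.
            for i, p in zip(kept, softmax([scalings[i] for i in kept])):
                scalings[i] = p
    return scalings


toy_logits = [2.0, 1.0, 0.0]  # hypothetical classifier output for three experts
dense = sketch_scalings(toy_logits)
sharp = sketch_scalings(toy_logits, softmax_temperature=0.5)
sparse = sketch_scalings(toy_logits, enable_softmax=False,
                         enable_softmax_topk=True, top_k_lora=2)
```

In this sketch, `dense` spreads weight over all three experts, `sharp` concentrates more weight on the first expert, and `sparse` assigns zero to the lowest-ranked expert while the two selected experts share the full weight.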

XLoraModel

class peft.XLoraModel


( model: nn.Module config: Union[dict[str, XLoraConfig], XLoraConfig] adapter_name: str torch_device: Optional[str] = None ephemeral_gpu_offload: bool = False autocast_adapter_dtype: bool = True **kwargs ) → torch.nn.Module

Parameters

model (torch.nn.Module) — The model to be adapted.
config (XLoraConfig) — The configuration of the X-LoRA model.
adapter_name (str) — The name of the adapter; it does not affect the LoRA adapter names.

Returns

torch.nn.Module

The X-LoRA model.

Creates an X-LoRA (Mixture of LoRA Experts) model from a pretrained transformers model. Currently, this X-LoRA implementation only works with models that have a transformer architecture.

The method is described in detail in https://huggingface.co/papers/2402.07148.

Example

>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoConfig, BitsAndBytesConfig
>>> from peft import XLoraConfig, get_peft_model, prepare_model_for_kbit_training

>>> model_config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
>>> config = XLoraConfig(
...     task_type="CAUSAL_LM",
...     hidden_size=model_config.hidden_size,
...     xlora_depth=4,
...     adapters={
...         "adapter_1": "./path/to/the/checkpoint/",
...         "adapter_2": "./path/to/the/checkpoint/",
...         "adapter_n": "./path/to/the/checkpoint/",
...     },
... )
>>> int8_config = BitsAndBytesConfig(load_in_8bit=True)
>>> model = AutoModelForCausalLM.from_pretrained(
...     "mistralai/Mistral-7B-Instruct-v0.1",
...     trust_remote_code=True,
...     attn_implementation="flash_attention_2",
...     device_map="cuda:0",
...     torch_dtype=torch.bfloat16,
...     quantization_config=int8_config,
... )
>>> # Prepare the quantized base model for training; the model itself is the argument.
>>> model = prepare_model_for_kbit_training(model)
>>> xlora_model = get_peft_model(model, config)
