Modular Transformers

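Modular Transformers lowers the bar for contributing models and significantly reduces the amount of code required to add a model by allowing imports and inheritance.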

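One of the core design features of Transformers is the single model, single file policy. Model components such as attention layers are repeated across many files, and any independent implementation tends to diverge as fixes and changes are applied to specific parts of the code.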

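`# Copied from` statements prevent the code from diverging, and they are enforced by our continuous integration tests and local commands. The downside is that this approach is tedious and adds a significant number of lines of code, most of which are boilerplate.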
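To make the idea concrete, the sketch below imitates what such a consistency check does: it applies the name substitution declared for a copy and compares the result with the copied code. The helper name `is_faithful_copy` and the toy sources are hypothetical; the real check in the repository handles many more cases.

```python
def is_faithful_copy(original_src: str, copy_src: str, old_name: str, new_name: str) -> bool:
    """Return True if `copy_src` equals `original_src` after renaming old_name -> new_name."""
    expected = original_src.replace(old_name, new_name)
    # Compare line by line, ignoring trailing whitespace differences.
    expected_lines = [line.rstrip() for line in expected.strip().splitlines()]
    copy_lines = [line.rstrip() for line in copy_src.strip().splitlines()]
    return expected_lines == copy_lines


BERT_SRC = """
class BertSelfOutput:
    def forward(self, hidden_states):
        return self.dense(hidden_states)
"""

ROBERTA_OK = """
class RobertaSelfOutput:
    def forward(self, hidden_states):
        return self.dense(hidden_states)
"""

# The copy drifted: someone patched it without touching the original.
ROBERTA_DRIFTED = ROBERTA_OK.replace("self.dense(hidden_states)", "self.dense(hidden_states) * 2")

print(is_faithful_copy(BERT_SRC, ROBERTA_OK, "Bert", "Roberta"))       # True
print(is_faithful_copy(BERT_SRC, ROBERTA_DRIFTED, "Bert", "Roberta"))  # False
```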

Motivation

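Modular Transformers solves these issues by adding a *modular* file to the model folder. Unlike traditional modeling and processing files, a modular file can import code from other models and inherit from other classes.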

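Modular Transformers isn't meant to replace the modeling code. If your model isn't based on an existing model, you need to add a `modeling.py` file manually. Likewise, if a configuration, tokenization, or processing file can't easily inherit from a similar file, you can add that file directly.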

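A modular file contains the model, processor, and configuration class code that would otherwise live in separate files under the single model, single file policy.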

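Model users still import and use the single-file interface they are already familiar with. In doing so, we hope to enable simpler contributions while adhering to our philosophy.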

Create a `modeling.py` file

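A linter "unravels" the modular file into a `modeling.py` file to preserve the single model, single file directory structure (modeling, processor, etc.). Inheritance is flattened to only a **single** level.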

Run the command below to automatically generate a `modeling.py` file from a modular file.

python utils/modular_model_converter.py --files-to-parse src/transformers/models/<your_model>/modular_<your_model>.py

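For example:

If a configuration class inherits from another class but adds one argument and deletes another, the generated file directly references the added argument and completely removes the deleted one.
If a class inherits from another class, for example `GemmaModel(LlamaModel)`, the dependencies are inferred automatically. All submodules are also inferred automatically from the superclass.
If you define new functions in the modular file and use them inside classes, the linter automatically infers these as well.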

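You should be able to write everything (tokenizer, image processor, model, configuration, etc.) in one modular file and have the corresponding single files generated.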

Run the command below to ensure the generated content matches `modular_<your_model>.py`.

python utils/check_modular_conversion.py --files src/transformers/models/<your_model>/modular_<your_model>.py
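Conceptually, this check regenerates the files and compares them with what is on disk. The sketch below only illustrates that comparison with the standard library's `difflib`; the function name `conversion_diff` and the sample strings are hypothetical and are not taken from the actual script.

```python
import difflib


def conversion_diff(generated: str, on_disk: str) -> list[str]:
    """Return unified-diff lines; an empty list means the two versions match."""
    return list(
        difflib.unified_diff(
            on_disk.splitlines(),
            generated.splitlines(),
            fromfile="modeling (on disk)",
            tofile="modeling (regenerated)",
            lineterm="",
        )
    )


generated = "class Olmo2MLP(nn.Module):\n    pass\n"
stale = "class Olmo2MLP(nn.Module):\n    ...\n"

print(conversion_diff(generated, generated))  # [] -> in sync
for line in conversion_diff(generated, stale):
    print(line)  # shows which lines differ from the regenerated file
```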

The example below demonstrates how a model can be added with significantly fewer lines of code with Modular Transformers.

BERT and RoBERTa

BERT and RoBERTa are two very similar models that differ only in how the embedding layer is implemented.

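Instead of redefining the model entirely, consider the `modular_roberta.py` file shown below, which contains the modeling and configuration classes (the tokenizer isn't shown in this example).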

from torch import nn
from ..bert.configuration_bert import BertConfig
from ..bert.modeling_bert import (
    BertModel,
    BertEmbeddings,
    BertForMaskedLM
)

# RoBERTa and BERT config is identical
class RobertaConfig(BertConfig):
  model_type = 'roberta'

# Redefine the embeddings to highlight the padding id difference, and redefine the position embeddings
class RobertaEmbeddings(BertEmbeddings):
    def __init__(self, config):
        super().__init__(config)

        self.padding_idx = config.pad_token_id
        self.position_embeddings = nn.Embedding(
            config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx
        )

# RoBERTa and BERT model is identical except for the embedding layer, which is defined above, so no need for additional changes here
class RobertaModel(BertModel):
  def __init__(self, config):
    super().__init__(config)
    self.embeddings = RobertaEmbeddings(config)


# The model heads now only need to redefine the model inside to `RobertaModel`
class RobertaForMaskedLM(BertForMaskedLM):
  def __init__(self, config):
    super().__init__(config)
    self.model = RobertaModel(config)

If you don't use a dependency you defined, you'll receive the following error.

ValueError: You defined `RobertaEmbeddings` in the modular_roberta.py, it should be used when you define `BertModel`, as it is one of it's direct dependencies. Make sure you use it in the `__init__` function.

Implementing a modular file

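The easiest way to start is to browse Transformers for a model similar to yours to inherit from. Some good starting points are Mistral, Qwen2, Cohere and Cohere2, and Llama. Refer to the table below for components your model might be using and where you can inherit from.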

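Component	Model
Mixture of experts	SwitchTransformers or Mixtral
Interleaved (and/or partial) rotary embedding	GLM, Phi
State space models	Jamba, Bamba, Zamba, Mamba2
Recurrent hidden states	Gemma2
Sliding window attention/full attention patterns per layer	Gemma2, Cohere2
QKV clipping	Olmo
QK normalization	Olmo2, Cohere
Fused QKV (not recommended)	Phi3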

This section walks you through how to implement Olmo2 from Olmo with Modular Transformers (you can refer to the original `modeling.py` file).

Config

The modular `Olmo2Config` is shown below.

from ..olmo.configuration_olmo import OlmoConfig

class Olmo2Config(OlmoConfig):
    r"""
    This is the configuration class to store the configuration of a [Olmo2Model](/docs/transformers/main/en/model_doc/olmo2#transformers.Olmo2Model).
    """

    def __init__(
        self,
        vocab_size=50304,
        hidden_size=4096,
        intermediate_size=11008,
        num_hidden_layers=32,
        num_attention_heads=32,
        num_key_value_heads=None,
        hidden_act="silu",
        max_position_embeddings=2048,
        initializer_range=0.02,
        use_cache=True,
        pad_token_id=1,
        bos_token_id=None,
        eos_token_id=50279,
        tie_word_embeddings=False,
        rope_theta=10000.0,
        rope_scaling=None,
        attention_bias=False,
        attention_dropout=0.0,
        rms_norm_eps=1e-5,
        **kwargs,
    ):
        super().__init__(
            vocab_size=vocab_size,
            hidden_size=hidden_size,
            intermediate_size=intermediate_size,
            num_hidden_layers=num_hidden_layers,
            num_attention_heads=num_attention_heads,
            num_key_value_heads=num_key_value_heads,
            hidden_act=hidden_act,
            max_position_embeddings=max_position_embeddings,
            initializer_range=initializer_range,
            use_cache=use_cache,
            pad_token_id=pad_token_id,
            bos_token_id=bos_token_id,
            eos_token_id=eos_token_id,
            tie_word_embeddings=tie_word_embeddings,
            rope_theta=rope_theta,
            rope_scaling=rope_scaling,
            attention_bias=attention_bias,
            attention_dropout=attention_dropout,
            **kwargs,
        )

        self.rms_norm_eps = rms_norm_eps
        del self.clip_qkv

There are three differences between `Olmo2Config` and the original `OlmoConfig`.

The default value of most arguments has changed.
A new argument, `rms_norm_eps`, is added.
The `clip_qkv` argument is no longer used.

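For the new default values and the new argument, overwrite the `__init__` function with the new defaults and add `rms_norm_eps`. Assign `rms_norm_eps` to `self` in the body of `__init__`. For the `clip_qkv` argument, use `del self.clip_qkv` to remove the assignment of this attribute from the unravelled code (after conversion by the linter).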

Notice how `super().__init__(...)` is used. Ordinarily, it calls the parent `__init__`.

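In Modular Transformers, however, when there is a call such as `super().my_function(...)`, the linter unravels the body of the parent's `my_function` at the place where `super().my_function(...)` is called. The `del self.clip_qkv` statement removes the reference to `self.clip_qkv` from the unravelled body.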

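`del self.<attribute>` and `super().my_function(...)` work as a pair, and the `del` statement should always be placed after `super().my_function(...)`. You can add whatever you want *before* calling `super()`, and it is placed before the parent body.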
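The interplay between `super().__init__(...)` and `del self.<attribute>` can be pictured with a small text-based sketch. It is only a toy model of the behavior described above, working on single-line statements given as strings; the function `unravel_init` is hypothetical and does not reflect how the linter is actually implemented.

```python
def unravel_init(parent_body: list[str], child_body: list[str]) -> list[str]:
    """Toy expansion of `super().__init__(...)` followed by `del self.<name>`."""
    expanded, deleted = [], set()
    for line in child_body:
        stripped = line.strip()
        if stripped.startswith("super().__init__("):
            expanded.extend(parent_body)  # paste the parent body in place of the call
        elif stripped.startswith("del self."):
            deleted.add(stripped[len("del self."):])  # remember the attribute name
        else:
            expanded.append(line)
    # Drop the assignments of deleted attributes from the expanded body.
    return [
        line
        for line in expanded
        if not any(line.strip().startswith(f"self.{name} =") for name in deleted)
    ]


parent_body = [
    "self.hidden_size = hidden_size",
    "self.clip_qkv = clip_qkv",
]
child_body = [
    "super().__init__(hidden_size=hidden_size)",
    "self.rms_norm_eps = rms_norm_eps",
    "del self.clip_qkv",
]

for line in unravel_init(parent_body, child_body):
    print(line)
# self.hidden_size = hidden_size
# self.rms_norm_eps = rms_norm_eps
```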

Norm

from ..llama.modeling_llama import LlamaRMSNorm

class Olmo2RMSNorm(LlamaRMSNorm):
    pass

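Nothing needs to be modified in `LlamaRMSNorm`. The linter unravels the exact content of `LlamaRMSNorm` into `Olmo2RMSNorm`. References to Llama in docstrings, type hints, and comments are also renamed to Olmo2.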
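As a rough illustration of the renaming step, the sketch below rewrites every spelling of the model name in a piece of source text. The helper `rename_model` is hypothetical, the class source is an abbreviated stand-in treated purely as a string, and the real linter is considerably more careful than a plain string replacement.

```python
def rename_model(source: str, old: str, new: str) -> str:
    """Replace the CamelCase, lowercase and UPPERCASE forms of a model name."""
    for a, b in ((old, new), (old.lower(), new.lower()), (old.upper(), new.upper())):
        source = source.replace(a, b)
    return source


# Abbreviated stand-in for the parent class source (never executed, only rewritten).
LLAMA_RMSNORM = '''class LlamaRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        """LlamaRMSNorm is a root-mean-square layer norm."""
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps
'''

print(rename_model(LLAMA_RMSNORM, "Llama", "Olmo2"))
```

The class name and the docstring mention are both rewritten, which mirrors the renaming of references in docstrings and comments.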

Attention

The modular `Olmo2Attention` is shown below.

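from typing import Callable, Optional

import torch

from ...cache_utils import Cache
from ...modeling_utils import ALL_ATTENTION_FUNCTIONS
from ..llama.modeling_llama import eager_attention_forward
from ..olmo.modeling_olmo import OlmoAttention, apply_rotary_pos_emb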


# Olmo2 attention is identical to OLMo attention except:
# - Norm is applied to attention queries and keys.
# - No qkv clipping.
class Olmo2Attention(OlmoAttention):
    def __init__(self, config: Olmo2Config, layer_idx: Optional[int] = None):
        super().__init__(config, layer_idx=layer_idx)
        self.q_norm = Olmo2RMSNorm(config.num_attention_heads * self.head_dim, config.rms_norm_eps)
        self.k_norm = Olmo2RMSNorm(config.num_key_value_heads * self.head_dim, config.rms_norm_eps)

    def forward(
        self,
        hidden_states: torch.Tensor,
        position_embeddings: tuple[torch.Tensor, torch.Tensor],
        attention_mask: Optional[torch.Tensor],
        past_key_value: Optional[Cache] = None,
        cache_position: Optional[torch.LongTensor] = None,
        **kwargs,
    ) -> tuple[torch.Tensor, Optional[torch.Tensor], Optional[tuple[torch.Tensor]]]:
        input_shape = hidden_states.shape[:-1]
        hidden_shape = (*input_shape, -1, self.head_dim)

        query_states = self.q_norm(self.q_proj(hidden_states))
        key_states = self.k_norm(self.k_proj(hidden_states))
        value_states = self.v_proj(hidden_states)

        query_states = query_states.view(hidden_shape).transpose(1, 2)
        key_states = key_states.view(hidden_shape).transpose(1, 2)
        value_states = value_states.view(hidden_shape).transpose(1, 2)

        cos, sin = position_embeddings
        query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)

        if past_key_value is not None:
            # sin and cos are specific to RoPE models; cache_position needed for the static cache
            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)

        attention_interface: Callable = eager_attention_forward
        if self.config._attn_implementation != "eager":
            attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]

        attn_output, attn_weights = attention_interface(
            self,
            query_states,
            key_states,
            value_states,
            attention_mask,
            dropout=0.0 if not self.training else self.attention_dropout,
            scaling=self.scaling,
            **kwargs,
        )

        attn_output = attn_output.reshape(*input_shape, -1).contiguous()
        attn_output = self.o_proj(attn_output)
        return attn_output, attn_weights

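`super().__init__(...)` copies the parent definition and adds 2 new layers from `Olmo2RMSNorm`. The forward pass needs to be overwritten to use these 2 new layers: the outputs of `q_proj` and `k_proj` are passed through the norm layers before being reshaped into heads. For simplicity, the `eager_attention_forward` function is imported directly from Llama, and `apply_rotary_pos_emb` is imported from Olmo.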

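The linter automatically adds these imported functions to the final `modeling_olmo2.py` file by copying their definitions from the source files. The `rotate_half` and `repeat_kv` functions are added as well because they are used inside `apply_rotary_pos_emb` and `eager_attention_forward`.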
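One way to picture this transitive lookup is a small walk over the syntax tree of the source module. The sketch below uses Python's standard `ast` module on a toy source string whose function bodies are placeholders; `collect_dependencies` is a hypothetical helper, not the linter's actual implementation.

```python
import ast

SOURCE = '''
def rotate_half(x):
    return x

def apply_rotary_pos_emb(q, k, cos, sin):
    return rotate_half(q), rotate_half(k)

def repeat_kv(x, n):
    return x

def eager_attention_forward(q, k, v):
    return repeat_kv(k, 1)

def unrelated_helper():
    return None
'''


def collect_dependencies(source: str, roots: list[str]) -> set[str]:
    """Return `roots` plus every module-level function they call, transitively."""
    tree = ast.parse(source)
    functions = {node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)}
    needed, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name in needed or name not in functions:
            continue
        needed.add(name)
        # Every bare name called inside the function body is a candidate dependency.
        for node in ast.walk(functions[name]):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                stack.append(node.func.id)
    return needed


print(sorted(collect_dependencies(SOURCE, ["apply_rotary_pos_emb", "eager_attention_forward"])))
# ['apply_rotary_pos_emb', 'eager_attention_forward', 'repeat_kv', 'rotate_half']
```

`unrelated_helper` is left out because nothing reachable from the two imported functions calls it.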

The `Attention` class has to be redefined because no existing model has an `Attention` layer that includes an `RMSNorm` layer.

DecoderLayer

The modular `DecoderLayer` is shown below.

from ..olmo.modeling_olmo import OlmoDecoderLayer

# The OLMo2 layers are identical to those of the OLMo model except:
# - RMSNorm is used instead of standard layer norm.
# - Norm is applied after attention/feedforward rather than before.
class Olmo2DecoderLayer(OlmoDecoderLayer):
    def __init__(self, config: Olmo2Config, layer_idx: int):
        super().__init__(config, layer_idx=layer_idx)
        self.post_attention_layernorm = Olmo2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
        self.post_feedforward_layernorm = Olmo2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
        self.self_attn = Olmo2Attention(config=config, layer_idx=layer_idx)
        del self.input_layernorm

    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_value: Optional[Cache] = None,
        output_attentions: Optional[bool] = False,
        use_cache: Optional[bool] = False,
        cache_position: Optional[torch.LongTensor] = None,
        position_embeddings: Optional[tuple[torch.Tensor, torch.Tensor]] = None,  # necessary, but kept here for BC
        **kwargs,
    ) -> tuple[torch.FloatTensor, Optional[tuple[torch.FloatTensor, torch.FloatTensor]]]:
        residual = hidden_states

        # Self Attention
        hidden_states, self_attn_weights = self.self_attn(
            hidden_states=hidden_states,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_value=past_key_value,
            output_attentions=output_attentions,
            use_cache=use_cache,
            cache_position=cache_position,
            position_embeddings=position_embeddings,
            **kwargs,
        )
        hidden_states = self.post_attention_layernorm(hidden_states)
        hidden_states = residual + hidden_states

        # Fully Connected
        residual = hidden_states
        hidden_states = self.mlp(hidden_states)
        hidden_states = self.post_feedforward_layernorm(hidden_states)
        hidden_states = residual + hidden_states

        outputs = (hidden_states,)
        if output_attentions:
            outputs += (self_attn_weights,)

        return outputs

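Switch the norm type in `__init__` by overwriting `self.post_attention_layernorm` after the call to `super().__init__(...)`. Delete the `self.input_layernorm` attribute and replace it with `self.post_feedforward_layernorm`, because in Olmo2 the norm is applied afterward. The forward method is overwritten to reflect this change.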

If you had only switched `self.post_feedforward_layernorm` and `self.input_layernorm` from `LayerNorm` to `RMSNorm`, without also changing the name and logic of `self.input_layernorm`, you wouldn't need to rewrite the forward method.

Model

The modular `Olmo2Model` class is shown below.

from ..olmo.modeling_olmo import OlmoModel

# The OLMo2 model is identical to the OLMo model, except RMSNorm is used instead of
# standard layer norm for the output norm.
class Olmo2Model(OlmoModel):
    def __init__(self, config: Olmo2Config):
        super().__init__(config)
        self.norm = Olmo2RMSNorm(config.hidden_size, eps=config.rms_norm_eps)
        self.layers = nn.ModuleList(
            [Olmo2DecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
        )

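You only need to change the *type* of the `self.norm` attribute to use `RMSNorm` instead of `LayerNorm`. This change doesn't affect the logic in the forward method (the layer name and usage are identical to the parent class), so you don't need to overwrite it. The linter unravels it automatically.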
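The reason the forward method can be reused is plain Python: it only depends on the attribute *name*. The sketch below shows this with framework-free stand-in classes (all names here are hypothetical and no real normalization is computed).

```python
class LayerNormLike:
    def __call__(self, x):
        return f"layer_norm({x})"


class RMSNormLike:
    def __call__(self, x):
        return f"rms_norm({x})"


class ParentModel:
    def __init__(self):
        self.norm = LayerNormLike()

    def forward(self, x):
        # The forward logic only relies on the attribute name `self.norm`.
        return self.norm(x)


class ChildModel(ParentModel):
    def __init__(self):
        super().__init__()
        self.norm = RMSNormLike()  # same name, different type


print(ParentModel().forward("h"))  # layer_norm(h)
print(ChildModel().forward("h"))   # rms_norm(h)
```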

Model head

The modular causal modeling head is shown below.

from ..olmo.modeling_olmo import OlmoForCausalLM

class Olmo2ForCausalLM(OlmoForCausalLM):
    pass

The logic is identical to `OlmoForCausalLM`, which means you don't need to make any changes here.

Other classes

The `modeling_olmo2.py` generated by the linter also contains some classes (`Olmo2MLP`, `Olmo2RotaryEmbedding`, `Olmo2PreTrainedModel`) that aren't explicitly defined in `modular_olmo2.py`.

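Classes that are a dependency of an inherited class but aren't explicitly defined are automatically added as part of dependency tracing. This is similar to how some functions were added for the `Attention` class without being imported directly.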

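For example, `OlmoDecoderLayer` has an attribute defined as `self.mlp = OlmoMLP(config)`. This class was never explicitly redefined as `Olmo2MLP`, so the linter automatically creates an `Olmo2MLP` class similar to `OlmoMLP`. The result is identical to the code below, had it been explicitly written in `modular_olmo2.py`.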

from ..olmo.modeling_olmo import OlmoMLP

class Olmo2MLP(OlmoMLP):
    pass

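However, it was necessary to rewrite `Olmo2RMSNorm` because the layer normalization had to be redefined in the `Attention` and `DecoderLayer` classes. Automatic dependency tracing is likewise why you didn't need to create the `Olmo2PreTrainedModel` and `Olmo2RotaryEmbedding` classes.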

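Classes that aren't rewritten are copied from the file where the inherited module first uses them. This means that if you want `Olmo2MLP` to inherit from `MistralMLP` instead, you need to be more explicit, as shown below.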

# switch to mistral definition
from ..mistral.modeling_mistral import MistralMLP

class Olmo2MLP(MistralMLP):
    pass

Deleting attributes

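You can remove an attribute defined in the parent class with `del` after calling `super().__init__()`. However, this doesn't work if the attribute is also used somewhere else, as shown below, because `del` only suppresses the assignment. The `self.attribute = config.attribute` line is removed, but the `if` statement remains and still references the attribute.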

class DummyModel(nn.Module):

  def __init__(self, config: DummyConfig):
    super().__init__()
    self.attribute = config.attribute
    if self.attribute:
      # do more stuff with `self.attribute` here
      ...

class MyNewDummyModel(DummyModel):

  def __init__(self, config: MyNewDummyConfig):
    super().__init__(config)
    del self.attribute
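To see why this is a problem, the sketch below hand-writes what the expanded `__init__` would amount to under the behavior described above: the assignment is gone, but the `if` still reads the attribute. The classes `Config` and `UnravelledDummyModel` are illustrative stand-ins, not actual linter output.

```python
class Config:
    attribute = True


class UnravelledDummyModel:
    """Hand-written stand-in for the expanded `MyNewDummyModel.__init__`."""

    def __init__(self, config):
        # `self.attribute = config.attribute` was removed by `del self.attribute`...
        if self.attribute:  # ...but this use of the attribute is still here
            pass


try:
    UnravelledDummyModel(Config())
    outcome = "ok"
except AttributeError as err:
    outcome = f"AttributeError: {err}"

print(outcome)  # reports that the instance has no attribute named `attribute`
```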

Explicit `super()` calls

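If you still want to inherit from `DummyModel` but don't want to remove `self.attribute`, explicitly specify which class's `super()` you are calling. The example below shows how to call the `super()` of `nn.Module` (the unravelled code is shown on the right).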

class MyNewDummyModel(DummyModel, nn.Module):        |     class MyNewDummyModel(nn.Module):
                                                     |
  def __init__(self, config: MyNewDummyConfig):      |       def __init__(self, config: MyNewDummyConfig):
    nn.Module.__init__(self)                         |         super().__init__()
    self.foo = config.foo                            |         self.foo = config.foo
    ...                                              |         ...
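In ordinary Python, calling a specific ancestor's `__init__` directly skips the intermediate parent, which is the behavior this pattern relies on. The sketch below demonstrates it with plain stand-in classes (`Base`, `Dummy`, and `MyNewDummy` are hypothetical and independent of `nn.Module`).

```python
class Base:
    def __init__(self):
        self.initialized_by = ["Base"]


class Dummy(Base):
    def __init__(self, config):
        super().__init__()
        self.initialized_by.append("Dummy")
        self.attribute = config["attribute"]


class MyNewDummy(Dummy):
    def __init__(self, config):
        Base.__init__(self)  # skip Dummy.__init__ on purpose
        self.foo = config["foo"]


model = MyNewDummy({"foo": 1})
print(model.initialized_by)         # ['Base']
print(hasattr(model, "attribute"))  # False
```

Note that the instance is passed explicitly (`Base.__init__(self)`), since the method is looked up on the class rather than through `super()`.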

Deleting unused functions

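Remove a method by overriding it with a `raise AttributeError("")` statement, which mimics the behavior you would want when removing a parent function in Python. The example below removes the methods from the unravelled code.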

class GemmaTokenizer(LlamaTokenizer):
    ...

    def get_spm_processor(self):
        raise AttributeError("Not needed for Gemma")

    def unk_token_length(self):
        raise AttributeError("Not needed for Gemma")
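Before any conversion happens, this pattern already behaves sensibly as regular Python: the overridden method fails loudly while the rest of the parent's interface keeps working. The sketch below uses hypothetical stand-in tokenizer classes to show this.

```python
class ParentTokenizer:
    def get_spm_processor(self):
        return "spm"

    def tokenize(self, text):
        return text.split()


class ChildTokenizer(ParentTokenizer):
    def get_spm_processor(self):
        raise AttributeError("Not needed for the child tokenizer")


tok = ChildTokenizer()
print(tok.tokenize("a b"))  # ['a', 'b']
try:
    tok.get_spm_processor()
except AttributeError as err:
    print(f"AttributeError: {err}")  # AttributeError: Not needed for the child tokenizer
```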

Define new functions

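By default, if you inherit from a class and override a method that has one or more decorators in the parent class, those decorators are also added to the unravelled code, *but only if you don't add any decorators yourself*. Otherwise, the redefined decorators are used.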

For example, if you have a parent class like the one below and override it, the parent decorator is kept.

class DummyModel(nn.Module):
  ...

  @decorator(...)
  def forward(...):
    # do stuff here

The modular code is shown on the left, and the unravelled code is shown on the right.

class NewModel(DummyModel):       |   class NewModel(nn.Module):
  ...                             |     ...
                                  |
  def forward(...):               |     @decorator(...)
    ...                           |     def forward(...):
                                  |       ...

However, if you add a new decorator, your new decorator is used instead.

class NewModel(DummyModel):       |   class NewModel(nn.Module):
  ...                             |     ...
                                  |
  @my_new_decorator(...)          |     @my_new_decorator(...)
  def forward(...):               |     def forward(...):
    ...                           |       ...
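The rule in the two examples above boils down to a single decision, sketched here on decorator names represented as strings. The function `resolve_decorators` is hypothetical and only restates the documented behavior.

```python
def resolve_decorators(parent_decorators: list[str], child_decorators: list[str]) -> list[str]:
    """Child decorators win when present; otherwise the parent's are inherited."""
    return list(child_decorators) if child_decorators else list(parent_decorators)


print(resolve_decorators(["@decorator(...)"], []))
# ['@decorator(...)']
print(resolve_decorators(["@decorator(...)"], ["@my_new_decorator(...)"]))
# ['@my_new_decorator(...)']
```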

super_kwargs

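When a forward method is very long and you only want to switch the decorators, you don't need to redefine everything and copy/paste the function. You can use `super().forward(...)` to unravel the parent body. When the function signature has many arguments, use the special `**super_kwargs` syntax in the overridden signature.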

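This syntax tells the linter to unravel all the parent signature arguments here. Below is an example signature from an `AutoModelForCausalLM` model, with many arguments.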

class LlamaForCausalLM(nn.Module):
  ...

  @add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)
  @replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)
  def forward(
      self,
      input_ids: torch.LongTensor = None,
      attention_mask: Optional[torch.Tensor] = None,
      position_ids: Optional[torch.LongTensor] = None,
      past_key_values: Optional[Union[Cache, list[torch.FloatTensor]]] = None,
      inputs_embeds: Optional[torch.FloatTensor] = None,
      labels: Optional[torch.LongTensor] = None,
      use_cache: Optional[bool] = None,
      output_attentions: Optional[bool] = None,
      output_hidden_states: Optional[bool] = None,
      return_dict: Optional[bool] = None,
      cache_position: Optional[torch.LongTensor] = None,
      num_logits_to_keep: int = 0,
      **kwargs: Unpack[KwargsForCausalLM],
  ) -> Union[Tuple, CausalLMOutputWithPast]:
    ...

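Instead of overriding the method and copy/pasting all of these arguments, use the `super().forward(**super_kwargs)` statement (the modular code is shown on the left and the unravelled code on the right).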

class NewModelForCausalLM(LlamaForCausalLM):    |    class NewModelForCausalLM(nn.Module):
  ...                                           |      ...
                                                |
  @my_new_decorator                             |     @my_new_decorator
  def forward(self, **super_kwargs):            |     def forward(
    super().forward(**super_kwargs)             |         self,
                                                |         input_ids: torch.LongTensor = None,
                                                |         attention_mask: Optional[torch.Tensor] = None,
                                                |         position_ids: Optional[torch.LongTensor] = None,
                                                |         past_key_values: Optional[Union[Cache, list[torch.FloatTensor]]] = None,
                                                |         inputs_embeds: Optional[torch.FloatTensor] = None,
                                                |         labels: Optional[torch.LongTensor] = None,
                                                |         use_cache: Optional[bool] = None,
                                                |         output_attentions: Optional[bool] = None,
                                                |         output_hidden_states: Optional[bool] = None,
                                                |         return_dict: Optional[bool] = None,
                                                |         cache_position: Optional[torch.LongTensor] = None,
                                                |         num_logits_to_keep: int = 0,
                                                |         **kwargs: Unpack[KwargsForCausalLM],
                                                |     ) -> Union[Tuple, CausalLMOutputWithPast]:
                                                |       ...

This makes it very easy to switch decorators and makes it explicit that the decorator is the only change you want to apply.

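However, `**super_kwargs` should not be used to avoid being explicit when redefining a method. If you override a method, write the signature explicitly as usual. The `**super_kwargs` syntax is a shortcut for switching decorators and a few other niche cases.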
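The expansion of `**super_kwargs` can be thought of as reading the parent's signature and writing it out in full. The sketch below does that lookup with the standard library's `inspect.signature`; the `Parent` class and the helper `expand_super_kwargs` are hypothetical and only illustrate the idea.

```python
import inspect


class Parent:
    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        return input_ids


def expand_super_kwargs(parent_method) -> str:
    """Render the parent's parameters the way they would replace `**super_kwargs`."""
    return str(inspect.signature(parent_method))


print(expand_super_kwargs(Parent.forward))
# (self, input_ids=None, attention_mask=None, labels=None, **kwargs)
```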

Docstring variables

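If an object is defined in both the modular file and the modeling file it inherits from, the modular definition takes precedence, except for assignments whose name contains the pattern `DOCSTRING`. These variables are typically the `MODEL_START_DOCSTRING` and `MODEL_INPUT_DOCSTRING` in modeling files. They are large blocks of docstrings, and the linter rewrites the names everywhere. For this reason, an assignment to a `DOCSTRING` variable can reuse the definition found in the source file without copying the whole docstring: simply set the variable to `None` in the modular file.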

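This is useful when you need to reference the variable somewhere but don't want to clutter the modular file with docstrings that are always the same. The example code below lets Starcoder2 automatically use the same docstrings as Mistral.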

STARCODER2_INPUTS_DOCSTRING = None  # will be automatically redefined

class Starcoder2Model(MistralModel):
    ...

    @add_start_docstrings_to_model_forward(STARCODER2_INPUTS_DOCSTRING)
    def forward(...):
        ...

Setting the variable to anything other than `None` overrides the docstring, so you can customize it when needed.
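The precedence rule can be summarized as a small merge over module-level assignments. In the sketch below, the function `resolve_assignments` and the sample values are hypothetical, and the parent's names are assumed to have already been renamed to the new model.

```python
def resolve_assignments(modular: dict, parent: dict) -> dict:
    """Merge module-level assignments following the precedence rule described above."""
    resolved = dict(parent)
    for name, value in modular.items():
        if "DOCSTRING" in name and value is None:
            continue  # keep the docstring block inherited from the parent file
        resolved[name] = value  # otherwise the modular definition wins
    return resolved


parent = {
    "STARCODER2_INPUTS_DOCSTRING": "Args: input_ids ...",
    "_CONFIG_FOR_DOC": "MistralConfig",
}
modular = {
    "STARCODER2_INPUTS_DOCSTRING": None,
    "_CONFIG_FOR_DOC": "Starcoder2Config",
}

print(resolve_assignments(modular, parent))
# {'STARCODER2_INPUTS_DOCSTRING': 'Args: input_ids ...', '_CONFIG_FOR_DOC': 'Starcoder2Config'}
```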

Special naming

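The linter automatically renames everything when you inherit from a class. For consistency, always use the same class name prefix when inheriting from different classes in the same file.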

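The example below is not recommended. It breaks the library's naming standard by using `MyModelIncredibleMLP` where the `LlamaMLP` pattern calls for `MyModelMLP`, and the linter doesn't know how to rename potential higher-order dependencies (to `MyModelIncredible` or just `MyModel`).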

class MyModelIncredibleMLP(LlamaMLP):
    ...

class MyModelDecoderLayer(LlamaDecoderLayer):
    ...

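However, if there are no implicit dependencies, you can rename a single class locally. Make sure you still explicitly redefine every other mention of the class with the new naming pattern. For example, all mentions of `LlamaMLP` should be renamed to `MyModelIncredibleMLP`, otherwise the linter may add a new and unwanted `MyModelMLP` class.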

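The linter raises a warning if it detects an ambiguous case. The warning explains what is happening and which prefix is used by default for grabbing dependencies. These warnings and the complexity of the renaming pattern usually only come up when defining multimodal models, for example when `Text` is added to class names to make clear which modality they refer to.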

We detected multiple prefix names when inheriting from transformers.models.llama.modeling_llama: ('Emu3Text', 'Emu3'). We will only use the most used 'Emu3' prefix when grabbing args and dependencies. Make sure to subclass the intermediate classes with the prefix you want (if different from 'Emu3') or use a single prefix in all the modular (best).

If there are automatic dependencies with a prefix but you want another one, explicitly rename the class locally with a `pass` class, as shown below.

class Emu3TextMLP(LlamaMLP):
    pass
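The "most used prefix" mentioned in the warning can be pictured as a simple vote over the class names in the modular file. The sketch below is an assumption about how such a vote could work, not the linter's actual algorithm; `infer_prefixes` and the class list are hypothetical.

```python
from collections import Counter


def infer_prefixes(inheritance: dict[str, str], parent_prefix: str) -> Counter:
    """Count the prefix each child class uses, given {child_name: parent_name}."""
    counts = Counter()
    for child, parent in inheritance.items():
        suffix = parent[len(parent_prefix):]  # e.g. "LlamaMLP" -> "MLP"
        if parent.startswith(parent_prefix) and child.endswith(suffix):
            counts[child[: len(child) - len(suffix)]] += 1
    return counts


classes = {
    "Emu3TextMLP": "LlamaMLP",
    "Emu3DecoderLayer": "LlamaDecoderLayer",
    "Emu3Attention": "LlamaAttention",
}
counts = infer_prefixes(classes, "Llama")
print(counts.most_common())         # [('Emu3', 2), ('Emu3Text', 1)]
print(counts.most_common(1)[0][0])  # Emu3
```

With two competing prefixes, the majority prefix `Emu3` would be used for automatic dependencies, which is why an explicit `pass` class is needed to keep `Emu3Text` for a specific class.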

Config docstrings

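When inheriting from a `Config` class or adding and deleting attributes, you may want to redefine only the new attributes in the docstring. However, the linter doesn't support this yet. You need to add the whole docstring directly to the modular file under the class definition.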
