Transformers

( logits: typing.Optional[torch.FloatTensor] = None last_hidden_state: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[tuple[torch.FloatTensor]] = None attentions: typing.Optional[tuple[torch.FloatTensor]] = None cross_attentions: typing.Optional[tuple[torch.FloatTensor]] = None )

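Parameters

logits (torch.FloatTensor of shape (batch_size, num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional, defaults to None) — Sequence of hidden states at the output of the last layer of the model.
hidden_states (tuple[torch.FloatTensor], optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer, plus the optional initial embedding outputs.
attentions (tuple[torch.FloatTensor], optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple[torch.FloatTensor], optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.

Base class for the Perceiver base model's outputs, with potential hidden states, attentions and cross-attentions.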

class transformers.models.perceiver.modeling_perceiver.PerceiverDecoderOutput

( logits: typing.Optional[torch.FloatTensor] = None cross_attentions: typing.Optional[tuple[torch.FloatTensor]] = None )

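Parameters

logits (torch.FloatTensor of shape (batch_size, num_labels)) — Output of the basic decoder.
cross_attentions (tuple[torch.FloatTensor], optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.

Base class for the Perceiver decoder's outputs, with potential cross-attentions.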

class transformers.models.perceiver.modeling_perceiver.PerceiverMaskedLMOutput

( loss: typing.Optional[torch.FloatTensor] = None logits: typing.Optional[torch.FloatTensor] = None hidden_states: typing.Optional[tuple[torch.FloatTensor]] = None attentions: typing.Optional[tuple[torch.FloatTensor]] = None cross_attentions: typing.Optional[tuple[torch.FloatTensor]] = None )

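Parameters

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Masked language modeling (MLM) loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple[torch.FloatTensor], optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer, plus the optional initial embedding outputs.
attentions (tuple[torch.FloatTensor], optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple[torch.FloatTensor], optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.

Base class for the Perceiver masked language model's outputs.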

class transformers.models.perceiver.modeling_perceiver.PerceiverClassifierOutput

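Parameters

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) — Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) — Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple[torch.FloatTensor], optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) — Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

Hidden states of the model at the output of each layer, plus the optional initial embedding outputs.
attentions (tuple[torch.FloatTensor], optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple[torch.FloatTensor], optional, returned when output_attentions=True is passed or when config.output_attentions=True) — Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.

Base class for the outputs of Perceiver sequence/image classification models, optical flow and multimodal autoencoding.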
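
Because logits are returned before SoftMax, turning them into class probabilities is left to the caller. The following is a minimal pure-Python sketch of that step; the logit values are hypothetical and the helper `softmax` is written here for illustration, it is not part of the library.

```python
import math

def softmax(logits):
    """Convert one row of raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for batch_size = 2 and num_labels = 3.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.1, 3.0]]

probs = [softmax(row) for row in batch_logits]
# The predicted class is the index of the largest logit in each row.
predicted = [max(range(len(row)), key=row.__getitem__) for row in batch_logits]
```

When config.num_labels==1 the single logit is a regression value and no softmax applies.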

PerceiverConfig

class transformers.PerceiverConfig

( num_latents = 256 d_latents = 1280 d_model = 768 num_blocks = 1 num_self_attends_per_block = 26 num_self_attention_heads = 8 num_cross_attention_heads = 8 qk_channels = None v_channels = None cross_attention_shape_for_attention = 'kv' self_attention_widening_factor = 1 cross_attention_widening_factor = 1 hidden_act = 'gelu' attention_probs_dropout_prob = 0.1 initializer_range = 0.02 layer_norm_eps = 1e-12 use_query_residual = True vocab_size = 262 max_position_embeddings = 2048 image_size = 56 train_size = [368, 496] num_frames = 16 audio_samples_per_frame = 1920 samples_per_patch = 16 output_shape = [1, 16, 224, 224] output_num_channels = 512 _label_trainable_num_channels = 1024 **kwargs )

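Parameters

num_latents (int, optional, defaults to 256) — The number of latents.
d_latents (int, optional, defaults to 1280) — Dimension of the latent embeddings.
d_model (int, optional, defaults to 768) — Dimension of the inputs. Should only be provided if PerceiverTextPreprocessor is used or no preprocessor is provided.
num_blocks (int, optional, defaults to 1) — Number of blocks in the Transformer encoder.
num_self_attends_per_block (int, optional, defaults to 26) — The number of self-attention layers per block.
num_self_attention_heads (int, optional, defaults to 8) — Number of attention heads for each self-attention layer in the Transformer encoder.
num_cross_attention_heads (int, optional, defaults to 8) — Number of attention heads for each cross-attention layer in the Transformer encoder.
qk_channels (int, optional) — Dimension to project the queries and keys to before applying attention in the cross-attention and self-attention layers of the encoder. If not specified, the dimension of the queries is preserved.
v_channels (int, optional) — Dimension to project the values to before applying attention in the cross-attention and self-attention layers of the encoder. If not specified, the dimension of the queries is preserved.
cross_attention_shape_for_attention (str, optional, defaults to "kv") — Dimension to use when downsampling the queries and keys in the cross-attention layer of the encoder.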
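self_attention_widening_factor (int, optional, defaults to 1) — Widening factor that sets the feed-forward layer dimension in the self-attention layers of the Transformer encoder.
cross_attention_widening_factor (int, optional, defaults to 1) — Widening factor that sets the feed-forward layer dimension in the cross-attention layer of the Transformer encoder.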
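hidden_act (str or function, optional, defaults to "gelu") — The non-linear activation function (function or string) in the encoder and pooler. If a string, "gelu", "relu", "selu" and "gelu_new" are supported.
attention_probs_dropout_prob (float, optional, defaults to 0.1) — The dropout ratio for the attention probabilities.
initializer_range (float, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer used to initialize all weight matrices.
layer_norm_eps (float, optional, defaults to 1e-12) — The epsilon used by the layer normalization layers.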
use_query_residual (bool, optional, defaults to True) — Whether to add a query residual in the cross-attention layer of the encoder.
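vocab_size (int, optional, defaults to 262) — Vocabulary size for the masked language modeling model.
max_position_embeddings (int, optional, defaults to 2048) — The maximum sequence length that the masked language modeling model might ever be used with. Typically set to something large just in case (e.g., 512, 1024 or 2048).
image_size (int, optional, defaults to 56) — Size of the images after preprocessing, for PerceiverForImageClassificationLearned.
train_size (list[int], optional, defaults to [368, 496]) — Training size of the images for the optical flow model.
num_frames (int, optional, defaults to 16) — Number of video frames used for the multimodal autoencoding model.
audio_samples_per_frame (int, optional, defaults to 1920) — Number of audio samples per frame for the multimodal autoencoding model.
samples_per_patch (int, optional, defaults to 16) — Number of audio samples per patch when preprocessing the audio for the multimodal autoencoding model.
output_shape (list[int], optional, defaults to [1, 16, 224, 224]) — Shape of the output (batch_size, num_frames, height, width) for the video decoder queries of the multimodal autoencoding model. This excludes the channel dimension.
output_num_channels (int, optional, defaults to 512) — Number of output channels for each modality decoder.

This is the configuration class to store the configuration of a PerceiverModel. It is used to instantiate a Perceiver model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the Perceiver deepmind/language-perceiver architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation of PretrainedConfig for more information.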

Example

>>> from transformers import PerceiverModel, PerceiverConfig

>>> # Initializing a Perceiver deepmind/language-perceiver style configuration
>>> configuration = PerceiverConfig()

>>> # Initializing a model from the deepmind/language-perceiver style configuration
>>> model = PerceiverModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config

PerceiverTokenizer

class transformers.PerceiverTokenizer

( pad_token = '[PAD]' bos_token = '[BOS]' eos_token = '[EOS]' mask_token = '[MASK]' cls_token = '[CLS]' sep_token = '[SEP]' model_max_length = 2048 **kwargs )

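Parameters

pad_token (str, optional, defaults to "[PAD]") — The token used for padding, for example when batching sequences of different lengths.
bos_token (str, optional, defaults to "[BOS]") — The BOS token (reserved in the vocabulary, but not actually used).
eos_token (str, optional, defaults to "[EOS]") — The end-of-sequence token (reserved in the vocabulary, but not actually used).

When building a sequence using special tokens, this is not the token that is used for the end of the sequence. The token used is the sep_token.
mask_token (str, optional, defaults to "[MASK]") — The MASK token, used for masked language modeling.
cls_token (str, optional, defaults to "[CLS]") — The CLS token (reserved in the vocabulary, but not actually used).
sep_token (str, optional, defaults to "[SEP]") — The separator token, used when building a sequence from two sequences.

Construct a Perceiver tokenizer. The Perceiver simply uses the raw bytes of the UTF-8 encoding.

This tokenizer inherits from PreTrainedTokenizer, which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.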
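
To make the raw-bytes idea concrete, here is a minimal pure-Python sketch of byte-level encoding. It assumes the six special tokens occupy ids 0 to 5, so that a byte value b maps to id b + 6; that layout is an assumption chosen to match the default vocab_size of 262 (256 byte values plus 6 special tokens). The helpers `encode` and `decode` are hypothetical and are not the library's methods.

```python
# Assumption: the special tokens take the first six ids, in this order.
SPECIAL_TOKENS = ["[PAD]", "[BOS]", "[EOS]", "[MASK]", "[CLS]", "[SEP]"]
OFFSET = len(SPECIAL_TOKENS)

def encode(text):
    """Map each UTF-8 byte of the text to an id shifted past the special tokens."""
    return [byte + OFFSET for byte in text.encode("utf-8")]

def decode(ids):
    """Drop special-token ids, undo the shift, and decode the bytes as UTF-8."""
    raw = bytes(i - OFFSET for i in ids if i >= OFFSET)
    return raw.decode("utf-8", errors="ignore")

ids = encode("héllo")       # "é" takes two bytes, so 5 characters give 6 ids
round_trip = decode(ids)
vocab_size = 256 + OFFSET   # every possible byte value plus the special tokens
```

No vocabulary file is needed under this scheme, and any Unicode string can be represented, at the cost of longer sequences for non-ASCII text.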

__call__

( text: typing.Union[str, list[str], list[list[str]], NoneType] = None text_pair: typing.Union[str, list[str], list[list[str]], NoneType] = None text_target: typing.Union[str, list[str], list[list[str]], NoneType] = None text_pair_target: typing.Union[str, list[str], list[list[str]], NoneType] = None add_special_tokens: bool = True padding: typing.Union[bool, str, transformers.utils.generic.PaddingStrategy] = False truncation: typing.Union[bool, str, transformers.tokenization_utils_base.TruncationStrategy, NoneType] = None max_length: typing.Optional[int] = None stride: int = 0 is_split_into_words: bool = False pad_to_multiple_of: typing.Optional[int] = None padding_side: typing.Optional[str] = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None return_token_type_ids: typing.Optional[bool] = None return_attention_mask: typing.Optional[bool] = None return_overflowing_tokens: bool = False return_special_tokens_mask: bool = False return_offsets_mapping: bool = False return_length: bool = False verbose: bool = True **kwargs ) → BatchEncoding

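Parameters

text (str, list[str], list[list[str]], optional) — The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings (pretokenized string). If the sequences are provided as lists of strings (pretokenized), you must set is_split_into_words=True (to lift the ambiguity with a batch of sequences).
text_pair (str, list[str], list[list[str]], optional) — The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings (pretokenized string). If the sequences are provided as lists of strings (pretokenized), you must set is_split_into_words=True (to lift the ambiguity with a batch of sequences).
text_target (str, list[str], list[list[str]], optional) — The sequence or batch of sequences to be encoded as target texts. Each sequence can be a string or a list of strings (pretokenized string). If the sequences are provided as lists of strings (pretokenized), you must set is_split_into_words=True (to lift the ambiguity with a batch of sequences).
text_pair_target (str, list[str], list[list[str]], optional) — The sequence or batch of sequences to be encoded as target texts. Each sequence can be a string or a list of strings (pretokenized string). If the sequences are provided as lists of strings (pretokenized), you must set is_split_into_words=True (to lift the ambiguity with a batch of sequences).
add_special_tokens (bool, optional, defaults to True) — Whether to add special tokens when encoding the sequences. This uses the underlying PretrainedTokenizerBase.build_inputs_with_special_tokens function, which defines which tokens are automatically added to the input ids. This is useful if you want to add bos or eos tokens automatically.
padding (bool, str or PaddingStrategy, optional, defaults to False) — Activates and controls padding. Accepts the following values:
- True or 'longest': Pad to the longest sequence in the batch (or no padding if only a single sequence is provided).
- 'max_length': Pad to a maximum length specified with the max_length argument, or to the maximum acceptable input length for the model if that argument is not provided.
- False or 'do_not_pad' (default): No padding (i.e., can output a batch with sequences of different lengths).
truncation (bool, str or TruncationStrategy, optional, defaults to False) — Activates and controls truncation. Accepts the following values:
- True or 'longest_first': Truncate to a maximum length specified with the max_length argument, or to the maximum acceptable input length for the model if that argument is not provided. If a pair of sequences (or a batch of pairs) is provided, this truncates token by token, removing a token from the longest sequence in the pair.
- 'only_first': Truncate to a maximum length specified with the max_length argument, or to the maximum acceptable input length for the model if that argument is not provided. If a pair of sequences (or a batch of pairs) is provided, this only truncates the first sequence of the pair.
- 'only_second': Truncate to a maximum length specified with the max_length argument, or to the maximum acceptable input length for the model if that argument is not provided. If a pair of sequences (or a batch of pairs) is provided, this only truncates the second sequence of the pair.
- False or 'do_not_truncate' (default): No truncation (i.e., can output a batch with sequence lengths greater than the model's maximum admissible input size).
max_length (int, optional) — Controls the maximum length to use by one of the truncation/padding parameters.

If left unset or set to None, this uses the predefined model maximum length if a maximum length is required by one of the truncation/padding parameters. If the model has no specific maximum input length (like XLNet), truncation/padding to a maximum length is deactivated.
stride (int, optional, defaults to 0) — If set to a number along with max_length, the overflowing tokens returned when return_overflowing_tokens=True will contain some tokens from the end of the truncated sequence, to provide some overlap between the truncated and overflowing sequences. The value of this argument defines the number of overlapping tokens.
is_split_into_words (bool, optional, defaults to False) — Whether the input is already pre-tokenized (e.g., split into words). If set to True, the tokenizer assumes the input is already split into words (for instance, by splitting it on whitespace), which it will then tokenize. This is useful for NER or token classification.
pad_to_multiple_of (int, optional) — If set, pads the sequence to a multiple of the provided value. Requires padding to be activated. This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.5 (Volta).
padding_side (str, optional) — The side on which the model should have padding applied. Should be one of ['right', 'left']. The default value is picked from the class attribute of the same name.
return_tensors (str or TensorType, optional) — If set, returns tensors instead of lists of Python integers. Acceptable values are:
- 'tf': Return TensorFlow tf.constant objects.
- 'pt': Return PyTorch torch.Tensor objects.
- 'np': Return Numpy np.ndarray objects.
return_token_type_ids (bool, optional) — Whether to return token type IDs. If left to the default, returns the token type IDs according to the specific tokenizer's default, defined by the return_outputs attribute.

What are token type IDs?
return_attention_mask (bool, optional) — Whether to return the attention mask. If left to the default, returns the attention mask according to the specific tokenizer's default, defined by the return_outputs attribute.

What are attention masks?
return_overflowing_tokens (bool, optional, defaults to False) — Whether to return overflowing token sequences. If a pair of sequences of input ids (or a batch of pairs) is provided with truncation_strategy = longest_first or True, an error is raised instead of returning overflowing tokens.
return_special_tokens_mask (bool, optional, defaults to False) — Whether to return special tokens mask information.
return_offsets_mapping (bool, optional, defaults to False) — Whether to return (char_start, char_end) for each token.
This is only available on fast tokenizers inheriting from PreTrainedTokenizerFast; if using Python's tokenizer, this method raises NotImplementedError.
return_length (bool, optional, defaults to False) — Whether to return the lengths of the encoded inputs.
verbose (bool, optional, defaults to True) — Whether to print more information and warnings.
**kwargs — Passed along to the self.tokenize() method.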

BatchEncoding

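A BatchEncoding with the following fields:

input_ids — List of token ids to be fed to a model.

What are input IDs?
token_type_ids — List of token type ids to be fed to a model (when return_token_type_ids=True or if "token_type_ids" is in self.model_input_names).

What are token type IDs?
attention_mask — List of indices specifying which tokens should be attended to by the model (when return_attention_mask=True or if "attention_mask" is in self.model_input_names).

What are attention masks?
overflowing_tokens — List of overflowing token sequences (when a max_length is specified and return_overflowing_tokens=True).
num_truncated_tokens — Number of tokens truncated (when a max_length is specified and return_overflowing_tokens=True).
special_tokens_mask — List of 0s and 1s, with 1 specifying added special tokens and 0 specifying regular sequence tokens (when add_special_tokens=True and return_special_tokens_mask=True).
length — The length of the inputs (when return_length=True).

Main method to tokenize and prepare for the model one or several sequences, or one or several pairs of sequences.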
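
The interaction between padding, max_length, pad_to_multiple_of and the returned attention_mask can be sketched in plain Python. This is an illustrative re-implementation under simplifying assumptions (right-side padding only, no truncation, pad id 0); `pad_batch` is a hypothetical helper, not the tokenizer's own code.

```python
def pad_batch(batch, padding="longest", max_length=None,
              pad_to_multiple_of=None, pad_id=0):
    """Right-pad lists of token ids and build the matching attention masks."""
    if padding in (True, "longest"):
        target = max(len(seq) for seq in batch)
    elif padding == "max_length":
        if max_length is None:
            raise ValueError("this sketch needs max_length for padding='max_length'")
        target = max_length
    else:  # False or "do_not_pad": every sequence keeps its own length
        target = None

    if target is not None and pad_to_multiple_of:
        # Round the target length up to the next multiple.
        target = -(-target // pad_to_multiple_of) * pad_to_multiple_of

    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = 0 if target is None else max(target - len(seq), 0)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = [[110, 111], [110, 111, 112, 113, 114]]
longest = pad_batch(batch)                        # both rows padded to length 5
rounded = pad_batch(batch, pad_to_multiple_of=8)  # both rows padded to length 8
unpadded = pad_batch(batch, padding=False)        # lengths 2 and 5 are kept
```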

PerceiverFeatureExtractor

class transformers.PerceiverFeatureExtractor

( *args **kwargs )

__call__

( images **kwargs )

Preprocess an image or a batch of images.

PerceiverImageProcessor

class transformers.PerceiverImageProcessor

( do_center_crop: bool = True crop_size: typing.Optional[dict[str, int]] = None do_resize: bool = True size: typing.Optional[dict[str, int]] = None resample: Resampling = <Resampling.BICUBIC: 3> do_rescale: bool = True rescale_factor: typing.Union[int, float] = 0.00392156862745098 do_normalize: bool = True image_mean: typing.Union[float, list[float], NoneType] = None image_std: typing.Union[float, list[float], NoneType] = None **kwargs )

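Parameters

do_center_crop (bool, optional, defaults to True) — Whether to center crop the image. If the input size is smaller than crop_size along any edge, the image is padded with zeros and then center cropped. Can be overridden by the do_center_crop parameter in the preprocess method.
crop_size (dict[str, int], optional, defaults to {"height": 256, "width": 256}) — Desired output size after applying the center crop. Can be overridden by the crop_size parameter in the preprocess method.
do_resize (bool, optional, defaults to True) — Whether to resize the image to (size["height"], size["width"]). Can be overridden by the do_resize parameter in the preprocess method.
size (dict[str, int], optional, defaults to {"height": 224, "width": 224}) — Size of the image after resizing. Can be overridden by the size parameter in the preprocess method.
resample (PILImageResampling, optional, defaults to PILImageResampling.BICUBIC) — Defines the resampling filter to use when resizing the image. Can be overridden by the resample parameter in the preprocess method.
do_rescale (bool, optional, defaults to True) — Whether to rescale the image by the scale factor rescale_factor. Can be overridden by the do_rescale parameter in the preprocess method.
rescale_factor (int or float, optional, defaults to 1/255) — Defines the scale factor to use when rescaling the image. Can be overridden by the rescale_factor parameter in the preprocess method.
do_normalize (bool, optional, defaults to True) — Whether to normalize the image. Can be overridden by the do_normalize parameter in the preprocess method.
image_mean (float or list[float], optional, defaults to IMAGENET_STANDARD_MEAN) — Mean to use when normalizing the image. This is a float or a list of floats whose length equals the number of channels in the image. Can be overridden by the image_mean parameter in the preprocess method.
image_std (float or list[float], optional, defaults to IMAGENET_STANDARD_STD) — Standard deviation to use when normalizing the image. This is a float or a list of floats whose length equals the number of channels in the image. Can be overridden by the image_std parameter in the preprocess method.

Constructs a Perceiver image processor.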

preprocess

( images: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']] do_center_crop: typing.Optional[bool] = None crop_size: typing.Optional[dict[str, int]] = None do_resize: typing.Optional[bool] = None size: typing.Optional[dict[str, int]] = None resample: Resampling = None do_rescale: typing.Optional[bool] = None rescale_factor: typing.Optional[float] = None do_normalize: typing.Optional[bool] = None image_mean: typing.Union[float, list[float], NoneType] = None image_std: typing.Union[float, list[float], NoneType] = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None data_format: ChannelDimension = <ChannelDimension.FIRST: 'channels_first'> input_data_format: typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None )

Parameters

images (ImageInput) — Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
do_center_crop (bool, optional, defaults to self.do_center_crop) — Whether to center crop the image to crop_size.
crop_size (dict[str, int], optional, defaults to self.crop_size) — Desired output size after applying the center crop.
do_resize (bool, optional, defaults to self.do_resize) — Whether to resize the image.
size (dict[str, int], optional, defaults to self.size) — Size of the image after resizing.
resample (int, optional, defaults to self.resample) — Resampling filter to use when resizing the image. This can be one of the PILImageResampling enum values. Only has an effect if do_resize is set to True.
do_rescale (bool, optional, defaults to self.do_rescale) — Whether to rescale the image.
rescale_factor (float, optional, defaults to self.rescale_factor) — Rescale factor to rescale the image by if do_rescale is set to True.
do_normalize (bool, optional, defaults to self.do_normalize) — Whether to normalize the image.
image_mean (float or list[float], optional, defaults to self.image_mean) — Image mean.
image_std (float or list[float], optional, defaults to self.image_std) — Image standard deviation.
return_tensors (str or TensorType, optional) — The type of tensors to return. Can be one of:
- Unset: Return a list of np.ndarray.
- TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
- TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
- TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
- TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.
data_format (ChannelDimension or str, optional, defaults to ChannelDimension.FIRST) — The channel dimension format for the output image. Can be one of:
- ChannelDimension.FIRST: image in (num_channels, height, width) format.
- ChannelDimension.LAST: image in (height, width, num_channels) format.
input_data_format (ChannelDimension or str, optional) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.

Preprocess an image or a batch of images.
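
The arithmetic behind the center crop, rescale and normalize steps can be sketched without any imaging library. The mean and standard deviation of 0.5 per channel used below are assumed values for IMAGENET_STANDARD_MEAN and IMAGENET_STANDARD_STD, and both helpers are hypothetical illustrations rather than the processor's actual code; the crop helper only covers images at least as large as the crop.

```python
RESCALE_FACTOR = 1 / 255          # default rescale_factor from the signature
IMAGE_MEAN = [0.5, 0.5, 0.5]      # assumed per-channel mean
IMAGE_STD = [0.5, 0.5, 0.5]       # assumed per-channel standard deviation

def center_crop_box(height, width, crop_height, crop_width):
    """Return (top, left, bottom, right) of a crop centered in the image."""
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return top, left, top + crop_height, left + crop_width

def rescale_and_normalize(pixel):
    """Map one RGB pixel from [0, 255] to [0, 1], then standardize per channel."""
    return [(value * RESCALE_FACTOR - mean) / std
            for value, mean, std in zip(pixel, IMAGE_MEAN, IMAGE_STD)]

# A 300 x 400 image cropped to the default crop_size of 256 x 256.
box = center_crop_box(300, 400, 256, 256)

# Black, mid-gray and white channel values end up near -1, 0 and 1.
out = rescale_and_normalize([0, 127.5, 255])
```

With these assumed statistics, pixel values land in roughly [-1, 1], which is why images already scaled to [0, 1] must skip the rescale step via do_rescale=False.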

PerceiverImageProcessorFast

class transformers.PerceiverImageProcessorFast

( **kwargs: typing_extensions.Unpack[transformers.image_processing_utils_fast.DefaultFastImageProcessorKwargs] )

Constructs a fast Perceiver image processor.

preprocess

( images: typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']] *args **kwargs: typing_extensions.Unpack[transformers.image_processing_utils_fast.DefaultFastImageProcessorKwargs] ) → <class 'transformers.image_processing_base.BatchFeature'>

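Parameters

images (Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']]) — Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
do_resize (bool, optional) — Whether to resize the image.
size (dict[str, int], optional) — Describes the maximum input dimensions to the model.
default_to_square (bool, optional) — Whether to default to a square image when resizing, if size is an int.
resample (Union[PILImageResampling, F.InterpolationMode, NoneType]) — Resampling filter to use when resizing the image. This can be one of the PILImageResampling enum values. Only has an effect if do_resize is set to True.
do_center_crop (bool, optional) — Whether to center crop the image.
crop_size (dict[str, int], optional) — Size of the output image after applying center_crop.
do_rescale (bool, optional) — Whether to rescale the image.
rescale_factor (Union[int, float, NoneType]) — Rescale factor to rescale the image by if do_rescale is set to True.
do_normalize (bool, optional) — Whether to normalize the image.
image_mean (Union[float, list[float], NoneType]) — Image mean to use for normalization. Only has an effect if do_normalize is set to True.
image_std (Union[float, list[float], NoneType]) — Image standard deviation to use for normalization. Only has an effect if do_normalize is set to True.
do_convert_rgb (bool, optional) — Whether to convert the image to RGB.
return_tensors (Union[str, ~utils.generic.TensorType, NoneType]) — Returns stacked tensors if set to pt, otherwise returns a list of tensors.
data_format (~image_utils.ChannelDimension, optional) — Only ChannelDimension.FIRST is supported. Added for compatibility with slow processors.
input_data_format (Union[str, ~image_utils.ChannelDimension, NoneType]) — The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:
- "channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
- "channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
- "none" or ChannelDimension.NONE: image in (height, width) format.
device (torch.device, optional) — The device to process the images on. If unset, the device is inferred from the input images.
disable_grouping (bool, optional) — Whether to disable grouping of images by size, so that they are processed individually rather than in batches. If None, it is set to True if the images are on CPU, and False otherwise. This choice is based on empirical observations, as detailed here: https://github.com/huggingface/transformers/pull/38157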

<class 'transformers.image_processing_base.BatchFeature'>

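data (dict) — Dictionary of lists/arrays/tensors returned by the __call__ method ("pixel_values", etc.).
tensor_type (Union[None, str, TensorType], optional) — You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/Numpy tensors at initialization.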
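
The idea behind grouping images by size (what disable_grouping switches off) is that images sharing a shape can be stacked and processed as one batch. A rough pure-Python sketch of the bookkeeping follows; `group_by_shape` is a hypothetical helper and the shapes are made up for illustration.

```python
from collections import defaultdict

def group_by_shape(shapes):
    """Map each distinct image shape to the positions of the images that have it."""
    groups = defaultdict(list)
    for index, shape in enumerate(shapes):
        groups[shape].append(index)
    return dict(groups)

# Hypothetical (num_channels, height, width) shapes of three input images.
shapes = [(3, 224, 224), (3, 256, 256), (3, 224, 224)]
groups = group_by_shape(shapes)
# Images 0 and 2 could be stacked into one batch; image 1 forms its own group.
```

Keeping the original indices makes it possible to restore the input order after each group has been processed.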

PerceiverTextPreprocessor

class transformers.models.perceiver.modeling_perceiver.PerceiverTextPreprocessor

( config: PerceiverConfig )

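Parameters

config (PerceiverConfig) — Model configuration.

Text preprocessing for the Perceiver encoder. Can be used to embed inputs and add positional encodings.

The dimensionality of the embeddings is determined by the d_model attribute of the configuration.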

PerceiverImagePreprocessor

class transformers.models.perceiver.modeling_perceiver.PerceiverImagePreprocessor

( config prep_type = 'conv' spatial_downsample: int = 4 temporal_downsample: int = 1 position_encoding_type: str = 'fourier' in_channels: int = 3 out_channels: int = 64 conv_after_patching: bool = False conv_after_patching_in_channels: int = 54 conv2d_use_batchnorm: bool = True concat_or_add_pos: str = 'concat' project_pos_dim: int = -1 **position_encoding_kwargs )

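Parameters

config (PerceiverConfig) — Model configuration.
prep_type (str, optional, defaults to "conv") — Preprocessing type. Can be "conv1x1", "conv", "patches" or "pixels".
spatial_downsample (int, optional, defaults to 4) — Spatial downsampling factor.
temporal_downsample (int, optional, defaults to 1) — Temporal downsampling factor (only relevant if a time dimension is present).
position_encoding_type (str, optional, defaults to "fourier") — Position encoding type. Can be "fourier" or "trainable".
in_channels (int, optional, defaults to 3) — Number of channels in the input.
out_channels (int, optional, defaults to 64) — Number of channels in the output.
conv_after_patching (bool, optional, defaults to False) — Whether to apply a convolutional layer after patching.
conv_after_patching_in_channels (int, optional, defaults to 54) — Number of channels in the input of the convolutional layer applied after patching.
conv2d_use_batchnorm (bool, optional, defaults to True) — Whether to use batch normalization in the convolutional layer.
concat_or_add_pos (str, optional, defaults to "concat") — How to combine the position encoding with the input. Can be "concat" or "add".
project_pos_dim (int, optional, defaults to -1) — Dimension to project the position encoding to. If -1, no projection is applied.
**position_encoding_kwargs (Dict, optional) — Keyword arguments for the position encoding.

Image preprocessing for the Perceiver encoder.

Note: if prep_type is set to "conv1x1" or "conv", the out_channels argument refers to the output channels of the convolutional layer. If you add absolute position embeddings, you must make sure that the num_channels of the position encoding kwargs is set equal to out_channels.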
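
The constraint in the note follows from how the two concat_or_add_pos modes affect the channel dimension: concatenation stacks the position channels next to the feature channels, while addition is element-wise and needs both widths to match. A small sketch of that bookkeeping, with a hypothetical helper `output_channels` and illustrative channel counts:

```python
def output_channels(out_channels, pos_channels, concat_or_add_pos="concat"):
    """Channel count after combining features with position encodings."""
    if concat_or_add_pos == "concat":
        # Position channels are appended to the feature channels.
        return out_channels + pos_channels
    if concat_or_add_pos == "add":
        # Element-wise addition requires identical channel counts.
        if pos_channels != out_channels:
            raise ValueError("'add' needs num_channels equal to out_channels")
        return out_channels
    raise ValueError(f"unknown mode: {concat_or_add_pos!r}")

concat_width = output_channels(64, 32, "concat")  # 64 feature + 32 position channels
add_width = output_channels(256, 256, "add")      # widths match, so no growth
```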

PerceiverOneHotPreprocessor

class transformers.models.perceiver.modeling_perceiver.PerceiverOneHotPreprocessor

( config: PerceiverConfig )

Parameters

config (PerceiverConfig) — Model configuration.

One-hot preprocessor for the Perceiver encoder. Can be used to add a dummy index dimension to the input.

PerceiverAudioPreprocessor

class transformers.models.perceiver.modeling_perceiver.PerceiverAudioPreprocessor

( config prep_type: str = 'patches' samples_per_patch: int = 96 position_encoding_type: str = 'fourier' concat_or_add_pos: str = 'concat' out_channels = 64 project_pos_dim = -1 **position_encoding_kwargs )

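Parameters

config (PerceiverConfig) — Model configuration.
prep_type (str, optional, defaults to "patches") — Preprocessor type to use. Only "patches" is supported.
samples_per_patch (int, optional, defaults to 96) — Number of samples per patch.
position_encoding_type (str, optional, defaults to "fourier") — Type of position encoding to use. Can be "trainable" or "fourier".
concat_or_add_pos (str, optional, defaults to "concat") — How to combine the position encoding with the input. Can be "concat" or "add".
out_channels (int, optional, defaults to 64) — Number of channels in the output.
project_pos_dim (int, optional, defaults to -1) — Dimension to project the position encoding to. If -1, no projection is applied.
**position_encoding_kwargs (Dict, optional) — Keyword arguments for the position encoding.

Audio preprocessing for the Perceiver encoder.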
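
With prep_type "patches", the raw waveform is cut into fixed-size groups of samples. The sketch below shows that reshaping in plain Python using the PerceiverConfig defaults listed earlier (num_frames = 16, audio_samples_per_frame = 1920, samples_per_patch = 16); `patch_audio` is a hypothetical helper and the silent waveform is a placeholder.

```python
def patch_audio(samples, samples_per_patch):
    """Split a flat list of audio samples into consecutive, equal-size patches."""
    if len(samples) % samples_per_patch:
        raise ValueError("number of samples must be divisible by samples_per_patch")
    return [samples[start:start + samples_per_patch]
            for start in range(0, len(samples), samples_per_patch)]

num_frames = 16
audio_samples_per_frame = 1920
samples_per_patch = 16

# Placeholder waveform: 16 frames x 1920 samples = 30720 samples of silence.
samples = [0.0] * (num_frames * audio_samples_per_frame)
patches = patch_audio(samples, samples_per_patch)
# 30720 / 16 = 1920 patches, each holding 16 samples.
```

Each patch then plays the role of one input position for the encoder, so a larger samples_per_patch shortens the audio sequence.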

PerceiverMultimodalPreprocessor

class transformers.models.perceiver.modeling_perceiver.PerceiverMultimodalPreprocessor

( modalities: Mapping mask_probs: typing.Optional[collections.abc.Mapping[str, float]] = None min_padding_size: int = 2 )

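Parameters

modalities (Mapping[str, PreprocessorType]) — Dict mapping modality name to preprocessor.
mask_probs (dict[str, float]) — Dict mapping modality name to the masking probability for that modality.
min_padding_size (int, optional, defaults to 2) — The minimum padding size for all modalities. The final output will have a number of channels equal to the maximum number of channels across all modalities plus min_padding_size.

Multimodal preprocessing for the Perceiver encoder.

The inputs of each modality are preprocessed, then padded with trainable position embeddings so that they all have the same number of channels.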
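
The channel arithmetic described for min_padding_size can be checked with a few lines of Python. The per-modality channel counts below are hypothetical, and both helpers are illustrative only.

```python
def common_channels(modality_channels, min_padding_size=2):
    """Shared channel count: widest modality plus the minimum padding."""
    return max(modality_channels.values()) + min_padding_size

def padding_per_modality(modality_channels, min_padding_size=2):
    """Number of padding channels each modality needs to reach the shared width."""
    target = common_channels(modality_channels, min_padding_size)
    return {name: target - channels
            for name, channels in modality_channels.items()}

# Hypothetical channel counts produced by each modality's preprocessor.
channels = {"image": 243, "audio": 401, "label": 700}

target = common_channels(channels)        # 700 + 2
padding = padding_per_modality(channels)  # the widest modality still gets 2 channels
```

Because of min_padding_size, even the widest modality receives some padding channels, so every modality gets a trainable component.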

PerceiverProjectionDecoder

class transformers.models.perceiver.modeling_perceiver.PerceiverProjectionDecoder

( config )

Parameters

config (PerceiverConfig) — Model configuration.

Baseline projection decoder (no cross-attention).

PerceiverBasicDecoder

class transformers.models.perceiver.modeling_perceiver.PerceiverBasicDecoder