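Feature Extractor

A feature extractor is in charge of preparing input features for audio or vision models. This includes feature extraction from sequences (e.g., pre-processing audio files to generate Log-Mel Spectrogram features), feature extraction from images (e.g., cropping image files), as well as padding, normalization, and conversion to NumPy, PyTorch, and TensorFlow tensors.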

FeatureExtractionMixin

class transformers.FeatureExtractionMixin

( **kwargs )

This is a feature extraction mixin used to provide saving/loading functionality for sequential and image feature extractors.

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] cache_dir: typing.Union[str, os.PathLike, NoneType] = None force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' **kwargs )

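Parameters

pretrained_model_name_or_path (str or os.PathLike) — This can be either:
- a string, the *model id* of a pretrained feature_extractor hosted inside a model repo on huggingface.co.
- a path to a *directory* containing a feature extractor file saved using the save_pretrained() method, e.g., ./my_model_directory/.
- a path or URL to a saved feature extractor JSON *file*, e.g., ./my_model_directory/preprocessor_config.json.
cache_dir (str or os.PathLike, *optional*) — Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
force_download (bool, *optional*, defaults to False) — Whether or not to force (re-)downloading the feature extractor files and override the cached versions if they exist.
resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
proxies (dict[str, str], *optional*) — A dictionary of proxy servers to use by protocol or endpoint, e.g., {'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}. The proxies are used on each request.
token (str or bool, *optional*) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, the token generated when running huggingface-cli login (stored in ~/.huggingface) is used.
revision (str, *optional*, defaults to "main") — The specific model version to use. It can be a branch name, a tag name, or a commit id. Because models and other artifacts on huggingface.co are stored in a git-based system, revision can be any identifier allowed by git.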

Instantiate a subclass of FeatureExtractionMixin from a feature extractor, e.g., a derived class of SequenceFeatureExtractor.

Examples

from transformers import Wav2Vec2FeatureExtractor

# The base classes *FeatureExtractionMixin* and *SequenceFeatureExtractor* can't be instantiated directly,
# so the examples use a derived class: *Wav2Vec2FeatureExtractor*
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h"
)  # Download feature_extraction_config from huggingface.co and cache.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "./test/saved_model/"
)  # E.g. feature_extractor (or model) was saved using *save_pretrained('./test/saved_model/')*
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("./test/saved_model/preprocessor_config.json")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False
)
assert feature_extractor.return_attention_mask is False
feature_extractor, unused_kwargs = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False, return_unused_kwargs=True
)
assert feature_extractor.return_attention_mask is False
assert unused_kwargs == {"foo": False}

save_pretrained

( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )

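Parameters

save_directory (str or os.PathLike) — Directory where the feature extractor JSON file will be saved (it is created if it does not exist).
push_to_hub (bool, *optional*, defaults to False) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with repo_id (defaults to the name of save_directory in your namespace).
kwargs (dict[str, Any], *optional*) — Additional keyword arguments passed along to the push_to_hub() method.

Save a feature_extractor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.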
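
The save/load round trip described above can be illustrated without the library. The following is a minimal sketch, not the Transformers implementation: the class TinyExtractor and its two attributes are hypothetical, and only the file name preprocessor_config.json and the return_unused_kwargs behavior are taken from the text and example above.

```python
import json
import os
import tempfile


class TinyExtractor:
    """Hypothetical stand-in for a feature extractor; not a Transformers class."""

    CONFIG_NAME = "preprocessor_config.json"  # file name mentioned in the text above

    def __init__(self, sampling_rate=16000, return_attention_mask=True):
        self.sampling_rate = sampling_rate
        self.return_attention_mask = return_attention_mask

    def save_pretrained(self, save_directory):
        # Create the directory if needed, then write the attributes as JSON.
        os.makedirs(save_directory, exist_ok=True)
        path = os.path.join(save_directory, self.CONFIG_NAME)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(vars(self), f, indent=2, sort_keys=True)
        return path

    @classmethod
    def from_pretrained(cls, path, return_unused_kwargs=False, **kwargs):
        # Accept either a directory or a direct path to the JSON file.
        if os.path.isdir(path):
            path = os.path.join(path, cls.CONFIG_NAME)
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
        extractor = cls(**config)
        # Keyword arguments that match an attribute override the saved value;
        # everything else is collected as "unused".
        unused = {}
        for key, value in kwargs.items():
            if hasattr(extractor, key):
                setattr(extractor, key, value)
            else:
                unused[key] = value
        return (extractor, unused) if return_unused_kwargs else extractor


with tempfile.TemporaryDirectory() as tmp:
    TinyExtractor(sampling_rate=8000).save_pretrained(tmp)
    reloaded, unused_kwargs = TinyExtractor.from_pretrained(
        tmp, return_attention_mask=False, foo=False, return_unused_kwargs=True
    )
```

Here the saved sampling_rate survives the round trip, return_attention_mask is overridden at load time, and foo comes back in unused_kwargs because no attribute of that name exists.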

SequenceFeatureExtractor

class transformers.SequenceFeatureExtractor

( feature_size: int sampling_rate: int padding_value: float **kwargs )

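Parameters

feature_size (int) — The feature dimension of the extracted features.
sampling_rate (int) — The sampling rate at which the audio files should be digitalized, expressed in hertz (Hz).
padding_value (float) — The value that is used to fill the padding values / vectors.

This is a general feature extraction class for speech recognition.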

pad

( processed_features: typing.Union[transformers.feature_extraction_utils.BatchFeature, list[transformers.feature_extraction_utils.BatchFeature], dict[str, transformers.feature_extraction_utils.BatchFeature], dict[str, list[transformers.feature_extraction_utils.BatchFeature]], list[dict[str, transformers.feature_extraction_utils.BatchFeature]]] padding: typing.Union[bool, str, transformers.utils.generic.PaddingStrategy] = True max_length: typing.Optional[int] = None truncation: bool = False pad_to_multiple_of: typing.Optional[int] = None return_attention_mask: typing.Optional[bool] = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )

Parameters

processed_features (BatchFeature, list of BatchFeature, dict[str, list[float]], dict[str, list[list[float]]] or list[dict[str, list[float]]]) — Processed inputs. Can represent one input (BatchFeature or dict[str, list[float]]) or a batch of input values / vectors (list of BatchFeature, *dict[str, list[list[float]]]* or *list[dict[str, list[float]]]*), so you can use this method during preprocessing as well as in a PyTorch Dataloader collate function.

Instead of list[float] you can pass tensors (numpy arrays, PyTorch tensors or TensorFlow tensors); see the note on the return type below.
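padding (bool, str or PaddingStrategy, *optional*, defaults to True) — Select a strategy to pad the returned sequences (according to the model's padding side and padding index) among:
- True or 'longest' (default): Pad to the longest sequence in the batch (or no padding if only a single sequence is provided).
- 'max_length': Pad to a maximum length specified with the argument max_length, or to the maximum acceptable input length for the model if that argument is not provided.
- False or 'do_not_pad': No padding (i.e., the output can be a batch with sequences of different lengths).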
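max_length (int, *optional*) — Maximum length of the returned list and optionally padding length (see above).
truncation (bool) — Activates truncation to cut input sequences longer than max_length to max_length.
pad_to_multiple_of (int, *optional*) — If set, pads the sequence to a multiple of the provided value.

This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability >= 7.0 (Volta or newer), or on TPUs, which benefit from sequence lengths that are a multiple of 128.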
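return_attention_mask (bool, *optional*) — Whether to return the attention mask. If left to the default, the attention mask is returned according to the specific feature_extractor's default.

What are attention masks?
return_tensors (str or TensorType, *optional*) — If set, returns tensors instead of lists of Python integers. Acceptable values are:
- 'tf': Return TensorFlow tf.constant objects.
- 'pt': Return PyTorch torch.Tensor objects.
- 'np': Return Numpy np.ndarray objects.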
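Pad input values / input vectors, or a batch of them, up to a predefined length or to the maximum sequence length in the batch.

The padding side (left/right) and the padding value are defined at the feature extractor level (with self.padding_side and self.padding_value).

If the processed_features passed are a dictionary of numpy arrays, PyTorch tensors or TensorFlow tensors, the result uses the same type unless you provide a different tensor type with return_tensors. In the case of PyTorch tensors, however, you will lose the specific device of your tensors.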
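
The padding strategies can be sketched in plain Python. This is an illustration under stated assumptions, not the library's pad() method: the function pad_batch and its argument layout are hypothetical, inputs are plain lists of floats, truncation is left out, and 'max_length' requires an explicit max_length.

```python
import math


def pad_batch(sequences, padding="longest", max_length=None, pad_to_multiple_of=None,
              padding_value=0.0, padding_side="right"):
    """Pad a list of float lists; returns (padded_sequences, attention_masks)."""
    if padding in (False, "do_not_pad"):
        return [list(s) for s in sequences], [[1] * len(s) for s in sequences]
    if padding in (True, "longest"):
        target = max(len(s) for s in sequences)
    elif padding == "max_length":
        if max_length is None:
            raise ValueError("max_length is required in this sketch")
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {padding!r}")
    if pad_to_multiple_of:
        # Round the target length up to the next multiple.
        target = math.ceil(target / pad_to_multiple_of) * pad_to_multiple_of

    padded, masks = [], []
    for seq in sequences:
        missing = max(target - len(seq), 0)
        fill, mask_fill = [padding_value] * missing, [0] * missing
        if padding_side == "right":
            padded.append(list(seq) + fill)
            masks.append([1] * len(seq) + mask_fill)
        else:
            padded.append(fill + list(seq))
            masks.append(mask_fill + [1] * len(seq))
    return padded, masks


batch = [[0.1, 0.2, 0.3], [0.4]]
values, mask = pad_batch(batch, padding="longest", pad_to_multiple_of=4)
left_values, left_mask = pad_batch([[1.0], [2.0, 3.0]], padding_side="left")
```

With pad_to_multiple_of=4, the longest length of 3 is rounded up to 4, and the mask marks real values with 1 and padding with 0.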

BatchFeature

class transformers.BatchFeature

( data: typing.Optional[dict[str, typing.Any]] = None tensor_type: typing.Union[NoneType, str, transformers.utils.generic.TensorType] = None )

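Parameters

data (dict, *optional*) — Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask', etc.).
tensor_type (Union[None, str, TensorType], *optional*) — You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/Numpy tensors at initialization.

Holds the output of the pad() and feature extractor specific __call__ methods.

This class is derived from a Python dictionary and can be used as a dictionary.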

convert_to_tensors

( tensor_type: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )

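Parameters

tensor_type (str or TensorType, *optional*) — The type of tensors to use. If str, it should be one of the values of the enum TensorType. If None, no modification is done.

Convert the inner content to tensors.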

to

( *args **kwargs ) → BatchFeature

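Parameters

args (Tuple) — Will be passed to the to(...) function of the tensors.
kwargs (Dict, *optional*) — Will be passed to the to(...) function of the tensors. To enable asynchronous data transfer, set the non_blocking flag in kwargs (defaults to False).

Returns

BatchFeature

The same instance after modification.

Send all values to device by calling v.to(*args, **kwargs) (PyTorch only). This should support casting to different dtypes and sending the BatchFeature to a different device.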

ImageFeatureExtractionMixin

class transformers.ImageFeatureExtractionMixin

( )

Mixin that contains utilities for preparing image features.

center_crop

( image size ) → new_image

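Parameters

image (PIL.Image.Image or np.ndarray or torch.Tensor of shape (n_channels, height, width) or (height, width, n_channels)) — The image to crop.
size (int or tuple[int, int]) — The size to which the image is cropped.

Returns

new_image

A center cropped PIL.Image.Image or np.ndarray or torch.Tensor of shape: (n_channels, height, width).

Crops image to the given size using a center crop. Note that if the image is too small to be cropped to the given size, it will be padded (so the returned result has the requested size).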
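
The crop-or-pad behavior can be sketched on a single-channel image stored as a nested list. This is an illustration only, not the library's center_crop: the function center_crop_2d is hypothetical, and padding with zeros, split evenly around the image, is an assumption of the sketch.

```python
def center_crop_2d(image, size):
    """Center-crop a 2D list (height x width) to `size`, zero-padding if it is too small."""
    crop_h, crop_w = (size, size) if isinstance(size, int) else size
    height, width = len(image), len(image[0])

    # Pad symmetrically with zeros so the canvas is at least as large as the crop.
    new_h, new_w = max(height, crop_h), max(width, crop_w)
    top_pad, left_pad = (new_h - height) // 2, (new_w - width) // 2
    canvas = [[0] * new_w for _ in range(new_h)]
    for r in range(height):
        for c in range(width):
            canvas[top_pad + r][left_pad + c] = image[r][c]

    # Take the centered window from the (possibly padded) canvas.
    top, left = (new_h - crop_h) // 2, (new_w - crop_w) // 2
    return [row[left:left + crop_w] for row in canvas[top:top + crop_h]]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
cropped = center_crop_2d(image, 2)       # inner 2 x 2 window
padded = center_crop_2d([[7]], (3, 3))   # too small: zero-padded to 3 x 3
```

In both cases the result has exactly the requested size, which matches the note above about padding images that are too small.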

convert_rgb

( image )

Parameters

image (PIL.Image.Image) — The image to convert.

Converts PIL.Image.Image to RGB format.

expand_dims

( image )

Parameters

image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to expand.

Expands a 2-dimensional image to 3 dimensions.

flip_channel_order

( image )

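Parameters

image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image whose color channels to flip. If np.ndarray or torch.Tensor, the channel dimension should be first.

Flips the channel order of image from RGB to BGR, or vice versa. Note that this will trigger a conversion of image to a NumPy array if it is a PIL Image.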
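
For a channel-first layout, flipping the channel order amounts to reversing the first axis. A minimal sketch on nested lists follows; the helper name flip_channels_first is hypothetical and the code does not use the library.

```python
def flip_channels_first(image):
    """Reverse the first (channel) axis of a channel-first nested list."""
    return image[::-1]


# A 3 x 1 x 2 image: one row, two pixels, channels stored in R, G, B order.
rgb = [[[255, 250]], [[128, 120]], [[0, 5]]]
bgr = flip_channels_first(rgb)
```

Applying the flip twice returns the original order, which is why the same operation covers both RGB to BGR and BGR to RGB.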

normalize

( image mean std rescale = False )

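Parameters

image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to normalize.
mean (list[float] or np.ndarray or torch.Tensor) — The mean (per channel) to use for normalization.
std (list[float] or np.ndarray or torch.Tensor) — The standard deviation (per channel) to use for normalization.
rescale (bool, optional, defaults to False) — Whether or not to rescale the image to be between 0 and 1. If a PIL image is provided, scaling happens automatically.

Normalizes image with mean and std. Note that this will trigger a conversion of image to a NumPy array if it is a PIL Image.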
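
Per-channel normalization computes (pixel - mean) / std for each channel. The sketch below works on a channel-first nested list and is not the library's normalize: the helper normalize_channels is hypothetical, and rescaling by dividing by 255 assumes 8-bit pixel values.

```python
def normalize_channels(image, mean, std, rescale=False):
    """Apply (pixel - mean) / std per channel to a channel-first nested list."""
    out = []
    for channel, m, s in zip(image, mean, std):
        rows = []
        for row in channel:
            if rescale:
                # Assumption: 8-bit input, so dividing by 255 maps values into [0, 1].
                row = [p / 255.0 for p in row]
            rows.append([(p - m) / s for p in row])
        out.append(rows)
    return out


# Two channels, one row, two pixels each.
image = [[[0, 255]], [[51, 102]]]
normalized = normalize_channels(image, mean=[0.5, 0.5], std=[0.5, 0.5], rescale=True)
```

With a mean and standard deviation of 0.5, rescaled pixels in [0, 1] are mapped into [-1, 1].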

rescale

( image: ndarray scale: typing.Union[float, int] )

Scale a numpy image by the given scale factor.

resize

( image size resample = None default_to_square = True max_size = None ) → image

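Parameters

image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to resize.
size (int or tuple[int, int]) — The size to use for resizing the image. If size is a sequence like (h, w), the output size will match it.

If size is an int and default_to_square is True, the image is resized to (size, size). If size is an int and default_to_square is False, the shorter edge of the image is matched to this number, i.e., if height > width, the image is rescaled to (size * height / width, size).
resample (int, optional, defaults to PILImageResampling.BILINEAR) — The filter to use for resampling.
default_to_square (bool, optional, defaults to True) — How to convert size when it is a single int. If set to True, size is converted to a square (size, size). If set to False, the behavior of torchvision.transforms.Resize is replicated, with support for resizing only the shortest edge and providing an optional max_size.
max_size (int, optional, defaults to None) — The maximum allowed size for the longer edge of the resized image: if the longer edge is greater than max_size after resizing according to size, the image is resized again so that the longer edge equals max_size. As a result, size might be overruled, i.e., the shorter edge may be shorter than size. Only used if default_to_square is False.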

Returns

image

A resized PIL.Image.Image.

Resizes image. Enforces conversion of the input to PIL.Image.
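
How size, default_to_square and max_size interact is easier to follow as arithmetic. The sketch below only computes the target (height, width) and does no resampling. The helper compute_output_size is hypothetical, and both the truncation with int() and the guard that rejects max_size <= size are assumptions of this sketch rather than documented behavior.

```python
def compute_output_size(height, width, size, default_to_square=True, max_size=None):
    """Return the (height, width) a resize would target under the rules above."""
    if not isinstance(size, int):
        return tuple(size)  # explicit (h, w): used as given
    if default_to_square:
        return (size, size)

    # Match the shorter edge to `size` and scale the longer edge proportionally.
    short, long = (width, height) if width <= height else (height, width)
    new_short, new_long = size, int(size * long / short)
    if max_size is not None:
        if max_size <= size:
            raise ValueError("max_size must be strictly greater than size")
        if new_long > max_size:
            # Shrink again so the longer edge equals max_size.
            new_short, new_long = int(max_size * new_short / new_long), max_size
    return (new_long, new_short) if height >= width else (new_short, new_long)


square = compute_output_size(480, 640, 256)
landscape = compute_output_size(480, 640, 240, default_to_square=False)
portrait = compute_output_size(640, 480, 240, default_to_square=False)
capped = compute_output_size(480, 640, 240, default_to_square=False, max_size=300)
```

In the capped case the longer edge would have been 320, so the image is scaled down again to 300 and the shorter edge ends up at 225, below the requested size of 240.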

rotate

( image angle resample = None expand = 0 center = None translate = None fillcolor = None ) → image

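Parameters

image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to rotate. If np.ndarray or torch.Tensor, it is converted to PIL.Image.Image before rotating.

Returns

image

A rotated PIL.Image.Image.

Returns a rotated copy of image. The copy is rotated counter-clockwise around its centre by the given number of degrees.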

to_numpy_array

( image rescale = None channel_first = True )

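Parameters

image (PIL.Image.Image or np.ndarray or torch.Tensor) — The image to convert to a NumPy array.
rescale (bool, optional) — Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.). Defaults to True if the image is a PIL Image or an array/tensor of integers, False otherwise.
channel_first (bool, optional, defaults to True) — Whether or not to permute the dimensions of the image to put the channel dimension first.

Converts image to a numpy array. Optionally rescales it and puts the channel dimension as the first dimension.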
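
The two optional steps, rescaling and moving the channel dimension first, can be shown on a nested list without NumPy. This is a sketch under stated assumptions and not the library's to_numpy_array: the helper to_channel_first is hypothetical, the input is laid out as height x width x channels, and rescaling divides by 255 on the assumption of 8-bit values.

```python
def to_channel_first(image_hwc, rescale=True):
    """Convert a height x width x channels nested list to channels x height x width."""
    height, width, channels = len(image_hwc), len(image_hwc[0]), len(image_hwc[0][0])

    def convert(value):
        # Assumption: 8-bit input, so dividing by 255 yields floats in [0, 1].
        return value / 255.0 if rescale else value

    return [
        [[convert(image_hwc[r][c][ch]) for c in range(width)] for r in range(height)]
        for ch in range(channels)
    ]


# A 1 x 2 image with 3 channels per pixel (values in 0..255).
pixels = [[[255, 0, 51], [0, 255, 102]]]
chw = to_channel_first(pixels)
raw_chw = to_channel_first(pixels, rescale=False)
```

The result has three channel planes of shape 1 x 2; with rescale=False the integer values are kept and only the layout changes.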

to_pil_image

( image rescale = None )

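Parameters

image (PIL.Image.Image or numpy.ndarray or torch.Tensor) — The image to convert to the PIL Image format.
rescale (bool, optional) — Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Defaults to True if the image type is a floating type, False otherwise.

Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.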
