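Quantization

Quantization techniques reduce memory and computational costs by representing weights and activations with lower-precision data types such as 8-bit integers (int8). This makes it possible to load larger models that would normally not fit into memory, and it speeds up inference.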
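To make the idea of representing weights in int8 concrete, here is a minimal pure-Python sketch of symmetric per-tensor quantization. The scheme, the function names `quantize_int8` and `dequantize_int8`, and the sample values are assumptions chosen for illustration; the quantization backends documented on this page may use different schemes.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (illustrative scheme)."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate floats; the rounding error is not recoverable."""
    return [q * scale for q in quantized]


weights = [0.02, -1.27, 0.635, 0.0]  # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Each restored value differs from the original by at most about half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored in one byte instead of four (for float32), at the cost of a bounded rounding error per value.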

Learn how to quantize models in the Quantization guide.

PipelineQuantizationConfig

class diffusers.PipelineQuantizationConfig

< source >

( quant_backend: str = None quant_kwargs: typing.Dict[str, typing.Union[str, float, int, dict]] = None components_to_quantize: typing.Optional[typing.List[str]] = None quant_mapping: typing.Dict[str, typing.Union[diffusers.quantizers.quantization_config.QuantizationConfigMixin, ForwardRef('TransformersQuantConfigMixin')]] = None )

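Parameters

quant_backend (str) — The quantization backend to use. When this option is used, the backend is assumed to be available in both `diffusers` and `transformers`.
quant_kwargs (dict) — The arguments used to initialize the quantization backend class.
components_to_quantize (list) — The pipeline components to quantize.
quant_mapping (dict) — A mapping that defines the quantization spec to use for each pipeline component. When this argument is used, you do not need to provide `quant_backend`, `quant_kwargs`, or `components_to_quantize`.

Configuration class to use when applying quantization on the fly in from_pretrained().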
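The backend-style arguments above can be sketched as follows. The backend name `"bitsandbytes_4bit"`, the entries in `quant_kwargs`, the component name `"transformer"`, and the helper names are assumptions for illustration, as is the import path `diffusers.quantizers`; check them against the backend and pipeline you actually use. The diffusers import is deferred into the loader so the argument dictionary can be inspected without the library installed.

```python
def build_backend_config_kwargs():
    """Arguments for the backend-style configuration (no quant_mapping needed)."""
    return {
        "quant_backend": "bitsandbytes_4bit",  # assumed backend name
        "quant_kwargs": {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"},  # assumed backend options
        "components_to_quantize": ["transformer"],  # assumed component name
    }


def load_quantized_pipeline(model_id):
    """Load a pipeline and quantize the selected components while loading."""
    # Lazy import: only needed when a pipeline is actually loaded.
    from diffusers import DiffusionPipeline
    from diffusers.quantizers import PipelineQuantizationConfig

    quant_config = PipelineQuantizationConfig(**build_backend_config_kwargs())
    return DiffusionPipeline.from_pretrained(model_id, quantization_config=quant_config)


config_kwargs = build_backend_config_kwargs()
```

Calling `load_quantized_pipeline` with a model repository id of your choice downloads the weights and needs the chosen backend installed. To give each component its own configuration, pass `quant_mapping` in place of the three arguments shown here.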

BitsAndBytesConfig

class diffusers.BitsAndBytesConfig

< source >

( *args **kwargs )

GGUFQuantizationConfig

class diffusers.GGUFQuantizationConfig

< source >

( *args **kwargs )

QuantoConfig

class diffusers.QuantoConfig

< source >

( *args **kwargs )

TorchAoConfig

class diffusers.TorchAoConfig

< source >

( *args **kwargs )

DiffusersQuantizer

class diffusers.DiffusersQuantizer

< source >

( quantization_config: QuantizationConfigMixin **kwargs )

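Abstract base class for Hugging Face quantizers. It currently supports quantizing HF diffusers models for inference and/or quantization. This class is used only by diffusers.models.modeling_utils.ModelMixin.from_pretrained and cannot yet be used easily outside the scope of that method.

Attributes

quantization_config (`diffusers.quantizers.quantization_config.QuantizationConfigMixin`): The quantization config that defines the quantization parameters of the model to quantize.
modules_to_not_convert (`List[str]`, *optional*): The list of module names to skip when quantizing the model.
required_packages (`List[str]`, *optional*): The list of pip packages that must be installed before the quantizer can be used.
requires_calibration (`bool`): Whether the quantization method requires calibrating the model before it is used.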

adjust_max_memory

< source >

( max_memory: typing.Dict[str, typing.Union[int, str]] )

Adjusts the max_memory argument for infer_auto_device_map() if quantization needs extra memory.
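As a rough sketch of what such an adjustment could look like, the standalone function below lowers every integer memory budget to leave headroom for quantization buffers. The 10% headroom, the `headroom_percent` parameter, and the decision to leave string budgets such as `"2GiB"` untouched are assumptions made for this illustration, not behavior taken from the library.

```python
def adjust_max_memory(max_memory, headroom_percent=10):
    """Return a copy of max_memory with integer budgets reduced by headroom_percent."""
    adjusted = {}
    for device, limit in max_memory.items():
        if isinstance(limit, int):
            # Integer arithmetic keeps the result an exact number of bytes.
            adjusted[device] = limit * (100 - headroom_percent) // 100
        else:
            # String budgets (for example "2GiB") are passed through unchanged in this sketch.
            adjusted[device] = limit
    return adjusted


budgets = adjust_max_memory({0: 1000, "cpu": "2GiB"})
```

A real quantizer would implement this as a method override on its `DiffusersQuantizer` subclass and return the adjusted dictionary.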

adjust_target_dtype

< source >

( torch_dtype: torch.dtype )

Parameters

torch_dtype (torch.dtype, *optional*) — The `torch_dtype` used to compute the `device_map`.

Override this method if you want to adjust the `target_dtype` variable that `from_pretrained` uses to compute the `device_map` when the `device_map` is a `str`. For example, for bitsandbytes the `target_dtype` is forced to `torch.int8`, and for 4-bit a custom enum, `accelerate.CustomDtype.int4`, is passed.

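check_if_quantized_param

< source >

( model: ModelMixin param_value: torch.Tensor param_name: str state_dict: typing.Dict[str, typing.Any] **kwargs )

Checks whether a loaded state_dict component is part of a quantized parameter and performs some validation. Only defined for quantization methods that need to create new parameters for quantization.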

check_quantized_param_shape

< source >

( *args **kwargs )

Checks whether the quantized parameter has the expected shape.

create_quantized_param

< source >

( *args **kwargs )

Takes the needed components from the state_dict and creates the quantized parameter.

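dequantize

< source >

( model )

Potentially dequantizes the model to retrieve the original model, with some loss in accuracy and performance. Note that not all quantization schemes support this.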

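get_special_dtypes_update

< source >

( model torch_dtype: torch.dtype )

Parameters

model (~diffusers.models.modeling_utils.ModelMixin) — The model to quantize.
torch_dtype (torch.dtype) — The dtype passed to the `from_pretrained` method.

Returns the dtypes for modules that are not quantized. This is used to compute the `device_map` when the `device_map` is a `str`. The method uses the `modules_to_not_convert` that is modified in `_process_model_before_weight_loading`. `diffusers` models do not have a `modules_to_not_convert` attribute yet, but this may change soon.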
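The description above can be sketched with plain Python data. This standalone function takes a list of parameter names instead of a model, matches skipped modules by substring, and uses the string `"float16"` as a stand-in for a `torch.dtype`; all three choices are assumptions made so the example runs without torch, and the library's actual matching logic may differ.

```python
def get_special_dtypes_update(param_names, modules_to_not_convert, torch_dtype):
    """Map each parameter of a non-quantized module to the dtype it keeps."""
    return {
        name: torch_dtype
        for name in param_names
        # Assumed rule: a parameter is skipped if any skipped module name appears in its name.
        if any(module in name for module in modules_to_not_convert)
    }


# Hypothetical parameter names for illustration only.
param_names = [
    "proj_in.weight",
    "blocks.0.attn.to_q.weight",
    "proj_out.weight",
    "proj_out.bias",
]
special = get_special_dtypes_update(param_names, ["proj_out"], "float16")
```

Parameters that do not appear in the result are assumed to be sized with the quantized target dtype when the device map is computed.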