StableAudioDiTModel
Transformer model for audio waveforms from Stable Audio Open.
StableAudioDiTModel
class diffusers.StableAudioDiTModel
< source >( sample_size: int = 1024 in_channels: int = 64 num_layers: int = 24 attention_head_dim: int = 64 num_attention_heads: int = 24 num_key_value_attention_heads: int = 12 out_channels: int = 64 cross_attention_dim: int = 768 time_proj_dim: int = 256 global_states_input_dim: int = 1536 cross_attention_input_dim: int = 768 )
Parameters

- sample_size (int, optional, defaults to 1024) — The size of the input sample.
- in_channels (int, optional, defaults to 64) — The number of channels in the input.
- num_layers (int, optional, defaults to 24) — The number of layers of Transformer blocks to use.
- attention_head_dim (int, optional, defaults to 64) — The number of channels in each attention head.
- num_attention_heads (int, optional, defaults to 24) — The number of heads to use for the query states.
- num_key_value_attention_heads (int, optional, defaults to 12) — The number of heads to use for the key and value states.
- out_channels (int, defaults to 64) — The number of channels in the output.
- cross_attention_dim (int, optional, defaults to 768) — Dimension of the cross-attention projection.
- time_proj_dim (int, optional, defaults to 256) — Dimension of the timestep inner projection.
- global_states_input_dim (int, optional, defaults to 1536) — Input dimension of the global hidden states projection.
- cross_attention_input_dim (int, optional, defaults to 768) — Input dimension of the cross-attention projection.
The Diffusion Transformer model introduced in Stable Audio.

Reference: https://github.com/Stability-AI/stable-audio-tools
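A minimal usage sketch follows. The default constructor only relies on the parameter defaults documented above; the `from_pretrained` call assumes the `stabilityai/stable-audio-open-1.0` checkpoint on the Hub keeps this model in a `transformer` subfolder, as StableAudioPipeline loads it.

```python
import torch

from diffusers import StableAudioDiTModel

# Instantiate with random weights using the documented defaults.
model = StableAudioDiTModel()

# Or load pretrained weights; assumes the DiT lives in the "transformer"
# subfolder of the Stable Audio Open 1.0 checkpoint.
model = StableAudioDiTModel.from_pretrained(
    "stabilityai/stable-audio-open-1.0", subfolder="transformer", torch_dtype=torch.float16
)
```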
forward
< source >( hidden_states: FloatTensor timestep: LongTensor = None encoder_hidden_states: FloatTensor = None global_hidden_states: FloatTensor = None rotary_embedding: FloatTensor = None return_dict: bool = True attention_mask: typing.Optional[torch.LongTensor] = None encoder_attention_mask: typing.Optional[torch.LongTensor] = None )
Parameters

- hidden_states (torch.FloatTensor of shape (batch_size, in_channels, sequence_len)) — Input hidden_states.
- timestep (torch.LongTensor) — Used to indicate the denoising step.
- encoder_hidden_states (torch.FloatTensor of shape (batch_size, encoder_sequence_len, cross_attention_input_dim)) — Conditional embeddings to use (embeddings computed from input conditions such as prompts).
- global_hidden_states (torch.FloatTensor of shape (batch_size, global_sequence_len, global_states_input_dim)) — Global embeddings that are prepended to the hidden states.
- rotary_embedding (torch.Tensor) — The rotary embeddings applied to the query and key tensors during the attention calculation.
- return_dict (bool, optional, defaults to True) — Whether or not to return a ~models.transformer_2d.Transformer2DModelOutput instead of a plain tuple.
- attention_mask (torch.Tensor of shape (batch_size, sequence_len), optional) — Mask to avoid performing attention on padding token indices, formed by concatenating the attention masks of the two text encoders. Mask values are selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
- encoder_attention_mask (torch.Tensor of shape (batch_size, sequence_len), optional) — Mask to avoid performing cross-attention on padding token indices, formed by concatenating the attention masks of the two text encoders. Mask values are selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked.
The StableAudioDiTModel forward method.
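Below is a hedged sketch of a forward call with dummy tensors, following the shapes documented above. The rotary-embedding helper and the `+ 1` sequence offset (for the prepended global token) mirror what StableAudioPipeline does internally; treat the exact sizes as illustrative assumptions.

```python
import torch

from diffusers import StableAudioDiTModel
from diffusers.models.embeddings import get_1d_rotary_pos_embed

model = StableAudioDiTModel()  # documented defaults: in_channels=64, attention_head_dim=64

batch_size, seq_len = 2, 1024
hidden_states = torch.randn(batch_size, model.config.in_channels, seq_len)
timestep = torch.randint(0, 1000, (batch_size,))  # LongTensor, per the docstring type above
encoder_hidden_states = torch.randn(batch_size, 64, model.config.cross_attention_input_dim)
global_hidden_states = torch.randn(batch_size, 1, model.config.global_states_input_dim)

# Rotary embeddings over seq_len + 1 positions; the extra position accounts for
# the global token prepended to the hidden states (as in StableAudioPipeline).
rotary_embedding = get_1d_rotary_pos_embed(
    model.config.attention_head_dim // 2,
    seq_len + 1,
    use_real=True,
    repeat_interleave_real=False,
)

with torch.no_grad():
    out = model(
        hidden_states=hidden_states,
        timestep=timestep,
        encoder_hidden_states=encoder_hidden_states,
        global_hidden_states=global_hidden_states,
        rotary_embedding=rotary_embedding,
    )
print(out.sample.shape)  # (batch_size, out_channels, seq_len)
```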
set_attn_processor
< source >( processor: Union[AttentionProcessor, Dict[str, AttentionProcessor]] )

AttentionProcessor here stands for the union of the processor classes defined in diffusers.models.attention_processor (AttnProcessor, AttnProcessor2_0, XFormersAttnProcessor, StableAudioAttnProcessor2_0, and the other variants listed there).
Sets the attention processor to use to compute attention.
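For example, to assign the processor Stable Audio ships with to every attention layer (a sketch; a single instance is broadcast to all layers, while a dict keyed by layer path would set processors per layer):

```python
from diffusers import StableAudioDiTModel
from diffusers.models.attention_processor import StableAudioAttnProcessor2_0

model = StableAudioDiTModel()

# One instance is applied to all attention layers.
model.set_attn_processor(StableAudioAttnProcessor2_0())
```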
set_default_attn_processor
< source >( )

Disables custom attention processors and sets the default attention implementation.
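Continuing the snippet above, reverting to the default is a one-liner; for this model the default is, to my understanding, StableAudioAttnProcessor2_0.

```python
# Revert any custom processors back to this model's default.
model.set_default_attn_processor()
```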