Chroma

Chroma is a text-to-image generation model based on Flux.

The original model checkpoints for Chroma can be found in the lodestones/Chroma repository on the Hugging Face Hub.

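Chroma can use all the same optimizations as Flux.

Inference

The Diffusers version of Chroma is based on the unlocked-v37 version of the original model, which is available in the Chroma repository.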

import torch
from diffusers import ChromaPipeline

pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt =  ["low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator("cpu").manual_seed(433),
    num_inference_steps=40,
    guidance_scale=3.0,
    num_images_per_prompt=1,
).images[0]
image.save("chroma.png")

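Loading from a single file

To use updated model checkpoints that are not in the Diffusers format, you can use the ChromaTransformer2DModel class to load the model from a single file in the original format. This is also useful when loading finetuned or quantized versions of the model that have been published by the community.

The following example demonstrates how to run Chroma from a single file.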

import torch
from diffusers import ChromaTransformer2DModel, ChromaPipeline

model_id = "lodestones/Chroma"
dtype = torch.bfloat16

transformer = ChromaTransformer2DModel.from_single_file("https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v37.safetensors", torch_dtype=dtype)

pipe = ChromaPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=dtype)
pipe.enable_model_cpu_offload()

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt =  ["low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator("cpu").manual_seed(433),
    num_inference_steps=40,
    guidance_scale=3.0,
).images[0]

image.save("chroma-single-file.png")

ChromaPipeline

class diffusers.ChromaPipeline

( scheduler: FlowMatchEulerDiscreteScheduler vae: AutoencoderKL text_encoder: T5EncoderModel tokenizer: T5TokenizerFast transformer: ChromaTransformer2DModel image_encoder: CLIPVisionModelWithProjection = None feature_extractor: CLIPImageProcessor = None )

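Parameters

transformer (ChromaTransformer2DModel) — Conditional Transformer (MMDiT) architecture used to denoise the encoded image latents.
scheduler (FlowMatchEulerDiscreteScheduler) — A scheduler used in combination with transformer to denoise the encoded image latents.
vae (AutoencoderKL) — Variational Auto-Encoder (VAE) model used to encode and decode images to and from latent representations.
text_encoder (T5EncoderModel) — T5, specifically the google/t5-v1_1-xxl variant.
tokenizer (T5TokenizerFast) — Tokenizer of class T5TokenizerFast.

The Chroma pipeline for text-to-image generation.

Reference: https://huggingface.co/lodestones/Chroma/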

__call__

( prompt: typing.Union[str, typing.List[str]] = None negative_prompt: typing.Union[str, typing.List[str]] = None height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: int = 35 sigmas: typing.Optional[typing.List[float]] = None guidance_scale: float = 5.0 num_images_per_prompt: typing.Optional[int] = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None prompt_embeds: typing.Optional[torch.Tensor] = None ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None ip_adapter_image_embeds: typing.Optional[typing.List[torch.Tensor]] = None negative_ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None negative_ip_adapter_image_embeds: typing.Optional[typing.List[torch.Tensor]] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None prompt_attention_mask: typing.Optional[torch.Tensor] = None negative_prompt_attention_mask: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True joint_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] max_sequence_length: int = 512 ) → ~pipelines.chroma.ChromaPipelineOutput or tuple

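Parameters

prompt (str or List[str], optional) — The prompt or prompts to guide the image generation. If not defined, prompt_embeds must be passed instead.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, negative_prompt_embeds must be passed instead. Ignored when guidance is not used (i.e., if guidance_scale is not greater than 1).
height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The height in pixels of the generated image. This is set to 1024 by default for the best results.
width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The width in pixels of the generated image. This is set to 1024 by default for the best results.
num_inference_steps (int, optional, defaults to 35) — The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference.
sigmas (List[float], optional) — Custom sigmas to use for the denoising process with schedulers that support a sigmas argument in their set_timesteps method. If not defined, the default behavior when num_inference_steps is passed will be used.
guidance_scale (float, optional, defaults to 5.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality.
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
generator (torch.Generator or List[torch.Generator], optional) — One or more torch generators used to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling with the supplied random generator.
prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the prompt input argument.
ip_adapter_image (PipelineImageInput, optional) — Optional image input to work with IP Adapters.
ip_adapter_image_embeds (List[torch.Tensor], optional) — Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number of IP-Adapters. Each element should be a tensor of shape (batch_size, num_images, emb_dim). If not provided, embeddings are computed from the ip_adapter_image input argument.
negative_ip_adapter_image (PipelineImageInput, optional) — Optional image input to work with IP Adapters.
negative_ip_adapter_image_embeds (List[torch.Tensor], optional) — Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number of IP-Adapters. Each element should be a tensor of shape (batch_size, num_images, emb_dim). If not provided, embeddings are computed from the ip_adapter_image input argument.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
prompt_attention_mask (torch.Tensor, optional) — Attention mask for the prompt embeddings. Used to mask out padding tokens in the prompt sequence. Chroma requires a single padding token to remain unmasked. See https://huggingface.co/lodestones/Chroma#tldr-masking-t5-padding-tokens-enhanced-fidelity-and-increased-stability-during-training
negative_prompt_attention_mask (torch.Tensor, optional) — Attention mask for the negative prompt embeddings. Used to mask out padding tokens in the negative prompt sequence. Chroma requires a single padding token to remain unmasked. See https://huggingface.co/lodestones/Chroma#tldr-masking-t5-padding-tokens-enhanced-fidelity-and-increased-stability-during-training
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image.Image and np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a ~pipelines.chroma.ChromaPipelineOutput instead of a plain tuple.
joint_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.models.attention_processor.
callback_on_step_end (Callable, optional) — A function that is called at the end of each denoising step during inference. The function is called with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict). callback_kwargs will include a list of all tensors as specified by callback_on_step_end_tensor_inputs.
callback_on_step_end_tensor_inputs (List, optional) — The list of tensor inputs for the callback_on_step_end function. The tensors specified in the list will be passed as the callback_kwargs argument. You can only include variables listed in the ._callback_tensor_inputs attribute of your pipeline class.
max_sequence_length (int, defaults to 512) — Maximum sequence length to use with the prompt.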
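As a rough illustration of callback_on_step_end, the sketch below defines a callback that records the step index, the timestep, and the shape of the latents. The names `record_step` and `history` are hypothetical. The sketch assumes the pipeline expects the callback to return the `callback_kwargs` dictionary, as Diffusers pipelines generally do. The pipeline call itself is shown only as a comment so that the block runs without downloading the model.

```python
# Hypothetical helper: collects (step, timestep, latent shape) for each step.
history = []


def record_step(pipe, step, timestep, callback_kwargs):
    """Callback matching the (pipeline, step, timestep, callback_kwargs) call."""
    latents = callback_kwargs.get("latents")
    shape = tuple(latents.shape) if hasattr(latents, "shape") else None
    history.append((step, timestep, shape))
    # Return the dict so the pipeline can read back any tensors we changed.
    return callback_kwargs


# Assumed usage with an already loaded pipeline `pipe`:
# image = pipe(
#     prompt="a red bicycle",
#     callback_on_step_end=record_step,
#     callback_on_step_end_tensor_inputs=["latents"],
# ).images[0]
```

Only tensors named in callback_on_step_end_tensor_inputs appear in `callback_kwargs`, so the callback reads them defensively with `.get`.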

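Returns

~pipelines.chroma.ChromaPipelineOutput or tuple

~pipelines.chroma.ChromaPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images.

Function invoked when calling the pipeline for generation.

Examples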

>>> import torch
>>> from diffusers import ChromaPipeline, ChromaTransformer2DModel

>>> model_id = "lodestones/Chroma"
>>> ckpt_path = "https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v37.safetensors"
>>> transformer = ChromaTransformer2DModel.from_single_file(ckpt_path, torch_dtype=torch.bfloat16)
>>> pipe = ChromaPipeline.from_pretrained(
...     model_id,
...     transformer=transformer,
...     torch_dtype=torch.bfloat16,
... )
>>> pipe.enable_model_cpu_offload()
>>> prompt = [
...     "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
... ]
>>> negative_prompt = [
...     "low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"
... ]
>>> image = pipe(prompt, negative_prompt=negative_prompt).images[0]
>>> image.save("chroma.png")
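The call above passes both a prompt and a negative prompt, and guidance_scale controls how the two predictions are mixed. The sketch below works through that combination on plain Python lists, assuming the usual classifier-free guidance form (the negative-prompt prediction plus `w` times the difference), which is algebraically the same as `w * cond + (1 - w) * uncond`. The function name `apply_cfg` and the toy numbers are illustrative only; the real pipeline operates on tensors.

```python
def apply_cfg(cond, uncond, guidance_scale):
    """Combine conditional and unconditional predictions element-wise."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0]    # toy prediction for the prompt
uncond = [0.0, 1.0]  # toy prediction for the negative prompt

same_as_cond = apply_cfg(cond, uncond, 1.0)  # scale 1: no extra guidance
pushed = apply_cfg(cond, uncond, 3.0)        # scale 3: moved past `cond`
```

With a scale of 1 the result equals the conditional prediction, which is why guidance only takes effect for values greater than 1.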

disable_vae_slicing

( )

Disable sliced VAE decoding. If enable_vae_slicing was previously enabled, this method will go back to computing decoding in one step.

disable_vae_tiling

( )

Disable tiled VAE decoding. If enable_vae_tiling was previously enabled, this method will go back to computing decoding in one step.

enable_vae_slicing

( )

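Enable sliced VAE decoding. When this option is enabled, the VAE splits the input tensor into slices and computes decoding in several steps. This is useful to save some memory and allow larger batch sizes.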

enable_vae_tiling

( )

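Enable tiled VAE decoding. When this option is enabled, the VAE splits the input tensor into tiles and computes encoding and decoding in several steps. This is useful for saving a large amount of memory and for processing larger images.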
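A minimal sketch of how these switches might be combined before a large or batched generation. The helper name `enable_vae_memory_savers` is hypothetical. It only calls the enable_vae_slicing and enable_vae_tiling methods documented above, and it assumes `pipe` is an already loaded ChromaPipeline.

```python
def enable_vae_memory_savers(pipe, slicing=True, tiling=True):
    """Turn on sliced and/or tiled VAE decoding on a loaded pipeline."""
    if slicing:
        pipe.enable_vae_slicing()  # decode a batch one slice at a time
    if tiling:
        pipe.enable_vae_tiling()   # process large images tile by tile
    return pipe


# Assumed usage:
# pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", torch_dtype=torch.bfloat16)
# enable_vae_memory_savers(pipe)
# ... generate images ...
# pipe.disable_vae_tiling()   # back to one-step decoding
# pipe.disable_vae_slicing()
```

Both options trade some speed for lower peak memory, so it may be worth enabling them only when a run would otherwise not fit in memory.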

encode_prompt

( prompt: typing.Union[str, typing.List[str]] negative_prompt: typing.Union[str, typing.List[str]] = None device: typing.Optional[torch.device] = None num_images_per_prompt: int = 1 prompt_embeds: typing.Optional[torch.Tensor] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None prompt_attention_mask: typing.Optional[torch.Tensor] = None negative_prompt_attention_mask: typing.Optional[torch.Tensor] = None do_classifier_free_guidance: bool = True max_sequence_length: int = 512 lora_scale: typing.Optional[float] = None )

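Parameters

prompt (str or List[str], optional) — The prompt to be encoded.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, negative_prompt_embeds must be passed instead. Ignored when guidance is not used (i.e., if guidance_scale is not greater than 1).
device (torch.device) — The torch device.
num_images_per_prompt (int) — The number of images that should be generated per prompt.
prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the prompt input argument.
lora_scale (float, optional) — A LoRA scale that is applied to all LoRA layers of the text encoder if LoRA layers are loaded.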
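The signature above also accepts prompt_attention_mask. As noted for the __call__ arguments, Chroma needs one padding token to stay unmasked. The sketch below illustrates that rule on a plain list mask (1 = attend, 0 = masked). The function name `keep_one_padding_token` is hypothetical, and this is not the pipeline's own implementation, which works on tensors.

```python
def keep_one_padding_token(mask):
    """Return a copy of `mask` with the first padding position unmasked.

    Assumes right padding: real tokens (1s) come first, padding (0s) after.
    """
    num_real = sum(mask)
    # Positions 0..num_real stay visible: all real tokens plus one padding token.
    return [1 if i <= num_real else 0 for i in range(len(mask))]


tokenizer_mask = [1, 1, 1, 0, 0, 0]  # 3 real tokens, 3 padding tokens
chroma_mask = keep_one_padding_token(tokenizer_mask)
```

A mask with no padding is returned unchanged, since there is no padding position left to reveal.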

ChromaImg2ImgPipeline

class diffusers.ChromaImg2ImgPipeline

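Parameters

transformer (ChromaTransformer2DModel) — Conditional Transformer (MMDiT) architecture used to denoise the encoded image latents.
scheduler (FlowMatchEulerDiscreteScheduler) — A scheduler used in combination with transformer to denoise the encoded image latents.
vae (AutoencoderKL) — Variational Auto-Encoder (VAE) model used to encode and decode images to and from latent representations.
text_encoder (T5EncoderModel) — T5, specifically the google/t5-v1_1-xxl variant.
tokenizer (T5TokenizerFast) — Tokenizer of class T5TokenizerFast.

The Chroma pipeline for image-to-image generation.

Reference: https://huggingface.co/lodestones/Chroma/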

__call__

( prompt: typing.Union[str, typing.List[str]] = None negative_prompt: typing.Union[str, typing.List[str]] = None image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] = None height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: int = 35 sigmas: typing.Optional[typing.List[float]] = None guidance_scale: float = 5.0 strength: float = 0.9 num_images_per_prompt: typing.Optional[int] = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None prompt_embeds: typing.Optional[torch.Tensor] = None ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None ip_adapter_image_embeds: typing.Optional[typing.List[torch.Tensor]] = None negative_ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None negative_ip_adapter_image_embeds: typing.Optional[typing.List[torch.Tensor]] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None prompt_attention_mask: typing.Optional[torch.Tensor] = None negative_prompt_attention_mask: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True joint_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] max_sequence_length: int = 512 ) → ~pipelines.chroma.ChromaPipelineOutput or tuple

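Parameters

prompt (str or List[str], optional) — The prompt or prompts to guide the image generation. If not defined, prompt_embeds must be passed instead.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, negative_prompt_embeds must be passed instead. Ignored when guidance is not used (i.e., if guidance_scale is not greater than 1).
height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The height in pixels of the generated image. This is set to 1024 by default for the best results.
width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The width in pixels of the generated image. This is set to 1024 by default for the best results.
num_inference_steps (int, optional, defaults to 35) — The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference.
sigmas (List[float], optional) — Custom sigmas to use for the denoising process with schedulers that support a sigmas argument in their set_timesteps method. If not defined, the default behavior when num_inference_steps is passed will be used.
guidance_scale (float, optional, defaults to 5.0) — Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality.
strength (float, optional, defaults to 0.9) — Conceptually, indicates how much to transform the reference image. Must be between 0 and 1. image is used as a starting point, and the larger the strength, the more noise is added to it. The number of denoising steps depends on the amount of noise initially added. When strength is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified in num_inference_steps. A value of 1 therefore essentially ignores image.
num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
generator (torch.Generator or List[torch.Generator], optional) — One or more torch generators used to make generation deterministic.
latents (torch.Tensor, optional) — Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling with the supplied random generator.
prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the prompt input argument.
ip_adapter_image (PipelineImageInput, optional) — Optional image input to work with IP Adapters.
ip_adapter_image_embeds (List[torch.Tensor], optional) — Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number of IP-Adapters. Each element should be a tensor of shape (batch_size, num_images, emb_dim). If not provided, embeddings are computed from the ip_adapter_image input argument.
negative_ip_adapter_image (PipelineImageInput, optional) — Optional image input to work with IP Adapters.
negative_ip_adapter_image_embeds (List[torch.Tensor], optional) — Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number of IP-Adapters. Each element should be a tensor of shape (batch_size, num_images, emb_dim). If not provided, embeddings are computed from the ip_adapter_image input argument.
negative_prompt_embeds (torch.Tensor, optional) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds are generated from the negative_prompt input argument.
prompt_attention_mask (torch.Tensor, optional) — Attention mask for the prompt embeddings. Used to mask out padding tokens in the prompt sequence. Chroma requires a single padding token to remain unmasked. See https://huggingface.co/lodestones/Chroma#tldr-masking-t5-padding-tokens-enhanced-fidelity-and-increased-stability-during-training
negative_prompt_attention_mask (torch.Tensor, optional) — Attention mask for the negative prompt embeddings. Used to mask out padding tokens in the negative prompt sequence. Chroma requires a single padding token to remain unmasked. See https://huggingface.co/lodestones/Chroma#tldr-masking-t5-padding-tokens-enhanced-fidelity-and-increased-stability-during-training
output_type (str, optional, defaults to "pil") — The output format of the generated image. Choose between PIL.Image.Image and np.array.
return_dict (bool, optional, defaults to True) — Whether or not to return a ~pipelines.chroma.ChromaPipelineOutput instead of a plain tuple.
joint_attention_kwargs (dict, optional) — A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in diffusers.models.attention_processor.
callback_on_step_end (Callable, optional) — A function that is called at the end of each denoising step during inference. The function is called with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict). callback_kwargs will include a list of all tensors as specified by callback_on_step_end_tensor_inputs.
callback_on_step_end_tensor_inputs (List, optional) — The list of tensor inputs for the callback_on_step_end function. The tensors specified in the list will be passed as the callback_kwargs argument. You can only include variables listed in the ._callback_tensor_inputs attribute of your pipeline class.
max_sequence_length (int, defaults to 512) — Maximum sequence length to use with the prompt.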
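To illustrate the sigmas argument, the sketch below builds a custom, linearly spaced list of noise levels in plain Python. The helper name `linear_sigmas` and the chosen end point `1 / num_steps` are assumptions made for illustration, not a statement of the scheduler's defaults. Whether a given schedule suits Chroma would need to be checked experimentally.

```python
def linear_sigmas(num_steps):
    """Evenly spaced noise levels from 1.0 down to 1 / num_steps."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return [1.0]
    end = 1.0 / num_steps
    step = (1.0 - end) / (num_steps - 1)
    return [1.0 - i * step for i in range(num_steps)]


sigmas = linear_sigmas(4)  # four noise levels, highest first

# Assumed usage with a loaded pipeline `pipe` and an input image `init_image`:
# image = pipe(prompt, image=init_image, sigmas=sigmas).images[0]
```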

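Returns

~pipelines.chroma.ChromaPipelineOutput or tuple

~pipelines.chroma.ChromaPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images.

Function invoked when calling the pipeline for generation.

Examples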

>>> import torch
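>>> from diffusers import ChromaTransformer2DModel, ChromaImg2ImgPipeline
>>> from diffusers.utils import load_image

>>> model_id = "lodestones/Chroma"
>>> ckpt_path = "https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v37.safetensors"
>>> transformer = ChromaTransformer2DModel.from_single_file(ckpt_path, torch_dtype=torch.bfloat16)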
>>> pipe = ChromaImg2ImgPipeline.from_pretrained(
...     model_id,
...     transformer=transformer,
...     torch_dtype=torch.bfloat16,
... )
>>> pipe.enable_model_cpu_offload()
>>> init_image = load_image(
...     "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
... )
>>> prompt = "a scenic fantasy landscape with a river and mountains in the background, vibrant colors, detailed, high resolution"
>>> negative_prompt = "low quality, ugly, unfinished, out of focus, deformed, disfigure, blurry, smudged, restricted palette, flat colors"
>>> image = pipe(prompt, image=init_image, negative_prompt=negative_prompt).images[0]
>>> image.save("chroma-img2img.png")
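The strength argument determines how many of the requested denoising steps actually run. The sketch below works through that arithmetic, assuming the pipeline follows the convention common to Diffusers image-to-image pipelines (the earliest steps are skipped in proportion to `1 - strength`). The function name `effective_steps` is hypothetical, and the exact rounding inside ChromaImg2ImgPipeline may differ.

```python
def effective_steps(num_inference_steps, strength):
    """Number of denoising steps left after skipping the early ones."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    init_timestep = min(num_inference_steps * strength, num_inference_steps)
    t_start = int(max(num_inference_steps - init_timestep, 0))
    return num_inference_steps - t_start


full = effective_steps(35, 1.0)      # all 35 steps run; the image is essentially ignored
half = effective_steps(35, 0.5)      # roughly half of the schedule runs
quarter = effective_steps(40, 0.25)  # only the last quarter of the schedule runs
```

Lower strength values therefore keep the result closer to the input image and also shorten the run.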

disable_vae_slicing

( )

Disable sliced VAE decoding. If enable_vae_slicing was previously enabled, this method will go back to computing decoding in one step.

disable_vae_tiling

( )

Disable tiled VAE decoding. If enable_vae_tiling was previously enabled, this method will go back to computing decoding in one step.

enable_vae_slicing

( )

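Enable sliced VAE decoding. When this option is enabled, the VAE splits the input tensor into slices and computes decoding in several steps. This is useful to save some memory and allow larger batch sizes.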

enable_vae_tiling

( )

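Enable tiled VAE decoding. When this option is enabled, the VAE splits the input tensor into tiles and computes encoding and decoding in several steps. This is useful for saving a large amount of memory and for processing larger images.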

encode_prompt

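Parameters

prompt (str or List[str], optional) — The prompt to be encoded.
negative_prompt (str or List[str], optional) — The prompt or prompts not to guide the image generation. If not defined, negative_prompt_embeds must be passed instead. Ignored when guidance is not used (i.e., if guidance_scale is not greater than 1).
device (torch.device) — The torch device.
num_images_per_prompt (int) — The number of images that should be generated per prompt.
prompt_embeds (torch.Tensor, optional) — Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the prompt input argument.
lora_scale (float, optional) — A LoRA scale that is applied to all LoRA layers of the text encoder if LoRA layers are loaded.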
