Hub Python Library

( model: typing.Optional[str] = None provider: typing.Union[typing.Literal['black-forest-labs', 'cerebras', 'cohere', 'fal-ai', 'featherless-ai', 'fireworks-ai', 'groq', 'hf-inference', 'hyperbolic', 'nebius', 'novita', 'nscale', 'openai', 'replicate', 'sambanova', 'together'], typing.Literal['auto'], NoneType] = None token: typing.Optional[str] = None timeout: typing.Optional[float] = None headers: typing.Optional[typing.Dict[str, str]] = None cookies: typing.Optional[typing.Dict[str, str]] = None proxies: typing.Optional[typing.Any] = None bill_to: typing.Optional[str] = None base_url: typing.Optional[str] = None api_key: typing.Optional[str] = None )

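Parameters

model (str, optional) — The model to run inference with. Can be a model ID hosted on the Hugging Face Hub, e.g. meta-llama/Meta-Llama-3-8B-Instruct, or a URL to a deployed Inference Endpoint. Defaults to None, in which case a recommended model is automatically selected for the task. Note: for better compatibility with OpenAI's client, model has been aliased as base_url. The two arguments are mutually exclusive. If a URL is passed as model or base_url for chat completion, the (/v1)/chat/completions suffix path is appended to the URL.
provider (str, optional) — Name of the provider to use for inference. Can be "black-forest-labs", "cerebras", "cohere", "fal-ai", "featherless-ai", "fireworks-ai", "groq", "hf-inference", "hyperbolic", "nebius", "novita", "nscale", "openai", "replicate", "sambanova" or "together". Defaults to "auto", i.e. the first of the providers available for the model, sorted by the user's order in https://huggingface.co/settings/inference-providers. If model is a URL or base_url is passed, provider is not used.
token (str, optional) — Hugging Face token. Defaults to the locally saved token if not provided. Note: for better compatibility with OpenAI's client, token has been aliased as api_key. The two arguments are mutually exclusive and behave identically.
timeout (float, optional) — The maximum number of seconds to wait for a response from the server. Defaults to None, meaning the client keeps retrying until the server is available.
headers (Dict[str, str], optional) — Additional headers to send to the server. By default only the authorization and user-agent headers are sent. Values in this dictionary override the defaults.
bill_to (str, optional) — The billing account to use for the requests. By default the requests are billed to the user's account. Requests can only be billed to an organization the user is a member of and which has subscribed to Enterprise Hub.
cookies (Dict[str, str], optional) — Additional cookies to send to the server.
proxies (Any, optional) — Proxies to use for the request.
base_url (str, optional) — Base URL to run inference. This is a duplicate of the model argument, provided so that InferenceClient follows the same pattern as the openai.OpenAI client. Cannot be used if model is set. Defaults to None.
api_key (str, optional) — Token to use for authentication. This is a duplicate of the token argument, provided so that InferenceClient follows the same pattern as the openai.OpenAI client. Cannot be used if token is set. Defaults to None.

Initialize a new Inference Client.

InferenceClient aims to provide a unified inference experience. The client can be used seamlessly with the (free) Inference API, self-hosted Inference Endpoints, or third-party Inference Providers.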
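
The aliasing rules described above (model vs. base_url, token vs. api_key) can be sketched as a small validation step. This is only an illustration of the documented behavior, not the library's implementation; `resolve_client_args` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def resolve_client_args(
    model: Optional[str] = None,
    base_url: Optional[str] = None,
    token: Optional[str] = None,
    api_key: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: collapse the aliased arguments into (model, token)."""
    if model is not None and base_url is not None:
        raise ValueError("Pass either `model` or `base_url`, not both.")
    if token is not None and api_key is not None:
        raise ValueError("Pass either `token` or `api_key`, not both.")
    # Whichever alias was given is used; None means "fall back to the default".
    resolved_model = model if model is not None else base_url
    resolved_token = token if token is not None else api_key
    return resolved_model, resolved_token


print(resolve_client_args(base_url="https://example.invalid/v1", api_key="hf_xxx"))
```

Passing both members of an alias pair raises an error in this sketch, which mirrors the "mutually exclusive" wording in the parameter descriptions.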

audio_classification

( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] model: typing.Optional[str] = None top_k: typing.Optional[int] = None function_to_apply: typing.Optional[ForwardRef('AudioClassificationOutputTransform')] = None ) → List[AudioClassificationOutputElement]

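Parameters

audio (Union[str, Path, bytes, BinaryIO]) — The audio content to classify. Can be raw audio bytes, a local audio file, or a URL to an audio file.
model (str, optional) — The model to use for audio classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for audio classification is used.
top_k (int, optional) — When specified, limits the output to the K most probable classes.
function_to_apply ("AudioClassificationOutputTransform", optional) — The function to apply to the model outputs in order to retrieve the scores.

List[AudioClassificationOutputElement]

A list of AudioClassificationOutputElement items containing the predicted labels and their confidence.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform audio classification on the provided audio content.

Example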

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.audio_classification("audio.flac")
[
    AudioClassificationOutputElement(score=0.4976358711719513, label='hap'),
    AudioClassificationOutputElement(score=0.3677836060523987, label='neu'),
    ...
]

audio_to_audio

( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] model: typing.Optional[str] = None ) → List[AudioToAudioOutputElement]

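Parameters

audio (Union[str, Path, bytes, BinaryIO]) — The audio content for the model. Can be raw audio bytes, a local audio file, or a URL to an audio file.
model (str, optional) — Any model that takes an audio file and returns another audio file. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for audio_to_audio is used.

List[AudioToAudioOutputElement]

A list of AudioToAudioOutputElement items containing the audio label, content type, and audio content as a blob.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform one of several audio-to-audio tasks, depending on the model (e.g. speech enhancement, source separation).

Example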

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> audio_output = client.audio_to_audio("audio.flac")
>>> for i, item in enumerate(audio_output):
...     with open(f"output_{i}.flac", "wb") as f:
...         f.write(item.blob)

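automatic_speech_recognition

( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] model: typing.Optional[str] = None extra_body: typing.Optional[typing.Dict] = None ) → AutomaticSpeechRecognitionOutput

Parameters

audio (Union[str, Path, bytes, BinaryIO]) — The content to transcribe. Can be raw audio bytes, a local audio file, or a URL to an audio file.
model (str, optional) — The model to use for ASR. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for ASR is used.
extra_body (Dict, optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

AutomaticSpeechRecognitionOutput

An item containing the transcribed text and, optionally, the timestamp chunks.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform automatic speech recognition (ASR, or audio-to-text) on the given audio content.

Example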

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.automatic_speech_recognition("hello_world.flac").text
"hello world"

chat_completion

( messages: typing.List[typing.Union[typing.Dict, huggingface_hub.inference._generated.types.chat_completion.ChatCompletionInputMessage]] model: typing.Optional[str] = None stream: bool = False frequency_penalty: typing.Optional[float] = None logit_bias: typing.Optional[typing.List[float]] = None logprobs: typing.Optional[bool] = None max_tokens: typing.Optional[int] = None n: typing.Optional[int] = None presence_penalty: typing.Optional[float] = None response_format: typing.Union[huggingface_hub.inference._generated.types.chat_completion.ChatCompletionInputResponseFormatText, huggingface_hub.inference._generated.types.chat_completion.ChatCompletionInputResponseFormatJSONSchema, huggingface_hub.inference._generated.types.chat_completion.ChatCompletionInputResponseFormatJSONObject, NoneType] = None seed: typing.Optional[int] = None stop: typing.Optional[typing.List[str]] = None stream_options: typing.Optional[huggingface_hub.inference._generated.types.chat_completion.ChatCompletionInputStreamOptions] = None temperature: typing.Optional[float] = None tool_choice: typing.Union[huggingface_hub.inference._generated.types.chat_completion.ChatCompletionInputToolChoiceClass, ForwardRef('ChatCompletionInputToolChoiceEnum'), NoneType] = None tool_prompt: typing.Optional[str] = None tools: typing.Optional[typing.List[huggingface_hub.inference._generated.types.chat_completion.ChatCompletionInputTool]] = None top_logprobs: typing.Optional[int] = None top_p: typing.Optional[float] = None extra_body: typing.Optional[typing.Dict] = None ) → ChatCompletionOutput or Iterable of ChatCompletionStreamOutput

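Parameters

messages (List of ChatCompletionInputMessage) — Conversation history consisting of role and content pairs.
model (str, optional) — The model to use for chat completion. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for chat-based text generation is used. See https://huggingface.co/tasks/text-generation for more details. If model is a model ID, it is passed to the server as the model parameter. If you want to define a custom URL while also setting model in the request payload, you must set base_url when initializing InferenceClient.
frequency_penalty (float, optional) — Penalizes new tokens based on their existing frequency in the text so far. Range: [-2.0, 2.0]. Defaults to 0.0.
logit_bias (List[float], optional) — Adjusts the likelihood of specific tokens appearing in the generated output.
logprobs (bool, optional) — Whether to return log probabilities of the output tokens. If true, returns the log probabilities of each output token returned in the content of the message.
max_tokens (int, optional) — Maximum number of tokens allowed in the response. Defaults to 100.
n (int, optional) — The number of completions to generate for each prompt.
presence_penalty (float, optional) — Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
response_format (ChatCompletionInputGrammarType(), optional) — Grammar constraints. Can be either a JSONSchema or a regex.
seed (Optional[int], optional) — Seed for reproducible control flow. Defaults to None.
stop (List[str], optional) — Up to four strings which trigger the end of the response. Defaults to None.
stream (bool, optional) — Enable realtime streaming of responses. Defaults to False.
stream_options (ChatCompletionInputStreamOptions, optional) — Options for streaming completions.
temperature (float, optional) — Controls the randomness of the generations. Lower values produce less random completions. Range: [0, 2]. Defaults to 1.0.
top_logprobs (int, optional) — An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
top_p (float, optional) — Fraction of the most likely next words to sample from. Must be between 0 and 1. Defaults to 1.0.
tool_choice (ChatCompletionInputToolChoiceClass or ChatCompletionInputToolChoiceEnum(), optional) — The tool to use for the completion. Defaults to "auto".
tool_prompt (str, optional) — A prompt to be appended before the tools.
tools (List of ChatCompletionInputTool, optional) — A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
extra_body (Dict, optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

ChatCompletionOutput or Iterable of ChatCompletionStreamOutput

Generated text returned from the server

If stream=False, the generated text is returned as a ChatCompletionOutput (default).
If stream=True, the generated text is returned token by token as a sequence of ChatCompletionStreamOutput.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

A method for completing conversations using a specified language model.

The client.chat_completion method is aliased as client.chat.completions.create for compatibility with OpenAI's client. Inputs and outputs are strictly the same, and using either syntax yields the same results. See the Inference guide for more details about OpenAI compatibility.

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example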

>>> from huggingface_hub import InferenceClient
>>> messages = [{"role": "user", "content": "What is the capital of France?"}]
>>> client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
>>> client.chat_completion(messages, max_tokens=100)
ChatCompletionOutput(
    choices=[
        ChatCompletionOutputComplete(
            finish_reason='eos_token',
            index=0,
            message=ChatCompletionOutputMessage(
                role='assistant',
                content='The capital of France is Paris.',
                name=None,
                tool_calls=None
            ),
            logprobs=None
        )
    ],
    created=1719907176,
    id='',
    model='meta-llama/Meta-Llama-3-8B-Instruct',
    object='text_completion',
    system_fingerprint='2.0.4-sha-f426a33',
    usage=ChatCompletionOutputUsage(
        completion_tokens=8,
        prompt_tokens=17,
        total_tokens=25
    )
)

Example using streaming

>>> from huggingface_hub import InferenceClient
>>> messages = [{"role": "user", "content": "What is the capital of France?"}]
>>> client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
>>> for token in client.chat_completion(messages, max_tokens=10, stream=True):
...     print(token)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content='The', role='assistant'), index=0, finish_reason=None)], created=1710498504)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' capital', role='assistant'), index=0, finish_reason=None)], created=1710498504)
(...)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' may', role='assistant'), index=0, finish_reason=None)], created=1710498504)

Example using OpenAI's syntax

# instead of `from openai import OpenAI`
from huggingface_hub import InferenceClient

# instead of `client = OpenAI(...)`
client = InferenceClient(
    base_url=...,
    api_key=...,
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

for chunk in output:
    print(chunk.choices[0].delta.content)
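
When streaming, each chunk carries only a small delta, so callers often want to stitch the pieces back into one string. The sketch below shows one way to do that. The `Delta`, `Choice`, and `Chunk` dataclasses are stand-ins invented for this illustration so it runs without a server; they only mimic the `choices[0].delta.content` shape shown in the examples above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Delta:  # stand-in for ChatCompletionStreamOutputDelta
    content: Optional[str] = None


@dataclass
class Choice:  # stand-in for ChatCompletionStreamOutputChoice
    delta: Delta
    finish_reason: Optional[str] = None


@dataclass
class Chunk:  # stand-in for ChatCompletionStreamOutput
    choices: List[Choice]


def collect_text(chunks: Iterable[Chunk]) -> str:
    """Concatenate the text deltas of the first choice of every chunk."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices at all
        content = chunk.choices[0].delta.content
        if content:  # skip None or empty deltas (e.g. the final chunk)
            parts.append(content)
    return "".join(parts)


fake_stream = [
    Chunk([Choice(Delta("The"))]),
    Chunk([Choice(Delta(" capital"))]),
    Chunk([Choice(Delta(None), finish_reason="stop")]),
]
print(collect_text(fake_stream))
```

With a real client, the same function could presumably be fed the iterator returned by `client.chat_completion(..., stream=True)`, since it only reads `choices[0].delta.content`.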

Example using a third-party provider directly, with extra (provider-specific) parameters. Usage will be billed to your Together AI account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="together",  # Use Together AI provider
...     api_key="<together_api_key>",  # Pass your Together API key directly
... )
>>> client.chat_completion(
...     model="meta-llama/Meta-Llama-3-8B-Instruct",
...     messages=[{"role": "user", "content": "What is the capital of France?"}],
...     extra_body={"safety_model": "Meta-Llama/Llama-Guard-7b"},
... )

Example using a third-party provider through Hugging Face routing. Usage will be billed to your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="sambanova",  # Use Sambanova provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> client.chat_completion(
...     model="meta-llama/Meta-Llama-3-8B-Instruct",
...     messages=[{"role": "user", "content": "What is the capital of France?"}],
... )

Example using image + text as input

>>> import base64
>>> from huggingface_hub import InferenceClient

# provide a remote URL
>>> image_url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
# or a base64-encoded image
>>> image_path = "/path/to/image.jpeg"
>>> with open(image_path, "rb") as f:
...     base64_image = base64.b64encode(f.read()).decode("utf-8")
>>> image_url = f"data:image/jpeg;base64,{base64_image}"

>>> client = InferenceClient("meta-llama/Llama-3.2-11B-Vision-Instruct")
>>> output = client.chat.completions.create(
...     messages=[
...         {
...             "role": "user",
...             "content": [
...                 {
...                     "type": "image_url",
...                     "image_url": {"url": image_url},
...                 },
...                 {
...                     "type": "text",
...                     "text": "Describe this image in one sentence.",
...                 },
...             ],
...         },
...     ],
... )
>>> print(output.choices[0].message.content)
The image depicts the iconic Statue of Liberty situated in New York Harbor, New York, on a clear day.
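
The base64 branch of the example above hardcodes `image/jpeg`. As a rough sketch, the data URL can instead be built with a MIME type guessed from the file name. `to_data_url` is a hypothetical helper and not part of huggingface_hub.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Build a `data:` URL from raw bytes, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        # Unknown extension: fall back to a generic binary type.
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


print(to_data_url(b"abc", "image.jpeg"))
```

The resulting string can be used wherever the example passes `image_url`.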

Example using tools

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> messages = [
...     {
...         "role": "system",
...         "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.",
...     },
...     {
...         "role": "user",
...         "content": "What's the weather like the next 3 days in San Francisco, CA?",
...     },
... ]
>>> tools = [
...     {
...         "type": "function",
...         "function": {
...             "name": "get_current_weather",
...             "description": "Get the current weather",
...             "parameters": {
...                 "type": "object",
...                 "properties": {
...                     "location": {
...                         "type": "string",
...                         "description": "The city and state, e.g. San Francisco, CA",
...                     },
...                     "format": {
...                         "type": "string",
...                         "enum": ["celsius", "fahrenheit"],
...                         "description": "The temperature unit to use. Infer this from the users location.",
...                     },
...                 },
...                 "required": ["location", "format"],
...             },
...         },
...     },
...     {
...         "type": "function",
...         "function": {
...             "name": "get_n_day_weather_forecast",
...             "description": "Get an N-day weather forecast",
...             "parameters": {
...                 "type": "object",
...                 "properties": {
...                     "location": {
...                         "type": "string",
...                         "description": "The city and state, e.g. San Francisco, CA",
...                     },
...                     "format": {
...                         "type": "string",
...                         "enum": ["celsius", "fahrenheit"],
...                         "description": "The temperature unit to use. Infer this from the users location.",
...                     },
...                     "num_days": {
...                         "type": "integer",
...                         "description": "The number of days to forecast",
...                     },
...                 },
...                 "required": ["location", "format", "num_days"],
...             },
...         },
...     },
... ]

>>> response = client.chat_completion(
...     model="meta-llama/Meta-Llama-3-70B-Instruct",
...     messages=messages,
...     tools=tools,
...     tool_choice="auto",
...     max_tokens=500,
... )
>>> response.choices[0].message.tool_calls[0].function
ChatCompletionOutputFunctionDefinition(
    arguments={
        'location': 'San Francisco, CA',
        'format': 'fahrenheit',
        'num_days': 3
    },
    name='get_n_day_weather_forecast',
    description=None
)
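
The model only proposes a tool call; running it is up to the caller. A minimal sketch of that dispatch step follows, assuming a local registry of Python functions. `TOOL_REGISTRY`, `dispatch_tool_call`, and the placeholder forecast function are hypothetical. The arguments are accepted either as a dict (as printed above) or as a JSON string, since the exact type may differ between versions and providers.

```python
import json
from typing import Any, Callable, Dict, Union


def get_n_day_weather_forecast(location: str, format: str, num_days: int) -> str:
    # Placeholder implementation; a real one would query a weather service.
    return f"{num_days}-day forecast for {location} in {format}"


# Maps tool names (as declared in `tools`) to local callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {
    "get_n_day_weather_forecast": get_n_day_weather_forecast,
}


def dispatch_tool_call(name: str, arguments: Union[str, Dict[str, Any]]) -> Any:
    """Look up the requested tool and call it with the model-provided arguments."""
    if isinstance(arguments, str):
        arguments = json.loads(arguments)  # JSON-encoded arguments
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name](**arguments)


print(
    dispatch_tool_call(
        "get_n_day_weather_forecast",
        {"location": "San Francisco, CA", "format": "fahrenheit", "num_days": 3},
    )
)
```

With a real response, `name` and `arguments` would come from `response.choices[0].message.tool_calls[0].function`.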

Example using response_format

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> messages = [
...     {
...         "role": "user",
...         "content": "I saw a puppy a cat and a raccoon during my bike ride in the park. What did I see and when?",
...     },
... ]
>>> response_format = {
...     "type": "json",
...     "value": {
...         "properties": {
...             "location": {"type": "string"},
...             "activity": {"type": "string"},
...             "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
...             "animals": {"type": "array", "items": {"type": "string"}},
...         },
...         "required": ["location", "activity", "animals_seen", "animals"],
...     },
... }
>>> response = client.chat_completion(
...     messages=messages,
...     response_format=response_format,
...     max_tokens=500,
... )
>>> response.choices[0].message.content
'{\n\n"activity": "bike ride",\n"animals": ["puppy", "cat", "raccoon"],\n"animals_seen": 3,\n"location": "park"}'
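
The constrained reply is still a plain string, so it has to be parsed before use. The sketch below parses the content and checks the keys listed under "required" in the schema above. `parse_structured_reply` is a hypothetical helper, and the sample string is a hand-written stand-in for the model output.

```python
import json
from typing import Any, Dict, Iterable


def parse_structured_reply(content: str, required: Iterable[str]) -> Dict[str, Any]:
    """Parse a JSON reply and verify that all required keys are present."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level.")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"Missing required keys: {missing}")
    return data


content = (
    '{"activity": "bike ride", "animals": ["puppy", "cat", "raccoon"], '
    '"animals_seen": 3, "location": "park"}'
)
reply = parse_structured_reply(content, ["location", "activity", "animals_seen", "animals"])
print(reply["animals"])
```

This only checks key presence. Full validation against the schema (types, minimum and maximum) would need a JSON Schema validator.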

document_question_answering

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] question: str model: typing.Optional[str] = None doc_stride: typing.Optional[int] = None handle_impossible_answer: typing.Optional[bool] = None lang: typing.Optional[str] = None max_answer_len: typing.Optional[int] = None max_question_len: typing.Optional[int] = None max_seq_len: typing.Optional[int] = None top_k: typing.Optional[int] = None word_boxes: typing.Optional[typing.List[typing.Union[typing.List[float], str]]] = None ) → List[DocumentQuestionAnsweringOutputElement]

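Parameters

image (Union[str, Path, bytes, BinaryIO]) — The input image for the context. Can be raw bytes, an image file, or a URL to an online image.
question (str) — The question to be answered.
model (str, optional) — The model to use for the document question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended document question answering model is used. Defaults to None.
doc_stride (int, optional) — If the words in the document are too long to fit with the question for the model, the document is split into several chunks with some overlap. This argument controls the size of that overlap.
handle_impossible_answer (bool, optional) — Whether to accept "impossible" as an answer.
lang (str, optional) — Language to use while running OCR. Defaults to English.
max_answer_len (int, optional) — The maximum length of predicted answers (e.g. only answers with a shorter length are considered).
max_question_len (int, optional) — The maximum length of the question after tokenization. It is truncated if needed.
max_seq_len (int, optional) — The maximum length, in tokens, of the total sentence (context + question) in each chunk passed to the model. The context is split into several chunks (using doc_stride as overlap) if needed.
top_k (int, optional) — The number of answers to return (chosen by order of likelihood). Fewer than top_k answers can be returned if there are not enough options available within the context.
word_boxes (List[Union[List[float], str]], optional) — A list of words and bounding boxes (normalized 0->1000). If provided, inference skips the OCR step and uses the provided bounding boxes instead.

List[DocumentQuestionAnsweringOutputElement]

A list of DocumentQuestionAnsweringOutputElement items containing the predicted label, associated probability, word ids, and page number.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Answer questions on document images.

Example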

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.document_question_answering(image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png", question="What is the invoice number?")
[DocumentQuestionAnsweringOutputElement(answer='us-001', end=16, score=0.9999666213989258, start=16)]
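
The interplay between max_seq_len and doc_stride happens on the server, but the idea of splitting a long context into overlapping chunks can be sketched in a few lines. This is an illustration of the general technique only: `chunk_with_overlap` is a hypothetical function, and the real pipeline works on tokenizer output and also reserves room for the question.

```python
from typing import List, Sequence


def chunk_with_overlap(tokens: Sequence[int], max_len: int, overlap: int) -> List[List[int]]:
    """Split `tokens` into windows of at most `max_len` items sharing `overlap` items."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if not tokens:
        return []
    step = max_len - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end
        start += step
    return chunks


for chunk in chunk_with_overlap(list(range(10)), max_len=4, overlap=2):
    print(chunk)
```

A larger overlap lowers the chance that an answer is cut in half at a chunk boundary, at the cost of more chunks to process.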

feature_extraction

( text: str normalize: typing.Optional[bool] = None prompt_name: typing.Optional[str] = None truncate: typing.Optional[bool] = None truncation_direction: typing.Optional[typing.Literal['Left', 'Right']] = None model: typing.Optional[str] = None ) → np.ndarray

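Parameters

text (str) — The text to embed.
model (str, optional) — The model to use for the feature extraction task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended feature extraction model is used. Defaults to None.
normalize (bool, optional) — Whether to normalize the embeddings. Only available on servers powered by Text-Embedding-Inference.
prompt_name (str, optional) — The name of the prompt to use for encoding. If not set, no prompt is applied. Must be a key in the *Sentence Transformers* configuration *prompts* dictionary. For example, if prompt_name is "query" and prompts is {"query": "query: ", ...}, then the sentence "What is the capital of France?" is encoded as "query: What is the capital of France?", because the prompt text is prepended to any text to encode.
truncate (bool, optional) — Whether to truncate the embeddings. Only available on servers powered by Text-Embedding-Inference.
truncation_direction (Literal["Left", "Right"], optional) — Which side of the input should be truncated when *truncate=True* is passed.

np.ndarray

The embedding representing the input text, as a float32 numpy array.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Generate embeddings for a given text.

Example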

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.feature_extraction("Hi, who are you?")
array([[ 2.424802  ,  2.93384   ,  1.1750331 , ...,  1.240499, -0.13776633, -0.7889173 ],
[-0.42943227, -0.6364878 , -1.693462  , ...,  0.41978157, -2.4336355 ,  0.6162071 ],
...,
[ 0.28552425, -0.928395  , -1.2077185 , ...,  0.76810825, -2.1069427 ,  0.6236161 ]], dtype=float32)
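
When the server does not support the normalize option, embeddings can be normalized on the client before comparing them. The following is a dependency-free sketch of L2 normalization and cosine similarity. The function names are illustrative, and plain lists are used instead of numpy arrays to keep the example self-contained.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return [float(x) for x in vector]
    return [x / norm for x in vector]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, computed as the dot product of the normalized vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


print(l2_normalize([3.0, 4.0]))
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

If the returned array holds one vector per token, it would typically be pooled (for example averaged) into a single vector before being compared this way.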

fill_mask

( text: str model: typing.Optional[str] = None targets: typing.Optional[typing.List[str]] = None top_k: typing.Optional[int] = None ) → List[FillMaskOutputElement]

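Parameters

text (str) — A string to be filled in; it must contain the [MASK] token (check the model card for the exact name of the mask).
model (str, optional) — The model to use for the fill mask task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended fill mask model is used.
targets (List[str], optional) — When passed, the model limits the scores to the passed targets instead of looking up the whole vocabulary. If a provided target is not in the model vocabulary, it is tokenized and the first resulting token is used (with a warning, and this may be slower).
top_k (int, optional) — When passed, overrides the number of predictions to return.

List[FillMaskOutputElement]

A list of FillMaskOutputElement items containing the predicted label, associated probability, token reference, and completed text.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Fill in a blank with a missing word (more precisely, a token).

Example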

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.fill_mask("The goal of life is <mask>.")
[
    FillMaskOutputElement(score=0.06897063553333282, token=11098, token_str=' happiness', sequence='The goal of life is happiness.'),
    FillMaskOutputElement(score=0.06554922461509705, token=45075, token_str=' immortality', sequence='The goal of life is immortality.')
]

get_endpoint_info

( model: typing.Optional[str] = None ) → Dict[str, Any]

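Parameters

model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.

Dict[str, Any]

Information about the endpoint.

Get information about the deployed endpoint.

This method is only available for endpoints powered by Text-Generation-Inference (TGI) or Text-Embedding-Inference (TEI). Endpoints powered by transformers return an empty payload.

Example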

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> client.get_endpoint_info()
{
    'model_id': 'meta-llama/Meta-Llama-3-70B-Instruct',
    'model_sha': None,
    'model_dtype': 'torch.float16',
    'model_device_type': 'cuda',
    'model_pipeline_tag': None,
    'max_concurrent_requests': 128,
    'max_best_of': 2,
    'max_stop_sequences': 4,
    'max_input_length': 8191,
    'max_total_tokens': 8192,
    'waiting_served_ratio': 0.3,
    'max_batch_total_tokens': 1259392,
    'max_waiting_tokens': 20,
    'max_batch_size': None,
    'validation_workers': 32,
    'max_client_batch_size': 4,
    'version': '2.0.2',
    'sha': 'dccab72549635c7eb5ddb17f43f0b7cdff07c214',
    'docker_label': 'sha-dccab72'
}
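
One practical use of this payload is sizing a generation request. The sketch below assumes that max_total_tokens bounds prompt plus generated tokens and that max_input_length bounds the prompt alone. That reading of the TGI fields is an assumption to verify against the server documentation. `max_new_tokens_budget` is a hypothetical helper.

```python
from typing import Any, Dict


def max_new_tokens_budget(info: Dict[str, Any], prompt_tokens: int) -> int:
    """Return how many tokens may still be generated for a prompt of the given size."""
    max_input = info["max_input_length"]
    max_total = info["max_total_tokens"]
    if prompt_tokens > max_input:
        raise ValueError(f"Prompt has {prompt_tokens} tokens, limit is {max_input}")
    # Assumption: the total budget covers prompt + generated tokens.
    return max_total - prompt_tokens


info = {"max_input_length": 8191, "max_total_tokens": 8192}  # values from the example above
print(max_new_tokens_budget(info, prompt_tokens=17))
```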

get_model_status

( model: typing.Optional[str] = None ) → ModelStatus

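Parameters

model (str, optional) — Identifier of the model whose status is to be checked. If no model is provided, the model associated with this InferenceClient instance is used. Only the HF Inference API service can be checked, so the identifier cannot be a URL.

ModelStatus

An instance of the ModelStatus dataclass, containing information about the state of the model: loaded, state, compute type, and framework.

Get the status of a model hosted on the HF Inference API.

This method is mostly useful when you already know which model you want to use and want to check its availability. If you want to discover already deployed models, use list_deployed_models() instead.

Example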

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.get_model_status("meta-llama/Meta-Llama-3-8B-Instruct")
ModelStatus(loaded=True, state='Loaded', compute_type='gpu', framework='text-generation-inference')

health_check

( model: typing.Optional[str] = None ) → bool

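Parameters

model (str, optional) — URL of the Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.

bool

True if everything is working fine.

Check the health of the deployed endpoint.

Health check is only available for Inference Endpoints powered by Text-Generation-Inference (TGI) or Text-Embedding-Inference (TEI). For the Inference API, use InferenceClient.get_model_status() instead.

Example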

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient("https://jzgu0buei5.us-east-1.aws.endpoints.huggingface.cloud")
>>> client.health_check()
True
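
A freshly started endpoint may need some time before it reports healthy, so a check like this is often wrapped in a small polling loop. The sketch below takes any zero-argument callable, which keeps it runnable without a live endpoint. `wait_until_healthy` is a hypothetical helper, not a library function.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool], attempts: int = 5, delay: float = 1.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        try:
            if check():
                return True
        except Exception:
            # Treat any error (e.g. a connection failure) as "not healthy yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Simulated endpoint that becomes healthy on the third probe.
probe_results = iter([False, False, True])
print(wait_until_healthy(lambda: next(probe_results), attempts=3, delay=0.0))
```

With a real client, the callable would simply be `client.health_check`. Catching every exception is a deliberate simplification here, and production code would likely narrow it.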

image_classification

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] model: typing.Optional[str] = None function_to_apply: typing.Optional[ForwardRef('ImageClassificationOutputTransform')] = None top_k: typing.Optional[int] = None ) → List[ImageClassificationOutputElement]

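Parameters

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — The image to classify. Can be raw bytes, an image file, a URL to an online image, or a PIL Image.
model (str, optional) — The model to use for image classification. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended image classification model is used.
function_to_apply ("ImageClassificationOutputTransform", optional) — The function to apply to the model outputs in order to retrieve the scores.
top_k (int, optional) — When specified, limits the output to the top K most probable classes.

List[ImageClassificationOutputElement]

A list of ImageClassificationOutputElement items containing the predicted label and associated probability.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform image classification on the given image using the specified model.

Example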

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
[ImageClassificationOutputElement(label='Blenheim spaniel', score=0.9779096841812134), ...]

image_segmentation

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] model: typing.Optional[str] = None mask_threshold: typing.Optional[float] = None overlap_mask_area_threshold: typing.Optional[float] = None subtask: typing.Optional[ForwardRef('ImageSegmentationSubtask')] = None threshold: typing.Optional[float] = None ) → List[ImageSegmentationOutputElement]

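Parameters

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — The image to segment. Can be raw bytes, an image file, a URL to an online image, or a PIL Image.
model (str, optional) — The model to use for image segmentation. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for image segmentation is used.
mask_threshold (float, optional) — Threshold to use when turning the predicted masks into binary values.
overlap_mask_area_threshold (float, optional) — Mask overlap threshold used to eliminate small, disconnected segments.
subtask ("ImageSegmentationSubtask", optional) — Segmentation task to be performed, depending on model capabilities.
threshold (float, optional) — Probability threshold used to filter out predicted masks.

List[ImageSegmentationOutputElement]

A list of ImageSegmentationOutputElement items containing the segmented masks and associated attributes.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform image segmentation on the given image using the specified model.

You must have `PIL` installed if you want to work with images (`pip install Pillow`).

Example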

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_segmentation("cat.jpg")
[ImageSegmentationOutputElement(score=0.989008, label='LABEL_184', mask=<PIL.PngImagePlugin.PngImageFile image mode=L size=400x300 at 0x7FDD2B129CC0>), ...]

image_to_image

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] prompt: typing.Optional[str] = None negative_prompt: typing.Optional[str] = None num_inference_steps: typing.Optional[int] = None guidance_scale: typing.Optional[float] = None model: typing.Optional[str] = None target_size: typing.Optional[huggingface_hub.inference._generated.types.image_to_image.ImageToImageTargetSize] = None **kwargs ) → Image

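Parameters

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — The input image for translation. Can be raw bytes, an image file, a URL to an online image, or a PIL Image.
prompt (str, optional) — The text prompt to guide the image generation.
negative_prompt (str, optional) — A prompt describing what should NOT be included in the generated image.
num_inference_steps (int, optional) — For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional) — For diffusion models. A higher guidance scale value encourages the model to generate images closely linked to the text prompt, at the expense of lower image quality.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
target_size (ImageToImageTargetSize, optional) — The size in pixels of the output image.

Image

The translated image.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform image-to-image translation using the specified model.

You must have `PIL` installed if you want to work with images (`pip install Pillow`).

Example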

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> image = client.image_to_image("cat.jpg", prompt="turn the cat into a tiger")
>>> image.save("tiger.jpg")

image_to_text

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] model: typing.Optional[str] = None ) → ImageToTextOutput

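Parameters

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — The input image to caption. Can be raw bytes, an image file, a URL to an online image, or a PIL Image.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.

ImageToTextOutput

The generated text.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Takes an input image and returns text.

Models can have very different outputs depending on your use case (image captioning, optical character recognition (OCR), Pix2Struct, etc.). Check the model card to learn more about a model's specificities.

Example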

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.image_to_text("cat.jpg")
'a cat standing in a grassy field '
>>> client.image_to_text("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
'a dog laying on the grass next to a flower pot '

image_to_video

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] model: typing.Optional[str] = None prompt: typing.Optional[str] = None negative_prompt: typing.Optional[str] = None num_frames: typing.Optional[float] = None num_inference_steps: typing.Optional[int] = None guidance_scale: typing.Optional[float] = None seed: typing.Optional[int] = None target_size: typing.Optional[huggingface_hub.inference._generated.types.image_to_video.ImageToVideoTargetSize] = None **kwargs ) → bytes

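Parameters

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — The input image to generate a video from. Can be raw bytes, an image file, a URL to an online image, or a PIL Image.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
prompt (str, optional) — The text prompt to guide the video generation.
negative_prompt (str, optional) — A prompt describing what should NOT be included in the generated video.
num_frames (float, optional) — The number of video frames to generate.
num_inference_steps (int, optional) — For diffusion models. The number of denoising steps. More denoising steps usually lead to a higher quality video at the expense of slower inference.
guidance_scale (float, optional) — For diffusion models. A higher guidance scale value encourages the model to generate videos closely linked to the text prompt, at the expense of lower image quality.
seed (int, optional) — Seed for the random number generator used for video generation.
target_size (ImageToVideoTargetSize, optional) — The size in pixels of the output video frames.

bytes

The generated video.

Generate a video from an input image.

Example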

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> video = client.image_to_video("cat.jpg", model="Wan-AI/Wan2.2-I2V-A14B", prompt="turn the cat into a tiger")
>>> with open("tiger.mp4", "wb") as f:
...     f.write(video)

list_deployed_models

( frameworks: typing.Union[NoneType, str, typing.Literal['all'], typing.List[str]] = None ) → Dict[str, List[str]]

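Parameters

frameworks (Literal["all"] or List[str] or str, optional) — The frameworks to filter on. By default only a subset of the available frameworks is tested. If set to "all", all available frameworks are tested. It is also possible to provide a single framework or a custom set of frameworks to check.

Dict[str, List[str]]

A dictionary mapping task names to a sorted list of model IDs.

List models deployed on the HF Serverless Inference API service.

This helper checks deployed models framework by framework. By default, it checks the 4 main supported frameworks, which account for 95% of the hosted models. If you want a complete list of models, pass `frameworks="all"`. Alternatively, if you know in advance which framework you are interested in, you can restrict the search to it (e.g. `frameworks="text-generation-inference"`). The more frameworks are checked, the longer the call takes.

This method does not return a live list of all models available for the HF Inference API service. It searches a cached list of recently available models, which may not be up to date. If you want to know the live status of a specific model, use get_model_status().

This method is mostly useful for discoverability. If you already know which model you want to use and want to check its availability, use get_model_status() directly.

Example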

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

# Discover zero-shot-classification models currently deployed
>>> models = client.list_deployed_models()
>>> models["zero-shot-classification"]
['Narsil/deberta-large-mnli-zero-cls', 'facebook/bart-large-mnli', ...]

# List from only 1 framework
>>> client.list_deployed_models("text-generation-inference")
{'text-generation': ['bigcode/starcoder', 'meta-llama/Llama-2-70b-chat-hf', ...], ...}
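
The returned dictionary is keyed by task. When the question is instead "which tasks is this model deployed for?", the mapping can be inverted locally. A small sketch follows, using a hand-written dictionary in place of a live response. `tasks_by_model` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def tasks_by_model(deployed: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a task -> model IDs mapping into model ID -> sorted task names."""
    index = defaultdict(list)
    for task, model_ids in deployed.items():
        for model_id in model_ids:
            index[model_id].append(task)
    return {model_id: sorted(tasks) for model_id, tasks in index.items()}


# Hand-written sample in the same shape as the return value shown above.
sample = {
    "zero-shot-classification": ["facebook/bart-large-mnli"],
    "text-generation": ["bigcode/starcoder"],
}
print(tasks_by_model(sample))
```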

object_detection

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] model: typing.Optional[str] = None threshold: typing.Optional[float] = None ) → List[ObjectDetectionOutputElement]

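Parameters

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — The image to detect objects on. Can be raw bytes, an image file, a URL to an online image, or a PIL Image.
model (str, optional) — The model to use for object detection. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended object detection model (DETR) is used.
threshold (float, optional) — The probability necessary to make a prediction.

List[ObjectDetectionOutputElement]

A list of ObjectDetectionOutputElement items containing the bounding boxes and associated attributes.

Raises

InferenceTimeoutError or HTTPError or ValueError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
ValueError — If the request output is not a list.

Perform object detection on the given image using the specified model.

You must have `PIL` installed if you want to work with images (`pip install Pillow`).

Example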

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.object_detection("people.jpg")
[ObjectDetectionOutputElement(score=0.9486683011054993, label='person', box=ObjectDetectionBoundingBox(xmin=59, ymin=39, xmax=420, ymax=510)), ...]
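
Once detections are returned, a common follow-up is to filter them by score and inspect the boxes. The sketch below uses hypothetical stand-in classes (`Box`, `Detection`) rather than the library's own types so that it runs offline, and the second detection is made-up data. It assumes a detection is kept when its score is at least the threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:  # stand-in for a bounding box with pixel coordinates
    xmin: int
    ymin: int
    xmax: int
    ymax: int


@dataclass
class Detection:  # stand-in for one detected object
    score: float
    label: str
    box: Box


def filter_detections(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep only detections whose score reaches the threshold."""
    return [d for d in detections if d.score >= threshold]


def box_area(box: Box) -> int:
    """Area of a bounding box in pixels (0 for degenerate boxes)."""
    return max(0, box.xmax - box.xmin) * max(0, box.ymax - box.ymin)


detections = [
    Detection(0.9486683011054993, "person", Box(59, 39, 420, 510)),
    Detection(0.31, "person", Box(10, 10, 20, 30)),  # made-up low-confidence detection
]
kept = filter_detections(detections, threshold=0.5)
```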

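question_answering

( question: str context: str model: typing.Optional[str] = None align_to_words: typing.Optional[bool] = None doc_stride: typing.Optional[int] = None handle_impossible_answer: typing.Optional[bool] = None max_answer_len: typing.Optional[int] = None max_question_len: typing.Optional[int] = None max_seq_len: typing.Optional[int] = None top_k: typing.Optional[int] = None ) → Union[QuestionAnsweringOutputElement, List[QuestionAnsweringOutputElement]]

Parameters

question (str) — Question to be answered.
context (str) — The context of the question.
model (str, optional) — The model to use for the question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint.
align_to_words (bool, optional) — Attempts to align the answer to real words. Improves quality on space-separated languages. Might hurt on non-space-separated languages (like Japanese or Chinese).
doc_stride (int, optional) — If the context is too long to fit with the question for the model, it will be split into several chunks with some overlap. This argument controls the size of that overlap.
handle_impossible_answer (bool, optional) — Whether to accept impossible as an answer.
max_answer_len (int, optional) — The maximum length of predicted answers (e.g., only answers with a shorter length are considered).
max_question_len (int, optional) — The maximum length of the question after tokenization. It will be truncated if needed.
max_seq_len (int, optional) — The maximum length, in tokens, of the total sentence (context + question) in each chunk passed to the model. The context will be split into several chunks (using doc_stride as overlap) if needed.
top_k (int, optional) — The number of answers to return (chosen in order of likelihood). Note that fewer than top_k answers are returned if there are not enough options available within the context.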
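
The overlap controlled by doc_stride can be pictured as a sliding window over the tokenized context. The following is a minimal sketch, assuming the context is already a list of tokens and that consecutive chunks share exactly doc_stride tokens. `split_with_overlap` is a hypothetical helper, and the model-side chunking may differ in its details.

```python
from typing import List, Sequence


def split_with_overlap(tokens: Sequence, max_len: int, doc_stride: int) -> List[list]:
    """Split `tokens` into chunks of at most `max_len`, overlapping by `doc_stride`."""
    if not 0 <= doc_stride < max_len:
        raise ValueError("doc_stride must satisfy 0 <= doc_stride < max_len")
    step = max_len - doc_stride  # how far the window advances each time
    chunks = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):  # this window reached the end
            break
        start += step
    return chunks


chunks = split_with_overlap(list(range(10)), max_len=4, doc_stride=2)
```

A larger doc_stride produces more chunks but makes it less likely that an answer is cut in half at a chunk boundary.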

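Union[QuestionAnsweringOutputElement, List[QuestionAnsweringOutputElement]]

When top_k is 1 or not provided, a single QuestionAnsweringOutputElement is returned. When top_k is greater than 1, a list of QuestionAnsweringOutputElement is returned.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Retrieve the answer to a question from a given text.

Example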

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.question_answering(question="What's my name?", context="My name is Clara and I live in Berkeley.")
QuestionAnsweringOutputElement(answer='Clara', end=16, score=0.9326565265655518, start=11)
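
Because the return value is either a single element or a list depending on top_k, calling code may want to normalize it. The sketch below shows one way to do that; `Answer` is a hypothetical stand-in for the output element so the snippet runs offline, and `as_answer_list` is not part of the library.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Answer:  # stand-in with the same fields as the example output above
    answer: str
    start: int
    end: int
    score: float


def as_answer_list(result: Union[Answer, List[Answer]]) -> List[Answer]:
    """Always return a list, whether one answer or several came back."""
    return result if isinstance(result, list) else [result]


single = Answer(answer="Clara", start=11, end=16, score=0.9326565265655518)
answers = as_answer_list(single)
best = max(answers, key=lambda a: a.score)
```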

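sentence_similarity

( sentence: str other_sentences: typing.List[str] model: typing.Optional[str] = None ) → List[float]

Parameters

sentence (str) — The main sentence to compare to others.
other_sentences (List[str]) — The list of sentences to compare to.
model (str, optional) — The model to use for the sentence similarity task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended sentence similarity model will be used. Defaults to None.

List[float]

The embedding representing the input text.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Compute the semantic similarity between a sentence and a list of other sentences by comparing their embeddings.

Example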

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.sentence_similarity(
...     "Machine learning is so easy.",
...     other_sentences=[
...         "Deep learning is so straightforward.",
...         "This is so difficult, like rocket science.",
...         "I can't believe how much I struggled with this.",
...     ],
... )
[0.7785726189613342, 0.45876261591911316, 0.2906220555305481]
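
To make "comparing their embeddings" concrete, here is a minimal offline sketch that scores one vector against several others with cosine similarity. The choice of cosine similarity and the tiny two-dimensional vectors are assumptions for illustration; the deployed model decides how similarity is actually computed.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarities(main: Sequence[float], others: List[Sequence[float]]) -> List[float]:
    """One score per entry in `others`, in the same order."""
    return [cosine_similarity(main, other) for other in others]


# Made-up embeddings: same direction, diagonal, and orthogonal to the main vector.
scores = similarities([1.0, 0.0], [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
```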

summarization

( text: str model: typing.Optional[str] = None clean_up_tokenization_spaces: typing.Optional[bool] = None generate_parameters: typing.Optional[typing.Dict[str, typing.Any]] = None truncation: typing.Optional[ForwardRef('SummarizationTruncationStrategy')] = None ) → SummarizationOutput

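Parameters

text (str) — The input text to summarize.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for summarization will be used.
clean_up_tokenization_spaces (bool, optional) — Whether to clean up the potential extra spaces in the text output.
generate_parameters (Dict[str, Any], optional) — Additional parametrization of the text generation algorithm.
truncation ("SummarizationTruncationStrategy", optional) — The truncation strategy to use.

SummarizationOutput

The generated summary text.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Generate a summary of a given text using a specified model.

Example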

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.summarization("The Eiffel tower...")
SummarizationOutput(generated_text="The Eiffel tower is one of the most famous landmarks in the world....")

table_question_answering

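( table: typing.Dict[str, typing.Any] query: str model: typing.Optional[str] = None padding: typing.Optional[ForwardRef('Padding')] = None sequential: typing.Optional[bool] = None truncation: typing.Optional[bool] = None ) → TableQuestionAnsweringOutputElement

Parameters

table (Dict[str, Any]) — A table of data represented as a dict of lists, where the keys are the headers and the lists hold all the values. All lists must have the same size.
query (str) — The query in plain text that you want to ask the table.
model (str, optional) — The model to use for the table question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint.
padding ("Padding", optional) — Activates and controls padding.
sequential (bool, optional) — Whether to do inference sequentially or as a batch. Batching is faster, but models like SQA require the inference to be done sequentially to extract relations within sequences, given their conversational nature.
truncation (bool, optional) — Activates and controls truncation.

TableQuestionAnsweringOutputElement

A table question answering output containing the answer, coordinates, cells, and the aggregator used.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Retrieve the answer to a question from information given in a table.

Example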

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> query = "How many stars does the transformers repository have?"
>>> table = {"Repository": ["Transformers", "Datasets", "Tokenizers"], "Stars": ["36542", "4512", "3934"]}
>>> client.table_question_answering(table, query, model="google/tapas-base-finetuned-wtq")
TableQuestionAnsweringOutputElement(answer='36542', coordinates=[[0, 1]], cells=['36542'], aggregator='AVERAGE')
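
Since every column must have the same number of values, it can help to validate the table locally before sending it. The sketch below is a hypothetical pre-flight check, not something the client does for you. `cell_at` additionally assumes that coordinates are (row, column) pairs with columns in header order, which matches the example output above.

```python
from typing import Any, Dict, List


def validate_table(table: Dict[str, List[Any]]) -> int:
    """Return the number of rows, or raise if the columns differ in length."""
    lengths = {header: len(values) for header, values in table.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"All columns must have the same number of values, got {lengths}")
    return next(iter(lengths.values()), 0)


def cell_at(table: Dict[str, List[Any]], row: int, column: int) -> Any:
    """Look up a cell from a (row, column) coordinate pair."""
    headers = list(table)  # dicts preserve insertion order
    return table[headers[column]][row]


table = {"Repository": ["Transformers", "Datasets", "Tokenizers"], "Stars": ["36542", "4512", "3934"]}
n_rows = validate_table(table)
```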

tabular_classification

( table: typing.Dict[str, typing.Any] model: typing.Optional[str] = None ) → List

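Parameters

table (Dict[str, Any]) — Set of attributes to classify.
model (str, optional) — The model to use for the tabular classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended tabular classification model will be used. Defaults to None.

List

List of labels, one per row in the initial table.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Classify a target category (a group) based on a set of attributes.

Example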

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> table = {
...     "fixed_acidity": ["7.4", "7.8", "10.3"],
...     "volatile_acidity": ["0.7", "0.88", "0.32"],
...     "citric_acid": ["0", "0", "0.45"],
...     "residual_sugar": ["1.9", "2.6", "6.4"],
...     "chlorides": ["0.076", "0.098", "0.073"],
...     "free_sulfur_dioxide": ["11", "25", "5"],
...     "total_sulfur_dioxide": ["34", "67", "13"],
...     "density": ["0.9978", "0.9968", "0.9976"],
...     "pH": ["3.51", "3.2", "3.23"],
...     "sulphates": ["0.56", "0.68", "0.82"],
...     "alcohol": ["9.4", "9.8", "12.6"],
... }
>>> client.tabular_classification(table=table, model="julien-c/wine-quality")
["5", "5", "5"]
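
The returned labels line up with the rows of the column-oriented input table. A small offline sketch of that correspondence follows; `columns_to_rows` is a hypothetical helper and the labels are copied from the example output above rather than computed.

```python
from typing import Any, Dict, List


def columns_to_rows(table: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert {column: [values...]} into one dict per row."""
    headers = list(table)
    return [dict(zip(headers, values)) for values in zip(*table.values())]


table = {
    "pH": ["3.51", "3.2", "3.23"],
    "alcohol": ["9.4", "9.8", "12.6"],
}
labels = ["5", "5", "5"]  # one label per row, as in the example above

rows = columns_to_rows(table)
labelled = list(zip(rows, labels))  # pair each row with its predicted label
```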

tabular_regression

( table: typing.Dict[str, typing.Any] model: typing.Optional[str] = None ) → List

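Parameters

table (Dict[str, Any]) — Set of attributes stored in a table. The attributes used to predict the target can be both numerical and categorical.
model (str, optional) — The model to use for the tabular regression task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended tabular regression model will be used. Defaults to None.

List

List of predicted numerical target values.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Predict a numerical target value given a set of attributes/features in a table.

Example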

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> table = {
...     "Height": ["11.52", "12.48", "12.3778"],
...     "Length1": ["23.2", "24", "23.9"],
...     "Length2": ["25.4", "26.3", "26.5"],
...     "Length3": ["30", "31.2", "31.1"],
...     "Species": ["Bream", "Bream", "Bream"],
...     "Width": ["4.02", "4.3056", "4.6961"],
... }
>>> client.tabular_regression(table, model="scikit-learn/Fish-Weight")
[110, 120, 130]

text_classification

( text: str model: typing.Optional[str] = None top_k: typing.Optional[int] = None function_to_apply: typing.Optional[ForwardRef('TextClassificationOutputTransform')] = None ) → List[TextClassificationOutputElement]

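Parameters

text (str) — A string to be classified.
model (str, optional) — The model to use for the text classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text classification model will be used. Defaults to None.
top_k (int, optional) — When specified, limits the output to the top K most probable classes.
function_to_apply ("TextClassificationOutputTransform", optional) — The function to apply to the model outputs in order to retrieve the scores.

List[TextClassificationOutputElement]

List of TextClassificationOutputElement items containing the predicted label and associated probability.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform text classification (e.g. sentiment analysis) on the given text.

Example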

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.text_classification("I like you")
[
    TextClassificationOutputElement(label='POSITIVE', score=0.9998695850372314),
    TextClassificationOutputElement(label='NEGATIVE', score=0.0001304351753788069),
]
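
The roles of function_to_apply and top_k can be illustrated offline. The sketch below assumes the model produces one raw score (logit) per class and that the transform is a softmax or a sigmoid; the logits are made up and the helpers are hypothetical, so treat it as an approximation of what happens server-side.

```python
import math
from typing import List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Map a single raw score to (0, 1) independently of the other classes."""
    return 1.0 / (1.0 + math.exp(-x))


def top_k_labels(labels: List[str], scores: List[float], top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Sort (label, score) pairs by score and keep the first top_k if given."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


probs = softmax([-4.0, 5.0])  # made-up logits for NEGATIVE and POSITIVE
best = top_k_labels(["NEGATIVE", "POSITIVE"], probs, top_k=1)
```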

text_generation

( prompt: str details: typing.Optional[bool] = None stream: typing.Optional[bool] = None model: typing.Optional[str] = None adapter_id: typing.Optional[str] = None best_of: typing.Optional[int] = None decoder_input_details: typing.Optional[bool] = None do_sample: typing.Optional[bool] = None frequency_penalty: typing.Optional[float] = None grammar: typing.Optional[huggingface_hub.inference._generated.types.text_generation.TextGenerationInputGrammarType] = None max_new_tokens: typing.Optional[int] = None repetition_penalty: typing.Optional[float] = None return_full_text: typing.Optional[bool] = None seed: typing.Optional[int] = None stop: typing.Optional[typing.List[str]] = None stop_sequences: typing.Optional[typing.List[str]] = None temperature: typing.Optional[float] = None top_k: typing.Optional[int] = None top_n_tokens: typing.Optional[int] = None top_p: typing.Optional[float] = None truncate: typing.Optional[int] = None typical_p: typing.Optional[float] = None watermark: typing.Optional[bool] = None ) → Union[str, TextGenerationOutput, Iterable[str], Iterable[TextGenerationStreamOutput]]

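Parameters

prompt (str) — Input text.
details (bool, optional) — By default, text_generation returns a string. Pass details=True if you want a detailed output (tokens, probabilities, seed, finish reason, etc.). Only available for models running on the text-generation-inference backend.
stream (bool, optional) — By default, text_generation returns the full generated text. Pass stream=True if you want a stream of tokens to be returned. Only available for models running on the text-generation-inference backend.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
adapter_id (str, optional) — LoRA adapter ID.
best_of (int, optional) — Generate best_of sequences and return the one with the highest token logprobs.
decoder_input_details (bool, optional) — Return the decoder input token logprobs and IDs. You must also set details=True for it to be taken into account. Defaults to False.
do_sample (bool, optional) — Activate logits sampling.
frequency_penalty (float, optional) — Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim.
grammar (TextGenerationInputGrammarType, optional) — Grammar constraints. Can be either a JSONSchema or a regex.
max_new_tokens (int, optional) — Maximum number of generated tokens. Defaults to 100.
repetition_penalty (float, optional) — The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
return_full_text (bool, optional) — Whether to prepend the prompt to the generated text.
seed (int, optional) — Random sampling seed.
stop (List[str], optional) — Stop generating tokens if a member of stop is generated.
stop_sequences (List[str], optional) — Deprecated argument. Use stop instead.
temperature (float, optional) — The value used to modulate the logits distribution.
top_n_tokens (int, optional) — Return information about the top_n_tokens most likely tokens at each generation step, instead of just the sampled token.
top_k (int, optional) — The number of highest probability vocabulary tokens to keep for top-k filtering.
top_p (float, optional) — If set to < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
truncate (int, optional) — Truncate input tokens to the given size.
typical_p (float, optional) — Typical decoding mass. See Typical Decoding for Natural Language Generation for more information.
watermark (bool, optional) — Watermarking with A Watermark for Large Language Models.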
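
The top_k and top_p filters above are easier to follow with a toy distribution. This is a minimal sketch over a made-up next-token distribution; the helper names are hypothetical and the backend's actual sampling code may order or combine these steps differently.

```python
from typing import Dict


def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most probable tokens whose total reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:  # enough probability mass collected
            break
    return kept


# Made-up next-token distribution.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
nucleus = top_p_filter(probs, top_p=0.7)
shortlist = top_k_filter(probs, k=3)
```

Sampling then happens only among the surviving tokens, usually after renormalizing their probabilities.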

Union[str, TextGenerationOutput, Iterable[str], Iterable[TextGenerationStreamOutput]]

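Generated text returned from the server:

If stream=False and details=False, the generated text is returned as a str (default)
If stream=True and details=False, the generated text is returned token by token as an Iterable[str]
If stream=False and details=True, the generated text is returned with more details as a TextGenerationOutput
If details=True and stream=True, the generated text is returned token by token as an iterable of TextGenerationStreamOutput

Raises

ValidationError or InferenceTimeoutError or HTTPError

ValidationError — If input values are not valid. No HTTP call is made to the server.
InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Given a prompt, generate the following text.

If you want to generate a response from chat messages, you should use the InferenceClient.chat_completion() method. It accepts a list of messages instead of a single text prompt and handles the chat templating for you.

Example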

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

# Case 1: generate text
>>> client.text_generation("The huggingface_hub library is ", max_new_tokens=12)
'100% open source and built to be easy to use.'

# Case 2: iterate over the generated tokens. Useful for large generation.
>>> for token in client.text_generation("The huggingface_hub library is ", max_new_tokens=12, stream=True):
...     print(token)
100
%
open
source
and
built
to
be
easy
to
use
.

# Case 3: get more details about the generation process.
>>> client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
TextGenerationOutput(
    generated_text='100% open source and built to be easy to use.',
    details=TextGenerationDetails(
        finish_reason='length',
        generated_tokens=12,
        seed=None,
        prefill=[
            TextGenerationPrefillOutputToken(id=487, text='The', logprob=None),
            TextGenerationPrefillOutputToken(id=53789, text=' hugging', logprob=-13.171875),
            (...)
            TextGenerationPrefillOutputToken(id=204, text=' ', logprob=-7.0390625)
        ],
        tokens=[
            TokenElement(id=1425, text='100', logprob=-1.0175781, special=False),
            TokenElement(id=16, text='%', logprob=-0.0463562, special=False),
            (...)
            TokenElement(id=25, text='.', logprob=-0.5703125, special=False)
        ],
        best_of_sequences=None
    )
)

# Case 4: iterate over the generated tokens with more details.
# Last object is more complete, containing the full generated text and the finish reason.
>>> for details in client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True, stream=True):
...     print(details)
...
TextGenerationStreamOutput(token=TokenElement(id=1425, text='100', logprob=-1.0175781, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=16, text='%', logprob=-0.0463562, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=1314, text=' open', logprob=-1.3359375, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=3178, text=' source', logprob=-0.28100586, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=273, text=' and', logprob=-0.5961914, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=3426, text=' built', logprob=-1.9423828, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-1.4121094, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=314, text=' be', logprob=-1.5224609, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=1833, text=' easy', logprob=-2.1132812, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-0.08520508, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=745, text=' use', logprob=-0.39453125, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(
    id=25,
    text='.',
    logprob=-0.5703125,
    special=False),
    generated_text='100% open source and built to be easy to use.',
    details=TextGenerationStreamOutputStreamDetails(finish_reason='length', generated_tokens=12, seed=None)
)

# Case 5: generate constrained output using grammar
>>> import json
>>> response = client.text_generation(
...     prompt="I saw a puppy a cat and a raccoon during my bike ride in the park",
...     model="HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1",
...     max_new_tokens=100,
...     repetition_penalty=1.3,
...     grammar={
...         "type": "json",
...         "value": {
...             "properties": {
...                 "location": {"type": "string"},
...                 "activity": {"type": "string"},
...                 "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
...                 "animals": {"type": "array", "items": {"type": "string"}},
...             },
...             "required": ["location", "activity", "animals_seen", "animals"],
...         },
...     },
... )
>>> json.loads(response)
{
    "activity": "bike riding",
    "animals": ["puppy", "cat", "raccoon"],
    "animals_seen": 3,
    "location": "park"
}

text_to_image

( prompt: str negative_prompt: typing.Optional[str] = None height: typing.Optional[int] = None width: typing.Optional[int] = None num_inference_steps: typing.Optional[int] = None guidance_scale: typing.Optional[float] = None model: typing.Optional[str] = None scheduler: typing.Optional[str] = None seed: typing.Optional[int] = None extra_body: typing.Optional[typing.Dict[str, typing.Any]] = None ) → Image

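Parameters

prompt (str) — The prompt to generate an image from.
negative_prompt (str, optional) — One prompt to guide what NOT to include in image generation.
height (int, optional) — The height in pixels of the output image.
width (int, optional) — The width in pixels of the output image.
num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt, but values that are too high may cause saturation and other artifacts.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-image model will be used. Defaults to None.
scheduler (str, optional) — Override the scheduler with a compatible one.
seed (int, optional) — Seed for the random number generator.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

Image

The generated image.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Generate an image based on a given text using a specified model.

You must have `PIL` installed if you want to work with images (`pip install Pillow`).

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example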

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

>>> image = client.text_to_image("An astronaut riding a horse on the moon.")
>>> image.save("astronaut.png")

>>> image = client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     negative_prompt="low resolution, blurry",
...     model="stabilityai/stable-diffusion-2-1",
... )
>>> image.save("better_astronaut.png")

Example using a third-party provider directly. Usage will be billed on your fal.ai account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="fal-ai",  # Use fal.ai provider
...     api_key="fal-ai-api-key",  # Pass your fal.ai API key
... )
>>> image = client.text_to_image(
...     "A majestic lion in a fantasy forest",
...     model="black-forest-labs/FLUX.1-schnell",
... )
>>> image.save("lion.png")

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> image = client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     model="black-forest-labs/FLUX.1-dev",
... )
>>> image.save("astronaut.png")

Example using the Replicate provider with extra parameters

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> image = client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     model="black-forest-labs/FLUX.1-schnell",
...     extra_body={"output_quality": 100},
... )
>>> image.save("astronaut.png")

text_to_speech

( text: str model: typing.Optional[str] = None do_sample: typing.Optional[bool] = None early_stopping: typing.Union[bool, ForwardRef('TextToSpeechEarlyStoppingEnum'), NoneType] = None epsilon_cutoff: typing.Optional[float] = None eta_cutoff: typing.Optional[float] = None max_length: typing.Optional[int] = None max_new_tokens: typing.Optional[int] = None min_length: typing.Optional[int] = None min_new_tokens: typing.Optional[int] = None num_beam_groups: typing.Optional[int] = None num_beams: typing.Optional[int] = None penalty_alpha: typing.Optional[float] = None temperature: typing.Optional[float] = None top_k: typing.Optional[int] = None top_p: typing.Optional[float] = None typical_p: typing.Optional[float] = None use_cache: typing.Optional[bool] = None extra_body: typing.Optional[typing.Dict[str, typing.Any]] = None ) → bytes

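Parameters

text (str) — The text to synthesize.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-speech model will be used. Defaults to None.
do_sample (bool, optional) — Whether to use sampling instead of greedy decoding when generating new tokens.
early_stopping (Union[bool, "TextToSpeechEarlyStoppingEnum"], optional) — Controls the stopping condition for beam-based methods.
epsilon_cutoff (float, optional) — If set to a float strictly between 0 and 1, only tokens with a conditional probability greater than epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
eta_cutoff (float, optional) — Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to a float strictly between 0 and 1, a token is only considered if it is greater than either eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
max_length (int, optional) — The maximum length (in tokens) of the generated text, including the input.
max_new_tokens (int, optional) — The maximum number of tokens to generate. Takes precedence over max_length.
min_length (int, optional) — The minimum length (in tokens) of the generated text, including the input.
min_new_tokens (int, optional) — The minimum number of tokens to generate. Takes precedence over min_length.
num_beam_groups (int, optional) — Number of groups to divide num_beams into in order to ensure diversity among different groups of beams. See this paper for more details.
num_beams (int, optional) — Number of beams to use for beam search.
penalty_alpha (float, optional) — The value balances the model confidence and the degeneration penalty in contrastive search decoding.
temperature (float, optional) — The value used to modulate the next token probabilities.
top_k (int, optional) — The number of highest probability vocabulary tokens to keep for top-k filtering.
top_p (float, optional) — If set to a float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation.
typical_p (float, optional) — Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to a float < 1, the smallest set of the most locally typical tokens with probabilities that add up to typical_p or higher are kept for generation. See this paper for more details.
use_cache (bool, optional) — Whether the model should use the past last key/values attentions to speed up decoding.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.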
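
The eta_cutoff rule can be worked through numerically. The sketch below reads "greater than either" as "greater than the smaller of the two values", which is an assumption about the intended meaning; the distributions are made up and the helper names are hypothetical.

```python
import math
from typing import List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def eta_threshold(probs: List[float], eta_cutoff: float) -> float:
    """Smaller of eta_cutoff and sqrt(eta_cutoff) * exp(-entropy(probs))."""
    if not 0 < eta_cutoff < 1:
        raise ValueError("eta_cutoff must be strictly between 0 and 1")
    return min(eta_cutoff, math.sqrt(eta_cutoff) * math.exp(-entropy(probs)))


def eta_filter(probs: List[float], eta_cutoff: float) -> List[int]:
    """Indices of tokens whose probability exceeds the eta threshold."""
    threshold = eta_threshold(probs, eta_cutoff)
    return [i for i, p in enumerate(probs) if p > threshold]


# Made-up next-token distributions: one peaked, one uniform.
peaked = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
kept_peaked = eta_filter(peaked, eta_cutoff=0.04)
kept_uniform = eta_filter(uniform, eta_cutoff=0.04)
```

The value 0.04 is far larger than the range suggested in the text and is chosen only so that the effect is visible on a four-token toy vocabulary.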

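bytes

The generated audio.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Synthesize audio of a voice pronouncing a given text.

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example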

>>> from pathlib import Path
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

>>> audio = client.text_to_speech("Hello world")
>>> Path("hello_world.flac").write_bytes(audio)

Example using a third-party provider directly. Usage will be billed on your Replicate account.

>>> from pathlib import Path
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",
...     api_key="your-replicate-api-key",  # Pass your Replicate API key directly
... )
>>> audio = client.text_to_speech(
...     text="Hello world",
...     model="OuteAI/OuteTTS-0.3-500M",
... )
>>> Path("hello_world.flac").write_bytes(audio)

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from pathlib import Path
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",
...     api_key="hf_...",  # Pass your HF token
... )
>>> audio = client.text_to_speech(
...     text="Hello world",
...     model="OuteAI/OuteTTS-0.3-500M",
... )
>>> Path("hello_world.flac").write_bytes(audio)

Example using the Replicate provider with extra parameters

>>> from pathlib import Path
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> audio = client.text_to_speech(
...     "Hello, my name is Kokoro, an awesome text-to-speech model.",
...     model="hexgrad/Kokoro-82M",
...     extra_body={"voice": "af_nicole"},
... )
>>> Path("hello.flac").write_bytes(audio)

Example of music generation using "YuE-s1-7B-anneal-en-cot" on fal.ai

>>> from huggingface_hub import InferenceClient
>>> lyrics = '''
... [verse]
... In the town where I was born
... Lived a man who sailed to sea
... And he told us of his life
... In the land of submarines
... So we sailed on to the sun
... 'Til we found a sea of green
... And we lived beneath the waves
... In our yellow submarine
...
... [chorus]
... We all live in a yellow submarine
... Yellow submarine, yellow submarine
... We all live in a yellow submarine
... Yellow submarine, yellow submarine
... '''
>>> genres = "pavarotti-style tenor voice"
>>> client = InferenceClient(
...     provider="fal-ai",
...     model="m-a-p/YuE-s1-7B-anneal-en-cot",
...     api_key=...,
... )
>>> audio = client.text_to_speech(lyrics, extra_body={"genres": genres})
>>> with open("output.mp3", "wb") as f:
...     f.write(audio)

text_to_video

( prompt: str model: typing.Optional[str] = None guidance_scale: typing.Optional[float] = None negative_prompt: typing.Optional[typing.List[str]] = None num_frames: typing.Optional[float] = None num_inference_steps: typing.Optional[int] = None seed: typing.Optional[int] = None extra_body: typing.Optional[typing.Dict[str, typing.Any]] = None ) → bytes

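Parameters

prompt (str) — The prompt to generate a video from.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-video model will be used. Defaults to None.
guidance_scale (float, optional) — A higher guidance scale value encourages the model to generate videos closely linked to the text prompt, but values that are too high may cause saturation and other artifacts.
negative_prompt (List[str], optional) — One or several prompts to guide what NOT to include in video generation.
num_frames (float, optional) — The num_frames parameter determines how many video frames are generated.
num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher quality video at the expense of slower inference.
seed (int, optional) — Seed for the random number generator.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

bytes

The generated video.

Generate a video based on a given text.

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example

Example using a third-party provider directly. Usage will be billed on your fal.ai account.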

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="fal-ai",  # Using fal.ai provider
...     api_key="fal-ai-api-key",  # Pass your fal.ai API key
... )
>>> video = client.text_to_video(
...     "A majestic lion running in a fantasy forest",
...     model="tencent/HunyuanVideo",
... )
>>> with open("lion.mp4", "wb") as file:
...     file.write(video)

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Using replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> video = client.text_to_video(
...     "A cat running in a park",
...     model="genmo/mochi-1-preview",
... )
>>> with open("cat.mp4", "wb") as file:
...     file.write(video)

token_classification

( text: str model: typing.Optional[str] = None aggregation_strategy: typing.Optional[ForwardRef('TokenClassificationAggregationStrategy')] = None ignore_labels: typing.Optional[typing.List[str]] = None stride: typing.Optional[int] = None ) → List[TokenClassificationOutputElement]

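Parameters

text (str) — A string to be classified.
model (str, optional) — The model to use for the token classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended token classification model will be used. Defaults to None.
aggregation_strategy ("TokenClassificationAggregationStrategy", optional) — The strategy used to fuse tokens based on model predictions.
ignore_labels (List[str], optional) — A list of labels to ignore.
stride (int, optional) — The number of overlapping tokens between chunks when splitting the input text.

List[TokenClassificationOutputElement]

List of TokenClassificationOutputElement items containing the entity group, confidence score, word, start and end index.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Perform token classification on the given text. Usually used for sentence parsing, either grammatical, or Named Entity Recognition (NER) to understand keywords contained within text.

Example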

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.token_classification("My name is Sarah Jessica Parker but you can call me Jessica")
[
    TokenClassificationOutputElement(
        entity_group='PER',
        score=0.9971321225166321,
        word='Sarah Jessica Parker',
        start=11,
        end=31,
    ),
    TokenClassificationOutputElement(
        entity_group='PER',
        score=0.9773476123809814,
        word='Jessica',
        start=52,
        end=59,
    )
]
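
The start and end fields are character offsets into the input text, so entity spans can be recovered by slicing. The sketch below uses plain dicts as stand-ins for the output elements, with values copied from the example above; `extract_spans` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def extract_spans(text: str, entities: List[Dict]) -> List[Tuple[str, str]]:
    """Return (entity_group, matched substring) for each entity."""
    return [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]


text = "My name is Sarah Jessica Parker but you can call me Jessica"
entities = [
    {"entity_group": "PER", "start": 11, "end": 31},
    {"entity_group": "PER", "start": 52, "end": 59},
]
spans = extract_spans(text, entities)
```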

translation

( text: str model: typing.Optional[str] = None src_lang: typing.Optional[str] = None tgt_lang: typing.Optional[str] = None clean_up_tokenization_spaces: typing.Optional[bool] = None truncation: typing.Optional[ForwardRef('TranslationTruncationStrategy')] = None generate_parameters: typing.Optional[typing.Dict[str, typing.Any]] = None ) → TranslationOutput

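Parameters

text (str) — A string to be translated.
model (str, optional) — The model to use for the translation task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended translation model will be used. Defaults to None.
src_lang (str, optional) — The source language of the text. Required for models that can translate from multiple languages.
tgt_lang (str, optional) — Target language to translate to. Required for models that can translate to multiple languages.
clean_up_tokenization_spaces (bool, optional) — Whether to clean up the potential extra spaces in the text output.
truncation ("TranslationTruncationStrategy", optional) — The truncation strategy to use.
generate_parameters (Dict[str, Any], optional) — Additional parametrization of the text generation algorithm.

TranslationOutput

The generated translated text.

Raises

InferenceTimeoutError or HTTPError or ValueError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.
ValueError — If only one of the src_lang and tgt_lang arguments is provided.

Convert text from one language to another.

Check out https://huggingface.co/tasks/translation for more information on how to choose the best model for your specific use case. Source and target languages usually depend on the model. However, it is possible to specify source and target languages for certain models. If you are working with one of these models, you can use the src_lang and tgt_lang arguments to pass the relevant information.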
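
The rule that src_lang and tgt_lang must be given together can be expressed as a small pre-flight check. This is a hypothetical helper that mirrors the documented ValueError, not the library's own code, and the returned dict shape is an assumption for illustration.

```python
from typing import Dict, Optional


def check_language_pair(src_lang: Optional[str] = None, tgt_lang: Optional[str] = None) -> Dict[str, str]:
    """Return the language arguments to send, or raise if only one is set."""
    if (src_lang is None) != (tgt_lang is None):
        raise ValueError("src_lang and tgt_lang must be provided together or not at all")
    if src_lang is None:
        return {}  # let the model use its fixed language pair
    return {"src_lang": src_lang, "tgt_lang": tgt_lang}


params = check_language_pair(src_lang="en_XX", tgt_lang="fr_XX")
```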

Example

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.translation("My name is Wolfgang and I live in Berlin")
TranslationOutput(translation_text='Mein Name ist Wolfgang und ich lebe in Berlin.')
>>> client.translation("My name is Wolfgang and I live in Berlin", model="Helsinki-NLP/opus-mt-en-fr")
TranslationOutput(translation_text="Je m'appelle Wolfgang et je vis à Berlin.")

Specifying languages

>>> client.translation("My name is Sarah Jessica Parker but you can call me Jessica", model="facebook/mbart-large-50-many-to-many-mmt", src_lang="en_XX", tgt_lang="fr_XX")
"Mon nom est Sarah Jessica Parker mais vous pouvez m'appeler Jessica"

visual_question_answering

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] question: str model: typing.Optional[str] = None top_k: typing.Optional[int] = None ) → List[VisualQuestionAnsweringOutputElement]

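Parameters

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — The input image for the context. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
question (str) — Question to be answered.
model (str, optional) — The model to use for the visual question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended visual question answering model will be used. Defaults to None.
top_k (int, optional) — The number of answers to return (chosen in order of likelihood). Note that fewer than top_k answers are returned if there are not enough options available within the context.

List[VisualQuestionAnsweringOutputElement]

A list of VisualQuestionAnsweringOutputElement items containing the predicted label and associated probability.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Answer open-ended questions based on an image.

Example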

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.visual_question_answering(
...     image="https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg",
...     question="What is the animal doing?"
... )
[
    VisualQuestionAnsweringOutputElement(score=0.778609573841095, answer='laying down'),
    VisualQuestionAnsweringOutputElement(score=0.6957435607910156, answer='sitting'),
]

zero_shot_classification

( text: str candidate_labels: typing.List[str] multi_label: typing.Optional[bool] = False hypothesis_template: typing.Optional[str] = None model: typing.Optional[str] = None ) → List[ZeroShotClassificationOutputElement]

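Parameters

text (str) — The input text to classify.
candidate_labels (List[str]) — The set of possible class labels to classify the text into.
labels (List[str], optional) — (deprecated) List of strings. Each string is the verbalization of a possible label for the input text.
multi_label (bool, optional) — Whether multiple candidate labels can be true. If false, the scores are normalized such that the sum of the label likelihoods for each sequence is 1. If true, the labels are considered independent and probabilities are normalized for each candidate.
hypothesis_template (str, optional) — The sentence used in conjunction with candidate_labels to attempt the text classification by replacing the placeholder with the candidate labels.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. If not provided, the default recommended zero-shot classification model will be used.

List[ZeroShotClassificationOutputElement]

List of ZeroShotClassificationOutputElement items containing the predicted labels and their confidence.

Raises

InferenceTimeoutError or HTTPError

InferenceTimeoutError — If the model is unavailable or the request times out.
HTTPError — If the request fails with an HTTP error status code other than HTTP 503.

Provide as input a text and a set of candidate labels to classify the input text.

Example with multi_label=False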

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> text = (
...     "A new model offers an explanation for how the Galilean satellites formed around the solar system's "
...     "largest world. Konstantin Batygin did not set out to solve one of the solar system's most puzzling"
...     " mysteries when he went for a run up a hill in Nice, France."
... )
>>> labels = ["space & cosmos", "scientific discovery", "microbiology", "robots", "archeology"]
>>> client.zero_shot_classification(text, labels)
[
    ZeroShotClassificationOutputElement(label='scientific discovery', score=0.7961668968200684),
    ZeroShotClassificationOutputElement(label='space & cosmos', score=0.18570658564567566),
    ZeroShotClassificationOutputElement(label='microbiology', score=0.00730885099619627),
    ZeroShotClassificationOutputElement(label='archeology', score=0.006258360575884581),
    ZeroShotClassificationOutputElement(label='robots', score=0.004559356719255447),
]
>>> client.zero_shot_classification(text, labels, multi_label=True)
[
    ZeroShotClassificationOutputElement(label='scientific discovery', score=0.9829297661781311),
    ZeroShotClassificationOutputElement(label='space & cosmos', score=0.755190908908844),
    ZeroShotClassificationOutputElement(label='microbiology', score=0.0005462635890580714),
    ZeroShotClassificationOutputElement(label='archeology', score=0.00047131875180639327),
    ZeroShotClassificationOutputElement(label='robots', score=0.00030448526376858354),
]

multi_label=True 和自定義 hypothesis_template 的示例

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.zero_shot_classification(
...    text="I really like our dinner and I'm very happy. I don't like the weather though.",
...    candidate_labels=["positive", "negative", "pessimistic", "optimistic"],
...    multi_label=True,
...    hypothesis_template="This text is {} towards the weather"
... )
[
    ZeroShotClassificationOutputElement(label='negative', score=0.9231801629066467),
    ZeroShotClassificationOutputElement(label='pessimistic', score=0.8760990500450134),
    ZeroShotClassificationOutputElement(label='optimistic', score=0.0008674879791215062),
    ZeroShotClassificationOutputElement(label='positive', score=0.0005250611575320363)
]

零樣本影像分類

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] candidate_labels: typing.List[str] model: typing.Optional[str] = None hypothesis_template: typing.Optional[str] = None labels: typing.List[str] = None ) → List[ZeroShotImageClassificationOutputElement]

引數

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — The input image to classify. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
candidate_labels (List[str]) — 此影像的候選標籤
labels (List[str], 可選) — (已棄用) 字串列表，每個字串都是可能的標籤。必須至少有 2 個標籤。
model (str, 可選) — 用於推理的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署推理端點的 URL。此引數會覆蓋例項級別定義的模型。如果未提供，將使用預設推薦的零樣本影像分類模型。
hypothesis_template (str, 可選) — 與 candidate_labels 結合使用的句子，透過將佔位符替換為候選標籤來嘗試影像分類。

List[ZeroShotImageClassificationOutputElement]

包含預測標籤及其置信度的 ZeroShotImageClassificationOutputElement 項列表。

引發

InferenceTimeoutError 或 HTTPError

InferenceTimeoutError — 如果模型不可用或請求超時。
HTTPError — 如果請求失敗並返回 HTTP 錯誤狀態程式碼（HTTP 503 除外）。

提供輸入影像和文字標籤，以預測影像的文字標籤。

示例

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

>>> client.zero_shot_image_classification(
...     "https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg",
...     candidate_labels=["dog", "cat", "horse"],
... )
[ZeroShotImageClassificationOutputElement(label='dog', score=0.956),...]

非同步推理客戶端

還提供了一個基於 asyncio 和 aiohttp 的非同步客戶端版本。要使用它，您可以直接安裝 aiohttp 或使用 [inference] 額外包

pip install --upgrade huggingface_hub[inference]
# or
# pip install aiohttp

class huggingface_hub.AsyncInferenceClient

( model: typing.Optional[str] = None provider: typing.Union[typing.Literal['black-forest-labs', 'cerebras', 'cohere', 'fal-ai', 'featherless-ai', 'fireworks-ai', 'groq', 'hf-inference', 'hyperbolic', 'nebius', 'novita', 'nscale', 'openai', 'replicate', 'sambanova', 'together'], typing.Literal['auto'], NoneType] = None token: typing.Optional[str] = None timeout: typing.Optional[float] = None headers: typing.Optional[typing.Dict[str, str]] = None cookies: typing.Optional[typing.Dict[str, str]] = None trust_env: bool = False proxies: typing.Optional[typing.Any] = None bill_to: typing.Optional[str] = None base_url: typing.Optional[str] = None api_key: typing.Optional[str] = None )

引數

model (str, 可選) — 用於推理的模型。可以是 Hugging Face Hub 上託管的模型 ID，例如 meta-llama/Meta-Llama-3-8B-Instruct，也可以是已部署推理端點的 URL。預設為 None，在這種情況下，將自動為任務選擇推薦的模型。注意：為了更好地與 OpenAI 的客戶端相容，model 已被別名為 base_url。這兩個引數是互斥的。如果將 URL 作為 model 或 base_url 傳遞給聊天完成，則 (/v1)/chat/completions 字尾路徑將附加到 URL。
provider (str, optional) — Name of the provider to use for inference. Can be "black-forest-labs", "cerebras", "cohere", "fal-ai", "featherless-ai", "fireworks-ai", "groq", "hf-inference", "hyperbolic", "nebius", "novita", "nscale", "openai", "replicate", "sambanova" or "together". Defaults to "auto", i.e. the first of the providers available for the model, sorted by the user's order in https://huggingface.co/settings/inference-providers. If model is a URL or base_url is passed, provider is not used.
token (str, 可選) — Hugging Face 令牌。如果未提供，將預設為本地儲存的令牌。注意：為了更好地與 OpenAI 的客戶端相容，token 已被別名為 api_key。這兩個引數是互斥的，並且具有完全相同的行為。
timeout (float, 可選) — 等待伺服器響應的最大秒數。預設為 None，表示將迴圈直到伺服器可用。
headers (Dict[str, str], 可選) — 傳送到伺服器的附加標頭。預設情況下，只發送授權和使用者代理標頭。此字典中的值將覆蓋預設值。
bill_to (str, 可選) — 用於請求的計費賬戶。預設情況下，請求計費到使用者的賬戶。請求只能計費到使用者是其成員的組織，並且該組織已訂閱企業中心。
cookies (Dict[str, str], 可選) — 傳送到伺服器的附加 Cookie。
trust_env (bool, 可選) — 如果引數為 True（預設為 False），則信任代理配置的環境設定。
proxies (Any, 可選) — 用於請求的代理。
base_url (str, 可選) — 用於推理的基礎 URL。這是 model 的重複引數，旨在使 InferenceClient 遵循與 openai.OpenAI 客戶端相同的模式。不能與 model 同時設定。預設為 None。
api_key (str, 可選) — 用於身份驗證的令牌。這是 token 的重複引數，旨在使 InferenceClient 遵循與 openai.OpenAI 客戶端相同的模式。不能與 token 同時設定。預設為 None。

初始化新的推理客戶端。

InferenceClient 旨在提供統一的推理體驗。客戶端可以無縫地與（免費的）推理 API、自託管推理端點或第三方推理提供商一起使用。
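The main benefit of the async client is that several requests can be in flight at once. The sketch below shows the pattern with `asyncio.gather`. To stay runnable offline it uses `FakeAsyncClient`, a hypothetical stand-in that is not part of `huggingface_hub`. With the real library you would pass an `AsyncInferenceClient()` instead.

```python
import asyncio


class FakeAsyncClient:
    """Offline stand-in for AsyncInferenceClient (hypothetical, illustration only)."""

    async def audio_classification(self, audio):
        await asyncio.sleep(0.01)  # simulate network latency
        return [{"label": "neu", "score": 0.5, "source": audio}]


async def classify_all(client, audio_files):
    """Send one request per file and wait for all of them concurrently."""
    pending = [client.audio_classification(path) for path in audio_files]
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*pending)


results = asyncio.run(
    classify_all(FakeAsyncClient(), ["a.flac", "b.flac", "c.flac"])
)
print(results)
```

Every `await client....` example in this section has to run inside a coroutine like `classify_all`, started with `asyncio.run(...)` or an already running event loop.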

音訊分類

引數

audio (Union[str, Path, bytes, BinaryIO]) — 要分類的音訊內容。可以是原始音訊位元組、本地音訊檔案或指向音訊檔案的 URL。
model (str, 可選) — 用於音訊分類的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署推理端點的 URL。如果未提供，將使用預設推薦的音訊分類模型。
top_k (int, 可選) — 指定時，將輸出限制為最有可能的前 K 個類別。
function_to_apply ("AudioClassificationOutputTransform", optional) — The function to apply to the model outputs in order to retrieve the scores.

List[AudioClassificationOutputElement]

包含預測標籤及其置信度的 AudioClassificationOutputElement 項列表。

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

對提供的音訊內容執行音訊分類。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.audio_classification("audio.flac")
[
    AudioClassificationOutputElement(score=0.4976358711719513, label='hap'),
    AudioClassificationOutputElement(score=0.3677836060523987, label='neu'),
    ...
]

音訊到音訊

( audio: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] model: typing.Optional[str] = None ) → List[AudioToAudioOutputElement]

引數

audio (Union[str, Path, bytes, BinaryIO]) — 模型的音訊內容。可以是原始音訊位元組、本地音訊檔案或指向音訊檔案的 URL。
model (str, 可選) — 該模型可以是任何接受音訊檔案並返回另一個音訊檔案的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署推理端點的 URL。如果未提供，將使用預設推薦的 audio_to_audio 模型。

List[AudioToAudioOutputElement]

包含音訊標籤、內容型別和 blob 中的音訊內容的 AudioToAudioOutputElement 項列表。

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

根據模型執行與音訊到音訊相關的多項任務（例如：語音增強、源分離）。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> audio_output = await client.audio_to_audio("audio.flac")
>>> for i, item in enumerate(audio_output):
...     with open(f"output_{i}.flac", "wb") as f:
...         f.write(item.blob)

自動語音識別

AutomaticSpeechRecognitionOutput

引數

audio (Union[str, Path, bytes, BinaryIO]) — 要轉錄的內容。可以是原始音訊位元組、本地音訊檔案或指向音訊檔案的 URL。
model (str, 可選) — 用於 ASR 的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署推理端點的 URL。如果未提供，將使用預設推薦的 ASR 模型。
extra_body (Dict, 可選) — 傳遞給模型的其他提供商特定引數。有關支援的引數，請參閱提供商文件。

包含轉錄文字和可選時間戳塊的項。

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

對給定的音訊內容執行自動語音識別（ASR 或音訊轉文字）。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> (await client.automatic_speech_recognition("hello_world.flac")).text
"hello world"

聊天補全

引數

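messages (List of ChatCompletionInputMessage) — Conversation history consisting of role and content pairs.
model (str, optional) — The model to use for chat completion. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended model for chat-based text generation is used. See https://huggingface.co/tasks/text-generation for more details. If model is a model ID, it is passed to the server as the model parameter. To define a custom URL while also setting model in the request payload, set base_url when initializing the client.
frequency_penalty (float, optional) — Penalizes new tokens based on their existing frequency in the text so far. Range: [-2.0, 2.0]. Defaults to 0.0.
logit_bias (List[float], optional) — Adjusts the likelihood of specific tokens appearing in the generated output.
logprobs (bool, optional) — Whether to return log probabilities of the output tokens. If true, returns the log probability of each output token in the message content.
max_tokens (int, optional) — Maximum number of tokens allowed in the response. Defaults to 100.
n (int, optional) — The number of completions to generate for each prompt.
presence_penalty (float, optional) — Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they already appear in the text, increasing the model's likelihood to talk about new topics.
response_format (ChatCompletionInputGrammarType(), optional) — Grammar constraints. Can be either a JSONSchema or a regex.
seed (Optional[int], optional) — Seed for reproducible control flow. Defaults to None.
stop (List[str], optional) — Up to four strings which trigger the end of the response. Defaults to None.
stream (bool, optional) — Enable realtime streaming of responses. Defaults to False.
stream_options (ChatCompletionInputStreamOptions, optional) — Options for streaming completions.
temperature (float, optional) — Controls randomness of the generations. Lower values ensure less random completions. Range: [0, 2]. Defaults to 1.0.
top_logprobs (int, optional) — An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
top_p (float, optional) — Fraction of the most likely next words to sample from. Must be between 0 and 1. Defaults to 1.0.
tool_choice (ChatCompletionInputToolChoiceClass or ChatCompletionInputToolChoiceEnum(), optional) — The tool to use for the completion. Defaults to "auto".
tool_prompt (str, optional) — A prompt to be appended before the tools.
tools (List of ChatCompletionInputTool, optional) — A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for.
extra_body (Dict, optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.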

ChatCompletionOutput 或 ChatCompletionStreamOutput 的可迭代物件

伺服器返回的生成文字

如果 stream=False，則生成文字作為 ChatCompletionOutput 返回（預設）。
如果 stream=True，則生成文字以 token 為單位作為 ChatCompletionStreamOutput 序列返回。
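When `stream=True`, each chunk carries only a fragment of the reply, so callers usually join the fragments themselves. The helper below is a minimal sketch of that step. It assumes each chunk exposes `choices[i].delta.content` as in the streaming output shown on this page, that a single completion is requested, and that `content` may be empty on some chunks. The fake stream is built from `SimpleNamespace` objects so the example runs offline.

```python
import asyncio
from types import SimpleNamespace


async def collect_stream(stream):
    """Concatenate delta.content across all streamed chunks."""
    parts = []
    async for chunk in stream:
        for choice in chunk.choices:
            text = choice.delta.content
            if text:  # skip chunks that carry no text
                parts.append(text)
    return "".join(parts)


def make_chunk(text):
    """Build a fake chunk shaped like ChatCompletionStreamOutput (illustration only)."""
    delta = SimpleNamespace(content=text, role="assistant")
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta, index=0)])


async def fake_stream():
    for piece in ["The", " capital", None, " is", " Paris."]:
        yield make_chunk(piece)


answer = asyncio.run(collect_stream(fake_stream()))
print(answer)
```

With the real client, the argument would be the object returned by `await client.chat_completion(messages, stream=True)`.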

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

使用指定的語言模型完成對話的方法。

您可以透過使用 `extra_body` 引數將特定於提供方的引數傳遞給模型。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> messages = [{"role": "user", "content": "What is the capital of France?"}]
>>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
>>> await client.chat_completion(messages, max_tokens=100)
ChatCompletionOutput(
    choices=[
        ChatCompletionOutputComplete(
            finish_reason='eos_token',
            index=0,
            message=ChatCompletionOutputMessage(
                role='assistant',
                content='The capital of France is Paris.',
                name=None,
                tool_calls=None
            ),
            logprobs=None
        )
    ],
    created=1719907176,
    id='',
    model='meta-llama/Meta-Llama-3-8B-Instruct',
    object='text_completion',
    system_fingerprint='2.0.4-sha-f426a33',
    usage=ChatCompletionOutputUsage(
        completion_tokens=8,
        prompt_tokens=17,
        total_tokens=25
    )
)

使用流式傳輸的示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> messages = [{"role": "user", "content": "What is the capital of France?"}]
>>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
>>> async for token in await client.chat_completion(messages, max_tokens=10, stream=True):
...     print(token)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content='The', role='assistant'), index=0, finish_reason=None)], created=1710498504)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' capital', role='assistant'), index=0, finish_reason=None)], created=1710498504)
(...)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' may', role='assistant'), index=0, finish_reason=None)], created=1710498504)

使用 OpenAI 語法的示例

# Must be run in an async context
# instead of `from openai import OpenAI`
from huggingface_hub import AsyncInferenceClient

# instead of `client = OpenAI(...)`
client = AsyncInferenceClient(
    base_url=...,
    api_key=...,
)

output = await client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

async for chunk in output:
    print(chunk.choices[0].delta.content)

直接使用額外（提供方特定）引數的第三方提供方示例。使用費用將計入您的 Together AI 賬戶。

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="together",  # Use Together AI provider
...     api_key="<together_api_key>",  # Pass your Together API key directly
... )
>>> client.chat_completion(
...     model="meta-llama/Meta-Llama-3-8B-Instruct",
...     messages=[{"role": "user", "content": "What is the capital of France?"}],
...     extra_body={"safety_model": "Meta-Llama/Llama-Guard-7b"},
... )

透過 Hugging Face 路由使用第三方提供方的示例。使用費用將計入您的 Hugging Face 賬戶。

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="sambanova",  # Use Sambanova provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> client.chat_completion(
...     model="meta-llama/Meta-Llama-3-8B-Instruct",
...     messages=[{"role": "user", "content": "What is the capital of France?"}],
... )

使用影像 + 文字作為輸入的示例

# Must be run in an async context
>>> import base64
>>> from huggingface_hub import AsyncInferenceClient

# provide a remote URL
>>> image_url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
# or a base64-encoded image
>>> image_path = "/path/to/image.jpeg"
>>> with open(image_path, "rb") as f:
...     base64_image = base64.b64encode(f.read()).decode("utf-8")
>>> image_url = f"data:image/jpeg;base64,{base64_image}"

>>> client = AsyncInferenceClient("meta-llama/Llama-3.2-11B-Vision-Instruct")
>>> output = await client.chat.completions.create(
...     messages=[
...         {
...             "role": "user",
...             "content": [
...                 {
...                     "type": "image_url",
...                     "image_url": {"url": image_url},
...                 },
...                 {
...                     "type": "text",
...                     "text": "Describe this image in one sentence.",
...                 },
...             ],
...         },
...     ],
... )
>>> output.choices[0].message.content
'The image depicts the iconic Statue of Liberty situated in New York Harbor, New York, on a clear day.'
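The base64 step in the example above can be wrapped in a small helper. This is a sketch under stated assumptions: `to_data_url` and `image_message` are hypothetical names, the MIME type is guessed from the file name, and the message layout simply mirrors the `image_url` and `text` content parts used above.

```python
import base64
import mimetypes


def to_data_url(image_bytes, filename="image.jpeg"):
    """Encode raw image bytes as a data: URL, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def image_message(image_url, prompt):
    """Build a user message that combines one image and one text part."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": prompt},
        ],
    }


# Three bytes stand in for a real JPEG file here.
url = to_data_url(b"\xff\xd8\xff", "photo.jpeg")
message = image_message(url, "Describe this image in one sentence.")
print(url)
```

The resulting `message` can be placed in the `messages` list in the same way as the hand-written dictionary above.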

使用工具的示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> messages = [
...     {
...         "role": "system",
...         "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.",
...     },
...     {
...         "role": "user",
...         "content": "What's the weather like the next 3 days in San Francisco, CA?",
...     },
... ]
>>> tools = [
...     {
...         "type": "function",
...         "function": {
...             "name": "get_current_weather",
...             "description": "Get the current weather",
...             "parameters": {
...                 "type": "object",
...                 "properties": {
...                     "location": {
...                         "type": "string",
...                         "description": "The city and state, e.g. San Francisco, CA",
...                     },
...                     "format": {
...                         "type": "string",
...                         "enum": ["celsius", "fahrenheit"],
...                         "description": "The temperature unit to use. Infer this from the users location.",
...                     },
...                 },
...                 "required": ["location", "format"],
...             },
...         },
...     },
...     {
...         "type": "function",
...         "function": {
...             "name": "get_n_day_weather_forecast",
...             "description": "Get an N-day weather forecast",
...             "parameters": {
...                 "type": "object",
...                 "properties": {
...                     "location": {
...                         "type": "string",
...                         "description": "The city and state, e.g. San Francisco, CA",
...                     },
...                     "format": {
...                         "type": "string",
...                         "enum": ["celsius", "fahrenheit"],
...                         "description": "The temperature unit to use. Infer this from the users location.",
...                     },
...                     "num_days": {
...                         "type": "integer",
...                         "description": "The number of days to forecast",
...                     },
...                 },
...                 "required": ["location", "format", "num_days"],
...             },
...         },
...     },
... ]

>>> response = await client.chat_completion(
...     model="meta-llama/Meta-Llama-3-70B-Instruct",
...     messages=messages,
...     tools=tools,
...     tool_choice="auto",
...     max_tokens=500,
... )
>>> response.choices[0].message.tool_calls[0].function
ChatCompletionOutputFunctionDefinition(
    arguments={
        'location': 'San Francisco, CA',
        'format': 'fahrenheit',
        'num_days': 3
    },
    name='get_n_day_weather_forecast',
    description=None
)
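The model only proposes a tool call, and running it is left to the caller. The sketch below shows one way to route such a call to a local Python function. `TOOL_REGISTRY`, `dispatch_tool_call` and the local `get_n_day_weather_forecast` body are hypothetical. It assumes the call object exposes `name` and `arguments`, and it accepts `arguments` either as a dict (as printed above) or as a JSON string, since some servers return the latter.

```python
import json
from types import SimpleNamespace


def get_n_day_weather_forecast(location, format, num_days):
    """Hypothetical local implementation of the tool declared in `tools`."""
    return f"{num_days}-day forecast for {location} in {format}"


# Maps tool names announced to the model onto local callables.
TOOL_REGISTRY = {"get_n_day_weather_forecast": get_n_day_weather_forecast}


def dispatch_tool_call(function_call, registry=TOOL_REGISTRY):
    """Run the local function named by a tool call and return its result."""
    arguments = function_call.arguments
    if isinstance(arguments, str):
        arguments = json.loads(arguments)  # some servers send a JSON string
    handler = registry.get(function_call.name)
    if handler is None:
        raise KeyError(f"unknown tool: {function_call.name}")
    return handler(**arguments)


# Stand-in for response.choices[0].message.tool_calls[0].function
call = SimpleNamespace(
    name="get_n_day_weather_forecast",
    arguments={"location": "San Francisco, CA", "format": "fahrenheit", "num_days": 3},
)
result = dispatch_tool_call(call)
print(result)
```

Looking the name up in an explicit registry, rather than calling whatever the model names, keeps the set of callable functions under the caller's control.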

使用 response_format 的示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> messages = [
...     {
...         "role": "user",
...         "content": "I saw a puppy a cat and a raccoon during my bike ride in the park. What did I see and when?",
...     },
... ]
>>> response_format = {
...     "type": "json",
...     "value": {
...         "properties": {
...             "location": {"type": "string"},
...             "activity": {"type": "string"},
...             "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
...             "animals": {"type": "array", "items": {"type": "string"}},
...         },
...         "required": ["location", "activity", "animals_seen", "animals"],
...     },
... }
>>> response = await client.chat_completion(
...     messages=messages,
...     response_format=response_format,
...     max_tokens=500,
... )
>>> response.choices[0].message.content
'{\n\n"activity": "bike ride",\n"animals": ["puppy", "cat", "raccoon"],\n"animals_seen": 3,\n"location": "park"}'
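The constrained reply is still a plain string, so it has to be decoded before use. A minimal sketch follows, assuming the content is valid JSON that follows the schema passed in `response_format`. `parse_structured_reply` is a hypothetical helper, and the sample string reuses the values from the example above.

```python
import json

# Keys listed as "required" in the response_format schema above.
REQUIRED_KEYS = ["location", "activity", "animals_seen", "animals"]


def parse_structured_reply(content, required=REQUIRED_KEYS):
    """Decode the JSON reply and check that every required key is present."""
    data = json.loads(content)
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Stand-in for response.choices[0].message.content
content = (
    '{\n\n"activity": "bike ride",\n"animals": ["puppy", "cat", "raccoon"],'
    '\n"animals_seen": 3,\n"location": "park"}'
)
parsed = parse_structured_reply(content)
print(parsed["animals"], parsed["animals_seen"])
```

Checking the required keys again on the client side is cheap and guards against a reply that was cut off by `max_tokens`.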

關閉

( )

關閉所有開啟的會話。

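By default, the `aiohttp.ClientSession` object is closed automatically when a call completes. However, if you are streaming data from the server and stop before the stream is complete, you must call this method to close the session properly.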

另一種可能性是使用非同步上下文（例如 async with AsyncInferenceClient(): ...）。
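Both clean-up styles can be sketched as follows. `FakeStreamingClient` is a hypothetical offline stand-in that only mimics the `close()` coroutine and the async context-manager protocol. With the real library the object would be an `AsyncInferenceClient`, and the stream would come from a call such as `chat_completion(..., stream=True)`.

```python
import asyncio


class FakeStreamingClient:
    """Hypothetical stand-in exposing close() and async-with support."""

    def __init__(self):
        self.closed = False

    async def stream(self):
        for token in ["one", "two", "three", "four"]:
            yield token

    async def close(self):
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.close()


async def first_tokens(client, limit):
    """Read at most `limit` tokens, then close the client explicitly."""
    tokens = []
    try:
        async for token in client.stream():
            tokens.append(token)
            if len(tokens) >= limit:
                break  # stop before the stream is exhausted
    finally:
        await client.close()  # required because the stream was abandoned
    return tokens


async def demo():
    client = FakeStreamingClient()
    tokens = await first_tokens(client, limit=2)
    # Alternative: let the context manager call close() on exit.
    async with FakeStreamingClient() as managed:
        pass
    return tokens, client.closed, managed.closed


tokens, closed_explicitly, closed_by_context = asyncio.run(demo())
print(tokens, closed_explicitly, closed_by_context)
```

The `try`/`finally` form fits a long-lived client that is shared between calls, while `async with` fits a client that lives for a single block.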

文件問答

引數

image (Union[str, Path, bytes, BinaryIO]) — 上下文的輸入影像。可以是原始位元組、影像檔案或線上影像的 URL。
question (str) — 要回答的問題。
model (str，可選) — 用於文件問答任務的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署的推理端點的 URL。如果未提供，將使用預設推薦的文件問答模型。預設為 None。
doc_stride (int，可選) — 如果文件中的單詞太長而無法與模型的問題匹配，它將被分成幾個有重疊的塊。此引數控制重疊的大小。
handle_impossible_answer (bool，可選) — 是否接受“不可能”作為答案。
lang (str，可選) — 執行 OCR 時使用的語言。預設為英語。
max_answer_len (int，可選) — 預測答案的最大長度（例如，只考慮較短長度的答案）。
max_question_len (int，可選) — 令牌化後問題的最大長度。如果需要，將被截斷。
max_seq_len (int，可選) — 傳遞給模型的每個塊中總句子（上下文 + 問題）的最大長度（以令牌為單位）。如果需要，上下文將被分成幾個塊（使用 doc_stride 作為重疊）。
top_k (int，可選) — 要返回的答案數量（將按可能性順序選擇）。如果上下文中沒有足夠的選項，可以返回少於 top_k 個答案。
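word_boxes (List[Union[List[float], str]], optional) — A list of words and bounding boxes (normalized 0->1000). If provided, the inference skips the OCR step and uses the provided bounding boxes instead.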
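The "normalized 0->1000" convention for `word_boxes` means pixel coordinates are rescaled relative to the page size. The sketch below shows that rescaling. It assumes each entry pairs a word with an `[x0, y0, x1, y1]` box, which is an assumption about the expected layout and not something this page specifies. The OCR output is made up.

```python
def normalize_box(box, width, height):
    """Rescale a pixel box [x0, y0, x1, y1] to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        round(1000 * x0 / width),
        round(1000 * y0 / height),
        round(1000 * x1 / width),
        round(1000 * y1 / height),
    ]


def to_word_boxes(ocr_words, width, height):
    """Pair each word with its normalized box (assumed [word, box] layout)."""
    return [[word, normalize_box(box, width, height)] for word, box in ocr_words]


# Hypothetical OCR output for a 500x400 pixel page.
ocr = [("Invoice", (50, 20, 150, 40)), ("us-001", (200, 20, 260, 40))]
word_boxes = to_word_boxes(ocr, width=500, height=400)
print(word_boxes)
```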

List[DocumentQuestionAnsweringOutputElement]

包含預測標籤、相關機率、單詞 ID 和頁碼的 DocumentQuestionAnsweringOutputElement 項列表。

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

回答文件影像上的問題。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.document_question_answering(image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png", question="What is the invoice number?")
[DocumentQuestionAnsweringOutputElement(answer='us-001', end=16, score=0.9999666213989258, start=16)]

特徵提取

引數

text (str) — 要嵌入的文字。
model (str，可選) — 用於特徵提取任務的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署的推理端點的 URL。如果未提供，將使用預設推薦的特徵提取模型。預設為 None。
normalize (bool，可選) — 是否對嵌入進行歸一化。僅在由 Text-Embedding-Inference 提供支援的伺服器上可用。
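prompt_name (str, optional) — The name of the prompt that should be used for encoding. If not set, no prompt is applied. Must be a key in the *Sentence Transformers* configuration *prompts* dictionary. For example, if prompt_name is "query" and prompts is {"query": "query: ", ...}, then the sentence "What is the capital of France?" is encoded as "query: What is the capital of France?" because the prompt text is prepended to any text to encode.
truncate (bool, optional) — Whether to truncate the embeddings. Only available on servers powered by Text-Embedding-Inference.
truncation_direction (Literal["Left", "Right"], optional) — Which side of the input should be truncated when truncate=True is passed.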

np.ndarray

表示輸入文字的嵌入，為 float32 numpy 陣列。

引發

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

為給定文字生成嵌入。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.feature_extraction("Hi, who are you?")
array([[ 2.424802  ,  2.93384   ,  1.1750331 , ...,  1.240499, -0.13776633, -0.7889173 ],
[-0.42943227, -0.6364878 , -1.693462  , ...,  0.41978157, -2.4336355 ,  0.6162071 ],
...,
[ 0.28552425, -0.928395  , -1.2077185 , ...,  0.76810825, -2.1069427 ,  0.6236161 ]], dtype=float32)

填補掩碼

( text: str model: typing.Optional[str] = None targets: typing.Optional[typing.List[str]] = None top_k: typing.Optional[int] = None ) → List[FillMaskOutputElement]

引數

text (str) — 要填充的字串，必須包含 [MASK] 令牌（檢視模型卡以獲取掩碼的確切名稱）。
model (str，可選) — 用於填充掩碼任務的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署的推理端點的 URL。如果未提供，將使用預設推薦的填充掩碼模型。
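targets (List[str], optional) — When passed, the model limits the scores to the passed targets instead of looking up the whole vocabulary. If the provided targets are not in the model vocabulary, they are tokenized and the first resulting token is used (with a warning, and it may be slower).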
top_k (int，可選) — 傳遞時，覆蓋要返回的預測數量。

List[FillMaskOutputElement]

包含預測標籤、相關機率、token 引用和完成文字的 FillMaskOutputElement 項列表。

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

用缺失的單詞（更準確地說是 token）填充一個空白。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.fill_mask("The goal of life is <mask>.")
[
    FillMaskOutputElement(score=0.06897063553333282, token=11098, token_str=' happiness', sequence='The goal of life is happiness.'),
    FillMaskOutputElement(score=0.06554922461509705, token=45075, token_str=' immortality', sequence='The goal of life is immortality.')
]

獲取端點資訊

( model: typing.Optional[str] = None ) → Dict[str, Any]

引數

model (str，可選) — 用於推理的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署的推理端點的 URL。此引數會覆蓋例項級別定義的模型。預設為 None。

Dict[str, Any]

關於端點的資訊。

獲取已部署端點的資訊。

此端點僅適用於由 Text-Generation-Inference (TGI) 或 Text-Embedding-Inference (TEI) 提供支援的端點。由 transformers 提供支援的端點返回空載荷。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")
>>> await client.get_endpoint_info()
{
    'model_id': 'meta-llama/Meta-Llama-3-70B-Instruct',
    'model_sha': None,
    'model_dtype': 'torch.float16',
    'model_device_type': 'cuda',
    'model_pipeline_tag': None,
    'max_concurrent_requests': 128,
    'max_best_of': 2,
    'max_stop_sequences': 4,
    'max_input_length': 8191,
    'max_total_tokens': 8192,
    'waiting_served_ratio': 0.3,
    'max_batch_total_tokens': 1259392,
    'max_waiting_tokens': 20,
    'max_batch_size': None,
    'validation_workers': 32,
    'max_client_batch_size': 4,
    'version': '2.0.2',
    'sha': 'dccab72549635c7eb5ddb17f43f0b7cdff07c214',
    'docker_label': 'sha-dccab72'
}

獲取模型狀態

( model: typing.Optional[str] = None ) → ModelStatus

引數

model (str，可選) — 要檢查狀態的模型的識別符號。如果未提供模型，將使用此 InferenceClient 例項關聯的模型。只能檢查 HF Inference API 服務，因此識別符號不能是 URL。

ModelStatus

ModelStatus 資料類的一個例項，包含模型狀態資訊：載入、狀態、計算型別和框架。

獲取託管在 HF Inference API 上的模型的狀態。

此端點主要在您已經知道要使用的模型並想檢查其可用性時有用。如果您想發現已部署的模型，則應使用 list_deployed_models()。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.get_model_status("meta-llama/Meta-Llama-3-8B-Instruct")
ModelStatus(loaded=True, state='Loaded', compute_type='gpu', framework='text-generation-inference')

健康檢查

( model: typing.Optional[str] = None ) → bool

引數

model (str，可選) — 推理端點的 URL。此引數會覆蓋例項級別定義的模型。預設為 None。

bool

如果一切正常，則為 True。

檢查已部署端點的健康狀況。

健康檢查僅適用於由 Text-Generation-Inference (TGI) 或 Text-Embedding-Inference (TEI) 提供支援的推理端點。對於推理 API，請改用 InferenceClient.get_model_status()。
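A common use of `health_check()` is to wait for a freshly started endpoint before sending traffic. The polling loop below is a minimal sketch of that idea. `wait_until_healthy` and `FlakyClient` are hypothetical names, and the stub only imitates a client whose `health_check()` coroutine returns a bool. A real endpoint may also raise while it is starting, which this sketch does not handle.

```python
import asyncio


async def wait_until_healthy(client, attempts=5, delay=0.01):
    """Poll health_check() until it returns True; return the attempt number."""
    for attempt in range(1, attempts + 1):
        if await client.health_check():
            return attempt
        await asyncio.sleep(delay)  # back off before the next probe
    raise TimeoutError(f"endpoint not healthy after {attempts} attempts")


class FlakyClient:
    """Hypothetical stub: reports unhealthy a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures

    async def health_check(self):
        if self.failures > 0:
            self.failures -= 1
            return False
        return True


attempts_used = asyncio.run(wait_until_healthy(FlakyClient(failures=2)))
print(attempts_used)
```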

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient("https://jzgu0buei5.us-east-1.aws.endpoints.huggingface.cloud")
>>> await client.health_check()
True

影像分類

引數

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — 要分類的影像。可以是原始位元組、影像檔案、線上影像的 URL 或 PIL 影像。
model (str，可選) — 用於影像分類的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署的推理端點的 URL。如果未提供，將使用影像分類的預設推薦模型。
function_to_apply ("ImageClassificationOutputTransform"，可選) — 應用於模型輸出以檢索分數的函式。
top_k (int，可選) — 指定時，將輸出限制為最有可能的前 K 個類別。

List[ImageClassificationOutputElement]

包含預測標籤和相關機率的 ImageClassificationOutputElement 項列表。

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

使用指定模型對給定影像執行影像分類。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
[ImageClassificationOutputElement(label='Blenheim spaniel', score=0.9779096841812134), ...]

影像分割

引數

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — 要分割的影像。可以是原始位元組、影像檔案、線上影像的 URL 或 PIL 影像。
model (str，可選) — 用於影像分割的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署的推理端點的 URL。如果未提供，將使用影像分割的預設推薦模型。
mask_threshold (float，可選) — 將預測掩碼轉換為二進位制值時使用的閾值。
overlap_mask_area_threshold (float，可選) — 掩碼重疊閾值，用於消除小的、不連通的片段。
subtask ("ImageSegmentationSubtask"，可選) — 要執行的分割任務，取決於模型能力。
threshold (float，可選) — 過濾預測掩碼的機率閾值。

List[ImageSegmentationOutputElement]

包含分割掩碼和相關屬性的 ImageSegmentationOutputElement 列表。

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

使用指定的模型對給定影像執行影像分割。

如果您想處理影像，必須安裝 `PIL` (`pip install Pillow`)。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.image_segmentation("cat.jpg")
[ImageSegmentationOutputElement(score=0.989008, label='LABEL_184', mask=<PIL.PngImagePlugin.PngImageFile image mode=L size=400x300 at 0x7FDD2B129CC0>), ...]

影像到影像

引數

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — 用於翻譯的輸入影像。可以是原始位元組、影像檔案、線上影像的 URL 或 PIL 影像。
prompt (str，可選) — 指導影像生成的文字提示。
negative_prompt (str，可選) — 指導影像生成中不包含的內容的提示。
num_inference_steps (int，可選) — 對於擴散模型。去噪步驟的數量。更多的去噪步驟通常會帶來更高質量的影像，但推理速度會變慢。
guidance_scale (float，可選) — 對於擴散模型。較高的指導尺度值會促使模型生成與文字提示緊密關聯的影像，但會降低影像質量。
model (str，可選) — 用於推理的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署的推理端點的 URL。此引數會覆蓋例項級別定義的模型。預設為 None。
target_size (ImageToImageTargetSize，可選) — 輸出影像的畫素大小。

Image

轉換後的影像。

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

使用指定模型執行影像到影像的轉換。

如果您想處理影像，必須安裝 `PIL` (`pip install Pillow`)。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> image = await client.image_to_image("cat.jpg", prompt="turn the cat into a tiger")
>>> image.save("tiger.jpg")

影像到文字

( image: typing.Union[bytes, typing.BinaryIO, str, pathlib.Path, ForwardRef('Image')] model: typing.Optional[str] = None ) → ImageToTextOutput

引數

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — 要配字幕的輸入影像。可以是原始位元組、影像檔案、線上影像的 URL 或 PIL 影像。
model (str，可選) — 用於推理的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署的推理端點的 URL。此引數會覆蓋例項級別定義的模型。預設為 None。

ImageToTextOutput

生成的文字。

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

接收輸入影像並返回文字。

模型根據您的用例（影像字幕、光學字元識別 (OCR)、Pix2Struct 等）可以有非常不同的輸出。請檢視模型卡以瞭解模型的具體特性。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.image_to_text("cat.jpg")
'a cat standing in a grassy field '
>>> await client.image_to_text("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
'a dog laying on the grass next to a flower pot '

影像到影片

引數

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — 用於生成影片的輸入影像。它可以是原始位元組、影像檔案、線上影像的 URL 或 PIL 影像。
model (str, 可選) — 用於推理的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署的推理端點的 URL。此引數會覆蓋例項級別定義的模型。預設為 None。
prompt (str, 可選) — 用於指導影片生成的文字提示。
negative_prompt (str, 可選) — 用於指導影片生成中不應包含內容的提示。
num_frames (float, 可選) — num_frames 引數決定了要生成多少影片幀。
num_inference_steps (int, 可選) — 對於擴散模型。去噪步數。更多的去噪步數通常會帶來更高質量的影像，但推理速度會變慢。
guidance_scale (float, 可選) — 對於擴散模型。更高的指導比例值會促使模型生成與文字提示緊密相關的影片，但會犧牲影像質量。
seed (int, 可選) — 用於影片生成的種子。
target_size (ImageToVideoTargetSize, 可選) — 輸出影片幀的畫素大小。

bytes

生成的影片。

從輸入影像生成影片。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> video = await client.image_to_video("cat.jpg", model="Wan-AI/Wan2.2-I2V-A14B", prompt="turn the cat into a tiger")
>>> with open("tiger.mp4", "wb") as f:
...     f.write(video)

列出已部署的模型

( frameworks: typing.Union[NoneType, str, typing.Literal['all'], typing.List[str]] = None ) → Dict[str, List[str]]

引數

frameworks (Literal["all"] 或 List[str] 或 str, 可選) — 要篩選的框架。預設情況下，只測試可用框架的子集。如果設定為“all”，將測試所有可用框架。也可以提供單個框架或一組自定義框架進行檢查。

Dict[str, List[str]]

將任務名稱對映到模型 ID 的排序列表的字典。

列出部署在 HF Serverless Inference API 服務上的模型。

此端點方法主要用於發現。如果您已經知道要使用的模型並想檢查其可用性，可以直接使用 get_model_status()。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

# Discover zero-shot-classification models currently deployed
>>> models = await client.list_deployed_models()
>>> models["zero-shot-classification"]
['Narsil/deberta-large-mnli-zero-cls', 'facebook/bart-large-mnli', ...]

# List from only 1 framework
>>> await client.list_deployed_models("text-generation-inference")
{'text-generation': ['bigcode/starcoder', 'meta-llama/Llama-2-70b-chat-hf', ...], ...}

物件檢測

引數

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — 要檢測影像上的物件。它可以是原始位元組、影像檔案、線上影像的 URL 或 PIL 影像。
model (str, 可選) — 用於物件檢測的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署的推理端點的 URL。如果未提供，將使用物件檢測的預設推薦模型 (DETR)。
threshold (float, 可選) — 進行預測所需的機率。

List[ObjectDetectionOutputElement]

包含邊界框和相關屬性的 ObjectDetectionOutputElement 列表。

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError 或 ValueError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。
ValueError — 如果請求輸出不是列表。

使用指定的模型對給定影像執行物件檢測。

如果您想處理影像，必須安裝 `PIL` (`pip install Pillow`)。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.object_detection("people.jpg")
[ObjectDetectionOutputElement(score=0.9486683011054993, label='person', box=ObjectDetectionBoundingBox(xmin=59, ymin=39, xmax=420, ymax=510)), ...]

問答

引數

question (str) — 要回答的問題。
context (str) — 問題的上下文。
model (str) — 用於問答任務的模型。可以是 Hugging Face Hub 上託管的模型 ID，也可以是已部署的推理端點的 URL。
align_to_words (bool, 可選) — 嘗試將答案與真實詞語對齊。可提高以空格分隔的語言的質量。但可能對不以空格分隔的語言（如日語或中文）造成負面影響。
doc_stride (int, 可選) — 如果上下文太長，無法與模型的問題匹配，則會將其分成幾個具有重疊的塊。此引數控制重疊的大小。
handle_impossible_answer (bool, 可選) — 是否接受不可能的答案。
max_answer_len (int, 可選) — 預測答案的最大長度（例如，只考慮長度較短的答案）。
max_question_len (int, 可選) — 標記化後問題的最大長度。如果需要，將被截斷。
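max_seq_len (int, optional) — The maximum length, in tokens, of the total sentence (context + question) in each chunk passed to the model. If needed, the context is split into several chunks (using doc_stride as overlap).
top_k (int, optional) — The number of answers to return (chosen by order of likelihood). Fewer than top_k answers are returned if there are not enough options available within the context.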
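How `max_seq_len` and `doc_stride` interact is easier to see on a toy input. The function below is a simplified sketch of overlapping-window chunking, not the server's actual implementation. It ignores special tokens and treats the context as an already tokenized list. `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(context_tokens, question_len, max_seq_len, doc_stride):
    """Split the context into windows that overlap by doc_stride tokens."""
    window = max_seq_len - question_len  # room left for context in each chunk
    if window <= doc_stride:
        raise ValueError("doc_stride must be smaller than the context window")
    step = window - doc_stride
    chunks = []
    start = 0
    while True:
        chunks.append(context_tokens[start:start + window])
        if start + window >= len(context_tokens):
            break  # the last window reached the end of the context
        start += step
    return chunks


# Ten context tokens, a 2-token question, 6 tokens per chunk, overlap of 1.
tokens = list(range(10))
chunks = split_with_overlap(tokens, question_len=2, max_seq_len=6, doc_stride=1)
print(chunks)
```

Each chunk shares `doc_stride` tokens with its neighbour, so an answer that straddles a boundary still appears whole in at least one chunk as long as it is no longer than the overlap.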

Union[QuestionAnsweringOutputElement, ListQuestionAnsweringOutputElement]

當 top_k 為 1 或未提供時，返回單個 QuestionAnsweringOutputElement。當 top_k 大於 1 時，返回 QuestionAnsweringOutputElement 列表。

引發

InferenceTimeoutError 或 aiohttp.ClientResponseError

InferenceTimeoutError — 如果模型不可用或請求超時。
aiohttp.ClientResponseError — 如果請求失敗並返回 HTTP 錯誤狀態碼，但不是 HTTP 503。

從給定文字中檢索問題的答案。

示例

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.question_answering(question="What's my name?", context="My name is Clara and I live in Berkeley.")
QuestionAnsweringOutputElement(answer='Clara', end=16, score=0.9326565265655518, start=11)
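
In the output above, `start` and `end` appear to be character offsets into `context`, with `end` exclusive. The sketch below checks that reading against the example values; `answer_span` is a hypothetical helper, not a huggingface_hub function.

```python
def answer_span(context: str, start: int, end: int) -> str:
    """Return the substring of context covered by the half-open range [start, end)."""
    if not 0 <= start <= end <= len(context):
        raise ValueError("start/end fall outside the context")
    return context[start:end]


context = "My name is Clara and I live in Berkeley."
# Offsets copied from the example output above.
span = answer_span(context, start=11, end=16)
```

Slicing the context this way recovers the same text as the `answer` field, which is handy for highlighting the answer in the original passage.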

sentence_similarity

( sentence: str other_sentences: typing.List[str] model: typing.Optional[str] = None ) → List[float]

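Parameters

sentence (str) — The main sentence to compare to the others.
other_sentences (List[str]) — The list of sentences to compare to.
model (str, optional) — The model to use for the sentence similarity task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended sentence similarity model will be used. Defaults to None.

List[float]

The similarity scores between the main sentence and each of the other sentences, in the same order as other_sentences.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Compute the semantic similarity between a sentence and a list of other sentences by comparing their embeddings.

Example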

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.sentence_similarity(
...     "Machine learning is so easy.",
...     other_sentences=[
...         "Deep learning is so straightforward.",
...         "This is so difficult, like rocket science.",
...         "I can't believe how much I struggled with this.",
...     ],
... )
[0.7785726189613342, 0.45876261591911316, 0.2906220555305481]
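
The scores come back in the same order as `other_sentences`, so pairing them up is left to the caller. A minimal sketch of ranking the candidates follows; `rank_by_similarity` is a hypothetical helper and the scores are copied from the example above.

```python
from typing import List, Tuple


def rank_by_similarity(sentences: List[str], scores: List[float]) -> List[Tuple[str, float]]:
    """Pair each sentence with its score and sort from most to least similar."""
    if len(sentences) != len(scores):
        raise ValueError("expected one score per sentence")
    return sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)


other_sentences = [
    "Deep learning is so straightforward.",
    "This is so difficult, like rocket science.",
    "I can't believe how much I struggled with this.",
]
scores = [0.7785726189613342, 0.45876261591911316, 0.2906220555305481]
ranked = rank_by_similarity(other_sentences, scores)
```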

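summarization

Parameters

text (str) — The input text to summarize.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended summarization model will be used.
clean_up_tokenization_spaces (bool, optional) — Whether to clean up potential extra spaces in the text output.
generate_parameters (Dict[str, Any], optional) — Additional parameters for the text generation algorithm.
truncation ("SummarizationTruncationStrategy", optional) — The truncation strategy to use.

SummarizationOutput

The generated summary text.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Generate a summary of a given text using a specified model.

Example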

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.summarization("The Eiffel tower...")
SummarizationOutput(generated_text="The Eiffel tower is one of the most famous landmarks in the world....")

table_question_answering

TableQuestionAnsweringOutputElement

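Parameters

table (Dict[str, Any]) — A table of data represented as a dict of lists, where the keys are the headers and the lists hold the column values. All lists must have the same size.
query (str) — The query, in plain text, that you want to ask the table.
model (str) — The model to use for the table question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint.
padding ("Padding", optional) — Activates and controls padding.
sequential (bool, optional) — Whether to run inference sequentially or as a batch. Batching is faster, but models like SQA require sequential inference to extract relations within sequences, given their conversational nature.
truncation (bool, optional) — Activates and controls truncation.

A table question answering output containing the answer, coordinates, cells, and the aggregator used.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Retrieve the answer to a question from information given in a table.

Example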

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> query = "How many stars does the transformers repository have?"
>>> table = {"Repository": ["Transformers", "Datasets", "Tokenizers"], "Stars": ["36542", "4512", "3934"]}
>>> await client.table_question_answering(table, query, model="google/tapas-base-finetuned-wtq")
TableQuestionAnsweringOutputElement(answer='36542', coordinates=[[0, 1]], cells=['36542'], aggregator='AVERAGE')
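
The `table` argument is column-oriented: one list per header, all of equal length. Data often starts out as a list of row records, so a small conversion step may be needed first. The sketch below shows one way to do it; `rows_to_table` is a hypothetical helper, and converting every value to `str` is an assumption made to match the string values used in the example above.

```python
from typing import Any, Dict, List


def rows_to_table(rows: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Convert row records into a dict of equal-length column lists."""
    if not rows:
        return {}
    headers = list(rows[0])
    for row in rows:
        # Every row must define exactly the same headers, otherwise the
        # resulting columns would have different sizes.
        if set(row) != set(headers):
            raise ValueError(f"row {row!r} does not match headers {headers!r}")
    return {header: [str(row[header]) for row in rows] for header in headers}


rows = [
    {"Repository": "Transformers", "Stars": 36542},
    {"Repository": "Datasets", "Stars": 4512},
    {"Repository": "Tokenizers", "Stars": 3934},
]
table = rows_to_table(rows)
```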

tabular_classification

( table: typing.Dict[str, typing.Any] model: typing.Optional[str] = None ) → List

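Parameters

table (Dict[str, Any]) — The set of attributes to classify.
model (str, optional) — The model to use for the tabular classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended tabular classification model will be used. Defaults to None.

List

A list of labels, one per row of the initial table.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Classify a target category (a group) based on a set of attributes.

Example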

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> table = {
...     "fixed_acidity": ["7.4", "7.8", "10.3"],
...     "volatile_acidity": ["0.7", "0.88", "0.32"],
...     "citric_acid": ["0", "0", "0.45"],
...     "residual_sugar": ["1.9", "2.6", "6.4"],
...     "chlorides": ["0.076", "0.098", "0.073"],
...     "free_sulfur_dioxide": ["11", "25", "5"],
...     "total_sulfur_dioxide": ["34", "67", "13"],
...     "density": ["0.9978", "0.9968", "0.9976"],
...     "pH": ["3.51", "3.2", "3.23"],
...     "sulphates": ["0.56", "0.68", "0.82"],
...     "alcohol": ["9.4", "9.8", "12.6"],
... }
>>> await client.tabular_classification(table=table, model="julien-c/wine-quality")
["5", "5", "5"]

tabular_regression

( table: typing.Dict[str, typing.Any] model: typing.Optional[str] = None ) → List

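Parameters

table (Dict[str, Any]) — A set of attributes stored in a table. The attributes used to predict the target can be both numerical and categorical.
model (str, optional) — The model to use for the tabular regression task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended tabular regression model will be used. Defaults to None.

List

A list of predicted numerical target values.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Predict a numerical target value given a set of attributes/features in a table.

Example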

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> table = {
...     "Height": ["11.52", "12.48", "12.3778"],
...     "Length1": ["23.2", "24", "23.9"],
...     "Length2": ["25.4", "26.3", "26.5"],
...     "Length3": ["30", "31.2", "31.1"],
...     "Species": ["Bream", "Bream", "Bream"],
...     "Width": ["4.02", "4.3056", "4.6961"],
... }
>>> await client.tabular_regression(table, model="scikit-learn/Fish-Weight")
[110, 120, 130]

text_classification

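Parameters

text (str) — The string to classify.
model (str, optional) — The model to use for the text classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text classification model will be used. Defaults to None.
top_k (int, optional) — When specified, limits the output to the K most probable classes.
function_to_apply ("TextClassificationOutputTransform", optional) — The function to apply to the model outputs in order to retrieve the scores.

List[TextClassificationOutputElement]

A list of TextClassificationOutputElement items containing the predicted label and associated probability.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Perform text classification (e.g. sentiment analysis) on the given text.

Example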

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.text_classification("I like you")
[
    TextClassificationOutputElement(label='POSITIVE', score=0.9998695850372314),
    TextClassificationOutputElement(label='NEGATIVE', score=0.0001304351753788069),
]

text_generation

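Parameters

prompt (str) — The input text.
details (bool, optional) — By default, text_generation returns the generated text as a string. Pass details=True if you want a detailed output (tokens, probabilities, seed, finish reason, etc.). Only available for models running with the text-generation-inference backend.
stream (bool, optional) — By default, text_generation returns the full generated text. Pass stream=True if you want a stream of tokens to be returned. Only available for models running with the text-generation-inference backend.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. Defaults to None.
adapter_id (str, optional) — LoRA adapter ID.
best_of (int, optional) — Generate best_of sequences and return the one with the highest token log probabilities.
decoder_input_details (bool, optional) — Return the log probabilities and IDs of the decoder input tokens. You must also set details=True for this to take effect. Defaults to False.
do_sample (bool, optional) — Activate logits sampling.
frequency_penalty (float, optional) — A number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim.
grammar (TextGenerationInputGrammarType, optional) — Grammar constraints. Can be either a JSONSchema or a regular expression.
max_new_tokens (int, optional) — The maximum number of generated tokens. Defaults to 100.
repetition_penalty (float, optional) — The parameter for the repetition penalty. 1.0 means no penalty. See this paper for more details.
return_full_text (bool, optional) — Whether to prepend the prompt to the generated text.
seed (int, optional) — Random sampling seed.
stop (List[str], optional) — Stop generating tokens if a member of `stop` is generated.
stop_sequences (List[str], optional) — Deprecated argument. Use `stop` instead.
temperature (float, optional) — The value used to modulate the logits distribution.
top_n_tokens (int, optional) — Return information about the `top_n_tokens` most likely tokens at each generation step, instead of just the sampled token.
top_k (int, optional) — The number of highest-probability vocabulary tokens to keep for top-k filtering.
top_p (float, optional) — If set to < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or higher is kept for generation.
truncate (int, optional) — Truncate the input tokens to the given size.
typical_p (float, optional) — Typical decoding mass. See Typical Decoding for Natural Language Generation for more information.
watermark (bool, optional) — Watermarking with A Watermark for Large Language Models.

Union[str, TextGenerationOutput, Iterable[str], Iterable[TextGenerationStreamOutput]]

The generated text returned by the server:

If stream=False and details=False, the generated text is returned as a str (default).
If stream=True and details=False, the generated text is returned token by token as an Iterable[str].
If stream=False and details=True, the generated text is returned with more details as a TextGenerationOutput.
If details=True and stream=True, the generated text is returned token by token as an iterable of TextGenerationStreamOutput.

Raises

ValidationError or InferenceTimeoutError or aiohttp.ClientResponseError

ValidationError — If the input values are invalid. No HTTP call is made to the server.
InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Given a prompt, generate the text that follows it.

If you want to generate a response from chat messages, you should use the InferenceClient.chat_completion() method. It accepts a list of messages instead of a single text prompt and handles the chat templating for you.

Example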

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

# Case 1: generate text
>>> await client.text_generation("The huggingface_hub library is ", max_new_tokens=12)
'100% open source and built to be easy to use.'

# Case 2: iterate over the generated tokens. Useful for large generation.
>>> async for token in await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, stream=True):
...     print(token)
100
%
open
source
and
built
to
be
easy
to
use
.

# Case 3: get more details about the generation process.
>>> await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True)
TextGenerationOutput(
    generated_text='100% open source and built to be easy to use.',
    details=TextGenerationDetails(
        finish_reason='length',
        generated_tokens=12,
        seed=None,
        prefill=[
            TextGenerationPrefillOutputToken(id=487, text='The', logprob=None),
            TextGenerationPrefillOutputToken(id=53789, text=' hugging', logprob=-13.171875),
            (...)
            TextGenerationPrefillOutputToken(id=204, text=' ', logprob=-7.0390625)
        ],
        tokens=[
            TokenElement(id=1425, text='100', logprob=-1.0175781, special=False),
            TokenElement(id=16, text='%', logprob=-0.0463562, special=False),
            (...)
            TokenElement(id=25, text='.', logprob=-0.5703125, special=False)
        ],
        best_of_sequences=None
    )
)

# Case 4: iterate over the generated tokens with more details.
# Last object is more complete, containing the full generated text and the finish reason.
>>> async for details in await client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True, stream=True):
...     print(details)
...
TextGenerationStreamOutput(token=TokenElement(id=1425, text='100', logprob=-1.0175781, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=16, text='%', logprob=-0.0463562, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=1314, text=' open', logprob=-1.3359375, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=3178, text=' source', logprob=-0.28100586, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=273, text=' and', logprob=-0.5961914, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=3426, text=' built', logprob=-1.9423828, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-1.4121094, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=314, text=' be', logprob=-1.5224609, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=1833, text=' easy', logprob=-2.1132812, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=271, text=' to', logprob=-0.08520508, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(id=745, text=' use', logprob=-0.39453125, special=False), generated_text=None, details=None)
TextGenerationStreamOutput(token=TokenElement(
    id=25,
    text='.',
    logprob=-0.5703125,
    special=False),
    generated_text='100% open source and built to be easy to use.',
    details=TextGenerationStreamOutputStreamDetails(finish_reason='length', generated_tokens=12, seed=None)
)

# Case 5: generate constrained output using grammar
>>> response = await client.text_generation(
...     prompt="I saw a puppy a cat and a raccoon during my bike ride in the park",
...     model="HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1",
...     max_new_tokens=100,
...     repetition_penalty=1.3,
...     grammar={
...         "type": "json",
...         "value": {
...             "properties": {
...                 "location": {"type": "string"},
...                 "activity": {"type": "string"},
...                 "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
...                 "animals": {"type": "array", "items": {"type": "string"}},
...             },
...             "required": ["location", "activity", "animals_seen", "animals"],
...         },
...     },
... )
>>> import json
>>> json.loads(response)
{
    "activity": "bike riding",
    "animals": ["puppy", "cat", "raccoon"],
    "animals_seen": 3,
    "location": "park"
}
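
When streaming (Case 2 above), tokens arrive one at a time and the caller decides how to assemble them. The sketch below collects an async token stream into a single string. `fake_token_stream` is a hypothetical stand-in for the async iterable returned with `stream=True`, used so the snippet runs without a network call.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_token_stream(tokens: List[str]) -> AsyncIterator[str]:
    # Hypothetical stand-in for the iterable returned by text_generation(..., stream=True).
    for token in tokens:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield token


async def collect(stream: AsyncIterator[str]) -> str:
    """Consume an async token stream and join the tokens into one string."""
    parts: List[str] = []
    async for token in stream:
        parts.append(token)
    return "".join(parts)


text = asyncio.run(collect(fake_token_stream(["100", "%", " open", " source"])))
```

With a real client, the same `collect` coroutine could be given the object obtained from `await client.text_generation(..., stream=True)`.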

text_to_image

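Parameters

prompt (str) — The prompt to generate an image from.
negative_prompt (str, optional) — A prompt describing what should NOT be included in the generated image.
height (int, optional) — The height of the output image, in pixels.
width (int, optional) — The width of the output image, in pixels.
num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference.
guidance_scale (float, optional) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt, but values that are too high may cause saturation and other artifacts.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-image model will be used. Defaults to None.
scheduler (str, optional) — Override the scheduler with a compatible one.
seed (int, optional) — Seed for the random number generator.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

Image

The generated image.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Generate an image based on a given text using a specified model.

You must have `PIL` installed if you want to work with images (`pip install Pillow`).

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example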

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

>>> image = await client.text_to_image("An astronaut riding a horse on the moon.")
>>> image.save("astronaut.png")

>>> image = await client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     negative_prompt="low resolution, blurry",
...     model="stabilityai/stable-diffusion-2-1",
... )
>>> image.save("better_astronaut.png")

Example using a third-party provider directly. Usage will be billed on your fal.ai account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="fal-ai",  # Use fal.ai provider
...     api_key="fal-ai-api-key",  # Pass your fal.ai API key
... )
>>> image = client.text_to_image(
...     "A majestic lion in a fantasy forest",
...     model="black-forest-labs/FLUX.1-schnell",
... )
>>> image.save("lion.png")

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> image = client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     model="black-forest-labs/FLUX.1-dev",
... )
>>> image.save("astronaut.png")

Example using the Replicate provider with extra parameters

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> image = client.text_to_image(
...     "An astronaut riding a horse on the moon.",
...     model="black-forest-labs/FLUX.1-schnell",
...     extra_body={"output_quality": 100},
... )
>>> image.save("astronaut.png")

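text_to_speech

Parameters

text (str) — The text to synthesize.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-speech model will be used. Defaults to None.
do_sample (bool, optional) — Whether to use sampling instead of greedy decoding when generating new tokens.
early_stopping (Union[bool, "TextToSpeechEarlyStoppingEnum"], optional) — Controls the stopping condition for beam-based methods.
epsilon_cutoff (float, optional) — If set to a float strictly between 0 and 1, only tokens with a conditional probability greater than `epsilon_cutoff` will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
eta_cutoff (float, optional) — Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to a float strictly between 0 and 1, a token is only considered if its probability is greater than either `eta_cutoff` or `sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits)))`. The latter term is intuitively the expected next-token probability, scaled by `sqrt(eta_cutoff)`. In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See Truncation Sampling as Language Model Desmoothing for more details.
max_length (int, optional) — The maximum length (in tokens) of the generated text, including the input.
max_new_tokens (int, optional) — The maximum number of tokens to generate. Takes precedence over `max_length`.
min_length (int, optional) — The minimum length (in tokens) of the generated text, including the input.
min_new_tokens (int, optional) — The minimum number of tokens to generate. Takes precedence over `min_length`.
num_beam_groups (int, optional) — The number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams. See this paper for more details.
num_beams (int, optional) — The number of beams to use for beam search.
penalty_alpha (float, optional) — This value balances the model confidence and the degeneration penalty in contrastive search decoding.
temperature (float, optional) — The value used to modulate the next-token probabilities.
top_k (int, optional) — The number of highest-probability vocabulary tokens to keep for top-k filtering.
top_p (float, optional) — If set to a float < 1, only the smallest set of most probable tokens with probabilities that add up to `top_p` or higher is kept for generation.
typical_p (float, optional) — Local typicality measures how similar the conditional probability of predicting a target token next is to the expected conditional probability of predicting a random token next, given the partial text already generated. If set to a float < 1, the smallest set of the most locally typical tokens with probabilities that add up to `typical_p` or higher is kept for generation. See this paper for more details.
use_cache (bool, optional) — Whether the model should use the past key/value attentions to speed up decoding.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.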
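
The `eta_cutoff` rule can be read as an entropy-dependent probability floor: a token survives if its probability exceeds the smaller of `eta_cutoff` and `sqrt(eta_cutoff) * exp(-entropy)`. The sketch below works through that arithmetic in plain Python. It illustrates the rule as described in the parameter list, not the code any provider actually runs, and `eta_threshold` and `eta_filter` are hypothetical names.

```python
import math
from typing import List


def eta_threshold(probs: List[float], eta_cutoff: float) -> float:
    """Probability floor: min(eta_cutoff, sqrt(eta_cutoff) * exp(-entropy))."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return min(eta_cutoff, math.sqrt(eta_cutoff) * math.exp(-entropy))


def eta_filter(probs: List[float], eta_cutoff: float) -> List[float]:
    """Zero out tokens below the floor and renormalize the rest to sum to 1."""
    floor = eta_threshold(probs, eta_cutoff)
    kept = [p if p >= floor else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# A toy next-token distribution over four tokens.
probs = [0.5, 0.3, 0.1999, 0.0001]
floor = eta_threshold(probs, eta_cutoff=0.001)
filtered = eta_filter(probs, eta_cutoff=0.001)
```

For this fairly peaked distribution the entropy term is the larger of the two, so the floor equals `eta_cutoff` itself and only the last token is dropped. For a much flatter distribution the entropy term shrinks and the floor drops below `eta_cutoff`, keeping more of the tail.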

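bytes

The generated audio.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Synthesize audio of a voice pronouncing a given text.

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example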

# Must be run in an async context
>>> from pathlib import Path
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

>>> audio = await client.text_to_speech("Hello world")
>>> Path("hello_world.flac").write_bytes(audio)

Example using a third-party provider directly. Usage will be billed on your Replicate account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",
...     api_key="your-replicate-api-key",  # Pass your Replicate API key directly
... )
>>> audio = client.text_to_speech(
...     text="Hello world",
...     model="OuteAI/OuteTTS-0.3-500M",
... )
>>> Path("hello_world.flac").write_bytes(audio)

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",
...     api_key="hf_...",  # Pass your HF token
... )
>>> audio = client.text_to_speech(
...     text="Hello world",
...     model="OuteAI/OuteTTS-0.3-500M",
... )
>>> Path("hello_world.flac").write_bytes(audio)

Example using the Replicate provider with extra parameters

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Use replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> audio = client.text_to_speech(
...     "Hello, my name is Kokoro, an awesome text-to-speech model.",
...     model="hexgrad/Kokoro-82M",
...     extra_body={"voice": "af_nicole"},
... )
>>> Path("hello.flac").write_bytes(audio)

Example using "YuE-s1-7B-anneal-en-cot" on fal.ai to generate music

>>> from huggingface_hub import InferenceClient
>>> lyrics = '''
... [verse]
... In the town where I was born
... Lived a man who sailed to sea
... And he told us of his life
... In the land of submarines
... So we sailed on to the sun
... 'Til we found a sea of green
... And we lived beneath the waves
... In our yellow submarine

... [chorus]
... We all live in a yellow submarine
... Yellow submarine, yellow submarine
... We all live in a yellow submarine
... Yellow submarine, yellow submarine
... '''
>>> genres = "pavarotti-style tenor voice"
>>> client = InferenceClient(
...     provider="fal-ai",
...     model="m-a-p/YuE-s1-7B-anneal-en-cot",
...     api_key=...,
... )
>>> audio = client.text_to_speech(lyrics, extra_body={"genres": genres})
>>> with open("output.mp3", "wb") as f:
...     f.write(audio)

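text_to_video

Parameters

prompt (str) — The prompt to generate a video from.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended text-to-video model will be used. Defaults to None.
guidance_scale (float, optional) — A higher guidance scale value encourages the model to generate videos closely linked to the text prompt, but values that are too high may cause saturation and other artifacts.
negative_prompt (List[str], optional) — One or several prompts describing what should NOT be included in the generated video.
num_frames (float, optional) — The `num_frames` parameter determines how many video frames are generated.
num_inference_steps (int, optional) — The number of denoising steps. More denoising steps usually lead to a higher-quality video at the expense of slower inference.
seed (int, optional) — Seed for the random number generator.
extra_body (Dict[str, Any], optional) — Additional provider-specific parameters to pass to the model. Refer to the provider's documentation for supported parameters.

bytes

The generated video.

Generate a video based on a given text.

You can pass provider-specific parameters to the model by using the `extra_body` argument.

Example

Example using a third-party provider directly. Usage will be billed on your fal.ai account.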

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="fal-ai",  # Using fal.ai provider
...     api_key="fal-ai-api-key",  # Pass your fal.ai API key
... )
>>> video = client.text_to_video(
...     "A majestic lion running in a fantasy forest",
...     model="tencent/HunyuanVideo",
... )
>>> with open("lion.mp4", "wb") as file:
...     file.write(video)

Example using a third-party provider through Hugging Face Routing. Usage will be billed on your Hugging Face account.

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="replicate",  # Using replicate provider
...     api_key="hf_...",  # Pass your HF token
... )
>>> video = client.text_to_video(
...     "A cat running in a park",
...     model="genmo/mochi-1-preview",
... )
>>> with open("cat.mp4", "wb") as file:
...     file.write(video)

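token_classification

Parameters

text (str) — The string to classify.
model (str, optional) — The model to use for the token classification task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended token classification model will be used. Defaults to None.
aggregation_strategy ("TokenClassificationAggregationStrategy", optional) — The strategy used to fuse tokens based on model predictions.
ignore_labels (List[str], optional) — A list of labels to ignore.
stride (int, optional) — The number of overlapping tokens between chunks when splitting the input text.

List[TokenClassificationOutputElement]

A list of TokenClassificationOutputElement items containing the entity group, confidence score, word, and start and end index.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Perform token classification on the given text. Usually used for sentence parsing, either grammatical parsing or Named Entity Recognition (NER), to understand the keywords contained within a text.

Example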

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.token_classification("My name is Sarah Jessica Parker but you can call me Jessica")
[
    TokenClassificationOutputElement(
        entity_group='PER',
        score=0.9971321225166321,
        word='Sarah Jessica Parker',
        start=11,
        end=31,
    ),
    TokenClassificationOutputElement(
        entity_group='PER',
        score=0.9773476123809814,
        word='Jessica',
        start=52,
        end=59,
    )
]
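
The `start` and `end` offsets in each element index into the original text, which makes span-based post-processing straightforward. A minimal sketch of redacting detected entities follows. `redact` is a hypothetical helper that takes plain `(start, end, entity_group)` tuples copied from the example output above.

```python
from typing import List, Tuple


def redact(text: str, entities: List[Tuple[int, int, str]]) -> str:
    """Replace each (start, end, entity_group) span with a [GROUP] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, group in sorted(entities, reverse=True):
        text = text[:start] + f"[{group}]" + text[end:]
    return text


text = "My name is Sarah Jessica Parker but you can call me Jessica"
redacted = redact(text, [(11, 31, "PER"), (52, 59, "PER")])
```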

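translation

Parameters

text (str) — The string to translate.
model (str, optional) — The model to use for the translation task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended translation model will be used. Defaults to None.
src_lang (str, optional) — The source language of the text. Required for models that can translate from multiple languages.
tgt_lang (str, optional) — The target language to translate to. Required for models that can translate to multiple languages.
clean_up_tokenization_spaces (bool, optional) — Whether to clean up potential extra spaces in the text output.
truncation ("TranslationTruncationStrategy", optional) — The truncation strategy to use.
generate_parameters (Dict[str, Any], optional) — Additional parameters for the text generation algorithm.

TranslationOutput

The generated translated text.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError or ValueError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.
ValueError — If only one of the src_lang and tgt_lang arguments is provided.

Convert text from one language to another.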
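
The ValueError rule means `src_lang` and `tgt_lang` must be given together or not at all. The sketch below shows a check with that behavior which you could run before calling the client. `check_language_pair` is a hypothetical helper written for illustration and does not reproduce the library's internal code.

```python
from typing import Optional


def check_language_pair(src_lang: Optional[str], tgt_lang: Optional[str]) -> None:
    """Raise ValueError when exactly one of src_lang / tgt_lang is provided."""
    if (src_lang is None) != (tgt_lang is None):
        raise ValueError("Provide both src_lang and tgt_lang, or neither.")


def is_valid_pair(src_lang: Optional[str], tgt_lang: Optional[str]) -> bool:
    """Boolean wrapper around check_language_pair, convenient for form validation."""
    try:
        check_language_pair(src_lang, tgt_lang)
    except ValueError:
        return False
    return True


both = is_valid_pair("en_XX", "fr_XX")
neither = is_valid_pair(None, None)
only_source = is_valid_pair("en_XX", None)
```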

Example

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.translation("My name is Wolfgang and I live in Berlin")
TranslationOutput(translation_text='Mein Name ist Wolfgang und ich lebe in Berlin.')
>>> await client.translation("My name is Wolfgang and I live in Berlin", model="Helsinki-NLP/opus-mt-en-fr")
TranslationOutput(translation_text="Je m'appelle Wolfgang et je vis à Berlin.")

Specifying languages

>>> await client.translation("My name is Sarah Jessica Parker but you can call me Jessica", model="facebook/mbart-large-50-many-to-many-mmt", src_lang="en_XX", tgt_lang="fr_XX")
TranslationOutput(translation_text="Mon nom est Sarah Jessica Parker mais vous pouvez m'appeler Jessica")

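visual_question_answering

Parameters

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — The input image providing the context. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
question (str) — The question to be answered.
model (str, optional) — The model to use for the visual question answering task. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. If not provided, the default recommended visual question answering model will be used. Defaults to None.
top_k (int, optional) — The number of answers to return (chosen by order of likelihood). Note that fewer than top_k answers are returned if there are not enough options available within the context.

List[VisualQuestionAnsweringOutputElement]

A list of VisualQuestionAnsweringOutputElement items containing the predicted label and associated probability.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Answer open-ended questions based on an image.

Example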

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.visual_question_answering(
...     image="https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg",
...     question="What is the animal doing?"
... )
[
    VisualQuestionAnsweringOutputElement(score=0.778609573841095, answer='laying down'),
    VisualQuestionAnsweringOutputElement(score=0.6957435607910156, answer='sitting'),
]

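zero_shot_classification

Parameters

text (str) — The input text to classify.
candidate_labels (List[str]) — The set of possible class labels to classify the text into.
labels (List[str], optional) — (deprecated) A list of strings. Each string is the verbalization of a possible label for the input text.
multi_label (bool, optional) — Whether multiple candidate labels can be true. If false, the scores are normalized such that the sum of the label likelihoods for each sequence is 1. If true, the labels are considered independent and probabilities are normalized for each candidate.
hypothesis_template (str, optional) — The sentence used in conjunction with candidate_labels to attempt the text classification by replacing the placeholder with the candidate labels.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. If not provided, the default recommended zero-shot classification model will be used.

List[ZeroShotClassificationOutputElement]

A list of ZeroShotClassificationOutputElement items containing the predicted labels and their confidence.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Provide a text and a set of candidate labels as input to classify the input text.

Example with multi_label=False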

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> text = (
...     "A new model offers an explanation for how the Galilean satellites formed around the solar system's "
...     "largest world. Konstantin Batygin did not set out to solve one of the solar system's most puzzling"
...     " mysteries when he went for a run up a hill in Nice, France."
... )
>>> labels = ["space & cosmos", "scientific discovery", "microbiology", "robots", "archeology"]
>>> await client.zero_shot_classification(text, labels)
[
    ZeroShotClassificationOutputElement(label='scientific discovery', score=0.7961668968200684),
    ZeroShotClassificationOutputElement(label='space & cosmos', score=0.18570658564567566),
    ZeroShotClassificationOutputElement(label='microbiology', score=0.00730885099619627),
    ZeroShotClassificationOutputElement(label='archeology', score=0.006258360575884581),
    ZeroShotClassificationOutputElement(label='robots', score=0.004559356719255447),
]
>>> await client.zero_shot_classification(text, labels, multi_label=True)
[
    ZeroShotClassificationOutputElement(label='scientific discovery', score=0.9829297661781311),
    ZeroShotClassificationOutputElement(label='space & cosmos', score=0.755190908908844),
    ZeroShotClassificationOutputElement(label='microbiology', score=0.0005462635890580714),
    ZeroShotClassificationOutputElement(label='archeology', score=0.00047131875180639327),
    ZeroShotClassificationOutputElement(label='robots', score=0.00030448526376858354),
]

Example with multi_label=True and a custom hypothesis_template

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.zero_shot_classification(
...    text="I really like our dinner and I'm very happy. I don't like the weather though.",
...    candidate_labels=["positive", "negative", "pessimistic", "optimistic"],
...    multi_label=True,
...    hypothesis_template="This text is {} towards the weather"
... )
[
    ZeroShotClassificationOutputElement(label='negative', score=0.9231801629066467),
    ZeroShotClassificationOutputElement(label='pessimistic', score=0.8760990500450134),
    ZeroShotClassificationOutputElement(label='optimistic', score=0.0008674879791215062),
    ZeroShotClassificationOutputElement(label='positive', score=0.0005250611575320363)
]
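
The two normalization modes of `multi_label` explain why the scores sum to 1 in one example and not in the other. The sketch below reproduces both modes on made-up numbers. It assumes an NLI-style model that yields one entailment logit and one contradiction logit per candidate label, which is how zero-shot classifiers are commonly built but is not confirmed for any particular model here. All function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entailment_logits: List[float]) -> List[float]:
    """multi_label=False: labels compete, so scores across labels sum to 1."""
    return softmax(entailment_logits)


def multi_label_scores(entailment_logits: List[float], contradiction_logits: List[float]) -> List[float]:
    """multi_label=True: each label is scored on its own, entailment vs. contradiction."""
    return [softmax([c, e])[1] for e, c in zip(entailment_logits, contradiction_logits)]


# Made-up logits for three candidate labels.
entailment = [3.0, 1.0, -2.0]
contradiction = [-1.0, 0.5, 2.5]
single = single_label_scores(entailment)
multi = multi_label_scores(entailment, contradiction)
```

With independent scoring, several labels can receive a high score at once, which matches the multi_label=True outputs shown above where two labels score above 0.75.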

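zero_shot_image_classification

Parameters

image (Union[str, Path, bytes, BinaryIO, PIL.Image.Image]) — The input image to classify. It can be raw bytes, an image file, a URL to an online image, or a PIL Image.
candidate_labels (List[str]) — The candidate labels for this image.
labels (List[str], optional) — (deprecated) A list of strings representing the possible labels. There must be at least 2 labels.
model (str, optional) — The model to use for inference. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed Inference Endpoint. This parameter overrides the model defined at the instance level. If not provided, the default recommended zero-shot image classification model will be used.
hypothesis_template (str, optional) — The sentence used in conjunction with candidate_labels to attempt the image classification by replacing the placeholder with the candidate labels.

List[ZeroShotImageClassificationOutputElement]

A list of ZeroShotImageClassificationOutputElement items containing the predicted labels and their confidence.

Raises

InferenceTimeoutError or aiohttp.ClientResponseError

InferenceTimeoutError — If the model is unavailable or the request times out.
aiohttp.ClientResponseError — If the request fails with an HTTP error status code other than HTTP 503.

Provide an input image and text labels to predict text labels for the image.

Example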

# Must be run in an async context
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

>>> await client.zero_shot_image_classification(
...     "https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg",
...     candidate_labels=["dog", "cat", "horse"],
... )
[ZeroShotImageClassificationOutputElement(label='dog', score=0.956),...]

InferenceTimeoutError

class huggingface_hub.InferenceTimeoutError

( *args **kwargs )

Error raised when a model is unavailable or the request times out.

ModelStatus

class huggingface_hub.inference._common.ModelStatus

( loaded: bool state: str compute_type: typing.Dict framework: str )

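Parameters

loaded (bool) — Whether the model is currently loaded into HF's Inference API. Models are loaded on demand, so a user's first request takes longer. If a model is loaded, you can be assured that it is in a healthy state.
state (str) — The current state of the model. This can be "Loaded", "Loadable", or "TooBig". If a model's state is "Loadable", it is not too big and has a supported backend. Loadable models are loaded automatically when the user first requests inference on the endpoint. This means loading is transparent to the user, except that the first call takes longer to complete.
compute_type (Dict) — Information about the compute resource the model is using or will use, such as the "gpu" type and the number of replicas.
framework (str) — The name of the framework the model was built with, such as "transformers" or "text-generation-inference".

This data class represents the status of a model in the HF Inference API.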
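
The `loaded` and `state` fields together tell a caller what to expect from the first request. The sketch below turns them into a short label. `Status` is a hypothetical stand-in holding only the two ModelStatus fields it needs, and the returned labels are illustrative rather than values defined by the library.

```python
from dataclasses import dataclass


@dataclass
class Status:
    # Hypothetical stand-in with the two ModelStatus fields used below.
    loaded: bool
    state: str


def first_call_expectation(status: Status) -> str:
    """Summarize what a caller should expect from the first inference request."""
    if status.loaded or status.state == "Loaded":
        return "ready"        # already in memory, no warm-up needed
    if status.state == "Loadable":
        return "cold start"   # loaded on demand, so the first call is slower
    return "unavailable"      # e.g. "TooBig"


ready = first_call_expectation(Status(loaded=True, state="Loaded"))
cold = first_call_expectation(Status(loaded=False, state="Loadable"))
too_big = first_call_expectation(Status(loaded=False, state="TooBig"))
```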

InferenceAPI

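InferenceAPI is the legacy way to call the Inference API. Its interface is more simplistic and requires knowing the input parameters and output format of each task. It also lacks the ability to connect to other services such as Inference Endpoints or AWS SageMaker. InferenceAPI will soon be deprecated, so we recommend using InferenceClient whenever possible. Check out this guide to learn how to switch from InferenceAPI to InferenceClient in your scripts.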

class huggingface_hub.InferenceApi

( repo_id: str task: typing.Optional[str] = None token: typing.Optional[str] = None gpu: bool = False )

Client to configure requests and make calls to the HuggingFace Inference API.

Example

>>> from huggingface_hub.inference_api import InferenceApi

>>> # Mask-fill example
>>> inference = InferenceApi("bert-base-uncased")
>>> inference(inputs="The goal of life is [MASK].")
[{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}]

>>> # Question Answering example
>>> inference = InferenceApi("deepset/roberta-base-squad2")
>>> inputs = {
...     "question": "What's my name?",
...     "context": "My name is Clara and I live in Berkeley.",
... }
>>> inference(inputs)
{'score': 0.9326569437980652, 'start': 11, 'end': 16, 'answer': 'Clara'}

>>> # Zero-shot example
>>> inference = InferenceApi("typeform/distilbert-base-uncased-mnli")
>>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!"
>>> params = {"candidate_labels": ["refund", "legal", "faq"]}
>>> inference(inputs, params)
{'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]}

>>> # Overriding configured task
>>> inference = InferenceApi("bert-base-uncased", task="feature-extraction")

>>> # Text-to-image
>>> inference = InferenceApi("stabilityai/stable-diffusion-2-1")
>>> inference("cat")
<PIL.PngImagePlugin.PngImageFile image (...)>

>>> # Return as raw response to parse the output yourself
>>> inference = InferenceApi("mio/amadeus")
>>> response = inference("hello world", raw_response=True)
>>> response.headers
{"Content-Type": "audio/flac", ...}
>>> response.content # raw bytes from server
b'(...)'

__init__

( repo_id: str task: typing.Optional[str] = None token: typing.Optional[str] = None gpu: bool = False )

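Parameters

repo_id (str) — The ID of the repository (e.g. user/bert-base-uncased).
task (str, optional, defaults to None) — Whether to force a task instead of using the task specified in the repository.
token (str, optional) — The API token to use as HTTP bearer authorization. This is not the authentication token. You can find the token at https://huggingface.co/settings/token. Alternatively, you can find both your organization and personal API tokens using HfApi().whoami(token).
gpu (bool, optional, defaults to False) — Whether to use a GPU instead of a CPU for inference (requires at least the Startup plan).

Initialize the headers and API call information.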

__call__