Transformers 4.37 Chinese Documentation (17) (2) - Alibaba Cloud Developer Community

Previous part: Transformers 4.37 Chinese Documentation (17) (1): https://developer.aliyun.com/article/1564937

TextToAudioPipeline

`class transformers.TextToAudioPipeline`

( *args, vocoder = None, sampling_rate = None, **kwargs )

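Text-to-audio generation pipeline using any AutoModelForTextToWaveform or AutoModelForTextToSpectrogram. This pipeline generates an audio file from an input text and optional other conditional inputs.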

Example:

>>> from transformers import pipeline
>>> pipe = pipeline(model="suno/bark-small")
>>> output = pipe("Hey it's HuggingFace on the phone!")
>>> audio = output["audio"]
>>> sampling_rate = output["sampling_rate"]

Learn more about the basics of using a pipeline in the pipeline tutorial.

You can specify parameters passed to the model by using TextToAudioPipeline.__call__.forward_params or TextToAudioPipeline.__call__.generate_kwargs.

Example:

>>> from transformers import pipeline
>>> music_generator = pipeline(task="text-to-audio", model="facebook/musicgen-small", framework="pt")
>>> # diversify the music generation by adding randomness with a high temperature and set a maximum music length
>>> generate_kwargs = {
...     "do_sample": True,
...     "temperature": 0.7,
...     "max_new_tokens": 35,
... }
>>> outputs = music_generator("Techno music with high melodic riffs", generate_kwargs=generate_kwargs)
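The split between the two dictionaries can be pictured with a small plain-Python sketch: forward_params always reach the model, while generate_kwargs are only added when the model is generative. `route_kwargs`, its `can_generate` flag and the `hypothetical_arg` key are illustrative assumptions, not the pipeline's internals. This sketch lets generate_kwargs win on key clashes and silently drops them for non-generative models; the real pipeline may handle both cases differently (for example by raising an error).

```python
from typing import Any, Dict, Optional


def route_kwargs(
    can_generate: bool,
    forward_params: Optional[Dict[str, Any]] = None,
    generate_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch: collect the keyword arguments handed to the model.

    forward_params are always kept. generate_kwargs are merged in only when the
    model can generate; on a key clash they override forward_params here.
    """
    merged: Dict[str, Any] = dict(forward_params or {})
    if can_generate:
        merged.update(generate_kwargs or {})
    return merged


generate_kwargs = {"do_sample": True, "temperature": 0.7, "max_new_tokens": 35}
# "hypothetical_arg" stands in for any model-specific forward parameter.
print(route_kwargs(True, {"hypothetical_arg": 1}, generate_kwargs))
print(route_kwargs(False, {"hypothetical_arg": 1}, generate_kwargs))
```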

This pipeline can currently be loaded from pipeline() using the following task identifiers: "text-to-speech" or "text-to-audio".

See the list of available models on huggingface.co/models.

`__call__`

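( text_inputs: Union[str, List[str]], **forward_params ) → A dict or a list of dict

Parameters

text_inputs (str or List[str]) - The text(s) to generate.
forward_params (dict, optional) - Parameters passed to the model generation/forward method. forward_params are always passed to the underlying model.
generate_kwargs (dict, optional) - The dictionary of ad-hoc parametrization of generate_config to be used for the generation call. For a complete overview of generate, check the following guide. generate_kwargs are only passed to the underlying model if the latter is a generative model.

Returns: a dict or a list of dict

The dictionaries have two keys:

audio (np.ndarray of shape (nb_channels, audio_length)) - The generated audio waveform.
sampling_rate (int) - The sampling rate of the generated audio waveform.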
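To save the returned dictionary to disk, the `audio` array (shape `(nb_channels, audio_length)`) has to be interleaved into frames and quantized. The sketch below uses only the standard-library `wave` module and assumes float samples in [-1.0, 1.0] passed as nested Python lists (call `.tolist()` on the NumPy array first); `write_wav` is a hypothetical helper, not part of transformers.

```python
import struct
import wave
from typing import Sequence


def write_wav(audio: Sequence[Sequence[float]], sampling_rate: int, path: str) -> None:
    """Write float audio of shape (nb_channels, audio_length) as 16-bit PCM WAV.

    Assumes samples lie in [-1.0, 1.0]; values outside that range are clipped.
    """
    nb_channels = len(audio)
    audio_length = len(audio[0])
    frames = bytearray()
    # WAV stores samples frame by frame, interleaving the channels.
    for t in range(audio_length):
        for ch in range(nb_channels):
            sample = max(-1.0, min(1.0, float(audio[ch][t])))
            frames += struct.pack("<h", int(round(sample * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(nb_channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))
```

Assuming `output` is the dictionary from the first example and its `audio` array is 2-D, a call could look like `write_wav(output["audio"].tolist(), output["sampling_rate"], "speech.wav")`; if the array turns out to be 1-D, wrap it as `[samples]` first.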

Generates speech/audio from the inputs. See the TextToAudioPipeline documentation for more information.

ZeroShotAudioClassificationPipeline

`class transformers.ZeroShotAudioClassificationPipeline`

( **kwargs )

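Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch or from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to be used.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the size of the batch to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU; a non-negative number runs the model on the CUDA device with that id. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or as raw text.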
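The integer device convention above (-1 for CPU, otherwise a CUDA device index) can be illustrated with a small helper that turns the ordinal into the equivalent string form. `device_to_str` is hypothetical, not a transformers function, and this sketch does not check whether CUDA is actually available.

```python
from typing import Union


def device_to_str(device: Union[int, str]) -> str:
    """Hypothetical helper: map the pipeline's integer device ordinal to a string.

    -1 means CPU; 0, 1, ... mean the CUDA device with that index.
    Strings such as "cpu" or "cuda:1" are returned unchanged.
    """
    if isinstance(device, str):
        return device
    if device == -1:
        return "cpu"
    if device >= 0:
        return f"cuda:{device}"
    raise ValueError(f"invalid device ordinal: {device}")


print(device_to_str(-1), device_to_str(0), device_to_str("cuda:1"))
```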

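Zero-shot audio classification pipeline using ClapModel. This pipeline predicts the class of an audio clip when you provide the audio and a set of candidate_labels.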

Example:

>>> from transformers import pipeline
>>> from datasets import load_dataset
>>> dataset = load_dataset("ashraq/esc50")
>>> audio = next(iter(dataset["train"]["audio"]))["array"]
>>> classifier = pipeline(task="zero-shot-audio-classification", model="laion/clap-htsat-unfused")
>>> classifier(audio, candidate_labels=["Sound of a dog", "Sound of vaccum cleaner"])
[{'score': 0.9996, 'label': 'Sound of a dog'}, {'score': 0.0004, 'label': 'Sound of vaccum cleaner'}]

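Learn more about the basics of using a pipeline in the pipeline tutorial. This audio classification pipeline can currently be loaded from pipeline() using the following task identifier: "zero-shot-audio-classification". See the list of available models on huggingface.co/models.

`__call__`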

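( audios: Union[np.ndarray, bytes, str], **kwargs )

Parameters

audios (str, List[str], np.array or List[np.array]) - The pipeline handles three types of inputs:

A string containing an http link pointing to an audio file
A string containing a local path to an audio file
Audio loaded into a numpy array

candidate_labels (List[str]) - The candidate labels for this audio.
hypothesis_template (str, optional, defaults to "This is a sound of {}") - The sentence used in conjunction with candidate_labels to attempt the audio classification by replacing the placeholder with each candidate label. The likelihood is then estimated using logits_per_audio.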
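The role of hypothesis_template and candidate_labels can be sketched without a model: each label is substituted into the template, the model (ClapModel in this pipeline) scores every resulting sentence against the audio, and a softmax over those logits_per_audio values yields the scores. The logits in the usage lines are made-up numbers, and `build_hypotheses` and `rank_candidates` are hypothetical helpers for illustration only.

```python
import math
from typing import Dict, List


def build_hypotheses(candidate_labels: List[str], template: str = "This is a sound of {}") -> List[str]:
    """Fill the template's placeholder with each candidate label."""
    return [template.format(label) for label in candidate_labels]


def rank_candidates(candidate_labels: List[str], logits: List[float]) -> List[Dict[str, object]]:
    """Softmax the per-label logits and return labels sorted by descending score."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    results = [{"score": e / total, "label": label} for label, e in zip(candidate_labels, exps)]
    return sorted(results, key=lambda r: r["score"], reverse=True)


labels = ["dog barking", "vacuum cleaner"]
print(build_hypotheses(labels))
print(rank_candidates(labels, [12.0, 4.0]))  # made-up logits
```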

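Assigns labels to the audio(s) passed as inputs.

Computer Vision

The pipelines available for computer vision tasks include the following.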

DepthEstimationPipeline

`class transformers.DepthEstimationPipeline`

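( *args, **kwargs )

Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch or from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to be used.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the size of the batch to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU; a non-negative number runs the model on the CUDA device with that id. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or as raw text.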

Depth estimation pipeline using any AutoModelForDepthEstimation. This pipeline predicts the depth of an image.

Example:

>>> from transformers import pipeline
>>> depth_estimator = pipeline(task="depth-estimation", model="Intel/dpt-large")
>>> output = depth_estimator("http://images.cocodataset.org/val2017/000000039769.jpg")
>>> # This is a tensor with the values being the depth expressed in meters for each pixel
>>> output["predicted_depth"].shape
torch.Size([1, 384, 384])
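Besides predicted_depth, the pipeline output also contains a `depth` entry with the prediction rendered as a PIL image. One way to turn raw depth values into 8-bit grayscale is to scale them by their maximum, sketched below on nested lists. `depth_to_grayscale` is a hypothetical helper, and the library's exact resizing and normalization steps may differ.

```python
from typing import List


def depth_to_grayscale(depth: List[List[float]]) -> List[List[int]]:
    """Scale a 2-D depth map to integers in [0, 255] by dividing by its maximum.

    Assumes non-negative depth values; an all-zero map stays all zero.
    """
    peak = max(max(row) for row in depth)
    if peak <= 0:
        return [[0 for _ in row] for row in depth]
    # int() truncates, so only the maximum value maps exactly to 255.
    return [[int(value * 255 / peak) for value in row] for row in depth]


print(depth_to_grayscale([[0.0, 1.0], [2.0, 4.0]]))
```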

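Learn more about the basics of using a pipeline in the pipeline tutorial.

This depth estimation pipeline can currently be loaded from pipeline() using the following task identifier: "depth-estimation".

See the list of available models on huggingface.co/models.

`__call__`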

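( images: Union[str, List[str], "Image.Image", List["Image.Image"]], **kwargs )

Parameters

images (str, List[str], PIL.Image or List[PIL.Image]) - The pipeline handles three types of images:

A string containing an http link pointing to an image
A string containing a local path to an image
An image loaded directly in PIL

The pipeline accepts either a single image or a batch of images; a batch is passed as a list. Images in a batch must all be in the same format: all as http links, all as local paths, or all as PIL images.
top_k (int, optional, defaults to 5) - The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it defaults to the number of labels.
timeout (float, optional, defaults to None) - The maximum time in seconds to wait for fetching images from the web. If None, no timeout is set and the call may block forever.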
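The requirement that all images in a batch share one format can be checked before calling the pipeline. The helpers below are a hypothetical sketch, not part of transformers: they classify each input as an http(s) link, a local path, or an already-loaded object (such as a PIL image) and reject empty or mixed batches.

```python
from typing import Any, List, Union


def input_kind(item: Any) -> str:
    """Classify one input as 'url', 'path' or 'object' (e.g. an already-loaded PIL image)."""
    if isinstance(item, str):
        return "url" if item.startswith(("http://", "https://")) else "path"
    return "object"


def check_batch(images: Union[Any, List[Any]]) -> str:
    """Return the input kind shared by a single image or a batch given as a list.

    Raises ValueError for an empty batch or a batch that mixes formats.
    """
    batch = images if isinstance(images, list) else [images]
    if not batch:
        raise ValueError("empty batch")
    kinds = {input_kind(item) for item in batch}
    if len(kinds) != 1:
        raise ValueError(f"mixed input formats in batch: {sorted(kinds)}")
    return kinds.pop()


print(check_batch("http://images.cocodataset.org/val2017/000000039769.jpg"))
```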

Predicts the depth(s) of the image(s) passed as inputs.

ImageClassificationPipeline

`class transformers.ImageClassificationPipeline`

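( *args, **kwargs )

Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch or from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to be used.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the size of the batch to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU; a non-negative number runs the model on the CUDA device with that id. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or as raw text.
function_to_apply (str, optional, defaults to "default") - The function to apply to the model outputs in order to retrieve the scores. Accepts four different values:

"default": if the model has a single label, applies the sigmoid function to the output; if the model has several labels, applies the softmax function to the output.
"sigmoid": applies the sigmoid function to the output.
"softmax": applies the softmax function to the output.
"none": does not apply any function to the output.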
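The four function_to_apply options, together with the top_k clamping described for __call__, can be sketched in plain Python on a list of logits. `apply_function` and `top_k_labels` are hypothetical names; this sketch resolves "default" purely from the number of labels, as documented above, while the real pipeline may also consult the model configuration.

```python
import math
from typing import Dict, List


def apply_function(logits: List[float], function_to_apply: str = "default") -> List[float]:
    """Turn raw logits into scores following the documented function_to_apply values."""
    if function_to_apply == "default":
        # One label -> sigmoid, several labels -> softmax.
        function_to_apply = "sigmoid" if len(logits) == 1 else "softmax"
    if function_to_apply == "sigmoid":
        return [1.0 / (1.0 + math.exp(-x)) for x in logits]
    if function_to_apply == "softmax":
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
    if function_to_apply == "none":
        return list(logits)
    raise ValueError(f"unknown function_to_apply: {function_to_apply!r}")


def top_k_labels(scores: List[float], labels: List[str], top_k: int = 5) -> List[Dict[str, object]]:
    """Return at most top_k labels by descending score; top_k is clamped to the label count."""
    top_k = min(top_k, len(labels))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return [{"score": score, "label": label} for score, label in ranked[:top_k]]


scores = apply_function([2.0, 0.5, -1.0])  # three labels -> softmax
print(top_k_labels(scores, ["macaw", "parrot", "pigeon"], top_k=10))
```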

Image classification pipeline using any AutoModelForImageClassification. This pipeline predicts the class of an image.

Example:

>>> from transformers import pipeline
>>> classifier = pipeline(model="microsoft/beit-base-patch16-224-pt22k-ft22k")
>>> classifier("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
[{'score': 0.442, 'label': 'macaw'}, {'score': 0.088, 'label': 'popinjay'}, {'score': 0.075, 'label': 'parrot'}, {'score': 0.073, 'label': 'parodist, lampooner'}, {'score': 0.046, 'label': 'poll, poll_parrot'}]

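Learn more about the basics of using a pipeline in the pipeline tutorial.

This image classification pipeline can currently be loaded from pipeline() using the following task identifier: "image-classification".

See the list of available models on huggingface.co/models.

`__call__`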

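( images: Union[str, List[str], "Image.Image", List["Image.Image"]], **kwargs )

Parameters

images (str, List[str], PIL.Image or List[PIL.Image]) - The pipeline handles three types of images:

A string containing an http link pointing to an image
A string containing a local path to an image
An image loaded directly in PIL

The pipeline accepts either a single image or a batch of images; a batch is passed as a list. Images in a batch must all be in the same format: all as http links, all as local paths, or all as PIL images.
function_to_apply (str, optional, defaults to "default") - The function to apply to the model outputs in order to retrieve the scores. Accepts four different values: "default", "sigmoid", "softmax" and "none". If this argument is not specified ("default"), the function is chosen according to the number of labels:

If the model has a single label, the sigmoid function is applied to the output.
If the model has several labels, the softmax function is applied to the output.

The other possible values are:

"sigmoid": applies the sigmoid function to the output.
"softmax": applies the softmax function to the output.
"none": does not apply any function to the output.

top_k (int, optional, defaults to 5) - The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it defaults to the number of labels.
timeout (float, optional, defaults to None) - The maximum time in seconds to wait for fetching images from the web. If None, no timeout is set and the call may block forever.

Assigns labels to the image(s) passed as inputs.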

ImageSegmentationPipeline

`class transformers.ImageSegmentationPipeline`

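( *args, **kwargs )

Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch or from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to be used.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the size of the batch to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU; a non-negative number runs the model on the CUDA device with that id. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or as raw text.

Image segmentation pipeline using any AutoModelForXXXSegmentation. This pipeline predicts masks of objects and their classes.

Example: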

>>> from transformers import pipeline
>>> segmenter = pipeline(model="facebook/detr-resnet-50-panoptic")
>>> segments = segmenter("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
>>> len(segments)
2
>>> segments[0]["label"]
'bird'
>>> segments[1]["label"]
'bird'
>>> type(segments[0]["mask"])  # This is a black and white mask showing where the bird is in the original image.
<class 'PIL.Image.Image'>
>>> segments[0]["mask"].size
(768, 512)

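This image segmentation pipeline can currently be loaded from pipeline() using the following task identifier: "image-segmentation".

See the list of available models on huggingface.co/models.

`__call__`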

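( images, **kwargs )

Parameters

images (str, List[str], PIL.Image or List[PIL.Image]) - The pipeline handles three types of images:

A string containing an HTTP(S) link pointing to an image
A string containing a local path to an image
An image loaded directly in PIL

The pipeline accepts either a single image or a batch of images. Images in a batch must all be in the same format: all as HTTP(S) links, all as local paths, or all as PIL images.
subtask (str, optional) - The segmentation task to perform; choose [semantic, instance, panoptic] depending on the model's capabilities. If not set, the pipeline will attempt to resolve it in the following order: panoptic, instance, semantic.
threshold (float, optional, defaults to 0.9) - Probability threshold used to filter out predicted masks.
mask_threshold (float, optional, defaults to 0.5) - Threshold used when turning the predicted masks into binary values.
overlap_mask_area_threshold (float, optional, defaults to 0.5) - Mask overlap threshold used to eliminate small, disconnected segments.
timeout (float, optional, defaults to None) - The maximum time in seconds to wait for fetching images from the web. If None, no timeout is set and the call may block forever.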
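The two main thresholds act at different levels: threshold discards whole predicted segments whose score is too low, while mask_threshold turns each kept mask's per-pixel probabilities into a binary mask. A plain-Python sketch on made-up data follows; the segment dictionaries (including the `mask_probs` key) and the helper names are hypothetical, the strict ">" comparison is an assumption, and overlap_mask_area_threshold is not modeled because the real post-processing is model-specific.

```python
from typing import Dict, List


def binarize_mask(probs: List[List[float]], mask_threshold: float = 0.5) -> List[List[int]]:
    """Per pixel: 1 if the probability exceeds mask_threshold, else 0."""
    return [[1 if p > mask_threshold else 0 for p in row] for row in probs]


def filter_segments(segments: List[Dict], threshold: float = 0.9, mask_threshold: float = 0.5) -> List[Dict]:
    """Keep segments whose score exceeds threshold and binarize their masks."""
    kept = []
    for seg in segments:
        if seg["score"] > threshold:
            kept.append({
                "label": seg["label"],
                "score": seg["score"],
                "mask": binarize_mask(seg["mask_probs"], mask_threshold),
            })
    return kept


made_up = [
    {"label": "bird", "score": 0.98, "mask_probs": [[0.2, 0.7], [0.9, 0.4]]},
    {"label": "branch", "score": 0.5, "mask_probs": [[0.6, 0.6], [0.6, 0.6]]},
]
print(filter_segments(made_up))
```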

Performs segmentation (detects masks and classes) in the image(s) passed as inputs.

ImageToImagePipeline

`class transformers.ImageToImagePipeline`

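( *args, **kwargs )

Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch or from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to be used.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the size of the batch to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU; a non-negative number runs the model on the CUDA device with that id. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or as raw text.

Image-to-image pipeline using any AutoModelForImageToImage. This pipeline generates an image based on a previous image input.

Example: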

>>> from PIL import Image
>>> import requests
>>> from transformers import pipeline
>>> upscaler = pipeline("image-to-image", model="caidas/swin2SR-classical-sr-x2-64")
>>> img = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
>>> img = img.resize((64, 64))
>>> upscaled_img = upscaler(img)
>>> img.size
(64, 64)
>>> upscaled_img.size
(144, 144)
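Image-to-image models typically produce a float tensor laid out as channels × height × width, so turning it into an 8-bit image means clamping to [0, 1], scaling to [0, 255], and moving the channel axis last. A dependency-free sketch of those steps follows; the helper name and the CHW layout are assumptions for illustration, not the pipeline's documented internals.

```python
from typing import List


def chw_to_hwc_uint8(tensor: List[List[List[float]]]) -> List[List[List[int]]]:
    """Convert a CHW float image (values roughly in [0, 1]) to HWC ints in [0, 255]."""
    channels = len(tensor)
    height = len(tensor[0])
    width = len(tensor[0][0])
    return [
        [
            # Clamp to [0, 1], scale to [0, 255], then round to the nearest integer.
            [int(round(min(1.0, max(0.0, tensor[c][y][x])) * 255.0)) for c in range(channels)]
            for x in range(width)
        ]
        for y in range(height)
    ]


# A made-up 3-channel, 1x2 output (values outside [0, 1] get clamped).
print(chw_to_hwc_uint8([[[0.0, 1.2]], [[0.5, -0.1]], [[1.0, 0.25]]]))
```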

This image-to-image pipeline can currently be loaded from pipeline() using the following task identifier: "image-to-image".

See the list of available models on huggingface.co/models.

`__call__`

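( images: Union[str, List[str], "Image.Image", List["Image.Image"]], **kwargs )

Parameters

images (str, List[str], PIL.Image or List[PIL.Image]) - The pipeline handles three types of images:

A string containing an http link pointing to an image
A string containing a local path to an image
An image loaded directly in PIL

The pipeline accepts either a single image or a batch of images; a batch is passed as a list. Images in a batch must all be in the same format: all as http links, all as local paths, or all as PIL images.
timeout (float, optional, defaults to None) - The maximum time in seconds to wait for fetching images from the web. If None, no timeout is used and the call may block forever.

Transforms the image(s) passed as inputs.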

ObjectDetectionPipeline

`class transformers.ObjectDetectionPipeline`

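( *args, **kwargs )

Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch or from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to be used.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the size of the batch to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU; a non-negative number runs the model on the CUDA device with that id. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or as raw text.

Object detection pipeline using any AutoModelForObjectDetection. This pipeline predicts bounding boxes of objects and their classes.

Example: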

>>> from transformers import pipeline
>>> detector = pipeline(model="facebook/detr-resnet-50")
>>> detector("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
[{'score': 0.997, 'label': 'bird', 'box': {'xmin': 69, 'ymin': 171, 'xmax': 396, 'ymax': 507}}, {'score': 0.999, 'label': 'bird', 'box': {'xmin': 398, 'ymin': 105, 'xmax': 767, 'ymax': 507}}]
>>> # x, y  are expressed relative to the top left hand corner.
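Because each box is given as pixel coordinates measured from the top-left corner, simple geometry such as area or the overlap between two detections can be computed directly from the returned dictionaries. The sketch reuses the two detections from the example output above; `box_area` and `iou` are hypothetical helpers, and intersection-over-union is a standard measure computed here by hand, not something the pipeline returns.

```python
from typing import Dict, List

Box = Dict[str, int]


def box_area(box: Box) -> int:
    """Area of an axis-aligned box given by xmin/ymin/xmax/ymax (0 if degenerate)."""
    return max(0, box["xmax"] - box["xmin"]) * max(0, box["ymax"] - box["ymin"])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 when they do not overlap)."""
    inter = {
        "xmin": max(a["xmin"], b["xmin"]),
        "ymin": max(a["ymin"], b["ymin"]),
        "xmax": min(a["xmax"], b["xmax"]),
        "ymax": min(a["ymax"], b["ymax"]),
    }
    inter_area = box_area(inter)
    union = box_area(a) + box_area(b) - inter_area
    return inter_area / union if union else 0.0


# The two detections from the example output above.
detections: List[Dict] = [
    {"score": 0.997, "label": "bird", "box": {"xmin": 69, "ymin": 171, "xmax": 396, "ymax": 507}},
    {"score": 0.999, "label": "bird", "box": {"xmin": 398, "ymin": 105, "xmax": 767, "ymax": 507}},
]
print([box_area(d["box"]) for d in detections])
print(iou(detections[0]["box"], detections[1]["box"]))  # the two birds do not overlap
```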

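Learn more about the basics of using a pipeline in the pipeline tutorial.

This object detection pipeline can currently be loaded from pipeline() using the following task identifier: "object-detection".

See the list of available models on huggingface.co/models.

`__call__`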

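( *args, **kwargs )

Parameters

images (str, List[str], PIL.Image or List[PIL.Image]) - The pipeline handles three types of images:

A string containing an HTTP(S) link pointing to an image
A string containing a local path to an image
An image loaded directly in PIL

The pipeline accepts either a single image or a batch of images. Images in a batch must all be in the same format: all as HTTP(S) links, all as local paths, or all as PIL images.
threshold (float, optional, defaults to 0.9) - The probability necessary to make a prediction.
timeout (float, optional, defaults to None) - The maximum time in seconds to wait for fetching images from the web. If None, no timeout is set and the call may block forever.

Detects objects (bounding boxes and classes) in the image(s) passed as inputs.

VideoClassificationPipeline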

`class transformers.VideoClassificationPipeline`

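( *args, **kwargs )

Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch or from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to be used.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the size of the batch to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU; a non-negative number runs the model on the CUDA device with that id. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or as raw text.

Video classification pipeline using any AutoModelForVideoClassification. This pipeline predicts the class of a video.

This video classification pipeline can currently be loaded from pipeline() using the following task identifier: "video-classification".

See the list of available models on huggingface.co/models.

`__call__`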

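( videos: Union[str, List[str]], **kwargs )

Parameters

videos (str, List[str]) - The pipeline handles two types of videos:

A string containing an http link pointing to a video
A string containing a local path to a video

The pipeline accepts either a single video or a batch of videos; a batch is passed as a list of strings. Videos in a batch must all be in the same format: all as http links or all as local paths.
top_k (int, optional, defaults to 5) - The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it defaults to the number of labels.
num_frames (int, optional, defaults to self.model.config.num_frames) - The number of frames sampled from the video to run the classification on. If not provided, it defaults to the number of frames specified in the model configuration.
frame_sampling_rate (int, optional, defaults to 1) - The sampling rate used to select frames from the video. If not provided, it defaults to 1, i.e. every frame is used.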
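One simple reading of how num_frames and frame_sampling_rate combine is a stride: starting at frame 0, take num_frames frames, skipping frame_sampling_rate - 1 frames between picks. The sketch below implements that stride-based interpretation; `sample_frame_indices` is a hypothetical helper, and the pipeline's own index computation may space the frames slightly differently.

```python
from typing import List, Optional


def sample_frame_indices(
    num_frames: int, frame_sampling_rate: int = 1, total_frames: Optional[int] = None
) -> List[int]:
    """Pick num_frames indices from the start of a video, one every frame_sampling_rate frames.

    With frame_sampling_rate=1 this is simply the first num_frames frames.
    """
    if num_frames < 1 or frame_sampling_rate < 1:
        raise ValueError("num_frames and frame_sampling_rate must be positive")
    indices = [i * frame_sampling_rate for i in range(num_frames)]
    if total_frames is not None and indices[-1] >= total_frames:
        raise ValueError(f"video has {total_frames} frames, but index {indices[-1]} is needed")
    return indices


print(sample_frame_indices(8))     # every frame
print(sample_frame_indices(8, 4))  # every 4th frame
```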

Assigns labels to the video(s) passed as inputs.

ZeroShotImageClassificationPipeline

`class transformers.ZeroShotImageClassificationPipeline`

( **kwargs )

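Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch or from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to be used.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the size of the batch to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU; a non-negative number runs the model on the CUDA device with that id. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the pipeline output should be in a binary format (i.e., pickle) or as raw text.

Zero-shot image classification pipeline using CLIPModel. This pipeline predicts the class of an image when you provide an image and a set of candidate_labels.

Example: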

>>> from transformers import pipeline
>>> classifier = pipeline(model="openai/clip-vit-large-patch14")
>>> classifier(
...     "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
...     candidate_labels=["animals", "humans", "landscape"],
... )
[{'score': 0.965, 'label': 'animals'}, {'score': 0.03, 'label': 'humans'}, {'score': 0.005, 'label': 'landscape'}]
>>> classifier(
...     "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
...     candidate_labels=["black and white", "photorealist", "painting"],
... )
[{'score': 0.996, 'label': 'black and white'}, {'score': 0.003, 'label': 'photorealist'}, {'score': 0.0, 'label': 'painting'}]
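As the two calls above suggest, the scores come from a softmax over the candidate labels only, so they are relative to the label set you pass: the same image can get a different score for a label when other candidates change, and the scores always sum to 1. The sketch below shows this with made-up logits (not real CLIP outputs); `zero_shot_scores` is a hypothetical helper.

```python
import math
from typing import Dict, List


def zero_shot_scores(logits_per_label: Dict[str, float], candidate_labels: List[str]) -> Dict[str, float]:
    """Softmax over the logits of the chosen candidate labels only."""
    peak = max(logits_per_label[label] for label in candidate_labels)
    exps = {label: math.exp(logits_per_label[label] - peak) for label in candidate_labels}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Made-up image-text similarity logits for one image (illustration only).
fake_logits = {"animals": 25.0, "humans": 21.5, "landscape": 19.8, "birds": 26.0}
three = zero_shot_scores(fake_logits, ["animals", "humans", "landscape"])
four = zero_shot_scores(fake_logits, ["animals", "humans", "landscape", "birds"])
print(three["animals"], four["animals"])  # same logit, lower score once "birds" competes
```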

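Learn more about the basics of using a pipeline in the pipeline tutorial.

This image classification pipeline can currently be loaded from pipeline() using the following task identifier: "zero-shot-image-classification".

See the list of available models on huggingface.co/models.

`__call__`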

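( images: Union[str, List[str], "Image.Image", List["Image.Image"]], **kwargs )

Parameters

images (str, List[str], PIL.Image or List[PIL.Image]) - The pipeline handles three types of images:

A string containing an http link pointing to an image
A string containing a local path to an image
An image loaded directly in PIL

candidate_labels (List[str]) - The candidate labels for this image.
hypothesis_template (str, optional, defaults to "This is a photo of {}") - The sentence used in conjunction with candidate_labels to attempt the image classification by replacing the placeholder with each candidate label. The likelihood is then estimated using logits_per_image.
timeout (float, optional, defaults to None) - The maximum time in seconds to wait for fetching images from the web. If None, no timeout is set and the call may block forever.

Assigns labels to the image(s) passed as inputs.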

Next part: Transformers 4.37 Chinese Documentation (17) (3): https://developer.aliyun.com/article/1564940
