Transformers 4.37 Chinese Documentation (17) (5) - Alibaba Cloud Developer Community

Previous part: Transformers 4.37 Chinese Documentation (17) (4) https://developer.aliyun.com/article/1564941

Multimodal

The pipelines available for multimodal tasks include the following.

DocumentQuestionAnsweringPipeline

`class transformers.DocumentQuestionAnsweringPipeline`

( *args, **kwargs )

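Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to use.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the batch size to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing the supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU, while a non-negative value runs the model on the CUDA device with that ID. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the output of the pipeline should be produced in a binary format (i.e., pickle) or as raw text.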

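Document question answering pipeline using any AutoModelForDocumentQuestionAnswering. The inputs and outputs are similar to those of the (extractive) question answering pipeline. However, this pipeline takes an image (and optional OCR'd words/boxes) as input instead of a text context.

Example: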

>>> from transformers import pipeline
>>> document_qa = pipeline(model="impira/layoutlm-document-qa")
>>> document_qa(
...     image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png",
...     question="What is the invoice number?",
... )
[{'score': 0.425, 'answer': 'us-001', 'start': 16, 'end': 16}]

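Learn more about the basics of using a pipeline in the pipeline tutorial.

This document question answering pipeline can currently be loaded from pipeline() using the following task identifier: "document-question-answering".

This pipeline can use models that have been fine-tuned on a document question answering task. See the up-to-date list of available models on huggingface.co/models.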

`__call__`

( image: Union, question: Optional = None, word_boxes: Tuple = None, **kwargs ) → A dict or a list of dict

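Parameters

image (str or PIL.Image) - The pipeline handles three types of images:

A string containing an http link pointing to an image
A string containing a local path to an image
An image loaded in PIL directly

The pipeline accepts either a single image or a batch of images. If given a single image, it can be broadcast to multiple questions.
question (str) - The question to ask.
word_boxes (List[str, Tuple[float, float, float, float]], optional) - A list of words and bounding boxes (normalized to 0->1000). If you provide this optional input, the pipeline uses these words and boxes instead of running OCR on the image to derive them for models that need them (e.g. LayoutLM). This lets you reuse OCR results across many invocations of the pipeline without re-running OCR each time.
top_k (int, optional, defaults to 1) - The number of answers to return, chosen in order of likelihood. Note that fewer than top_k answers are returned if there are not enough options available within the context.
doc_stride (int, optional, defaults to 128) - If the words in the document are too long to fit with the question for the model, the document is split into several chunks with some overlap. This argument controls the size of that overlap.
max_answer_len (int, optional, defaults to 15) - The maximum length of predicted answers. Only answers no longer than this are considered.
max_seq_len (int, optional, defaults to 384) - The maximum total length (context + question), in tokens, of each chunk passed to the model. The context is split into several chunks, using doc_stride as the overlap, if needed.
max_question_len (int, optional, defaults to 64) - The maximum length of the question after tokenization. It is truncated if needed.
handle_impossible_answer (bool, optional, defaults to False) - Whether or not to accept "impossible" as an answer.
lang (str, optional) - Language to use while running OCR. Defaults to English.
tesseract_config (str, optional) - Additional flags to pass to Tesseract while running OCR.
timeout (float, optional, defaults to None) - The maximum time in seconds to wait for fetching images from the web. If None, no timeout is set and the call may block forever.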
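The overlapping chunking controlled by doc_stride and max_seq_len can be pictured with the following simplified sketch. It works on a plain list of already-tokenized items and treats `overlap` as the number of items shared by consecutive chunks. In the actual pipeline, this splitting happens during tokenization, and the question's tokens also count toward max_seq_len. `sliding_windows` is therefore a hypothetical helper for illustration only.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(tokens: Sequence[T], window: int, overlap: int) -> List[List[T]]:
    """Split `tokens` into windows of at most `window` items.

    Consecutive windows share `overlap` items, so the step between
    window starts is `window - overlap`.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    chunks: List[List[T]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + window]))
        # Stop once this window reaches the end of the sequence.
        if start + window >= len(tokens):
            break
        start += step
    return chunks


# Ten "tokens", windows of 4 with an overlap of 2.
print(sliding_windows(list(range(10)), window=4, overlap=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

A larger overlap means that an answer straddling a chunk boundary is more likely to appear whole in at least one chunk. The trade-off is that more chunks must go through the model.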

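Returns: a dict or a list of dict

Each result comes as a dictionary with the following keys:

score (float) - The probability associated with the answer.
start (int) - The start word index of the answer (in the OCR'd version of the input or in the provided word_boxes).
end (int) - The end word index of the answer (in the OCR'd version of the input or in the provided word_boxes).
answer (str) - The answer to the question.
words (list[int]) - The index of each word/box pair that is in the answer.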
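start and end are word indices, not character offsets. The example's one-word answer 'us-001' has start == end == 16, which suggests both ends are inclusive. Given that, you can recover the answer text from the same word list with a slice. The following is a minimal sketch. It assumes `words` is the ordered OCR word list (or the words from word_boxes). `answer_from_indices` and the sample words are hypothetical and not part of the pipeline API.

```python
from typing import List


def answer_from_indices(words: List[str], start: int, end: int) -> str:
    """Join the words covered by an inclusive [start, end] word range."""
    if not 0 <= start <= end < len(words):
        raise IndexError("start/end out of range for the given words")
    return " ".join(words[start:end + 1])


# Hypothetical OCR word list for a small invoice.
ocr_words = ["Invoice", "number", "us-001", "Total", "100"]
print(answer_from_indices(ocr_words, 2, 2))  # us-001
print(answer_from_indices(ocr_words, 0, 1))  # Invoice number
```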

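Answer the question(s) given as inputs by using the document(s). A document is defined as an image and an optional list of (word, box) tuples representing the text in the document. If word_boxes is not provided, the pipeline uses the Tesseract OCR engine (if available) to extract the words and boxes automatically for LayoutLM-like models that require them as input. For Donut, no OCR is run.

You can invoke the pipeline in several ways: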

pipeline(image=image, question=question)
pipeline(image=image, question=question, word_boxes=word_boxes)
pipeline([{"image": image, "question": question}])
pipeline([{"image": image, "question": question, "word_boxes": word_boxes}])
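If you run OCR yourself, you can convert the pixel boxes once and reuse the result across calls, because word_boxes expects boxes normalized to the 0-1000 range. The following is a minimal sketch. It assumes each OCR box is given in pixels as (x0, y0, x1, y1) and that the page size is known. The helper names and the OCR data are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]


def normalize_box(box: Sequence[float], width: int, height: int) -> Box:
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return (
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    )


def build_word_boxes(ocr: List[Tuple[str, Sequence[float]]], width: int, height: int):
    """Turn (word, pixel_box) pairs into (word, normalized_box) pairs."""
    return [(word, normalize_box(box, width, height)) for word, box in ocr]


# Hypothetical OCR result for a 500 x 1000 pixel page.
ocr = [("Invoice", (50, 100, 150, 120)), ("us-001", (200, 100, 300, 120))]
word_boxes = build_word_boxes(ocr, width=500, height=1000)
print(word_boxes)
# [('Invoice', (100, 100, 300, 120)), ('us-001', (400, 100, 600, 120))]
```

You can then pass the resulting list to the calls listed above, for example `document_qa(image=..., question=..., word_boxes=word_boxes)`, without triggering OCR again.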

FeatureExtractionPipeline

`class transformers.FeatureExtractionPipeline`

( model: Union, tokenizer: Optional = None, feature_extractor: Optional = None, image_processor: Optional = None, modelcard: Optional = None, framework: Optional = None, task: str = '', args_parser: ArgumentHandler = None, device: Union = None, torch_dtype: Union = None, binary_output: bool = False, **kwargs )

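Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
return_tensors (bool, optional) - If True, returns a tensor according to the specified framework; otherwise returns a list.
task (str, defaults to "") - A task identifier for the pipeline.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing the supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU, while a non-negative value runs the model on the CUDA device with that ID.
tokenize_kwargs (dict, optional) - A dictionary of additional keyword arguments passed along to the tokenizer.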

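Feature extraction pipeline that uses no model head. This pipeline extracts the hidden states from the base transformer, which can be used as features in downstream tasks.

Example: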

>>> from transformers import pipeline
>>> extractor = pipeline(model="bert-base-uncased", task="feature-extraction")
>>> result = extractor("This is a simple test.", return_tensors=True)
>>> result.shape  # This is a tensor of shape [1, sequence_length, hidden_dimension] representing the input string.
torch.Size([1, 8, 768])

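Learn more about the basics of using a pipeline in the pipeline tutorial.

This feature extraction pipeline can currently be loaded from pipeline() using the task identifier: "feature-extraction".

All models may be used for this pipeline. See a list of all models, including community-contributed models, on huggingface.co/models.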

`__call__`

( *args, **kwargs ) → A nested list of float

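Parameters

args (str or List[str]) - One or several texts (or one list of texts) to get the features of.

Returns: a nested list of float

The features computed by the model.

Extract the features of the input(s).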
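The pipeline returns one hidden-state vector per token. Downstream tasks often need one vector per text, and mean pooling over the token axis is a common way to get it. The pipeline itself does not do this pooling. The following is a minimal sketch. It assumes the nested-list output for a single input string, with shape [1, sequence_length, hidden_dimension] as in the example. `mean_pool` is a hypothetical helper. It also averages special tokens such as [CLS]/[SEP], which is one design choice among several.

```python
from typing import List


def mean_pool(features: List[List[List[float]]]) -> List[float]:
    """Average token vectors of a [1, seq_len, hidden] nested list into [hidden]."""
    (tokens,) = features  # one input string -> batch dimension of size 1
    if not tokens:
        raise ValueError("no token vectors to pool")
    hidden = len(tokens[0])
    return [sum(tok[i] for tok in tokens) / len(tokens) for i in range(hidden)]


# Toy "features": 2 tokens with a hidden size of 2.
print(mean_pool([[[1.0, 2.0], [3.0, 4.0]]]))  # [2.0, 3.0]
```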

ImageToTextPipeline

`class transformers.ImageToTextPipeline`

( *args, **kwargs )

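Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to use.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the batch size to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing the supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU, while a non-negative value runs the model on the CUDA device with that ID. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the output of the pipeline should be produced in a binary format (i.e., pickle) or as raw text.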

Image-to-text pipeline using an AutoModelForVision2Seq. This pipeline predicts a caption for a given image.

Example:

>>> from transformers import pipeline
>>> captioner = pipeline(model="ydshieh/vit-gpt2-coco-en")
>>> captioner("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
[{'generated_text': 'two birds are standing next to each other '}]

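Learn more about the basics of using a pipeline in the pipeline tutorial.

This image-to-text pipeline can currently be loaded from pipeline() using the following task identifier: "image-to-text".

See the list of available models on huggingface.co/models.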

`__call__`

( images: Union, **kwargs ) → A list or a list of list of dict

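Parameters

images (str, List[str], PIL.Image or List[PIL.Image]) - The pipeline handles three types of images:

A string containing an HTTP(s) link pointing to an image
A string containing a local path to an image
An image loaded in PIL directly

The pipeline accepts either a single image or a batch of images.
max_new_tokens (int, optional) - The maximum number of tokens to generate. By default, the generate default is used.
generate_kwargs (Dict, optional) - Use this to send all of these arguments directly to generate, allowing full control of that function.
timeout (float, optional, defaults to None) - The maximum time in seconds to wait for fetching images from the web. If None, no timeout is set and the call may block forever.

Returns: a list of dicts or a list of lists of dicts

Each result comes as a dictionary with the following key:

generated_text (str) - The generated text.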

Generate text, such as a caption, for the image(s) passed as inputs.
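The shape of the return value depends on the input. A single image gives a list of dicts, while a batch of images gives one such list per image. The following is a minimal sketch that normalizes both shapes into a flat list of captions. It assumes only the generated_text key described above. `extract_captions` and the "a cat"/"a dog" values are hypothetical.

```python
from typing import Any, Dict, List, Union

Result = Dict[str, Any]


def extract_captions(output: Union[List[Result], List[List[Result]]]) -> List[str]:
    """Flatten image-to-text output (single image or batch) into captions."""
    captions: List[str] = []
    for item in output:
        # A batch yields one list of dicts per image; a single image yields dicts.
        results = item if isinstance(item, list) else [item]
        captions.extend(r["generated_text"] for r in results)
    return captions


single = [{"generated_text": "two birds are standing next to each other "}]
batch = [[{"generated_text": "a cat"}], [{"generated_text": "a dog"}]]
print(extract_captions(single))  # ['two birds are standing next to each other ']
print(extract_captions(batch))   # ['a cat', 'a dog']
```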

MaskGenerationPipeline

`class transformers.MaskGenerationPipeline`

( **kwargs )

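Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
feature_extractor (SequenceFeatureExtractor) - The feature extractor that will be used by the pipeline to encode the input.
points_per_batch (int, optional, defaults to 64) - Sets the number of points run simultaneously by the model. Higher numbers may be faster but use more GPU memory.
output_bboxes_mask (bool, optional, defaults to False) - Whether or not to output the bounding box predictions.
output_rle_masks (bool, optional, defaults to False) - Whether or not to output the masks in RLE format.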
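modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to use.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the batch size to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing the supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU, while a non-negative value runs the model on the CUDA device with that ID. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the output of the pipeline should be produced in a binary format (i.e., pickle) or as raw text.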

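Automatic mask generation for images using SamForMaskGeneration. Given an image, this pipeline predicts binary masks for it. It is a ChunkPipeline because the points can be split into mini-batches to avoid out-of-memory (OOM) issues. Use the points_per_batch argument to control how many points are processed at the same time. The default is 64.

The pipeline works in 3 steps:

preprocess: generates a grid of 1024 evenly spaced points, along with bounding boxes and point labels. For details on how the points and bounding boxes are created, see the _generate_crop_boxes function. The image is also preprocessed using the image_processor. This function yields mini-batches of points_per_batch points.
forward: feeds the outputs of preprocess to the model. The image embedding is computed only once. It calls self.model.get_image_embeddings and makes sure that no gradients are computed and that the tensors and the model are on the same device.
postprocess: the most important part of automatic mask generation happens here. It involves three steps:

image_processor.postprocess_masks (run on each mini-batch loop): takes the raw output masks, resizes them according to the image size, and converts them to binary masks.
image_processor.filter_masks (run on each mini-batch loop): uses both pred_iou_thresh and stability_scores. It also applies a variety of filters based on non-maximum suppression to remove bad masks.
image_processor.postprocess_masks_for_amg: applies NMS to the masks to keep only the relevant ones.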
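The preprocess step can be pictured as building 1024 = 32 × 32 normalized points and handing them to the model in chunks of points_per_batch. At the default of 64, that gives 1024 / 64 = 16 mini-batches. The following is a rough sketch, not a copy of `_generate_crop_boxes`. The half-cell offset from the borders is an assumption modelled on the original SAM point-grid code. The bounding boxes and point labels are omitted, and both helper names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, float]


def point_grid(n_per_side: int) -> List[Point]:
    """Evenly spaced (x, y) points in [0, 1] x [0, 1], offset from the borders."""
    if n_per_side < 1:
        raise ValueError("n_per_side must be >= 1")
    offset = 1 / (2 * n_per_side)  # half a cell away from each border
    if n_per_side == 1:
        coords = [0.5]
    else:
        step = (1 - 2 * offset) / (n_per_side - 1)
        coords = [offset + i * step for i in range(n_per_side)]
    return [(x, y) for y in coords for x in coords]


def batched(points: Iterable[Point], size: int) -> Iterator[List[Point]]:
    """Yield consecutive mini-batches of at most `size` points."""
    it = iter(points)
    while batch := list(islice(it, size)):
        yield batch


grid = point_grid(32)              # 32 * 32 = 1024 points
batches = list(batched(grid, 64))  # points_per_batch=64
print(len(grid), len(batches))     # 1024 16
```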

Example:

>>> from transformers import pipeline
>>> generator = pipeline(model="facebook/sam-vit-base", task="mask-generation")
>>> outputs = generator(
...     "http://images.cocodataset.org/val2017/000000039769.jpg",
... )
>>> outputs = generator(
...     "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png", points_per_batch=128
... )

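Learn more about the basics of using a pipeline in the pipeline tutorial.

This segmentation pipeline can currently be loaded from pipeline() using the following task identifier: "mask-generation".

See the list of available models on huggingface.co/models.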

`__call__`

( image, *args, num_workers = None, batch_size = None, **kwargs ) → Dict

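Parameters

inputs (np.ndarray or bytes or str or dict) - Image or list of images.
mask_threshold (float, optional, defaults to 0.0) - Threshold to use when turning the predicted masks into binary values.
pred_iou_thresh (float, optional, defaults to 0.88) - A filtering threshold in [0,1] applied to the model's predicted mask quality.
stability_score_thresh (float, optional, defaults to 0.95) - A filtering threshold in [0,1] based on the stability of the mask under changes to the cutoff used to binarize the model's mask predictions.
stability_score_offset (int, optional, defaults to 1) - The amount to shift the cutoff by when calculating the stability score.
crops_nms_thresh (float, optional, defaults to 0.7) - The box IoU cutoff used by non-maximum suppression to filter duplicate masks.
crops_n_layers (int, optional, defaults to 0) - If crops_n_layers > 0, mask prediction is run again on crops of the image. Sets the number of layers to run, where each layer has 2**i_layer image crops.
crop_overlap_ratio (float, optional, defaults to 512 / 1500) - Sets the degree to which crops overlap. In the first crop layer, crops overlap by this fraction of the image length. Later layers with more crops scale down this overlap.
crop_n_points_downscale_factor (int, optional, defaults to 1) - The number of points per side sampled in layer n is scaled down by crop_n_points_downscale_factor**n.
timeout (float, optional, defaults to None) - The maximum time in seconds to wait for fetching images from the web. If None, no timeout is set and the call may block forever.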
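Two of the filters above can be sketched on plain Python lists. The first is the stability score: the IoU between a mask binarized at mask_threshold + stability_score_offset and the same mask binarized at mask_threshold - stability_score_offset. The second is greedy box non-maximum suppression with an IoU cutoff, as used by crops_nms_thresh. Both definitions are assumptions modelled on the original SAM formulation, not a copy of the transformers image processor. The masks here are flattened toy lists of logits, and every name is hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def stability_score(logits: Sequence[float], mask_threshold: float = 0.0, offset: float = 1.0) -> float:
    """IoU of the mask binarized at threshold+offset vs. threshold-offset."""
    high = sum(1 for v in logits if v > mask_threshold + offset)  # stricter mask
    low = sum(1 for v in logits if v > mask_threshold - offset)   # looser mask
    return high / low if low else 0.0


def iou(a: Box, b: Box) -> float:
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.7) -> List[int]:
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# 4 pixel logits: 1 pixel is above +1.0, 3 pixels are above -1.0.
print(stability_score([3.0, 0.5, -0.5, -3.0]))  # 0.3333333333333333

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7], iou_thresh=0.7))  # [0, 2]
```

With the defaults, a mask with a score of 1/3 would be dropped by stability_score_thresh=0.95. The second box overlaps the first with an IoU of 0.81, above the 0.7 cutoff, so it is suppressed.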

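Returns: Dict

A dictionary with the following keys:

mask (PIL.Image) - A binary mask of the detected object, as a PIL Image of shape (width, height) of the original image. A mask filled with zeros is returned if no object is found.
score (optional float) - The confidence score, returned only when the model is capable of estimating a confidence for the "object" described by the label and the mask.

Generates binary segmentation masks.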

VisualQuestionAnsweringPipeline

`class transformers.VisualQuestionAnsweringPipeline`

( *args, **kwargs )

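Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to use.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the batch size to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing the supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU, while a non-negative value runs the model on the CUDA device with that ID. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the output of the pipeline should be produced in a binary format (i.e., pickle) or as raw text.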

Visual question answering pipeline using an AutoModelForVisualQuestionAnswering. This pipeline is currently only available in PyTorch.

Example:

>>> from transformers import pipeline
>>> oracle = pipeline(model="dandelin/vilt-b32-finetuned-vqa")
>>> image_url = "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/lena.png"
>>> oracle(question="What is she wearing ?", image=image_url)
[{'score': 0.948, 'answer': 'hat'}, {'score': 0.009, 'answer': 'fedora'}, {'score': 0.003, 'answer': 'clothes'}, {'score': 0.003, 'answer': 'sun hat'}, {'score': 0.002, 'answer': 'nothing'}]
>>> oracle(question="What is she wearing ?", image=image_url, top_k=1)
[{'score': 0.948, 'answer': 'hat'}]
>>> oracle(question="Is this a person ?", image=image_url, top_k=1)
[{'score': 0.993, 'answer': 'yes'}]
>>> oracle(question="Is this a man ?", image=image_url, top_k=1)
[{'score': 0.996, 'answer': 'no'}]

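Learn more about the basics of using a pipeline in the pipeline tutorial.

This visual question answering pipeline can currently be loaded from pipeline() using the following task identifiers: "visual-question-answering", "vqa".

This pipeline can use models that have been fine-tuned on a visual question answering task. See the up-to-date list of available models on huggingface.co/models.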

`__call__`

( image: Union, question: str = None, **kwargs ) → A dictionary or a list of dictionaries containing the result

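Parameters

image (str, List[str], PIL.Image or List[PIL.Image]) - The pipeline handles three types of images:

A string containing an http link pointing to an image
A string containing a local path to an image
An image loaded in PIL directly

The pipeline accepts either a single image or a batch of images. If given a single image, it can be broadcast to multiple questions.
question (str, List[str]) - The question(s) asked. If given a single question, it can be broadcast to multiple images.
top_k (int, optional, defaults to 5) - The number of top labels that will be returned by the pipeline. If the provided number is higher than the number of labels available in the model configuration, it defaults to the number of labels.
timeout (float, optional, defaults to None) - The maximum time in seconds to wait for fetching images from the web. If None, no timeout is set and the call may block forever.

Returns: a dictionary or a list of dictionaries containing the result. The dictionaries contain the following keys:

answer (str) - The label (answer) identified by the model.
score (float) - The score attributed by the model to that label.

Answer open-ended questions about images. The pipeline accepts several types of inputs, detailed below: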

pipeline(image=image, question=question)
pipeline({"image": image, "question": question})
pipeline([{"image": image, "question": question}])
pipeline([{"image": image, "question": question}, {"image": image, "question": question}])
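The top_k behaviour described above can be sketched as follows. Every label is scored, the labels are ranked by score, the best top_k are kept, and top_k is clamped to the number of labels. This is a sketch under stated assumptions. It assumes a classifier head over a fixed answer vocabulary whose logits are turned into independent per-label probabilities with a sigmoid, as is common for ViLT-style VQA heads. The result keys follow the example output (score, answer). `top_k_answers` and the 3-label vocabulary are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def top_k_answers(logits: Sequence[float], id2label: Dict[int, str], top_k: int = 5) -> List[dict]:
    """Rank answer labels by sigmoid(logit) and keep the best `top_k`."""
    top_k = min(top_k, len(id2label))  # clamp to the number of labels
    probs = [1 / (1 + math.exp(-x)) for x in logits]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [{"score": probs[i], "answer": id2label[i]} for i in ranked[:top_k]]


# Hypothetical 3-label model; top_k=5 is clamped to 3.
labels = {0: "no", 1: "yes", 2: "maybe"}
for r in top_k_answers([0.0, 2.0, -1.0], labels, top_k=5):
    print(r["answer"], round(r["score"], 3))
```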

Parent class: Pipeline

`class transformers.Pipeline`

( model: Union, tokenizer: Optional = None, feature_extractor: Optional = None, image_processor: Optional = None, modelcard: Optional = None, framework: Optional = None, task: str = '', args_parser: ArgumentHandler = None, device: Union = None, torch_dtype: Union = None, binary_output: bool = False, **kwargs )

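Parameters

model (PreTrainedModel or TFPreTrainedModel) - The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from PreTrainedModel for PyTorch and from TFPreTrainedModel for TensorFlow.
tokenizer (PreTrainedTokenizer) - The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from PreTrainedTokenizer.
modelcard (str or ModelCard, optional) - Model card attributed to the model for this pipeline.
framework (str, optional) - The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be installed.
If no framework is specified, it defaults to the one currently installed. If no framework is specified and both frameworks are installed, it defaults to the framework of the model, or to PyTorch if no model is provided.
task (str, defaults to "") - A task identifier for the pipeline.
num_workers (int, optional, defaults to 8) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the number of workers to use.
batch_size (int, optional, defaults to 1) - When the pipeline uses a DataLoader (when passing a dataset, on GPU for a PyTorch model), the batch size to use. For inference this is not always beneficial; please read Batching with pipelines.
args_parser (ArgumentHandler, optional) - Reference to the object in charge of parsing the supplied pipeline parameters.
device (int, optional, defaults to -1) - Device ordinal for CPU/GPU support. Setting this to -1 uses the CPU, while a non-negative value runs the model on the CUDA device with that ID. You can also pass a native torch.device or a str.
binary_output (bool, optional, defaults to False) - Flag indicating whether the output of the pipeline should be produced in a binary format (i.e., pickle) or as raw text.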

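The Pipeline class is the class from which all pipelines inherit. Refer to this class for methods shared across different pipelines.

Base class implementing pipelined operations. The pipeline workflow is defined as the following sequence of operations:

Input -> Tokenization -> Model Inference -> Post-Processing (task dependent) -> Output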
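The workflow above maps onto the preprocess, _forward and postprocess methods documented below. The following toy class shows how they chain together. It is a self-contained stand-in, not transformers.Pipeline. The "tokenizer" is str.split, the "model" only measures token lengths, and `ToyPipeline` is a hypothetical name.

```python
from typing import Any, Dict, List


class ToyPipeline:
    """Toy stand-in for the preprocess -> _forward -> postprocess chain."""

    def preprocess(self, text: str) -> Dict[str, Any]:
        # "Tokenization": lowercase and split on whitespace.
        return {"tokens": text.lower().split()}

    def _forward(self, model_inputs: Dict[str, Any]) -> Dict[str, Any]:
        # "Model inference": here just the length of each token.
        return {"lengths": [len(t) for t in model_inputs["tokens"]]}

    def postprocess(self, model_outputs: Dict[str, Any]) -> Dict[str, Any]:
        # Turn raw outputs into a friendly dict of plain numbers.
        lengths: List[int] = model_outputs["lengths"]
        return {"num_tokens": len(lengths), "longest": max(lengths, default=0)}

    def __call__(self, text: str) -> Dict[str, Any]:
        return self.postprocess(self._forward(self.preprocess(text)))


print(ToyPipeline()("Input goes through every stage"))
# {'num_tokens': 5, 'longest': 7}
```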

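Pipeline supports running on CPU or GPU through the device argument (see below).

Some pipelines, for instance FeatureExtractionPipeline ('feature-extraction'), output large tensor objects as nested lists. To avoid dumping such large structures as textual data, we provide the binary_output constructor argument. If set to True, the output is stored in the pickle format.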
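To make "textual" versus "pickle format" concrete, the following sketch serializes a small nested list in both ways using only the standard library. It only illustrates the two serialization styles. It does not show where or how the pipeline itself applies binary_output, and the feature values are made up.

```python
import json
import pickle

# A small stand-in for a feature-extraction result: [1, seq_len, hidden].
features = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]

as_text = json.dumps(features)     # textual dump (a str)
as_bytes = pickle.dumps(features)  # binary pickle dump (bytes)

restored = pickle.loads(as_bytes)  # round-trips to an equal nested list
print(type(as_text).__name__, type(as_bytes).__name__, restored == features)
# str bytes True
```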

`check_model_type`

( supported_models: Union )

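Parameters

supported_models (List[str] or dict) - The list of models supported by the pipeline, or a dictionary with model class values.

Check whether the model class is supported by the pipeline.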
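The following is a minimal sketch of the kind of check described. It assumes that, when a dict is given, the comparison uses the class names (`__name__`) of its values. The sketch only returns a boolean and says nothing about how the real method reports an unsupported model. `is_supported` and `FakeModelForQA` are hypothetical.

```python
from typing import Dict, List, Union


def is_supported(model_class_name: str, supported: Union[List[str], Dict[str, type]]) -> bool:
    """Return True if the class name appears in the list or among the dict's classes."""
    if isinstance(supported, dict):
        names = [cls.__name__ for cls in supported.values()]
    else:
        names = list(supported)
    return model_class_name in names


class FakeModelForQA:  # hypothetical model class for the demo
    pass


print(is_supported("FakeModelForQA", {"fake": FakeModelForQA}))  # True
print(is_supported("OtherModel", ["FakeModelForQA"]))            # False
```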

`device_placement`

( )

Context manager that allocates tensors on the user-specified device in a framework-agnostic way.

Example:

from transformers import pipeline

# Explicitly ask for tensor allocation on CUDA device :0
pipe = pipeline(..., device=0)
with pipe.device_placement():
    # Every framework-specific tensor allocation will be done on the requested device
    output = pipe(...)

`ensure_tensor_on_device`

( **inputs ) → Dict[str, torch.Tensor]

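Parameters

inputs (keyword arguments that should be torch.Tensor; the rest is ignored) - The tensors to place on self.device.
Recursive on lists only.

Returns: Dict[str, torch.Tensor]

The same as inputs, but on the proper device.

Ensure PyTorch tensors are on the specified device.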

`postprocess`

( model_outputs: ModelOutput, **postprocess_parameters: Dict )

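Postprocess receives the raw outputs of the _forward method, generally tensors, and reformats them into something more user-friendly. Generally it outputs a list or a dict of results containing just strings and numbers.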

`predict`

( X )

Scikit / Keras interface to transformers' pipelines. This method forwards to __call__().

`preprocess`

( input_: Any, **preprocess_parameters: Dict )

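Preprocess takes the input_ of a specific pipeline and returns a dictionary of everything necessary for _forward to run properly. It should contain at least one tensor, but might have arbitrary other items.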

`save_pretrained`

( save_directory: str, safe_serialization: bool = True )

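Parameters

save_directory (str) - A path to the directory to save to. It will be created if it doesn't exist.
safe_serialization (bool, defaults to True) - Whether to save the model using safetensors or the traditional way for PyTorch or TensorFlow.

Save the pipeline's model and tokenizer.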

`transform`

( X )

Scikit / Keras interface to transformers' pipelines. This method forwards to __call__().
