Transformers 4.37 Chinese Documentation (67) (3) - Alibaba Cloud Developer Community

Transformers 4.37 Chinese Documentation (67) (2): https://developer.aliyun.com/article/1564112

DPTForDepthEstimation

`class transformers.DPTForDepthEstimation`

( config )

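Parameters

config (ViTConfig): Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

DPT model with a depth estimation head on top (consisting of 3 convolutional layers), e.g. for KITTI, NYUv2.

This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch module and refer to the PyTorch documentation for all matters related to general usage and behavior.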

`forward`

( pixel_values: FloatTensor head_mask: Optional = None labels: Optional = None output_attentions: Optional = None output_hidden_states: Optional = None return_dict: Optional = None ) → transformers.modeling_outputs.DepthEstimatorOutput or tuple(torch.FloatTensor)

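Parameters

pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)): Pixel values. Pixel values can be obtained using AutoImageProcessor. See DPTImageProcessor.__call__() for details.
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional): Mask to nullify selected heads of the self-attention modules. Mask values are selected in [0, 1]:

1 indicates the head is not masked,
0 indicates the head is masked.

output_attentions (bool, optional): Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional): Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional): Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, height, width), optional): Ground truth depth estimation maps for computing the loss.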
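
To make the head_mask semantics concrete, here is a minimal sketch in plain Python rather than the library's own code. It assumes the mask is multiplied into each head's attention weights after the softmax; apply_head_mask and the toy values are hypothetical names used only for illustration.

```python
def apply_head_mask(attention_probs, head_mask):
    """Scale each head's attention weights by its mask value.

    attention_probs: list over heads, each a list of [query][key] weight rows.
    head_mask: one float per head (1.0 keeps the head, 0.0 nullifies it).
    """
    if len(attention_probs) != len(head_mask):
        raise ValueError("head_mask needs one entry per attention head")
    return [
        [[weight * keep for weight in row] for row in head]
        for head, keep in zip(attention_probs, head_mask)
    ]


# Two heads attending over two tokens; the second head is masked out.
probs = [
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.9, 0.1]],
]
masked = apply_head_mask(probs, [1.0, 0.0])
print(masked[0])  # first head is unchanged
print(masked[1])  # [[0.0, 0.0], [0.0, 0.0]]
```

A mask of shape (num_layers, num_heads) would apply the same idea with a separate row of mask values for each layer.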

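Returns

transformers.modeling_outputs.DepthEstimatorOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.DepthEstimatorOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (DPTConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided): Classification (or regression if config.num_labels==1) loss.
predicted_depth (torch.FloatTensor of shape (batch_size, height, width)): Predicted depth for each pixel.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer) of shape (batch_size, num_channels, height, width).
Hidden states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The DPTForDepthEstimation forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.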
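
predicted_depth holds raw per-pixel values, so viewing it as an image usually means rescaling it to the 8-bit range first. The sketch below shows that arithmetic in plain Python on a toy 2x2 map: the largest depth is mapped to 255 and the result is truncated to integers. The helper name to_uint8 is hypothetical, and the sketch assumes non-negative depths.

```python
def to_uint8(depth_map):
    """Scale a 2-D list of non-negative depths so that the maximum maps to 255."""
    peak = max(max(row) for row in depth_map)
    if peak <= 0:
        raise ValueError("depth map needs at least one positive value")
    # int() truncates toward zero, which mirrors casting non-negative floats to uint8.
    return [[int(value * 255 / peak) for value in row] for row in depth_map]


toy_depth = [[0.0, 2.0], [4.0, 8.0]]
scaled = to_uint8(toy_depth)
print(scaled)  # [[0, 63], [127, 255]]
```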

Example:

>>> from transformers import AutoImageProcessor, DPTForDepthEstimation
>>> import torch
>>> import numpy as np
>>> from PIL import Image
>>> import requests
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> image_processor = AutoImageProcessor.from_pretrained("Intel/dpt-large")
>>> model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")
>>> # prepare image for the model
>>> inputs = image_processor(images=image, return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(**inputs)
...     predicted_depth = outputs.predicted_depth
>>> # interpolate to original size
>>> prediction = torch.nn.functional.interpolate(
...     predicted_depth.unsqueeze(1),
...     size=image.size[::-1],
...     mode="bicubic",
...     align_corners=False,
... )
>>> # visualize the prediction
>>> output = prediction.squeeze().cpu().numpy()
>>> formatted = (output * 255 / np.max(output)).astype("uint8")
>>> depth = Image.fromarray(formatted)

DPTForSemanticSegmentation

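`class transformers.DPTForSemanticSegmentation`

( config )

Parameters

config (ViTConfig): Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

DPT model with a semantic segmentation head on top, e.g. for ADE20k, CityScapes.

This model is a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch module and refer to the PyTorch documentation for all matters related to general usage and behavior.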

`forward`

( pixel_values: Optional = None head_mask: Optional = None labels: Optional = None output_attentions: Optional = None output_hidden_states: Optional = None return_dict: Optional = None ) → transformers.modeling_outputs.SemanticSegmenterOutput or tuple(torch.FloatTensor)

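Parameters

pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)): Pixel values. Pixel values can be obtained using AutoImageProcessor. See DPTImageProcessor.__call__() for details.
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional): Mask to nullify selected heads of the self-attention modules. Mask values are selected in [0, 1]:

1 indicates the head is not masked,
0 indicates the head is masked.

output_attentions (bool, optional): Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional): Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional): Whether or not to return a ModelOutput instead of a plain tuple.
labels (torch.LongTensor of shape (batch_size, height, width), optional): Ground truth semantic segmentation maps for computing the loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels > 1, a classification loss is computed (cross-entropy).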

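Returns

transformers.modeling_outputs.SemanticSegmenterOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.SemanticSegmenterOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (DPTConfig) and inputs.

loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided): Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels, logits_height, logits_width)): Classification scores for each pixel.
The logits returned do not necessarily have the same size as the pixel_values passed as inputs. This avoids interpolating twice and losing some quality when a user needs to resize the logits to the original image size as post-processing. You should always check the shape of your logits and resize as needed.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True): Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer) of shape (batch_size, patch_size, hidden_size).
Hidden states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True): Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The DPTForSemanticSegmentation forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.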
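
Since the logits hold one score per class for every pixel, a label map is usually obtained by picking the highest-scoring class at each position. The sketch below shows that step on nested lists for a single image; segmentation_map and toy_logits are hypothetical names. With real tensors one would typically resize the logits to the image size first and then call something like logits.argmax(dim=1).

```python
def segmentation_map(logits):
    """Return the index of the highest-scoring class for every pixel.

    logits: nested list indexed as [class][row][column] for one image.
    """
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [
            max(range(num_classes), key=lambda c: logits[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Three classes scored on a 2x2 grid.
toy_logits = [
    [[2.0, 0.1], [0.3, 0.2]],  # class 0
    [[0.5, 1.5], [0.1, 0.9]],  # class 1
    [[0.1, 0.2], [3.0, 0.4]],  # class 2
]
label_map = segmentation_map(toy_logits)
print(label_map)  # [[0, 1], [2, 1]]
```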

Example:

>>> from transformers import AutoImageProcessor, DPTForSemanticSegmentation
>>> from PIL import Image
>>> import requests
>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> image_processor = AutoImageProcessor.from_pretrained("Intel/dpt-large-ade")
>>> model = DPTForSemanticSegmentation.from_pretrained("Intel/dpt-large-ade")
>>> inputs = image_processor(images=image, return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits

EfficientFormer

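Original text: huggingface.co/docs/transformers/v4.37.2/en/model_doc/efficientformer

Overview

The EfficientFormer model was proposed in EfficientFormer: Vision Transformers at MobileNet Speed by Yanyu Li, Geng Yuan, Yang Wen, Eric Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren. EfficientFormer proposes a dimension-consistent pure transformer that can be run on mobile devices for dense prediction tasks like image classification, object detection and semantic segmentation.

The abstract from the paper is the following:

Vision Transformers (ViT) have shown rapid progress in computer vision tasks, achieving promising results on various benchmarks. However, due to the massive number of parameters and model design, e.g., attention mechanism, ViT-based models are generally times slower than lightweight convolutional networks. Therefore, the deployment of ViT for real-time applications is particularly challenging, especially on resource-constrained hardware such as mobile devices. Recent efforts try to reduce the computation complexity of ViT through network architecture search or hybrid design with MobileNet block, yet the inference speed is still unsatisfactory. This leads to an important question: can transformers run as fast as MobileNet while obtaining high performance? To answer this, we first revisit the network architecture and operators used in ViT-based models and identify inefficient designs. Then we introduce a dimension-consistent pure transformer (without MobileNet blocks) as a design paradigm. Finally, we perform latency-driven slimming to get a series of final models dubbed EfficientFormer. Extensive experiments show the superiority of EfficientFormer in performance and speed on mobile devices. Our fastest model, EfficientFormer-L1, achieves 79.2% top-1 accuracy on ImageNet-1K with only 1.6 ms inference latency on iPhone 12 (compiled with CoreML), which runs as fast as MobileNetV2×1.4 (1.6 ms, 74.7% top-1), and our largest model, EfficientFormer-L7, obtains 83.3% accuracy with only 7.0 ms latency. Our work proves that properly designed transformers can reach extremely low latency on mobile devices while maintaining high performance.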

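This model was contributed by novice03 and Bearnardd. The original code can be found here. The TensorFlow version of this model was added by D-Roberts.

Documentation resources

Image classification task guide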

EfficientFormerConfig

`class transformers.EfficientFormerConfig`

( depths: List = [3, 2, 6, 4] hidden_sizes: List = [48, 96, 224, 448] downsamples: List = [True, True, True, True] dim: int = 448 key_dim: int = 32 attention_ratio: int = 4 resolution: int = 7 num_hidden_layers: int = 5 num_attention_heads: int = 8 mlp_expansion_ratio: int = 4 hidden_dropout_prob: float = 0.0 patch_size: int = 16 num_channels: int = 3 pool_size: int = 3 downsample_patch_size: int = 3 downsample_stride: int = 2 downsample_pad: int = 1 drop_path_rate: float = 0.0 num_meta3d_blocks: int = 1 distillation: bool = True use_layer_scale: bool = True layer_scale_init_value: float = 1e-05 hidden_act: str = 'gelu' initializer_range: float = 0.02 layer_norm_eps: float = 1e-12 image_size: int = 224 batch_norm_eps: float = 1e-05 **kwargs )

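Parameters

depths (List(int), optional, defaults to [3, 2, 6, 4]): Depth of each stage.
hidden_sizes (List(int), optional, defaults to [48, 96, 224, 448]): Dimensionality of each stage.
downsamples (List(bool), optional, defaults to [True, True, True, True]): Whether or not to downsample inputs between two stages.
dim (int, optional, defaults to 448): Number of channels in Meta3D layers.
key_dim (int, optional, defaults to 32): The size of the key in the Meta3D block.
attention_ratio (int, optional, defaults to 4): Ratio of the dimension of the query and value to the dimension of the key in the MSHA block.
resolution (int, optional, defaults to 7): Size of each patch.
num_hidden_layers (int, optional, defaults to 5): Number of hidden layers in the Transformer encoder.
num_attention_heads (int, optional, defaults to 8): Number of attention heads for each attention layer in the 3D MetaBlock.
mlp_expansion_ratio (int, optional, defaults to 4): Ratio of the size of the hidden dimensionality of an MLP to the dimensionality of its input.
hidden_dropout_prob (float, optional, defaults to 0.0): The dropout probability for all fully connected layers in the embeddings and encoder.
patch_size (int, optional, defaults to 16): The size (resolution) of each patch.
num_channels (int, optional, defaults to 3): The number of input channels.
pool_size (int, optional, defaults to 3): Kernel size of pooling layers.
downsample_patch_size (int, optional, defaults to 3): The size of patches in downsampling layers.
downsample_stride (int, optional, defaults to 2): The stride of convolution kernels in downsampling layers.
downsample_pad (int, optional, defaults to 1): Padding in downsampling layers.
drop_path_rate (float, optional, defaults to 0.0): Rate at which to increase dropout probability in DropPath.
num_meta3d_blocks (int, optional, defaults to 1): The number of 3D MetaBlocks in the last stage.
distillation (bool, optional, defaults to True): Whether to add a distillation head.
use_layer_scale (bool, optional, defaults to True): Whether to scale outputs from token mixers.
layer_scale_init_value (float, optional, defaults to 1e-5): Factor by which outputs from token mixers are scaled.
hidden_act (str or function, optional, defaults to "gelu"): The non-linear activation function (function or string) in the encoder and pooler. If string, "gelu", "relu", "selu" and "gelu_new" are supported.
initializer_range (float, optional, defaults to 0.02): The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (float, optional, defaults to 1e-12): The epsilon used by the layer normalization layers.
image_size (int, optional, defaults to 224): The size (resolution) of each image.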

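This is the configuration class to store the configuration of an EfficientFormerModel. It is used to instantiate an EfficientFormer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the EfficientFormer snap-research/efficientformer-l1 architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.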

Example:

>>> from transformers import EfficientFormerConfig, EfficientFormerModel
>>> # Initializing an EfficientFormer efficientformer-l1 style configuration
>>> configuration = EfficientFormerConfig()
>>> # Initializing an EfficientFormerModel (with random weights) from the efficientformer-l1 style configuration
>>> model = EfficientFormerModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config

EfficientFormerImageProcessor

`class transformers.EfficientFormerImageProcessor`

( do_resize: bool = True size: Optional = None resample: Resampling = <Resampling.BICUBIC: 3> do_center_crop: bool = True do_rescale: bool = True rescale_factor: Union = 0.00392156862745098 crop_size: Dict = None do_normalize: bool = True image_mean: Union = None image_std: Union = None **kwargs )

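Parameters

do_resize (bool, optional, defaults to True): Whether to resize the image's (height, width) dimensions to the specified (size["height"], size["width"]). Can be overridden by the do_resize parameter in the preprocess method.
size (dict, optional, defaults to {"height": 224, "width": 224}): Size of the output image after resizing. Can be overridden by the size parameter in the preprocess method.
resample (PILImageResampling, optional, defaults to PILImageResampling.BICUBIC): Resampling filter to use if resizing the image. Can be overridden by the resample parameter in the preprocess method.
do_center_crop (bool, optional, defaults to True): Whether to center crop the image to the specified crop_size. Can be overridden by do_center_crop in the preprocess method.
crop_size (Dict[str, int], optional, defaults to 224): Size of the output image after applying center_crop. Can be overridden by crop_size in the preprocess method.
do_rescale (bool, optional, defaults to True): Whether to rescale the image by the specified scale rescale_factor. Can be overridden by the do_rescale parameter in the preprocess method.
rescale_factor (int or float, optional, defaults to 1/255): Scale factor to use if rescaling the image. Can be overridden by the rescale_factor parameter in the preprocess method.
do_normalize (bool, optional, defaults to True): Whether to normalize the image. Can be overridden by the do_normalize parameter in the preprocess method.
image_mean (float or List[float], optional, defaults to IMAGENET_STANDARD_MEAN): Mean to use if normalizing the image. This is a float or a list of floats the length of the number of channels in the image. Can be overridden by the image_mean parameter in the preprocess method.
image_std (float or List[float], optional, defaults to IMAGENET_STANDARD_STD): Standard deviation to use if normalizing the image. This is a float or a list of floats the length of the number of channels in the image. Can be overridden by the image_std parameter in the preprocess method.

Constructs an EfficientFormer image processor.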
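
The rescale and normalize steps amount to simple per-channel arithmetic. As a rough illustration in plain Python, not the processor's own code, the sketch below multiplies a pixel by rescale_factor and then subtracts a mean and divides by a standard deviation. The function name and the mean and std values are assumptions chosen for readability; the real defaults come from the library constants named above.

```python
def rescale_and_normalize(pixel, mean, std, rescale_factor=1 / 255):
    """Map one RGB pixel from the 0..255 range to normalized floats."""
    # Step 1: rescale each channel, e.g. 255 -> roughly 1.0 with a factor of 1/255.
    rescaled = [channel * rescale_factor for channel in pixel]
    # Step 2: normalize each channel with its own mean and standard deviation.
    return [(value - m) / s for value, m, s in zip(rescaled, mean, std)]


# Illustrative statistics only (assumed values, not read from the library).
mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]
normalized = rescale_and_normalize([0, 255, 51], mean, std)
print(normalized)  # approximately [-1.0, 1.0, -0.6]
```

If the input already lies between 0 and 1, the rescale step would be skipped, which is what do_rescale=False expresses.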

`preprocess`

( images: Union do_resize: Optional = None size: Dict = None resample: Resampling = None do_center_crop: bool = None crop_size: int = None do_rescale: Optional = None rescale_factor: Optional = None do_normalize: Optional = None image_mean: Union = None image_std: Union = None return_tensors: Union = None data_format: Union = <ChannelDimension.FIRST: 'channels_first'> input_data_format: Union = None **kwargs )

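Parameters

images (ImageInput): Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set do_rescale=False.
do_resize (bool, optional, defaults to self.do_resize): Whether to resize the image.
size (Dict[str, int], optional, defaults to self.size): Dictionary in the format {"height": h, "width": w} specifying the size of the output image after resizing.
resample (PILImageResampling filter, optional, defaults to self.resample): PILImageResampling filter to use if resizing the image, e.g. PILImageResampling.BILINEAR. Only has an effect if do_resize is set to True.
do_center_crop (bool, optional, defaults to self.do_center_crop): Whether to center crop the image.
do_rescale (bool, optional, defaults to self.do_rescale): Whether to rescale the image values to the range [0, 1].
rescale_factor (float, optional, defaults to self.rescale_factor): Rescale factor to rescale the image by if do_rescale is set to True.
crop_size (Dict[str, int], optional, defaults to self.crop_size): Size of the center crop. Only has an effect if do_center_crop is set to True.
do_normalize (bool, optional, defaults to self.do_normalize): Whether to normalize the image.
image_mean (float or List[float], optional, defaults to self.image_mean): Image mean to use if do_normalize is set to True.
image_std (float or List[float], optional, defaults to self.image_std): Image standard deviation to use if do_normalize is set to True.
return_tensors (str or TensorType, optional): The type of tensors to return. Can be one of:

Unset: Return a list of np.ndarray.
TensorType.TENSORFLOW or 'tf': Return a batch of type tf.Tensor.
TensorType.PYTORCH or 'pt': Return a batch of type torch.Tensor.
TensorType.NUMPY or 'np': Return a batch of type np.ndarray.
TensorType.JAX or 'jax': Return a batch of type jax.numpy.ndarray.

data_format (ChannelDimension or str, optional, defaults to ChannelDimension.FIRST): The channel dimension format for the output image. Can be one of:

"channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
"channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
Unset: Use the channel dimension format of the input image.

input_data_format (ChannelDimension or str, optional): The channel dimension format for the input image. If unset, the channel dimension format is inferred from the input image. Can be one of:

"channels_first" or ChannelDimension.FIRST: image in (num_channels, height, width) format.
"channels_last" or ChannelDimension.LAST: image in (height, width, num_channels) format.
"none" or ChannelDimension.NONE: image in (height, width) format.

Preprocess an image or batch of images.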
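
Two of the steps above are easy to picture with a small example: center cropping, and converting from "channels_last" to "channels_first". The following is a simplified sketch on nested lists, not the processor's implementation. Both helper names are hypothetical, and unlike a full implementation this version rejects crops larger than the image instead of padding.

```python
def center_crop(image, crop_height, crop_width):
    """Cut a centered window from an image stored as [row][column][channel]."""
    height, width = len(image), len(image[0])
    if crop_height > height or crop_width > width:
        raise ValueError("crop larger than image; padding is not handled in this sketch")
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return [row[left:left + crop_width] for row in image[top:top + crop_height]]


def to_channels_first(image):
    """Reorder (height, width, num_channels) into (num_channels, height, width)."""
    num_channels = len(image[0][0])
    return [[[pixel[c] for pixel in row] for row in image] for c in range(num_channels)]


# 4x4 image with 2 channels: channel 0 stores the row index, channel 1 the column index.
image = [[[y, x] for x in range(4)] for y in range(4)]
cropped = center_crop(image, 2, 2)
chw = to_channels_first(cropped)
print(chw)  # [[[1, 1], [2, 2]], [[1, 2], [1, 2]]]
```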

EfficientFormerModel

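`class transformers.EfficientFormerModel`

( config: EfficientFormerConfig )

Parameters

config (EfficientFormerConfig): Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.

The bare EfficientFormer Model transformer outputting raw hidden-states without any specific head on top. This model is a PyTorch nn.Module subclass. Use it as a regular PyTorch module and refer to the PyTorch documentation for all matters related to general usage and behavior.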

`forward`

( pixel_values: Optional = None output_attentions: Optional = None output_hidden_states: Optional = None return_dict: Optional = None ) → transformers.modeling_outputs.BaseModelOutputWithPooling or tuple(torch.FloatTensor)

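Parameters

pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)): Pixel values. Pixel values can be obtained using ViTImageProcessor. See ViTImageProcessor.preprocess() for details.
output_attentions (bool, optional): Whether or not to return the attention tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional): Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional): Whether or not to return a ModelOutput instead of a plain tuple.

Returns

transformers.modeling_outputs.BaseModelOutputWithPooling or tuple(torch.FloatTensor)

A transformers.modeling_outputs.BaseModelOutputWithPooling or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (EfficientFormerConfig) and inputs.

last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)): Sequence of hidden states at the output of the last layer of the model.
pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)): Last layer hidden state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. For example, for BERT-family models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True): Tuple of torch.FloatTensor of shape (batch_size, sequence_length, hidden_size).
Hidden states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True): Tuple of torch.FloatTensor of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.

The EfficientFormerModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Example: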

>>> from transformers import AutoImageProcessor, EfficientFormerModel
>>> import torch
>>> from datasets import load_dataset
>>> dataset = load_dataset("huggingface/cats-image")
>>> image = dataset["test"]["image"][0]
>>> image_processor = AutoImageProcessor.from_pretrained("snap-research/efficientformer-l1-300")
>>> model = EfficientFormerModel.from_pretrained("snap-research/efficientformer-l1-300")
>>> inputs = image_processor(image, return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(**inputs)
>>> last_hidden_states = outputs.last_hidden_state
>>> list(last_hidden_states.shape)
[1, 49, 448]
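
The shape [1, 49, 448] can be related to the configuration defaults with a little arithmetic. The sketch below is a back-of-the-envelope check, not library code: it assumes the stem reduces the resolution by a factor of 4 and that one stride-2 downsampling layer sits between each pair of consecutive stages. Both are assumptions about the architecture rather than facts stated on this page, and final_grid_side is a hypothetical helper.

```python
def final_grid_side(image_size, stem_stride, num_stage_downsamples, downsample_stride):
    """Side length of the feature grid after the stem and inter-stage downsampling."""
    total_stride = stem_stride * downsample_stride ** num_stage_downsamples
    if image_size % total_stride:
        raise ValueError("image_size is not divisible by the total stride")
    return image_size // total_stride


hidden_sizes = [48, 96, 224, 448]  # EfficientFormerConfig default
side = final_grid_side(
    image_size=224,                               # config default
    stem_stride=4,                                # assumption: stem downsamples 4x
    num_stage_downsamples=len(hidden_sizes) - 1,  # assumption: one per stage boundary
    downsample_stride=2,                          # config default
)
expected_shape = [1, side * side, hidden_sizes[-1]]
print(expected_shape)  # [1, 49, 448]
```

Under these assumptions the 224x224 input ends up as a 7x7 grid, which lines up with the default resolution of 7, and the last entry of hidden_sizes gives the hidden size of 448.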

Transformers 4.37 Chinese Documentation (67) (4): https://developer.aliyun.com/article/1564114
