Transformers 4.37 中文文档（六十四）（5）-阿里云开发者社区

Transformers 4.37 中文文档（六十四）（4）https://developer.aliyun.com/article/1564126

Big Transfer (BiT)

原文链接：huggingface.co/docs/transformers/v4.37.2/en/model_doc/bit

概述

BiT 模型是由 Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby 在 Big Transfer (BiT): General Visual Representation Learning 中提出的。BiT 是一个简单的方法，用于扩大类似于 ResNet（具体来说是 ResNetv2）的架构的预训练。该方法在迁移学习方面取得了显著的改进。

该论文的摘要如下：

预训练表示的迁移提高了样本效率，并简化了训练视觉深度神经网络时的超参数调整。我们重新审视了在大型监督数据集上进行预训练并在目标任务上微调模型的范例。我们扩大了预训练规模，并提出了一个我们称之为 Big Transfer (BiT) 的简单方法。通过组合几个精心选择的组件，并使用简单的启发式方法进行迁移，我们在超过 20 个数据集上取得了强大的性能。BiT 在各种数据范例中表现良好 — 从每类 1 个示例到总共 1M 个示例。BiT 在 ILSVRC-2012 上达到了 87.5% 的 top-1 准确率，在 CIFAR-10 上达到了 99.4%，在 19 个任务的 Visual Task Adaptation Benchmark (VTAB) 上达到了 76.3%。在小数据集上，BiT 在每类 10 个示例的情况下，ILSVRC-2012 达到了 76.8%，在 CIFAR-10 上达到了 97.0%。我们对导致高迁移性能的主要组件进行了详细分析。

此模型由 nielsr 贡献。原始代码可在此处找到。

使用提示

BiT 模型在架构上等同于 ResNetv2，除了：1）所有批量归一化层都被 group normalization 取代，2）卷积层使用 weight standardization。作者表明，这两者的结合对于使用大批量大小进行训练是有用的，并且对迁移学习有显著影响。

资源

一系列官方 Hugging Face 和社区（由 🌎 表示）资源，可帮助您开始使用 BiT。

图像分类

BitForImageClassification 受此示例脚本和笔记本支持。
另请参阅：图像分类任务指南

如果您有兴趣提交资源以包含在此处，请随时打开 Pull Request，我们将进行审查！资源应该理想地展示一些新内容，而不是重复现有资源。

BitConfig

`class transformers.BitConfig`

< source >

( num_channels = 3 embedding_size = 64 hidden_sizes = [256, 512, 1024, 2048] depths = [3, 4, 6, 3] layer_type = 'preactivation' hidden_act = 'relu' global_padding = None num_groups = 32 drop_path_rate = 0.0 embedding_dynamic_padding = False output_stride = 32 width_factor = 1 out_features = None out_indices = None **kwargs )

参数

num_channels (int, optional, defaults to 3) — 输入通道数。
embedding_size (int, optional, defaults to 64) — 嵌入层的维度（隐藏大小）。
hidden_sizes (List[int], optional, defaults to [256, 512, 1024, 2048]) — 每个阶段的维度（隐藏大小）。
depths (List[int], optional, defaults to [3, 4, 6, 3]) — 每个阶段的深度（层数）。
layer_type (str, optional, defaults to "preactivation") — 要使用的层，可以是 "preactivation" 或 "bottleneck"。
hidden_act (str, optional, defaults to "relu") — 每个块中的非线性激活函数。如果是字符串，支持 "gelu", "relu", "selu" 和 "gelu_new"。
global_padding (str, 可选) — 用于卷积层的填充策略。可以是 "valid"、"same" 或 None。
num_groups (int, 可选, 默认为 32) — 用于 BitGroupNormActivation 层的组数。
drop_path_rate (float, 可选, 默认为 0.0) — 随机深度的丢弃路径率。
embedding_dynamic_padding (bool, 可选, 默认为 False) — 是否使用动态填充来进行嵌入层。
output_stride (int, 可选, 默认为 32) — 模型的输出步幅。
width_factor (int, 可选, 默认为 1) — 模型的宽度因子。
out_features (List[str], 可选) — 如果用作主干，要输出的特征列表。可以是 "stem"、"stage1"、"stage2" 等（取决于模型有多少阶段）。如果未设置且设置了 out_indices，将默认为相应的阶段。如果未设置且未设置 out_indices，将默认为最后一个阶段。必须按照 stage_names 属性中定义的顺序。
out_indices (List[int], 可选) — 如果用作主干，要输出的特征的索引列表。可以是 0、1、2 等（取决于模型有多少阶段）。如果未设置且设置了 out_features，将默认为相应的阶段。如果未设置且未设置 out_features，将默认为最后一个阶段。必须按照 stage_names 属性中定义的顺序。

这是一个配置类，用于存储 BitModel 的配置。根据指定的参数实例化一个 BiT 模型，定义模型架构。使用默认值实例化配置将产生类似于 BiT google/bit-50 架构的配置。

配置对象继承自 PretrainedConfig，可用于控制模型输出。阅读 PretrainedConfig 的文档以获取更多信息。

示例：

>>> from transformers import BitConfig, BitModel
>>> # Initializing a BiT bit-50 style configuration
>>> configuration = BitConfig()
>>> # Initializing a model (with random weights) from the bit-50 style configuration
>>> model = BitModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config

BitImageProcessor

`class transformers.BitImageProcessor`

< source >

( do_resize: bool = True size: Dict = None resample: Resampling = <Resampling.BICUBIC: 3> do_center_crop: bool = True crop_size: Dict = None do_rescale: bool = True rescale_factor: Union = 0.00392156862745098 do_normalize: bool = True image_mean: Union = None image_std: Union = None do_convert_rgb: bool = True **kwargs )

参数

do_resize (bool, 可选, 默认为 True) — 是否将图像的（高度，宽度）维度调整为指定的 size。可以被 preprocess 方法中的 do_resize 覆盖。
size (Dict[str, int] 可选, 默认为 {"shortest_edge" -- 224}): 调整大小后的图像大小。图像的最短边被调整为 size[“shortest_edge”]，最长边被调整以保持输入的纵横比。可以被 preprocess 方法中的 size 覆盖。
resample (PILImageResampling, 可选, 默认为 PILImageResampling.BICUBIC) — 如果调整图像大小，则使用的重采样滤波器。可以被 preprocess 方法中的 resample 覆盖。
do_center_crop (bool, 可选, 默认为 True) — 是否将图像居中裁剪到指定的 crop_size。可以被 preprocess 方法中的 do_center_crop 覆盖。
crop_size (Dict[str, int] 可选, 默认为 224) — 应用 center_crop 后的输出图像大小。可以被 preprocess 方法中的 crop_size 覆盖。
do_rescale (bool, 可选, 默认为 True) — 是否按照指定的比例 rescale_factor 重新缩放图像。可以被 preprocess 方法中的 do_rescale 覆盖。
rescale_factor (int 或 float, 可选, 默认为 1/255) — 如果重新调整图像，则使用的比例因子。可以被 preprocess 方法中的 rescale_factor 覆盖。do_normalize — 是否对图像进行归一化。可以被 preprocess 方法中的 do_normalize 覆盖。
image_mean (float 或 List[float], optional, defaults to OPENAI_CLIP_MEAN) — 如果归一化图像，则使用的均值。这是一个浮点数或与图像中通道数相同长度的浮点数列表。可以被 preprocess 方法中的 image_mean 参数覆盖。
image_std (float 或 List[float], optional, defaults to OPENAI_CLIP_MEAN) — 如果归一化图像，则使用的标准差。这是一个浮点数或与图像中通道数相同长度的浮点数列表。可以被 preprocess 方法中的 image_std 参数覆盖。可以被 preprocess 方法中的 image_std 参数覆盖。
do_convert_rgb (bool, optional, defaults to True) — 是否将图像转换为 RGB。

构造一个 BiT 图像处理器。

`preprocess`

< source >

( images: Union do_resize: bool = None size: Dict = None resample: Resampling = None do_center_crop: bool = None crop_size: int = None do_rescale: bool = None rescale_factor: float = None do_normalize: bool = None image_mean: Union = None image_std: Union = None do_convert_rgb: bool = None return_tensors: Union = None data_format: Optional = <ChannelDimension.FIRST: 'channels_first'> input_data_format: Union = None **kwargs )

参数

images (ImageInput) — 要预处理的图像。期望单个或批量图像，像素值范围为 0 到 255。如果传入像素值在 0 到 1 之间的图像，请设置 do_rescale=False。
do_resize (bool, optional, defaults to self.do_resize) — 是否调整图像大小。
size (Dict[str, int], optional, defaults to self.size) — 调整大小后的图像尺寸。图像的最短边被调整为 size[“shortest_edge”]，最长边被调整以保持输入的长宽比。
resample (int, optional, defaults to self.resample) — 如果调整图像大小，则使用的重采样滤波器。这可以是枚举 PILImageResampling 之一。仅在 do_resize 设置为 True 时有效。
do_center_crop (bool, optional, defaults to self.do_center_crop) — 是否对图像进行中心裁剪。
crop_size (Dict[str, int], optional, defaults to self.crop_size) — 中心裁剪的尺寸。仅在 do_center_crop 设置为 True 时有效。
do_rescale (bool, optional, defaults to self.do_rescale) — 是否重新缩放图像。
rescale_factor (float, optional, defaults to self.rescale_factor) — 如果 do_rescale 设置为 True，则重新缩放图像的重新缩放因子。
do_normalize (bool, optional, defaults to self.do_normalize) — 是否对图像进行归一化。
image_mean (float 或 List[float], optional, defaults to self.image_mean) — 用于归一化的图像均值。仅在 do_normalize 设置为 True 时有效。
image_std (float 或 List[float], optional, defaults to self.image_std) — 用于归一化的图像标准差。仅在 do_normalize 设置为 True 时有效。
do_convert_rgb (bool, optional, defaults to self.do_convert_rgb) — 是否将图像转换为 RGB。
return_tensors (str 或 TensorType, optional) — 要返回的张量类型。可以是以下之一：

未设置：返回一个np.ndarray列表。
TensorType.TENSORFLOW 或 'tf'：返回类型为 tf.Tensor 的批处理。
TensorType.PYTORCH 或 'pt'：返回类型为 torch.Tensor 的批处理。
TensorType.NUMPY 或 'np'：返回类型为 np.ndarray 的批处理。
TensorType.JAX 或 'jax'：返回类型为 jax.numpy.ndarray 的批处理。

data_format (ChannelDimension 或 str, optional, defaults to ChannelDimension.FIRST) — 输出图像的通道维度格式。可以是以下之一：

"channels_first" 或 ChannelDimension.FIRST：图像以 (num_channels, height, width) 格式。
"channels_last" 或 ChannelDimension.LAST：图像以 (height, width, num_channels) 格式。
未设置：使用输入图像的通道维度格式。

input_data_format (ChannelDimension 或 str, optional) — 输入图像的通道维度格式。如果未设置，则从输入图像推断通道维度格式。可以是以下之一：

"channels_first" 或 ChannelDimension.FIRST：图像以 (num_channels, height, width) 格式。
"channels_last" 或 ChannelDimension.LAST：图像以 (height, width, num_channels) 格式。
"none"或ChannelDimension.NONE：图像以（高度，宽度）格式。

预处理图像或图像批次。

BitModel

`class transformers.BitModel`

<来源>

( config )

参数

config (BitConfig) — 模型的所有参数的模型配置类。使用配置文件初始化不会加载与模型相关的权重，只加载配置。查看 from_pretrained()方法以加载模型权重。

裸 BiT 模型输出原始特征，没有特定的头部。这个模型是 PyTorch torch.nn.Module子类。将其用作常规 PyTorch 模块，并参考 PyTorch 文档以获取有关一般用法和行为的所有相关信息。

`forward`

<来源>

( pixel_values: Tensor output_hidden_states: Optional = None return_dict: Optional = None ) → export const metadata = 'undefined';transformers.modeling_outputs.BaseModelOutputWithPoolingAndNoAttention or tuple(torch.FloatTensor)

参数

pixel_values (torch.FloatTensor，形状为(batch_size, num_channels, height, width)) — 像素值。可以使用 AutoImageProcessor 获取像素值。有关详细信息，请参阅 BitImageProcessor.call()。
output_hidden_states (bool，可选) — 是否返回所有层的隐藏状态。有关更多详细信息，请查看返回张量下的hidden_states。
return_dict (bool, 可选) — 是否返回 ModelOutput 而不是普通元组。

transformers.modeling_outputs.BaseModelOutputWithPoolingAndNoAttention或tuple(torch.FloatTensor)

一个transformers.modeling_outputs.BaseModelOutputWithPoolingAndNoAttention或一个torch.FloatTensor元组（如果传递return_dict=False或config.return_dict=False时）包含各种元素，具体取决于配置(BitConfig)和输入。

last_hidden_state (torch.FloatTensor，形状为(batch_size, num_channels, height, width)) — 模型最后一层的隐藏状态序列。
pooler_output (torch.FloatTensor，形状为(batch_size, hidden_size)) — 在空间维度上进行池化操作后的最后一层隐藏状态。
hidden_states (tuple(torch.FloatTensor)，可选，当传递output_hidden_states=True或config.output_hidden_states=True时返回) — 形状为(batch_size, num_channels, height, width)的torch.FloatTensor元组。
模型在每一层输出的隐藏状态以及可选的初始嵌入输出。

BitModel 的前向方法，覆盖__call__特殊方法。

虽然前向传递的步骤需要在此函数内定义，但应该在此之后调用Module实例，而不是在此处调用，因为前者会处理运行前后处理步骤，而后者会默默地忽略它们。

示例：

>>> from transformers import AutoImageProcessor, BitModel
>>> import torch
>>> from datasets import load_dataset
>>> dataset = load_dataset("huggingface/cats-image")
>>> image = dataset["test"]["image"][0]
>>> image_processor = AutoImageProcessor.from_pretrained("google/bit-50")
>>> model = BitModel.from_pretrained("google/bit-50")
>>> inputs = image_processor(image, return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(**inputs)
>>> last_hidden_states = outputs.last_hidden_state
>>> list(last_hidden_states.shape)
[1, 2048, 7, 7]

BitForImageClassification

`class transformers.BitForImageClassification`

<来源>

( config )

参数

config（BitConfig） — 具有模型所有参数的模型配置类。使用配置文件初始化不会加载与模型相关的权重，只会加载配置。查看 from_pretrained()方法以加载模型权重。

BiT 模型，顶部带有图像分类头部（在池化特征之上的线性层），例如用于 ImageNet。

此模型是 PyTorch torch.nn.Module子类。将其用作常规的 PyTorch 模块，并参考 PyTorch 文档以获取有关一般用法和行为的所有相关信息。

`forward`

< source >

( pixel_values: Optional = None labels: Optional = None output_hidden_states: Optional = None return_dict: Optional = None ) → export const metadata = 'undefined';transformers.modeling_outputs.ImageClassifierOutputWithNoAttention or tuple(torch.FloatTensor)

参数

pixel_values (torch.FloatTensor，形状为(batch_size, num_channels, height, width)） — 像素值。像素值可以使用 AutoImageProcessor 获取。有关详细信息，请参阅 BitImageProcessor.call()。
output_hidden_states (bool，可选) — 是否返回所有层的隐藏状态。有关更多详细信息，请查看返回张量下的hidden_states。
return_dict (bool，可选) — 是否返回一个 ModelOutput 而不是一个普通元组。
labels (torch.LongTensor，形状为(batch_size,)，可选） — 用于计算图像分类/回归损失的标签。索引应在[0, ..., config.num_labels - 1]范围内。如果config.num_labels > 1，则计算分类损失（交叉熵）。

transformers.modeling_outputs.ImageClassifierOutputWithNoAttention 或tuple(torch.FloatTensor)

一个 transformers.modeling_outputs.ImageClassifierOutputWithNoAttention 或一个torch.FloatTensor元组（如果传递return_dict=False或者config.return_dict=False时）包含各种元素，取决于配置（BitConfig）和输入。

损失 (torch.FloatTensor，形状为(1,)，可选，当提供labels时返回) — 分类（或者如果config.num_labels==1则为回归）损失。
logits (torch.FloatTensor，形状为(batch_size, config.num_labels)） — 分类（或者如果config.num_labels==1则为回归）得分（SoftMax 之前）。
hidden_states (tuple(torch.FloatTensor)，可选，当传递output_hidden_states=True或者config.output_hidden_states=True时返回） — 形状为(batch_size, num_channels, height, width)的torch.FloatTensor元组。模型在每个阶段输出的隐藏状态（也称为特征图）。

BitForImageClassification 的前向方法，覆盖了__call__特殊方法。

虽然前向传递的步骤需要在此函数内定义，但应该在此之后调用Module实例，而不是在此处调用，因为前者会负责运行预处理和后处理步骤，而后者会默默地忽略它们。

示例：

>>> from transformers import AutoImageProcessor, BitForImageClassification
>>> import torch
>>> from datasets import load_dataset
>>> dataset = load_dataset("huggingface/cats-image")
>>> image = dataset["test"]["image"][0]
>>> image_processor = AutoImageProcessor.from_pretrained("google/bit-50")
>>> model = BitForImageClassification.from_pretrained("google/bit-50")
>>> inputs = image_processor(image, return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> # model predicts one of the 1000 ImageNet classes
>>> predicted_label = logits.argmax(-1).item()
>>> print(model.config.id2label[predicted_label])
tiger cat

ifierOutputWithNoAttention or tuple(torch.FloatTensor)

参数
+   `pixel_values` (`torch.FloatTensor`，形状为`(batch_size, num_channels, height, width)`） — 像素值。像素值可以使用 AutoImageProcessor 获取。有关详细信息，请参阅 BitImageProcessor.`call`()。
+   `output_hidden_states` (`bool`，*可选*) — 是否返回所有层的隐藏状态。有关更多详细信息，请查看返回张量下的`hidden_states`。
+   `return_dict` (`bool`，*可选*) — 是否返回一个 ModelOutput 而不是一个普通元组。
+   `labels` (`torch.LongTensor`，形状为`(batch_size,)`，*可选*） — 用于计算图像分类/回归损失的标签。索引应在`[0, ..., config.num_labels - 1]`范围内。如果`config.num_labels > 1`，则计算分类损失（交叉熵）。
返回
transformers.modeling_outputs.ImageClassifierOutputWithNoAttention 或`tuple(torch.FloatTensor)`
一个 transformers.modeling_outputs.ImageClassifierOutputWithNoAttention 或一个`torch.FloatTensor`元组（如果传递`return_dict=False`或者`config.return_dict=False`时）包含各种元素，取决于配置（BitConfig）和输入。
+   `损失` (`torch.FloatTensor`，形状为`(1,)`，*可选*，当提供`labels`时返回) — 分类（或者如果`config.num_labels==1`则为回归）损失。
+   `logits` (`torch.FloatTensor`，形状为`(batch_size, config.num_labels)`） — 分类（或者如果`config.num_labels==1`则为回归）得分（SoftMax 之前）。
+   `hidden_states` (`tuple(torch.FloatTensor)`，*可选*，当传递`output_hidden_states=True`或者`config.output_hidden_states=True`时返回） — 形状为`(batch_size, num_channels, height, width)`的`torch.FloatTensor`元组。模型在每个阶段输出的隐藏状态（也称为特征图）。
BitForImageClassification 的前向方法，覆盖了`__call__`特殊方法。
虽然前向传递的步骤需要在此函数内定义，但应该在此之后调用`Module`实例，而不是在此处调用，因为前者会负责运行预处理和后处理步骤，而后者会默默地忽略它们。
示例：
```py
>>> from transformers import AutoImageProcessor, BitForImageClassification
>>> import torch
>>> from datasets import load_dataset
>>> dataset = load_dataset("huggingface/cats-image")
>>> image = dataset["test"]["image"][0]
>>> image_processor = AutoImageProcessor.from_pretrained("google/bit-50")
>>> model = BitForImageClassification.from_pretrained("google/bit-50")
>>> inputs = image_processor(image, return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> # model predicts one of the 1000 ImageNet classes
>>> predicted_label = logits.argmax(-1).item()
>>> print(model.config.id2label[predicted_label])
tiger cat

Transformers 4.37 中文文档（六十四）（5）

Big Transfer (BiT)

概述

使用提示

资源

BitConfig

`class transformers.BitConfig`

BitImageProcessor

`class transformers.BitImageProcessor`

`preprocess`

BitModel

`class transformers.BitModel`

`forward`

BitForImageClassification

`class transformers.BitForImageClassification`

`forward`

热门文章

最新文章

相关课程

相关电子书

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

Transformers 4.37 中文文档（六十四）（5）

Big Transfer (BiT)

概述

使用提示

资源

BitConfig

class transformers.BitConfig

BitImageProcessor

class transformers.BitImageProcessor

preprocess

BitModel

class transformers.BitModel

forward

BitForImageClassification

class transformers.BitForImageClassification

forward

热门文章

最新文章

相关课程

相关电子书

`class transformers.BitConfig`

`class transformers.BitImageProcessor`

`preprocess`

`class transformers.BitModel`

`forward`

`class transformers.BitForImageClassification`

`forward`