[Computer Vision] Code demos for the four DINOv2 models, using a 28 * 28 patch grid as the example (with source code)

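Summary: DINOv2 improves the performance of its smaller models by distilling knowledge from its largest model, ViT-g, rather than training them from scratch. Distillation transfers knowledge from a larger, more complex model (the teacher) to a smaller one (the student): the student is trained to imitate the teacher's outputs and thereby inherits much of its capability. This raises the performance of the small models while keeping them efficient to run.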
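
To make the teacher/student idea concrete, here is a minimal sketch of a generic soft-target distillation loss. It is an illustration only, not the DINOv2 training recipe: the function names, the temperature parameter, and the random logits are all assumptions made for this example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with a temperature; subtracting the max keeps exp() stable."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean cross-entropy between the teacher's soft targets and the student's prediction."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature))
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(8, 16))                          # hypothetical teacher outputs
far_student = rng.normal(size=(8, 16))                             # student unrelated to the teacher
close_student = teacher_logits + 0.01 * rng.normal(size=(8, 16))   # student that nearly matches

loss_far = distillation_loss(far_student, teacher_logits)
loss_close = distillation_loss(close_student, teacher_logits)
print(loss_far > loss_close)
```

The loss shrinks as the student's output distribution approaches the teacher's, which is the sense in which the student "imitates" the teacher.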

1. ViT-S/14

import torch
import torchvision.transforms as T
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from sklearn.decomposition import PCA

patch_h = 28  # number of patches along the image height
patch_w = 28  # number of patches along the image width
feat_dim = 384 # vits14

transform = T.Compose([
    T.GaussianBlur(9, sigma=(0.1, 2.0)),
    T.Resize((patch_h * 14, patch_w * 14)),
    T.CenterCrop((patch_h * 14, patch_w * 14)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

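# source='local' loads from a local checkout of the DINOv2 repo; the first argument is the path to that directory
# eval() switches the model to inference mode
dinov2_vits14 = torch.hub.load('', 'dinov2_vits14', source='local').cuda().eval()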

features = torch.zeros(4, patch_h * patch_w, feat_dim)
imgs_tensor = torch.zeros(4, 3, patch_h * 14, patch_w * 14).cuda()

# Only slot 0 of the 4-image batch is filled; the other three slots stay all-zero
img_path = '/kaggle/input/demo-image/1 (4).png'
img = Image.open(img_path).convert('RGB')
imgs_tensor[0] = transform(img)[:3]
with torch.no_grad():
    features_dict = dinov2_vits14.forward_features(imgs_tensor)
    features = features_dict['x_norm_patchtokens']

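# Flatten to (4 * patch_h * patch_w, feat_dim) and project onto the first three principal components
features = features.reshape(4 * patch_h * patch_w, feat_dim).cpu()
pca = PCA(n_components = 3)
pca.fit(features)
pca_features = pca.transform(features)
# Min-max scale the first component to [0, 1]
pca_features[:, 0] = (pca_features[:, 0] - pca_features[:, 0].min()) / (pca_features[:, 0].max() - pca_features[:, 0].min())

# Threshold the first component to split foreground patches from background patches
pca_features_fg = pca_features[:, 0] > 0.3
pca_features_bg = ~pca_features_fg

b = np.where(pca_features_bg)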

## Foreground: re-fit PCA on the foreground patches only
pca.fit(features[pca_features_fg])
pca_features_rem = pca.transform(features[pca_features_fg])
for i in range(3):
    pca_features_rem[:, i] = (pca_features_rem[:, i] - pca_features_rem[:, i].min()) / (pca_features_rem[:, i].max() - pca_features_rem[:, i].min())
    # Alternative: scale using the mean and the std (the expression below divides by std ** 2); I found this gives a better visualization
    # pca_features_rem[:, i] = (pca_features_rem[:, i] - pca_features_rem[:, i].mean()) / (pca_features_rem[:, i].std() ** 2) + 0.5

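# Paint foreground patches with their rescaled PCA values and zero out the background
pca_features_rgb = pca_features.copy()
pca_features_rgb[pca_features_fg] = pca_features_rem
pca_features_rgb[b] = 0
pca_features_rgb = pca_features_rgb.reshape(4, patch_h, patch_w, 3)
# Show the first image only; [..., ::-1] reverses the order of the three channels
plt.imshow(pca_features_rgb[0][...,::-1])
plt.savefig('features_s14.png')
plt.show()
plt.close()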
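
The two-stage PCA above can be tried without a GPU or model weights. The sketch below is a NumPy-only stand-in under stated assumptions: PCA is computed via SVD rather than scikit-learn, the features are synthetic (two well-separated clusters), and the helper names `pca_project`, `min_max`, and `foreground_pca_rgb` are hypothetical. One caveat it makes visible: the sign of a principal component is arbitrary, so which cluster lands above the 0.3 threshold can flip between datasets.

```python
import numpy as np

def pca_project(x, n_components=3):
    """Project rows of x onto the top principal components (PCA via SVD of centered data)."""
    centered = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def min_max(v):
    """Scale a vector to [0, 1]; a constant vector maps to zeros instead of dividing by zero."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def foreground_pca_rgb(features, threshold=0.3):
    """Threshold the first component, then re-fit PCA on the foreground rows only."""
    first = min_max(pca_project(features)[:, 0])
    fg = first > threshold
    rgb = np.zeros((features.shape[0], 3))      # background rows stay black
    if fg.sum() >= 3:                           # need at least 3 rows for 3 components
        fg_proj = pca_project(features[fg])
        rgb[fg] = np.stack([min_max(c) for c in fg_proj.T], axis=1)
    return rgb, fg

# Synthetic stand-in for one image's patch tokens: 784 rows, 16 dims, two clusters
rng = np.random.default_rng(0)
features = rng.normal(size=(28 * 28, 16))
features[:300] += 10.0

rgb, fg = foreground_pca_rgb(features)
rgb_grid = rgb.reshape(28, 28, 3)               # ready for plt.imshow
print(rgb_grid.shape, int(fg.sum()))
```

The guard in `min_max` is a small addition over the inline expression used in the post, which would divide by zero if a component were constant.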

print('---s14---')
print(features)
print('---shape---')
print(features.shape)

print('---pca_features---')
print(pca_features)
print('---shape---')
print(pca_features.shape)
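
The shapes printed by these cells follow directly from the settings at the top of the section. As a quick sanity check, the arithmetic can be sketched as below; `token_grid` and `PATCH_SIZE` are hypothetical names introduced only for this example.

```python
PATCH_SIZE = 14  # pixel size of one patch in the /14 models

def token_grid(patch_h, patch_w, batch_size, feat_dim, n_components=3):
    """Derive the input size and the flattened feature shapes from the patch grid."""
    tokens_per_image = patch_h * patch_w
    rows = batch_size * tokens_per_image
    return {
        'input_hw': (patch_h * PATCH_SIZE, patch_w * PATCH_SIZE),
        'tokens_per_image': tokens_per_image,
        'features_shape': (rows, feat_dim),
        'pca_features_shape': (rows, n_components),
    }

shapes = token_grid(28, 28, batch_size=4, feat_dim=384)
for key, value in shapes.items():
    print(key, value)
```

With a 28 * 28 grid the input is 392 * 392 pixels, each image yields 784 patch tokens, and the 4-slot batch flattens to 3136 rows.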

2. ViT-B/14

patch_h = 28
patch_w = 28
feat_dim = 768

transform = T.Compose([
    T.GaussianBlur(9, sigma=(0.1, 2.0)),
    T.Resize((patch_h * 14, patch_w * 14)),
    T.CenterCrop((patch_h * 14, patch_w * 14)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

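# source='local' loads from a local checkout of the DINOv2 repo; eval() switches the model to inference mode
dinov2_vitb14 = torch.hub.load('', 'dinov2_vitb14', source='local').cuda().eval()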

features_b14 = torch.zeros(4, patch_h * patch_w, feat_dim)
imgs_tensor_b14 = torch.zeros(4, 3, patch_h * 14, patch_w * 14).cuda()

# Only slot 0 of the 4-image batch is filled; the other three slots stay all-zero
img_path = '/kaggle/input/demo-image/1 (4).png'
img = Image.open(img_path).convert('RGB')
imgs_tensor_b14[0] = transform(img)[:3]
with torch.no_grad():
    features_dict_b14 = dinov2_vitb14.forward_features(imgs_tensor_b14)
    features_b14 = features_dict_b14['x_norm_patchtokens']

features_b14 = features_b14.reshape(4 * patch_h * patch_w, feat_dim).cpu()
pca = PCA(n_components = 3)
pca.fit(features_b14)
pca_features_b14 = pca.transform(features_b14)
pca_features_b14[:, 0] = (pca_features_b14[:, 0] - pca_features_b14[:, 0].min()) / (pca_features_b14[:, 0].max() - pca_features_b14[:, 0].min())

pca_features_fg_b14 = pca_features_b14[:, 0] > 0.3
pca_features_bg_b14 = ~pca_features_fg_b14

b = np.where(pca_features_bg_b14)
pca.fit(features_b14[pca_features_fg_b14])
pca_features_rem_b14 = pca.transform(features_b14[pca_features_fg_b14])
for i in range(3):
    pca_features_rem_b14[:, i] = (pca_features_rem_b14[:, i] - pca_features_rem_b14[:, i].min()) \
    / (pca_features_rem_b14[:, i].max() - pca_features_rem_b14[:, i].min())

pca_features_rgb_b14 = pca_features_b14.copy()
pca_features_rgb_b14[pca_features_fg_b14] = pca_features_rem_b14
pca_features_rgb_b14[b] = 0
pca_features_rgb_b14 = pca_features_rgb_b14.reshape(4, patch_h, patch_w, 3)
plt.imshow(pca_features_rgb_b14[0][...,::-1])
plt.savefig('features_b14.png')
plt.show()
plt.close()
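
The `[..., ::-1]` indexing passed to `plt.imshow` reverses the last axis, so the three PCA components are displayed in reversed channel order. A tiny sketch with a made-up 2 * 2 grid (the array here is purely illustrative):

```python
import numpy as np

# Hypothetical 2 x 2 "image" whose last axis holds three PCA components per patch
grid = np.zeros((2, 2, 3))
grid[..., 0] = 1.0            # only component 0 is non-zero

flipped = grid[..., ::-1]     # same indexing as in plt.imshow(...[..., ::-1])

print(grid[0, 0])
print(flipped[0, 0])
```

After the flip, component 0 sits in the last (blue) channel and component 2 in the first (red) channel, which only changes the colors, not the spatial layout.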

print('---b14---')
print(features_b14)
print('---shape---')
print(features_b14.shape)

print('---pca_features_b14---')
print(pca_features_b14)
print('---shape---')
print(pca_features_b14.shape)

3. ViT-L/14

patch_h = 28
patch_w = 28
feat_dim = 1024

transform = T.Compose([
    T.GaussianBlur(9, sigma=(0.1, 2.0)),
    T.Resize((patch_h * 14, patch_w * 14)),
    T.CenterCrop((patch_h * 14, patch_w * 14)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

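# source='local' loads from a local checkout of the DINOv2 repo; eval() switches the model to inference mode
dinov2_vitl14 = torch.hub.load('', 'dinov2_vitl14', source='local').cuda().eval()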

features_l14 = torch.zeros(4, patch_h * patch_w, feat_dim)
imgs_tensor_l14 = torch.zeros(4, 3, patch_h * 14, patch_w * 14).cuda()

# Only slot 0 of the 4-image batch is filled; the other three slots stay all-zero
img_path = '/kaggle/input/demo-image/1 (4).png'
img = Image.open(img_path).convert('RGB')
imgs_tensor_l14[0] = transform(img)[:3]
with torch.no_grad():
    features_dict_l14 = dinov2_vitl14.forward_features(imgs_tensor_l14)
    features_l14 = features_dict_l14['x_norm_patchtokens']

features_l14 = features_l14.reshape(4 * patch_h * patch_w, feat_dim).cpu()
pca = PCA(n_components = 3)
pca.fit(features_l14)
pca_features_l14 = pca.transform(features_l14)
pca_features_l14[:, 0] = (pca_features_l14[:, 0] - pca_features_l14[:, 0].min()) \
/ (pca_features_l14[:, 0].max() - pca_features_l14[:, 0].min())

pca_features_fg_l14 = pca_features_l14[:, 0] > 0.3
pca_features_bg_l14 = ~pca_features_fg_l14

b = np.where(pca_features_bg_l14)
pca.fit(features_l14[pca_features_fg_l14])
pca_features_rem_l14 = pca.transform(features_l14[pca_features_fg_l14])
for i in range(3):
    pca_features_rem_l14[:, i] = (pca_features_rem_l14[:, i] - pca_features_rem_l14[:, i].min()) \
    / (pca_features_rem_l14[:, i].max() - pca_features_rem_l14[:, i].min())

pca_features_rgb_l14 = pca_features_l14.copy()
pca_features_rgb_l14[pca_features_fg_l14] = pca_features_rem_l14
pca_features_rgb_l14[b] = 0
pca_features_rgb_l14 = pca_features_rgb_l14.reshape(4, patch_h, patch_w, 3)
plt.imshow(pca_features_rgb_l14[0][...,::-1])
plt.savefig('features_l14.png')
plt.show()
plt.close()
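
Because the batch has four slots but only slot 0 holds a real image, it helps to know which rows of the flattened feature matrix belong to which slot. The sketch below uses a NumPy array as a stand-in for `x_norm_patchtokens`; the reduced `feat_dim` and the helper `rows_of_image` are assumptions made for illustration.

```python
import numpy as np

batch, patch_h, patch_w, feat_dim = 4, 28, 28, 8   # feat_dim shrunk for the demo
tokens = patch_h * patch_w

# Stand-in for x_norm_patchtokens with shape (batch, tokens, feat_dim)
patch_tokens = np.arange(batch * tokens * feat_dim, dtype=np.float32)
patch_tokens = patch_tokens.reshape(batch, tokens, feat_dim)

# Same flattening as features.reshape(4 * patch_h * patch_w, feat_dim)
flat = patch_tokens.reshape(batch * tokens, feat_dim)

def rows_of_image(k):
    """Rows of the flattened matrix that belong to batch slot k."""
    return slice(k * tokens, (k + 1) * tokens)

first_image = flat[rows_of_image(0)]                     # only the real image's tokens
grid = first_image.reshape(patch_h, patch_w, feat_dim)   # back to the patch grid
print(first_image.shape, grid.shape)
```

Selecting `flat[rows_of_image(0)]` before fitting PCA would be one way to keep the three empty slots from influencing the components; the post itself fits PCA on all four slots.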

print('---l14---')
print(features_l14)
print('---shape---')
print(features_l14.shape)

print('---pca_features_l14---')
print(pca_features_l14)
print('---shape---')
print(pca_features_l14.shape)

4. ViT-g/14

patch_h = 28
patch_w = 28
feat_dim = 1536

transform = T.Compose([
    T.GaussianBlur(9, sigma=(0.1, 2.0)),
    T.Resize((patch_h * 14, patch_w * 14)),
    T.CenterCrop((patch_h * 14, patch_w * 14)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

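# source='local' loads from a local checkout of the DINOv2 repo; eval() switches the model to inference mode
dinov2_vitg14 = torch.hub.load('', 'dinov2_vitg14', source='local').cuda().eval()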

features_g14 = torch.zeros(4, patch_h * patch_w, feat_dim)
imgs_tensor_g14 = torch.zeros(4, 3, patch_h * 14, patch_w * 14).cuda()

# Only slot 0 of the 4-image batch is filled; the other three slots stay all-zero
img_path = '/kaggle/input/demo-image/1 (4).png'
img = Image.open(img_path).convert('RGB')
imgs_tensor_g14[0] = transform(img)[:3]
with torch.no_grad():
    features_dict_g14 = dinov2_vitg14.forward_features(imgs_tensor_g14)
    features_g14 = features_dict_g14['x_norm_patchtokens']

features_g14 = features_g14.reshape(4 * patch_h * patch_w, feat_dim).cpu()
pca = PCA(n_components = 3)
pca.fit(features_g14)
pca_features_g14 = pca.transform(features_g14)
pca_features_g14[:, 0] = (pca_features_g14[:, 0] - pca_features_g14[:, 0].min()) \
/ (pca_features_g14[:, 0].max() - pca_features_g14[:, 0].min())

pca_features_fg_g14 = pca_features_g14[:, 0] > 0.3
pca_features_bg_g14 = ~pca_features_fg_g14

b = np.where(pca_features_bg_g14)

pca.fit(features_g14[pca_features_fg_g14])
pca_features_rem_g14 = pca.transform(features_g14[pca_features_fg_g14])
for i in range(3):
    pca_features_rem_g14[:, i] = (pca_features_rem_g14[:, i] - pca_features_rem_g14[:, i].min()) \
    / (pca_features_rem_g14[:, i].max() - pca_features_rem_g14[:, i].min())

pca_features_rgb_g14 = pca_features_g14.copy()
pca_features_rgb_g14[pca_features_fg_g14] = pca_features_rem_g14
pca_features_rgb_g14[b] = 0
pca_features_rgb_g14 = pca_features_rgb_g14.reshape(4, patch_h, patch_w, 3)
plt.imshow(pca_features_rgb_g14[0][...,::-1])
plt.savefig('features_g14.png')
plt.show()
plt.close()

print('---g14---')
print(features_g14)
print('---shape---')
print(features_g14.shape)

print('---pca_features_g14---')
print(pca_features_g14)
print('---shape---')
print(pca_features_g14.shape)
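
The four sections differ only in the hub entry name, the feature width, and the output file name. A table-driven sketch of that variation is shown below; `DINOV2_MODELS` and `run_plan` are hypothetical names for this example, and the code only builds the per-model settings rather than loading any weights.

```python
# Hub entry names and feature widths exactly as used in the four sections above
DINOV2_MODELS = {
    'dinov2_vits14': 384,
    'dinov2_vitb14': 768,
    'dinov2_vitl14': 1024,
    'dinov2_vitg14': 1536,
}

def run_plan(patch_h=28, patch_w=28, batch_size=4):
    """List the per-model settings that change from section to section."""
    plan = []
    for name, feat_dim in DINOV2_MODELS.items():
        suffix = name.rsplit('_', 1)[-1][3:]          # 'vits14' -> 's14'
        plan.append({
            'hub_name': name,
            'feat_dim': feat_dim,
            'features_shape': (batch_size * patch_h * patch_w, feat_dim),
            'output_png': f'features_{suffix}.png',
        })
    return plan

plan = run_plan()
for entry in plan:
    print(entry)
```

Looping over such a plan and reusing one feature-extraction function would avoid repeating the same cell four times with renamed variables.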
