Image Embedding

Introduction: In machine learning, image embedding is a technique for converting image data into continuous, low-dimensional vector representations. These vectors are then used in downstream tasks such as classification, clustering, and retrieval. The goal is to turn high-dimensional image data into a more tractable low-dimensional form while preserving as much of the original image information as possible. Common image-embedding approaches include:

  1. Deep-learning-based methods: a deep neural network (e.g. a convolutional or recurrent network) is trained to produce a low-dimensional representation of the image. These methods usually require large amounts of training data and compute, but tend to give the best performance.
  2. Handcrafted feature extraction: purpose-built algorithms such as SIFT, HOG, or LBP extract local features from the image, which are then combined into a low-dimensional vector. These methods are comparatively simple, but may not perform well enough for some tasks (see the HOG sketch below).

To apply an image-embedding method, the typical workflow is:

  1. Data preparation: collect and preprocess image data to provide training samples for the model.
  2. Model construction: choose a suitable embedding method and build the corresponding model.
  3. Training: fit the model on the collected data by optimizing a loss function (e.g. mean squared error or cross-entropy).
  4. Evaluation: evaluate the model on a validation set and adjust it based on the results.
  5. Application: apply the trained model to the actual task, such as image classification, clustering, or retrieval.

In short, image embedding converts images into low-dimensional vector representations that can serve a wide range of machine learning tasks. Following the steps above (data preparation, model construction, training, and evaluation), you can use image embeddings to solve practical problems.
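
A minimal sketch of the handcrafted route (approach 2 above), assuming scikit-image is installed; 'cat.jpg' is a hypothetical local image file:

import numpy as np
from skimage import io, color, transform
from skimage.feature import hog

img = io.imread('cat.jpg')                    # H x W x 3 RGB image
gray = color.rgb2gray(img)                    # HOG operates on a single channel
gray = transform.resize(gray, (128, 128))     # fixed input size -> fixed-length vector

embedding = hog(gray,
                orientations=9,
                pixels_per_cell=(8, 8),
                cells_per_block=(2, 2))       # 1-D feature vector
print(embedding.shape)                        # (8100,) for these settings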

Ch 12: Concept 02
Image embedding
The VGG-16 TensorFlow port is by Davi Frossard (http://www.cs.toronto.edu/~frossard/post/vgg16/).

Along with TensorFlow, it requires the following libraries:

$ pip install scipy
$ pip install Pillow
You will need to download the model parameters:

$ wget https://www.cs.toronto.edu/~frossard/vgg16/vgg16_weights.npz
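
Note that the listing below imports imread and imresize from scipy.misc, which were removed in newer SciPy releases. If your installed SciPy no longer provides them, a Pillow-based stand-in along these lines should work (a sketch, not part of the original port):

import numpy as np
from PIL import Image

def imread(path, mode='RGB'):
    # Read an image file into a NumPy array, mimicking scipy.misc.imread.
    return np.array(Image.open(path).convert(mode))

def imresize(arr, size):
    # Resize an image array to (height, width), mimicking scipy.misc.imresize.
    return np.array(Image.fromarray(arr).resize(size[::-1]))  # PIL expects (width, height)
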
########################################################################################
# Davi Frossard, 2016                                                                  #
# VGG16 implementation in TensorFlow                                                   #
# Details:                                                                             #
# http://www.cs.toronto.edu/~frossard/post/vgg16/                                      #
#                                                                                      #
# Model from https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-readme-md     #
# Weights from Caffe converted using https://github.com/ethereon/caffe-tensorflow      #
########################################################################################

%matplotlib inline
from matplotlib import pyplot as plt

import tensorflow as tf
import numpy as np
from scipy.misc import imread, imresize  # removed in newer SciPy releases; see the Pillow-based stand-in above
from imagenet_classes import class_names


class vgg16:
    def __init__(self, imgs, weights=None, sess=None):
        self.imgs = imgs
        tf.summary.image("imgs", self.imgs)
        self.convlayers()
        self.fc_layers()
        tf.summary.histogram("fc2", self.fc2)
        self.probs = tf.nn.softmax(self.fc3l)
        if weights is not None and sess is not None:
            self.load_weights(weights, sess)


    def convlayers(self):
        self.parameters = []

        # zero-mean input
        with tf.name_scope('preprocess') as scope:
            mean = tf.constant([123.68, 116.779, 103.939], dtype=tf.float32, shape=[1, 1, 1, 3], name='img_mean')
            images = self.imgs-mean

        # conv1_1
        with tf.name_scope('conv1_1') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 3, 64], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv1_1 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # conv1_2
        with tf.name_scope('conv1_2') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 64, 64], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.conv1_1, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv1_2 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # pool1
        self.pool1 = tf.nn.max_pool(self.conv1_2,
                               ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1],
                               padding='SAME',
                               name='pool1')

        # conv2_1
        with tf.name_scope('conv2_1') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 64, 128], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.pool1, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv2_1 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # conv2_2
        with tf.name_scope('conv2_2') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 128, 128], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.conv2_1, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv2_2 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # pool2
        self.pool2 = tf.nn.max_pool(self.conv2_2,
                               ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1],
                               padding='SAME',
                               name='pool2')

        # conv3_1
        with tf.name_scope('conv3_1') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 128, 256], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.pool2, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv3_1 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # conv3_2
        with tf.name_scope('conv3_2') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.conv3_1, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv3_2 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # conv3_3
        with tf.name_scope('conv3_3') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.conv3_2, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv3_3 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # pool3
        self.pool3 = tf.nn.max_pool(self.conv3_3,
                               ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1],
                               padding='SAME',
                               name='pool3')

        # conv4_1
        with tf.name_scope('conv4_1') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 512], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.pool3, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv4_1 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # conv4_2
        with tf.name_scope('conv4_2') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 512, 512], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.conv4_1, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv4_2 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # conv4_3
        with tf.name_scope('conv4_3') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 512, 512], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.conv4_2, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv4_3 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # pool4
        self.pool4 = tf.nn.max_pool(self.conv4_3,
                               ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1],
                               padding='SAME',
                               name='pool4')

        # conv5_1
        with tf.name_scope('conv5_1') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 512, 512], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.pool4, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv5_1 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # conv5_2
        with tf.name_scope('conv5_2') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 512, 512], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.conv5_1, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv5_2 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # conv5_3
        with tf.name_scope('conv5_3') as scope:
            kernel = tf.Variable(tf.truncated_normal([3, 3, 512, 512], dtype=tf.float32,
                                                     stddev=1e-1), name='weights')
            conv = tf.nn.conv2d(self.conv5_2, kernel, [1, 1, 1, 1], padding='SAME')
            biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32),
                                 trainable=True, name='biases')
            out = tf.nn.bias_add(conv, biases)
            self.conv5_3 = tf.nn.relu(out, name=scope)
            self.parameters += [kernel, biases]

        # pool5
        self.pool5 = tf.nn.max_pool(self.conv5_3,
                               ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1],
                               padding='SAME',
                               name='pool5')

    def fc_layers(self):
        # fc1
        with tf.name_scope('fc1') as scope:
            shape = int(np.prod(self.pool5.get_shape()[1:]))
            fc1w = tf.Variable(tf.truncated_normal([shape, 4096],
                                                         dtype=tf.float32,
                                                         stddev=1e-1), name='weights')
            fc1b = tf.Variable(tf.constant(1.0, shape=[4096], dtype=tf.float32),
                                 trainable=True, name='biases')
            pool5_flat = tf.reshape(self.pool5, [-1, shape])
            fc1l = tf.nn.bias_add(tf.matmul(pool5_flat, fc1w), fc1b)
            self.fc1 = tf.nn.relu(fc1l)
            self.parameters += [fc1w, fc1b]

        # fc2
        with tf.name_scope('fc2') as scope:
            fc2w = tf.Variable(tf.truncated_normal([4096, 4096],
                                                         dtype=tf.float32,
                                                         stddev=1e-1), name='weights')
            fc2b = tf.Variable(tf.constant(1.0, shape=[4096], dtype=tf.float32),
                                 trainable=True, name='biases')
            fc2l = tf.nn.bias_add(tf.matmul(self.fc1, fc2w), fc2b)
            self.fc2 = tf.nn.relu(fc2l)
            self.parameters += [fc2w, fc2b]

        # fc3
        with tf.name_scope('fc3') as scope:
            fc3w = tf.Variable(tf.truncated_normal([4096, 1000],
                                                         dtype=tf.float32,
                                                         stddev=1e-1), name='weights')
            fc3b = tf.Variable(tf.constant(1.0, shape=[1000], dtype=tf.float32),
                                 trainable=True, name='biases')
            self.fc3l = tf.nn.bias_add(tf.matmul(self.fc2, fc3w), fc3b)
            self.parameters += [fc3w, fc3b]

    def load_weights(self, weight_file, sess):
        weights = np.load(weight_file)
        keys = sorted(weights.keys())
        for i, k in enumerate(keys):
            print(i, k, np.shape(weights[k]))
            sess.run(self.parameters[i].assign(weights[k]))

if __name__ == '__main__':
    sess = tf.Session()

    imgs = tf.placeholder(tf.float32, [None, 224, 224, 3])

    print('Loading model...')
    vgg = vgg16(imgs, 'vgg16_weights.npz', sess)
    print('Done loading!')

    my_summaries = tf.summary.merge_all()
    my_writer = tf.summary.FileWriter('tb_files', sess.graph)

    img1 = imread('laska.png', mode='RGB')
    img1 = imresize(img1, (224, 224))

    plt.imshow(img1)
    plt.title('Input 224x224 image')
    plt.show()

    prob, fc2_val, my_summaries_protobuf = sess.run([vgg.probs, vgg.fc2, my_summaries], feed_dict={vgg.imgs: [img1]})
    prob = prob[0]
    my_writer.add_summary(my_summaries_protobuf)

    num_dimensions = np.shape(fc2_val)[1]
    plt.bar(range(num_dimensions), fc2_val[0], align='center')
    plt.title('{}-dimensional representation of image'.format(num_dimensions))
    plt.show()

    print('Top 5 predictions of VGG-16 model:')
    preds = (np.argsort(prob)[::-1])[0:5]
    for idx, p in enumerate(preds):
        print('{}. {} ({})'.format(idx + 1, class_names[p], prob[p]))
    sess.close()

Loading model...
0 conv1_1_W (3, 3, 3, 64)
1 conv1_1_b (64,)
2 conv1_2_W (3, 3, 64, 64)
3 conv1_2_b (64,)
4 conv2_1_W (3, 3, 64, 128)
5 conv2_1_b (128,)
6 conv2_2_W (3, 3, 128, 128)
7 conv2_2_b (128,)
8 conv3_1_W (3, 3, 128, 256)
9 conv3_1_b (256,)
10 conv3_2_W (3, 3, 256, 256)
11 conv3_2_b (256,)
12 conv3_3_W (3, 3, 256, 256)
13 conv3_3_b (256,)
14 conv4_1_W (3, 3, 256, 512)
15 conv4_1_b (512,)
16 conv4_2_W (3, 3, 512, 512)
17 conv4_2_b (512,)
18 conv4_3_W (3, 3, 512, 512)
19 conv4_3_b (512,)
20 conv5_1_W (3, 3, 512, 512)
21 conv5_1_b (512,)
22 conv5_2_W (3, 3, 512, 512)
23 conv5_2_b (512,)
24 conv5_3_W (3, 3, 512, 512)
25 conv5_3_b (512,)
26 fc6_W (25088, 4096)
27 fc6_b (4096,)
28 fc7_W (4096, 4096)
29 fc7_b (4096,)
30 fc8_W (4096, 1000)
31 fc8_b (1000,)
Done loading!


Top 5 predictions of VGG-16 model:
1. weasel (0.6933859586715698)
2. polecat, fitch, foulmart, foumart, Mustela putorius (0.1753876656293869)
3. mink (0.12208586186170578)
4. black-footed ferret, ferret, Mustela nigripes (0.008870664052665234)
5. otter (0.00012108328519389033)
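
The 4096-dimensional fc2 activation is the image embedding itself; once several images have been pushed through the network, their fc2 vectors can be compared directly, e.g. for retrieval. A minimal sketch of cosine-similarity ranking, where gallery and query are illustrative names standing in for fc2 vectors collected via sess.run(vgg.fc2, ...):

import numpy as np

def cosine_similarity(query, gallery):
    # Cosine similarity between one query vector and each row of a gallery matrix.
    query = query / (np.linalg.norm(query) + 1e-12)
    gallery = gallery / (np.linalg.norm(gallery, axis=1, keepdims=True) + 1e-12)
    return gallery @ query

# Illustrative usage: 'gallery' would hold fc2 vectors of previously embedded images
# and 'query' would be fc2_val[0] computed above; random data stands in here.
gallery = np.random.rand(10, 4096).astype(np.float32)
query = np.random.rand(4096).astype(np.float32)

sims = cosine_similarity(query, gallery)
ranked = np.argsort(sims)[::-1]   # indices of the most similar gallery images first
print(ranked[:5])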