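This VGG16 image classifier is implemented in TensorFlow and consists of four programs:
- vgg16.py: loads the model parameters and builds the model
- utils.py: reads the image and displays the probabilities
- nclasses.py: contains the labels dictionary
- app.py: the application, which performs the image recognition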
1. vgg16.py: building the model
The program is structured as follows:
(1) __init__
Loads the network parameters into data_dict.
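    def __init__(self, vgg16_path=None):
        if vgg16_path is None:
            # Default to vgg16.npy in the current working directory
            vgg16_path = os.path.join(os.getcwd(), "vgg16.npy")
        # The file stores a pickled dict; NumPy >= 1.16.3 needs allow_pickle=True to load it
        self.data_dict = np.load(vgg16_path, encoding='latin1', allow_pickle=True).item()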
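The dictionary keys are listed below. They correspond to the weights W and biases b of the 13 convolutional layers and the 3 fully connected layers.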
['conv1_1_W', 'conv1_1_b', 'conv1_2_W', 'conv1_2_b', 'conv2_1_W', 'conv2_1_b', 'conv2_2_W', 'conv2_2_b', 'conv3_1_W', 'conv3_1_b', 'conv3_2_W', 'conv3_2_b', 'conv3_3_W', 'conv3_3_b', 'conv4_1_W', 'conv4_1_b', 'conv4_2_W', 'conv4_2_b', 'conv4_3_W', 'conv4_3_b', 'conv5_1_W', 'conv5_1_b', 'conv5_2_W', 'conv5_2_b', 'conv5_3_W', 'conv5_3_b', 'fc6_W', 'fc6_b', 'fc7_W', 'fc7_b', 'fc8_W', 'fc8_b']
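To make the loading step concrete, here is a minimal sketch of how a parameter dictionary survives a round trip through a .npy file. It does not touch the real vgg16.npy: `fake_params`, `save_and_load`, and the file name `toy_params.npy` are hypothetical stand-ins, and the arrays are zeros with the shapes of the first two convolutional layers.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in parameters (zeros), NOT the real VGG16 weights.
fake_params = {
    "conv1_1_W": np.zeros((3, 3, 3, 64), dtype=np.float32),
    "conv1_1_b": np.zeros((64,), dtype=np.float32),
    "conv1_2_W": np.zeros((3, 3, 64, 64), dtype=np.float32),
    "conv1_2_b": np.zeros((64,), dtype=np.float32),
}


def save_and_load(params):
    """Round-trip a dict through a .npy file and return the loaded dict."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_params.npy")
        # np.save wraps the dict in a 0-d object array and pickles it.
        np.save(path, params)
        loaded = np.load(path, encoding="latin1", allow_pickle=True)
        # .item() unwraps the 0-d object array back into a plain dict.
        return loaded.item()


data_dict = save_and_load(fake_params)
for key in sorted(data_dict):
    print(key, data_dict[key].shape)
```

Printing the keys and shapes this way is also a quick check that a downloaded parameter file matches the layer names the model code expects.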
(2) forward
Reproduces the network structure.
    def forward(self, images):
        rgb_scaled = images * 255.0
        # Convert RGB to BGR and subtract the per-channel means
        red, green, blue = tf.split(rgb_scaled, 3, 3)
        bgr = tf.concat([
            blue - VGG_MEAN[0],
            green - VGG_MEAN[1],
            red - VGG_MEAN[2]], 3)

        self.conv1_1 = self.conv_layer(bgr, "conv1_1")
        self.conv1_2 = self.conv_layer(self.conv1_1, "conv1_2")
        self.pool1 = self.max_pool_2x2(self.conv1_2, "pool1")

        self.conv2_1 = self.conv_layer(self.pool1, "conv2_1")
        self.conv2_2 = self.conv_layer(self.conv2_1, "conv2_2")
        self.pool2 = self.max_pool_2x2(self.conv2_2, "pool2")

        self.conv3_1 = self.conv_layer(self.pool2, "conv3_1")
        self.conv3_2 = self.conv_layer(self.conv3_1, "conv3_2")
        self.conv3_3 = self.conv_layer(self.conv3_2, "conv3_3")
        self.pool3 = self.max_pool_2x2(self.conv3_3, "pool3")

        self.conv4_1 = self.conv_layer(self.pool3, "conv4_1")
        self.conv4_2 = self.conv_layer(self.conv4_1, "conv4_2")
        self.conv4_3 = self.conv_layer(self.conv4_2, "conv4_3")
        self.pool4 = self.max_pool_2x2(self.conv4_3, "pool4")

        self.conv5_1 = self.conv_layer(self.pool4, "conv5_1")
        self.conv5_2 = self.conv_layer(self.conv5_1, "conv5_2")
        self.conv5_3 = self.conv_layer(self.conv5_2, "conv5_3")
        self.pool5 = self.max_pool_2x2(self.conv5_3, "pool5")

        self.fc6 = self.fc_layer(self.pool5, "fc6")
        self.relu6 = tf.nn.relu(self.fc6)

        self.fc7 = self.fc_layer(self.relu6, "fc7")
        self.relu7 = tf.nn.relu(self.fc7)

        self.fc8 = self.fc_layer(self.relu7, "fc8")
        self.prob = tf.nn.softmax(self.fc8, name="prob")

        # Release the parameter dictionary once the graph is built
        self.data_dict = None
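Note: the image must be converted from RGB to BGR. This is mainly because OpenCV uses BGR as its default channel order, a legacy of compatibility with certain hardware, and the pretrained parameters expect that ordering.
- RGB stands for red, green, blue: R is in the high-order position, G in the middle, and B in the low-order position.
- BGR holds the same channels with the order reversed.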
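The channel swap and mean subtraction done at the top of forward can be sketched in plain NumPy. The values of `VGG_MEAN` are not shown in this post, so the ones below are an assumption (the commonly used per-channel means in B, G, R order), and `rgb_to_centered_bgr` is a hypothetical helper name.

```python
import numpy as np

# Assumed per-channel means in B, G, R order; VGG_MEAN is not shown in the post.
VGG_MEAN = [103.939, 116.779, 123.68]


def rgb_to_centered_bgr(images):
    """images: float array of shape (N, H, W, 3), RGB order, values in [0, 1]."""
    rgb_scaled = images * 255.0
    # Reversing the last axis turns (R, G, B) into (B, G, R).
    bgr = rgb_scaled[..., ::-1]
    # Broadcasting subtracts one mean per channel along the last axis.
    return bgr - np.asarray(VGG_MEAN)


# A single RGB pixel: full red, half green, no blue.
pixel = np.array([[[[1.0, 0.5, 0.0]]]])
centered = rgb_to_centered_bgr(pixel)
print(centered.shape, centered[0, 0, 0])
```

Reversing the last axis is equivalent to the split and concat used in forward, as long as the means are listed in the same B, G, R order.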
Convolutional layer
    def conv_layer(self, x, name):
        with tf.variable_scope(name):
            # Load the pretrained filter for this layer
            w = self.get_conv_filter(name)
            # Stride 1 with SAME padding keeps the spatial size unchanged
            conv = tf.nn.conv2d(x, w, [1, 1, 1, 1], padding='SAME')
            conv_biases = self.get_bias(name)
            result = tf.nn.relu(tf.nn.bias_add(conv, conv_biases))
            return result
Pooling layer
    def max_pool_2x2(self, x, name):
        # 2x2 window with stride 2 halves the height and width
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
Fully connected layer
    def fc_layer(self, x, name):
        with tf.variable_scope(name):
            shape = x.get_shape().as_list()
            # Product of all dimensions except the batch dimension
            dim = 1
            for i in shape[1:]:
                dim *= i
            # Flatten each sample into a vector of length dim
            x = tf.reshape(x, [-1, dim])
            w = self.get_fc_weight(name)
            b = self.get_bias(name)
            result = tf.nn.bias_add(tf.matmul(x, w), b)
            return result
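The flattening in fc_layer is easier to follow with the feature-map sizes written out. The sketch below walks a 224 × 224 input through the five pooling stages and computes the length of the vector fed to fc6. The channel counts are those of the standard VGG16 layout and are not stated in this post, and the helper names are hypothetical.

```python
import math


def same_pool_out(size, stride=2):
    """Spatial size after a pooling layer with the given stride and SAME padding."""
    return math.ceil(size / stride)


def vgg16_feature_shapes(size=224):
    """Return the (height, width, channels) shape after each of the five pools."""
    # Output channels of the five conv blocks in the standard VGG16 layout (assumed).
    channels = [64, 128, 256, 512, 512]
    shapes = []
    for c in channels:
        # Stride-1 SAME convolutions keep the spatial size,
        # so only the pool at the end of each block shrinks it.
        size = same_pool_out(size)
        shapes.append((size, size, c))
    return shapes


shapes = vgg16_feature_shapes()
h, w, c = shapes[-1]
fc6_input_dim = h * w * c  # the value of dim computed in fc_layer for pool5
for name, shape in zip(["pool1", "pool2", "pool3", "pool4", "pool5"], shapes):
    print(name, shape)
print("fc6 input dim:", fc6_input_dim)
```

Under these assumptions pool5 has shape 7 × 7 × 512, so each image is flattened into a vector of 25088 values before the first matrix multiplication.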
2. utils.py: image preprocessing
Converts the image into a tensor of shape 1 × 224 × 224 × 3.
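The post does not show the code of utils.py, so the following is only a sketch of one way to reach that shape, not the author's implementation. It assumes the image is already decoded into a uint8 array, center-crops it to a square, resizes it with nearest-neighbour sampling, and scales it to [0, 1] (forward multiplies by 255 again). The name `center_crop_and_resize` is hypothetical.

```python
import numpy as np


def center_crop_and_resize(img, size=224):
    """img: uint8 array (H, W, 3). Returns float32 array (1, size, size, 3) in [0, 1]."""
    h, w = img.shape[:2]
    short = min(h, w)
    # Take the largest centered square.
    top = (h - short) // 2
    left = (w - short) // 2
    crop = img[top:top + short, left:left + short]
    # Nearest-neighbour resize: choose one source row/column per output pixel.
    idx = np.arange(size) * short // size
    resized = crop[idx][:, idx]
    # Scale to [0, 1] and add the batch dimension.
    scaled = resized.astype(np.float32) / 255.0
    return scaled.reshape(1, size, size, 3)


# A synthetic all-white 300 x 400 image stands in for a decoded photo.
img_ready = center_crop_and_resize(np.full((300, 400, 3), 255, dtype=np.uint8))
print(img_ready.shape, img_ready.dtype)
```

A real pipeline would normally use an image library with a smoother interpolation; nearest-neighbour is used here only to keep the sketch dependency-free beyond NumPy.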
3. nclasses.py: labels dictionary
The format is as follows:
    0: 'tench\n Tinca tinca',
    1: 'goldfish\n Carassius auratus',
    2: 'great white shark\n white shark\n man-eater\n man-eating shark\n Carcharodon carcharias',
    3: 'tiger shark\n Galeocerdo cuvieri',
    4: 'hammerhead\n hammerhead shark',
    5: 'electric ray\n crampfish\n numbfish\n torpedo',
4. app.py: main application
The recognition code is as follows:
    with tf.Session() as sess:
        images = tf.placeholder(tf.float32, [1, 224, 224, 3])
        # Instantiate Vgg16; its __init__ reads the model parameters stored in the .npy file
        vgg = vgg16.Vgg16()
        vgg.forward(images)  # rebuild the network structure
        # Compute the probability distribution over the 1000 classes
        probability = sess.run(vgg.prob, feed_dict={images: img_ready})
        # Indices of the 5 highest probabilities, in descending order
        top5 = np.argsort(probability[0])[-1:-6:-1]
        print("top5:", top5)
        values = []
        bar_label = []
        # Look up the name of each of the 5 classes in the labels dictionary
        for n, i in enumerate(top5):
            print("n:", n)
            print("i:", i)
            values.append(probability[0][i])
            bar_label.append(labels[i])
            print(i, ":", labels[i], "----", utils.percent(probability[0][i]))
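The slice `[-1:-6:-1]` used to pick the top five classes is compact but easy to misread. The sketch below applies it to a made-up vector of seven scores. The `softmax` helper and the scores are illustrative assumptions, not output from the model.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


# Made-up scores for 7 classes, standing in for the 1000 outputs of fc8.
logits = np.array([0.1, 2.0, -1.0, 3.5, 0.7, 1.2, 2.9])
probability = softmax(logits)

# argsort is ascending, so walking backwards from the last element
# for five steps yields the indices of the five largest probabilities.
top5 = np.argsort(probability)[-1:-6:-1]

# An equivalent and arguably more readable spelling.
top5_alt = np.argsort(probability)[::-1][:5]

for i in top5:
    print(i, "%.2f%%" % (probability[i] * 100))
```

Both spellings return the indices ordered from most to least probable, which is the order in which app.py prints the labels.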