TFRecords File Storage | Study Notes - Alibaba Cloud Developer Community

2022-01-12

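Summary: a quick introduction to TFRecords file storage.

These are study notes for the Developer Academy course "Introduction to the TensorFlow Deep Learning Framework: TFRecords File Storage". They follow the course closely so that readers can pick up the material quickly.

Course URL: https://developer.aliyun.com/learning/course/773/detail/13556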

TFRecords File Storage

Contents:

I. What Is a TFRecords File

II. Structure of an Example

III. Case Study: Writing CIFAR-10 Data to a TFRecords File

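I. What Is a TFRecords File

TFRecords is a binary file format. It is harder to inspect than text-based formats, but it makes better use of memory, is easier to copy and move, and does not need a separate label file.

For example, the datasets we worked with earlier had a training set and a test set, each with features and targets stored separately. In a TFRecords file, the features and the target of every sample are stored together.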

Steps:

1) Obtain the data.

2) Fill the data into an Example protocol buffer.

3) Serialize the protocol buffer to a string and write it to a TFRecords file with tf.python_io.TFRecordWriter.

File extension: *.tfrecords

II. Structure of an Example

CIFAR-10

Feature - image - 3072 bytes

Target - label - 1 byte
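To make the byte layout concrete, here is a minimal pure-Python sketch that splits one 3073-byte record into its label and image parts. The label-first ordering matches the slicing done in the case-study code in these notes. The helper name `split_record` and the synthetic record are made up for illustration.

```python
IMAGE_BYTES = 32 * 32 * 3   # 3072 bytes of pixel data
LABEL_BYTES = 1             # 1 byte holding the class index
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES


def split_record(record):
    """Split one fixed-length record into (label, image_bytes)."""
    if len(record) != RECORD_BYTES:
        raise ValueError("expected %d bytes, got %d" % (RECORD_BYTES, len(record)))
    label = record[0]               # first byte: the label (an int in Python 3)
    image = record[LABEL_BYTES:]    # remaining 3072 bytes: the image
    return label, image


# Synthetic record for illustration only: label 9 followed by 3072 pixel bytes.
fake_record = bytes([9]) + bytes(range(256)) * 12
label, image = split_record(fake_record)
print(label, len(image))
```

A real CIFAR-10 binary file is simply many such records back to back, which is why a fixed-length reader can walk through it.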

Example:

features {
  feature {
    key: "image"
    value {
      bytes_list {
        value: "\377\374\375\372\356\351\365\31\350\356\352\350"
      }
    }
  }
  feature {
    key: "label"
    value {
      int64_list {
        value: 9
      }
    }
  }
}

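tf.train.Example is a protocol buffer message that contains a Features field.

Features in turn contains a feature field, which maps each string key to a Feature.

Each Feature holds the data to be written and specifies its data type.

This is the structure of a single sample. For a batch, loop over the samples and store each one in such a structure.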

example = tf.train.Example(features=tf.train.Features(feature={
    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image])),
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
}))

example.SerializeToString()

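tf.train.Example(features=None)

Builds the Example that will be written to the TFRecords file.

features: an instance of tf.train.Features

return: an Example protocol buffer

tf.train.Features(feature=None)

Builds the key-value pairs that describe each sample.

feature: a dict whose keys are the names to store the data under and whose values are tf.train.Feature instances

return: a Features instance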

tf.train.Feature(options)

options: for example

bytes_list=tf.train.BytesList(value=[Bytes])

int64_list=tf.train.Int64List(value=[Value])

The supported list types are:

tf.train.Int64List(value=[Value])

tf.train.BytesList(value=[Bytes])

tf.train.FloatList(value=[Value])
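As a rough illustration of the three list types, the sketch below builds one Example that uses all of them, serializes it, and parses the bytes back. It assumes TensorFlow is installed. The sample values and the "score" key are hypothetical and do not come from the course.

```python
import tensorflow as tf

# Hypothetical sample values, chosen only for illustration.
image_bytes = b"\x00\x7f\xff"
label = 9
score = 0.5

example = tf.train.Example(features=tf.train.Features(feature={
    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    "score": tf.train.Feature(float_list=tf.train.FloatList(value=[score])),
}))

# Serialize to a byte string, the form that gets written to a TFRecords file.
serialized = example.SerializeToString()

# Parse the bytes back to check that the values survive the round trip.
parsed = tf.train.Example.FromString(serialized)
restored_image = parsed.features.feature["image"].bytes_list.value[0]
restored_label = parsed.features.feature["label"].int64_list.value[0]
restored_score = parsed.features.feature["score"].float_list.value[0]
print(restored_label, restored_image, restored_score)
```

Each `value` argument is always a list, even for a single item, which is why the scalar label is wrapped as `[label]`.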

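III. Case Study: Writing CIFAR-10 Data to a TFRecords File

1. Workflow

Construct the writer: tf.python_io.TFRecordWriter(path)

Writes to a TFRecords file.

path: path of the TFRecords file

return: a file writer

Methods:

write(record): write one serialized example to the file

close(): close the file writer

Then loop over the samples, filling each one into an Example protocol buffer.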
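The workflow above might look like the following in isolation. This is a sketch, not the course's code: it assumes TensorFlow 2.x with eager execution, uses `tf.io.TFRecordWriter` (the newer name for `tf.python_io.TFRecordWriter`), and reads the file back with `tf.data.TFRecordDataset` purely as a sanity check. The sample data and the file name `demo.tfrecords` are hypothetical.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in data: three tiny "images" with their labels.
samples = [(b"\x01\x02\x03", 0), (b"\x04\x05\x06", 1), (b"\x07\x08\x09", 9)]

path = os.path.join(tempfile.mkdtemp(), "demo.tfrecords")

# The context manager closes the writer when the block exits.
with tf.io.TFRecordWriter(path) as writer:
    for image, label in samples:
        example = tf.train.Example(features=tf.train.Features(feature={
            "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        }))
        # write() expects the serialized bytes, not the Example object.
        writer.write(example.SerializeToString())

# Read the records back and recover the labels.
labels_read = []
for raw in tf.data.TFRecordDataset(path):
    parsed = tf.train.Example.FromString(raw.numpy())
    labels_read.append(parsed.features.feature["label"].int64_list.value[0])
print(labels_read)
```

Using `with` avoids having to call `close()` by hand, which matters because records may not be flushed to disk until the writer is closed.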

2. Code

# Uses the TensorFlow 1.x queue-based API (tf.Session, tf.train.string_input_producer).
import tensorflow as tf


class Cifar(object):

    def __init__(self):
        # Image dimensions
        self.height = 32
        self.width = 32
        self.channels = 3

        # Number of bytes per image, per label, and per record
        self.image_bytes = self.height * self.width * self.channels
        self.label_bytes = 1
        self.all_bytes = self.label_bytes + self.image_bytes

    def read_and_decode(self, file_list):
        """
        Read the CIFAR-10 binary files.
        :param file_list: list of paths to the binary files
        :return: image_value, label_value
        """
        # 1. Build the filename queue
        file_queue = tf.train.string_input_producer(file_list)

        # 2. Read and decode
        # Reading stage
        reader = tf.FixedLengthRecordReader(self.all_bytes)
        # key is the file name, value is one sample
        key, value = reader.read(file_queue)
        print("key:\n", key)
        print("value:\n", value)

        # Decoding stage
        decoded = tf.decode_raw(value, tf.uint8)
        print("decoded:\n", decoded)

        # Slice the record into the target (label) and the feature (image)
        label = tf.slice(decoded, [0], [self.label_bytes])
        image = tf.slice(decoded, [self.label_bytes], [self.image_bytes])
        print("label:\n", label)
        print("image:\n", image)

        # Reshape the image to [channels, height, width]
        image_reshaped = tf.reshape(image, shape=[self.channels, self.height, self.width])
        print("image_reshaped:\n", image_reshaped)

        # Transpose to height, width, channels order
        image_transposed = tf.transpose(image_reshaped, [1, 2, 0])
        print("image_transposed:\n", image_transposed)

        # Cast the image to float32
        image_cast = tf.cast(image_transposed, tf.float32)

        # 3. Batching
        label_batch, image_batch = tf.train.batch(
            [label, image_cast], batch_size=100, num_threads=1, capacity=100)

        # Open a session
        with tf.Session() as sess:
            # Start the queue-runner threads
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(sess=sess, coord=coord)

            (key_new, value_new, decoded_new, label_new, image_new,
             image_reshaped_new, image_transposed_new) = sess.run(
                [key, value, decoded, label, image, image_reshaped, image_transposed])
            label_value, image_value = sess.run([label_batch, image_batch])

            print("key_new:\n", key_new)
            print("value_new:\n", value_new)
            print("decoded_new:\n", decoded_new)
            print("label_new:\n", label_new)
            print("image_new:\n", image_new)
            print("image_reshaped_new:\n", image_reshaped_new)
            print("image_transposed_new:\n", image_transposed_new)
            print("label_value:\n", label_value)
            print("image_value:\n", image_value)

            # Stop and join the threads
            coord.request_stop()
            coord.join(threads)

        return image_value, label_value

    # Writing
    def write_to_tfrecords(self, image_batch, label_batch):
        """
        Write the features and targets of the samples together into a TFRecords file.
        :param image_batch: batch of images (NumPy array)
        :param label_batch: batch of labels (NumPy array)
        :return: None
        """
        with tf.python_io.TFRecordWriter("cifar10.tfrecords") as writer:
            # Build one Example per sample, serialize it, and write it to the file
            for i in range(len(image_batch)):
                # image_batch is float32 after tf.cast, so each image
                # serializes to 32 * 32 * 3 * 4 = 12288 bytes
                image = image_batch[i].tobytes()
                # Convert the NumPy scalar to a plain int for Int64List
                label = int(label_batch[i][0])
                # print("tfrecords_image:\n", image)
                # print("tfrecords_label:\n", label)
                example = tf.train.Example(features=tf.train.Features(feature={
                    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image])),
                    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
                }))
                # Write the serialized example to the file
                writer.write(example.SerializeToString())


if __name__ == "__main__":
    # Placeholder path, not given in the original notes;
    # point it at your local CIFAR-10 binary batch files.
    file_list = ["./cifar-10-batches-bin/data_batch_1.bin"]
    cifar = Cifar()
    image_value, label_value = cifar.read_and_decode(file_list)
    cifar.write_to_tfrecords(image_value, label_value)
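The reshape and transpose steps in read_and_decode are needed because the CIFAR-10 binary format stores each image channel by channel: 1024 red bytes, then 1024 green, then 1024 blue. The pure-Python sketch below spells out the same reordering with explicit index arithmetic, as an aid to understanding rather than a replacement for tf.reshape and tf.transpose. The function name `chw_to_hwc` and the synthetic image are made up for illustration.

```python
HEIGHT, WIDTH, CHANNELS = 32, 32, 3


def chw_to_hwc(flat):
    """Reorder a flat channel-major pixel sequence into height/width/channel order."""
    out = []
    for h in range(HEIGHT):
        for w in range(WIDTH):
            for c in range(CHANNELS):
                # In channel-major layout, element (c, h, w) sits at this offset.
                out.append(flat[c * HEIGHT * WIDTH + h * WIDTH + w])
    return out


# Synthetic image: every red value is 1, every green value 2, every blue value 3.
flat = [1] * 1024 + [2] * 1024 + [3] * 1024
hwc = chw_to_hwc(flat)
print(hwc[:6])  # → [1, 2, 3, 1, 2, 3]
```

After the reordering, the three channel values of each pixel sit next to each other, which is the layout most image tooling expects.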