Creating a ubyte Dataset [February Writing Challenge, Day 07]

2025-02-07


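This article was contributed by a registered Alibaba Cloud user, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own its copyright and assumes no related legal liability. For the detailed rules, see the Alibaba Cloud Developer Community User Service Agreement and the Alibaba Cloud Developer Community Intellectual Property Protection Guidelines. If you find suspected plagiarism in this community, report it through the infringement complaint form; once verified, the community will promptly remove the infringing content.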

Summary: [February Writing Challenge, Day 07]

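Converting a folder of images into a binary dataset in the MNIST format is a common need, especially when the data has to be fed to a framework or tool that expects that format. The step-by-step guide below shows how to convert the images in a folder, together with their labels, into this binary format.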
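The magic numbers 2051 and 2049 written in Step 3 are not arbitrary. In the IDX format used by MNIST, each file begins with a 4-byte big-endian magic number: the first two bytes are zero, the third byte encodes the element type (0x08 = unsigned byte), and the fourth byte gives the number of dimensions. So 2051 = 0x00000803 marks a 3-dimensional array of unsigned bytes (count, rows, columns), and 2049 = 0x00000801 marks a 1-dimensional one (the labels). The sketch below decodes a magic number into these two fields; `describe_idx_magic` and `IDX_TYPE_NAMES` are illustrative names, not part of any library.

```python
import struct

# Element type codes defined by the IDX format (third byte of the magic number)
IDX_TYPE_NAMES = {
    0x08: "unsigned byte",
    0x09: "signed byte",
    0x0B: "short (2 bytes)",
    0x0C: "int (4 bytes)",
    0x0D: "float (4 bytes)",
    0x0E: "double (8 bytes)",
}


def describe_idx_magic(magic):
    """Split an IDX magic number into (type_code, number_of_dimensions).

    Hypothetical helper for illustration only.
    """
    # Re-encode as 4 big-endian bytes and unpack them as individual ints
    zero_hi, zero_lo, type_code, ndim = struct.pack(">I", magic)
    if zero_hi != 0 or zero_lo != 0:
        raise ValueError(f"not an IDX magic number: {magic:#010x}")
    return type_code, ndim


for magic in (2051, 2049):
    type_code, ndim = describe_idx_magic(magic)
    print(magic, hex(magic), IDX_TYPE_NAMES[type_code], ndim, "dims")
```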
Step 1: Prepare the data
Assume your dataset is organized as follows:

```
dataset/
├── train/
│   ├── 0/
│   ├── 1/
│   ├── 2/
│   └── ...
└── val/
    ├── 0/
    ├── 1/
    ├── 2/
    └── ...
```

Each subfolder (0/, 1/, 2/, ...) contains the images of one class.
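Step 2 below assigns labels by enumerating the class folders in sorted order. Python's `sorted()` compares strings character by character, so with ten or more numbered folders the order becomes `0, 1, 10, 2, ...` and folder `10` would receive label 2. If your class folders are numbered, a numeric sort key avoids this. `class_label_map` is a hypothetical helper sketched here, not part of the original code.

```python
def class_label_map(folder_names):
    """Map class folder names to integer labels 0..K-1 (hypothetical helper).

    Names made only of decimal digits are ordered numerically; any other
    names come afterwards in plain alphabetical order.
    """
    def sort_key(name):
        if name.isdecimal():
            return (0, int(name), name)
        return (1, 0, name)

    ordered = sorted(folder_names, key=sort_key)
    return {name: label for label, name in enumerate(ordered)}


names = ["0", "1", "2", "10", "cat"]
print(sorted(names))           # ['0', '1', '10', '2', 'cat']
print(class_label_map(names))  # {'0': 0, '1': 1, '2': 2, '10': 3, 'cat': 4}
```

In `load_images_and_labels`, the resulting dictionary could replace `enumerate(...)` so that each folder's label is looked up by name.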
Step 2: Read the images and labels
Read the images with Python's PIL (Pillow) or OpenCV library and keep the images and labels in memory. The code below uses PIL.
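```python
import os

import numpy as np
from PIL import Image


def load_images_and_labels(data_dir, target_size=(28, 28)):
    """Load grayscale images from data_dir/<class>/ folders; labels are folder indices."""
    images = []
    labels = []
    # Keep only subdirectories, so stray files (e.g. .DS_Store) do not shift the label indices
    class_dirs = sorted(d for d in os.listdir(data_dir)
                        if os.path.isdir(os.path.join(data_dir, d)))
    for label, folder in enumerate(class_dirs):
        folder_path = os.path.join(data_dir, folder)
        for filename in sorted(os.listdir(folder_path)):
            # Case-insensitive extension check
            if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
                image_path = os.path.join(folder_path, filename)
                image = Image.open(image_path).convert('L')  # convert to grayscale
                image = image.resize(target_size)  # resize; PIL expects (width, height)
                images.append(np.array(image, dtype=np.uint8))
                labels.append(label)
    # IDX label files store one unsigned byte per label, so labels must be uint8
    return np.array(images, dtype=np.uint8), np.array(labels, dtype=np.uint8)


# Example
train_images, train_labels = load_images_and_labels('dataset/train')
val_images, val_labels = load_images_and_labels('dataset/val')
```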
Step 3: Save as binary files
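Save the images and labels as binary files in the MNIST (IDX) format: each file starts with a big-endian header, followed by the raw data as unsigned bytes.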
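```python
import struct

import numpy as np


def save_mnist(images, labels, image_file_path, label_file_path):
    images = np.asarray(images, dtype=np.uint8)  # shape (N, rows, cols)
    labels = np.asarray(labels, dtype=np.uint8)  # shape (N,); labels must be in 0..255
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    with open(image_file_path, 'wb') as image_file, open(label_file_path, 'wb') as label_file:
        # Image file header: magic 2051, count, rows, cols (big-endian uint32)
        image_file.write(struct.pack('>IIII', 2051, len(images), images.shape[1], images.shape[2]))
        # Label file header: magic 2049, count (big-endian uint32)
        label_file.write(struct.pack('>II', 2049, len(labels)))
        # Image data: pixels row by row, one unsigned byte each
        for image in images:
            image_file.write(image.tobytes())
        # Label data: one unsigned byte per label
        label_file.write(labels.tobytes())


# Example
save_mnist(train_images, train_labels, 'train-images-idx3-ubyte', 'train-labels-idx1-ubyte')
save_mnist(val_images, val_labels, 't10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte')
```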
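To see exactly what `save_mnist` writes, the same layout can be produced with the standard library alone. `to_idx_bytes` is a hypothetical helper that takes images as nested lists of pixel values (0 to 255) and returns the contents of the image file and the label file. It also shows where the expected file sizes come from: 16 + N*rows*cols bytes for images and 8 + N bytes for labels.

```python
import struct


def to_idx_bytes(images, labels):
    """Return (image_bytes, label_bytes) in IDX format (hypothetical helper).

    images: list of N images, each a list of `rows` rows of `cols` ints in 0..255
    labels: list of N ints in 0..255
    """
    if not images or len(images) != len(labels):
        raise ValueError("need a non-empty list of images and one label per image")
    rows, cols = len(images[0]), len(images[0][0])
    # Headers are big-endian unsigned 32-bit integers
    image_parts = [struct.pack(">IIII", 2051, len(images), rows, cols)]
    for image in images:
        if len(image) != rows or any(len(row) != cols for row in image):
            raise ValueError("all images must have the same shape")
        for row in image:
            # One unsigned byte per pixel; bytes() raises ValueError outside 0..255
            image_parts.append(bytes(row))
    label_bytes = struct.pack(">II", 2049, len(labels)) + bytes(labels)
    return b"".join(image_parts), label_bytes


# Two tiny 2x3 "images" with labels 0 and 7
imgs = [[[0, 128, 255], [1, 2, 3]],
        [[9, 9, 9], [10, 20, 30]]]
img_bytes, lbl_bytes = to_idx_bytes(imgs, [0, 7])
print(len(img_bytes), 16 + 2 * 2 * 3)  # 28 28
print(len(lbl_bytes), 8 + 2)           # 10 10
print(img_bytes[:16].hex())            # 00000803000000020000000200000003
```

The same size arithmetic explains a common bug: if the labels are saved as `int64` instead of `uint8`, the label file grows to 8 + 8*N bytes and no longer matches its header.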
Step 4: Verify the saved files
You can check that the files were saved correctly by reading them back with MNIST-reading code such as the following.
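```python
import struct

import numpy as np


def read_mnist_images(file_path):
    with open(file_path, 'rb') as f:
        magic, num, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError(f"{file_path}: expected magic 2051, got {magic}")
        # reshape raises ValueError if the pixel count does not match the header
        images = np.frombuffer(f.read(), dtype=np.uint8).reshape(num, rows, cols)
    return images


def read_mnist_labels(file_path):
    with open(file_path, 'rb') as f:
        magic, num = struct.unpack(">II", f.read(8))
        if magic != 2049:
            raise ValueError(f"{file_path}: expected magic 2049, got {magic}")
        labels = np.frombuffer(f.read(), dtype=np.uint8)
    if len(labels) != num:
        raise ValueError(f"{file_path}: header says {num} labels, found {len(labels)}")
    return labels


# Example
train_images = read_mnist_images('train-images-idx3-ubyte')
train_labels = read_mnist_labels('train-labels-idx1-ubyte')
val_images = read_mnist_images('t10k-images-idx3-ubyte')
val_labels = read_mnist_labels('t10k-labels-idx1-ubyte')

print(train_images.shape, train_labels.shape)
print(val_images.shape, val_labels.shape)
```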
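Besides printing shapes, a cheap consistency check is to compare each file's size with what its header promises and to confirm that the image file and the label file describe the same number of samples. `check_idx_pair` is a hypothetical helper that uses only the standard library and reads just the headers, so it stays fast even for large files.

```python
import os
import struct


def check_idx_pair(image_path, label_path):
    """Raise ValueError if an IDX image/label pair is inconsistent; return the sample count."""
    with open(image_path, "rb") as f:
        magic, n_images, rows, cols = struct.unpack(">IIII", f.read(16))
    if magic != 2051:
        raise ValueError(f"{image_path}: expected magic 2051, got {magic}")
    expected = 16 + n_images * rows * cols
    actual = os.path.getsize(image_path)
    if actual != expected:
        raise ValueError(f"{image_path}: size {actual} bytes, header implies {expected}")

    with open(label_path, "rb") as f:
        magic, n_labels = struct.unpack(">II", f.read(8))
    if magic != 2049:
        raise ValueError(f"{label_path}: expected magic 2049, got {magic}")
    expected = 8 + n_labels
    actual = os.path.getsize(label_path)
    if actual != expected:
        # Often caused by writing labels with a wider dtype (e.g. int64) instead of uint8
        raise ValueError(f"{label_path}: size {actual} bytes, header implies {expected}")

    if n_images != n_labels:
        raise ValueError(f"{n_images} images but {n_labels} labels")
    return n_images


# Usage with the files produced in Step 3:
# check_idx_pair('train-images-idx3-ubyte', 'train-labels-idx1-ubyte')
# check_idx_pair('t10k-images-idx3-ubyte', 't10k-labels-idx1-ubyte')
```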
