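Introduction to h5py
Use the h5py library to read and write data that is larger than memory. When reading simple data, we usually load everything into memory in one go. Data that exceeds memory is different: because RAM is limited, we usually have to read or write at a specified position and over a specified region, so that unrelated data is never touched. The h5py library provides exactly this capability.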
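Region-wise reading and writing can be sketched as follows. This is a minimal illustration rather than a benchmark: the file name, the dataset name "data", and the sizes are assumptions made up for the example, and the array here is small enough to fit in memory so that the code runs quickly.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name and sizes, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "big_demo.h5")
n_rows, n_cols, block = 10_000, 100, 1_000

with h5py.File(path, "w") as f:
    # Allocate the dataset on disk without building the full array in memory.
    dset = f.create_dataset("data", shape=(n_rows, n_cols), dtype="f4")
    # Write one block of rows at a time; each block is filled with its start index.
    for start in range(0, n_rows, block):
        stop = start + block
        dset[start:stop] = np.full((block, n_cols), start, dtype="f4")

with h5py.File(path, "r") as f:
    dset = f["data"]
    # Slicing the dataset reads only the requested rows and columns from disk.
    window = dset[2500:2510, :5]

print(window.shape)
```

The slice comes back as an ordinary NumPy array, so the rest of the program does not need to know that the data came from a file.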
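Advantages of h5py: it is fast and supports efficient compression. In short, whether numpy.savez and cPickle work for your data or not, h5py is worth a try!
An h5py file is a container for two kinds of objects: datasets and groups. A dataset is an array-like collection of data, much like a NumPy array. A group is a folder-like container that behaves like a Python dictionary, with keys and values. A group can hold datasets or other groups. The "key" is the name of a group member, and the "value" is the member object itself (a group or a dataset). The sections below show how to create groups and datasets.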
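To make the compression point concrete, here is a small sketch that stores the same array twice, once as is and once with gzip compression, and compares the space each dataset takes on disk. The file and dataset names are made up for the example, and the all-zeros array is a deliberately easy case; real data will usually compress less well.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "compress_demo.h5")
data = np.zeros((1000, 1000), dtype="f8")  # highly repetitive, so it compresses well

with h5py.File(path, "w") as f:
    f.create_dataset("raw", data=data)
    # compression_opts is the gzip level (0-9).
    f.create_dataset("packed", data=data, compression="gzip", compression_opts=4)

with h5py.File(path, "r") as f:
    # get_storage_size() reports the bytes the dataset occupies in the file.
    raw_bytes = f["raw"].id.get_storage_size()
    packed_bytes = f["packed"].id.get_storage_size()
    # Decompression is transparent: reading returns the original values.
    restored = f["packed"][:]

print(raw_bytes, packed_bytes)
```

Compression is applied per dataset and is invisible to the reader, so the code that reads the data does not change.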
Related article: HDF5 for Python
h5py is a thin, pythonic wrapper around HDF5, which runs on Python 3 (3.6+).
Websites
Main website: https://www.h5py.org
Source code: https://github.com/h5py/h5py
Mailing list: https://groups.google.com/d/forum/h5py
Installation
Pre-built h5py can be installed either via your Python distribution (e.g. Continuum Anaconda, Enthought Canopy) or from PyPI via pip. h5py is also distributed in many Linux distributions (e.g. Ubuntu, Fedora), and in the macOS package managers Homebrew, MacPorts, or Fink.
More detailed installation instructions, including how to install h5py with MPI support, can be found at: https://docs.h5py.org/en/latest/build.html.
Reporting bugs
Open a bug at https://github.com/h5py/h5py/issues. For general questions, ask on the list (https://groups.google.com/d/forum/h5py).
Installing h5py
pip install h5py
Installed successfully! Haha, back to studying!
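A quick way to confirm that the installation works is to import the package and print its version information. This is only a sanity check; the variable names are made up for the example.

```python
import h5py

# Version of the h5py package itself.
h5py_version = h5py.__version__
# Version of the underlying HDF5 library that h5py was built against.
hdf5_version = h5py.version.hdf5_version

print("h5py:", h5py_version)
print("HDF5:", hdf5_version)
```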
How to use h5py
1. Writing data
import h5py
import numpy as np

"""
create_dataset : create a new dataset
create_group   : create a new group
"""
x = np.arange(100)
with h5py.File('test.h5', 'w') as f:
    # Dataset stored at the root of the file
    f.create_dataset('test_numpy', data=x)
    # Groups nest like folders; each level gets its own copy of the dataset
    subgroup = f.create_group('subgroup')
    subgroup.create_dataset('test_numpy', data=x)
    subsub = subgroup.create_group('subsub')
    subsub.create_dataset('test_numpy', data=x)
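When the final size of the data is not known in advance, a dataset can be created as resizable and grown batch by batch. The sketch below assumes a hypothetical file name and a dataset called "stream"; the batch size and chunk shape are arbitrary choices for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "append_demo.h5")

with h5py.File(path, "w") as f:
    # maxshape=(None,) makes the first axis unlimited; resizable datasets are chunked.
    dset = f.create_dataset("stream", shape=(0,), maxshape=(None,),
                            dtype="i8", chunks=(256,))
    for i in range(4):
        batch = np.arange(i * 100, (i + 1) * 100)
        old = dset.shape[0]
        # Grow the dataset, then write the new batch into the added region.
        dset.resize((old + batch.shape[0],))
        dset[old:old + batch.shape[0]] = batch
    final_shape = dset.shape

with h5py.File(path, "r") as f:
    # Read back only the last three values.
    tail = f["stream"][397:400]

print(final_shape, tail)
```

Only one batch is held in memory at a time, which fits the goal of writing data that is larger than RAM.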
2. Reading data
"""
keys()        : get the names of all members (datasets and groups) of the current group
f['key_name'] : get the corresponding object
"""
def read_data(filename):
    with h5py.File(filename, 'r') as f:
        def print_name(name):
            print(name)
        # visit() calls print_name for every group and dataset, recursively
        f.visit(print_name)
        print('---------------------------------------')
        subgroup = f['subgroup']
        print(subgroup.keys())
        print('---------------------------------------')
        dset = f['test_numpy']
        print(dset)
        print(dset.name)
        print(dset.shape)
        print(dset.dtype)
        # dset[:] reads the whole dataset into memory as a NumPy array
        print(dset[:])
        print('---------------------------------------')

read_data('test.h5')
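The dictionary-like behaviour of groups can be explored a little further. The sketch below builds a small file of its own (the file name is made up for the example), uses visititems to tell groups from datasets, and reads a slice of a nested dataset through its full path. Passing a path to create_dataset creates the intermediate groups automatically.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "walk_demo.h5")

with h5py.File(path, "w") as f:
    f.create_dataset("test_numpy", data=np.arange(100))
    # Intermediate groups "subgroup" and "subgroup/subsub" are created as needed.
    f.create_dataset("subgroup/subsub/test_numpy", data=np.arange(100))

found = {}

def record(name, obj):
    # visititems passes both the name and the object, so the type can be checked.
    found[name] = "dataset" if isinstance(obj, h5py.Dataset) else "group"

with h5py.File(path, "r") as f:
    f.visititems(record)
    has_subgroup = "subgroup" in f   # membership test, as with a dict
    top_level = sorted(f.keys())     # names directly under the root group
    # A nested dataset can be addressed by its full path; only 5 values are read.
    nested = f["subgroup/subsub/test_numpy"][10:15]

print(found)
print(top_level, nested)
```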