Previous articles
[Hadoop 3.x series] Using the HDFS REST HTTP API (1): WebHDFS
[Hadoop 3.x series] Using the HDFS REST HTTP API (2): HttpFS
[Hadoop 3.x series] Common Hadoop file storage formats and the BigData File Viewer tool (3)
[Hadoop 3.x] Apache Arrow, a new generation of storage format (4)
[Hadoop 3.x] HDFS storage policies and hot/warm/cold three-phase data storage (6): overview
1.1 Storage policy commands
1 List storage policies
List all available storage policies.
Command:
```
[root@node1 Examples]# hdfs storagepolicies -listPolicies
Block Storage Policies:
	BlockStoragePolicy{PROVIDED:1, storageTypes=[PROVIDED, DISK], creationFallbacks=[PROVIDED, DISK], replicationFallbacks=[PROVIDED, DISK]}
	BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
	BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
	BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
	BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
	BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
	BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
```
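Each `BlockStoragePolicy{...}` line in this listing has a fixed shape, so it can be turned into structured data with a small script. This is just a sketch that assumes the exact output format shown above:

```python
import re

# One policy line as printed by `hdfs storagepolicies -listPolicies`
LINE = ("BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], "
        "creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}")

def parse_policy(line):
    """Parse one BlockStoragePolicy{...} line into a dict."""
    m = re.match(r"BlockStoragePolicy\{(\w+):(\d+), storageTypes=\[([^\]]*)\], "
                 r"creationFallbacks=\[([^\]]*)\], replicationFallbacks=\[([^\]]*)\]\}",
                 line)
    split = lambda s: [t.strip() for t in s.split(",")] if s else []
    return {
        "name": m.group(1),
        "id": int(m.group(2)),
        "storageTypes": split(m.group(3)),
        "creationFallbacks": split(m.group(4)),
        "replicationFallbacks": split(m.group(5)),
    }

policy = parse_policy(LINE)
print(policy["name"], policy["id"], policy["storageTypes"])
# WARM 5 ['DISK', 'ARCHIVE']
```

`storageTypes` is where new block replicas are placed; the two fallback lists say which storage types to use when the preferred type has no space.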
2 Set a storage policy
Set a storage policy on a file or directory.

```shell
hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy>
```

Parameters:

| Parameter | Description |
| --- | --- |
| -path | Path of the directory or file |
| -policy | Name of the storage policy |
3 Unset a storage policy
Unset the storage policy of a file or directory. After the unset command runs, the storage policy of the nearest ancestor directory applies; if no ancestor has a policy, the default storage policy applies.

```shell
hdfs storagepolicies -unsetStoragePolicy -path <path>
```

Parameters:

| Parameter | Description |
| --- | --- |
| -path | Path of the directory or file |
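The nearest-ancestor rule can be illustrated with a small sketch. The paths and explicit policies here are hypothetical, and HDFS performs this resolution on the NameNode itself; the function only models the lookup order:

```python
import posixpath

# Hypothetical explicitly-set policies, keyed by HDFS path
EXPLICIT_POLICIES = {
    "/data/hdfs-test/data_phase/cold": "COLD",
    "/data/hdfs-test": "HOT",
}
DEFAULT_POLICY = "HOT"  # HDFS's default storage policy

def effective_policy(path):
    """Walk up from `path` and return the nearest ancestor's policy,
    falling back to the default when no ancestor has one set."""
    while True:
        if path in EXPLICIT_POLICIES:
            return EXPLICIT_POLICIES[path]
        parent = posixpath.dirname(path)
        if parent == path:  # reached "/" without finding a policy
            return DEFAULT_POLICY
        path = parent

print(effective_policy("/data/hdfs-test/data_phase/cold/file1"))  # COLD
print(effective_policy("/data/hdfs-test/data_phase/warm"))        # HOT (inherited from /data/hdfs-test)
print(effective_policy("/other/path"))                            # HOT (default)
```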
4 Get a storage policy
Get the storage policy of a file or directory.

```shell
hdfs storagepolicies -getStoragePolicy -path <path>
```

Parameters:

| Parameter | Description |
| --- | --- |
| -path | Path of the directory or file |
2.1 Hot/warm/cold three-phase data storage
To make fuller use of storage resources, we can store data in three phases: hot, warm, and cold.

| Directory | Phase |
| --- | --- |
| /data/hdfs-test/data_phase/hot | Hot data |
| /data/hdfs-test/data_phase/warm | Warm data |
| /data/hdfs-test/data_phase/cold | Cold data |
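A typical way to route files into these three directories is by data age. The 30-day and 90-day thresholds below are made up for illustration; pick values that match your own access patterns:

```python
# A sketch of routing data into the three phase directories by age.
BASE = "/data/hdfs-test/data_phase"

def phase_for_age(age_days):
    """Pick a storage phase by data age (thresholds are hypothetical)."""
    if age_days < 30:
        return "hot"
    if age_days < 90:
        return "warm"
    return "cold"

def target_dir(age_days):
    """Build the HDFS target directory for data of a given age."""
    return f"{BASE}/{phase_for_age(age_days)}"

print(target_dir(5))    # /data/hdfs-test/data_phase/hot
print(target_dir(45))   # /data/hdfs-test/data_phase/warm
print(target_dir(365))  # /data/hdfs-test/data_phase/cold
```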
1 Configure DataNode storage directories
To support different storage types, we need to configure a location for each storage type in hdfs-site.xml.
Go to the Hadoop configuration directory and edit hdfs-site.xml:

```shell
cd /export/server/hadoop-3.1.4/etc/hadoop
vim hdfs-site.xml
```

```xml
<property>
    <name>dfs.datanode.data.dir</name>
    <value>[DISK]file:///export/server/hadoop-3.1.4/data/datanode,[ARCHIVE]file:///export/server/hadoop-3.1.4/data/archive</value>
    <description>Where on the local filesystem the DataNode stores its blocks; each directory may be prefixed with a storage type such as [DISK] or [ARCHIVE]</description>
</property>
```
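The value of `dfs.datanode.data.dir` is a comma-separated list in which each directory may carry a storage-type prefix in square brackets (untagged directories default to DISK). As a sketch, it can be split like this:

```python
import re

# The value configured above in hdfs-site.xml
VALUE = ("[DISK]file:///export/server/hadoop-3.1.4/data/datanode,"
         "[ARCHIVE]file:///export/server/hadoop-3.1.4/data/archive")

def parse_data_dirs(value):
    """Split a dfs.datanode.data.dir value into (storage_type, uri) pairs.
    Directories without a [TYPE] prefix default to DISK."""
    dirs = []
    for entry in value.split(","):
        m = re.match(r"\[(\w+)\](.*)", entry.strip())
        if m:
            dirs.append((m.group(1), m.group(2)))
        else:
            dirs.append(("DISK", entry.strip()))
    return dirs

for storage_type, uri in parse_data_dirs(VALUE):
    print(storage_type, uri)
# DISK file:///export/server/hadoop-3.1.4/data/datanode
# ARCHIVE file:///export/server/hadoop-3.1.4/data/archive
```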
Distribute the file to the other two nodes:

```shell
scp hdfs-site.xml node2.itcast.cn:$PWD
scp hdfs-site.xml node3.itcast.cn:$PWD
```

Restart the HDFS cluster:

```shell
stop-dfs.sh
start-dfs.sh
```
Once configured, open the Datanodes page in the Web UI and click any DataNode.
![](https://gitee.com/the_efforts_paid_offf/picture-blog/raw/master/img/20211006210405.jpg)
![](https://gitee.com/the_efforts_paid_offf/picture-blog/raw/master/img/20211006210409.jpg)
You can see that two directories are now configured: one with StorageType ARCHIVE and one with StorageType DISK.
2 Configure policies
Create the test directory structure:

```shell
hdfs dfs -mkdir -p /data/hdfs-test/data_phase/hot
hdfs dfs -mkdir -p /data/hdfs-test/data_phase/warm
hdfs dfs -mkdir -p /data/hdfs-test/data_phase/cold
```
- Check the storage policies the current HDFS installation supports
```
[root@node1 Examples]# hdfs storagepolicies -listPolicies
Block Storage Policies:
	BlockStoragePolicy{PROVIDED:1, storageTypes=[PROVIDED, DISK], creationFallbacks=[PROVIDED, DISK], replicationFallbacks=[PROVIDED, DISK]}
	BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
	BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
	BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
	BlockStoragePolicy{ONE_SSD:10, storageTypes=[SSD, DISK], creationFallbacks=[SSD, DISK], replicationFallbacks=[SSD, DISK]}
	BlockStoragePolicy{ALL_SSD:12, storageTypes=[SSD], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
	BlockStoragePolicy{LAZY_PERSIST:15, storageTypes=[RAM_DISK, DISK], creationFallbacks=[DISK], replicationFallbacks=[DISK]}
```
- Set the storage policy on each of the three directories

```shell
hdfs storagepolicies -setStoragePolicy -path /data/hdfs-test/data_phase/hot -policy HOT
hdfs storagepolicies -setStoragePolicy -path /data/hdfs-test/data_phase/warm -policy WARM
hdfs storagepolicies -setStoragePolicy -path /data/hdfs-test/data_phase/cold -policy COLD
```
- Check the storage policy of each directory

```shell
hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/hot
hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/warm
hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/cold
```

```
[root@node1 Examples]# hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/hot
The storage policy of /data/hdfs-test/data_phase/hot:
BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
[root@node1 Examples]# hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/warm
The storage policy of /data/hdfs-test/data_phase/warm:
BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
[root@node1 Examples]# hdfs storagepolicies -getStoragePolicy -path /data/hdfs-test/data_phase/cold
The storage policy of /data/hdfs-test/data_phase/cold:
BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
```
3 Upload test
Upload a file to each of the three directories to test the policies:

```shell
hdfs dfs -put /etc/profile /data/hdfs-test/data_phase/hot
hdfs dfs -put /etc/profile /data/hdfs-test/data_phase/warm
hdfs dfs -put /etc/profile /data/hdfs-test/data_phase/cold
```
Check the block locations of the files under the different storage policies:
```
[root@node1 hadoop]# hdfs fsck /data/hdfs-test/data_phase/hot/profile -files -blocks -locations
Connecting to namenode via http://node1.itcast.cn:9870/fsck?ugi=root&files=1&blocks=1&locations=1&path=%2Fdata%2Fhdfs-test%2Fdata_phase%2Fhot%2Fprofile
FSCK started by root (auth:SIMPLE) from /192.168.88.100 for path /data/hdfs-test/data_phase/hot/profile at Sun Oct 11 22:03:05 CST 2020
/data/hdfs-test/data_phase/hot/profile 3158 bytes, replicated: replication=3, 1 block(s): OK
0. BP-538037512-192.168.88.100-1600884040401:blk_1073742535_1750 len=3158 Live_repl=3 [DatanodeInfoWithStorage[192.168.88.101:9866,DS-96feb29a-5dfd-4692-81ea-9e7f100166fe,DISK], DatanodeInfoWithStorage[192.168.88.100:9866,DS-79739be9-5f9b-4f96-a005-aa5b507899f5,DISK], DatanodeInfoWithStorage[192.168.88.102:9866,DS-e28af2f2-21ae-4aa6-932e-e376dd04ddde,DISK]]

[root@node1 hadoop]# hdfs fsck /data/hdfs-test/data_phase/warm/profile -files -blocks -locations
/data/hdfs-test/data_phase/warm/profile 3158 bytes, replicated: replication=3, 1 block(s): OK
0. BP-538037512-192.168.88.100-1600884040401:blk_1073742536_1751 len=3158 Live_repl=3 [DatanodeInfoWithStorage[192.168.88.102:9866,DS-636f34a0-682c-4d1b-b4ee-b4c34e857957,ARCHIVE], DatanodeInfoWithStorage[192.168.88.101:9866,DS-ff6970f8-43e0-431f-9041-fc440a44fdb0,ARCHIVE], DatanodeInfoWithStorage[192.168.88.100:9866,DS-79739be9-5f9b-4f96-a005-aa5b507899f5,DISK]]

[root@node1 hadoop]# hdfs fsck /data/hdfs-test/data_phase/cold/profile -files -blocks -locations
/data/hdfs-test/data_phase/cold/profile 3158 bytes, replicated: replication=3, 1 block(s): OK
0. BP-538037512-192.168.88.100-1600884040401:blk_1073742537_1752 len=3158 Live_repl=3 [DatanodeInfoWithStorage[192.168.88.102:9866,DS-636f34a0-682c-4d1b-b4ee-b4c34e857957,ARCHIVE], DatanodeInfoWithStorage[192.168.88.101:9866,DS-ff6970f8-43e0-431f-9041-fc440a44fdb0,ARCHIVE], DatanodeInfoWithStorage[192.168.88.100:9866,DS-ca9759a0-f6f0-4b8b-af38-d96f603bca93,ARCHIVE]]
```
We can see that:

- For the file in the hot directory, all 3 block replicas are on DISK storage.
- For the file in the warm directory, 1 replica is on DISK storage and the other 2 are on ARCHIVE storage.
- For the file in the cold directory, all 3 replicas are on ARCHIVE storage.
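Rather than eyeballing the fsck output, you can extract the storage type of each replica from the `DatanodeInfoWithStorage[...]` entries. A sketch, assuming the exact location format printed above:

```python
import re
from collections import Counter

# A location list as printed by `hdfs fsck ... -files -blocks -locations`
# (this is the warm-directory line from the output above)
FSCK_LINE = ("[DatanodeInfoWithStorage[192.168.88.102:9866,DS-636f34a0-682c-4d1b-b4ee-b4c34e857957,ARCHIVE], "
             "DatanodeInfoWithStorage[192.168.88.101:9866,DS-ff6970f8-43e0-431f-9041-fc440a44fdb0,ARCHIVE], "
             "DatanodeInfoWithStorage[192.168.88.100:9866,DS-79739be9-5f9b-4f96-a005-aa5b507899f5,DISK]]")

def replica_storage_types(line):
    """Return the storage type of each replica listed in an fsck location line."""
    return re.findall(r"DatanodeInfoWithStorage\[[^,]+,[^,]+,(\w+)\]", line)

types = replica_storage_types(FSCK_LINE)
print(types)           # ['ARCHIVE', 'ARCHIVE', 'DISK']
print(Counter(types))  # Counter({'ARCHIVE': 2, 'DISK': 1})
```

The `Counter` result matches the WARM policy's intent: one replica on DISK, the rest on ARCHIVE.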