Hadoop 2.7 in Practice v1.0: Changing the File Replication Policy (dfs.replication) After Adding a DataNode

1. Check the current replication policy: dfs.replication is 3, meaning each file is stored as 3 replicas
a. Check the hdfs-site.xml file

    [root@sht-sgmhadoopnn-01 ~]# cd /hadoop/hadoop-2.7.2/etc/hadoop
    [root@sht-sgmhadoopnn-01 hadoop]# more hdfs-site.xml
    <property>
            <name>dfs.replication</name>
            <value>3</value>
    </property>
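The same check can be scripted instead of paging through the file with more. The following is a minimal sketch, assuming the file is well-formed XML; the sample string and the helper name read_property are illustrative and not part of the original walkthrough.

```python
import xml.etree.ElementTree as ET

# Sample content mirroring the property shown above. A real hdfs-site.xml
# wraps its <property> entries in a <configuration> root element.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>"""


def read_property(xml_text, key):
    """Return the <value> text of the named property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == key:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    print(read_property(SAMPLE_HDFS_SITE, "dfs.replication"))
```

To run it against a real cluster file, read /hadoop/hadoop-2.7.2/etc/hadoop/hdfs-site.xml into a string and pass that in place of the sample.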
b. Check the replication value of the files currently in HDFS

    [root@sht-sgmhadoopnn-01 hadoop]# hdfs dfs -ls /testdir
    Found 7 items
    -rw-r--r-- 3 root supergroup 37322672 2016-03-05 17:59 /testdir/012_HDFS.avi
    -rw-r--r-- 3 root supergroup 224001146 2016-03-05 18:01 /testdir/016_Hadoop.avi
    -rw-r--r-- 3 root supergroup 176633760 2016-03-05 19:11 /testdir/022.avi
    -rw-r--r-- 3 root supergroup 30 2016-02-28 22:42 /testdir/1.log
    -rw-r--r-- 3 root supergroup 196 2016-02-28 22:23 /testdir/full_backup.log
    -rw-r--r-- 3 root supergroup 142039186 2016-03-05 17:55 /testdir/oracle-j2sdk1.7-1.7.0+update67-1.x86_64.rpm
    -rw-r--r-- 3 root supergroup 44 2016-02-28 19:40 /testdir/test.log
    [root@sht-sgmhadoopnn-01 hadoop]#
    ### The 3 immediately after the -rw-r--r-- permissions is the number of replicas HDFS keeps of that file
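Reading the second column by eye works for a handful of files. As a rough sketch of how it could be automated, the snippet below pulls the replication column out of captured hdfs dfs -ls output. The function name replication_by_path is hypothetical, and the parsing assumes the eight-column layout shown in the listing above.

```python
# Two lines copied from the listing above, plus the "Found N items" header.
SAMPLE_LS = """Found 7 items
-rw-r--r-- 3 root supergroup 37322672 2016-03-05 17:59 /testdir/012_HDFS.avi
-rw-r--r-- 3 root supergroup 30 2016-02-28 22:42 /testdir/1.log
"""


def replication_by_path(ls_output):
    """Map each file path to the replication count in the second column."""
    result = {}
    for line in ls_output.splitlines():
        # Columns: permissions, replication, owner, group, size, date, time, path.
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue  # skips the "Found N items" header and blank lines
        replication = fields[1]
        if not replication.isdigit():
            continue  # directories show "-" instead of a number
        result[fields[7]] = int(replication)
    return result


if __name__ == "__main__":
    for path, replication in replication_by_path(SAMPLE_LS).items():
        print(replication, path)
```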
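c. Running hdfs fsck / (the older hadoop fsck / form is deprecated in Hadoop 2) also makes it easy to see that Average block replication is still 3. This value can be changed dynamically by hand.
The Default replication factor that fsck reports is the dfs.replication value loaded by the running NameNode, so changing what fsck shows there means setting the value to 4 in hdfs-site.xml and restarting the cluster, which is not suitable for a production cluster.
What actually determines how many copies exist is the replication factor recorded on each file, which Average block replication summarizes, so changing the Default replication factor is not strictly required.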

    [root@sht-sgmhadoopnn-01 hadoop]# hdfs fsck /
    16/03/06 17:15:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Connecting to namenode via http://sht-sgmhadoopnn-01:50070/fsck?ugi=root&path=%2F
    FSCK started by root (auth:SIMPLE) from /172.16.101.55 for path / at Sun Mar 06 17:15:29 CST 2016
    ............Status: HEALTHY
     Total size: 580151839 B
     Total dirs: 15
     Total files: 12
     Total symlinks: 0
     Total blocks (validated): 11 (avg. block size 52741076 B)
     Minimally replicated blocks: 11 (100.0 %)
     Over-replicated blocks: 0 (0.0 %)
     Under-replicated blocks: 0 (0.0 %)
     Mis-replicated blocks: 0 (0.0 %)
     Default replication factor: 3
     Average block replication: 3.0
     Corrupt blocks: 0
     Missing replicas: 0 (0.0 %)
     Number of data-nodes: 4
     Number of racks: 1
    FSCK ended at Sun Mar 06 17:15:29 CST 2016 in 9 milliseconds
    The filesystem under path '/' is HEALTHY
    You have mail in /var/spool/mail/root
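The two values of interest in this report can be extracted from captured fsck output for monitoring or before/after comparison. This is a small sketch under the assumption that the summary lines keep the wording shown above; fsck_replication is an illustrative name.

```python
import re

# A few summary lines copied from the fsck report above.
SAMPLE_FSCK = """ Total size: 580151839 B
 Default replication factor: 3
 Average block replication: 3.0
 Number of data-nodes: 4
"""


def fsck_replication(fsck_output):
    """Return (default_factor, average_replication); None for a missing line."""
    default = re.search(r"Default replication factor:\s*(\d+)", fsck_output)
    average = re.search(r"Average block replication:\s*([\d.]+)", fsck_output)
    return (
        int(default.group(1)) if default else None,
        float(average.group(1)) if average else None,
    )


if __name__ == "__main__":
    print(fsck_replication(SAMPLE_FSCK))
```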
2. Change the replication factor of files in HDFS

    [root@sht-sgmhadoopnn-01 hadoop]# hdfs dfs -help
    -setrep [-R] [-w] <rep> <path> ... :
      Set the replication level of a file. If <path> is a directory then the command
      recursively changes the replication factor of all files under the directory tree
      rooted at <path>.

      -w It requests that the command waits for the replication to complete. This
          can potentially take a very long time.
      -R It is accepted for backwards compatibility. It has no effect.


    [root@sht-sgmhadoopnn-01 hadoop]# hdfs dfs -setrep -w 4 -R /
    setrep: `-R': No such file or directory
    Replication 4 set: /out1/_SUCCESS
    Replication 4 set: /out1/part-r-00000
    Replication 4 set: /testdir/012_HDFS.avi
    Replication 4 set: /testdir/016_Hadoop.avi
    Replication 4 set: /testdir/022.avi
    Replication 4 set: /testdir/1.log
    Replication 4 set: /testdir/full_backup.log
    Replication 4 set: /testdir/oracle-j2sdk1.7-1.7.0+update67-1.x86_64.rpm
    Replication 4 set: /testdir/test.log
    Replication 4 set: /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1456590271264_0002-1456659654297-root-word+count-1456659679606-1-1-SUCCEEDED-root.root-1456659662730.jhist
    Replication 4 set: /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1456590271264_0002.summary
    Replication 4 set: /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1456590271264_0002_conf.xml
    Waiting for /out1/_SUCCESS ... done
    Waiting for /out1/part-r-00000 .... done
    Waiting for /testdir/012_HDFS.avi ... done
    Waiting for /testdir/016_Hadoop.avi ... done
    Waiting for /testdir/022.avi ... done
    Waiting for /testdir/1.log ... done
    Waiting for /testdir/full_backup.log ... done
    Waiting for /testdir/oracle-j2sdk1.7-1.7.0+update67-1.x86_64.rpm ... done
    Waiting for /testdir/test.log ... done
    Waiting for /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1456590271264_0002-1456659654297-root-word+count-1456659679606-1-1-SUCCEEDED-root.root-1456659662730.jhist ... done
    Waiting for /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1456590271264_0002.summary ... done
    Waiting for /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1456590271264_0002_conf.xml ... done
    [root@sht-sgmhadoopnn-01 hadoop]#
    ### Note: the "No such file or directory" error for -R appears because -R was placed after the replication factor and was therefore read as a path. Options must come before the factor (hdfs dfs -setrep -w 4 /), and -R has no effect anyway because directories are always processed recursively.

    ## Check the replication status again: Average block replication is now 4
    [root@sht-sgmhadoopnn-01 hadoop]# hdfs fsck /
    16/03/06 17:25:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Connecting to namenode via http://sht-sgmhadoopnn-01:50070/fsck?ugi=root&path=%2F
    FSCK started by root (auth:SIMPLE) from /172.16.101.55 for path / at Sun Mar 06 17:25:51 CST 2016
    ............Status: HEALTHY
     Total size: 580151839 B
     Total dirs: 15
     Total files: 12
     Total symlinks: 0
     Total blocks (validated): 11 (avg. block size 52741076 B)
     Minimally replicated blocks: 11 (100.0 %)
     Over-replicated blocks: 0 (0.0 %)
     Under-replicated blocks: 0 (0.0 %)
     Mis-replicated blocks: 0 (0.0 %)
     Default replication factor: 3
     Average block replication: 4.0
     Corrupt blocks: 0
     Missing replicas: 0 (0.0 %)
     Number of data-nodes: 4
     Number of racks: 1
    FSCK ended at Sun Mar 06 17:25:51 CST 2016 in 6 milliseconds
    The filesystem under path '/
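The setrep help text lists the options before the replication factor, and the "No such file or directory" message for -R in the transcript suggests that an option placed after the factor is read as a path. When the command is driven from a script, building the argument list in a fixed order avoids that mistake. The helper below is a hypothetical sketch (setrep_command is not a Hadoop API); it only assembles the argument list and does not run anything.

```python
def setrep_command(replication, paths, wait=False):
    """Build the argument list for `hdfs dfs -setrep` with options first.

    -R is deliberately omitted: the help text says it is accepted only for
    backwards compatibility and has no effect.
    """
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    if not paths:
        raise ValueError("at least one path is required")
    argv = ["hdfs", "dfs", "-setrep"]
    if wait:
        argv.append("-w")  # wait until the new replica count is reached
    argv.append(str(replication))
    argv.extend(paths)
    return argv


if __name__ == "__main__":
    print(" ".join(setrep_command(4, ["/"], wait=True)))
```

On a host where the hdfs binary is on PATH, the returned list could be handed to subprocess.run; that step is left out here so the sketch runs anywhere.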
3. Test

    [root@sht-sgmhadoopnn-01 hadoop]# vi /tmp/wjp.log
    hello,i am
    hadoop
    hdfs
    mapreduce
    yarn
    hive
    zookeeper

    [root@sht-sgmhadoopnn-01 hadoop]# hdfs dfs -put /tmp/wjp.log /testdir

    [root@sht-sgmhadoopnn-01 hadoop]# hdfs dfs -ls /testdir
    Found 8 items
    -rw-r--r-- 4 root supergroup 37322672 2016-03-05 17:59 /testdir/012_HDFS.avi
    -rw-r--r-- 4 root supergroup 224001146 2016-03-05 18:01 /testdir/016_Hadoop.avi
    -rw-r--r-- 4 root supergroup 176633760 2016-03-05 19:11 /testdir/022.avi
    -rw-r--r-- 4 root supergroup 30 2016-02-28 22:42 /testdir/1.log
    -rw-r--r-- 4 root supergroup 196 2016-02-28 22:23 /testdir/full_backup.log
    -rw-r--r-- 4 root supergroup 142039186 2016-03-05 17:55 /testdir/oracle-j2sdk1.7-1.7.0+update67-1.x86_64.rpm
    -rw-r--r-- 4 root supergroup 44 2016-02-28 19:40 /testdir/test.log
    -rw-r--r-- 3 root supergroup 62 2016-03-06 17:30 /testdir/wjp.log
    [root@sht-sgmhadoopnn-01 hadoop]# hdfs dfs -rm /testdir/wjp.log
    16/03/06 17:31:47 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
    Moved: 'hdfs://mycluster/testdir/wjp.log' to trash at: hdfs://mycluster/user/root/.Trash/Current
    [root@sht-sgmhadoopnn-01 hadoop]#
    ### The uploaded test file wjp.log still has 3 replicas, so I delete the test file first and then change the parameter in hdfs-site.xml on the NameNode node
4. Change the parameter in hdfs-site.xml on the NameNode node

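    [root@sht-sgmhadoopnn-01 hadoop]# vi hdfs-site.xml
             <property>
                    <name>dfs.replication</name>
                    <value>4</value>
             </property>
    [root@sht-sgmhadoopnn-01 hadoop]# scp hdfs-site.xml root@sht-sgmhadoopnn-02:/hadoop/hadoop-2.7.2/etc/hadoop
    ### If NameNode HA is configured, the file on the standby NameNode should be kept in sync as well. It does not need to be copied to the DataNode nodes, although dfs.replication is read by the HDFS client that writes a file, so any other host used for uploads needs the same change.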
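The edit above was made by hand in vi. For repeatable changes across several hosts, the same property update could be scripted; the following is a sketch only, with set_property as a hypothetical helper and a sample string standing in for the real file. Note that this approach re-serializes the XML, so comments and the XML declaration in the original file are not preserved.

```python
import xml.etree.ElementTree as ET

# Stand-in for the real file; hdfs-site.xml uses a <configuration> root.
SAMPLE_HDFS_SITE = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>"""


def set_property(xml_text, key, value):
    """Return the XML text with the named property set to the given value.

    The property is appended if it does not exist yet.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == key:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property found: add a new one under the root.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_property(SAMPLE_HDFS_SITE, "dfs.replication", "4"))
```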
5. Test again
## Try it without restarting first

    [root@sht-sgmhadoopnn-01 hadoop]# hdfs dfs -put /tmp/wjp.log /testdir
    16/03/06 17:36:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    You have mail in /var/spool/mail/root
    [root@sht-sgmhadoopnn-01 hadoop]# hdfs dfs -ls /testdir
    16/03/06 17:36:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 8 items
    -rw-r--r-- 4 root supergroup 37322672 2016-03-05 17:59 /testdir/012_HDFS.avi
    -rw-r--r-- 4 root supergroup 224001146 2016-03-05 18:01 /testdir/016_Hadoop.avi
    -rw-r--r-- 4 root supergroup 176633760 2016-03-05 19:11 /testdir/022.avi
    -rw-r--r-- 4 root supergroup 30 2016-02-28 22:42 /testdir/1.log
    -rw-r--r-- 4 root supergroup 196 2016-02-28 22:23 /testdir/full_backup.log
    -rw-r--r-- 4 root supergroup 142039186 2016-03-05 17:55 /testdir/oracle-j2sdk1.7-1.7.0+update67-1.x86_64.rpm
    -rw-r--r-- 4 root supergroup 44 2016-02-28 19:40 /testdir/test.log
    -rw-r--r-- 4 root supergroup 62 2016-03-06 17:36 /testdir/wjp.log

    [root@sht-sgmhadoopnn-01 hadoop]# hdfs fsck /
    16/03/06 21:49:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Connecting to namenode via http://sht-sgmhadoopnn-01:50070/fsck?ugi=root&path=%2F
    FSCK started by root (auth:SIMPLE) from /172.16.101.55 for path / at Sun Mar 06 21:49:12 CST 2016
    ...............Status: HEALTHY
     Total size: 580152025 B
     Total dirs: 17
     Total files: 15
     Total symlinks: 0
     Total blocks (validated): 14 (avg. block size 41439430 B)
     Minimally replicated blocks: 14 (100.0 %)
     Over-replicated blocks: 0 (0.0 %)
     Under-replicated blocks: 0 (0.0 %)
     Mis-replicated blocks: 0 (0.0 %)
     Default replication factor: 3
     Average block replication: 4.0
     Corrupt blocks: 0
     Missing replicas: 0 (0.0 %)
     Number of data-nodes: 4
     Number of racks: 1
    FSCK ended at Sun Mar 06 21:49:12 CST 2016 in 8 milliseconds
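## [Result]: there is no need to restart the cluster or the NameNode. The new file got 4 replicas because the HDFS client reads dfs.replication from its local hdfs-site.xml when it creates a file, and the put was run on the node whose file had just been edited. The dynamic command (hdfs dfs -setrep -w 4 -R /) only changes files that already exist, which is why the put in step 3, run before the edit, still produced 3 replicas.
fsck continues to report Default replication factor: 3 because the running NameNode has not reloaded its configuration.
This confirms the earlier point: what matters is the replication recorded on the files themselves, summarized by Average block replication, so the Default replication factor reported by the NameNode does not necessarily have to be changed.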


Command summary:
hdfs fsck /
hdfs dfs -setrep -w 4 /    (options go before the factor; -R is unnecessary)
