前置步骤:
第三阶段: Hadoop高可用(HA)
0. 步骤概述
1). 为完全分布式保存hadoop配置
2). 为hadoop2配置hadoop1的ssh免密
3). 在hadoop2上配置zookeeper
4). 在hadoop1上修改hadoop配置文件为HA高可用模式
5). 第一次启动HA
6). 常规启动HA
7). 在完全分布式集群上测试wordcount程序
1. 为完全分布式保存hadoop配置
1.1 进入$HADOOP_HOME/etc/目录
[root@hadoop1 ~]# cd /opt/test/hadoop-2.6.5/etc
1.2 备份hadoop完全分布式配置,命名为hadoop-full,供以后使用
[root@hadoop1 etc]# cp -r hadoop/ hadoop-full
1.3 查看$HADOOP_HOME/etc/目录,备份成功
[root@hadoop1 etc]# ls
hadoop hadoop-full
# hadoop-full保留了已有配置,接下来高可用的配置继续在hadoop文件夹内修改
2. 为hadoop2配置hadoop1的ssh免密
2.1 在hadoop2上生成密匙
[root@hadoop2 ~]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
2.2 在hadoop2上配置对自身免密
[root@hadoop2 ~]# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2.3 在hadoop2上查看authorized_keys密匙
[root@hadoop2 ~]# cat ~/.ssh/authorized_keys
ssh-dss ***** root@hadoop1
ssh-dss ***** root@hadoop2
# hadoop2上的authorized_keys现在有两个,一个来自hadoop1,一个是自身的
2.4 在hadoop2上将公匙拷贝给hadoop1
2.4.1 方式一:直接用ssh-copy-id –i命令从hadoop2上拷贝到hadoop1
[root@hadoop2 ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop1
2.4.2.1 方式二(1):首先在hadoop2上操作,用scp命令将公匙复制到hadoop1
[root@hadoop2 ~]# scp ~/.ssh/id_dsa.pub hadoop1:~/.ssh/hadoop2.pub
2.4.2.2 方式二(2):接着在hadoop1上使用cat命令使hadoop2公匙生效
[root@hadoop2 ~]# cat hadoop2.pub >> authorized_keys
2.5 在hadoop2上测试ssh到hadoop1是否成功免密
[root@hadoop2 ~]# ssh hadoop1
[root@hadoop1 ~]#
#成功进入hadoop1,没有提示输入密码,表示免密成功
3. 在hadoop2上配置zookeeper
3.1 进入/opt/test/目录
[root@hadoop2 ~]# cd /opt/test
[root@hadoop2 test]
3.2 通过xftp将zookeeper-3.4.6.tar.gz上传到hadoop2的/opt/test/目录
3.3 解压缩文件
[root@hadoop2 test]# tar -zxvf zookeeper-3.4.6.tar.gz
3.4 为hadoop2,hadoop3,hadoop4设置zookeeper环境变量
3.4.1 在hadoop2上编辑/etc/profile,增加zookeeper环境变量配置
[root@hadoop2 ~]# vim /etc/profile
export ZOOKEEPER_PREFIX=/opt/test/zookeeper-3.4.6
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$ZOOKEEPER_PREFIX/bin
3.4.2 分发hadoop2上的/etc/profile到hadoop3,hadoop4
[root@hadoop2 ~]# scp /etc/profile hadoop2:/etc/
[root@hadoop2 ~]# scp /etc/profile hadoop2:/etc/
3.4.3 在hadoop2,hadoop3,hadoop4上使/etc/profile生效
[root@hadoop2 ~]# source /etc/profile
[root@hadoop3 ~]# source /etc/profile
[root@hadoop4 ~]# source /etc/profile
3.5 编辑zoo.cfg文件
3.5.0 进入/opt/etc/zookeeper-3.4.6/conf目录
[root@hadoop2 ~]# cd /opt/test/zookeeper-3.4.6/conf
[root@hadoop2 conf]
3.5.1将zoo_sample.cfg复制为zoo.cfg文件
[root@hadoop2 conf]# cp zoo_sample.cfg zoo.cfg
3.5.2 在zoo.cfg中添加如下内容
[root@hadoop2 conf]# vim zoo.cfg
# 配置zookeeper数据存放目录
dataDir=/var/test/zk/
# 设置zookeeper位置信息
server.1=hadoop2:2888:3888
server.2=hadoop3:2888:3888
server.3=hadoop4:2888:3888
3.6 设置zookeeper节点对应的ID
3.6.1 在hadoop2上进入/var/test/
[root@hadoop2 ~]# cd /var/test/
3.6.2 在test目录下创建zk目录
[root@hadoop2 test]# mkdir zk
3.6.3 进入/var/test/zk目录
[root@hadoop2 test]# cd /var/test/zk
3.6.4 在/var/test/zk目录下生成myid文件
[root@hadoop2 zk]# echo 1 > myid
3.6.5 分别在hadoop3和hadoop4重复3.6.1~3.6.4的操作,其中3.6.4步骤中hadoop3的myid内容为2,hadoop4的myid内容为3
3.6.6 查看hadoop2,hadoop3,hadoop4的myid文件
[root@hadoop2 zk]# cat myid
1
[root@hadoop3 zk]# cat myid
2
[root@hadoop4 zk]# cat myid
3
3.7 将zookeeper-3.4.6目录分发到其他节点上
3.7.1 在hadoop2上进入/opt/test/目录
[root@hadoop2 zk]# cd /opt/test/
3.7.2 分发zookeeper-3.4.6目录到hadoop3,hadoop4
[root@hadoop2 test]# scp -r zookeeper-3.4.6 hadoop3:`pwd`
[root@hadoop2 test]# scp -r zookeeper-3.4.6 hadoop4:`pwd`
3.8 验证zookeeper是否安装成功
3.8.1 在hadoop2,hadoop3,hadoop4上分别启动zookeeper
[root@hadoop2 test]# zkServer.sh start
[root@hadoop3 test]# zkServer.sh start
[root@hadoop4 test]# zkServer.sh start
3.8.2 查看zookeeper状态
# 如果成功,可以看到2个follower,1个leader,leader由选举产生
[root@hadoop2 test]# zkServer.sh status
JMX enabled by default
Using config: /opt/test/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop3 test]# zkServer.sh status
JMX enabled by default
Using config: /opt/test/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop4 test]# zkServer.sh status
JMX enabled by default
Using config: /opt/test/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
3.9 使用zookeeper客户端
3.9.1 进入zookeeper客户端
[root@hadoop2 test]# zkCli.sh
[zk: localhost:2181(CONNECTED) 0]
显示上述信息表示进入成功
3.9.2 退出zookeeper客户端
[zk: localhost:2181(CONNECTED) 0] quit
Quitting...
2017-11-30 20:43:17,953 [myid:] - INFO [main:ZooKeeper@684] - Session: ** closed
2017-11-30 20:43:17,953 [myid:] – INFO [main-EventThread:ClientCnxn$EventThread@512] - EventThread shut down
3.9.3 停止zookeeper服务
[root@hadoop4 test]# zkServer.sh stop
JMX enabled by default
Using config: /opt/test/zookeeper-3.4.6/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
4. 在hadoop1上修改hadoop配置文件为HA高可用模式
4.1 进入$HADOOP_HOME/etc/hadoop目录
[root@hadoop1 ~]# cd /opt/test/hadoop-2.6.5/etc/hadoop/
4.2 修改hdfs-site.xml文件
[root@hadoop1 hadoop]# vim hdfs-site.xml
4.2.1删除secondary的配置信息
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop2:50090</value>
</property>
4.2.2 将原有hdfs-site.xml配置替换为如下内容
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!--定义nameservices逻辑名称-->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!--映射nameservices逻辑名称到namenode逻辑名称-->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<!--映射namenode逻辑名称到真实主机名称(RPC)-->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop1:8020</value>
</property>
<!--映射namenode逻辑名称到真实主机名称(RPC)-->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop2:8020</value>
</property>
<!--映射namenode逻辑名称到真实主机名称(HTTP)-->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop1:50070</value>
</property>
<!--映射namenode逻辑名称到真实主机名称(HTTP)-->
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop2:50070</value>
</property>
<!--配置journalnode集群位置信息及目录-->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/var/test/hadoop/ha/jn</value>
</property>
<!--配置故障切换实现类-->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!--指定切换方式为SSH免密钥方式-->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_dsa</value>
</property>
<!--设置自动切换-->
<property>
<name>dfs.ha.automatic-failover.enabled.mycluster</name>
<value>true</value>
</property>
</configuration>
4.3 修改core-site.xml文件,将原有配置替换如下
[root@hadoop1 hadoop]# vim core-site.xml
<configuration>
<!--设置fs.defaultFS为nameservices的逻辑主机名-->
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!--设置zookeeper数据存放目录-->
<property>
<name>hadoop.tmp.dir</name>
<value>/var/test/hadoop/ha</value>
</property>
<!--设置zookeeper位置信息-->
<property>
<name>ha.zookeeper.quorum.mycluster</name>
<value>hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
</property>
</configuration>
4.4 将修改后的hdfs-site.xml和core-site.xml分发到其他节点
[root@hadoop1 hadoop]# scp hdfs-site.xml core-site.xml hadoop2:`pwd`
[root@hadoop1 hadoop]# scp hdfs-site.xml core-site.xml hadoop3:`pwd`
[root@hadoop1 hadoop]# scp hdfs-site.xml core-site.xml hadoop4:`pwd`
5. 第一次启动HA
5.1 启动zookeeper
5.1.1 在hadoop2,hadoop3,hadoop4上分别启动zookeeper
[root@hadoop2 ~]# zkServer.sh start
[root@hadoop3 ~]# zkServer.sh start
[root@hadoop4 ~]# zkServer.sh start
5.1.2 hadoop2,hadoop3,hadoop4进程显示如下
[root@hadoop2 ~]# jps
**** Jps
**** QuorumPeerMain
[root@hadoop3 ~]# jps
**** Jps
**** QuorumPeerMain
[root@hadoop4 ~]# jps
**** Jps
**** QuorumPeerMain
5.2 启动journalnode
5.2.1 在hadoop1,hadoop2,hadoop3上启动journalnode
[root@hadoop1 ~]# hadoop-daemon.sh start journalnode
[root@hadoop2 ~]# hadoop-daemon.sh start journalnode
[root@hadoop3 ~]# hadoop-daemon.sh start journalnode
5.2.2 hadoop1,hadoop2,hadoop3进程显示如下
[root@hadoop1 ~]# jps
**** Jps
**** JournalNode
[root@hadoop2 ~]# jps
**** Jps
**** QuorumPeerMain
**** JournalNode
[root@hadoop3 ~]# jps
**** Jps
**** QuorumPeerMain
**** JournalNode
5.3 在hadoop1上格式化namenode
[root@hadoop1 ~]# hdfs namenode -format
17/11/30 21:16:35 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
……
SHUTDOWN_MSG: Shutting down NameNode at hadoop1/192.168.111.211
************************************************************/
5.4 在hadoop1上启动namenode
5.4.1 格式化完成后在hadoop1上启动namenode
[root@hadoop1 ~]# hadoop-daemon.sh start namenode
starting namenode, logging to /opt/test/hadoop-2.6.5/logs/hadoop-root-namenode-hadoop1.out
5.4.2 hadoop1进程显示如下
[root@hadoop1 ~]# jps
**** Jps
**** JournalNode
**** NameNode
5.5 在hadoop2,即另一台namenode上同步hadoop1的CID等信息
[root@hadoop2 ~]# hdfs namenode -bootstrapStandby
17/11/30 21:20:27 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop2/192.168.111.212
************************************************************/
5.6 在hadoop1上启动其他服务
[root@hadoop1 ~]# start-dfs.sh
17/11/30 21:21:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop1 hadoop2]
hadoop1: namenode running as process 1555. Stop it first.
hadoop2: starting namenode, logging to /opt/test/hadoop-2.6.5/logs/hadoop-root-namenode-hadoop2.out
hadoop2: starting datanode, logging to /opt/test/hadoop-2.6.5/logs/hadoop-root-datanode-hadoop2.out
hadoop3: starting datanode, logging to /opt/test/hadoop-2.6.5/logs/hadoop-root-datanode-hadoop3.out
hadoop4: starting datanode, logging to /opt/test/hadoop-2.6.5/logs/hadoop-root-datanode-hadoop4.out
Starting journal nodes [hadoop1 hadoop2 hadoop3]
hadoop1: journalnode running as process 1397. Stop it first.
hadoop3: journalnode running as process 1437. Stop it first.
hadoop2: journalnode running as process 1435. Stop it first.
17/11/30 21:21:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting ZK Failover Controllers on NN hosts [hadoop1 hadoop2]
hadoop1: starting zkfc, logging to /opt/test/hadoop-2.6.5/logs/hadoop-root-zkfc-hadoop1.out
hadoop2: starting zkfc, logging to /opt/test/hadoop-2.6.5/logs/hadoop-root-zkfc-hadoop2.out
5.7 在hadoop1上格式化zookeeper
[root@hadoop1 ~]# hdfs zkfc -formatZK
……
17/11/30 21:23:10 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
17/11/30 21:23:10 INFO ha.ActiveStandbyElector: Session connected.
17/11/30 21:23:10 INFO zookeeper.ClientCnxn: EventThread shut down
17/11/30 21:23:10 INFO zookeeper.ZooKeeper: Session: 0x2600d0c1b960000 closed
5.8 在hadoop2,hadoop3,hadoop4上使用zkCli.sh查看格式化结果
5.8.1 进入zookeeper客户端
[root@hadoop2 ~]# zkCli.sh
[zk: localhost:2181(CONNECTED) 0]
5.8.2 使用ls /命令查看根目录
[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, zookeeper]
#每个根目录都生成了hadoop-ha目录
*格式化namenode后出现datanode无法启动的情况,查看BUGFIX1
6. 常规启动HA
6.1 启动zookeeper
6.1.1 在hadoop2,hadoop3,hadoop4上分别启动zookeeper
[root@hadoop2 ~]# zkServer.sh start
[root@hadoop3 ~]# zkServer.sh start
[root@hadoop4 ~]# zkServer.sh start
6.1.2 hadoop2,hadoop3,hadoop4进程显示如下
[root@hadoop2 ~]# jps
**** Jps
**** QuorumPeerMain
[root@hadoop3 ~]# jps
**** Jps
**** QuorumPeerMain
[root@hadoop4 ~]# jps
**** Jps
**** QuorumPeerMain
6.2 启动hdfs集群
6.2.1 在hadoop1上启动整个集群start-dfs.sh
[root@hadoop1 ~]# start-dfs.sh
6.2.2 hadoop会启动如下进程:
hadoop1, hadoop2: namenode
hadoop2, hadoop3, hadoop4: datanode
hadoop1, hadoop2, hadoop3: journalnode
hadoop1, hadoop2: ZKFC
6.2.3 启动完成后各节点进程显示如下:
[root@hadoop1 ~]# jps
2559 JournalNode
2724 DFSZKFailoverController
2790 Jps
2366 NameNode
[root@hadoop2 ~]# jps
2099 JournalNode
2217 DFSZKFailoverController
2265 Jps
1754 QuorumPeerMain
2014 DataNode
1945 NameNode
[root@hadoop3 ~]# jps
1583 QuorumPeerMain
1714 DataNode
1799 JournalNode
1859 Jps
[root@hadoop4 ~]# jps
1685 Jps
1510 QuorumPeerMain
1613 DataNode
6.3 启动yarn
6.3.1 在hadoop1上启动yarn
[root@hadoop1 ~]# start-yarn.sh
6.3.2 启动完成后各集群进程如下
[root@hadoop1 ~]# jps
2559 JournalNode
2935 ResourceManager
2724 DFSZKFailoverController
3350 Jps
2366 NameNode
[root@hadoop2 ~]# jps
2099 JournalNode
2217 DFSZKFailoverController
1754 QuorumPeerMain
2381 NodeManager
2014 DataNode
2587 Jps
1945 NameNode
[root@hadoop3 ~]# jps
1583 QuorumPeerMain
1714 DataNode
2628 Jps
1901 NodeManager
1799 JournalNode
[root@hadoop4 ~]# jps
1728 NodeManager
1510 QuorumPeerMain
1613 DataNode
1891 Jps
7. 在完全分布式集群上测试wordcount程序
7.1 从hadoop1进入$HADOOP_HOME/share/hadoop/mapreduce/目录
[root@hadoop1 ~]# cd /opt/test/hadoop-2.6.5/share/hadoop/mapreduce/
7.2上传test.txt文件到根目录
7.2.1 默认上传
[root@hadoop1 mapreduce]# hadoop fs -put test.txt /
7.2.2 也可以指定blocksize
[root@hadoop1 mapreduce]# hdfs dfs -D dfs.blocksize=1048576 -put test.txt /
7.3 运行wordcount测试程序,输出到/output
[root@hadoop1 mapreduce]#
hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount /test.txt /output
#运行时会首先看到如下信息
INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
7.4 查看mapreduce运行结果
[root@hadoop1 mapreduce]# hadoop dfs -text /output/part-*
hello 100003
world 200002
“hello 100000