| HostName | IP | Hadoop | HBase | Zookeeper | Hive |
| --- | --- | --- | --- | --- | --- |
| HMaster0 | 192.168.18.215 | NameNode | HMaster | / | Hive |
| HMaster1 | 192.168.18.216 | NameNode | HMaster | / | Hive-client |
| HSlave0 | 192.168.18.217 | DataNode | HRegionServer | QuorumPeerMain | / |
| HSlave1 | 192.168.18.218 | DataNode | HRegionServer | QuorumPeerMain | / |
| HSlave2 | 192.168.18.219 | DataNode | HRegionServer | QuorumPeerMain | / |
| Software | Package | Purpose |
| --- | --- | --- |
| Hadoop | hadoop-2.6.0.tar.gz | Provides distributed storage (HDFS) and distributed computing (YARN) for massive data sets. |
| HBase | hbase-1.0.1.1-src.tar.gz | A distributed, column-oriented NoSQL database built on Hadoop, suited to storing unstructured data. |
| Zookeeper | zookeeper-3.4.6.tar.gz | A distributed coordination service that provides consistency guarantees for applications; a key component of both Hadoop and HBase. |
| Hive | apache-hive-1.2.0-bin.tar.gz | A data warehouse tool on top of Hadoop that maps structured data files onto tables, offers simple SQL querying, and translates SQL statements into MapReduce jobs. |
| Phoenix | phoenix-4.4.0-HBase-1.0-bin.tar.gz | A SQL layer for HBase; Phoenix lets HBase be accessed through JDBC and compiles SQL queries into HBase scans and the corresponding operations. |
| JDK | jdk-7u79-linux-x64.gz | Java runtime environment. |

Hadoop ecosystem downloads: http://www.apache.org/dist/
JDK download: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
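For reference, a fetch sketch assuming the usual Apache mirror layout under `/dist/` (older releases may have moved to archive.apache.org, so adjust the host if a path 404s):

```
# wget http://www.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
# wget http://www.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
```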
Install JDK 1.7 and point the environment at it:

```
# tar zxvf jdk-7u79-linux-x64.gz
# mv jdk1.7.0_79 /usr/local/jdk1.7
# vi /etc/profile
JAVA_HOME=/usr/local/jdk1.7
PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export JAVA_HOME PATH CLASSPATH
# source /etc/profile    # apply the changes
```
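After sourcing the profile, a quick check that the JDK is picked up:

```
# java -version    # should report java version "1.7.0_79"
```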
Set the hostname (shown for HMaster0; repeat on every node with its own name):

```
# hostname HMaster0
# vi /etc/hostname
HMaster0
```
All five nodes go into /etc/hosts on every machine:

```
# cat /etc/hosts
127.0.0.1      localhost localhost.localdomain localhost4 localhost4.localdomain4
::1            localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.18.215 HMaster0
192.168.18.216 HMaster1
192.168.18.217 HSlave0
192.168.18.218 HSlave1
192.168.18.219 HSlave2
```
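A quick sanity-check sketch that every alias resolves, runnable from any node:

```
# for h in HMaster0 HMaster1 HSlave0 HSlave1 HSlave2; do ping -c 1 $h > /dev/null && echo "$h ok"; done
```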
Create a key pair on HMaster0 and distribute it so all nodes trust each other:

```
# ssh-keygen    # press Enter at each prompt to generate the key pair
[root@HMaster0]# cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys
[root@HMaster0]# scp /root/.ssh/authorized_keys root@HMaster0:/root/.ssh
[root@HMaster0]# scp /root/.ssh/authorized_keys root@HMaster1:/root/.ssh
[root@HMaster0]# scp /root/.ssh/authorized_keys root@HSlave0:/root/.ssh
[root@HMaster0]# scp /root/.ssh/authorized_keys root@HSlave1:/root/.ssh
[root@HMaster0]# scp /root/.ssh/authorized_keys root@HSlave2:/root/.ssh
[root@HMaster0]# ssh root@HMaster0 'chmod 600 /root/.ssh/authorized_keys && chmod 700 /root/.ssh'
[root@HMaster0]# ssh root@HMaster1 'chmod 600 /root/.ssh/authorized_keys && chmod 700 /root/.ssh'
[root@HMaster0]# ssh root@HSlave0 'chmod 600 /root/.ssh/authorized_keys && chmod 700 /root/.ssh'
[root@HMaster0]# ssh root@HSlave1 'chmod 600 /root/.ssh/authorized_keys && chmod 700 /root/.ssh'
[root@HMaster0]# ssh root@HSlave2 'chmod 600 /root/.ssh/authorized_keys && chmod 700 /root/.ssh'
```
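If key distribution worked, each of these logs in without a password prompt:

```
# for h in HMaster0 HMaster1 HSlave0 HSlave1 HSlave2; do ssh root@$h hostname; done
```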
```
# tar zxvf zookeeper-3.4.6.tar.gz
# mv zookeeper-3.4.6 /opt
# cd /opt/zookeeper-3.4.6/conf
# cp zoo_sample.cfg zoo.cfg
```
```
# vi zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/zookeeper/data
dataLogDir=/home/zookeeper/logs
clientPort=2181
server.0=HSlave0:2888:3888
server.1=HSlave1:2888:3888
server.2=HSlave2:2888:3888
```
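Each ZooKeeper server also needs a `myid` file under `dataDir` whose content matches its `server.N` id in zoo.cfg; a minimal sketch, run on each slave with its own id:

```
# mkdir -p /home/zookeeper/data /home/zookeeper/logs
# echo 0 > /home/zookeeper/data/myid    # 0 on HSlave0; use 1 on HSlave1, 2 on HSlave2
```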
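Output like the following is what the bundled script prints after ZooKeeper is started and queried on each slave:

```
# /opt/zookeeper-3.4.6/bin/zkServer.sh start
# /opt/zookeeper-3.4.6/bin/zkServer.sh status
```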
```
JMX enabled by default
Using config: /opt/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower

JMX enabled by default
Using config: /opt/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
```
```
# vi core-site.xml
<configuration>
  <!-- Logical name for the HDFS nameservice -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hcluster</value>
  </property>
  <!-- Where Hadoop stores temporary files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <!-- Addresses of the ZooKeeper ensemble to use -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>HSlave0:2181,HSlave1:2181,HSlave2:2181</value>
  </property>
</configuration>
```
```
# vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hcluster</value>
  </property>
  <!-- NameNode ids within the nameservice (hcluster); at most two -->
  <property>
    <name>dfs.ha.namenodes.hcluster</name>
    <value>HMaster0,HMaster1</value>
  </property>
  <!-- Where HDFS stores block data; multiple comma-separated paths on different disks can be used to get past a single disk's throughput -->
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hdfs/data</value>
  </property>
  <!-- Number of replicas; set according to the number of DataNodes (default is 3) -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- NameNode RPC addresses -->
  <property>
    <name>dfs.namenode.rpc-address.hcluster.HMaster0</name>
    <value>HMaster0:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hcluster.HMaster1</name>
    <value>HMaster1:9000</value>
  </property>
  <!-- NameNode HTTP addresses -->
  <property>
    <name>dfs.namenode.http-address.hcluster.HMaster0</name>
    <value>HMaster0:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hcluster.HMaster1</name>
    <value>HMaster1:50070</value>
  </property>
  <!-- Where the NameNode stores metadata and edit logs -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/name</value>
  </property>
  <!-- Also store NameNode metadata and edit logs on the JournalNodes (/home/hadoop/journal/hcluster) -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://HSlave0:8485;HSlave1:8485;HSlave2:8485/hcluster</value>
  </property>
  <!-- Where the JournalNodes store metadata and edit logs -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/journal</value>
  </property>
  <!-- Enable automatic NameNode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- How clients locate the active NameNode after a failover -->
  <property>
    <name>dfs.client.failover.proxy.provider.hcluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods, to guarantee only one NameNode is active at any time -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence(hdfs)
shell(/bin/true)</value>
  </property>
  <!-- The sshfence mechanism requires passwordless SSH authentication -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
</configuration>
```
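Once the cluster is up, the HA admin tool can confirm which NameNode is active; the ids here match `dfs.ha.namenodes.hcluster` above:

```
# hdfs haadmin -getServiceState HMaster0    # prints active or standby
# hdfs haadmin -getServiceState HMaster1
```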
```
# vi yarn-site.xml
<configuration>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- RM cluster identifier -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>rm-cluster</value>
  </property>
  <!-- Logical ids for the two RM hosts -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Automatic RM failover -->
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.recover.enabled</name>
    <value>true</value>
  </property>
  <!-- Automatic RM state recovery
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property> -->
  <!-- RM host 1 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>HMaster0</value>
  </property>
  <!-- RM host 2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>HMaster1</value>
  </property>
  <!-- How RM state is stored: either in memory (MemStore) or in ZooKeeper (ZKStore) -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <!-- Use the ZooKeeper ensemble to store state -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>HSlave0:2181,HSlave1:2181,HSlave2:2181</value>
  </property>
  <!-- Scheduler addresses for resource requests to the RM -->
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>HMaster0:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>HMaster1:8030</value>
  </property>
  <!-- NodeManagers exchange information with the RM at these addresses -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>HMaster0:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>HMaster1:8031</value>
  </property>
  <!-- Clients submit application operations to the RM at these addresses -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>HMaster0:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>HMaster1:8032</value>
  </property>
  <!-- Administrators send management commands to the RM at these addresses -->
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>HMaster0:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>HMaster1:8033</value>
  </property>
  <!-- RM web UI addresses, for viewing cluster information -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>HMaster0:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>HMaster1:8088</value>
  </property>
</configuration>
```
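Similarly, after YARN starts, the RM ids rm1/rm2 configured above can be queried to see which ResourceManager is active:

```
# yarn rmadmin -getServiceState rm1    # prints active or standby
# yarn rmadmin -getServiceState rm2
```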
```
# vi mapred-site.xml
<configuration>
  <!-- Use YARN as the MapReduce framework -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- MapReduce JobHistory Server address (default port 10020) -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
  </property>
  <!-- MapReduce JobHistory Server web UI address (default port 19888) -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
  </property>
</configuration>
```
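Note that the JobHistory server configured here is not launched by start-yarn.sh; on Hadoop 2.x it is started separately:

```
# mr-jobhistory-daemon.sh start historyserver    # web UI then served on port 19888
```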
In hadoop-env.sh, change `export JAVA_HOME=${JAVA_HOME}` to the path of the JDK we installed:

```
# vi hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.7
```
```
# vi slaves
HSlave0
HSlave1
HSlave2
```
```
# vi /etc/profile
HADOOP_HOME=/opt/hadoop-2.6.0
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_HOME PATH
# source /etc/profile
```
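All nodes must share the same configuration; a sketch for pushing the configured Hadoop directory from HMaster0 to the rest of the cluster, using the paths set above:

```
# for h in HMaster1 HSlave0 HSlave1 HSlave2; do scp -r /opt/hadoop-2.6.0 root@$h:/opt/; done
```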