Hardware environment:
hddcluster1 10.0.0.197 RedHat 7
hddcluster2 10.0.0.228 CentOS 7  (this node serves as the Master)
hddcluster3 10.0.0.202 RedHat 7
hddcluster4 10.0.0.181 CentOS 7
Software environment:
Disable the firewall (firewalld) on every node -- see the commands after this list
openssh-clients
openssh-server
java-1.8.0-openjdk
java-1.8.0-openjdk-devel
hadoop-2.7.3.tar.gz
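On CentOS 7 / RHEL 7 the default firewall is firewalld; a minimal sketch of disabling it (run as root on every node):
systemctl stop firewalld       # stop the running firewall
systemctl disable firewalld    # keep it from coming back after a reboot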
Workflow:
- Pick one machine as the Master
- On the Master node, create the hadoop user, install the SSH server and install the Java environment
- Install Hadoop on the Master node and finish its configuration
- On each Slave node, create the hadoop user, install the SSH server and install the Java environment
- Copy the /usr/local/hadoop directory from the Master node to every Slave node
- Start Hadoop on the Master node
# Mapping between node names and their IP addresses
[hadoop@hddcluster2 ~]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.228 hddcluster2
10.0.0.197 hddcluster1
10.0.0.202 hddcluster3
10.0.0.181 hddcluster4
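The same host entries need to be resolvable on every node. A quick optional check (a sketch) that each name resolves and answers:
for h in hddcluster1 hddcluster2 hddcluster3 hddcluster4; do
    ping -c 1 "$h"    # one reply per host means /etc/hosts and the network are fine
done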
Create the hadoop user
su                                # log in as root, as mentioned above
useradd -m hadoop -s /bin/bash    # create the new user hadoop
passwd hadoop                     # set the password for hadoop
visudo                            # add "hadoop ALL=(ALL) ALL" below the "root ALL=(ALL) ALL" line
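After the visudo step, the relevant part of /etc/sudoers should look roughly like this:
## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL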
# Log in as the hadoop user, install SSH and set up passwordless SSH login
[hadoop@hddcluster2 ~]$ rpm -qa | grep ssh
[hadoop@hddcluster2 ~]$ sudo yum install openssh-clients
[hadoop@hddcluster2 ~]$ sudo yum install openssh-server
[hadoop@hddcluster2 ~]$ cd ~/.ssh/                                    # if this directory does not exist, run "ssh localhost" once first
[hadoop@hddcluster2 ~]$ ssh-keygen -t rsa                             # just press Enter at every prompt
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub localhost    # authorize the key locally
[hadoop@hddcluster2 ~]$ chmod 600 ./authorized_keys                   # fix the file permissions
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster1
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster3
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster4
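To confirm that passwordless login now works to every node (a small sketch; each ssh call should print the remote hostname without asking for a password):
for h in hddcluster1 hddcluster2 hddcluster3 hddcluster4; do
    ssh hadoop@"$h" hostname
done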
# Unpack Hadoop into /usr/local/hadoop
[hadoop@hddcluster2 ~]$ sudo tar -zxf hadoop-2.7.3.tar.gz -C /usr/local/
[hadoop@hddcluster2 ~]$ sudo mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
[hadoop@hddcluster2 ~]$ sudo chown -R hadoop:hadoop /usr/local/hadoop
cd /usr/local/hadoop
./bin/hadoop version
# Install the Java environment
[hadoop@hddcluster2 ~]$ sudo yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
[hadoop@hddcluster2 ~]$ rpm -ql java-1.8.0-openjdk-devel | grep '/bin/javac'
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/bin/javac
[hadoop@hddcluster2 ~]$ vim ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib:$HADOOP_PREFIX/lib/native"
# Test the Java environment
source ~/.bashrc
java -version
$JAVA_HOME/bin/java -version    # should print the same output as "java -version"
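A quick sanity check after sourcing ~/.bashrc (a sketch; the exact version strings depend on the installed packages):
which hadoop      # should resolve to /usr/local/hadoop/bin/hadoop
hadoop version    # should report Hadoop 2.7.3
javac -version    # confirms the JDK compiler is on the PATH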
# Edit the Hadoop configuration files
[hadoop@hddcluster2 hadoop]$ pwd
/usr/local/hadoop/etc/hadoop
[hadoop@hddcluster2 hadoop]$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hddcluster2:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>
[hadoop@hddcluster2 hadoop]$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hddcluster2:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
[hadoop@hddcluster2 hadoop]$
[hadoop@hddcluster2 hadoop]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hddcluster2:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hddcluster2:19888</value>
    </property>
</configuration>
[hadoop@hddcluster2 hadoop]$
[hadoop@hddcluster2 hadoop]$ cat yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hddcluster2</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
[hadoop@hddcluster2 hadoop]$
[hadoop@hddcluster2 hadoop]$ cat slaves
hddcluster1
hddcluster2
hddcluster3
hddcluster4
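Not covered above: if the daemons later complain that JAVA_HOME is not set, it may also be necessary to hard-code it in the Hadoop environment script; a sketch assuming the same OpenJDK path used in ~/.bashrc:
# /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64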
On the Master node, pack up /usr/local/hadoop and copy it to each Slave node:
$ cd /usr/local
$ sudo rm -r ./hadoop/tmp                    # remove the Hadoop temporary files
$ sudo rm -r ./hadoop/logs/*                 # remove the log files
$ tar -zcf ~/hadoop.master.tar.gz ./hadoop   # compress first, then copy
$ cd ~
$ scp ./hadoop.master.tar.gz hddcluster1:/home/hadoop
$ scp ./hadoop.master.tar.gz hddcluster3:/home/hadoop
$ scp ./hadoop.master.tar.gz hddcluster4:/home/hadoop
On each Slave node, install the same software environment, set up .bashrc as above, and then unpack the archive:
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
sudo chown -R hadoop /usr/local/hadoop
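A quick way to confirm that each Slave received a working copy (a sketch, relying on the passwordless SSH configured earlier):
for h in hddcluster1 hddcluster3 hddcluster4; do
    ssh hadoop@"$h" '/usr/local/hadoop/bin/hadoop version | head -n 1'
done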
[hadoop@hddcluster2 ~]$ hdfs namenode -format    # only needed on the first run, not afterwards
Hadoop can now be started. The start commands are run on the Master node:
$ start-dfs.sh
$ start-yarn.sh
$ mr-jobhistory-daemon.sh start historyserver
The jps command shows which processes are running on each node. If everything is correct,
the Master node shows the NameNode, ResourceManager, SecondaryNameNode and JobHistoryServer processes.
In addition, run hdfs dfsadmin -report on the Master node to check whether the DataNodes started properly; if "Live datanodes" is not 0, the cluster came up successfully.
[hadoop@hddcluster2 ~]$ hdfs dfsadmin -report
Configured Capacity: 2125104381952 (1.93 TB)
Present Capacity: 1975826509824 (1.80 TB)
DFS Remaining: 1975824982016 (1.80 TB)
DFS Used: 1527808 (1.46 MB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (4):
The status of the DataNodes and the NameNode can also be checked through the web UI at http://hddcluster2:50070/. If startup fails, check the startup logs to find the cause.
On the Slave nodes, the DataNode and NodeManager processes should be visible.
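For reference, jps output should look roughly like the sketch below (process IDs are placeholders; since hddcluster2 is itself listed in the slaves file, the Master additionally runs a DataNode and a NodeManager):
[hadoop@hddcluster2 ~]$ jps        # on the Master
<pid> NameNode
<pid> SecondaryNameNode
<pid> ResourceManager
<pid> JobHistoryServer
<pid> DataNode
<pid> NodeManager
<pid> Jps
[hadoop@hddcluster1 ~]$ jps        # on a Slave
<pid> DataNode
<pid> NodeManager
<pid> Jps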
Test a distributed Hadoop example
First create the user directory on HDFS:
hdfs dfs -mkdir -p /user/hadoop
Copy the configuration files from /usr/local/hadoop/etc/hadoop into the distributed file system as the input:
hdfs dfs -mkdir input
hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input
Checking the DataNode status (the used space changes) confirms that the input files really were copied onto the DataNodes.
Now the MapReduce job can be run:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
The output once the job has finished:
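The result can also be printed from the shell (a sketch; "output" is the directory passed to the job above, and the wildcard picks up whichever part files the job wrote):
hdfs dfs -cat output/*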
Hadoop start commands:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Hadoop stop commands:
stop-dfs.sh
stop-yarn.sh
mr-jobhistory-daemon.sh stop historyserver
PS: If one or two nodes in the cluster fail to start, first try deleting the Hadoop temporary files:
cd /usr/local
sudo rm -r ./hadoop/tmp
sudo rm -r ./hadoop/logs/*
then run
hdfs namenode -format
and start the cluster again.
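If the failing processes are DataNodes on other machines (typically a clusterID mismatch after a re-format), the same tmp/logs cleanup usually has to be done on those nodes as well; a minimal sketch, relying on the passwordless SSH set up earlier and on the hadoop user owning /usr/local/hadoop:
for h in hddcluster1 hddcluster2 hddcluster3 hddcluster4; do
    ssh hadoop@"$h" 'rm -rf /usr/local/hadoop/tmp /usr/local/hadoop/logs/*'
done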
This article follows the guide below, and the setup was verified to work:
http://www.powerxing.com/install-hadoop-cluster/
Reposted from yanconggod's 51CTO blog; original link:
http://blog.51cto.com/yanconggod/1884998