Installing Hadoop
Prepare the machines: one master and several slaves. Configure /etc/hosts on every machine so the hosts can reach each other by hostname, for example:
172.16.200.4 node1 (master)
172.16.200.5 node2 (slave1)
172.16.200.6 node3 (slave2)
Host information:
| Hostname | IP address | Role |
| --- | --- | --- |
| node1 (master) | 172.16.200.4 | NameNode, ResourceManager |
| node2 (slave1) | 172.16.200.5 | DataNode, NodeManager |
| node3 (slave2) | 172.16.200.6 | DataNode, NodeManager |
I. Change the hostnames (do this on all three nodes)
Using node1 as an example; configure the other two nodes the same way.
vim /etc/hosts
172.16.200.4 node1
172.16.200.5 node2
172.16.200.6 node3
vim /etc/sysconfig/network
HOSTNAME=node1
Run hostname node1 and log in again for the change to take effect.
Verify with: ping <hostname>
II. Add the hadoop user and grant it root (sudo) privileges (on all three nodes)
useradd hadoop
passwd hadoop (here the password is set to the same value as the username)
Edit the /etc/sudoers file, find the line below, and add a line for hadoop under the root entry, as shown:
root ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
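The same setup can also be done from a shell; a minimal sketch, assuming a CentOS-style system where passwd --stdin is available (run as root on each node):
useradd hadoop
echo "hadoop" | passwd --stdin hadoop   # set the password (same as the username, as above)
visudo                                  # then add the hadoop line shown above below the root entry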
III. Set up passwordless SSH login (on all three nodes)
After Hadoop starts, the NameNode uses SSH (Secure Shell) to start and stop the daemons on each DataNode. This requires that commands can be executed between nodes without entering a password, so we configure SSH to use passwordless public-key authentication.
Taking the three machines in this article as an example, node1 is the master node and needs to connect to node2 and node3. Make sure SSH is installed on every machine and that the sshd service is running on the DataNode machines.
(Note: [hadoop@node1 ~]$ ssh-keygen -t rsa
This command generates a key pair for the hadoop user. Press Enter to accept the default save path, and press Enter again when asked for a passphrase, i.e. leave it empty. The generated key pair id_rsa, id_rsa.pub is stored under /home/hadoop/.ssh by default. Then copy the contents of id_rsa.pub into the /home/hadoop/.ssh/authorized_keys file on every machine (including this one): if authorized_keys already exists, append the contents of id_rsa.pub to the end of the file; if it does not exist, simply copy id_rsa.pub over as authorized_keys.)
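A minimal sketch of the whole key exchange, run as the hadoop user on node1 and assuming ssh-copy-id is available (otherwise append id_rsa.pub to ~/.ssh/authorized_keys by hand as described above):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa      # empty passphrase, default key path
for host in node1 node2 node3; do
  ssh-copy-id hadoop@$host                    # appends id_rsa.pub to authorized_keys on $host
done
ssh node2 hostname                            # should print "node2" without asking for a password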
IV. Install the JDK (on all three nodes)
Append the following to /etc/profile and reload it:
export JAVA_HOME=/usr/java/jdk1.7.0_67
export JRE_HOME=/usr/java/jdk1.7.0_67/jre
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
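To confirm the JDK is picked up on each node (the exact version string will vary with the build):
java -version        # should report 1.7.0_67
echo $JAVA_HOME      # should print /usr/java/jdk1.7.0_67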
V. Install Hadoop
Download the hadoop-2.6.4.tar.gz archive.
1. Extract it: tar -xzvf hadoop-2.6.4.tar.gz
[hadoop@node1 hadoop-2.6.4]$ ls
bin data etc include lib libexec LICENSE.txt logs name NOTICE.txt README.txt sbin share var
[hadoop@node1 hadoop-2.6.4]$ pwd
/home/hadoop/hadoop-2.6.4
2. Before configuring, create the following directories on the local filesystem (see the sketch after the list):
/home/hadoop/hadoop-2.6.4/var
/home/hadoop/hadoop-2.6.4/data
/home/hadoop/hadoop-2.6.4/name
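A minimal sketch for creating them (run as the hadoop user; the scp step in part 5 later copies the whole tree to the slaves):
mkdir -p /home/hadoop/hadoop-2.6.4/var
mkdir -p /home/hadoop/hadoop-2.6.4/data
mkdir -p /home/hadoop/hadoop-2.6.4/name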
3. Edit the configuration files. Seven files are involved, all under /home/hadoop/hadoop-2.6.4/etc/hadoop:
- ~/hadoop-2.6.4/etc/hadoop/hadoop-env.sh
- ~/hadoop-2.6.4/etc/hadoop/yarn-env.sh
- ~/hadoop-2.6.4/etc/hadoop/slaves
- ~/hadoop-2.6.4/etc/hadoop/core-site.xml
- ~/hadoop-2.6.4/etc/hadoop/hdfs-site.xml
- ~/hadoop-2.6.4/etc/hadoop/mapred-site.xml
- ~/hadoop-2.6.4/etc/hadoop/yarn-site.xml
4. Go into the Hadoop configuration directory.
4.1. Configure hadoop-env.sh: set JAVA_HOME
- # The java implementation to use.
- export JAVA_HOME=/usr/java/jdk1.7.0_67
4.2. Configure yarn-env.sh: set JAVA_HOME
- # some Java parameters
- export JAVA_HOME=/usr/java/jdk1.7.0_67
4.3. Configure the slaves file: add the slave nodes (node1 is listed as well, so the master also runs a DataNode and NodeManager)
node1
node2
node3
4.4. Configure core-site.xml: add the Hadoop core settings (the HDFS port is 9000, the temporary directory is file:/home/hadoop/hadoop-2.6.4/var)
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop-2.6.4/var</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
</configuration>
4.5. Configure hdfs-site.xml: add the HDFS settings (NameNode and DataNode directories, secondary NameNode address, replication factor)
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node1:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop-2.6.4/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop-2.6.4/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
4.6. Configure mapred-site.xml: add the MapReduce settings (use the YARN framework, set the JobHistory server address and its web UI address); if mapred-site.xml does not exist yet, copy it from mapred-site.xml.template first
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
</property>
</configuration>
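Note that the JobHistory server configured above is not launched by start-dfs.sh or start-yarn.sh; once the cluster is up it can be started separately, for example:
./sbin/mr-jobhistory-daemon.sh start historyserver   # run from the Hadoop directory on node1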
4.7. Configure yarn-site.xml: add the YARN settings
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>node1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>node1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>node1:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>node1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>node1:8088</value>
</property>
</configuration>
5. Copy the configured Hadoop directory to the other two slave machines:
scp -r hadoop-2.6.4 hadoop@node2:/home/hadoop/
scp -r hadoop-2.6.4 hadoop@node3:/home/hadoop/
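Optionally, add HADOOP_HOME to /etc/profile on every node so the bin and sbin scripts are on the PATH; a minimal sketch, assuming the install path used in this guide:
export HADOOP_HOME=/home/hadoop/hadoop-2.6.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile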
VI. Verification
1. Format the NameNode (run this on the master only):
- [spark@S1PA11 opt]$ cd hadoop-2.6.0/
- [spark@S1PA11 hadoop-2.6.0]$ ls
- bin dfs etc include input lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share tmp
- [spark@S1PA11 hadoop-2.6.0]$ ./bin/hdfs namenode -format
2. Start HDFS:
- [spark@S1PA11 hadoop-2.6.0]$ ./sbin/start-dfs.sh
- 15/01/05 16:41:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- Starting namenodes on [S1PA11]
- S1PA11: starting namenode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-namenode-S1PA11.out
- S1PA222: starting datanode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-datanode-S1PA222.out
- Starting secondary namenodes [S1PA11]
- S1PA11: starting secondarynamenode, logging to /home/spark/opt/hadoop-2.6.0/logs/hadoop-spark-secondarynamenode-S1PA11.out
- 15/01/05 16:41:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- [spark@S1PA11 hadoop-2.6.0]$ jps
- 22230 Master
- 30889 Jps
- 22478 Worker
- 30498 NameNode
- 30733 SecondaryNameNode
- 19781 ResourceManager
3. Stop HDFS:
- [spark@S1PA11 hadoop-2.6.0]$ ./sbin/stop-dfs.sh
- 15/01/05 16:40:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- Stopping namenodes on [S1PA11]
- S1PA11: stopping namenode
- S1PA222: stopping datanode
- Stopping secondary namenodes [S1PA11]
- S1PA11: stopping secondarynamenode
- 15/01/05 16:40:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
- [spark@S1PA11 hadoop-2.6.0]$ jps
- 30336 Jps
- 22230 Master
- 22478 Worker
- 19781 ResourceManager
4. Start YARN:
- [spark@S1PA11 hadoop-2.6.0]$ ./sbin/start-yarn.sh
- starting yarn daemons
- starting resourcemanager, logging to /home/spark/opt/hadoop-2.6.0/logs/yarn-spark-resourcemanager-S1PA11.out
- S1PA222: starting nodemanager, logging to /home/spark/opt/hadoop-2.6.0/logs/yarn-spark-nodemanager-S1PA222.out
- [spark@S1PA11 hadoop-2.6.0]$ jps
- 31233 ResourceManager
- 22230 Master
- 22478 Worker
- 30498 NameNode
- 30733 SecondaryNameNode
- 31503 Jps
5. Stop YARN:
- [spark@S1PA11 hadoop-2.6.0]$ ./sbin/stop-yarn.sh
- stopping yarn daemons
- stopping resourcemanager
- S1PA222: stopping nodemanager
- no proxyserver to stop
- [spark@S1PA11 hadoop-2.6.0]$ jps
- 31167 Jps
- 22230 Master
- 22478 Worker
- 30498 NameNode
- 30733 SecondaryNameNode
6. Check the cluster status:
[hadoop@node1 hadoop-2.6.4]$ ./bin/hdfs dfsadmin -report
16/05/26 10:51:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 56338194432 (52.47 GB)
Present Capacity: 42922237952 (39.97 GB)
DFS Remaining: 42922164224 (39.97 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 172.16.200.4:50010 (node1)
Hostname: node1
Decommission Status : Normal
Configured Capacity: 18779398144 (17.49 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 4559396864 (4.25 GB)
DFS Remaining: 14219976704 (13.24 GB)
DFS Used%: 0.00%
DFS Remaining%: 75.72%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 26 10:51:35 CST 2016
Name: 172.16.200.5:50010 (node2)
Hostname: node2
Decommission Status : Normal
Configured Capacity: 18779398144 (17.49 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 4369121280 (4.07 GB)
DFS Remaining: 14410252288 (13.42 GB)
DFS Used%: 0.00%
DFS Remaining%: 76.73%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 26 10:51:35 CST 2016
Name: 172.16.200.6:50010 (node3)
Hostname: node3
Decommission Status : Normal
Configured Capacity: 18779398144 (17.49 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 4487438336 (4.18 GB)
DFS Remaining: 14291935232 (13.31 GB)
DFS Used%: 0.00%
DFS Remaining%: 76.10%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 26 10:51:35 CST 2016
7. HDFS web UI: http://172.16.200.4:50070/
8. ResourceManager web UI: http://172.16.200.4:8088/
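As a final smoke test, you can write a file into HDFS and list it back; a minimal sketch, run from the Hadoop directory on node1 (the /test path is just an example):
./bin/hdfs dfs -mkdir /test
./bin/hdfs dfs -put README.txt /test
./bin/hdfs dfs -ls /test            # README.txt should be listed with replication factor 3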