Hadoop 2.x is quite different from 1.x: both storage and computation have become more general-purpose. Hadoop 2.x introduces the YARN framework for managing cluster resources, which can serve any computation that runs against data stored in HDFS. MapReduce is now just one pluggable computation framework; you can develop or choose whichever framework fits your needs. At the moment MapReduce is still the best-supported option, since the MapReduce framework is relatively mature, while other YARN-based frameworks are still under development.
The core of YARN is resource management and scheduling. Compared with Hadoop 1.x, resource allocation is finer-grained and more flexible, and its prospects look good. That flexibility comes at a cost, though: the large number of configuration options makes YARN somewhat harder to use. In addition, in my opinion, YARN is still maturing; problems appear frequently, documentation is relatively scarce, and the official docs are not always up to date. If I were choosing a platform for large-scale data processing today, YARN might not yet meet the needs of a production environment; for purely MapReduce workloads, the more mature Hadoop 1.x line is still the safer choice in production.
Below, we walk through installing and configuring a cluster on 4 machines running 64-bit CentOS 6.4: one master node and three slave nodes.
Host Configuration Plan
Edit the /etc/hosts file on every machine and add the following address mappings:
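The exact mapping depends on your network; as a sketch (m1's address 10.95.3.48 matches the logs later in this article, while the slave addresses here are placeholders you must replace with your own):

# /etc/hosts on every node; slave IPs below are placeholders -- use your own
10.95.3.48   m1
10.95.3.59   s1
10.95.3.66   s2
10.95.3.67   s3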
Give each machine its hostname by editing /etc/sysconfig/network; for example, on node s1 the file is configured as:
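On CentOS 6 this file typically contains the following for s1 (the other nodes use their own hostnames):

NETWORKING=yes
HOSTNAME=s1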
m1 is the cluster master node; s1, s2 and s3 are the slave nodes.
As for host resources, we used VMware to create 4 virtual machines, configured as follows:
- the master node has 1 CPU core
- the master node has 1 GB of memory
- each slave node has 1 CPU core
- each slave node has 2 GB of memory
Directory Plan
The Hadoop program files are placed under /home/shirdrn/cloud/programs/hadoop-2.2.0, and the related data directories (logs, storage, and so on) under /home/shirdrn/cloud/storage/hadoop-2.2.0. Keeping program and data directories separate makes it easier to synchronize configuration across nodes.
The directories are prepared and configured as follows:
- On every node, create the program directory /home/shirdrn/cloud/programs/hadoop-2.2.0 to hold the Hadoop program files
- On every node, create the data directory /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs to hold cluster data
- On the master node m1, create /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/name to hold the file system metadata
- On every slave node, create /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/data to hold the actual data blocks
- The log directory on all nodes is /home/shirdrn/cloud/storage/hadoop-2.2.0/logs
- The temporary directory on all nodes is /home/shirdrn/cloud/storage/hadoop-2.2.0/tmp
All directories referenced in the configuration below follow this plan.
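As a sketch, the directories above can be created on a slave node with commands like these (on m1, create hdfs/name instead of hdfs/data):

mkdir -p /home/shirdrn/cloud/programs/hadoop-2.2.0
mkdir -p /home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/data
mkdir -p /home/shirdrn/cloud/storage/hadoop-2.2.0/logs
mkdir -p /home/shirdrn/cloud/storage/hadoop-2.2.0/tmp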
Environment Variable Configuration
First, use Sun's JDK. Edit ~/.bashrc and add the following:
export JAVA_HOME=/usr/java/jdk1.6.0_45/
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib/*.jar:$JAVA_HOME/jre/lib/*.jar
Then configure the Hadoop installation directory and related environment variables:
export HADOOP_HOME=/home/shirdrn/cloud/programs/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_LOG_DIR=/home/shirdrn/cloud/storage/hadoop-2.2.0/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
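To make the new variables take effect in the current shell, reload the file:

source ~/.bashrc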
Passwordless Login Configuration
On every node, run the following command:
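The command in question is the SSH key generation command; a typical invocation (RSA key, default file locations) is:

ssh-keygen -t rsa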
Just press Enter at every prompt to accept the defaults.
On the master node m1, run the following command:
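One way to do this is to append m1's own public key to its authorized_keys file (shown here only as a sketch):

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys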
This ensures that you can log in to the local node m1 itself without a password.
Next, add m1's public key to the ~/.ssh/authorized_keys file on s1, s2 and s3. Also check the permissions on ~/.ssh/authorized_keys: it must not be writable by the group. If it is, fix it with the following command:
chmod g-w ~/.ssh/authorized_keys
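To distribute m1's public key to the slave nodes (the first part of the step above), something like ssh-copy-id can be used, assuming it is installed; otherwise append ~/.ssh/id_rsa.pub to each slave's authorized_keys by hand:

ssh-copy-id shirdrn@s1
ssh-copy-id shirdrn@s2
ssh-copy-id shirdrn@s3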
At this point, the following commands should succeed on the m1 node without asking for a password:
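For example:

ssh s1
ssh s2
ssh s3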
Hadoop Configuration Files
The configuration files live in /home/shirdrn/cloud/programs/hadoop-2.2.0/etc/hadoop; edit the corresponding files there.
- Configure the core-site.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <description>The name of the default file system. A URI whose scheme
    and authority determine the FileSystem implementation. The uri's
    scheme determines the config property (fs.SCHEME.impl) naming the
    FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>dfs.replication</name>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
- Configure the hdfs-site.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/name</value>
    <description>Path on the local filesystem where the NameNode stores
    the namespace and transactions logs persistently.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/shirdrn/cloud/storage/hadoop-2.2.0/hdfs/data</value>
    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
  </property>
  <property>
    <name>dfs.permissions</name>
  </property>
</configuration>
- Configure the yarn-site.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>m1:8031</value>
    <description>host is the hostname of the resource manager and
    port is the port on which the NodeManagers contact the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>m1:8030</value>
    <description>host is the hostname of the resourcemanager and port is the port
    on which the Applications in the cluster talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>In case you do not want to use the default scheduler</description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>m1:8032</value>
    <description>the host is the hostname of the ResourceManager and the port is the port on
    which the clients can talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>${hadoop.tmp.dir}/nodemanager/local</value>
    <description>the local directories used by the nodemanager</description>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:8034</value>
    <description>the nodemanagers bind to this port</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <description></description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <description>Defines total available resources on the NodeManager to be made available to running containers</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>${hadoop.tmp.dir}/nodemanager/remote</value>
    <description>directory on hdfs where the application logs are moved to</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${hadoop.tmp.dir}/nodemanager/logs</value>
    <description>the directories used by Nodemanagers as log directories</description>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_HOME,$HADOOP_HOME/share/hadoop/common/*,
    $HADOOP_HOME/share/hadoop/common/lib/*,
    $HADOOP_HOME/share/hadoop/hdfs/*,$HADOOP_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_HOME/share/hadoop/yarn/*,$HADOOP_HOME/share/hadoop/yarn/lib/*,
    $HADOOP_HOME/share/hadoop/mapreduce/*,$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
    <description>Classpath for typical applications.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
  </property>
</configuration>
- Configure the mapred-site.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Execution framework set to Hadoop YARN.</description>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <description>Larger resource limit for maps. default 1024M</description>
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <description></description>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <description>Larger resource limit for reduces.</description>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <description>Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>m1:10020</value>
    <description>MapReduce JobHistory Server host:port, default port is 10020.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>m1:19888</value>
    <description>MapReduce JobHistory Server Web UI host:port, default port is 19888.</description>
  </property>
</configuration>
- Configure the hadoop-env.sh, yarn-env.sh and mapred-env.sh scripts
In each of these scripts it is enough to set the JAVA_HOME variable, as shown below:
export JAVA_HOME=/usr/java/jdk1.6.0_45/
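As a sketch, one quick way to apply this to all three scripts is to append the export line at the end of each file (editing the existing JAVA_HOME line by hand is equivalent):

cd /home/shirdrn/cloud/programs/hadoop-2.2.0/etc/hadoop
for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do
  echo 'export JAVA_HOME=/usr/java/jdk1.6.0_45/' >> "$f"
done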
Distributing the Program Files
On the master node m1, copy the configured program files to each slave node:
scp -r /home/shirdrn/cloud/programs/hadoop-2.2.0 shirdrn@s1:/home/shirdrn/cloud/programs/
scp -r /home/shirdrn/cloud/programs/hadoop-2.2.0 shirdrn@s2:/home/shirdrn/cloud/programs/
scp -r /home/shirdrn/cloud/programs/hadoop-2.2.0 shirdrn@s3:/home/shirdrn/cloud/programs/
Starting the HDFS Cluster
With the configuration above in place, the HDFS cluster can be started.
To make sure nothing goes wrong during startup, manually turn off the firewall on every node by running:
sudo service iptables stop
Or disable the firewall permanently:
sudo chkconfig iptables off
sudo chkconfig ip6tables off
On the master node m1, first format the file system by running:
hadoop namenode -format
Then start the HDFS cluster by running:
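The start script lives in $HADOOP_HOME/sbin (already on the PATH from the environment settings above); presumably:

start-dfs.sh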
You can check the startup logs to confirm whether the HDFS cluster started successfully:
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-namenode-m1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-secondarynamenode-m1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-datanode-s1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-datanode-s2.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/hadoop-shirdrn-datanode-s3.log
Alternatively, check the corresponding processes:
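For example, run jps on each node; m1 should show the NameNode and SecondaryNameNode processes, and each slave node a DataNode process:

jps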
You can also check the HDFS cluster status through the web console, at the following address:
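In Hadoop 2.2.0 the NameNode web UI listens on port 50070 by default, so the address is presumably:

http://m1:50070/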
Starting the YARN Cluster
On the master node m1, run the following command:
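Again the standard script from $HADOOP_HOME/sbin, presumably:

start-yarn.sh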
You can check the startup logs to confirm whether the YARN cluster started successfully:
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-resourcemanager-m1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-nodemanager-s1.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-nodemanager-s2.log
tail -100f /home/shirdrn/cloud/storage/hadoop-2.2.0/logs/yarn-shirdrn-nodemanager-s3.log
Alternatively, check the corresponding processes:
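Run jps again: m1 should now also show a ResourceManager process, and each slave node a NodeManager process:

jps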
In addition, the ResourceManager runs on the master node m1, and its status can be viewed through the web console:
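That address, which is also referenced in the verification section below, is:

http://m1:8088/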
The NodeManagers run on the slave nodes, and the resource status of each node can also be viewed through the web console, for example for node s1:
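The NodeManager web UI uses port 8042 by default, so for s1 the address is presumably:

http://s1:8042/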
Managing the JobHistory Server
You can start the JobHistory Server so that information about jobs run on the cluster can be viewed through a web console; run the following command:
mr-jobhistory-daemon.sh start historyserver
It uses port 19888 by default.
Visit http://m1:19888/ to view the job execution history.
To stop the JobHistory Server, run:
mr-jobhistory-daemon.sh stop historyserver
Cluster Verification
We use the WordCount example that ships with Hadoop for verification.
First create a few data directories in HDFS:
hadoop fs -mkdir -p /data/wordcount
hadoop fs -mkdir -p /output/
The /data/wordcount directory holds the input data files for the bundled WordCount example, and the output of the MapReduce job is written to /output/wordcount.
Upload local files to HDFS:
hadoop fs -put /home/shirdrn/cloud/programs/hadoop-2.2.0/etc/hadoop/*.xml /data/wordcount/
You can check the uploaded files with the following command:
hadoop fs -ls /data/wordcount
You should see the files that were uploaded to HDFS.
Next, run the WordCount example with the following command:
hadoop jar /home/shirdrn/cloud/programs/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount
The console output shows the job running:
[shirdrn@m1 hadoop-2.2.0]$ hadoop jar /home/shirdrn/cloud/programs/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount
13/12/25 22:38:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/12/25 22:38:03 INFO client.RMProxy: Connecting to ResourceManager at m1/10.95.3.48:8032
13/12/25 22:38:04 INFO input.FileInputFormat: Total input paths to process : 7
13/12/25 22:38:04 INFO mapreduce.JobSubmitter: number of splits:7
13/12/25 22:38:04 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/12/25 22:38:04 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/12/25 22:38:04 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/12/25 22:38:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1388039619930_0002
13/12/25 22:38:05 INFO impl.YarnClientImpl: Submitted application application_1388039619930_0002 to ResourceManager at m1/10.95.3.48:8032
13/12/25 22:38:05 INFO mapreduce.Job: Running job: job_1388039619930_0002
13/12/25 22:38:14 INFO mapreduce.Job: Job job_1388039619930_0002 running in uber mode : false
13/12/25 22:38:14 INFO mapreduce.Job:  map 0% reduce 0%
13/12/25 22:38:22 INFO mapreduce.Job:  map 14% reduce 0%
13/12/25 22:38:42 INFO mapreduce.Job:  map 29% reduce 5%
13/12/25 22:38:43 INFO mapreduce.Job:  map 43% reduce 5%
13/12/25 22:38:45 INFO mapreduce.Job:  map 43% reduce 14%
13/12/25 22:38:54 INFO mapreduce.Job:  map 57% reduce 14%
13/12/25 22:38:55 INFO mapreduce.Job:  map 71% reduce 19%
13/12/25 22:38:56 INFO mapreduce.Job:  map 100% reduce 19%
13/12/25 22:38:57 INFO mapreduce.Job:  map 100% reduce 100%
13/12/25 22:38:58 INFO mapreduce.Job: Job job_1388039619930_0002 completed successfully
13/12/25 22:38:58 INFO mapreduce.Job: Counters: 44
        FILE: Number of bytes read=15339
        FILE: Number of bytes written=667303
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=21904
        HDFS: Number of bytes written=9717
        HDFS: Number of read operations=24
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
        Launched reduce tasks=1
        Data-local map tasks=9
        Total time spent by all maps in occupied slots (ms)=457338
        Total time spent by all reduces in occupied slots (ms)=65832
        Map output records=1923
        Map output bytes=26222
        Map output materialized bytes=15375
        Combine input records=1923
        Combine output records=770
        Reduce input groups=511
        Reduce shuffle bytes=15375
        Reduce input records=770
        Reduce output records=511
        GC time elapsed (ms)=3951
        CPU time spent (ms)=22610
        Physical memory (bytes) snapshot=1598832640
        Virtual memory (bytes) snapshot=6564274176
        Total committed heap usage (bytes)=971993088
        File Input Format Counters
        File Output Format Counters
To view the result, run:
hadoop fs -cat /output/wordcount/part-r-00000 | head
Sample output data:
[shirdrn@m1 hadoop-2.2.0]$ hadoop fs -cat /output/wordcount/part-r-00000 | head
13/12/25 22:58:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
$HADOOP_HOME/share/hadoop/common/lib/*,        1
$HADOOP_HOME/share/hadoop/hdfs/*,$HADOOP_HOME/share/hadoop/hdfs/lib/*,        1
$HADOOP_HOME/share/hadoop/mapreduce/*,$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>        1
$HADOOP_HOME/share/hadoop/yarn/*,$HADOOP_HOME/share/hadoop/yarn/lib/*,        1
cat: Unable to write to output stream.
Log in to the web console at http://m1:8088/ to see the job records.
This shows that HDFS can store data and that the YARN cluster can run MapReduce jobs.
Problems and Summary
In Hadoop 2.2.0, the YARN framework ships with many default parameter values. If your machines are short on resources, you may need to change these defaults so that jobs can actually run.
The NodeManager and ResourceManager are configured in yarn-site.xml, while the parameters used when running MapReduce jobs are configured in mapred-site.xml.
The relevant parameters and their default values are listed below:
| Parameter | Default | Process | Config file | Description |
| --- | --- | --- | --- | --- |
| yarn.nodemanager.resource.memory-mb | 8192 | NodeManager | yarn-site.xml | Total physical memory on the slave node's host that is available to containers |
| yarn.nodemanager.resource.cpu-vcores | 8 | NodeManager | yarn-site.xml | Total number of virtual CPU cores on the node's host that are available to containers |
| yarn.nodemanager.vmem-pmem-ratio | 2.1 | NodeManager | yarn-site.xml | Maximum amount of virtual memory that may be used per MB of physical memory |
| yarn.scheduler.minimum-allocation-mb | 1024 | ResourceManager | yarn-site.xml | Minimum amount of memory allocated per request |
| yarn.scheduler.maximum-allocation-mb | 8192 | ResourceManager | yarn-site.xml | Maximum amount of memory allocated per request |
| yarn.scheduler.minimum-allocation-vcores | 1 | ResourceManager | yarn-site.xml | Minimum number of virtual CPU cores allocated per request |
| yarn.scheduler.maximum-allocation-vcores | 8 | ResourceManager | yarn-site.xml | Maximum number of virtual CPU cores allocated per request |
| mapreduce.framework.name | local | MapReduce | mapred-site.xml | One of local, classic or yarn; if it is not yarn, the YARN cluster is not used for resource allocation |
| mapreduce.map.memory.mb | 1024 | MapReduce | mapred-site.xml | Memory each map task of a MapReduce job may request |
| mapreduce.map.cpu.vcores | 1 | MapReduce | mapred-site.xml | Number of virtual CPU cores each map task of a MapReduce job may request |
| mapreduce.reduce.memory.mb | 1024 | MapReduce | mapred-site.xml | Memory each reduce task of a MapReduce job may request |
| mapreduce.reduce.cpu.vcores | 1 | MapReduce | mapred-site.xml | Number of virtual CPU cores each reduce task of a MapReduce job may request |
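As a concrete illustration for this cluster (1 core and 2 GB of RAM per slave), settings along the following lines could be used; all of the numbers below are assumptions for illustration only, not values taken from a tested configuration, and must be adapted to your own hardware:

<!-- illustrative values for 2 GB slaves; adjust to your environment -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>   <!-- in yarn-site.xml -->
  <value>1536</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>  <!-- in yarn-site.xml -->
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>  <!-- in yarn-site.xml -->
  <value>256</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>               <!-- in mapred-site.xml -->
  <value>512</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>            <!-- in mapred-site.xml -->
  <value>512</value>
</property>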
- Exception: java.io.IOException: Bad connect ack with firstBadLink as 10.95.3.66:50010
The detailed exception information is shown below:
[shirdrn@m1 hadoop-2.2.0]$ hadoop fs -put /home/shirdrn/cloud/programs/hadoop-2.2.0/etc/hadoop/*.xml /data/wordcount/
13/12/25 21:29:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/12/25 21:29:46 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 10.95.3.66:50010
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1166)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
13/12/25 21:29:46 INFO hdfs.DFSClient: Abandoning BP-1906424073-10.95.3.48-1388035628061:blk_1073741825_1001
13/12/25 21:29:46 INFO hdfs.DFSClient: Excluding datanode 10.95.3.66:50010
13/12/25 21:29:46 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 10.95.3.59:50010
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1166)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
13/12/25 21:29:46 INFO hdfs.DFSClient: Abandoning BP-1906424073-10.95.3.48-1388035628061:blk_1073741826_1002
13/12/25 21:29:46 INFO hdfs.DFSClient: Excluding datanode 10.95.3.59:50010
13/12/25 21:29:46 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
        at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1305)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1128)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
13/12/25 21:29:46 INFO hdfs.DFSClient: Abandoning BP-1906424073-10.95.3.48-1388035628061:blk_1073741828_1004
13/12/25 21:29:46 INFO hdfs.DFSClient: Excluding datanode 10.95.3.59:50010
13/12/25 21:29:46 INFO hdfs.DFSClient: Exception in createBlockOutputStream
This is mainly caused by the firewall not being turned off on some nodes of the Hadoop cluster, which makes those nodes unreachable from the rest of the cluster.