A First Look at Hadoop, Part 1: Performance Analysis

Introduction

    I have been studying Hadoop for about two weeks now. With prior MooseFS experience, learning the HDFS part went fairly smoothly. My environment is a very typical Hadoop setup: one NameNode + HMaster and four DataNode + RegionServer machines, as shown below.

    [Figure: cluster topology diagram]

   Note: DataNode1 and DataNode2 are VMs with 4 vCPUs and 4 GB of RAM, running on an ESXi 4.1 virtualization platform. DataNode3 and DataNode4 are physical machines running Ubuntu 11.04, each with 2 CPUs and 2 GB of RAM.

 

   Experiment: use the RowCounter MapReduce job that ships with HBase to count the number of rows in the table TestTable:
[root@master bin]# ./hadoop jar /root/hbase/hbase-0.90.4.jar rowcounter TestTable

 

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:host.name=master

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_26

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.

...

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/root/hadoop/bin/../lib/native/Linux-i386-32

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.18-164.el5

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:user.name=root

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:user.home=/root

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root/hadoop-0.20.2/bin

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=datanode2:2181,datanode1:2181,master:2181 sessionTimeout=180000 watcher=hconnection

11/10/25 10:34:36 INFO zookeeper.ClientCnxn: Opening socket connection to server datanode2/172.20.10.193:2181

11/10/25 10:34:36 INFO zookeeper.ClientCnxn: Socket connection established to datanode2/172.20.10.193:2181, initiating session

11/10/25 10:34:36 INFO zookeeper.ClientCnxn: Session establishment complete on server datanode2/172.20.10.193:2181, sessionid = 0x23338df2f890004, negotiated timeout = 180000

11/10/25 10:34:37 INFO mapred.JobClient: Running job: job_201110241323_0001

11/10/25 10:34:38 INFO mapred.JobClient: map 0% reduce 0%

11/10/25 10:35:34 INFO mapred.JobClient: map 2% reduce 0%

11/10/25 10:39:30 INFO zookeeper.ClientCnxn: Socket connection established to datanode1/172.20.10.201:2181, initiating session

11/10/25 10:40:05 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x23338df2f890004, likely server has closed socket, closing socket connection and attempting reconnect

11/10/25 10:40:06 INFO zookeeper.ClientCnxn: Opening socket connection to server datanode2/172.20.10.193:2181

11/10/25 10:40:06 INFO zookeeper.ClientCnxn: Socket connection established to datanode2/172.20.10.193:2181, initiating session

11/10/25 10:40:06 INFO zookeeper.ClientCnxn: Session establishment complete on server datanode2/172.20.10.193:2181, sessionid = 0x23338df2f890004, negotiated timeout = 180000

11/10/25 10:40:13 INFO mapred.JobClient: map 7% reduce 0%

11/10/25 10:42:31 INFO mapred.JobClient: map 10% reduce 0%

11/10/25 10:43:04 INFO mapred.JobClient: map 13% reduce 0%

11/10/25 10:53:08 INFO mapred.JobClient: map 15% reduce 0%

...

11/10/25 11:11:28 INFO mapred.JobClient: map 36% reduce 0%

11/10/25 11:18:00 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000008_0, Status : FAILED

Task attempt_201110241323_0001_m_000008_0 failed to report status for 608 seconds. Killing!

Task attempt_201110241323_0001_m_000008_0 failed to report status for 611 seconds. Killing!

11/10/25 11:18:36 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000009_0, Status : FAILED

Task attempt_201110241323_0001_m_000009_0 failed to report status for 646 seconds. Killing!

Task attempt_201110241323_0001_m_000009_0 failed to report status for 617 seconds. Killing!

11/10/25 11:18:37 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000013_0, Status : FAILED

Task attempt_201110241323_0001_m_000013_0 failed to report status for 624 seconds. Killing!

Task attempt_201110241323_0001_m_000013_0 failed to report status for 604 seconds. Killing!

11/10/25 11:26:17 INFO mapred.JobClient: map 39% reduce 0%

11/10/25 11:26:48 INFO mapred.JobClient: map 42% reduce 0%

11/10/25 11:35:48 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000014_0, Status : FAILED

Task attempt_201110241323_0001_m_000014_0 failed to report status for 600 seconds. Killing!

11/10/25 11:42:07 INFO mapred.JobClient: map 44% reduce 0%

11/10/25 11:57:43 INFO mapred.JobClient: map 47% reduce 0%

11/10/25 12:12:16 INFO mapred.JobClient: map 49% reduce 0%

...

11/10/25 15:45:26 INFO mapred.JobClient: map 97% reduce 0%

11/10/25 16:16:57 INFO mapred.JobClient: map 100% reduce 0%

11/10/25 16:16:59 INFO mapred.JobClient: Job complete: job_201110241323_0001

11/10/25 16:16:59 INFO mapred.JobClient: Counters: 7

11/10/25 16:16:59 INFO mapred.JobClient: Job Counters

11/10/25 16:16:59 INFO mapred.JobClient: Rack-local map tasks=21

11/10/25 16:16:59 INFO mapred.JobClient: Launched map tasks=58

11/10/25 16:16:59 INFO mapred.JobClient: Data-local map tasks=37

11/10/25 16:16:59 INFO mapred.JobClient: org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters

 

11/10/25 16:16:59 INFO mapred.JobClient: ROWS=6003531

 

11/10/25 16:16:59 INFO mapred.JobClient: Map-Reduce Framework

11/10/25 16:16:59 INFO mapred.JobClient: Map input records=6003531

11/10/25 16:16:59 INFO mapred.JobClient: Spilled Records=0

11/10/25 16:16:59 INFO mapred.JobClient: Map output records=0

The web monitoring UI shows the following:

[Figure: job progress in the web monitoring UI]

Large portions of the log output are omitted above for brevity. After 5 h 42 min of computation, the job reports that TestTable contains 6,003,531 rows in total. Ten errors appeared in the log, falling into essentially two classes.

  The first class is task status-report timeouts:

Task attempt_201110251718_0001_m_000014_0 failed to report status for 602 seconds. Killing!

Task attempt_201110251718_0001_m_000014_0 failed to report status for 600 seconds. Killing!
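These timeouts occur when a map task goes longer than `mapred.task.timeout` (600,000 ms, i.e. 600 s, by default in Hadoop 0.20) without reporting progress, at which point the TaskTracker kills the attempt. As a sketch of one possible workaround, the timeout can be raised in mapred-site.xml; the 1,200,000 ms value below is illustrative only, not a tuning recommendation, and fixing the underlying slow scans is generally preferable to raising the timeout:

```xml
<!-- mapred-site.xml: raise the per-task progress-report timeout.
     Default is 600000 ms (600 s) in Hadoop 0.20; the value below is
     illustrative, not a recommendation. -->
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value>
</property>
```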

  The second class is scanner lease expirations:

org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '7594235277433071278' does not exist
    at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1847)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at ...
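A LeaseException like this arises when the client's scanner lease on the RegionServer expires between successive next() calls, typically because a map task spends too long processing rows between fetches. Two configuration knobs are commonly involved: the server-side lease period and the client-side scanner caching (fewer rows fetched per next() means the client returns to the server sooner). The hbase-site.xml sketch below assumes HBase 0.90.x property names, and the values are illustrative only:

```xml
<!-- hbase-site.xml: illustrative values, assuming HBase 0.90.x property names. -->
<property>
  <!-- server-side scanner lease period; default is 60000 ms -->
  <name>hbase.regionserver.lease.period</name>
  <value>120000</value>
</property>
<property>
  <!-- rows fetched per scanner next() call; fewer rows per batch means
       the client contacts the server more frequently, keeping the lease alive -->
  <name>hbase.client.scanner.caching</name>
  <value>10</value>
</property>
```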

 

Due to space constraints, the performance analysis itself is covered in the next article, "A First Look at Hadoop, Part 2: Performance Analysis".





This article is reposted from taojin1240's 51CTO blog. Original link: http://blog.51cto.com/taotao1240/698496. Please contact the original author for reprint permission.
