Hadoop history

The genesis of Hadoop was the Google File System paper[11], published in October 2003. That paper spawned another research paper from Google, "MapReduce: Simplified Data Processing on Large Clusters".[12] Development started in the Apache Nutch project but moved to the new Hadoop subproject in January 2006.[13] Doug Cutting, who was working at Yahoo! at the time,[14] named it after his son's toy elephant.[15] The initial code factored out of Nutch consisted of about 5,000 lines of code for HDFS and about 6,000 lines of code for MapReduce.


The first committer added to the Hadoop project was Owen O'Malley, in March 2006.[16] Hadoop 0.1.0 was released in April 2006[17], and the project continues to evolve through the work of the many contributors[18] to Apache Hadoop.


Timeline
Year Month Event Ref.
2003 October Google File System paper released [19]
2004 December MapReduce: Simplified Data Processing on Large Clusters [20]
2006 January Hadoop subproject created with mailing lists, JIRA, and wiki [21]
2006 January Hadoop is born from NUTCH-197 [22]
2006 February NDFS + MapReduce moved out of Apache Nutch to create Hadoop [23]
2006 February Owen O'Malley's first patch goes into Hadoop [24]
2006 February Hadoop is named after Cutting's son's yellow plush toy [25]
2006 April Hadoop 0.1.0 released [26]
2006 April Hadoop sorts 1.8 TB on 188 nodes in 47.9 hours [23]
2006 May Yahoo deploys 300 machine Hadoop cluster [23]
2006 October Yahoo Hadoop cluster reaches 600 machines [23]
2007 April Yahoo runs two clusters of 1,000 machines [23]
2007 June Only three companies on "Powered by Hadoop Page" [27]
2007 October First release of Hadoop that includes HBase [28]
2007 October Yahoo Labs creates Pig, and donates it to the ASF [29]
2008 January YARN JIRA opened (MAPREDUCE-279)
2008 January 20 companies on "Powered by Hadoop Page" [27]
2008 February Yahoo moves its web index onto Hadoop [30]
2008 February Yahoo! production search index generated by a 10,000-core Hadoop cluster [23]
2008 March First Hadoop Summit [31]
2008 April Hadoop sets the world record as the fastest system to sort a terabyte of data: running on a 910-node cluster, it sorted one terabyte in 209 seconds [23]
2008 May Hadoop wins TeraByte Sort (World Record sortbenchmark.org) [32]
2008 July Hadoop wins Terabyte Sort Benchmark [33]
2008 October Loading 10 TB/day in Yahoo clusters [23]
2008 October Cloudera, a Hadoop distributor, is founded [34]
2008 November Google MapReduce implementation sorted one terabyte in 68 seconds [23]
2009 March Yahoo runs 17 clusters with 24,000 machines [23]
2009 April Hadoop sorts a petabyte [35]
2009 May Yahoo! used Hadoop to sort one terabyte in 62 seconds [23]
2009 June Second Hadoop Summit [36]
2009 July Hadoop Core is renamed Hadoop Common [37]
2009 July MapR, a Hadoop distributor, is founded [38]
2009 July HDFS now a separate subproject [37]
2009 July MapReduce now a separate subproject [37]
2010 January Kerberos support added to Hadoop [39]
2010 May Apache HBase Graduates [40]
2010 June Third Hadoop Summit [41]
2010 June Yahoo 4,000 nodes/70 petabytes [42]
2010 June Facebook 2,300 clusters/40 petabytes [42]
2010 September Apache Hive Graduates [43]
2010 September Apache Pig Graduates [44]
2011 January Apache ZooKeeper Graduates [45]
2011 January Facebook, LinkedIn, eBay and IBM collectively contribute 200,000 lines of code [46]
2011 March Apache Hadoop takes top prize at Media Guardian Innovation Awards [47]
2011 June Rob Bearden and Eric Baldeschwieler spin Hortonworks out of Yahoo [48]
2011 June Yahoo has 42K Hadoop nodes and hundreds of petabytes of storage [48]
2011 June Third Annual Hadoop Summit (1,700 attendees) [49]
2011 October Debate over which company had contributed more to Hadoop [46]
2012 January Hadoop community moves to separate from MapReduce and replace with YARN [25]
2012 June San Jose Hadoop Summit (2,100 attendees) [50]
2012 November Apache Hadoop 1.0 Available [37]
2013 March Hadoop Summit – Amsterdam (500 attendees) [51]
2013 March YARN deployed in production at Yahoo [52]
2013 June San Jose Hadoop Summit (2,700 attendees) [53]
2013 October Apache Hadoop 2.2 Available [37]
2014 February Apache Hadoop 2.3 Available [37]
2014 February Apache Spark becomes a top-level Apache project [54]
2014 April Hadoop Summit Amsterdam (750 attendees) [55]
2014 June Apache Hadoop 2.4 Available [37]
2014 June San Jose Hadoop Summit (3,200 attendees) [56]
2014 August Apache Hadoop 2.5 Available [37]
2014 November Apache Hadoop 2.6 Available [37]
2015 April Hadoop Summit Europe [57]
2015 June Apache Hadoop 2.7 Available [37]