Hadoop history


The genesis of Hadoop came from the Google File System paper[11] that was published in October 2003. This paper spawned another research paper from Google – MapReduce: Simplified Data Processing on Large Clusters.[12] Development started on the Apache Nutch project, but was moved to the new Hadoop subproject in January 2006.[13] Doug Cutting, who was working at Yahoo! at the time,[14] named it after his son's toy elephant.[15] The initial code that was factored out of Nutch consisted of 5k lines of code for HDFS and 6k lines of code for MapReduce.


The first committer added to the Hadoop project was Owen O'Malley in March 2006.[16] Hadoop 0.1.0 was released in April 2006[17] and continues to evolve through the work of the many contributors[18] to the Apache Hadoop project.


Timeline
Year Month Event Ref.
2003 October Google File System paper released [19]
2004 December MapReduce: Simplified Data Processing on Large Clusters [20]
2006 January Hadoop subproject created with mailing lists, JIRA, and wiki [21]
2006 January Hadoop is born from Nutch 197 [22]
2006 February NDFS + MapReduce moved out of Apache Nutch to create Hadoop [23]
2006 February Owen O'Malley's first patch goes into Hadoop [24]
2006 February Hadoop is named after Cutting's son's yellow plush toy [25]
2006 April Hadoop 0.1.0 released [26]
2006 April Hadoop sorts 1.8 TB on 188 nodes in 47.9 hours [23]
2006 May Yahoo deploys 300 machine Hadoop cluster [23]
2006 October Yahoo Hadoop cluster reaches 600 machines [23]
2007 April Yahoo runs two clusters of 1,000 machines [23]
2007 June Only three companies on "Powered by Hadoop" page [27]
2007 October First release of Hadoop that includes HBase [28]
2007 October Yahoo Labs creates Pig, and donates it to the ASF [29]
2008 January YARN JIRA opened (MAPREDUCE-279)
2008 January 20 companies on "Powered by Hadoop" page [27]
2008 February Yahoo moves its web index onto Hadoop [30]
2008 February Yahoo! production search index generated by a 10,000-core Hadoop cluster [23]
2008 March First Hadoop Summit [31]
2008 April Hadoop sets the world record as the fastest system to sort a terabyte of data: running on a 910-node cluster, it sorted one terabyte in 209 seconds [23]
2008 May Hadoop wins TeraByte Sort (World Record sortbenchmark.org) [32]
2008 July Hadoop wins Terabyte Sort Benchmark [33]
2008 October Loading 10 TB/day in Yahoo clusters [23]
2008 October Cloudera, a Hadoop distributor, is founded [34]
2008 November Google MapReduce implementation sorted one terabyte in 68 seconds [23]
2009 March Yahoo runs 17 clusters with 24,000 machines [23]
2009 April Hadoop sorts a petabyte [35]
2009 May Yahoo! used Hadoop to sort one terabyte in 62 seconds [23]
2009 June Second Hadoop Summit [36]
2009 July Hadoop Core is renamed Hadoop Common [37]
2009 July MapR, a Hadoop distributor, is founded [38]
2009 July HDFS now a separate subproject [37]
2009 July MapReduce now a separate subproject [37]
2010 January Kerberos support added to Hadoop [39]
2010 May Apache HBase Graduates [40]
2010 June Third Hadoop Summit [41]
2010 June Yahoo 4,000 nodes/70 petabytes [42]
2010 June Facebook 2,300 clusters/40 petabytes [42]
2010 September Apache Hive Graduates [43]
2010 September Apache Pig Graduates [44]
2011 January Apache Zookeeper Graduates [45]
2011 January Facebook, LinkedIn, eBay and IBM collectively contribute 200,000 lines of code [46]
2011 March Apache Hadoop takes top prize at Media Guardian Innovation Awards [47]
2011 June Rob Bearden and Eric Baldeschwieler spin Hortonworks out of Yahoo [48]
2011 June Yahoo has 42K Hadoop nodes and hundreds of petabytes of storage [48]
2011 June Third Annual Hadoop Summit (1,700 attendees) [49]
2011 October Debate over which company had contributed more to Hadoop. [46]
2012 January Hadoop community moves to separate from MapReduce and replace it with YARN [25]
2012 June San Jose Hadoop Summit (2,100 attendees) [50]
2012 November Apache Hadoop 1.0 Available [37]
2013 March Hadoop Summit – Amsterdam (500 attendees) [51]
2013 March YARN deployed in production at Yahoo [52]
2013 June San Jose Hadoop Summit (2,700 attendees) [53]
2013 October Apache Hadoop 2.2 Available [37]
2014 February Apache Hadoop 2.3 Available [37]
2014 February Apache Spark becomes a top-level Apache project [54]
2014 April Hadoop Summit Amsterdam (750 attendees) [55]
2014 June Apache Hadoop 2.4 Available [37]
2014 June San Jose Hadoop Summit (3,200 attendees) [56]
2014 August Apache Hadoop 2.5 Available [37]
2014 November Apache Hadoop 2.6 Available [37]
2015 April Hadoop Summit Europe [57]
2015 June Apache Hadoop 2.7 Available [37]