Hadoop Is Going Downhill


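For a long time the word Hadoop was everywhere, and it was almost synonymous with big data. Three years ago, the idea of moving beyond Hadoop seemed hard to imagine. Three years on, the picture has changed somewhat.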


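As early as 2012, the media outlet SiliconANGLE analyzed Twitter conversations among big data professionals. It found that they talked about NoSQL technologies such as MongoDB as much as, or more than, Hadoop. This suggests that, at least among data scientists, treating Hadoop as shorthand for big data was never quite accurate.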

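Even so, most people regard Hadoop as one of the most important big data technologies and as the foundation on which big data is built. New uses are still being found for it, in data warehousing for instance. That said, to the surprise of many, its adoption appears to have more or less stagnated. James Kobielus, Big Data Evangelist at IBM Software, put it this way: ‘Hadoop declined more rapidly in 2016 from the big-data landscape than I expected.’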

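The reasons are hard to pin down, but they may come down to a problem common in data circles. A 2015 Gartner survey found that 54% of companies had no plans to invest in Hadoop, while 44% had already adopted it or planned to within the next two years. Depending on your point of view, this means either that Hadoop will expand further or that most companies are ignoring it. The survey also revealed other telling factors that are unlikely to have faded since. Of those not investing, 49% were still trying to work out how to get value from Hadoop, while 57% cited the skills gap as the main obstacle, which cannot be fixed overnight. This matches job-trend data from Indeed for listings with ‘Hadoop Testing’ in the title: the term appeared in about 0.061% of ads in mid 2014 and rose to 0.087% by late 2016, an increase of around 43% in 18 months.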

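This may signal that adoption has not necessarily dropped as far as anecdotal evidence suggests. Rather, companies are finding it hard to extract value from Hadoop with their current teams, and they need more expertise.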

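Another cause for concern is that one person’s big data is another person’s small data. Hadoop is designed for huge volumes of data. As Kashif Saiyed wrote on KD Nuggets: ‘You don’t need Hadoop if you don’t really have a problem of huge data volumes in your enterprise, so hundreds of enterprises were hugely disappointed by their useless 2 to 10TB Hadoop clusters – Hadoop technology just doesn’t shine at this scale.’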

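Most companies do not have enough data to justify a Hadoop deployment, but they went ahead anyway because they felt they had to keep up with their peers. After a few years of experimentation and of working alongside genuine data scientists, they realized that their data is better served by other technologies.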

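The effects go beyond slower adoption of an open source platform; for some companies there have been real financial consequences. Cloudera and Hortonworks are the two biggest companies that build their products on the Hadoop framework. Both have lost significant value, in part because of Hadoop’s decline: Cloudera is reported to have lost 40%, while Hortonworks’ share price has fallen 68% since mid 2015.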

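The criticism in this article may seem harsh on Hadoop, but the platform itself is not what caused the current problems. Instead, it is perhaps the hype and the association with big data that have done the real damage. Companies adopted the platform without understanding it and without the right people or data to make it work properly, which led to disillusionment and apparent stagnation. There is still a great deal of life in Hadoop; people just need to understand it better.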

Original text:

Three years ago, looking beyond Hadoop was insanity, and there was little else that could come close according to many in the media. However, the reality has been a little different.

For a long period, Hadoop and big data were almost interchangeable when discussed in the media, although this was not necessarily the case amongst data scientists. A 2012 study by SiliconANGLE analyzing Twitter conversations between data professionals talking about big data found that they actually talked about NoSQL technologies like MongoDB as much as, or more than, Hadoop, which would indicate that it has not actually been the must-have that many assumed it was.

Most would argue that Hadoop has been one of the single most important elements in the spread of big data, that it is very much the foundation on which data today is built. We are also still finding new ways to use it, in warehousing for instance. That being said, to the surprise of many, its adoption appears to have more or less stagnated, leading even James Kobielus, Big Data Evangelist at IBM Software, to claim that ‘Hadoop declined more rapidly in 2016 from the big-data landscape than I expected.’

The reasons for this are hard to ascertain, but could be down to a problem common in data circles. A 2015 study from Gartner found that 54% of companies had no plans to invest in Hadoop, while 44% of those asked had adopted Hadoop already or planned to at some point in the next two years. This could, depending on your point of view, be taken to mean either that it would see even further expansion or that the majority were ignoring it. However, the survey also revealed a number of other telling factors with implications unlikely to have subsided since. Of those who were not investing, 49% were still trying to figure out how to use it for value, while 57% said that the skills gap was the major reason, a problem that is not going to be corrected overnight. This coincides with findings from Indeed, which tracked job trends for ads with ‘Hadoop Testing’ in the title: the term featured in 0.061% of ads in mid 2014 and then jumped to 0.087% in late 2016, an increase of around 43% in 18 months.
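The ‘around 43%’ figure is a relative increase in the share of ads, not a change in percentage points. A minimal sketch of the arithmetic, using only the two shares quoted above (the function and variable names are illustrative, not from the survey):

```python
def relative_increase(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100


# Share of job ads containing the term, in percent, as quoted in the text.
share_mid_2014 = 0.061
share_late_2016 = 0.087

growth = relative_increase(share_mid_2014, share_late_2016)
points = share_late_2016 - share_mid_2014

print(f"relative increase: {growth:.1f}%")                # about 42.6%
print(f"absolute change: {points:.3f} percentage points")  # about 0.026
```

In other words, the term still appeared in fewer than one in a thousand ads at either date; the growth is large only relative to a very small base.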

What this may signal is that adoption hasn’t necessarily dropped to the extent that anecdotal evidence would suggest, but that companies are simply finding it difficult to extract value from Hadoop with their current teams and require greater expertise.

Another element that may be cause for concern is simply that one man’s big data is another man’s small data. Hadoop is designed for huge amounts of data, and as Kashif Saiyed wrote on KD Nuggets ‘You don’t need Hadoop if you don’t really have a problem of huge data volumes in your enterprise, so hundreds of enterprises were hugely disappointed by their useless 2 to 10TB Hadoop clusters – Hadoop technology just doesn’t shine at this scale.’

Most companies do not currently have enough data to warrant a Hadoop rollout, but rolled it out anyway because they felt they needed to keep up with the Joneses. After a few years of experimentation and working alongside genuine data scientists, they soon realized that their data works better in other technologies.

This trend has had impacts beyond a slowdown in the adoption of an open source platform, though; for some companies it has had real-world financial consequences. Cloudera and Hortonworks are two of the biggest companies that build their products out from a Hadoop framework. Both have lost significant value, in part due to its decline, with Cloudera reported to have lost 40% whilst Hortonworks’ shares have plummeted 68% since mid 2015.

Criticism within this article may seem harsh on Hadoop, but it is not the platform in itself that has caused the current issues. Instead, it is perhaps the hype and the association with big data that have done the real damage. Companies have adopted the platform without understanding it and then failed to get the right people or data to make it work properly, which has led to disillusionment and its apparent stagnation. There is still a huge amount of life in Hadoop, but people just need to understand it better.


Author: George Hill

Source: 51CTO
