How can R and Hadoop be used together?

Introduction:

Source: http://www.quora.com/How-can-R-and-Hadoop-be-used-together/answer/Jay-Kreps?srid=OVd9&share=1

 

Another way to answer this question is that they don't really integrate very well.

The advantage of R is not its syntax but rather its incredible library of primitives for visualization and statistics. These libraries are fundamentally non-distributed, and almost always operate on data resident in memory. So, for example, if you are finding R's glm function slow (or completely infeasible) on a particular dataset, there is really no way to make it run faster with Hadoop.

The reason it is important to point this out is that:
1. Transparently distributed R is every data geek's wet dream.
2. I have sat through numerous presentations from distributed database vendors claiming to provide this.

What Hadoop and database vendors can provide is the ability to run R in parallel on lots of little data sets. Virtually none of the libraries will work on a data set larger than memory.
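The "lots of little data sets" pattern can be sketched roughly as follows. This is only an illustration, written in Python rather than R: the names `fit_line` and `fit_per_group` are hypothetical, a hand-written least-squares fit stands in for an in-memory routine such as glm, and a thread pool stands in for the independent tasks a Hadoop job would run. The point is that each group must fit in memory on its own; nothing here distributes a single fit.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fit_line(points):
    """Least-squares fit of y = intercept + slope * x on one small in-memory group.

    Stand-in for an in-memory library routine; it needs the whole group at once.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_per_group(records, workers=4):
    """Split (key, x, y) records by key, then fit each group independently."""
    groups = defaultdict(list)
    for key, x, y in records:
        groups[key].append((x, y))
    keys = list(groups)
    # Each fit is independent of the others, which is what makes it parallelizable.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fits = list(pool.map(fit_line, (groups[k] for k in keys)))
    return dict(zip(keys, fits))


# Made-up sample data: two small groups, each fitted separately.
records = [
    ("a", 0.0, 1.0), ("a", 1.0, 3.0), ("a", 2.0, 5.0),
    ("b", 0.0, 4.0), ("b", 2.0, 3.0), ("b", 4.0, 2.0),
]
models = fit_per_group(records)
```

If a single group grows beyond what one worker can hold in memory, this approach stops helping, which is the limitation described above.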

 


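Notice: if you repost articles from this blog, please credit the source. Your support is my motivation! Part of this article's content comes from the internet, and I accept no legal responsibility for it.
This article was reposted from bourneli's blog on cnblogs, original link: http://www.cnblogs.com/bourneli/p/3233398.html . To repost it, please contact the original author.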