Spark (Python): An Example of Creating an RDD from In-Memory Data - Alibaba Cloud Developer Community




An example of creating an RDD from in-memory data with Spark (Python):

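# sc is the SparkContext that the pyspark shell creates for you
myData = ["Alice", "Carlos", "Frank", "Barbara"]
myRdd = sc.parallelize(myData)  # distribute the local list as an RDD
myRdd.take(2)                   # bring the first 2 elements back to the driver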
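`sc.parallelize(c, numSlices=None)` cuts the local list into `numSlices` partitions, defaulting to `sc.defaultParallelism`. The sketch below is a simplified pure-Python model, not PySpark code: the helpers `slice_positions` and `model_parallelize` are hypothetical, and they assume the contiguous integer-division split used by Spark's `ParallelCollectionRDD`. PySpark actually serializes the list in batches before the JVM slices it, so the exact boundaries may differ for some sizes. In a real session, `myRdd.getNumPartitions()` and `myRdd.glom().collect()` show the actual split.

```python
def slice_positions(length, num_slices):
    """Hypothetical helper: (start, end) bounds of each slice via integer division."""
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    return [(i * length // num_slices, (i + 1) * length // num_slices)
            for i in range(num_slices)]


def model_parallelize(data, num_slices):
    """Hypothetical model: split a local list into contiguous partitions."""
    return [data[start:end] for start, end in slice_positions(len(data), num_slices)]


my_data = ["Alice", "Carlos", "Frank", "Barbara"]
print(model_parallelize(my_data, 2))  # [['Alice', 'Carlos'], ['Frank', 'Barbara']]
print(model_parallelize(my_data, 3))  # [['Alice'], ['Carlos'], ['Frank', 'Barbara']]
```

In this model every element lands in exactly one partition and order is preserved, so concatenating the partitions gives back the original list; with more slices than elements, some partitions are simply empty.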

----
In [52]: myData = ["Alice","Carlos","Frank","Barbara"]

In [53]: myRdd = sc.parallelize(myData)

In [54]: myRdd.take(2)
17/09/24 02:40:10 INFO spark.SparkContext: Starting job: runJob at PythonRDD.scala:393
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Got job 5 (runJob at PythonRDD.scala:393) with 1 output partitions
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Final stage: ResultStage 5 (runJob at PythonRDD.scala:393)
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Missing parents: List()
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Submitting ResultStage 5 (PythonRDD[32] at RDD at PythonRDD.scala:43), which has no missing parents
17/09/24 02:40:10 INFO storage.MemoryStore: Block broadcast_16 stored as values in memory (estimated size 3.2 KB, free 1767.1 KB)
17/09/24 02:40:10 INFO storage.MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 2.2 KB, free 1769.3 KB)
17/09/24 02:40:10 INFO storage.BlockManagerInfo: Added broadcast_16_piece0 in memory on localhost:33950 (size: 2.2 KB, free: 208.7 MB)
17/09/24 02:40:10 INFO spark.SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1006
17/09/24 02:40:10 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 5 (PythonRDD[32] at RDD at PythonRDD.scala:43)
17/09/24 02:40:10 INFO scheduler.TaskSchedulerImpl: Adding task set 5.0 with 1 tasks
17/09/24 02:40:10 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 5.0 (TID 5, localhost, partition 0,PROCESS_LOCAL, 2028 bytes)
17/09/24 02:40:10 INFO executor.Executor: Running task 0.0 in stage 5.0 (TID 5)
17/09/24 02:40:11 INFO python.PythonRunner: Times: total = 41, boot = 20, init = 14, finish = 7
17/09/24 02:40:11 INFO executor.Executor: Finished task 0.0 in stage 5.0 (TID 5). 979 bytes result sent to driver
17/09/24 02:40:11 INFO scheduler.DAGScheduler: ResultStage 5 (runJob at PythonRDD.scala:393) finished in 0.423 s
17/09/24 02:40:11 INFO scheduler.DAGScheduler: Job 5 finished: runJob at PythonRDD.scala:393, took 0.648315 s
17/09/24 02:40:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 5.0 (TID 5) in 423 ms on localhost (1/1)
17/09/24 02:40:11 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool 
Out[54]: ['Alice', 'Carlos']

In [55]:
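The log above shows `take(2)` launching job 5 "with 1 output partitions": `take()` does not run over the whole RDD. Instead it scans partitions in rounds, starting with one and widening the next round only if it has not yet collected enough elements. The sketch below is a hypothetical pure-Python model of that loop (the name `model_take` is made up), loosely following PySpark's `RDD.take`, which submits each round through `SparkContext.runJob`; this is the `runJob at PythonRDD.scala:393` seen in the log. It assumes partitions are plain lists and uses a scale-up factor of 4, the default that newer Spark versions expose as `spark.rdd.limit.scaleUpFactor`.

```python
def model_take(partitions, num, scale_up=4):
    """Hypothetical model of RDD.take(): scan partitions in growing rounds.

    Returns the first `num` items and the list of partition indices
    scanned in each round (one round corresponds to one Spark job).
    """
    items = []
    jobs = []
    total_parts = len(partitions)
    parts_scanned = 0
    while len(items) < num and parts_scanned < total_parts:
        num_parts_to_try = 1
        if parts_scanned > 0:
            if not items:
                # nothing found yet: widen the next round aggressively
                num_parts_to_try = parts_scanned * scale_up
            else:
                # estimate how many more partitions are needed, capped by scale_up
                num_parts_to_try = int(1.5 * num * parts_scanned / len(items)) - parts_scanned
                num_parts_to_try = min(max(num_parts_to_try, 1), parts_scanned * scale_up)
        left = num - len(items)
        round_parts = list(range(parts_scanned, min(parts_scanned + num_parts_to_try, total_parts)))
        jobs.append(round_parts)
        for p in round_parts:
            items.extend(partitions[p][:left])  # each task returns at most `left` items
        parts_scanned += num_parts_to_try
    return items[:num], jobs


result, jobs = model_take([["Alice", "Carlos"], ["Frank", "Barbara"]], 2)
print(result, jobs)  # ['Alice', 'Carlos'] [[0]]
result, jobs = model_take([[], ["Alice"], ["Carlos", "Frank"], ["Barbara"]], 3)
print(result, jobs)  # ['Alice', 'Carlos', 'Frank'] [[0], [1, 2, 3]]
```

When the first partition already holds both names, a single one-partition round is enough, which is consistent with the single-partition job in the log. `collect()`, by contrast, runs tasks over every partition.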





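This article was reposted from the cnblogs blog 健哥的数据花园 (Jian Ge's Data Garden). Original link: http://www.cnblogs.com/gaojian/p/7587750.html. Please contact the original author for permission before republishing.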

