193 DStream Operations - Output Operations on DStreams

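Output operations push the data of a DStream out to external systems such as databases or file systems. Like actions on RDDs, they are what trigger execution: the streaming program only starts its actual computation once an output operation has been invoked on the DStream.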
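
To make the lazy behaviour concrete, here is a minimal PySpark sketch. It assumes a text source on `localhost:9999` and a 1-second batch interval, neither of which comes from the original article; `build_pipeline` and `main` are illustrative names. Note that in the Python API the `print()` output operation is exposed as `pprint()`.

```python
def build_pipeline(ssc, host="localhost", port=9999):
    """Declare a tiny streaming job on the given StreamingContext.

    host and port are assumed values for illustration only.
    """
    lines = ssc.socketTextStream(host, port)

    # Transformation: only describes the computation, nothing runs yet.
    lengths = lines.map(len)

    # Output operation: registers work that Spark will execute for
    # every batch once the context has been started.
    lengths.pprint()
    return lengths


def main():
    # Imported here so the pipeline definition above can be read and
    # exercised without a Spark installation.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "OutputOperationDemo")
    ssc = StreamingContext(sc, 1)  # 1-second batches (assumed)
    build_pipeline(ssc)
    ssc.start()             # computation begins here, batch by batch
    ssc.awaitTermination()
```

Calling `main()` on a machine with PySpark installed, while something is writing lines to the assumed socket, should print the first elements of each batch on the driver. Without the `pprint()` call the job would have no output operation and therefore nothing to execute.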

| Output Operation | Meaning |
| --- | --- |
| print() | Prints the first ten elements of every batch of data in a DStream on the driver node running the streaming application. This is useful for development and debugging. |
| saveAsTextFiles(prefix, [suffix]) | Saves this DStream’s contents as text files. The file name at each batch interval is generated based on prefix and suffix: “prefix-TIME_IN_MS[.suffix]”. |
| saveAsObjectFiles(prefix, [suffix]) | Saves this DStream’s contents as SequenceFiles of serialized Java objects. The file name at each batch interval is generated based on prefix and suffix: “prefix-TIME_IN_MS[.suffix]”. |
| saveAsHadoopFiles(prefix, [suffix]) | Saves this DStream’s contents as Hadoop files. The file name at each batch interval is generated based on prefix and suffix: “prefix-TIME_IN_MS[.suffix]”. |
| foreachRDD(func) | The most generic output operator. It applies a function, func, to each RDD generated from the stream. This function should push the data in each RDD to an external system, such as saving the RDD to files or writing it over the network to a database. Note that func is executed in the driver process running the streaming application, and will usually contain RDD actions that force the computation of the streaming RDDs. |
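
As an illustration of `foreachRDD`, the following PySpark sketch pushes each batch to an external system. The `connect` factory and the connection's `send` and `close` methods are hypothetical stand-ins for whatever client a real sink provides; they are not part of Spark. The sketch opens one connection per partition rather than per record, a common way to structure such code, not something the original article prescribes.

```python
def send_partition(records, connect):
    """Send every record of one partition through a single connection.

    connect is a hypothetical zero-argument factory returning an object
    with send(record) and close() methods.
    """
    connection = connect()
    sent = 0
    try:
        for record in records:
            connection.send(record)
            sent += 1
    finally:
        connection.close()  # always release the connection
    return sent


def save_stream(dstream, connect):
    """Register an output operation that ships every batch to the sink."""
    # The outer function runs in the driver once per batch; the
    # foreachPartition action inside it is what forces the RDD to be
    # computed, and send_partition then runs on the executors.
    dstream.foreachRDD(
        lambda rdd: rdd.foreachPartition(
            lambda records: send_partition(records, connect)
        )
    )
```

Because `send_partition` is shipped to the executors, the `connect` factory has to be serializable. Creating the connection inside the partition function, as here, avoids trying to serialize a live connection object from the driver.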