StreamingPro


Declarative workflows for building Spark Streaming applications

Spark Streaming
Spark Streaming is an extension of the core Spark API that enables stream processing of live data from a variety of sources. Spark itself is an extensible, programmable framework for massively distributed processing of datasets, built around an abstraction called Resilient Distributed Datasets (RDDs). Spark Streaming receives input data streams and divides the data into batches, which the Spark engine then processes to generate results. A stream in Spark Streaming is represented as a DStream (discretized stream), which is internally a sequence of RDDs, one per batch.
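To make the batching model concrete, here is a toy, pure-Python simulation of how a stream is discretized into fixed-length batches and how the same job runs on every batch. It does not use Spark. The names `split_into_batches`, `word_count` and `run_stream` are hypothetical. Each inner list stands in for one RDD, and the list of batches stands in for a DStream.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# A record is (arrival time in seconds, payload line). Toy model, not Spark's API.
Record = Tuple[float, str]
Batch = List[str]


def split_into_batches(records: Iterable[Record], batch_interval: float) -> List[Batch]:
    """Discretize a stream: a record arriving at time t goes to batch floor(t / interval).

    Each inner list plays the role of one RDD; the list of batches plays
    the role of a DStream. Intervals with no data yield empty batches.
    """
    batches: List[Batch] = []
    for ts, payload in sorted(records, key=lambda r: r[0]):
        index = int(ts // batch_interval)
        while len(batches) <= index:
            batches.append([])
        batches[index].append(payload)
    return batches


def word_count(batch: Batch) -> Counter:
    """The per-batch job: count words across all lines in one batch."""
    counts: Counter = Counter()
    for line in batch:
        counts.update(line.split())
    return counts


def run_stream(records: Iterable[Record], batch_interval: float,
               job: Callable[[Batch], Counter]) -> List[Counter]:
    """Apply the same job to every batch, in time order."""
    return [job(batch) for batch in split_into_batches(records, batch_interval)]


if __name__ == "__main__":
    events = [(0.2, "a b"), (0.7, "b"), (1.1, "a"), (2.5, "c c")]
    for i, result in enumerate(run_stream(events, 1.0, word_count)):
        print(i, dict(result))
```

With a 1-second interval, the two records before t = 1.0 form the first batch. The record at 1.1 forms the second batch, and the record at 2.5 forms the third.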

StreamingPro

StreamingPro is not a complete application but an extensible, programmable framework for Spark Streaming (it also covers Spark and Storm) that makes it easy to build your own streaming applications.
With StreamingPro, all you need to do to build a streaming program is assemble components (e.g. an SQL component) in a configuration file.
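The sketch below illustrates the idea of assembling components in a configuration file. It reads a JSON workflow definition and chains the listed components into a per-batch pipeline. The config schema, the component names `parse_csv` and `sql`, and the helpers `REGISTRY` and `assemble` are all hypothetical; StreamingPro's actual configuration format is not shown here. Python's built-in `sqlite3` stands in for Spark SQL.

```python
import json
import sqlite3
from typing import Callable, Dict, List

# Hypothetical workflow definition: the schema and component names are
# invented for this sketch and are not StreamingPro's real format.
CONFIG = """
{
  "workflow": "level_counts",
  "components": [
    {"type": "parse_csv", "columns": ["level", "msg"]},
    {"type": "sql",
     "query": "SELECT level, COUNT(*) AS n FROM batch GROUP BY level ORDER BY level"}
  ]
}
"""

Step = Callable[[list], list]


def parse_csv(params: dict) -> Step:
    """Component factory: turn 'a,b' lines into dicts keyed by column name."""
    columns = params["columns"]

    def step(lines: list) -> list:
        return [dict(zip(columns, line.split(","))) for line in lines]

    return step


def sql(params: dict) -> Step:
    """Component factory: run a SQL query over the batch, exposed as table `batch`."""
    query = params["query"]

    def step(rows: list) -> list:
        if not rows:
            return []
        cols = list(rows[0].keys())  # column names come from the (trusted) config
        conn = sqlite3.connect(":memory:")  # sqlite3 stands in for Spark SQL
        try:
            conn.execute(f"CREATE TABLE batch ({', '.join(cols)})")
            placeholders = ", ".join("?" for _ in cols)
            conn.executemany(
                f"INSERT INTO batch VALUES ({placeholders})",
                [tuple(row[c] for c in cols) for row in rows],
            )
            cursor = conn.execute(query)
            names = [d[0] for d in cursor.description]
            return [dict(zip(names, record)) for record in cursor.fetchall()]
        finally:
            conn.close()

    return step


# Maps the "type" field of each config entry to its component factory.
REGISTRY: Dict[str, Callable[[dict], Step]] = {"parse_csv": parse_csv, "sql": sql}


def assemble(config_text: str) -> Step:
    """Build one per-batch pipeline by chaining the configured components in order."""
    config = json.loads(config_text)
    steps: List[Step] = [REGISTRY[c["type"]](c) for c in config["components"]]

    def run(batch: list) -> list:
        data = batch
        for step in steps:
            data = step(data)
        return data

    return run


if __name__ == "__main__":
    pipeline = assemble(CONFIG)
    print(pipeline(["ERROR,disk full", "INFO,started", "ERROR,timeout"]))
```

In this design, supporting a new kind of step means registering one more factory in `REGISTRY`. The workflow itself changes only in the configuration text.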

Features

  • Pure Spark Streaming (or plain Spark) programs (Storm in the future)
  • No coding needed, only declarative workflows
  • REST API for interactive use
  • Support for SQL-oriented workflows
  • Data continuously streamed in and processed in near real time
  • Dynamic CRUD (create, read, update, delete) of workflows at runtime via the REST API
  • Flexible workflows (inputs, outputs, parsers, etc.)
  • High performance
  • Scalable
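The runtime CRUD feature can be pictured as a registry of workflow definitions whose operations a REST layer maps onto HTTP verbs. The following is a minimal sketch under that assumption. `WorkflowRegistry`, `handle` and the chosen status codes are illustrative, not StreamingPro's REST API, and no HTTP server is included.

```python
import threading
from typing import Dict, Optional, Tuple


class WorkflowRegistry:
    """Hypothetical thread-safe, in-memory store of workflow definitions."""

    def __init__(self) -> None:
        # A lock, because definitions may change while batches are running.
        self._lock = threading.Lock()
        self._workflows: Dict[str, dict] = {}

    def create(self, name: str, definition: dict) -> bool:
        with self._lock:
            if name in self._workflows:
                return False
            self._workflows[name] = definition
            return True

    def read(self, name: str) -> Optional[dict]:
        with self._lock:
            return self._workflows.get(name)

    def update(self, name: str, definition: dict) -> bool:
        with self._lock:
            if name not in self._workflows:
                return False
            self._workflows[name] = definition
            return True

    def delete(self, name: str) -> bool:
        with self._lock:
            if name not in self._workflows:
                return False
            del self._workflows[name]
            return True


def handle(registry: WorkflowRegistry, method: str, name: str,
           body: Optional[dict] = None) -> Tuple[int, Optional[dict]]:
    """Map REST verbs onto registry operations, returning (HTTP status, payload)."""
    if method == "POST":
        return (201, body) if registry.create(name, body) else (409, None)
    if method == "GET":
        workflow = registry.read(name)
        return (200, workflow) if workflow is not None else (404, None)
    if method == "PUT":
        return (200, body) if registry.update(name, body) else (404, None)
    if method == "DELETE":
        return (204, None) if registry.delete(name) else (404, None)
    return (405, None)  # method not allowed
```

For example, `handle(registry, "POST", "wf1", {...})` registers a workflow. A later `PUT` to the same name replaces it in place.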

Documentation

Architecture

[Figure: StreamingPro architecture]

Declarative workflows

[Figure: declarative workflows]

Implementation

[Figure: implementation]