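Alibaba Summer of Code is a global program through which students can take part directly in open source software development. Under the guidance of a mentor, they gain in-depth experience of real-world software development and see first-hand how open source technology is built collaboratively.
In addition, during the program students can get to know more leading engineers in the open source field as well as like-minded peers. After completing the program, they receive a scholarship and an open source contributor certificate from Alibaba, and have a chance to enter Alibaba's fast-track ("green channel") recruitment process. Code written by students also has a better chance of being adopted by top-level projects of international open source foundations, where people around the world can use it freely.
These gains are not only a highlight on a future resume but also a starting point for students on the way to becoming advanced open source contributors. Alibaba Summer of Code runs from May 6 to August 29, 2019, so students can use the summer break to take part in open source projects.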
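Project introduction:
Apache Flink is an open source stream processing framework developed by the Apache Software Foundation. Its core is a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner, and its pipelined runtime system can execute both batch and stream processing programs. The runtime also natively supports the execution of iterative algorithms.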
Idea list
1. Add a new implementation of the HighAvailabilityServices using etcd: https://issues.apache.org/jira/browse/FLINK-11105
- Mentor: 沙晟阳 @ 成阳; GitHub ID: [MalcolmSanders](https://github.com/MalcolmSanders); Apache YARN and Flink contributor; Senior Development Engineer, Alibaba Cloud computing platform
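2. Run Flink efficiently in environments with limited hardware resources, such as the Raspberry Pi, and apply Flink to IoT and edge computing scenarios.
- Mentor: 宋辛童 @ 五藏; GitHub ID: xintongsong; PhD, Peking University; Senior Development Engineer, Alibaba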
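3. Write, remotely submit, and debug distributed Flink jobs in one place through IntelliJ IDEA. IntelliJ IDEA is a very good programming IDE, and Flink is a next-generation distributed big data processing engine. Combining the two to build a one-stop service in IntelliJ IDEA for writing Flink jobs, submitting them remotely, distributed debugging, and online operations would give Flink users a better experience. The project helps participants become proficient with Flink, improve their skills in big data processing and in developing and using related tools, contribute their code back to the community, and get involved early in building the Flink ecosystem.
- Mentor: 何健超 @ 迟南; GitHub ID: hejianchao; Technical Expert, Alibaba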
4. State storage is on the critical path of Flink, a stateful computing engine. It is essentially a key-value store, but with compute-related requirements, which makes it an interdisciplinary area. Gemini is a key-value store we designed for this scenario. Gemini stores data in elastic pages ranging from a few bytes to tens of KB.
In this topic you need to implement a cache allocator for pages. It should support off-heap memory to reduce GC pressure, deliver high throughput, and always replace cold data with hot data to increase the cache hit ratio and memory utilization.
- Mentor: 李钰 @ 绝顶; GitHub ID: [carp84](https://github.com/carp84); Apache HBase PMC member and committer, Flink/HDFS contributor; Senior Technical Expert, Alibaba
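The requirements for the page cache in idea 4 (off-heap storage, replacing cold pages with hot ones) can be illustrated with a minimal sketch. This is not Gemini's allocator: the class name `OffHeapPageCache`, the fixed slot size, and the plain LRU policy are all assumptions made for illustration, and a real allocator would need variable-size (elastic) pages and concurrency control.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not Gemini's allocator: fixed-size slots in one direct buffer, LRU eviction. */
final class OffHeapPageCache {
    private final ByteBuffer arena;   // single off-heap region; page bytes live outside the Java heap
    private final int slotSize;
    private final int[] lengths;      // actual byte length of the page held in each slot
    private final ArrayDeque<Integer> freeSlots = new ArrayDeque<>();
    // accessOrder = true: iteration starts at the least recently accessed page
    private final LinkedHashMap<Long, Integer> slotOfPage = new LinkedHashMap<>(16, 0.75f, true);

    OffHeapPageCache(int slotSize, int slotCount) {
        if (slotSize < 1 || slotCount < 1) throw new IllegalArgumentException("sizes must be positive");
        this.slotSize = slotSize;
        this.lengths = new int[slotCount];
        this.arena = ByteBuffer.allocateDirect(slotSize * slotCount);
        for (int i = 0; i < slotCount; i++) freeSlots.add(i);
    }

    /** Stores a page, evicting the coldest page when no slot is free. */
    void put(long pageId, byte[] data) {
        if (data.length > slotSize) throw new IllegalArgumentException("page larger than slot");
        Integer slot = slotOfPage.get(pageId);
        if (slot == null) {
            slot = freeSlots.isEmpty() ? evictColdest() : freeSlots.poll();
            slotOfPage.put(pageId, slot);
        }
        ByteBuffer view = arena.duplicate();   // independent position, shared content
        view.position(slot * slotSize);
        view.put(data);
        lengths[slot] = data.length;
    }

    /** Returns a copy of the page, or null on a miss; a hit marks the page as recently used. */
    byte[] get(long pageId) {
        Integer slot = slotOfPage.get(pageId);
        if (slot == null) return null;
        byte[] out = new byte[lengths[slot]];
        ByteBuffer view = arena.duplicate();
        view.position(slot * slotSize);
        view.get(out);
        return out;
    }

    /** Removes the least recently used page and returns its slot for reuse. */
    private int evictColdest() {
        Iterator<Map.Entry<Long, Integer>> it = slotOfPage.entrySet().iterator();
        int slot = it.next().getValue();
        it.remove();
        return slot;
    }
}
```

Fixed-size slots waste memory for small pages, which is exactly the gap an allocator for elastic pages would have to close; the sketch only shows where the off-heap buffer and the eviction decision sit.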
5. State storage is on the critical path of Flink, a stateful computing engine. It is essentially a key-value store, but with compute-related requirements, which makes it an interdisciplinary area. Gemini is a key-value store we designed for this scenario. It is a two-component LSM-tree structure in which the C0 tree is the write buffer and the C1 tree can be an enhanced B+-tree or a hash table; the hash table offers faster random lookup than a sorted index. In this topic you need to implement a CSBw-tree, a combination of the CSB+-tree [1] and the Bw-tree [2], which aims at both good CPU cache utilization (cache-conscious) and fast random access.
[1] Making B+-Trees Cache Conscious in Main Memory, SIGMOD 2000
[2] The Bw-Tree: A B-tree for New Hardware Platforms, ICDE 2013
- Mentor: 李钰 @ 绝顶; GitHub ID: [carp84](https://github.com/carp84); Apache HBase PMC member and committer, Flink/HDFS contributor; Senior Technical Expert, Alibaba
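As background for idea 5, the cache-conscious part of the CSB+-tree [1] rests on one addressing trick: the children of a node are stored contiguously, so the node keeps a single reference to its first child and locates the others by offset. The sketch below shows only that trick in a read-only, bulk-loaded tree. The class `CsbTreeSketch` and its methods are hypothetical, nothing from the Bw-tree [2] (delta records, mapping table) is included, and because Java arrays of objects hold references, a real implementation would pack nodes into a flat buffer to get the actual cache benefit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical read-only sketch of CSB+-style child addressing; not the CSBw-tree itself. */
final class CsbTreeSketch {
    private static final class Node {
        final long[] keys;     // separators in internal nodes, stored keys in leaves
        final int firstChild;  // index of the first child in the level below, -1 for leaves
        final long minKey;     // smallest key in this subtree, used to derive separators

        Node(long[] keys, int firstChild, long minKey) {
            this.keys = keys;
            this.firstChild = firstChild;
            this.minKey = minKey;
        }
    }

    private final List<Node[]> levels = new ArrayList<>(); // levels.get(0) holds the leaves

    /** Bulk-loads from distinct keys in ascending order; keysPerNode must be at least 1. */
    CsbTreeSketch(long[] sortedKeys, int keysPerNode) {
        if (keysPerNode < 1) throw new IllegalArgumentException("keysPerNode must be >= 1");
        int n = sortedKeys.length;
        Node[] level = new Node[Math.max(1, (n + keysPerNode - 1) / keysPerNode)];
        for (int i = 0; i < level.length; i++) {
            int from = Math.min(n, i * keysPerNode);
            int to = Math.min(n, from + keysPerNode);
            long min = from < n ? sortedKeys[from] : Long.MAX_VALUE;
            level[i] = new Node(Arrays.copyOfRange(sortedKeys, from, to), -1, min);
        }
        levels.add(level);
        int fanout = keysPerNode + 1;
        while (level.length > 1) {
            // Children of parent p occupy the contiguous range [p * fanout, end) in the level below.
            Node[] parents = new Node[(level.length + fanout - 1) / fanout];
            for (int p = 0; p < parents.length; p++) {
                int first = p * fanout;
                int end = Math.min(level.length, first + fanout);
                long[] separators = new long[end - first - 1];
                for (int c = first + 1; c < end; c++) separators[c - first - 1] = level[c].minKey;
                parents[p] = new Node(separators, first, level[first].minKey);
            }
            levels.add(parents);
            level = parents;
        }
    }

    /** Descends from the root, computing each child as firstChild + offset. */
    boolean contains(long key) {
        int index = 0;
        for (int depth = levels.size() - 1; depth > 0; depth--) {
            Node node = levels.get(depth)[index];
            int offset = 0;
            while (offset < node.keys.length && key >= node.keys[offset]) offset++;
            index = node.firstChild + offset;
        }
        return Arrays.binarySearch(levels.get(0)[index].keys, key) >= 0;
    }
}
```

Storing one child reference instead of one per key leaves more room for keys in each cache line, which is the property the topic asks to combine with the Bw-tree's update scheme.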
6. Batch benchmarks have matured and are widely used to analyze the performance of batch processing technologies. However, there is no suitable benchmark for testing streaming frameworks, which have more performance dimensions and usage scenarios. We therefore need to develop a streaming benchmark to comprehensively test Flink and other stream processing frameworks, and to optimize Flink according to the benchmark results.
- Mentor: 胥平勇 @ 姬平; GitHub ID: XuPingyong; Apache Flink contributor; Technical Expert, Alibaba
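To make the "more performance dimensions" of idea 6 concrete: a streaming benchmark typically reports latency percentiles alongside throughput, whereas a batch benchmark mostly reports total run time. The following is a minimal single-process sketch of those two measurements. The names `StreamBenchSketch`, `run`, and `percentile` are hypothetical, the operator is a stand-in for a real pipeline, and a real benchmark would run against a cluster with warm-up, realistic sources, and event-time latency.

```java
import java.util.Arrays;
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch of the metrics a streaming benchmark might report; not a Flink benchmark. */
final class StreamBenchSketch {
    static final class Result {
        final double eventsPerSecond;
        final long p50Nanos;
        final long p99Nanos;

        Result(double eventsPerSecond, long p50Nanos, long p99Nanos) {
            this.eventsPerSecond = eventsPerSecond;
            this.p50Nanos = p50Nanos;
            this.p99Nanos = p99Nanos;
        }
    }

    /** Nearest-rank percentile over a sorted copy of the samples; p is in (0, 100]. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Feeds `events` synthetic records through the operator and records per-record latency. */
    static Result run(LongUnaryOperator operator, int events) {
        if (events < 1) throw new IllegalArgumentException("events must be positive");
        long[] latencies = new long[events];
        long sink = 0;                          // consume results so the work is not optimized away
        long start = System.nanoTime();
        for (int i = 0; i < events; i++) {
            long t0 = System.nanoTime();
            sink += operator.applyAsLong(i);
            latencies[i] = System.nanoTime() - t0;
        }
        long elapsed = Math.max(1, System.nanoTime() - start);
        if (sink == Long.MIN_VALUE) System.out.println(sink); // keeps `sink` observable
        double throughput = events * 1_000_000_000.0 / elapsed;
        return new Result(throughput, percentile(latencies, 50), percentile(latencies, 99));
    }
}
```

Calling `StreamBenchSketch.run(x -> x * 2, 1_000_000)` would report throughput together with median and tail latency for the toy operator; comparing frameworks fairly would require the same metrics collected end to end under identical load.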