Data Lake Architecture: Building Hudi from Source


Preface

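Looking back after getting Hudi to build once, every problem I hit came from being unfamiliar with it the first time, and all of them can be summed up as a misconfigured Maven repository. At first I configured only the Aliyun repository, but the build kept failing. I searched Baidu and Google for the causes, adjusted the configuration, and rebuilt, repeating until it finally succeeded. So the build itself is not complicated as long as the configuration is correct. I am sharing the configuration that finally passed, since I think that is the part that will help most readers.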

Version and source code

Hudi iterates fairly quickly, and it also depends on Hadoop and Spark. To keep that combination compatible, I used version 0.9.0. The release page is: [https://hudi.apache.org/releases/release-0.9.0](https://hudi.apache.org/releases/release-0.9.0)

The source code can be downloaded from the download section of that page.

Environment setup

[root@zhu-91-134 target]# mvn -v
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-18T02:33:14+08:00)
Maven home: /apps/svr/maven
Java version: 1.8.0_144, vendor: Oracle Corporation, runtime: /apps/svr/jdk1.8.0_144/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "linux", version: "3.10.5-3.el6.x86_64", arch: "amd64", family: "unix"

Build

The build is the same as for any ordinary Maven project and is not very complicated.

mvn clean install -DskipTests -DskipITs -Dscala-2.12 -Dspark3 

As mentioned above, it took several attempts before the build succeeded, so the final result deserves to be shown:

[INFO] Hudi 0.9.0 ......................................... SUCCESS [  2.758 s]
[INFO] hudi-common ........................................ SUCCESS [ 20.652 s]
[INFO] hudi-timeline-service .............................. SUCCESS [  3.375 s]
[INFO] hudi-client ........................................ SUCCESS [  0.208 s]
[INFO] hudi-client-common ................................. SUCCESS [ 12.704 s]
[INFO] hudi-hadoop-mr ..................................... SUCCESS [  5.637 s]
[INFO] hudi-spark-client .................................. SUCCESS [ 24.567 s]
[INFO] hudi-sync-common ................................... SUCCESS [  1.197 s]
[INFO] hudi-hive-sync ..................................... SUCCESS [  6.125 s]
[INFO] hudi-spark-datasource .............................. SUCCESS [  0.107 s]
[INFO] hudi-spark-common_2.12 ............................. SUCCESS [ 13.649 s]
[INFO] hudi-spark3_2.12 ................................... SUCCESS [ 11.451 s]
[INFO] hudi-spark_2.12 .................................... SUCCESS [ 45.515 s]
[INFO] hudi-utilities_2.12 ................................ SUCCESS [ 23.751 s]
[INFO] hudi-utilities-bundle_2.12 ......................... SUCCESS [ 51.554 s]
[INFO] hudi-cli ........................................... SUCCESS [ 32.192 s]
[INFO] hudi-java-client ................................... SUCCESS [  3.458 s]
[INFO] hudi-flink-client .................................. SUCCESS [ 12.356 s]
[INFO] hudi-spark2_2.12 ................................... SUCCESS [ 17.489 s]
[INFO] hudi-dla-sync ...................................... SUCCESS [  3.055 s]
[INFO] hudi-sync .......................................... SUCCESS [  0.131 s]
[INFO] hudi-hadoop-mr-bundle .............................. SUCCESS [  6.229 s]
[INFO] hudi-hive-sync-bundle .............................. SUCCESS [  2.009 s]
[INFO] hudi-spark3-bundle_2.12 ............................ SUCCESS [ 14.460 s]
[INFO] hudi-presto-bundle ................................. SUCCESS [  9.588 s]
[INFO] hudi-timeline-server-bundle ........................ SUCCESS [  7.371 s]
[INFO] hudi-hadoop-docker ................................. SUCCESS [  0.852 s]
[INFO] hudi-hadoop-base-docker ............................ SUCCESS [01:08 min]
[INFO] hudi-hadoop-namenode-docker ........................ SUCCESS [  0.178 s]
[INFO] hudi-hadoop-datanode-docker ........................ SUCCESS [  0.124 s]
[INFO] hudi-hadoop-history-docker ......................... SUCCESS [  0.112 s]
[INFO] hudi-hadoop-hive-docker ............................ SUCCESS [  0.541 s]
[INFO] hudi-hadoop-sparkbase-docker ....................... SUCCESS [  0.121 s]
[INFO] hudi-hadoop-sparkmaster-docker ..................... SUCCESS [  0.128 s]
[INFO] hudi-hadoop-sparkworker-docker ..................... SUCCESS [  0.181 s]
[INFO] hudi-hadoop-sparkadhoc-docker ...................... SUCCESS [  0.171 s]
[INFO] hudi-hadoop-presto-docker .......................... SUCCESS [  0.231 s]
[INFO] hudi-integ-test .................................... SUCCESS [01:05 min]
[INFO] hudi-integ-test-bundle ............................. SUCCESS [02:29 min]
[INFO] hudi-examples ...................................... SUCCESS [  9.459 s]
[INFO] hudi-flink_2.12 .................................... SUCCESS [  9.703 s]
[INFO] hudi-flink-bundle_2.12 0.9.0 ....................... SUCCESS [ 24.891 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:02 min
[INFO] Finished at: 2022-05-04T00:53:46+08:00
[INFO] ------------------------------------------------------------------------

Key Maven configuration

This configuration is the result of repeatedly hitting errors and looking up fixes, so when you build, you can put it straight into your settings.xml.
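For orientation, here is a minimal sketch of the kind of mirror setup a settings.xml for this build might contain. It is not the author's exact file but an assumption about a common arrangement: the Aliyun public mirror (`https://maven.aliyun.com/repository/public`) with `mirrorOf` restricted to `central`, so that any repositories declared in a project's own pom.xml are still contacted directly instead of being redirected to the mirror. The mirror `id` and `name` are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch, not the author's original settings.xml. -->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <mirrors>
    <mirror>
      <!-- Illustrative id and name; any unique id works. -->
      <id>aliyun-public</id>
      <name>Aliyun public mirror</name>
      <url>https://maven.aliyun.com/repository/public</url>
      <!-- Mirror only Maven Central; other repositories are used as declared. -->
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```

Maven reads this file from `~/.m2/settings.xml` by default, or from a path passed with `-s`. A `mirrorOf` value of `*` would instead route every repository through the mirror, which can fail for artifacts the mirror does not host.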


Afterword

Building Hudi is much simpler than building many other projects.
