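Hudi Data Lake Technology Leads the New Big Data Trend (Part 3): Resolving Spark Module Dependency Conflicts - Alibaba Cloud Developer Community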

2023-12-26, 287 views, published from Heilongjiang

Resolving Spark Module Dependency Conflicts

The Hive version was changed to 3.1.2, which bundles Jetty 9.3.x, while Hudi itself uses Jetty 9.4.x, so the two versions conflict.
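One way to see which Jetty versions a module resolves is to filter Maven's dependency tree. The sketch below assumes Maven is on the PATH and that the Hudi sources live at the path used in this article; `jetty_versions` is a hypothetical helper name, and Maven may need the sibling Hudi modules built or installed before the tree resolves.

```shell
#!/usr/bin/env bash
# jetty_versions (hypothetical helper, not part of Hudi or Maven):
# reads `mvn dependency:tree` output on stdin and prints each distinct
# org.eclipse.jetty artifact together with its version.
jetty_versions() {
  # Tree entries have the form groupId:artifactId:type:version:scope
  grep -o 'org\.eclipse\.jetty[^ ]*' \
    | awk -F: 'NF >= 5 { print $1 ":" $2, $4 }' \
    | sort -u
}

# Usage (assumed paths; run from the Hudi source root):
#   cd /opt/software/hudi-0.12.0
#   mvn dependency:tree -Dincludes=org.eclipse.jetty | jetty_versions
```

If the output mixes 9.3.x and 9.4.x artifacts, that is a hint that Hive's Jetty is leaking into the bundle.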

1) Edit the pom file of hudi-spark-bundle to exclude the older Jetty and add the Jetty version specified by Hudi:

vim /opt/software/hudi-0.12.0/packaging/hudi-spark-bundle/pom.xml

Around line 382, modify the file as follows (the changes were highlighted in red in the original):

<!-- Hive -->
  <dependency>
   <groupId>${hive.groupid}</groupId>
   <artifactId>hive-service</artifactId>
   <version>${hive.version}</version>
   <scope>${spark.bundle.hive.scope}</scope>
   <exclusions>
    <exclusion>
     <artifactId>guava</artifactId>
     <groupId>com.google.guava</groupId>
    </exclusion>
    <exclusion>
     <groupId>org.eclipse.jetty</groupId>
     <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
     <groupId>org.pentaho</groupId>
     <artifactId>*</artifactId>
    </exclusion>
   </exclusions>
  </dependency>
  <dependency>
   <groupId>${hive.groupid}</groupId>
   <artifactId>hive-service-rpc</artifactId>
   <version>${hive.version}</version>
   <scope>${spark.bundle.hive.scope}</scope>
  </dependency>
  <dependency>
   <groupId>${hive.groupid}</groupId>
   <artifactId>hive-jdbc</artifactId>
   <version>${hive.version}</version>
   <scope>${spark.bundle.hive.scope}</scope>
   <exclusions>
    <exclusion>
     <groupId>javax.servlet</groupId>
     <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
     <groupId>javax.servlet.jsp</groupId>
     <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
     <groupId>org.eclipse.jetty</groupId>
     <artifactId>*</artifactId>
    </exclusion>
   </exclusions>
  </dependency>
  <dependency>
   <groupId>${hive.groupid}</groupId>
   <artifactId>hive-metastore</artifactId>
   <version>${hive.version}</version>
   <scope>${spark.bundle.hive.scope}</scope>
   <exclusions>
    <exclusion>
     <groupId>javax.servlet</groupId>
     <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
     <groupId>org.datanucleus</groupId>
     <artifactId>datanucleus-core</artifactId>
    </exclusion>
    <exclusion>
     <groupId>javax.servlet.jsp</groupId>
     <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
     <artifactId>guava</artifactId>
     <groupId>com.google.guava</groupId>
    </exclusion>
   </exclusions>
  </dependency>
  <dependency>
   <groupId>${hive.groupid}</groupId>
   <artifactId>hive-common</artifactId>
   <version>${hive.version}</version>
   <scope>${spark.bundle.hive.scope}</scope>
   <exclusions>
    <exclusion>
     <groupId>org.eclipse.jetty.orbit</groupId>
     <artifactId>javax.servlet</artifactId>
    </exclusion>
    <exclusion>
     <groupId>org.eclipse.jetty</groupId>
     <artifactId>*</artifactId>
    </exclusion>
   </exclusions>
  </dependency>
  <!-- Add the Jetty version configured by Hudi -->
  <dependency>
   <groupId>org.eclipse.jetty</groupId>
   <artifactId>jetty-server</artifactId>
   <version>${jetty.version}</version>
  </dependency>
  <dependency>
   <groupId>org.eclipse.jetty</groupId>
   <artifactId>jetty-util</artifactId>
   <version>${jetty.version}</version>
  </dependency>
  <dependency>
   <groupId>org.eclipse.jetty</groupId>
   <artifactId>jetty-webapp</artifactId>
   <version>${jetty.version}</version>
  </dependency>
  <dependency>
   <groupId>org.eclipse.jetty</groupId>
   <artifactId>jetty-http</artifactId>
   <version>${jetty.version}</version>
  </dependency>

Otherwise, inserting data into a Hudi table with Spark fails with the following error:

java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V
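Before rebuilding, it can help to confirm that the pom edit was actually saved. The sketch below simply lists every line of a pom that names the `org.eclipse.jetty` group; `jetty_refs` is a hypothetical helper name, and the path in the usage comment is the one assumed in this article.

```shell
#!/usr/bin/env bash
# jetty_refs (hypothetical helper): print, with line numbers, every line of
# the given pom that references the org.eclipse.jetty groupId exactly.
# The org.eclipse.jetty.orbit group is deliberately not matched.
jetty_refs() {
  grep -n '<groupId>org\.eclipse\.jetty</groupId>' "$1"
}

# Usage (assumed path):
#   jetty_refs /opt/software/hudi-0.12.0/packaging/hudi-spark-bundle/pom.xml
```

The snippet shown in step 1 contains seven such lines (three exclusions and four dependencies); the full file may contain others.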

2) Edit the pom file of hudi-utilities-bundle to exclude the older Jetty and add the Jetty version specified by Hudi:

vim /opt/software/hudi-0.12.0/packaging/hudi-utilities-bundle/pom.xml

Around line 405, modify the file as follows (the changes were highlighted in red in the original):

<!-- Hoodie -->
  <dependency>
   <groupId>org.apache.hudi</groupId>
   <artifactId>hudi-common</artifactId>
   <version>${project.version}</version>
   <exclusions>
    <exclusion>
     <groupId>org.eclipse.jetty</groupId>
     <artifactId>*</artifactId>
    </exclusion>
   </exclusions>
  </dependency>
  <dependency>
   <groupId>org.apache.hudi</groupId>
   <artifactId>hudi-client-common</artifactId>
   <version>${project.version}</version>
   <exclusions>
    <exclusion>
     <groupId>org.eclipse.jetty</groupId>
     <artifactId>*</artifactId>
    </exclusion>
   </exclusions>
  </dependency>
<!-- Hive -->
  <dependency>
   <groupId>${hive.groupid}</groupId>
   <artifactId>hive-service</artifactId>
   <version>${hive.version}</version>
   <scope>${utilities.bundle.hive.scope}</scope>
   <exclusions>
    <exclusion>
     <artifactId>servlet-api</artifactId>
     <groupId>javax.servlet</groupId>
    </exclusion>
    <exclusion>
     <artifactId>guava</artifactId>
     <groupId>com.google.guava</groupId>
    </exclusion>
    <exclusion>
     <groupId>org.eclipse.jetty</groupId>
     <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
     <groupId>org.pentaho</groupId>
     <artifactId>*</artifactId>
    </exclusion>
   </exclusions>
  </dependency>
  <dependency>
   <groupId>${hive.groupid}</groupId>
   <artifactId>hive-service-rpc</artifactId>
   <version>${hive.version}</version>
   <scope>${utilities.bundle.hive.scope}</scope>
  </dependency>
  <dependency>
   <groupId>${hive.groupid}</groupId>
   <artifactId>hive-jdbc</artifactId>
   <version>${hive.version}</version>
   <scope>${utilities.bundle.hive.scope}</scope>
   <exclusions>
    <exclusion>
     <groupId>javax.servlet</groupId>
     <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
     <groupId>javax.servlet.jsp</groupId>
     <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
     <groupId>org.eclipse.jetty</groupId>
     <artifactId>*</artifactId>
    </exclusion>
   </exclusions>
  </dependency>
  <dependency>
   <groupId>${hive.groupid}</groupId>
   <artifactId>hive-metastore</artifactId>
   <version>${hive.version}</version>
   <scope>${utilities.bundle.hive.scope}</scope>
   <exclusions>
    <exclusion>
     <groupId>javax.servlet</groupId>
     <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
     <groupId>org.datanucleus</groupId>
     <artifactId>datanucleus-core</artifactId>
    </exclusion>
    <exclusion>
     <groupId>javax.servlet.jsp</groupId>
     <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
     <artifactId>guava</artifactId>
     <groupId>com.google.guava</groupId>
    </exclusion>
   </exclusions>
  </dependency>
  <dependency>
   <groupId>${hive.groupid}</groupId>
   <artifactId>hive-common</artifactId>
   <version>${hive.version}</version>
   <scope>${utilities.bundle.hive.scope}</scope>
   <exclusions>
    <exclusion>
     <groupId>org.eclipse.jetty.orbit</groupId>
     <artifactId>javax.servlet</artifactId>
    </exclusion>
    <exclusion>
     <groupId>org.eclipse.jetty</groupId>
     <artifactId>*</artifactId>
    </exclusion>
   </exclusions>
  </dependency>
  <!-- Add the Jetty version configured by Hudi -->
  <dependency>
   <groupId>org.eclipse.jetty</groupId>
   <artifactId>jetty-server</artifactId>
   <version>${jetty.version}</version>
  </dependency>
  <dependency>
   <groupId>org.eclipse.jetty</groupId>
   <artifactId>jetty-util</artifactId>
   <version>${jetty.version}</version>
  </dependency>
  <dependency>
   <groupId>org.eclipse.jetty</groupId>
   <artifactId>jetty-webapp</artifactId>
   <version>${jetty.version}</version>
  </dependency>
  <dependency>
   <groupId>org.eclipse.jetty</groupId>
   <artifactId>jetty-http</artifactId>
   <version>${jetty.version}</version>
  </dependency>

Otherwise, inserting data into a Hudi table with the DeltaStreamer tool also fails with a Jetty error.

2.2.6 Run the build command

mvn clean package -DskipTests -Dspark3.2 -Dflink1.13 -Dscala-2.12 -Dhadoop.version=3.1.3 -Pflink-bundle-shade-hive3
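A full Hudi build produces a long log, so it can be convenient to save it and check the result afterwards. The following is a minimal sketch, assuming the build is run from the Hudi source root; `build_ok` and the log file name `build.log` are hypothetical.

```shell
#!/usr/bin/env bash
# build_ok (hypothetical helper): succeeds only if the saved Maven log
# contains the BUILD SUCCESS marker that Maven prints at the end of a
# successful build.
build_ok() {
  grep -q 'BUILD SUCCESS' "$1"
}

# Usage (assumed log file name):
#   mvn clean package -DskipTests -Dspark3.2 -Dflink1.13 -Dscala-2.12 \
#     -Dhadoop.version=3.1.3 -Pflink-bundle-shade-hive3 2>&1 | tee build.log
#   build_ok build.log && echo "build succeeded"
```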

2.2.7 Build succeeded

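After the build completes, being able to start hudi-cli indicates that it succeeded.

The resulting jars are located in the individual modules under the packaging directory.

For example, the Flink bundle for Hudi is produced in the hudi-flink-bundle module under packaging.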
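To locate the built jars without browsing each module, a short `find` can list them. This is a sketch under the assumption that each packaging module writes its jar to its own `target` directory, as Maven does by default; `list_bundle_jars` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# list_bundle_jars (hypothetical helper): print the bundle jars found in
# <root>/packaging/<module>/target, skipping the unshaded original-*.jar
# copies that the Maven shade plugin leaves next to the final jar.
list_bundle_jars() {
  find "$1/packaging" -maxdepth 3 -type f -path '*/target/*' \
    -name '*bundle*.jar' ! -name 'original-*' | sort
}

# Usage (assumed path):
#   list_bundle_jars /opt/software/hudi-0.12.0
```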
