解决spark模块依赖冲突
修改了Hive版本为3.1.2,其携带的jetty是0.9.3,hudi本身用的0.9.4,存在依赖冲突。
1)修改hudi-spark-bundle的pom文件,排除低版本jetty,添加hudi指定版本的jetty:
vim /opt/software/hudi-0.12.0/packaging/hudi-spark-bundle/pom.xml
在382行的位置,修改如下(红色部分):
<!-- Hive --> <dependency> <groupId>${hive.groupid}</groupId> <artifactId>hive-service</artifactId> <version>${hive.version}</version> <scope>${spark.bundle.hive.scope}</scope> <exclusions> <exclusion> <artifactId>guava</artifactId> <groupId>com.google.guava</groupId> </exclusion> <exclusion> <groupId>org.eclipse.jetty</groupId> <artifactId>*</artifactId> </exclusion> <exclusion> <groupId>org.pentaho</groupId> <artifactId>*</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>${hive.groupid}</groupId> <artifactId>hive-service-rpc</artifactId> <version>${hive.version}</version> <scope>${spark.bundle.hive.scope}</scope> </dependency> <dependency> <groupId>${hive.groupid}</groupId> <artifactId>hive-jdbc</artifactId> <version>${hive.version}</version> <scope>${spark.bundle.hive.scope}</scope> <exclusions> <exclusion> <groupId>javax.servlet</groupId> <artifactId>*</artifactId> </exclusion> <exclusion> <groupId>javax.servlet.jsp</groupId> <artifactId>*</artifactId> </exclusion> <exclusion> <groupId>org.eclipse.jetty</groupId> <artifactId>*</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>${hive.groupid}</groupId> <artifactId>hive-metastore</artifactId> <version>${hive.version}</version> <scope>${spark.bundle.hive.scope}</scope> <exclusions> <exclusion> <groupId>javax.servlet</groupId> <artifactId>*</artifactId> </exclusion> <exclusion> <groupId>org.datanucleus</groupId> <artifactId>datanucleus-core</artifactId> </exclusion> <exclusion> <groupId>javax.servlet.jsp</groupId> <artifactId>*</artifactId> </exclusion> <exclusion> <artifactId>guava</artifactId> <groupId>com.google.guava</groupId> </exclusion> </exclusions> </dependency> <dependency> <groupId>${hive.groupid}</groupId> <artifactId>hive-common</artifactId> <version>${hive.version}</version> <scope>${spark.bundle.hive.scope}</scope> <exclusions> <exclusion> <groupId>org.eclipse.jetty.orbit</groupId> <artifactId>javax.servlet</artifactId> </exclusion> <exclusion> <groupId>org.eclipse.jetty</groupId> <artifactId>*</artifactId> </exclusion> </exclusions> </dependency> <!-- 增加hudi配置版本的jetty --> <dependency> <groupId>org.eclipse.jetty</groupId> <artifactId>jetty-server</artifactId> <version>${jetty.version}</version> </dependency> <dependency> <groupId>org.eclipse.jetty</groupId> <artifactId>jetty-util</artifactId> <version>${jetty.version}</version> </dependency> <dependency> <groupId>org.eclipse.jetty</groupId> <artifactId>jetty-webapp</artifactId> <version>${jetty.version}</version> </dependency> <dependency> <groupId>org.eclipse.jetty</groupId> <artifactId>jetty-http</artifactId> <version>${jetty.version}</version> </dependency>
否则在使用spark向hudi表插入数据时,会报错如下:
java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V
2)修改hudi-utilities-bundle的pom文件,排除低版本jetty,添加hudi指定版本的jetty:
vim /opt/software/hudi-0.12.0/packaging/hudi-utilities-bundle/pom.xml
在405行的位置,修改如下(红色部分):
<!-- Hoodie --> <dependency> <groupId>org.apache.hudi</groupId> <artifactId>hudi-common</artifactId> <version>${project.version}</version> <exclusions> <exclusion> <groupId>org.eclipse.jetty</groupId> <artifactId>*</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>org.apache.hudi</groupId> <artifactId>hudi-client-common</artifactId> <version>${project.version}</version> <exclusions> <exclusion> <groupId>org.eclipse.jetty</groupId> <artifactId>*</artifactId> </exclusion> </exclusions> </dependency> <!-- Hive --> <dependency> <groupId>${hive.groupid}</groupId> <artifactId>hive-service</artifactId> <version>${hive.version}</version> <scope>${utilities.bundle.hive.scope}</scope> <exclusions> <exclusion> <artifactId>servlet-api</artifactId> <groupId>javax.servlet</groupId> </exclusion> <exclusion> <artifactId>guava</artifactId> <groupId>com.google.guava</groupId> </exclusion> <exclusion> <groupId>org.eclipse.jetty</groupId> <artifactId>*</artifactId> </exclusion> <exclusion> <groupId>org.pentaho</groupId> <artifactId>*</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>${hive.groupid}</groupId> <artifactId>hive-service-rpc</artifactId> <version>${hive.version}</version> <scope>${utilities.bundle.hive.scope}</scope> </dependency> <dependency> <groupId>${hive.groupid}</groupId> <artifactId>hive-jdbc</artifactId> <version>${hive.version}</version> <scope>${utilities.bundle.hive.scope}</scope> <exclusions> <exclusion> <groupId>javax.servlet</groupId> <artifactId>*</artifactId> </exclusion> <exclusion> <groupId>javax.servlet.jsp</groupId> <artifactId>*</artifactId> </exclusion> <exclusion> <groupId>org.eclipse.jetty</groupId> <artifactId>*</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>${hive.groupid}</groupId> <artifactId>hive-metastore</artifactId> <version>${hive.version}</version> <scope>${utilities.bundle.hive.scope}</scope> <exclusions> <exclusion> <groupId>javax.servlet</groupId> <artifactId>*</artifactId> </exclusion> <exclusion> <groupId>org.datanucleus</groupId> <artifactId>datanucleus-core</artifactId> </exclusion> <exclusion> <groupId>javax.servlet.jsp</groupId> <artifactId>*</artifactId> </exclusion> <exclusion> <artifactId>guava</artifactId> <groupId>com.google.guava</groupId> </exclusion> </exclusions> </dependency> <dependency> <groupId>${hive.groupid}</groupId> <artifactId>hive-common</artifactId> <version>${hive.version}</version> <scope>${utilities.bundle.hive.scope}</scope> <exclusions> <exclusion> <groupId>org.eclipse.jetty.orbit</groupId> <artifactId>javax.servlet</artifactId> </exclusion> <exclusion> <groupId>org.eclipse.jetty</groupId> <artifactId>*</artifactId> </exclusion> </exclusions> </dependency> <!-- 增加hudi配置版本的jetty --> <dependency> <groupId>org.eclipse.jetty</groupId> <artifactId>jetty-server</artifactId> <version>${jetty.version}</version> </dependency> <dependency> <groupId>org.eclipse.jetty</groupId> <artifactId>jetty-util</artifactId> <version>${jetty.version}</version> </dependency> <dependency> <groupId>org.eclipse.jetty</groupId> <artifactId>jetty-webapp</artifactId> <version>${jetty.version}</version> </dependency> <dependency> <groupId>org.eclipse.jetty</groupId> <artifactId>jetty-http</artifactId> <version>${jetty.version}</version> </dependency>
否则在使用DeltaStreamer工具向hudi表插入数据时,也会报Jetty的错误。
2.2.6 执行编译命令
mvn clean package -DskipTests -Dspark3.2 -Dflink1.13 -Dscala-2.12 -Dhadoop.version=3.1.3 -Pflink-bundle-shade-hive3
2.2.7 编译成功
编译成功后,进入hudi-cli说明成功:
编译完成后,相关的包在packaging目录的各个模块中:
比如,flink与hudi的包: