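Preface
It is 9:50 on June 18, 2022, and I was up tinkering until almost 2 a.m. Today is JD.com's anniversary sale, and as I recall it is also the day of the zhongkao (the senior high school entrance exam). I am not sure whether every middle school finishes on this day, whether the date differs by school, or whether the schedule has been adjusted. Here is what happened. I wanted to set up a Hudi environment, which means setting up Spark and Hadoop first. My home network is slow. On the official sites the Spark 3.3.0 source is only 28 MB while the binary package is 261 MB, and the Hadoop 2.7.3 source is 17.3 MB while the binary package is over 200 MB. With the network acting up lately the downloads would not finish, so I decided to build from source while the downloads ran. Hadoop went smoothly, but Spark 3.3 simply would not build with the approach that had worked for me on 3.0. I previously wrote an article, "Spark 3.0 source compilation and packaging", but 3.3.0 needs corrections, so I decided to write it all down again.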
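The result first
Unreachable networks inside China are a real headache. The output below only appeared after a lot of tinkering. The cause is that the Spark 3.3.0 build accesses a Google-hosted plugin repository, and it took me a long time to get that working.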
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.pom
Downloaded from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.pom (0 B at 0 B/s)
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/org/fusesource/leveldbjni/leveldbjni-project/1.8/leveldbjni-project-1.8.pom
Downloaded from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/org/fusesource/leveldbjni/leveldbjni-project/1.8/leveldbjni-project-1.8.pom (0 B at 0 B/s)
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/org/fusesource/fusesource-pom/1.9/fusesource-pom-1.9.pom
Downloaded from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/org/fusesource/fusesource-pom/1.9/fusesource-pom-1.9.pom (0 B at 0 B/s)
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/core/jackson-core/2.13.3/jackson-core-2.13.3.pom
Downloaded from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/core/jackson-core/2.13.3/jackson-core-2.13.3.pom (5.5 kB at 7.1 kB/s)
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/jackson-base/2.13.3/jackson-base-2.13.3.pom
Downloaded from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/jackson-base/2.13.3/jackson-base-2.13.3.pom (9.9 kB at 14 kB/s)
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/jackson-bom/2.13.3/jackson-bom-2.13.3.pom
Downloaded from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/jackson-bom/2.13.3/jackson-bom-2.13.3.pom (17 kB at 24 kB/s)
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/jackson-parent/2.13/jackson-parent-2.13.pom
Downloaded from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/jackson-parent/2.13/jackson-parent-2.13.pom (7.4 kB at 9.3 kB/s)
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/oss-parent/43/oss-parent-43.pom
Downloaded from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/oss-parent/43/oss-parent-43.pom (24 kB at 30 kB/s)
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/core/jackson-databind/2.13.3/jackson-databind-2.13.3.pom
Downloaded from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/core/jackson-databind/2.13.3/jackson-databind-2.13.3.pom (16 kB at 21 kB/s)
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/core/jackson-annotations/2.13.3/jackson-annotations-2.13.3.pom
Downloaded from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/com/fasterxml/jackson/core/jackson-annotations/2.13.3/jackson-annotations-2.13.3.pom (6.1 kB at 7.6 kB/s)
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/org/apache/logging/log4j/log4j-api/2.17.2/log4j-api-2.17.2.pom
Downloaded from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/org/apache/logging/log4j/log4j-api/2.17.2/log4j-api-2.17.2.pom (14 kB at 19 kB/s)
Downloading from gcs-maven-central-mirror: https://maven-central-asia.storage-download.googleapis.com/maven2/org/apache/logging/log4j/log4j/2.17.2/log4j-2.17.2.pom
The maddening certificate problem
A repository that would never connect
The build command itself has not changed:
./dev/make-distribution.sh --name spark-3.3.0 --tgz -Phadoop-2 -Dhadoop.version=2.7.4 -Phive -Phive-thriftserver -Pyarn
The problem lies in pom.xml, which contains this section:
<pluginRepository>
  <id>gcs-maven-central-mirror</id>
  <!--
    Google Mirror of Maven Central, placed first so that it's used instead of flaky Maven Central.
    See https://storage-download.googleapis.com/maven-central/index.html
  -->
  <name>GCS Maven Central mirror</name>
  <!--<url>https://maven-central.storage-download.googleapis.com/maven2/</url>-->
  <url>https://maven-central-asia.storage-download.googleapis.com/maven2/</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</pluginRepository>
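Advice online says to switch the URL to the Aliyun mirror. I tried that and it did not help, because the dependencies are not present in the Aliyun repository. Following the hint in the comment, I then switched to Google's Asia mirror, which did not work either: the log stayed stuck at downloading maven-metadata.xml until it timed out.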
gcs-maven-central-mirror (https://maven-central.storage-download.googleapis.com/maven2/): transfer failed for https://maven-central.storage-download.googleapis.com/maven2/org/apache/maven/plugins/maven-metadata.xml
The same error kept coming back for:
https://maven-central-asia.storage-download.googleapis.com/maven2/org/apache/maven/plugins/maven-metadata.xml
For quite a while I kept switching between repositories, assuming the only problem was that googleapis could not be reached.
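The turning point
I am not sure where the idea came from, but I suddenly wanted to confirm whether the host really was unreachable, so I requested the URL directly with wget: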
wget https://maven-central-asia.storage-download.googleapis.com/maven2/org/apache/maven/plugins/maven-metadata.xml
It failed, but the message was not a connection failure. It was a certificate problem:
[root@zhu-91-134 spark-3.3.0]# wget https://maven-central-asia.storage-download.googleapis.com/maven2/org/apache/maven/plugins/maven-metadata.xml
--2022-05-04 12:31:00--  https://maven-central-asia.storage-download.googleapis.com/maven2/org/apache/maven/plugins/maven-metadata.xml
Resolving maven-central-asia.storage-download.googleapis.com... 142.251.43.16
Connecting to maven-central-asia.storage-download.googleapis.com|142.251.43.16|:443... connected.
ERROR: cannot verify maven-central-asia.storage-download.googleapis.com's certificate, issued by "/C=US/O=Google Trust Services LLC/CN=GTS CA 1C3": Issued certificate not yet valid.
ERROR: certificate common name "*.storage.googleapis.com" doesn't match requested host name "maven-central-asia.storage-download.googleapis.com".
To connect to maven-central-asia.storage-download.googleapis.com insecurely, use '--no-check-certificate'.
You have new mail in /var/spool/mail/root
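The distinction that mattered here, an unreachable host versus an untrusted certificate, can also be read from wget's exit status instead of the log text. The following is a minimal sketch, assuming the exit codes listed in the GNU Wget manual (4 for network failure, 5 for SSL verification failure, 8 for a server error response). The function names classify_wget_exit and probe_url are made up for this illustration.

```shell
#!/bin/sh
# Map a wget exit status to a short diagnosis.
# The codes follow the "Exit Status" section of the GNU Wget manual.
classify_wget_exit() {
  case "$1" in
    0) echo "ok" ;;
    4) echo "network failure (DNS, connect, or timeout)" ;;
    5) echo "SSL verification failure (certificate problem)" ;;
    8) echo "server returned an error response" ;;
    *) echo "other error (exit $1)" ;;
  esac
}

# Hypothetical helper: fetch a URL once without saving it, then report the diagnosis.
probe_url() {
  wget -q -O /dev/null --timeout=15 --tries=1 "$1"
  classify_wget_exit "$?"
}
```

Calling probe_url with the maven-metadata.xml URL above would be expected to report the SSL verification case on a machine in the same state, which points at the certificate rather than at connectivity.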
This one is easy and I have dealt with it before: add the option that turns off verification.
wget https://maven-central-asia.storage-download.googleapis.com/maven2/org/apache/maven/plugins/maven-metadata.xml --no-check-certifica
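One thing that may be worth checking before disabling verification everywhere: the wget output is stamped 2022-05-04, while this post is dated 2022-06-18, and a "not yet valid" error is a common symptom of a system clock running behind. Whether that was the case on this machine is an assumption, and it would not explain the host name mismatch. The sketch below prints a certificate's validity dates and compares a clock reading against notBefore. It assumes openssl and GNU date are installed, and the function names are made up for this illustration.

```shell
#!/bin/sh
# Print the notBefore/notAfter dates of the certificate a host presents.
# Needs openssl and network access; $1 is the host name.
show_cert_dates() {
  echo | openssl s_client -connect "$1:443" -servername "$1" 2>/dev/null \
    | openssl x509 -noout -dates
}

# Succeed (exit 0) when the clock reading in $1 is earlier than the
# certificate start date in $2. Both are strings GNU date can parse,
# for example the value printed after "notBefore=" by show_cert_dates.
clock_before_cert() {
  now_epoch=$(date -u -d "$1" +%s) || return 2
  nb_epoch=$(date -u -d "$2" +%s) || return 2
  [ "$now_epoch" -lt "$nb_epoch" ]
}
```

A possible use is to run show_cert_dates maven-central-asia.storage-download.googleapis.com, then pass the current time from date -u and the printed notBefore value to clock_before_cert. If it succeeds, correcting the clock is likely a cleaner fix for the date error than ignoring validity dates.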
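Changing the Maven command-line parameters
Having found the cause, the next step was to lift the restriction in Maven. I already knew the switches for this, so I added them directly:
-Dmaven.wagon.http.ssl.allowall=true -Dmaven.wagon.http.ssl.ignore.validity.dates=true
With these, Maven skips certificate validation. The full command, which also sets -Dmaven.wagon.http.ssl.insecure=true, becomes: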
./dev/make-distribution.sh --name spark-3.3.0 --tgz -Dmaven.wagon.http.ssl.insecure=true -Dmaven.wagon.http.ssl.allowall=true -Dmaven.wagon.http.ssl.ignore.validity.dates=true -Phadoop-2 -Dhadoop.version=2.7.4 -Phive -Phive-thriftserver -Pyarn
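The three switches line up with the two wget errors seen earlier. The sketch below spells that out and keeps the relaxed TLS settings opt-in. The per-flag comments reflect my reading of the Maven Wagon HTTP documentation, and the RELAX_TLS variable and build_cmd function are made up for this illustration. The command is only printed, not executed.

```shell
#!/bin/sh
# Relaxed-TLS switches for Maven's Wagon HTTP transport:
#   insecure              - relaxed SSL check (accepts certificates that fail normal trust checks)
#   allowall              - do not require the host name to match the certificate
#   ignore.validity.dates - ignore "not yet valid" and "expired" certificate dates
SSL_FLAGS="-Dmaven.wagon.http.ssl.insecure=true \
-Dmaven.wagon.http.ssl.allowall=true \
-Dmaven.wagon.http.ssl.ignore.validity.dates=true"

# Build profiles taken from the command used in this post.
PROFILES="-Phadoop-2 -Dhadoop.version=2.7.4 -Phive -Phive-thriftserver -Pyarn"

# Print the build command; the TLS switches are included only when RELAX_TLS=1.
build_cmd() {
  if [ "${RELAX_TLS:-0}" = "1" ]; then
    echo "./dev/make-distribution.sh --name spark-3.3.0 --tgz $SSL_FLAGS $PROFILES"
  else
    echo "./dev/make-distribution.sh --name spark-3.3.0 --tgz $PROFILES"
  fi
}

build_cmd
```

Keeping the switches behind a variable makes it harder to leave certificate checks disabled by accident once the network or clock problem is resolved.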
With that, the build gets past the network problem.
Memory
Finally, the JVM memory settings for Maven still need adjusting:
export MAVEN_OPTS="-Xss64m -Xmx2g -XX:ReservedCodeCacheSize=1g"
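If MAVEN_OPTS may already be set in the shell profile, a variant that only applies these values as a default avoids overwriting it. This is a small sketch using standard shell parameter expansion; the DEFAULT_MAVEN_OPTS name is made up for this illustration.

```shell
#!/bin/sh
# Values from this post, used only when MAVEN_OPTS is unset or empty.
DEFAULT_MAVEN_OPTS="-Xss64m -Xmx2g -XX:ReservedCodeCacheSize=1g"

# ":=" assigns the default only if the variable is unset or empty.
: "${MAVEN_OPTS:=$DEFAULT_MAVEN_OPTS}"
export MAVEN_OPTS

# Show what the build will actually use.
echo "MAVEN_OPTS=$MAVEN_OPTS"
```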