Upgrading from Spark 1.x to Spark 2: How to Upgrade and What to Consider


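Spark 2 has been out for a long time, but because Spark 1.6 is fairly stable, many people are still using it. If you want to move to Spark 2, how do you upgrade? On Windows, an upgrade usually means clicking a button and letting the system handle the rest. Upgrading Spark is different, and perhaps surprisingly so: it amounts to a fresh installation that reuses the earlier configuration. The configuration files are essentially unchanged, and if the installation directory stays the same, the environment variables barely change either.

If you are only learning, upgrading poses no problem. In a production environment, however, you need to be careful, because the upgrade brings a number of side effects.

For installing Spark, see http://www.aboutyun.com/forum.php?mod=viewthread&tid=20620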


The upgrade procedure is described below:



1. Upgrading Spark

First, stop all services:


./stop-all.sh

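A few additional notes:

Spark has a stop-all.sh script, and Hadoop has scripts of the same name, although Hadoop is in the process of deprecating its start-all.sh and stop-all.sh. If you want to use these scripts, it is best to change into the sbin directory of the corresponding installation and run them from there: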


./stop-all.sh
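
Because both Spark and Hadoop ship a stop-all.sh, one way to avoid running the wrong one is to resolve the script through SPARK_HOME instead of relying on the current directory or PATH. The following is a minimal sketch, not part of the original procedure; the helper name spark_sbin is hypothetical, and it assumes SPARK_HOME points at the Spark installation.

```shell
# Hypothetical helper: print the absolute path of a script in Spark's sbin
# directory, or fail if SPARK_HOME is unset or the script is not executable.
spark_sbin() {
    script="$1"
    if [ -z "${SPARK_HOME:-}" ]; then
        echo "SPARK_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$SPARK_HOME/sbin/$script" ]; then
        echo "not found or not executable: $SPARK_HOME/sbin/$script" >&2
        return 1
    fi
    echo "$SPARK_HOME/sbin/$script"
}

# Usage (runs Spark's script, never Hadoop's):
#   "$(spark_sbin stop-all.sh)"
```

The helper only prints a path, so nothing is stopped until the caller runs the result explicitly.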

Since the configuration is done by hand, these are the questions to consider when upgrading:


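Questions


1. Have the configuration files changed?

According to the official documentation for Spark 1.x and 2.x, fortunately they do not appear to have changed; the configuration files are still the same ones.

See http://spark.apache.org/docs/latest/spark-standalone.html. This makes the upgrade less worrying, because the original configuration files can be reused with little extra work.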


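2. What has changed?


After stopping the cluster, we move on to the configuration.

My Spark version is 1.6, and I am upgrading to 2.2.

First, rename the existing spark directory: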


sudo mv spark spark1.6

Extract the Spark 2.2 package:


sudo tar zxvf spark-2.2.0-bin-hadoop2.7.tgz -C /data

Checking the permissions of the extracted directory shows 500.

To avoid problems, change the ownership and permissions:


sudo chown -R aboutyun:aboutyun spark-2.2.0-bin-hadoop2.7/


sudo chmod -R 777 spark-2.2.0-bin-hadoop2.7/

Rename the directory:


sudo mv spark-2.2.0-bin-hadoop2.7/ spark


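Copy spark-env.sh, slaves, and spark-defaults.conf from the conf directory of spark1.6 into the conf directory of spark.

If these three files were already complete, they need no modification.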
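
The copy step can be sketched as a small function. This is an illustration under assumptions: the function name copy_spark_conf is hypothetical, and the paths in the comments assume the /data/spark1.6 and /data/spark layout used in this post.

```shell
# Sketch: copy the three standalone config files from the old installation's
# conf directory to the new one. Files missing from the old conf are skipped.
copy_spark_conf() {
    old_conf="$1"   # e.g. /data/spark1.6/conf
    new_conf="$2"   # e.g. /data/spark/conf
    for f in spark-env.sh slaves spark-defaults.conf; do
        if [ -f "$old_conf/$f" ]; then
            cp -p "$old_conf/$f" "$new_conf/$f"
            echo "copied $f"
        else
            echo "skipped $f (not present in $old_conf)"
        fi
    done
}

# Usage:
#   copy_spark_conf /data/spark1.6/conf /data/spark/conf
```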

slaves

If the machines have not changed, this file needs no modification.

spark-env.sh

JAVA_HOME=/data/jdk1.8

SCALA_HOME=/data/scala2

SPARK_MASTER_HOST=192.168.1.10

HADOOP_CONF_DIR=/data/hadoop/etc/hadoop

SPARK_LOCAL_DIRS=/data/spark_data

SPARK_WORKER_DIR=/data/spark_data/spark_works


Note: Spark 1.x uses SPARK_MASTER_IP, whereas Spark 2 uses SPARK_MASTER_HOST.
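
If the copied spark-env.sh still sets SPARK_MASTER_IP, the rename can be applied mechanically. The sketch below is one possible way to do it, with a hypothetical function name (migrate_master_var); it keeps a .bak copy and leaves commented-out lines untouched.

```shell
# Sketch: rewrite SPARK_MASTER_IP to SPARK_MASTER_HOST in a spark-env.sh file.
# The original is kept as <file>.bak. Output goes through the backup copy
# rather than "sed -i", whose syntax differs between platforms.
migrate_master_var() {
    env_file="$1"
    cp -p "$env_file" "$env_file.bak" || return 1
    sed -e 's/^\([[:space:]]*\)SPARK_MASTER_IP=/\1SPARK_MASTER_HOST=/' \
        -e 's/^\([[:space:]]*export[[:space:]][[:space:]]*\)SPARK_MASTER_IP=/\1SPARK_MASTER_HOST=/' \
        "$env_file.bak" > "$env_file"
}

# Usage:
#   migrate_master_var /data/spark/conf/spark-env.sh
```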


spark-defaults.conf

spark.master                        spark://master:7077

spark.eventLog.enabled                true

spark.eventLog.dir                file:///data/spark_data/history/event-log

spark.serializer                org.apache.spark.serializer.KryoSerializer

spark.history.fs.logDirectory        file:///data/spark_data/history/spark-events


None of the above needs to be changed, although you can of course adjust it as required.


Update the environment variables

~/.bashrc


export HADOOP_HOME=/data/hadoop
export SPARK_HOME=/data/spark
export ZOOKEEPER_HOME=/data/zookeeper-3.4.6
export KAFKA_HOME=/data/kafka_2.11
export HIVE_HOME=/data/hive-1.2.1
export PATH=$HIVE_HOME/bin:$KAFKA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
export FLUME_HOME=/data/flume-1.6.0
export PATH=$FLUME_HOME/bin:$PATH
source ~/.bashrc

This step is important; otherwise the old version may still be the one that runs.


Because our directory is still named spark, nothing above needs to be changed.
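
To confirm that the shell really picks up the new installation after sourcing ~/.bashrc, one option is to compare what PATH resolves to with what SPARK_HOME says it should be. This is a sketch with a hypothetical function name (check_spark_on_path); it assumes SPARK_HOME is exported as shown above.

```shell
# Sketch: succeed only if the spark-submit found on PATH is the one under
# SPARK_HOME/bin; otherwise report what was found instead.
check_spark_on_path() {
    found=$(command -v spark-submit 2>/dev/null) || found=""
    expected="${SPARK_HOME:-}/bin/spark-submit"
    if [ "$found" = "$expected" ]; then
        echo "ok: $found"
    else
        echo "mismatch: PATH resolves to '${found:-nothing}', expected '$expected'" >&2
        return 1
    fi
}

# Usage after "source ~/.bashrc":
#   check_spark_on_path && spark-submit --version
```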


Next, copy the directory to the other nodes:


scp -r spark aboutyun@slave1:/data



scp -r spark aboutyun@slave2:/data


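When copying remotely, remember to delete the hadoop folder on slave1 and slave2 first; otherwise the Hadoop 2.7.4 and Hadoop 2.6.5 packages end up mixed together.

Note:

Normally we cannot copy directly into a directory outside the home directory, so the data directory needs its permissions set to 777 before the remote copy can succeed.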
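
With more than a couple of workers, the scp commands can be generated from the slaves file instead of typed by hand. The sketch below only prints the commands and executes nothing; the function name print_scp_commands is hypothetical, and it assumes the slaves file lists one host per line with optional "#" comments.

```shell
# Sketch: read host names from a slaves file (one per line; "#" comments and
# blank lines are ignored) and print the scp command for each host.
print_scp_commands() {
    slaves_file="$1"; src="$2"; user="$3"; dest="$4"
    sed -e 's/#.*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$slaves_file" |
    while read -r host || [ -n "$host" ]; do
        echo "scp -r $src $user@$host:$dest"
    done
}

# Usage (review the output, then run it):
#   print_scp_commands /data/spark/conf/slaves spark aboutyun /data
```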

Next, start Spark. Change into Spark's sbin directory:


./start-all.sh


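When upgrading Spark, note that if you use Hadoop, the Spark build must match your Hadoop version; otherwise errors may occur. The same care applies to Scala: the supported Scala version is 2.11.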
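
The Hadoop line of a prebuilt Spark package is encoded in its name (for example spark-2.2.0-bin-hadoop2.7), so a quick sanity check is possible before unpacking. The following sketch uses two hypothetical helpers; it compares only major.minor, which is an assumption about what counts as a match.

```shell
# Print the major.minor Hadoop version encoded in a prebuilt Spark package
# name such as "spark-2.2.0-bin-hadoop2.7" (prints nothing if absent).
hadoop_version_of_package() {
    echo "$1" | sed -n 's/.*-bin-hadoop\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed if a full Hadoop version (e.g. "2.7.4") has the given major.minor.
hadoop_matches() {
    case "$2" in
        "$1"|"$1".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (the first line of "hadoop version" reads "Hadoop <version>"):
#   installed=$(hadoop version | sed -n '1s/^Hadoop //p')
#   built_for=$(hadoop_version_of_package spark-2.2.0-bin-hadoop2.7.tgz)
#   hadoop_matches "$built_for" "$installed" || echo "Hadoop version mismatch"
```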


#########################


Upgrading on Cloudera

Compared with upgrading native Spark, upgrading on Cloudera is fairly simple: in Cloudera, Spark 1.6 and Spark 2 can coexist, so you just install Spark 2.

#########################

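Side effects of upgrading Spark


If Spark is already in production use, upgrade with caution, otherwise unexpected things may happen. The following is for reference only.

Calculation accuracy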

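SELECT '0.1' = 0 returns true! In Spark 2.2, 0.1 is converted to int, so if all your data is stored as text, numeric calculations are very likely to be incorrect. Earlier versions converted 0.1 to double, which is the right handling in the vast majority of scenarios. So far the community has not dealt with this well. I submitted a PR to the community for this issue; if you want to fix it yourself, you can merge it manually: https://github.com/apache/spark/pull/18986

Overly complex SQL statements may hit the 64KB bytecode compilation limit. This is an old problem that has existed more or less since Spark adopted Tungsten, and it is ultimately a JVM restriction. If you run into it, search the related PRs: https://github.com/apache/spark/search?utf8=%E2%9C%93&q=64KB&type=Issues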

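There is a numeric precision problem: SELECT 1 > 0.0001 raises an error. This has been fixed in 2.1.2 and 2.2.0: https://issues.apache.org/jira/browse/SPARK-20211

In 2.1.0, an INNER JOIN involving constants could compute an incorrect result; fixed in later versions: https://issues.apache.org/jira/browse/SPARK-19766

In 2.1.0, executing GROUPING SETS(col) threw a NullPointerException when the col column contained null; fixed in later versions: https://issues.apache.org/jira/browse/SPARK-19509

In 2.1.0, nested CASE WHEN statements could fail; fixed in later versions: https://issues.apache.org/jira/browse/SPARK-19472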



Behavior changes

These are less critical issues that can be made compatible by changing code or configuration.

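The UDAF implementation changed in Spark 2.2. If your Hive UDAF does not strictly follow the standard, it may fail or produce incorrect data. It is recommended to migrate the logic to Spark AF, which also gives better performance.

Starting with Spark 2.1, a full read of a partitioned table uses FilePartition, where a single partition can read multiple files. If the files are compressed, this can degrade query performance. You can lower spark.sql.files.maxPartitionBytes accordingly; the default is 128MB (for most compressed Parquet tables, this default actually causes performance problems).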

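Spark 2.x restricts operations on the spark.sql.* properties of Hive tables. A property that clearly exists cannot be retrieved with SHOW TBLPROPERTIES tb("spark.sql.sources.schema.numParts"), and likewise ALTER TABLE tb SET TBLPROPERTIES ('spark.sql.test' = 'test') cannot be used to modify one.

The properties of an external table cannot be modified: ALTER TABLE tb SET TBLPROPERTIES ('test' = 'test') fails, assuming tb is an EXTERNAL table.

DROP VIEW IF EXISTS tb raises AnalysisException: Cannot drop a table with DROP VIEW when tb is a TABLE rather than a VIEW. Versions before 2.x did not raise an error. Since IF EXISTS is specified, the error is clearly unreasonable, and the caller needs to handle the exception.

If the table you access does not exist, the exception message in Spark 2.x changed from Table not found to Table or view not found. If your code depends on this message, adjust it.

The output format of EXPLAIN changed: in 1.6 it is multi-line text, in 2.x it is a single line, and the content differs slightly as well. Compared with Spark 1.6, the Tungsten keyword is gone, and overly long HDFS paths in EXPLAIN output are abbreviated to ... in Spark 2.x.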


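Cartesian products are not supported by default in 2.x; enable them with the spark.sql.crossJoin.enabled parameter.

The GROUPING__ID function commonly used in OLAP analysis became GROUPING_ID() in 2.x.

If you have a Hive-based UDF named abc that takes 3 parameters, and then implement a Spark-based UDF abc with 2 parameters, in 2.x the 2-parameter abc overrides Hive's 3-parameter abc. 1.6 does not have this problem.

Statements like SELECT 1 FROM tb GROUP BY 1 raise an error; you need to set spark.sql.groupByOrdinal to false. A similar parameter is spark.sql.orderByOrdinal, also set to false.

The default path for CREATE DATABASE changed: it is no longer read from hive.metastore.warehouse.dir in hive-site.xml. Use Spark's spark.sql.warehouse.dir setting to specify the default storage path for databases.

Casting a nonexistent date returns null. For example, year('2015-03-40') returns null in 2.x, whereas 1.6 returned 2015.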

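Spark 2.x does not allow temporary functions (temp function) in a VIEW: https://issues.apache.org/jira/browse/SPARK-18209

Since Spark 2.1, the window function ROW_NUMBER() requires an ORDER BY inside OVER; the earlier form ROW_NUMBER() OVER() now raises an error.

Since Spark 2.1, SIZE(null) returns -1; earlier versions returned null.

The default compression codec for Parquet files changed from gzip to snappy. According to the official explanation, snappy gives better query performance; verify the performance change yourself.

The output of DESC FORMATTED tb changed: the 1.6 format is close to Hive's, while 2.x displays two columns.

Exception messages changed. For an undefined function, Spark 2.x reports org.apache.spark.sql.AnalysisException: Undefined function: 'xxx'. whereas Spark 1.6 reports AnalysisException: undefined function xxx. For a wrong number of arguments, Spark 2.x reports Invalid number of arguments whereas Spark 1.6 reports No handler for Hive udf class org.apache.hadoop.hive.ql.udf.generic.GenericUDAFXXX because: Exactly one argument is expected..

The Web UI of Spark Standalone no longer provides the /api/v1/applications API: https://issues.apache.org/jira/browse/SPARK-12299 and https://issues.apache.org/jira/browse/SPARK-18683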
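
Several of the behavior changes above are controlled by properties (spark.sql.crossJoin.enabled, spark.sql.groupByOrdinal, spark.sql.orderByOrdinal, spark.sql.warehouse.dir, spark.sql.files.maxPartitionBytes). One way to record them is in spark-defaults.conf. The sketch below uses a hypothetical helper, set_spark_conf, that replaces or appends a whitespace-separated "key value" line; it assumes the file does not use the key=value form, and the values in the usage comments are examples to adapt, not recommendations.

```shell
# Sketch: set "key value" in a spark-defaults.conf-style file, replacing an
# existing entry for the same key or appending a new line.
set_spark_conf() {
    conf_file="$1"; key="$2"; value="$3"
    tmp_file="$conf_file.tmp.$$"
    touch "$conf_file"
    # Keep every line whose first field is not the key, then append the entry.
    awk -v k="$key" '$1 != k' "$conf_file" > "$tmp_file"
    printf '%s %s\n' "$key" "$value" >> "$tmp_file"
    mv "$tmp_file" "$conf_file"
}

# Usage (the warehouse path is a placeholder):
#   conf="$SPARK_HOME/conf/spark-defaults.conf"
#   set_spark_conf "$conf" spark.sql.crossJoin.enabled true
#   set_spark_conf "$conf" spark.sql.groupByOrdinal false
#   set_spark_conf "$conf" spark.sql.orderByOrdinal false
#   set_spark_conf "$conf" spark.sql.warehouse.dir /path/to/warehouse
```

Because an existing entry is dropped before the new one is appended, running the same command twice leaves a single line for that key.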


Content excerpted from:

http://www.jianshu.com/p/482407c88d27


###########################################

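Summary of problems encountered during the Spark upgrade


After upgrading Spark, you may run into some strange problems:

1. Multiple master processes are running

2. A port is occupied for no apparent reason

3. All processes look normal, but the master cannot be reached


Starting spark-shell reports the following error: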


To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/11/17 11:30:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/17 11:30:14 WARN client.StandaloneAppClient$ClientEndpoint: Failed to connect to master master:7077
org.apache.spark.SparkException: Exception thrown in awaitResult: 
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
        at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.StreamCorruptedException: invalid stream header: 01000C31
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:64)
        at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:64)
        at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:123)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
        at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:258)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:310)
        at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:257)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:256)
        at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:588)
        at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570)
        at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
        at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:745)
        at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:207)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:120)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        ... 1 more
17/11/17 11:30:33 WARN client.StandaloneAppClient$ClientEndpoint: Failed to connect to master master:7077
org.apache.spark.SparkException: Exception thrown in awaitResult: 
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
        at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.StreamCorruptedException: invalid stream header: 01000C31
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:64)
        at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:64)
        at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:123)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
        at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:258)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:310)
        at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:257)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:256)
        at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:588)
        at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570)
        at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
        at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:745)
        at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:207)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:120)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        ... 1 more
17/11/17 11:30:53 WARN client.StandaloneAppClient$ClientEndpoint: Failed to connect to master master:7077
org.apache.spark.SparkException: Exception thrown in awaitResult: 
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
        at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.StreamCorruptedException: invalid stream header: 01000C31
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:64)
        at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:64)
        at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:123)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
        at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:258)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:310)
        at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:257)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:256)
        at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:588)
        at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570)
        at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
        at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:745)
        at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:207)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:120)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        ... 1 more
17/11/17 11:31:13 ERROR cluster.StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
17/11/17 11:31:13 WARN cluster.StandaloneSchedulerBackend: Application ID is not initialized yet.

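This is evidently a port problem (the invalid stream header error is consistent with something other than the new master, such as a leftover process, answering on that port), so check port 7077: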

netstat -anp | grep 7077

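The port turned out to be occupied, so I killed the process. That still did not help; in the end I restarted the cluster from Spark's sbin directory: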

./stop-all.sh
./start-all.sh

That resolved the problem.
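
A plain grep for 7077 also matches established connections and any other port that merely contains those digits. As a sketch of a narrower check, the hypothetical filter below reads netstat -anp output on stdin and prints only the PIDs of processes listening on the given port. It assumes the Linux net-tools column layout (state in column 6, PID/program in column 7); the PID shows as "-" when netstat lacks the privileges to see it.

```shell
# Sketch: print the PIDs of processes in LISTEN state on the given port,
# reading "netstat -anp" output from stdin.
listeners_on_port() {
    port="$1"
    awk -v p="$port" '
        $6 == "LISTEN" && $4 ~ (":" p "$") {
            split($7, a, "/")   # column 7 looks like "12345/java"
            print a[1]
        }'
}

# Usage:
#   netstat -anp 2>/dev/null | listeners_on_port 7077
```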

