Notes on a CDH 5.10 problem: clients.NetworkClient: Bootstrap broker ip:9092 disconnected


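1. Stable version combination currently used in this environment
a. This CDH environment has gone through four upgrades; the current version is CDH-5.10.0-1.cdh5.10.0.p0.41
b. The KAFKA version is KAFKA-2.1.0-1.2.1.0.p0.115
c. The SPARK2 version is SPARK2-2.0.0.cloudera1-1.cdh5.7.0.p0.113931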




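2. Spark2 installation troubleshooting
On the Hosts --> Parcels page you will see that Spark2 can be upgraded to the 2.0.0.cloudera2 release, i.e. 2.0.0.cloudera2-1.cdh5.7.0.p0.118100.
However, when we installed it, the Spark history server of that version failed to start. The stdout and stderr logs of the startup shell script showed:
The CSD version (2.0.0.cloudera1) is not compatible with the current Spark 2 version (2.0.0.cloudera2)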

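Looking further: the current CSD_VERSION is 2.0.0.cloudera1, and after upgrading to the latest parcel SPARK2_VERSION becomes 2.0.0.cloudera2, so the service cannot start.
I tried changing 2.0.0.cloudera2 to 2.0.0.cloudera1 in the metadata database table, but the Spark2 parcel immediately showed as unavailable in the web UI. At that point I was genuinely impressed by Cloudera!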

In the end I chose the parcel whose version matches CSD_VERSION: SPARK2-2.0.0.cloudera1-1.cdh5.7.0.p0.113931
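The version comparison described above can be mimicked before activating a parcel. The following is a minimal sketch, not Cloudera's actual startup script: the function names `spark2_release_of_parcel` and `csd_matches_parcel` are made up for illustration, and the sketch assumes parcel names follow the `SPARK2-<release>-<build>` pattern seen in this post.

```shell
# Extract the Spark 2 release tag (e.g. "2.0.0.cloudera1") from a parcel name
# such as "SPARK2-2.0.0.cloudera1-1.cdh5.7.0.p0.113931".
# (Hypothetical helper; the naming pattern is an assumption.)
spark2_release_of_parcel() {
  local parcel="$1"
  local rest="${parcel#SPARK2-}"   # drop the "SPARK2-" prefix if present
  echo "${rest%%-*}"               # keep everything before the next "-"
}

# Succeed (exit status 0) only when the parcel release equals the CSD version.
csd_matches_parcel() {
  local csd_version="$1" parcel="$2"
  [ "$(spark2_release_of_parcel "$parcel")" = "$csd_version" ]
}
```

For example, `csd_matches_parcel 2.0.0.cloudera1 SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100` returns a non-zero status, which corresponds to the incompatible combination from the log above.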


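3. I submitted the jar to YARN with spark2-submit. The Spark job reads data from Kafka in real time, but the job log showed the error from the title: clients.NetworkClient: Bootstrap broker ip:9092 disconnected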



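4. Analyzing the error: replace every version referenced in the program's pom file with the current CDH, Kafka and Spark2 versions, then rebuild the jar
(in fact, if you build a thin jar, i.e. one without bundled dependencies, the pom can also reference the Apache Maven artifacts);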

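I then suspected that the Kafka jars shipped with Spark2 on the cluster did not match the CDH Kafka version,
so I backed up the old jars and copied the current Kafka jars into the Spark2 jars folder (this is the key change).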
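To confirm such a suspicion before changing anything, the kafka-clients versions in the two directories can be compared. This is a rough sketch under the assumption that the jars are named `kafka-clients-<version>.jar`, as they are in this environment; the function names are invented for illustration.

```shell
# Print the version part of every kafka-clients jar in a directory,
# e.g. "0.9.0-kafka-2.0.0" for kafka-clients-0.9.0-kafka-2.0.0.jar.
# (Hypothetical helper; assumes the kafka-clients-<version>.jar naming.)
kafka_clients_versions() {
  local dir="$1" jar name
  for jar in "$dir"/kafka-clients-*.jar; do
    [ -e "$jar" ] || continue        # the glob matched nothing
    name="${jar##*/}"                # strip the directory part
    name="${name#kafka-clients-}"    # strip the artifact prefix
    echo "${name%.jar}"              # strip the extension
  done
}

# Succeed only when both directories contain kafka-clients jars
# and the version lists are identical.
same_kafka_clients() {
  local a b
  a="$(kafka_clients_versions "$1")"
  b="$(kafka_clients_versions "$2")"
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

On this cluster the call would be `same_kafka_clients /opt/cloudera/parcels/SPARK2/lib/spark2/jars /opt/cloudera/parcels/KAFKA/lib/kafka/libs`; a non-zero exit status indicates the mismatch described above.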


4.1 pom file
Attachment: pom.rar

4.2 Perform the following on every host in the cluster


  [root@sh-hadoop-01 ~]# cd /opt/cloudera/parcels/SPARK2/lib/spark2/jars/
  [root@sh-hadoop-01 jars]# ll
  ...............
  -rw-rw-r-- 1 root root 5001608 Dec 7 02:54 kafka_2.11-0.9.0-kafka-2.0.0.jar
  -rw-rw-r-- 1 root root 649382 Dec 7 02:54 kafka-clients-0.9.0-kafka-2.0.0.jar
  ..............
  [root@sh-hadoop-01 jars]# mv kafka_2.11-0.9.0-kafka-2.0.0.jar kafka_2.11-0.9.0-kafka-2.0.0.jar.bak
  [root@sh-hadoop-01 jars]# mv kafka-clients-0.9.0-kafka-2.0.0.jar kafka-clients-0.9.0-kafka-2.0.0.jar.bak
  [root@sh-hadoop-01 jars]# cd /opt/cloudera/parcels/KAFKA/lib/kafka/libs
  [root@sh-hadoop-01 libs]# cp /opt/cloudera/parcels/KAFKA/lib/kafka/libs/kafka_2.11-0.10.0-kafka-2.1.0.jar /opt/cloudera/parcels/SPARK2/lib/spark2/jars/
  [root@sh-hadoop-01 libs]# cp /opt/cloudera/parcels/KAFKA/lib/kafka/libs/kafka-clients-0.10.0-kafka-2.1.0.jar /opt/cloudera/parcels/SPARK2/lib/spark2/jars/
  [root@sh-hadoop-01 libs]# ll /opt/cloudera/parcels/SPARK2/lib/spark2/jars/
  ...............
  -rwxr-xr-x 1 root root 5156768 Mar 9 23:48 kafka_2.11-0.10.0-kafka-2.1.0.jar
  -rw-rw-r-- 1 root root 5001608 Dec 7 02:54 kafka_2.11-0.9.0-kafka-2.0.0.jar.bak
  -rwxr-xr-x 1 root root 747732 Mar 9 23:48 kafka-clients-0.10.0-kafka-2.1.0.jar
  -rw-rw-r-- 1 root root 649382 Dec 7 02:54 kafka-clients-0.9.0-kafka-2.0.0.jar.bak
  ...............
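Since the same steps have to be repeated on every host, they can be wrapped in a small function. This is only a sketch of the manual procedure above, not a tested cluster tool: `swap_kafka_jars` is a made-up name, and it assumes the old jars match `kafka_*.jar` and `kafka-clients-*.jar` as in the listing. It checks that every replacement jar exists before renaming anything, so a typo in a path does not leave Spark2 without Kafka jars.

```shell
# Back up the Kafka jars in a Spark 2 jars directory and copy in new ones.
# Usage: swap_kafka_jars SPARK_JARS_DIR NEW_JAR [NEW_JAR...]
# (Hypothetical helper mirroring the manual mv/cp steps.)
swap_kafka_jars() {
  [ "$#" -ge 2 ] || { echo "usage: swap_kafka_jars DIR JAR..." >&2; return 1; }
  local spark_jars="$1" jar
  shift
  # Refuse to touch anything unless every replacement jar exists.
  for jar in "$@"; do
    [ -f "$jar" ] || { echo "missing: $jar" >&2; return 1; }
  done
  # Rename the old jars; the .bak suffix keeps them as a fallback.
  for jar in "$spark_jars"/kafka_*.jar "$spark_jars"/kafka-clients-*.jar; do
    [ -e "$jar" ] || continue        # the glob matched nothing
    mv "$jar" "$jar.bak"
  done
  # Copy the jars that match the running Kafka version.
  for jar in "$@"; do
    cp "$jar" "$spark_jars"/ || return 1
  done
}
```

With the paths from this post, the call on each host would pass `/opt/cloudera/parcels/SPARK2/lib/spark2/jars` as the first argument, followed by the two jars under `/opt/cloudera/parcels/KAFKA/lib/kafka/libs`.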


5. I fixed the problem in the early hours of the morning and resubmitted the jar; it has been running stably for 10 h so far.
