
Kafka cluster suddenly went down. Can anyone help figure out what the problem is? Error logs below.

Kafka cluster, version 0.8.2.0, running on three hosts: hetserver1, hetserver2, hetserver3.

hetserver1 started reporting errors at 21:39 on the 17th:

2020-03-17 21:39:00,040 ERROR kafka.network.Processor: Closing socket for /172.19.4.12 because of error
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at kafka.api.TopicDataSend.writeTo(FetchResponse.scala:123)
    at kafka.network.MultiSend.writeTo(Transmission.scala:101)
    at kafka.api.FetchResponseSend.writeTo(FetchResponse.scala:231)
    at kafka.network.Processor.write(SocketServer.scala:473)
    at kafka.network.Processor.run(SocketServer.scala:343)
    at java.lang.Thread.run(Thread.java:745)
2020-03-17 21:39:00,071 and 21:39:00,073: the same "Closing socket for /172.19.4.12 because of error" / java.io.IOException: Broken pipe stack trace from kafka.network.Processor repeats two more times.

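The Broken pipe above is thrown while the broker is writing a fetch response (FetchResponseSend.writeTo), which usually just means the remote side at 172.19.4.12 had already closed or reset the connection before the write finished. A minimal, purely illustrative Java sketch of that failure mode (not Kafka code; every name here is made up):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative only: reproduces the generic "peer went away mid-write" failure.
// Depending on OS and timing the message may be "Broken pipe" or
// "Connection reset by peer".
public class BrokenPipeDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            // The "fetcher" side: connect, then drop the connection immediately,
            // the way a consumer or follower does when it times out or restarts.
            Socket fetcher = new Socket("127.0.0.1", server.getLocalPort());
            Socket broker = server.accept();
            fetcher.close();
            Thread.sleep(200);               // let the FIN/RST reach the other side

            // The "broker" side keeps streaming the response anyway.
            byte[] chunk = new byte[64 * 1024];
            try {
                for (int i = 0; i < 1_000; i++) {
                    broker.getOutputStream().write(chunk);
                }
            } catch (IOException e) {
                System.out.println("write failed: " + e);   // e.g. Broken pipe
            } finally {
                broker.close();
            }
        }
    }
}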
hetserver2 also reported errors:

2020-03-17 21:39:00,193 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver1/172.19.4.12:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,194 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver1/172.19.4.12:2181, initiating session
2020-03-17 21:39:00,194 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Expired)
2020-03-17 21:39:00,195 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0xa70e4797252000e has expired, closing socket connection
2020-03-17 21:39:00,195 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hetserver1:2181,hetserver2:2181,hetserver3:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@5e39570d
2020-03-17 21:39:00,196 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2020-03-17 21:39:00,196 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver3/172.19.4.14:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,197 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver3/172.19.4.14:2181, initiating session
2020-03-17 21:39:00,198 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hetserver3/172.19.4.14:2181, sessionid = 0x870e479725d0517, negotiated timeout = 6000
2020-03-17 21:39:00,198 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected)
2020-03-17 21:39:00,297 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 41]: Broker change listener fired for path /brokers/ids with children 42
2020-03-17 21:39:00,301 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 41]: Newly added brokers: , deleted brokers: 41,40, all live brokers: 42
2020-03-17 21:39:00,301 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-41-send-thread], Shutting down
2020-03-17 21:39:00,301 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-41-send-thread], Stopped
2020-03-17 21:39:00,301 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-41-send-thread], Shutdown completed
2020-03-17 21:39:00,302 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-40-send-thread], Shutting down
2020-03-17 21:39:00,302 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-40-send-thread], Stopped
2020-03-17 21:39:00,302 INFO kafka.controller.RequestSendThread: [Controller-41-to-broker-40-send-thread], Shutdown completed
2020-03-17 21:39:00,304 INFO kafka.controller.KafkaController: [Controller 41]: Broker failure callback for 41,40
2020-03-17 21:39:00,306 INFO kafka.controller.KafkaController: [Controller 41]: Removed ArrayBuffer() from list of shutting down brokers.
2020-03-17 21:39:00,308 INFO kafka.controller.PartitionStateMachine: [Partition state machine on Controller 41]: Invoking state change to OfflinePartition for partitions [__consumer_offsets,19],[__consumer_offsets,47],[__consumer_offsets,41],[__consumer_offsets,29],[session-location,0],[__consumer_offsets,17],[__consumer_offsets,10],[hetASUPfldTopic,0],[__consumer_offsets,14],[__consumer_offsets,40],[hetACDMTopic,0],[__consumer_offsets,26],[__consumer_offsets,20],[__consumer_offsets,22],[__consumer_offsets,5],[push-result-error,0],[__consumer_offsets,8],[__consumer_offsets,23],[__consumer_offsets,11],[hetAsupMsgTopic,0],[__consumer_offsets,13],[__consumer_offsets,49],[__consumer_offsets,28],[__consumer_offsets,4],[__consumer_offsets,37],[__consumer_offsets,44],[__consumer_offsets,31],[__consumer_offsets,34],[__consumer_offsets,46],[btTaskTopic,0],[__consumer_offsets,25],[__consumer_offsets,43],[__consumer_offsets,32],[__consumer_offsets,35],[__consumer_offsets,7],[__consumer_offsets,38],[__consumer_offsets,1],[HetPetaTopic,0],[__consumer_offsets,2],[__consumer_offsets,16]
2020-03-17 21:39:00,314 ERROR state.change.logger: Controller 41 epoch 57126 aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127.
2020-03-17 21:39:00,314 ERROR state.change.logger: Controller 41 epoch 57126 encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127..
2020-03-17 21:39:00,314 ERROR state.change.logger: Controller 41 epoch 57126 initiated state change for partition [__consumer_offsets,19] from OfflinePartition to OnlinePartition failed
kafka.common.StateChangeFailedException: encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127..
    at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:380)
    at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
    at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:453)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
    at kafka.utils.Utils$.inLock(Utils.scala:561)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
    at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
    at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
Caused by: kafka.common.StateChangeFailedException: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 41 went through a soft failure and another controller was elected with epoch 57127.
    at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:354)
    ... 23 more

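What hetserver2 logs here is a ZooKeeper session expiration: its session (negotiated timeout 6000 ms) expired and was re-established, and the controller then saw brokers 41 and 40 drop out of /brokers/ids ("deleted brokers: 41,40"). Brokers register under /brokers/ids as ephemeral znodes, so if a broker cannot heartbeat within the session timeout (long GC pause, disk stall, network hiccup), ZooKeeper expires the session and deletes its registration. A hedged sketch of that ephemeral-registration mechanism with the plain ZooKeeper Java client (the connect string, path and payload below are placeholders, not Kafka's real registration format):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch of the ephemeral-registration mechanism behind /brokers/ids.
// Assumes a ZooKeeper server reachable at 127.0.0.1:2181.
public class EphemeralRegistrationSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 6000,
                event -> System.out.println("zk event: " + event));
        String path = "/demo-broker-42";
        try {
            // EPHEMERAL: the node lives only as long as this session does.
            zk.create(path, "alive".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("registered " + path);

            // If this JVM stalls longer than the 6000 ms session timeout
            // (GC pause, frozen disk, partition), ZooKeeper expires the session
            // and deletes the node; in Kafka, the controller's watch on
            // /brokers/ids is what then fires as "deleted brokers: 41,40".
            Thread.sleep(10_000);
        } catch (KeeperException e) {
            System.out.println("registration failed: " + e);
        } finally {
            zk.close();
        }
    }
}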
hetserver3's error messages are as follows:

2020-03-17 21:39:00,660 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver3/172.19.4.14:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,661 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver3/172.19.4.14:2181, initiating session
2020-03-17 21:39:00,661 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Expired)
2020-03-17 21:39:00,661 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x870e47715360000 has expired, closing socket connection
2020-03-17 21:39:00,661 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hetserver1:2181,hetserver2:2181,hetserver3:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@68246cf
2020-03-17 21:39:00,663 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hetserver2/172.19.4.13:2181. Will not attempt to authenticate using SASL (unknown error)
2020-03-17 21:39:00,664 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hetserver2/172.19.4.13:2181, initiating session
2020-03-17 21:39:00,666 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hetserver2/172.19.4.13:2181, sessionid = 0xa70e47972520504, negotiated timeout = 6000
2020-03-17 21:39:00,666 INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected)
2020-03-17 21:39:00,666 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2020-03-17 21:39:00,763 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 40]: Broker change listener fired for path /brokers/ids with children 42
2020-03-17 21:39:00,764 INFO kafka.controller.ReplicaStateMachine$BrokerChangeListener: [BrokerChangeListener on Controller 40]: Newly added brokers: , deleted brokers: 41,40, all live brokers: 42
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-41-send-thread], Shutting down
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-41-send-thread], Stopped
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-41-send-thread], Shutdown completed
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-40-send-thread], Shutting down
2020-03-17 21:39:00,764 INFO kafka.network.Processor: Closing socket connection to /172.19.4.14.
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-40-send-thread], Stopped
2020-03-17 21:39:00,764 INFO kafka.controller.RequestSendThread: [Controller-40-to-broker-40-send-thread], Shutdown completed
2020-03-17 21:39:00,764 INFO kafka.controller.KafkaController: [Controller 40]: Broker failure callback for 41,40
2020-03-17 21:39:00,765 INFO kafka.controller.KafkaController: [Controller 40]: Removed ArrayBuffer() from list of shutting down brokers.
2020-03-17 21:39:00,765 INFO kafka.controller.PartitionStateMachine: [Partition state machine on Controller 40]: Invoking state change to OfflinePartition for partitions [__consumer_offsets,19],[__consumer_offsets,30],[__consumer_offsets,47],[__consumer_offsets,29],[__consumer_offsets,41],[session-location,0],[HetPetaAddTopic,0],[__consumer_offsets,39],[hetASUPfldTopic,0],[__consumer_offsets,10],[__consumer_offsets,17],[hetFltMsgTopic,0],[__consumer_offsets,14],[__consumer_offsets,40],[hetACDMTopic,0],[__consumer_offsets,18],[__consumer_offsets,0],[__consumer_offsets,26],[__consumer_offsets,24],[__consumer_offsets,33],[__consumer_offsets,20],[__consumer_offsets,21],[__consumer_offsets,3],[__consumer_offsets,5],[__consumer_offsets,22],[hetVideoTopic,0],[__consumer_offsets,12],[push-result-error,0],[__consumer_offsets,8],[__consumer_offsets,23],[__consumer_offsets,15],[__consumer_offsets,11],[hetAsupMsgTopic,0],[__consumer_offsets,48],[__consumer_offsets,13],[__consumer_offsets,49],[__consumer_offsets,6],[__consumer_offsets,28],[__consumer_offsets,4],[__consumer_offsets,37],[__consumer_offsets,31],[push-result,0],[__consumer_offsets,44],[hetTaskTopic,0],[__consumer_offsets,42],[__consumer_offsets,34],[__consumer_offsets,46],[btTaskTopic,0],[__consumer_offsets,25],[__consumer_offsets,27],[__consumer_offsets,45],[__consumer_offsets,32],[__consumer_offsets,43],[__consumer_offsets,36],[__consumer_offsets,35],[__consumer_offsets,7],[__consumer_offsets,38],[__consumer_offsets,9],[__consumer_offsets,1],[HetPetaTopic,0],[__consumer_offsets,2],[__consumer_offsets,16]
2020-03-17 21:39:00,767 ERROR state.change.logger: Controller 40 epoch 57125 aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127.
2020-03-17 21:39:00,767 ERROR state.change.logger: Controller 40 epoch 57125 encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127..
2020-03-17 21:39:00,767 ERROR state.change.logger: Controller 40 epoch 57125 initiated state change for partition [__consumer_offsets,19] from OfflinePartition to OnlinePartition failed
kafka.common.StateChangeFailedException: encountered error while electing leader for partition [__consumer_offsets,19] due to: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127..
    at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:380)
    at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:206)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
    at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
    at kafka.controller.KafkaController.onBrokerFailure(KafkaController.scala:453)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:373)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
    at kafka.utils.Utils$.inLock(Utils.scala:561)
    at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
    at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
    at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
Caused by: kafka.common.StateChangeFailedException: aborted leader election for partition [__consumer_offsets,19] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 40 went through a soft failure and another controller was elected with epoch 57127.
    at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:354)
    ... 23 more
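Both stale controllers (41 at epoch 57126 and 40 at epoch 57125) fail the same way: after the session expirations a newer controller with epoch 57127 had already been elected, so their attempts to rewrite the partition LeaderAndIsr state were rejected. Kafka fences stale controllers by comparing the controller epoch stored in ZooKeeper and by using version-checked (conditional) znode updates. Below is a minimal sketch of the version-checked-write flavor of that fencing, not Kafka's exact code path; the demo path and payloads are made up:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Sketch of the version-checked write that makes a stale writer's update fail
// once a newer writer has already touched the same znode.
// Assumes a ZooKeeper server at 127.0.0.1:2181.
public class ConditionalWriteSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 6000, event -> { });
        String path = "/demo-leaderAndIsr";

        if (zk.exists(path, false) == null) {
            zk.create(path, "initial".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // The "stale controller" reads the current version...
        Stat stat = zk.exists(path, false);
        int versionSeenByStaleController = stat.getVersion();

        // ...but the newly elected controller writes first, bumping the version.
        zk.setData(path, "written by controller epoch 57127".getBytes(),
                versionSeenByStaleController);

        try {
            // The stale controller's conditional write is now rejected.
            zk.setData(path, "written by stale controller".getBytes(),
                    versionSeenByStaleController);
        } catch (KeeperException.BadVersionException e) {
            // Comparable to what Kafka 0.8.x surfaces as "aborted leader
            // election ... LeaderAndIsr path was already written by another
            // controller".
            System.out.println("conditional write rejected: " + e);
        } finally {
            zk.close();
        }
    }
}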

爱吃鱼的程序员 2020-06-05 14:13:59
1 answer
  • https://developer.aliyun.com/profile/5yerqm5bn5yqg?spm=a2c6h.12873639.0.0.6eae304abcjaIB
    Is the disk full?
    
    2020-06-05 14:14:18
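If the disk-full guess is right, checking usable space on each broker's Kafka log directories would confirm it quickly. A trivial illustrative check (the "/data/kafka-logs" path is a placeholder; use the log.dirs value from server.properties):

import java.io.File;

// Quick, illustrative check of free space on the Kafka log directories.
public class LogDirSpaceCheck {
    public static void main(String[] args) {
        String[] logDirs = { "/data/kafka-logs" };   // placeholder path
        for (String dir : logDirs) {
            File f = new File(dir);
            long free  = f.getUsableSpace() / (1024L * 1024 * 1024);
            long total = f.getTotalSpace()  / (1024L * 1024 * 1024);
            System.out.printf("%s: %d GiB free of %d GiB%n", dir, free, total);
        }
    }
}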