
Broker cannot serve requests after its connection to namesrv drops

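One broker in the cluster (brokerA) lost its connection to one of the two namesrv nodes, and this happened only once. After that, the slave paired with brokerA could no longer connect to brokerA, and client applications could no longer send messages to brokerA: org.apache.rocketmq.remoting.exception.RemotingTimeoutException: wait response on the channel <rocketmq-namesrv-prod01.cloud.bz/xx.xx.xx.xx:9876> timeout, 6000(ms)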

Once the problem was noticed, we immediately blocked message writes to brokerA (./mqadmin updateBrokerConfig -b xx.xx.xx.xx:10911 -n xx.x.xx.xx:9876 -k brokerPermission -v 4)
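For context on why `-v 4` blocks writes: as far as I know, brokerPermission is a bit mask, with 4 as the read bit and 2 as the write bit, so 4 leaves the broker readable but not writable and 6 restores read and write. The sketch below only illustrates that decoding. The class name `BrokerPermissionSketch` and its helpers are hypothetical, and the constant values are assumed to mirror RocketMQ's `PermName`; check them against your broker version.

```java
public class BrokerPermissionSketch {
    // Assumption: these bit values mirror org.apache.rocketmq.common.constant.PermName.
    public static final int PERM_WRITE = 1 << 1; // 2
    public static final int PERM_READ = 1 << 2;  // 4

    // True when the read bit is set in the permission mask.
    public static boolean isReadable(int perm) {
        return (perm & PERM_READ) == PERM_READ;
    }

    // True when the write bit is set in the permission mask.
    public static boolean isWritable(int perm) {
        return (perm & PERM_WRITE) == PERM_WRITE;
    }

    // Renders the mask as two characters: 'R' or '-' followed by 'W' or '-'.
    public static String describe(int perm) {
        StringBuilder sb = new StringBuilder();
        sb.append(isReadable(perm) ? 'R' : '-');
        sb.append(isWritable(perm) ? 'W' : '-');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("brokerPermission=4 -> " + describe(4));
        System.out.println("brokerPermission=6 -> " + describe(6));
    }
}
```

Under these assumptions, setting the value back to 6 with the same mqadmin command would re-enable writes once the broker is healthy again.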

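Expected: once the connection between brokerA and the namesrv node recovers, the slave keeps syncing data from the master and clients can keep sending messages to this broker.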

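Actual: after the connection between brokerA and namesrv recovered (within milliseconds), brokerA's slave never re-established a working connection to the master and kept logging (ERROR BrokerControllerScheduledThread1 - SyncDelayOffset Exception). Clients also could not resume sending messages.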

1. Broker cluster version: 4.0.0-incubating
2. JDK: 1.8.0_131-b11
3. RocketMQ client version: 4.4.0

brokerA master log:

2020-09-10 16:42:03 WARN BrokerControllerScheduledThread1 - registerBroker Exception, rocketmq-namesrv-prod01.cloud.bz:9876
org.apache.rocketmq.remoting.exception.RemotingTimeoutException: wait response on the channel <rocketmq-namesrv-prod01.cloud.bz/xx.x.xx.xx:9876> timeout, 6000(ms)
    at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:292) ~[rocketmq-remoting-4.0.0-incubating.jar:4.0.0-incubating]
    at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:338) ~[rocketmq-remoting-4.0.0-incubating.jar:4.0.0-incubating]
    at org.apache.rocketmq.broker.out.BrokerOuterAPI.registerBroker(BrokerOuterAPI.java:167) ~[rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating]
    at org.apache.rocketmq.broker.out.BrokerOuterAPI.registerBrokerAll(BrokerOuterAPI.java:116) ~[rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating]
    at org.apache.rocketmq.broker.BrokerController.registerBrokerAll(BrokerController.java:674) [rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating]
    at org.apache.rocketmq.broker.BrokerController$9.run(BrokerController.java:643) [rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_131]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_131]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_131]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
2020-09-10 16:42:05 INFO BrokerControllerScheduledThread1 - register broker to name server rocketmq-namesrv-prod02.cloud.bz:9876 OK

brokerA slave log:

2020-09-10 16:42:05 INFO ClientManageThread_131 - subscription changed, group: MQ_nike_receive_consumer_group OLD: SubscriptionData [classFilterMode=false, topic=MQ_LSS_PUSH_NIKE_PRO, subString=, tagsSet=[], codeSet=[], subVersion=1599115705119] NEW: SubscriptionData [classFilterMode=false, topic=MQ_LSS_PUSH_NIKE_PRO, subString=, tagsSet=[], codeSet=[], subVersion=1599115711804]
2020-09-10 16:42:08 ERROR BrokerControllerScheduledThread1 - SyncConsumerOffset Exception, 1xx.xx.xx.xx:20911
org.apache.rocketmq.remoting.exception.RemotingTimeoutException: wait response on the channel xx.xx.xx.xx:20911 timeout, 3000(ms)
    at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:292) ~[rocketmq-remoting-4.0.0-incubating.jar:4.0.0-incubating]
    at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:338) ~[rocketmq-remoting-4.0.0-incubating.jar:4.0.0-incubating]
    at org.apache.rocketmq.broker.out.BrokerOuterAPI.getAllConsumerOffset(BrokerOuterAPI.java:254) ~[rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating]
    at org.apache.rocketmq.broker.slave.SlaveSynchronize.syncConsumerOffset(SlaveSynchronize.java:84) [rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating]
    at org.apache.rocketmq.broker.slave.SlaveSynchronize.syncAll(SlaveSynchronize.java:50) [rocketmq-broker-4.0.0-incubating.jar:4.0.0-incubating]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]

Client log:

org.apache.rocketmq.remoting.exception.RemotingTooMuchRequestException: sendDefaultImpl call timeout
    at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.sendDefaultImpl(DefaultMQProducerImpl.java:634) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8]
    at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.send(DefaultMQProducerImpl.java:1279) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8]
    at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.send(DefaultMQProducerImpl.java:1225) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8]
    at org.apache.rocketmq.client.producer.DefaultMQProducer.send(DefaultMQProducer.java:283) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8]
    at com.baozun.scm.baseservice.message.rocketmq.service.server.RocketMQProducerServer.sendDataMsgConcurrently(RocketMQProducerServer.java:167) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.ecs.oms4.ofa.manager.receive.ReceiveBaseManager.sendConsumptionFeedback(ReceiveBaseManager.java:264) [classes!/:1.0.0-SNAPSHOT]
    at com.baozun.ecs.oms4.ofa.manager.receive.refund.impl.ReceiveRefundPaymentManagerImpl.receiveRefundPayment(ReceiveRefundPaymentManagerImpl.java:59) [classes!/:1.0.0-SNAPSHOT]
    at com.baozun.ecs.oms4.ofa.manager.receive.refund.impl.ReceiveRefundPaymentManagerImpl$$FastClassBySpringCGLIB$$f205e702.invoke() [classes!/:1.0.0-SNAPSHOT]
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) [spring-core-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:684) [spring-aop-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
    at com.baozun.ecs.oms4.ofa.manager.receive.refund.impl.ReceiveRefundPaymentManagerImpl$$EnhancerBySpringCGLIB$$33377ac5.receiveRefundPayment() [classes!/:1.0.0-SNAPSHOT]
    at sun.reflect.GeneratedMethodAccessor579.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
    at com.baozun.scm.baseservice.message.rocketmq.service.MsgTranscationManagerImpl.businessProcessNoTransacation(MsgTranscationManagerImpl.java:187) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.MsgTranscationManagerImpl$$FastClassBySpringCGLIB$$6e604685.invoke() [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) [spring-core-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:684) [spring-aop-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
    at com.baozun.scm.baseservice.message.rocketmq.service.MsgTranscationManagerImpl$$EnhancerBySpringCGLIB$$f0d0e32e.businessProcessNoTransacation() [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.handle.MessageHandler.excuteBusinessMsg(MessageHandler.java:303) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.handle.MessageHandler.excuteHandle(MessageHandler.java:268) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.handle.MessageHandler.handle(MessageHandler.java:197) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1.consumeMessage$original$i6gTCfL5(RocketMQConcurrentlyConsumerInit.java:270) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1.consumeMessage$original$i6gTCfL5$accessor$50njQtqm(RocketMQConcurrentlyConsumerInit.java) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1$auxiliary$N718zRwF.call(Unknown Source) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:93) [skywalking-agent.jar:6.4.0]
    at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1.consumeMessage(RocketMQConcurrentlyConsumerInit.java) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at org.apache.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService$ConsumeRequest.run(ConsumeMessageConcurrentlyService.java:417) [rocketmq-client-4.4.1.8.jar!/:4.4.1.8]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_231]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_231]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]

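Expected: once the broker's connection to namesrv recovers, clients keep working normally and the slave keeps syncing data from the master. I am not sure whether this is a problem with the server version or has some other cause.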

Original question by GitHub user tank1314

芬奇福贵 2023-05-26 12:19:10
1 answer
  • Conclusion first: if only the connection between the broker and the nameserver is lost, clients are not actually affected.

    1. Through the nameserver, clients learn that the broker has gone offline and soon stop sending messages to it (the client removes that broker's address).

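    2. Before the address is removed, clients may indeed keep sending to this broker, but under your premise only the link between the broker and the nameserver is down, so those sends are not affected.

    Judging from your logs, though, this is more than a dropped connection; it looks like a network or service problem: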

    org.apache.rocketmq.remoting.exception.RemotingTimeoutException: wait response on the channel <rocketmq-namesrv-prod01.cloud.bz/xx.xx.xx.xx:9876> timeout, 6000(ms)

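    So the behavior you expect is what the system already provides. What you are seeing is not a disconnection problem; it looks more like a node failure.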
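    One way to start separating a network problem from a service problem is to check whether the endpoints named in the timeouts accept TCP connections at all. The sketch below is only an illustration using the JDK's `java.net.Socket`; the class name `EndpointProbe` and the 3000 ms timeout are assumptions, and the host:port arguments would be the namesrv address (port 9876 in the logs) and the master address seen from the slave (port 20911 in the logs).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class EndpointProbe {
    // Returns true if a TCP connection to host:port opens within timeoutMillis.
    // Connection refused, timeouts and unresolvable hosts all count as "not reachable".
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Usage: java EndpointProbe host:port [host:port ...]
    public static void main(String[] args) {
        for (String endpoint : args) {
            int idx = endpoint.lastIndexOf(':');
            String host = endpoint.substring(0, idx);
            int port = Integer.parseInt(endpoint.substring(idx + 1));
            System.out.println(endpoint + " reachable=" + canConnect(host, port, 3000));
        }
    }
}
```

    If the port is reachable but requests still time out, the process behind it is probably alive at the TCP level yet not answering in time, which points toward the service (for example long pauses or a stuck thread pool) rather than the network. A successful connect by itself does not prove the broker or namesrv is healthy.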

    Original answer by GitHub user Jaskey

    2023-05-26 17:31:10