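One broker in the cluster (brokerA) lost its connection to one of the two namesrv nodes, and only once. After that, the slave paired with brokerA could no longer connect to brokerA, and client programs could no longer send messages to brokerA: org.apache.rocketmq.remoting.exception.RemotingTimeoutException: wait response on the channel <rocketmq-namesrv-prod01.cloud.bz/xx.xx.xx.xx:9876> timeout, 6000(ms)
As soon as we noticed the problem, we stopped writes to brokerA (./mqadmin updateBrokerConfig -b xx.xx.xx.xx:10911 -n xx.x.xx.xx:9876 -k brokerPermission -v 4 )
What I expected: once the connection between brokerA and the namesrv node recovers, the slave keeps syncing data from the master and clients keep sending messages to this broker, both unaffected.
What actually happened: after the connection between brokerA and namesrv recovered (within milliseconds), brokerA's slave still could not establish a working connection to the master and logged (ERROR BrokerControllerScheduledThread1 - SyncDelayOffset Exception); clients could not send messages normally either.
1. Broker cluster version: 4.0.0-incubating
2. JDK: 1.8.0_131-b11
3. RocketMQ client version: 4.4.0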
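For readers wondering why `-v 4` stops writes: as far as I can tell, RocketMQ stores broker permissions as a bit mask (the `PermName` class), with the read bit equal to 4 and the write bit equal to 2, so the usual read-write value is 6 and 4 leaves only reading enabled. The sketch below mirrors those two bits in a standalone class; `BrokerPerm` and its methods are names made up for illustration, not part of RocketMQ.

```java
// Illustrative mirror of the read/write permission bits; BrokerPerm is a
// hypothetical class, not RocketMQ code.
final class BrokerPerm {
    static final int PERM_READ = 1 << 2;  // 4
    static final int PERM_WRITE = 1 << 1; // 2

    static boolean isReadable(int perm) {
        return (perm & PERM_READ) == PERM_READ;
    }

    static boolean isWritable(int perm) {
        return (perm & PERM_WRITE) == PERM_WRITE;
    }

    // Render a permission value as "RW", "R-", "-W" or "--".
    static String describe(int perm) {
        return (isReadable(perm) ? "R" : "-") + (isWritable(perm) ? "W" : "-");
    }

    public static void main(String[] args) {
        System.out.println("6 -> " + describe(6)); // prints "6 -> RW"
        System.out.println("4 -> " + describe(4)); // prints "4 -> R-"
    }
}
```

With value 4 the write bit is cleared, which matches the intent of the command above: consumers can still read from brokerA while producers are turned away.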
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Client log:
org.apache.rocketmq.remoting.exception.RemotingTooMuchRequestException: sendDefaultImpl call timeout
    at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.sendDefaultImpl(DefaultMQProducerImpl.java:634) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8]
    at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.send(DefaultMQProducerImpl.java:1279) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8]
    at org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl.send(DefaultMQProducerImpl.java:1225) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8]
    at org.apache.rocketmq.client.producer.DefaultMQProducer.send(DefaultMQProducer.java:283) ~[rocketmq-client-4.4.1.8.jar!/:4.4.1.8]
    at com.baozun.scm.baseservice.message.rocketmq.service.server.RocketMQProducerServer.sendDataMsgConcurrently(RocketMQProducerServer.java:167) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.ecs.oms4.ofa.manager.receive.ReceiveBaseManager.sendConsumptionFeedback(ReceiveBaseManager.java:264) [classes!/:1.0.0-SNAPSHOT]
    at com.baozun.ecs.oms4.ofa.manager.receive.refund.impl.ReceiveRefundPaymentManagerImpl.receiveRefundPayment(ReceiveRefundPaymentManagerImpl.java:59) [classes!/:1.0.0-SNAPSHOT]
    at com.baozun.ecs.oms4.ofa.manager.receive.refund.impl.ReceiveRefundPaymentManagerImpl$$FastClassBySpringCGLIB$$f205e702.invoke() [classes!/:1.0.0-SNAPSHOT]
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) [spring-core-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:684) [spring-aop-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
    at com.baozun.ecs.oms4.ofa.manager.receive.refund.impl.ReceiveRefundPaymentManagerImpl$$EnhancerBySpringCGLIB$$33377ac5.receiveRefundPayment() [classes!/:1.0.0-SNAPSHOT]
    at sun.reflect.GeneratedMethodAccessor579.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_231]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_231]
    at com.baozun.scm.baseservice.message.rocketmq.service.MsgTranscationManagerImpl.businessProcessNoTransacation(MsgTranscationManagerImpl.java:187) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.MsgTranscationManagerImpl$$FastClassBySpringCGLIB$$6e604685.invoke() [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) [spring-core-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:684) [spring-aop-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
    at com.baozun.scm.baseservice.message.rocketmq.service.MsgTranscationManagerImpl$$EnhancerBySpringCGLIB$$f0d0e32e.businessProcessNoTransacation() [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.handle.MessageHandler.excuteBusinessMsg(MessageHandler.java:303) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.handle.MessageHandler.excuteHandle(MessageHandler.java:268) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.handle.MessageHandler.handle(MessageHandler.java:197) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1.consumeMessage$original$i6gTCfL5(RocketMQConcurrentlyConsumerInit.java:270) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1.consumeMessage$original$i6gTCfL5$accessor$50njQtqm(RocketMQConcurrentlyConsumerInit.java) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1$auxiliary$N718zRwF.call(Unknown Source) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:93) [skywalking-agent.jar:6.4.0]
    at com.baozun.scm.baseservice.message.rocketmq.service.init.RocketMQConcurrentlyConsumerInit$1.consumeMessage(RocketMQConcurrentlyConsumerInit.java) [common-message-component-rocketmq-1.7.0.4.jar!/:?]
    at org.apache.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService$ConsumeRequest.run(ConsumeMessageConcurrentlyService.java:417) [rocketmq-client-4.4.1.8.jar!/:4.4.1.8]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_231]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_231]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]
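Expected: once the broker's connection to namesrv recovers, clients keep working normally and the slave keeps syncing data from the master. I am not sure whether the server version is at fault or something else is going on.
Original question by GitHub user tank1314
Conclusion first: if only the connection between the broker and the nameserver is broken, clients are not actually affected.
1. Through the nameserver, the client learns that the broker has gone offline and soon stops sending messages to it (the client removes that broker's address).
2. Until the address is removed, the client may indeed keep sending to this broker, but under the stated premise that only the network between the broker and the nameserver is broken, sending is not actually affected.
Judging from your log, though, this is more than a dropped connection; it looks like a network or service problem: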
org.apache.rocketmq.remoting.exception.RemotingTimeoutException: wait response on the channel <rocketmq-namesrv-prod01.cloud.bz/xx.xx.xx.xx:9876> timeout, 6000(ms)
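So the behaviour you expect is what RocketMQ already provides. What you are seeing is not a dropped-connection problem; it looks more like a node failure.
Original answer by GitHub user Jaskey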
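Points 1 and 2 above can be pictured with a small standalone simulation. This is only a sketch of the idea (a client keeps a local list of brokers and replaces it with whatever the nameserver currently reports), not RocketMQ's actual client code; `RouteSketch`, `refresh` and `pick` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of a client-side route table; not a RocketMQ class.
final class RouteSketch {
    private final List<String> brokers = new ArrayList<>();
    private int counter = 0;

    // Replace the local view with the brokers the nameserver currently reports.
    synchronized void refresh(Collection<String> reportedByNameServer) {
        brokers.clear();
        brokers.addAll(new TreeSet<>(reportedByNameServer)); // sorted, de-duplicated
    }

    // Pick the next broker round-robin from the current local view.
    synchronized String pick() {
        if (brokers.isEmpty()) {
            throw new IllegalStateException("no broker in route table");
        }
        return brokers.get(Math.floorMod(counter++, brokers.size()));
    }

    public static void main(String[] args) {
        RouteSketch route = new RouteSketch();
        route.refresh(Arrays.asList("brokerA", "brokerB"));
        System.out.println(route.pick() + " " + route.pick()); // prints "brokerA brokerB"

        // The nameserver no longer reports brokerA, so it drops out of rotation.
        route.refresh(Arrays.asList("brokerB"));
        System.out.println(route.pick() + " " + route.pick()); // prints "brokerB brokerB"
    }
}
```

Between the two `refresh` calls the sketch would still hand out brokerA, which corresponds to point 2: sends in that window only fail if the broker itself, or the path from the client to it, is unhealthy.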