问题描述
Azure Event Hubs -- Kafka 生产者发送消息存在延迟接收和丢失问题, 在客户端的日志中发现如下异常:
2023-06-05 02:00:20.467 [kafka-producer-thread | producer-1] ERROR com.deloitte.common.kafka.CommonKafkaProducer - messageId:9235f334-e39f-b429-227e-45cd30dd6486, topic:notify_topic 发送消息失败 org.springframework.kafka.core.KafkaProducerException: Failed to send; nested exception is org.apache.kafka.common.errors.TimeoutException: The request timed out. at org.springframework.kafka.core.KafkaTemplate.lambda$buildCallback$6(KafkaTemplate.java:690) at org.apache.skywalking.apm.plugin.kafka.CallbackAdapter.onCompletion(CallbackAdapter.java:45) at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion$original$dElInXX8(DefaultKafkaProducerFactory.java:1001) at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion$original$dElInXX8$accessor$6jLL1TNr(DefaultKafkaProducerFactory.java) at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1$auxiliary$ldSQQGBZ.call(Unknown Source) at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86) at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer$1.onCompletion(DefaultKafkaProducerFactory.java) at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion$original$PwZecSoL(KafkaProducer.java:1350) at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion$original$PwZecSoL$accessor$5Ux1udg0(KafkaProducer.java) at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback$auxiliary$a5oVYNi3.call(Unknown Source) at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86) at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java) at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:273) at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:234) at org.apache.kafka.clients.producer.internals.ProducerBatch.completeExceptionally(ProducerBatch.java:198) at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:758) at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:743) at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:695) at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:634) at org.apache.kafka.clients.producer.internals.Sender.lambda$null$1(Sender.java:575) at java.base/java.util.ArrayList.forEach(ArrayList.java:1541) at org.apache.kafka.clients.producer.internals.Sender.lambda$handleProduceResponse$2(Sender.java:562) at java.base/java.lang.Iterable.forEach(Iterable.java:75) at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:562) at org.apache.kafka.clients.producer.internals.Sender.lambda$sendProduceRequest$5(Sender.java:836) at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109) at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:574) at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:566) at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328) at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: org.apache.kafka.common.errors.TimeoutException: The request timed out.
当发送端收到 Failed to send; nested exception is org.apache.kafka.common.errors.TimeoutException: The request timed out. 日志后, 如何来分析这个问题呢?
问题回答
机器人回答一:
This error message indicates that the Kafka producer failed to send a message due to a timeout.
One possible reason for this error is that the transaction timeout is larger than the maximum value allowed by the broker.
You can check the broker configuration for `transaction.max.timeout.ms` and make sure that it is set to a value that is larger than the transaction timeout used by the producer.
You can also check the Kafka logs for more information about the error.
If you are using Flink, you can refer to the Hilo documentation on [Flink to HDInsightKafkaTransactionTimeoutIssue](https://supportability.visualstudio.com/870b0c89-5235-4a33-a0c6-44be26e284e7/_wiki/wikis/e9f84cb2-cd1b-4143-ac21-5bc63f3bdba0?pagePath=%2FHilo%2F[Hilo] TSGs%2F[Hilo] Flink TSGs%2F[Hilo] Flink to HDInsightKafkaTransactionTimeoutIssue) for more information on this issue. One or more links in the message.
机器人回答二:
The error message you provided indicates that the request timed out. This could be due to a network issue or a configuration issue. You can check the following: 1. Check the network connection between the producer and the Event Hub service. Make sure that the network is stable and that there are no issues with the firewall or network security groups. 2. Check the Kafka producer configuration. Make sure that the `request.timeout.ms` property is set to a value that is appropriate for your use case. You can also try increasing the value of this property to see if it resolves the issue. 3. Check the size of the message being sent. If the message size is greater than the maximum allowed size, the request may time out. You can check the maximum message size allowed by the Event Hub service and make sure that your message size is within the limit. 4. Check the Kafka logs for more information about the error. The logs may provide more details about the cause of the error. If you are still unable to resolve the issue, you can contact Azure support for further assistance. Message is interactive. Click to interact.
If you are not using Flink and are using the Kafka producer method to send messages to Event Hub, you can check the Kafka logs for more information about the error.
The error message you provided indicates that the request timed out. This could be due to a network issue or a configuration issue. You can check the following:
1. Check the network connection between the producer and the Event Hub service. Make sure that the network is stable and that there are no issues with the firewall or network security groups.
2. Check the Kafka producer configuration. Make sure that the `request.timeout.ms` property is set to a value that is appropriate for your use case. You can also try increasing the value of this property to see if it resolves the issue.
3. Check the size of the message being sent. If the message size is greater than the maximum allowed size, the request may time out. You can check the maximum message size allowed by the Event Hub service and make sure that your message size is within the limit.
4. Check the Kafka logs for more information about the error. The logs may provide more details about the cause of the error.
If you are still unable to resolve the issue, you can contact Azure support for further assistance.