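Problem description
This note explains client-side errors that can appear when consuming events from Event Hub. (New error messages will be added here as they are encountered.)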
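1: "Too many open files" when running an Event Hub consumer on Linux
Explanation: The Java process has opened more file handles than the operating system allows. Check whether the open-file limit is still the default of 1024; if it is, raise it (65535 is used here).
Run ulimit -a or ulimit -n to see the current limit, for example: open files (-n) 1024
Configuration file: /etc/security/limits.conf
Add the following two lines to that file. The new limit applies to sessions started after the change, so log in again and restart the consumer.
* soft nofile 65535
* hard nofile 65535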
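To confirm from inside the JVM that the raised limit is actually in effect for the consumer process, the descriptor counts can be read through the JDK's management bean. This is a minimal sketch; the class name FdUsage is hypothetical and is not part of the Event Hubs SDK.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper (not part of the Event Hubs SDK).
final class FdUsage {
    /** Returns {open, max} descriptor counts, or null when the JVM does not expose them. */
    static long[] snapshot() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // not a Unix-like JVM, e.g. Windows
    }

    public static void main(String[] args) {
        long[] fds = snapshot();
        if (fds == null) {
            System.out.println("File descriptor counts are not available on this platform");
        } else {
            System.out.println("open=" + fds[0] + " max=" + fds[1]);
        }
    }
}
```

If the reported maximum is still 1024 after editing limits.conf, the process was probably started from a session that predates the change.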
2: New receiver 'nil' with higher epoch of '197' is created hence current receiver 'nil' with epoch '196' is getting disconnected
Error message: java.util.concurrent.CompletionException: com.microsoft.azure.eventhubs.ReceiverDisconnectedException: New receiver 'nil' with higher epoch of '197' is created hence current receiver 'nil' with epoch '196' is getting disconnected. If you are recreating the receiver, make sure a higher epoch is used. TrackingId:xxxxxxxxxxxxxxx, SystemTracker:xxxxxxx:eventhub:xxxxxxx|default,Timestamp:2020-10-20T15:50:16,errorContext[NS:xxxxxxxxx.servicebus.chinacloudapi.cn,PATH:xxxxxxxxx/ConsumerGroups/Default/Partitions/3, REFERENCE_ID: xxxxxxxxxx, PREFETCH_COUNT: 300, LINK_CREDIT: 300, PREFETCH_Q_LEN: 0]
java.util.concurrent.ExecutionException: com.microsoft.azure.eventprocessorhost.ExceptionWithAction: java.lang.RuntimeException: Lease lost while updating checkpoint
Explanation: The consumer program creates a separate consuming thread for each partition, so threads and partitions map one to one. When an additional consumer program reads the same Event Hub and stores its checkpoints in the same location, the partitions are redistributed among the consumers. Likewise, when one consuming thread fails, the client tries to recover and take over all partitions that the failed thread owned. In both cases the new owner opens its receiver with a higher epoch, which disconnects the previous receiver and makes it lose its lease. This error can usually be ignored.
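Because this error is usually harmless, an error handler may want to log it at a lower level than other failures. The following is a minimal sketch of such a check, assuming only what the stack trace above shows: the exception class name ReceiverDisconnectedException and the text "Lease lost". The class RebalanceErrors and its method are hypothetical, and matching on names and message text is an illustrative shortcut, not an SDK feature.

```java
// Hypothetical helper: decides whether an error looks like the expected
// side effect of partition rebalancing described above.
final class RebalanceErrors {
    static boolean isExpectedRebalance(Throwable error) {
        // Walk the cause chain, since the SDK wraps the root exception
        // in CompletionException / ExecutionException.
        for (Throwable t = error; t != null; t = t.getCause()) {
            String name = t.getClass().getSimpleName();
            String message = t.getMessage();
            if (name.equals("ReceiverDisconnectedException")
                    || (message != null && message.contains("Lease lost"))) {
                return true;
            }
        }
        return false;
    }
}
```

Errors for which this returns false should still be treated as real failures.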
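3: com.microsoft.azure.eventprocessorhost.ExceptionWithAction: The client could not finish the operation within specified maximum execution timeout.
Explanation: A network timeout occurred while the client was saving the consumed offset (the checkpoint) to Storage. Check the network connectivity of the client.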
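If the timeouts are transient, retrying the checkpoint write a few times with a growing delay can ride out short network interruptions. The sketch below is a generic retry wrapper built only on the JDK; CheckpointRetry is a hypothetical name, and the action passed in stands for whatever call the application uses to write its checkpoint.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retries an action with exponential backoff.
final class CheckpointRetry {
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last failure
                }
                Thread.sleep(delayMs);
                delayMs *= 2; // double the wait before the next attempt
            }
        }
    }
}
```

Retries only hide brief outages; if the timeout persists, the network path to the storage account still needs to be investigated.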
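4: com.microsoft.azure.eventhubs.EventHubException: The specified partition is invalid for an EventHub partition sender or receiver. It should be between 0 and 1.
Explanation: The client specified a partition that does not exist when consuming from the Event Hub. The message states the valid range; in this example only partitions 0 and 1 exist.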
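One way to catch this earlier is to validate a configured partition ID against the IDs the Event Hub actually has before opening a sender or receiver. The sketch below assumes the list of known IDs has already been obtained elsewhere; PartitionCheck and its method are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper: fails fast when a configured partition ID is unknown.
final class PartitionCheck {
    static String requireValidPartition(String partitionId, String[] knownPartitionIds) {
        if (!Arrays.asList(knownPartitionIds).contains(partitionId)) {
            throw new IllegalArgumentException(
                    "Partition '" + partitionId + "' does not exist; valid IDs: "
                            + String.join(", ", knownPartitionIds));
        }
        return partitionId;
    }
}
```

For the hub in the error above, the known IDs would be "0" and "1", so a configured value of "3" would be rejected locally with a clearer message.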
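5: com.microsoft.azure.eventhubs.EventHubException: The supplied offset '0' is invalid. The last offset in the system is '-1'
Explanation: The client supplied an invalid offset when consuming from the Event Hub. The service reports '-1' as its last offset, so the initial offset must not be set to 0.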
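A common source of this error is defaulting to offset 0 when no checkpoint has been stored yet. The sketch below falls back to "-1" instead. The value "-1" is taken from the error message above, and treating it as the start-of-stream marker is an assumption; StartOffset is a hypothetical helper.

```java
// Hypothetical helper: picks the offset to start receiving from.
final class StartOffset {
    // Assumption: "-1" marks the start of the stream, as the error message suggests.
    static final String START_OF_STREAM = "-1";

    static String resolve(String checkpointedOffset) {
        // No stored checkpoint: start from the beginning rather than from "0".
        if (checkpointedOffset == null || checkpointedOffset.trim().isEmpty()) {
            return START_OF_STREAM;
        }
        return checkpointedOffset;
    }
}
```

A previously stored checkpoint is passed through unchanged, so only the first run of a consumer is affected.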