Eureka Server Service Eviction
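In this chapter we analyze how Eureka Server evicts services. Eviction is driven by a scheduled task. The EurekaBootStrap bootstrap class initializes the server context in its initEurekaServerContext method, which contains the call registry.openForTraffic(applicationInfoManager, registryCount);. That method in turn calls com.netflix.eureka.registry.AbstractInstanceRegistry#postInit, which sets up the scheduled eviction task.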
```java
public abstract class AbstractInstanceRegistry implements InstanceRegistry {

    protected void postInit() {
        renewsLastMin.start();
        if (evictionTaskRef.get() != null) {
            // If an eviction task already exists, cancel it; cancel() sets the task's state to CANCELLED
            evictionTaskRef.get().cancel();
        }
        // Create a new eviction task
        evictionTaskRef.set(new EvictionTask());
        // Hand it to the timer: first run after 60s, then every 60s
        evictionTimer.schedule(evictionTaskRef.get(),
                serverConfig.getEvictionIntervalTimerInMs(),  // eviction interval timer, 60s by default
                serverConfig.getEvictionIntervalTimerInMs());
    }

    /* visible for testing */ class EvictionTask extends TimerTask {

        private final AtomicLong lastExecutionNanosRef = new AtomicLong(0L);

        public void run() {
            try {
                // Compute how late this run is compared with the schedule: the compensation time
                long compensationTimeMs = getCompensationTimeMs();
                logger.info("Running the evict task with compensationTime {}ms", compensationTimeMs);
                // Perform the eviction
                evict(compensationTimeMs);
            } catch (Throwable e) {
                logger.error("Could not run the evict task", e);
            }
        }

        // ... getCompensationTimeMs() and other members omitted
    }

    // ... other members omitted
}
```
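The cancel-then-reschedule pattern in postInit can be reproduced with plain java.util.Timer. The following is a simplified, hypothetical sketch: EvictionSchedulerSketch and its methods are made-up names, not Eureka APIs, the evict step is passed in as a Runnable, and the interval is a constructor argument instead of serverConfig.getEvictionIntervalTimerInMs(). Timer.schedule(task, delay, period) gives fixed-delay repetition, matching the call above.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the scheduling logic in AbstractInstanceRegistry#postInit (not Eureka code).
class EvictionSchedulerSketch {
    // Daemon timer so that a pending eviction task never keeps the JVM alive.
    private final Timer evictionTimer = new Timer("Eviction-Timer-Sketch", true);
    // Holds the currently scheduled task so that a restart can cancel it first.
    private final AtomicReference<TimerTask> evictionTaskRef = new AtomicReference<>();
    private final long intervalMs;
    private final Runnable evictAction;

    EvictionSchedulerSketch(long intervalMs, Runnable evictAction) {
        this.intervalMs = intervalMs;
        this.evictAction = evictAction;
    }

    // Cancels any previously scheduled task, then schedules a fresh one that first runs
    // after intervalMs and then repeats with a fixed delay of intervalMs.
    TimerTask start() {
        TimerTask previous = evictionTaskRef.get();
        if (previous != null) {
            previous.cancel(); // marks the old task CANCELLED so the Timer drops it
        }
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                try {
                    evictAction.run();
                } catch (Throwable t) {
                    // An exception escaping run() would terminate the Timer's thread,
                    // so errors are caught and reported instead.
                    System.err.println("Could not run the evict task: " + t);
                }
            }
        };
        evictionTaskRef.set(task);
        evictionTimer.schedule(task, intervalMs, intervalMs);
        return task;
    }

    void shutdown() {
        evictionTimer.cancel();
    }

    public static void main(String[] args) throws InterruptedException {
        EvictionSchedulerSketch scheduler =
                new EvictionSchedulerSketch(50, () -> System.out.println("evict()"));
        scheduler.start();
        scheduler.start(); // cancels the first task; only the second keeps running
        Thread.sleep(200);
        scheduler.shutdown();
    }
}
```

Catching Throwable inside run() is what keeps a single failed eviction from silently stopping all future ones, since java.util.Timer runs every task on one background thread.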
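The eviction task first computes the compensation time. This is the amount by which the time since the previous run exceeds the configured interval, for example because of a GC pause or clock drift. It then calls com.netflix.eureka.registry.AbstractInstanceRegistry#evict(long) to perform the actual eviction.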
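The body of getCompensationTimeMs() is not shown above. The following is a minimal sketch. It assumes the compensation is the elapsed time since the previous run minus the configured interval, clamped at zero. The class CompensationClock is hypothetical, and the current nanoTime is passed in explicitly so that the arithmetic is deterministic.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an eviction-task compensation clock (not Eureka's class).
class CompensationClock {
    private final AtomicLong lastExecutionNanosRef = new AtomicLong(0L);
    private final long evictionIntervalMs;

    CompensationClock(long evictionIntervalMs) {
        this.evictionIntervalMs = evictionIntervalMs;
    }

    // Returns how many milliseconds later than scheduled this run is.
    // currNanos would normally be System.nanoTime(); it is a parameter here for testability.
    long compensationTimeMs(long currNanos) {
        long lastNanos = lastExecutionNanosRef.getAndSet(currNanos);
        if (lastNanos == 0L) {
            return 0L; // first run: nothing to compensate for
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(currNanos - lastNanos);
        long compensation = elapsedMs - evictionIntervalMs;
        return compensation <= 0L ? 0L : compensation; // never negative
    }

    public static void main(String[] args) {
        CompensationClock clock = new CompensationClock(60_000);
        long t0 = TimeUnit.SECONDS.toNanos(1);
        System.out.println(clock.compensationTimeMs(t0));                                 // 0 (first run)
        System.out.println(clock.compensationTimeMs(t0 + TimeUnit.SECONDS.toNanos(60)));  // 0 (on time)
        System.out.println(clock.compensationTimeMs(t0 + TimeUnit.SECONDS.toNanos(125))); // 5000 (65s gap)
    }
}
```

The result is what evict receives as additionalLeaseMs. A late run therefore extends every lease by the same amount, instead of expiring instances merely because the eviction task itself was delayed.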
```java
public void evict(long additionalLeaseMs) {
    logger.debug("Running the evict task");

    if (!isLeaseExpirationEnabled()) {
        // Lease expiration is not enabled: return immediately
        logger.debug("DS: lease expiration is currently disabled.");
        return;
    }

    // We collect first all expired items, to evict them in random order. For large eviction sets,
    // if we do not that, we might wipe out whole apps before self preservation kicks in. By randomizing it,
    // the impact should be evenly distributed across all applications.
    List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
    // Iterate over every application in the registry
    for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
        Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
        if (leaseMap != null) {
            // Iterate over each instance's lease
            for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
                Lease<InstanceInfo> lease = leaseEntry.getValue();
                // If the lease has expired, add it to the expiredLeases list
                if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
                    expiredLeases.add(lease);
                }
            }
        }
    }

    // To compensate for GC pauses or drifting local time, we need to use current registry size as a base for
    // triggering self-preservation. Without that we would wipe out full registry.
    int registrySize = (int) getLocalRegistrySize();
    // Renewal threshold = registry size * renewalPercentThreshold (0.85 by default)
    int registrySizeThreshold = (int) (registrySize * serverConfig.getRenewalPercentThreshold());
    // Eviction limit = registry size - renewal threshold
    int evictionLimit = registrySize - registrySizeThreshold;

    // Smaller of the expired count and evictionLimit; if > 0, some instances must be evicted
    int toEvict = Math.min(expiredLeases.size(), evictionLimit);
    if (toEvict > 0) {
        logger.info("Evicting {} items (expired={}, evictionLimit={})", toEvict, expiredLeases.size(), evictionLimit);

        Random random = new Random(System.currentTimeMillis());
        for (int i = 0; i < toEvict; i++) {
            // Pick a random item (Knuth shuffle algorithm)
            int next = i + random.nextInt(expiredLeases.size() - i);
            Collections.swap(expiredLeases, i, next);
            Lease<InstanceInfo> lease = expiredLeases.get(i);

            String appName = lease.getHolder().getAppName();
            String id = lease.getHolder().getId();
            // Increment the expired counter
            EXPIRED.increment();
            logger.warn("DS: Registry: expired lease for {}/{}", appName, id);
            // Cancel internally (isReplication = false)
            internalCancel(appName, id, false);
        }
    }
}
```
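This method does the following:
- 1. Checks whether lease expiration is enabled; it is disabled while self-preservation mode is in effect. If it is not enabled, the method returns immediately.
- 2. Collects all expired leases, using Lease.isExpired to decide whether a lease has expired.
- 3. Computes the number of instances to evict: toEvict = min(number of expired leases, registrySize - (int) (registrySize * renewalPercentThreshold)), where renewalPercentThreshold is 0.85 by default.
- 4. If toEvict > 0, picks toEvict leases at random from the expired ones and evicts them.
- 5. Cancels each selected instance by calling internalCancel.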
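Steps 3 to 5 can be sketched outside Eureka as a small generic helper. This is a hypothetical illustration: EvictionPlanner and selectForEviction are made-up names, plain values stand in for Lease<InstanceInfo>, and the caller supplies the Random so that the result is reproducible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the eviction-limit arithmetic and random selection in evict().
final class EvictionPlanner {
    private EvictionPlanner() {}

    // evictionLimit = registrySize - (int) (registrySize * renewalPercentThreshold)
    static int evictionLimit(int registrySize, double renewalPercentThreshold) {
        int registrySizeThreshold = (int) (registrySize * renewalPercentThreshold);
        return registrySize - registrySizeThreshold;
    }

    // Picks min(expired.size(), evictionLimit) items uniformly at random with a partial
    // Fisher-Yates (Knuth) shuffle: after step i, positions 0..i hold the picked items.
    static <T> List<T> selectForEviction(List<T> expired, int registrySize,
                                         double renewalPercentThreshold, Random random) {
        List<T> candidates = new ArrayList<>(expired); // copy: unlike evict(), leave the input untouched
        int toEvict = Math.min(candidates.size(), evictionLimit(registrySize, renewalPercentThreshold));
        List<T> picked = new ArrayList<>();
        for (int i = 0; i < toEvict; i++) {
            int next = i + random.nextInt(candidates.size() - i);
            Collections.swap(candidates, i, next);
            picked.add(candidates.get(i));
        }
        return picked;
    }

    public static void main(String[] args) {
        List<String> expired = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            expired.add("APP-" + (i % 4) + "/instance-" + i);
        }
        // 100 registered instances, 40 expired: limit = 100 - 85 = 15
        List<String> victims = selectForEviction(expired, 100, 0.85, new Random(42));
        System.out.println(victims.size()); // 15
    }
}
```

As a worked example, take 100 registered instances and the default threshold of 0.85. At most 100 - 85 = 15 instances are evicted in one run, even if 40 leases have expired. The remaining expired leases are left for later runs, provided lease expiration is still enabled at that point.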
Let's look at how Lease.isExpired decides whether an instance has expired:
```java
/**
 * Checks if the lease of a given {@link com.netflix.appinfo.InstanceInfo} has expired or not.
 *
 * Note that due to renew() doing the 'wrong" thing and setting lastUpdateTimestamp to +duration more than
 * what it should be, the expiry will actually be 2 * duration. This is a minor bug and should only affect
 * instances that ungracefully shutdown. Due to possible wide ranging impact to existing usage, this will
 * not be fixed.
 *
 * @param additionalLeaseMs any additional lease time to add to the lease evaluation in ms.
 */
public boolean isExpired(long additionalLeaseMs) {
    return (evictionTimestamp > 0 || System.currentTimeMillis() > (lastUpdateTimestamp + duration + additionalLeaseMs));
}
```
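A lease is therefore expired if evictionTimestamp > 0 (the lease has already been cancelled), or if the current time > lastUpdateTimestamp + duration (the lease duration, 90s by default) + additionalLeaseMs (the compensation time).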
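Interestingly, the Javadoc on this method admits to a bug. renew() does the "wrong" thing: it sets lastUpdateTimestamp to the current time plus duration, one full lease duration later than it should be. As a result, a lease actually expires only after 2 * duration. The bug only affects instances that shut down ungracefully, and fixing it could have a wide-ranging impact on existing users, so it is deliberately left unfixed.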
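```java
public void renew() {
    // current time + duration: this extra duration is why the effective expiry is 2 * duration
    lastUpdateTimestamp = System.currentTimeMillis() + duration;
}
```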
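In practice, assume the default 90s lease and an instance that stops sending heartbeats without deregistering. That instance is treated as expired only about 180s after its last renewal (plus any compensation time), and it is removed by the next eviction run after that. That is all we need from this method.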
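To make the 2 * duration effect concrete, the following is a hypothetical model of Lease with the clock passed in explicitly. LeaseSketch is not the Eureka class. It copies only the renew/cancel/isExpired arithmetic discussed here, so the numbers can be checked without waiting.

```java
// Hypothetical model of Lease's timestamp arithmetic; time is passed in explicitly (not Eureka code).
class LeaseSketch {
    private final long duration;     // lease duration in ms (90s by default in Eureka)
    private long lastUpdateTimestamp;
    private long evictionTimestamp;  // > 0 once the lease has been cancelled

    LeaseSketch(long durationMs, long nowMs) {
        this.duration = durationMs;
        this.lastUpdateTimestamp = nowMs;
    }

    // Mirrors renew(): note the extra "+ duration".
    void renew(long nowMs) {
        lastUpdateTimestamp = nowMs + duration;
    }

    // Mirrors Lease.cancel(): record the eviction time only once.
    void cancel(long nowMs) {
        if (evictionTimestamp <= 0) {
            evictionTimestamp = nowMs;
        }
    }

    // Mirrors isExpired(additionalLeaseMs).
    boolean isExpired(long nowMs, long additionalLeaseMs) {
        return evictionTimestamp > 0 || nowMs > (lastUpdateTimestamp + duration + additionalLeaseMs);
    }

    public static void main(String[] args) {
        LeaseSketch lease = new LeaseSketch(90_000, 0);
        lease.renew(10_000);                                      // last heartbeat at t = 10s
        System.out.println(lease.isExpired(10_000 + 100_000, 0)); // false: 100s after the heartbeat
        System.out.println(lease.isExpired(10_000 + 180_001, 0)); // true: just past 2 * 90s
    }
}
```

A lease renewed at t = 10s stays valid until t = 190s, which is 180s (2 * 90s) after the heartbeat rather than the intended 90s.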
Next, let's look at internalCancel, which cancels an instance internally:
```java
/**
 * {@link #cancel(String, String, boolean)} method is overridden by {@link PeerAwareInstanceRegistry}, so each
 * cancel request is replicated to the peers. This is however not desired for expires which would be counted
 * in the remote peers as valid cancellations, so self preservation mode would not kick-in.
 */
protected boolean internalCancel(String appName, String id, boolean isReplication) {
    try {
        // Acquire the read lock
        read.lock();
        // Increment the cancel counter
        CANCEL.increment(isReplication);
        // Look up the lease map of the application being cancelled
        Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
        Lease<InstanceInfo> leaseToCancel = null;
        if (gMap != null) {
            // Remove this instance from the registry map
            leaseToCancel = gMap.remove(id);
        }
        // Add to the recently-canceled queue
        synchronized (recentCanceledQueue) {
            recentCanceledQueue.add(new Pair<Long, String>(System.currentTimeMillis(), appName + "(" + id + ")"));
        }
        // Remove this instance from the overridden-status map
        InstanceStatus instanceStatus = overriddenInstanceStatusMap.remove(id);
        if (instanceStatus != null) {
            logger.debug("Removed instance id {} from the overridden map which has value {}", id, instanceStatus.name());
        }
        if (leaseToCancel == null) {
            // No lease registered under this id
            CANCEL_NOT_FOUND.increment(isReplication);
            logger.warn("DS: Registry: cancel failed because Lease is not registered for: {}/{}", appName, id);
            return false;
        } else {
            // Mark the lease as cancelled (sets its eviction timestamp)
            leaseToCancel.cancel();
            // Instance information held by the lease
            InstanceInfo instanceInfo = leaseToCancel.getHolder();
            String vip = null;
            String svip = null;
            if (instanceInfo != null) {
                // Set the instance's action type to DELETED
                instanceInfo.setActionType(ActionType.DELETED);
                // Add to the recently-changed queue
                recentlyChangedQueue.add(new RecentlyChangedItem(leaseToCancel));
                // Update the instance's last-updated timestamp
                instanceInfo.setLastUpdatedTimestamp();
                vip = instanceInfo.getVIPAddress();
                svip = instanceInfo.getSecureVipAddress();
            }
            // Invalidate the cache entries for this application (calls responseCache.invalidate)
            invalidateCache(appName, vip, svip);
            logger.info("Cancelled instance {}/{} (replication={})", appName, id, isReplication);
            return true;
        }
    } finally {
        read.unlock();
    }
}
```
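Using the application name and instance id, the method looks up the instance in registry and then does the following. Steps 4 to 8 only run when a lease was actually found; otherwise the method returns false.
- 1. Removes the instance's lease from registry.
- 2. Adds the instance to the recently-canceled queue (recentCanceledQueue).
- 3. Removes the instance's status from overriddenInstanceStatusMap.
- 4. Calls Lease.cancel, which sets the lease's eviction timestamp (evictionTimestamp) to the current time.
- 5. Sets the action type of the instance's InstanceInfo to DELETED.
- 6. Adds the lease to the recently-changed queue (recentlyChangedQueue).
- 7. Updates the instance's last-updated timestamp.
- 8. Invalidates the ResponseCache entries for the application.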
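The bookkeeping steps above can be modelled with plain Java collections. This is a simplified, hypothetical sketch. RegistrySketch and its nested Instance class are made-up stand-ins that merge Lease and InstanceInfo into one object. A Consumer replaces ResponseCache invalidation, and the counters and logging are left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

// Hypothetical, simplified model of the bookkeeping in internalCancel (not Eureka code).
class RegistrySketch {
    // Stand-in for Lease<InstanceInfo>: lease and instance fields merged into one object.
    static final class Instance {
        final String appName;
        final String id;
        String actionType = "ADDED";
        long evictionTimestamp;    // plays the role of Lease.evictionTimestamp
        long lastUpdatedTimestamp; // plays the role of the InstanceInfo last-updated time

        Instance(String appName, String id) {
            this.appName = appName;
            this.id = id;
        }
    }

    final Map<String, Map<String, Instance>> registry = new ConcurrentHashMap<>();
    final Map<String, String> overriddenInstanceStatusMap = new ConcurrentHashMap<>();
    final Deque<String> recentCanceledQueue = new ArrayDeque<>();
    final Queue<Instance> recentlyChangedQueue = new ConcurrentLinkedQueue<>();
    private final Lock read = new ReentrantReadWriteLock().readLock();
    private final Consumer<String> invalidateCache; // stand-in for ResponseCache invalidation

    RegistrySketch(Consumer<String> invalidateCache) {
        this.invalidateCache = invalidateCache;
    }

    void register(Instance instance) {
        registry.computeIfAbsent(instance.appName, k -> new ConcurrentHashMap<>())
                .put(instance.id, instance);
    }

    boolean internalCancel(String appName, String id, long nowMs) {
        read.lock();
        try {
            Map<String, Instance> gMap = registry.get(appName);
            Instance toCancel = (gMap != null) ? gMap.remove(id) : null; // remove from registry
            synchronized (recentCanceledQueue) {                         // record in recently-canceled queue
                recentCanceledQueue.add(appName + "(" + id + ")");
            }
            overriddenInstanceStatusMap.remove(id);                      // drop the overridden status
            if (toCancel == null) {
                return false;                                            // no lease under this id
            }
            if (toCancel.evictionTimestamp <= 0) {
                toCancel.evictionTimestamp = nowMs;                      // what Lease.cancel() does
            }
            toCancel.actionType = "DELETED";                             // mark as DELETED
            recentlyChangedQueue.add(toCancel);                          // record in recently-changed queue
            toCancel.lastUpdatedTimestamp = nowMs;                       // update last-updated timestamp
            invalidateCache.accept(appName);                             // invalidate cached responses
            return true;
        } finally {
            read.unlock();
        }
    }

    public static void main(String[] args) {
        RegistrySketch r = new RegistrySketch(app -> System.out.println("invalidate " + app));
        r.register(new Instance("ORDER-SERVICE", "order-1"));
        System.out.println(r.internalCancel("ORDER-SERVICE", "order-1", System.currentTimeMillis())); // true
        System.out.println(r.internalCancel("ORDER-SERVICE", "order-1", System.currentTimeMillis())); // false
    }
}
```

The second call in main returns false because the lease is already gone. Even so, it still appends to the recently-canceled queue, which matches the order of operations in the source above.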
Summary