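Articles in this series
II. SpringCloud Source Code Analysis: Eureka Client Initialization
V. SpringCloud Source Code Analysis: Eureka Client Lease Renewal
VI. SpringCloud Source Code Analysis: Eureka Client Deregistration
VII. SpringCloud Source Code Analysis: Eureka Server Auto-Configuration
VIII. SpringCloud Source Code Analysis: Eureka Server Initialization Flow
IX. SpringCloud Source Code Analysis: Eureka Server Service Registration Flow
X. SpringCloud Source Code Analysis: Eureka Server Lease Renewal
XI. SpringCloud Source Code Analysis: Eureka Server Registry Fetch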
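Eureka Server Service Eviction
This chapter looks at how Eureka Server evicts services, which is done by a scheduled task. In initEurekaServerContext, the context initialization method of the EurekaBootStrap bootstrap class, there is this line: registry.openForTraffic(applicationInfoManager, registryCount);. That method in turn calls com.netflix.eureka.registry.AbstractInstanceRegistry#postInit, which sets up the scheduled eviction task: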
public abstract class AbstractInstanceRegistry implements InstanceRegistry {
    protected void postInit() {
        renewsLastMin.start();
        if (evictionTaskRef.get() != null) {
            // If an eviction task already exists, cancel it; cancel() marks the task as cancelled
            evictionTaskRef.get().cancel();
        }
        // Create a new eviction task
        evictionTaskRef.set(new EvictionTask());
        // Hand it to the timer: first run after 60s, then once every 60s
        evictionTimer.schedule(evictionTaskRef.get(),
                serverConfig.getEvictionIntervalTimerInMs(), // eviction interval, 60s by default
                serverConfig.getEvictionIntervalTimerInMs());
    }

    /* visible for testing */
    class EvictionTask extends TimerTask {
        private final AtomicLong lastExecutionNanosRef = new AtomicLong(0l);

        @Override
        public void run() {
            try {
                // Compute how late this run is relative to the schedule: the compensation time
                long compensationTimeMs = getCompensationTimeMs();
                logger.info("Running the evict task with compensationTime {}ms", compensationTimeMs);
                // Perform the eviction
                evict(compensationTimeMs);
            } catch (Throwable e) {
                logger.error("Could not run the evict task", e);
            }
        }
        // getCompensationTimeMs() is omitted from this excerpt
    }
    // ... rest of the class omitted
}
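The compensation time used by run() can be pictured with the following minimal sketch. It assumes the compensation is the time elapsed since the previous run minus the configured interval, floored at zero. CompensationClock and compensationMs are hypothetical names for illustration, not Eureka classes, and the current time is passed in as a parameter so the logic can be tried without a real timer.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration only; not part of Eureka.
class CompensationClock {
    private final long intervalMs;
    // 0 means "no previous run recorded yet"
    private final AtomicLong lastExecutionNanos = new AtomicLong(0L);

    CompensationClock(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    // nowNanos is supplied by the caller (assumption: a monotonic clock such as System.nanoTime()).
    long compensationMs(long nowNanos) {
        long last = lastExecutionNanos.getAndSet(nowNanos);
        if (last == 0L) {
            return 0L; // first run: nothing to compensate for
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(nowNanos - last);
        // Only the part of the delay that exceeds the interval counts
        return Math.max(0L, elapsedMs - intervalMs);
    }

    public static void main(String[] args) {
        CompensationClock clock = new CompensationClock(60_000L);
        long second = TimeUnit.SECONDS.toNanos(1);
        System.out.println(clock.compensationMs(10 * second));  // first run
        System.out.println(clock.compensationMs(70 * second));  // exactly on schedule
        System.out.println(clock.compensationMs(135 * second)); // 65s after the previous run
    }
}
```

With a 60s interval, a run that starts 65s after the previous one yields a compensation of 5000 ms, which is then added to the lease lifetime so that a late eviction run (for example after a GC pause) does not expire instances too early.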
The eviction task computes how far its execution time has drifted (the compensation time) and then calls com.netflix.eureka.registry.AbstractInstanceRegistry#evict(long) to run the eviction logic:
public void evict(long additionalLeaseMs) {
    logger.debug("Running the evict task");
    if (!isLeaseExpirationEnabled()) {
        // Lease expiration is not enabled: return immediately
        logger.debug("DS: lease expiration is currently disabled.");
        return;
    }
    // We collect first all expired items, to evict them in random order. For large eviction sets,
    // if we do not that, we might wipe out whole apps before self preservation kicks in. By randomizing it,
    // the impact should be evenly distributed across all applications.
    List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
    // Iterate over every application in the registry
    for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
        Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
        if (leaseMap != null) {
            // Iterate over the leases of this application
            for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
                Lease<InstanceInfo> lease = leaseEntry.getValue();
                // If the lease has expired, add it to the expiredLeases list
                if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
                    expiredLeases.add(lease);
                }
            }
        }
    }
    // To compensate for GC pauses or drifting local time, we need to use current registry size as a base for
    // triggering self-preservation. Without that we would wipe out full registry.
    // Current registry size
    int registrySize = (int) getLocalRegistrySize();
    // Renewal threshold = registry size * renewal percent threshold (0.85 by default)
    int registrySizeThreshold = (int) (registrySize * serverConfig.getRenewalPercentThreshold());
    // Eviction limit = registry size - renewal threshold
    int evictionLimit = registrySize - registrySizeThreshold;
    // Take the smaller of the expired count and evictionLimit; a value above 0 means there is something to evict
    int toEvict = Math.min(expiredLeases.size(), evictionLimit);
    if (toEvict > 0) {
        // Evict toEvict instances
        logger.info("Evicting {} items (expired={}, evictionLimit={})", toEvict, expiredLeases.size(), evictionLimit);
        // Random source seeded with the current time
        Random random = new Random(System.currentTimeMillis());
        for (int i = 0; i < toEvict; i++) {
            // Pick a random item (Knuth shuffle algorithm)
            int next = i + random.nextInt(expiredLeases.size() - i);
            Collections.swap(expiredLeases, i, next);
            // The lease selected for eviction
            Lease<InstanceInfo> lease = expiredLeases.get(i);
            // Application name
            String appName = lease.getHolder().getAppName();
            // Instance ID
            String id = lease.getHolder().getId();
            // Increment the EXPIRED counter
            EXPIRED.increment();
            logger.warn("DS: Registry: expired lease for {}/{}", appName, id);
            // Cancel internally (not replicated to peers)
            internalCancel(appName, id, false);
        }
    }
}
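This method does the following:
- 1. Checks whether lease expiration is enabled
- 2. Collects all expired leases, using Lease.isExpired to decide whether a lease has expired
- 3. Computes the eviction limit: registry size - registry size * 0.85 (the renewal percent threshold)
- 4. Takes toEvict = min(expired count, eviction limit); if toEvict > 0, randomly picks toEvict instances from the expired ones
- 5. Calls internalCancel to cancel each picked instance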
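The arithmetic in steps 3 and 4 and the random selection can be tried in isolation with a small sketch. EvictionLimitSketch, toEvict and pickRandom are hypothetical names introduced for illustration. The calculation mirrors the evict code above, and the selection uses the same partial Knuth shuffle, but on a copy of the list and with plain values standing in for leases.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical standalone illustration; not part of Eureka.
class EvictionLimitSketch {

    // Same arithmetic as in evict(): cap the number of evictions per run.
    static int toEvict(int registrySize, double renewalPercentThreshold, int expiredCount) {
        int registrySizeThreshold = (int) (registrySize * renewalPercentThreshold);
        int evictionLimit = registrySize - registrySizeThreshold;
        return Math.min(expiredCount, evictionLimit);
    }

    // Partial Knuth shuffle: after i iterations the first i slots hold a uniformly random selection.
    static <T> List<T> pickRandom(List<T> expired, int toEvict, Random random) {
        List<T> pool = new ArrayList<>(expired); // work on a copy, leave the input untouched
        List<T> picked = new ArrayList<>();
        for (int i = 0; i < toEvict; i++) {
            int next = i + random.nextInt(pool.size() - i);
            Collections.swap(pool, i, next);
            picked.add(pool.get(i));
        }
        return picked;
    }

    public static void main(String[] args) {
        // 100 registered instances, 30 of them expired, threshold 0.85
        int n = toEvict(100, 0.85, 30);
        System.out.println("toEvict = " + n);

        List<Integer> expired = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            expired.add(i);
        }
        System.out.println(pickRandom(expired, n, new Random()));
    }
}
```

With 100 registered instances and a threshold of 0.85, at most 15 instances are evicted in one run even if 30 have expired. The remaining expired leases stay in the registry until a later run, which is what keeps a network partition from emptying the registry in one pass.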
Let's look at how Lease.isExpired decides that an instance has expired:
/**
 * Checks if the lease of a given {@link com.netflix.appinfo.InstanceInfo} has expired or not.
 *
 * Note that due to renew() doing the 'wrong" thing and setting lastUpdateTimestamp to +duration more than
 * what it should be, the expiry will actually be 2 * duration. This is a minor bug and should only affect
 * instances that ungracefully shutdown. Due to possible wide ranging impact to existing usage, this will
 * not be fixed.
 *
 * @param additionalLeaseMs any additional lease time to add to the lease evaluation in ms.
 */
public boolean isExpired(long additionalLeaseMs) {
    return (evictionTimestamp > 0 || System.currentTimeMillis() > (lastUpdateTimestamp + duration + additionalLeaseMs));
}
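The expiry rule here is: evictionTimestamp > 0 || current time > last update timestamp + lease duration (90s by default) + compensation time.
Interestingly, the Javadoc on this method points out a problem: renew() does the "wrong" thing by setting lastUpdateTimestamp to the current time plus duration, which is later than it should be. Since isExpired adds duration once more, a renewed lease actually expires only after 2 * duration. In other words, renew() should not have added duration to lastUpdateTimestamp: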
public void renew() {
    lastUpdateTimestamp = System.currentTimeMillis() + duration;
}
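As the Javadoc explains, this is a minor bug that should only affect instances that shut down ungracefully, and it was left unfixed because changing it could have a wide-ranging impact on existing usage. That is all we need from this method.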
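To see the 2 * duration effect concretely, here is a minimal sketch of the lease timing. LeaseSketch is a hypothetical class that copies only the renew, cancel and isExpired logic quoted in this chapter, and it assumes an injectable clock (a LongSupplier) in place of System.currentTimeMillis() so the timeline can be controlled.

```java
import java.util.function.LongSupplier;

// Hypothetical simplified lease for illustration; not the Eureka Lease class.
class LeaseSketch {
    private final long durationMs;
    private final LongSupplier clock; // assumption: stands in for System.currentTimeMillis()
    private long lastUpdateTimestamp;
    private long evictionTimestamp;

    LeaseSketch(long durationMs, LongSupplier clock) {
        this.durationMs = durationMs;
        this.clock = clock;
        this.lastUpdateTimestamp = clock.getAsLong();
    }

    // Same shape as the renew() quoted above: duration is added here...
    void renew() {
        lastUpdateTimestamp = clock.getAsLong() + durationMs;
    }

    // Marks the lease as evicted by recording the current time.
    void cancel() {
        if (evictionTimestamp <= 0) {
            evictionTimestamp = clock.getAsLong();
        }
    }

    // ...and added again here, so a renewed lease lives for 2 * duration.
    boolean isExpired(long additionalLeaseMs) {
        return evictionTimestamp > 0
                || clock.getAsLong() > lastUpdateTimestamp + durationMs + additionalLeaseMs;
    }

    public static void main(String[] args) {
        long[] now = {1_000L};
        LeaseSketch lease = new LeaseSketch(90_000L, () -> now[0]);
        lease.renew();                 // renewed at t = 1000
        now[0] = 1_000L + 90_001L;     // just past one duration
        System.out.println(lease.isExpired(0));
        now[0] = 1_000L + 180_001L;    // just past two durations
        System.out.println(lease.isExpired(0));
    }
}
```

With a 90s duration, the lease renewed at t = 1000 ms is still considered live just after 90s and only reports expired once more than 180s have passed since the renewal.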
Next, let's look at internalCancel, the method that performs the cancellation internally:
/**
 * {@link #cancel(String, String, boolean)} method is overridden by {@link PeerAwareInstanceRegistry}, so each
 * cancel request is replicated to the peers. This is however not desired for expires which would be counted
 * in the remote peers as valid cancellations, so self preservation mode would not kick-in.
 */
protected boolean internalCancel(String appName, String id, boolean isReplication) {
    try {
        // Acquire the read lock
        read.lock();
        // Increment the cancel counter
        CANCEL.increment(isReplication);
        // Look up the lease map of the application being evicted
        Map<String, Lease<InstanceInfo>> gMap = registry.get(appName);
        Lease<InstanceInfo> leaseToCancel = null;
        if (gMap != null) {
            // Remove this instance's lease from the registry map
            leaseToCancel = gMap.remove(id);
        }
        // Add an entry to the recently-cancelled queue
        synchronized (recentCanceledQueue) {
            recentCanceledQueue.add(new Pair<Long, String>(System.currentTimeMillis(), appName + "(" + id + ")"));
        }
        // Remove this instance's entry from overriddenInstanceStatusMap
        InstanceStatus instanceStatus = overriddenInstanceStatusMap.remove(id);
        if (instanceStatus != null) {
            logger.debug("Removed instance id {} from the overridden map which has value {}", id, instanceStatus.name());
        }
        if (leaseToCancel == null) {
            // No lease found for this instance
            CANCEL_NOT_FOUND.increment(isReplication);
            logger.warn("DS: Registry: cancel failed because Lease is not registered for: {}/{}", appName, id);
            return false;
        } else {
            // Call Lease.cancel
            leaseToCancel.cancel();
            // Get the instance info held by the lease
            InstanceInfo instanceInfo = leaseToCancel.getHolder();
            String vip = null;
            String svip = null;
            if (instanceInfo != null) {
                // Set the instance's action type to DELETED
                instanceInfo.setActionType(ActionType.DELETED);
                // Add to the recently-changed queue
                recentlyChangedQueue.add(new RecentlyChangedItem(leaseToCancel));
                // Update the instance's last-updated timestamp
                instanceInfo.setLastUpdatedTimestamp();
                vip = instanceInfo.getVIPAddress();
                svip = instanceInfo.getSecureVipAddress();
            }
            // Invalidate the cache; this calls responseCache.invalidate for the affected entries
            invalidateCache(appName, vip, svip);
            logger.info("Cancelled instance {}/{} (replication={})", appName, id, isReplication);
            return true;
        }
    } finally {
        read.unlock();
    }
}
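This method looks up the application in registry by name and then does the following for the instance being cancelled:
- 1. Removes the instance's lease from registry
- 2. Adds an entry to the recently-cancelled queue
- 3. Removes the instance's status from overriddenInstanceStatusMap
- 4. Calls Lease.cancel, which sets the lease's eviction timestamp to the current time
- 5. Sets the action type of the instance's InstanceInfo to DELETED
- 6. Adds the lease to the recently-changed queue
- 7. Updates the instance's last-updated timestamp
- 8. Invalidates the ResponseCache entries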
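The core of these steps (remove the lease from a two-level map under a lock, record the cancellation, report whether anything was found) can be sketched as follows. MiniRegistry and all of its members are hypothetical, plain strings stand in for Lease<InstanceInfo>, and the counters, status map, change queue and cache invalidation are left out. The sketch takes the read lock only because the source above does.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical miniature registry for illustration; not the Eureka implementation.
class MiniRegistry {
    // appName -> (instanceId -> lease); a String stands in for Lease<InstanceInfo>
    private final Map<String, Map<String, String>> registry = new ConcurrentHashMap<>();
    private final Queue<String> recentlyCanceled = new ConcurrentLinkedQueue<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void register(String appName, String id, String lease) {
        registry.computeIfAbsent(appName, k -> new ConcurrentHashMap<>()).put(id, lease);
    }

    boolean internalCancel(String appName, String id) {
        lock.readLock().lock();
        try {
            Map<String, String> leases = registry.get(appName);
            String removed = (leases == null) ? null : leases.remove(id);
            // As in the source, the cancellation is recorded even when no lease was found
            recentlyCanceled.add(appName + "(" + id + ")");
            return removed != null;
        } finally {
            lock.readLock().unlock();
        }
    }

    int instanceCount() {
        int total = 0;
        for (Map<String, String> leases : registry.values()) {
            total += leases.size();
        }
        return total;
    }

    int recentlyCanceledCount() {
        return recentlyCanceled.size();
    }

    public static void main(String[] args) {
        MiniRegistry registry = new MiniRegistry();
        registry.register("ORDER-SERVICE", "host-1:8080", "lease-1");
        System.out.println(registry.internalCancel("ORDER-SERVICE", "host-1:8080"));
        System.out.println(registry.internalCancel("ORDER-SERVICE", "host-1:8080"));
    }
}
```

The first cancel removes the lease and returns true. The second finds nothing and returns false, which corresponds to the CANCEL_NOT_FOUND branch in the source.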