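Background
Nacos supports both standalone deployment and cluster deployment:
- In standalone mode, a Nacos node only communicates with itself;
- In cluster mode, every Nacos member in the cluster needs to communicate with every other member.
This raises a question: how should the information about the Nacos member nodes in a cluster be managed? The answer is Nacos's internal addressing mechanism.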
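Design
Whether Nacos runs in standalone mode or cluster mode, the only fundamental difference is whether there is a single member node or several. In both cases the addressing mechanism has to cover the same requirements:
- Detecting node changes: whether nodes were added or removed;
- Knowing what the latest member list is;
- Deciding how the member list is managed;
- Making it quick to support new and better ways of managing the member list.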
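The first two requirements (detecting which nodes were added or removed, and knowing the latest list) boil down to comparing two snapshots of the member list. The following is a minimal sketch of that comparison, not Nacos code: `MemberDiff` is a hypothetical helper, and members are represented as plain `ip:port` strings for brevity.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not part of Nacos): compares two member-list snapshots. */
final class MemberDiff {
    final Set<String> added;
    final Set<String> removed;

    private MemberDiff(Set<String> added, Set<String> removed) {
        this.added = added;
        this.removed = removed;
    }

    /** Addresses only in latest are "added"; addresses only in previous are "removed". */
    static MemberDiff between(Collection<String> previous, Collection<String> latest) {
        Set<String> added = new TreeSet<>(latest);
        added.removeAll(previous);
        Set<String> removed = new TreeSet<>(previous);
        removed.removeAll(latest);
        return new MemberDiff(added, removed);
    }

    /** True when the two snapshots differ in membership. */
    boolean changed() {
        return !added.isEmpty() || !removed.isEmpty();
    }
}
```

A caller would keep the previous snapshot, compute `MemberDiff.between(previous, latest)` whenever a new list arrives, and react only when `changed()` is true.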
MemberLookup
To meet these requirements, Nacos abstracts a MemberLookup interface:
package com.alibaba.nacos.core.cluster;

import com.alibaba.nacos.api.exception.NacosException;

import java.util.Collection;
import java.util.Collections;
import java.util.Map;

/**
 * Member node addressing mode.
 *
 * @author <a href="mailto:liaochuntao@live.com">liaochuntao</a>
 */
public interface MemberLookup {

    /**
     * start.
     *
     * @throws NacosException NacosException
     */
    void start() throws NacosException;

    /**
     * is using address server.
     *
     * @return using address server or not.
     */
    boolean useAddressServer();

    /**
     * Inject the ServerMemberManager property.
     *
     * @param memberManager {@link ServerMemberManager}
     */
    void injectMemberManager(ServerMemberManager memberManager);

    /**
     * The addressing pattern finds cluster nodes.
     *
     * @param members {@link Collection}
     */
    void afterLookup(Collection<Member> members);

    /**
     * Addressing mode closed.
     *
     * @throws NacosException NacosException
     */
    void destroy() throws NacosException;

    /**
     * Some data information about the addressing pattern.
     *
     * @return {@link Map}
     */
    default Map<String, Object> info() {
        return Collections.emptyMap();
    }
}
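ServerMemberManager stores the list of all member nodes known to the local node and provides create, read, update, and delete operations on them. It also maintains a list of MemberLookup instances so that the addressing mode can be switched dynamically.
The MemberLookup interface is very simple. It has only two core methods: injectMemberManager and afterLookup. The former injects the ServerMemberManager into the MemberLookup so that the lookup can use the manager's storage and query capabilities. The latter is an event hook: whenever a MemberLookup needs to update member node information, it passes the latest member list to ServerMemberManager through this method. How the nodes are actually discovered and managed is hidden inside each concrete MemberLookup implementation.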
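The division of labour between the two core methods can be illustrated with a stripped-down model. This is a sketch under simplifying assumptions, not the Nacos classes themselves: `SketchMemberManager`, `SketchLookup`, and `FixedListLookup` are hypothetical names, members are plain strings, and the manager does nothing but store the latest list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for ServerMemberManager: only stores the latest list. */
final class SketchMemberManager {
    private volatile List<String> members = new ArrayList<>();

    void memberChange(Collection<String> latest) {
        members = new ArrayList<>(latest);
    }

    List<String> allMembers() {
        return new ArrayList<>(members);
    }
}

/** Hypothetical, trimmed-down version of the lookup contract. */
abstract class SketchLookup {
    private SketchMemberManager manager;

    /** Gives the lookup access to the manager's storage. */
    void injectMemberManager(SketchMemberManager manager) {
        this.manager = manager;
    }

    /** Event hook: hands the freshly discovered list to the manager. */
    void afterLookup(Collection<String> members) {
        manager.memberChange(members);
    }

    /** Each implementation decides how members are discovered. */
    abstract void start();
}

/** A lookup whose "discovery" is a fixed list, e.g. just the node's own address. */
final class FixedListLookup extends SketchLookup {
    private final List<String> addresses;

    FixedListLookup(List<String> addresses) {
        this.addresses = addresses;
    }

    @Override
    void start() {
        afterLookup(addresses);
    }
}
```

The manager never needs to know where the list came from; swapping the addressing mode means swapping the `SketchLookup` subclass and nothing else.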
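Internal implementations
Standalone addressing: StandaloneMemberLookup
Addressing in standalone mode is simple: the node finds its own IP:PORT, formats it as a member node, and calls afterLookup to store it in ServerMemberManager.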
package com.alibaba.nacos.core.cluster.lookup;

import com.alibaba.nacos.core.cluster.AbstractMemberLookup;
import com.alibaba.nacos.core.cluster.MemberUtil;
import com.alibaba.nacos.sys.env.EnvUtil;
import com.alibaba.nacos.sys.utils.InetUtils;

import java.util.Collections;

/**
 * Member node addressing mode in stand-alone mode.
 *
 * @author <a href="mailto:liaochuntao@live.com">liaochuntao</a>
 */
public class StandaloneMemberLookup extends AbstractMemberLookup {

    @Override
    public void doStart() {
        String url = InetUtils.getSelfIP() + ":" + EnvUtil.getPort();
        afterLookup(MemberUtil.readServerConf(Collections.singletonList(url)));
    }

    @Override
    public boolean useAddressServer() {
        return false;
    }
}
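File addressing: FileConfigMemberLookup
File addressing is the default addressing implementation in Nacos cluster mode.
It is straightforward: every Nacos node maintains a file named cluster.conf, for example: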
192.168.16.101:8847
192.168.16.102
192.168.16.103
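By default the file only needs the IP of each member node; the port is filled in automatically with the Nacos default port, 8848. If a node's port has been changed for some reason, that node's full network address (IP:PORT) must be written in the file.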
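The default-port rule can be sketched as follows. This is an illustration only, not the parsing code Nacos uses: `ClusterConfSketch` is a hypothetical class, it handles IPv4-style entries only, and skipping blank lines and `#` comment lines is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Nacos code): normalizes cluster.conf lines to "ip:port". */
final class ClusterConfSketch {
    static final int DEFAULT_PORT = 8848;

    static List<String> parse(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Assumption of this sketch: blank lines and '#' comments are skipped.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // An entry without an explicit port gets the default port appended.
            result.add(line.contains(":") ? line : line + ":" + DEFAULT_PORT);
        }
        return result;
    }
}
```

For the sample file above, the first entry keeps its explicit port 8847, while the other two resolve to port 8848.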
When a Nacos node starts, it reads this file, parses the addresses into a member list, and calls afterLookup to store the list in ServerMemberManager:
/**
 * Cluster.conf file managed cluster member node addressing pattern.
 *
 * @author <a href="mailto:liaochuntao@live.com">liaochuntao</a>
 */
public class FileConfigMemberLookup extends AbstractMemberLookup {

    private static final String DEFAULT_SEARCH_SEQ = "cluster.conf";

    private FileWatcher watcher = new FileWatcher() {
        @Override
        public void onChange(FileChangeEvent event) {
            readClusterConfFromDisk();
        }

        @Override
        public boolean interest(String context) {
            return StringUtils.contains(context, DEFAULT_SEARCH_SEQ);
        }
    };

    @Override
    public void doStart() throws NacosException {
        readClusterConfFromDisk();

        // Use the inotify mechanism to monitor file changes and automatically
        // trigger the reading of cluster.conf
        try {
            WatchFileCenter.registerWatcher(EnvUtil.getConfPath(), watcher);
        } catch (Throwable e) {
            Loggers.CLUSTER.error("An exception occurred in the launch file monitor : {}", e.getMessage());
        }
    }

    @Override
    public boolean useAddressServer() {
        return false;
    }

    @Override
    public void destroy() throws NacosException {
        WatchFileCenter.deregisterWatcher(EnvUtil.getConfPath(), watcher);
    }

    private void readClusterConfFromDisk() {
        Collection<Member> tmpMembers = new ArrayList<>();
        try {
            List<String> tmp = EnvUtil.readClusterConf();
            tmpMembers = MemberUtil.readServerConf(tmp);
        } catch (Throwable e) {
            Loggers.CLUSTER
                    .error("nacos-XXXX [serverlist] failed to get serverlist from disk!, error : {}", e.getMessage());
        }

        afterLookup(tmpMembers);
    }
}
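When the cluster is scaled out or in, the cluster.conf file on every Nacos node has to be modified. Nacos's internal file-watching center (the FileWatcher above) then detects the change automatically, re-reads the file, reloads the IP list, and updates the member nodes.
This default mode has one drawback: high operational cost. Adding a single Nacos node means manually editing cluster.conf on every node, which is tedious. A somewhat better approach is to push cluster.conf with an automation tool such as ansible instead of editing by hand. That removes the manual steps, but one problem remains: every Nacos node still holds its own copy of cluster.conf, and if the update fails on any one node, the member lists across the cluster become inconsistent. This led to a new addressing mode: address-server addressing.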
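Address-server addressing: AddressServerMemberLookup
Address-server addressing is the officially recommended way to manage Nacos cluster member information. It uses a simple web server to hold the contents of cluster.conf, so operators only need to maintain a single copy of the member list, and each Nacos member node periodically requests the latest member list from that web server.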
public class AddressServerMemberLookup extends AbstractMemberLookup {

    private final GenericType<String> genericType = new GenericType<String>() {
    };

    public String domainName;

    public String addressPort;

    public String addressUrl;

    public String envIdUrl;

    public String addressServerUrl;

    private volatile boolean isAddressServerHealth = true;

    private int addressServerFailCount = 0;

    private int maxFailCount = 12;

    private final NacosRestTemplate restTemplate = HttpClientBeanHolder.getNacosRestTemplate(Loggers.CORE);

    private volatile boolean shutdown = false;

    private static final String HEALTH_CHECK_FAIL_COUNT_PROPERTY = "maxHealthCheckFailCount";

    private static final String DEFAULT_HEALTH_CHECK_FAIL_COUNT = "12";

    private static final String DEFAULT_SERVER_DOMAIN = "jmenv.tbsite.net";

    private static final String DEFAULT_SERVER_POINT = "8080";

    private static final int DEFAULT_SERVER_RETRY_TIME = 5;

    private static final long DEFAULT_SYNC_TASK_DELAY_MS = 5_000L;

    private static final String ADDRESS_SERVER_DOMAIN_ENV = "address_server_domain";

    private static final String ADDRESS_SERVER_DOMAIN_PROPERTY = "address.server.domain";

    private static final String ADDRESS_SERVER_PORT_ENV = "address_server_port";

    private static final String ADDRESS_SERVER_PORT_PROPERTY = "address.server.port";

    private static final String ADDRESS_SERVER_URL_ENV = "address_server_url";

    private static final String ADDRESS_SERVER_URL_PROPERTY = "address.server.url";

    private static final String ADDRESS_SERVER_RETRY_PROPERTY = "nacos.core.address-server.retry";

    @Override
    public void doStart() throws NacosException {
        this.maxFailCount = Integer.parseInt(
                EnvUtil.getProperty(HEALTH_CHECK_FAIL_COUNT_PROPERTY, DEFAULT_HEALTH_CHECK_FAIL_COUNT));
        initAddressSys();
        run();
    }

    @Override
    public boolean useAddressServer() {
        return true;
    }

    // Resolve domain, port and URL path: an environment variable wins over a property,
    // which wins over the built-in default.
    private void initAddressSys() {
        String envDomainName = System.getenv(ADDRESS_SERVER_DOMAIN_ENV);
        if (StringUtils.isBlank(envDomainName)) {
            domainName = EnvUtil.getProperty(ADDRESS_SERVER_DOMAIN_PROPERTY, DEFAULT_SERVER_DOMAIN);
        } else {
            domainName = envDomainName;
        }
        String envAddressPort = System.getenv(ADDRESS_SERVER_PORT_ENV);
        if (StringUtils.isBlank(envAddressPort)) {
            addressPort = EnvUtil.getProperty(ADDRESS_SERVER_PORT_PROPERTY, DEFAULT_SERVER_POINT);
        } else {
            addressPort = envAddressPort;
        }
        String envAddressUrl = System.getenv(ADDRESS_SERVER_URL_ENV);
        if (StringUtils.isBlank(envAddressUrl)) {
            addressUrl = EnvUtil.getProperty(ADDRESS_SERVER_URL_PROPERTY, EnvUtil.getContextPath() + "/" + "serverlist");
        } else {
            addressUrl = envAddressUrl;
        }
        addressServerUrl = "http://" + domainName + ":" + addressPort + addressUrl;
        envIdUrl = "http://" + domainName + ":" + addressPort + "/env";

        Loggers.CORE.info("ServerListService address-server port:" + addressPort);
        Loggers.CORE.info("ADDRESS_SERVER_URL:" + addressServerUrl);
    }

    @SuppressWarnings("PMD.UndefineMagicConstantRule")
    private void run() throws NacosException {
        // With the address server, you need to perform a synchronous member node pull at startup
        // Retry up to maxRetry times (5 by default) and stop at the first success
        boolean success = false;
        Throwable ex = null;
        int maxRetry = EnvUtil.getProperty(ADDRESS_SERVER_RETRY_PROPERTY, Integer.class, DEFAULT_SERVER_RETRY_TIME);
        for (int i = 0; i < maxRetry; i++) {
            try {
                syncFromAddressUrl();
                success = true;
                break;
            } catch (Throwable e) {
                ex = e;
                Loggers.CLUSTER.error("[serverlist] exception, error : {}", ExceptionUtil.getAllExceptionMsg(ex));
            }
        }
        if (!success) {
            throw new NacosException(NacosException.SERVER_ERROR, ex);
        }

        GlobalExecutor.scheduleByCommon(new AddressServerSyncTask(), DEFAULT_SYNC_TASK_DELAY_MS);
    }

    @Override
    public void destroy() throws NacosException {
        shutdown = true;
    }

    @Override
    public Map<String, Object> info() {
        Map<String, Object> info = new HashMap<>(4);
        info.put("addressServerHealth", isAddressServerHealth);
        info.put("addressServerUrl", addressServerUrl);
        info.put("envIdUrl", envIdUrl);
        info.put("addressServerFailCount", addressServerFailCount);
        return info;
    }

    private void syncFromAddressUrl() throws Exception {
        RestResult<String> result = restTemplate
                .get(addressServerUrl, Header.EMPTY, Query.EMPTY, genericType.getType());
        if (result.ok()) {
            isAddressServerHealth = true;
            Reader reader = new StringReader(result.getData());
            try {
                afterLookup(MemberUtil.readServerConf(EnvUtil.analyzeClusterConf(reader)));
            } catch (Throwable e) {
                Loggers.CLUSTER.error("[serverlist] exception for analyzeClusterConf, error : {}",
                        ExceptionUtil.getAllExceptionMsg(e));
            }
            addressServerFailCount = 0;
        } else {
            addressServerFailCount++;
            if (addressServerFailCount >= maxFailCount) {
                isAddressServerHealth = false;
            }
            Loggers.CLUSTER.error("[serverlist] failed to get serverlist, error code {}", result.getCode());
        }
    }

    class AddressServerSyncTask implements Runnable {

        @Override
        public void run() {
            if (shutdown) {
                return;
            }
            try {
                syncFromAddressUrl();
            } catch (Throwable ex) {
                addressServerFailCount++;
                if (addressServerFailCount >= maxFailCount) {
                    isAddressServerHealth = false;
                }
                Loggers.CLUSTER.error("[serverlist] exception, error : {}", ExceptionUtil.getAllExceptionMsg(ex));
            } finally {
                // Re-schedule itself so that the list is pulled again after the delay
                GlobalExecutor.scheduleByCommon(this, DEFAULT_SYNC_TASK_DELAY_MS);
            }
        }
    }
}
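The address-server mode therefore greatly reduces the cost of managing Nacos cluster nodes. And because the address server is a very simple web program, its stability is easy to guarantee.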
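The core of this mode, periodic pulling plus a failure counter that marks the address server unhealthy after too many consecutive failures, can be sketched without any HTTP machinery. This is a simplified illustration under stated assumptions, not the Nacos implementation: `AddressServerPollSketch` is a hypothetical class, the `fetcher` stands in for the HTTP call, and the response body is assumed to hold one address per line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical sketch (not Nacos code): poll a member list and track server health. */
final class AddressServerPollSketch {
    private final Callable<String> fetcher; // stands in for the HTTP GET; one address per line
    private final int maxFailCount;
    private int failCount = 0;
    private boolean healthy = true;
    private List<String> members = new ArrayList<>();

    AddressServerPollSketch(Callable<String> fetcher, int maxFailCount) {
        this.fetcher = fetcher;
        this.maxFailCount = maxFailCount;
    }

    /** One polling round; on failure the last known member list is kept. */
    synchronized void syncOnce() {
        try {
            String body = fetcher.call();
            List<String> latest = new ArrayList<>();
            for (String line : body.split("\\R")) {
                if (!line.trim().isEmpty()) {
                    latest.add(line.trim());
                }
            }
            members = latest;
            failCount = 0;
            healthy = true;
        } catch (Exception e) {
            failCount++;
            // Only consecutive failures reaching the threshold flip the health flag.
            if (failCount >= maxFailCount) {
                healthy = false;
            }
        }
    }

    synchronized boolean isHealthy() {
        return healthy;
    }

    synchronized List<String> members() {
        return new ArrayList<>(members);
    }
}
```

In a running node, `syncOnce` would be driven by a scheduler, for example `ScheduledExecutorService.scheduleWithFixedDelay`, with a delay comparable to the 5-second `DEFAULT_SYNC_TASK_DELAY_MS` in the class above. Keeping the fetch behind a `Callable` is a design choice of this sketch that makes the failure handling easy to exercise without a real server.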
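Future extension points
Automatic scaling of cluster nodes
At present, Nacos cluster node management is still a manual operation. The hope is that the addressing mechanism can eventually support automatic management of cluster nodes: a new node would only need to know one node of the existing cluster in order to join that cluster within a bounded time, and the cluster would detect dead nodes on its own and remove them from the list of available nodes. The logic involved is similar to Consul's gossip protocol.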