OceanBase Source Code Walkthrough (11): A Brief Look at the Location Cache Module

2022-05-23

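Previously, part ten of this series, "Table No. 1 and Its Service Addressing" (《一号表及其服务寻址》), introduced the history of the system tenant's "table No. 1" and explained the service addressing process related to it. In this installment, OceanBase kernel R&D engineer Zhen Nan (镇楠) continues with a brief look at the Location Cache module mentioned in that article.

Location cache is a basic module on the observer. It gives other modules such as SQL, transaction, and CLOG the ability to fetch and cache the location of a replica. It relies on the meta tables at each level and on the underlying partition_service and log_service to obtain replica locations, and the cache is refreshed passively, driven by calls from those modules. All modules on the same observer share a single location cache.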

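What the location cache stores

In an OceanBase cluster, the location of every replica is recorded in the meta tables. Issuing a SQL query against a meta table on every replica access would be far too slow, so each ObServer caches the locations of physical tables. This cache is managed by the location cache module and implemented in ObPartitionLocationCache. Its main contents are as follows:

1. Core location cache

sys_cache_ caches location information for system tables, and user_cache_ caches location information for user tables.

The two are kept in separate data structures so that a very large number of user tables cannot evict the system tables' entries from the location cache.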
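
To illustrate why the split matters, here is a minimal sketch. It is not OceanBase's actual implementation: TinyLruCache, SplitLocationCache, the is_sys_table flag, and the use of plain strings for locations are all hypothetical, and the real sys_cache_ and user_cache_ use different data structures. The only point is that inserts into one cache can never evict entries from the other.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical bounded LRU cache keyed by table id (illustration only).
class TinyLruCache {
public:
  explicit TinyLruCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  void put(uint64_t table_id, const std::string &location) {
    auto it = index_.find(table_id);
    if (it != index_.end()) {
      entries_.erase(it->second);  // replace the existing entry
    } else if (entries_.size() == capacity_) {
      // Evict the least recently used entry (back of the list).
      index_.erase(entries_.back().first);
      entries_.pop_back();
    }
    entries_.emplace_front(table_id, location);
    index_[table_id] = entries_.begin();
  }

  bool get(uint64_t table_id, std::string &location) {
    auto it = index_.find(table_id);
    if (it == index_.end()) {
      return false;
    }
    // Move the entry to the front to mark it as recently used.
    entries_.splice(entries_.begin(), entries_, it->second);
    location = it->second->second;
    return true;
  }

private:
  using Entry = std::pair<uint64_t, std::string>;
  std::size_t capacity_;
  std::list<Entry> entries_;
  std::unordered_map<uint64_t, std::list<Entry>::iterator> index_;
};

// System and user tables live in separate caches, so a flood of user-table
// entries can only evict other user-table entries.
class SplitLocationCache {
public:
  SplitLocationCache(std::size_t sys_capacity, std::size_t user_capacity)
      : sys_cache_(sys_capacity), user_cache_(user_capacity) {}

  void put(uint64_t table_id, bool is_sys_table, const std::string &location) {
    (is_sys_table ? sys_cache_ : user_cache_).put(table_id, location);
  }

  bool get(uint64_t table_id, bool is_sys_table, std::string &location) {
    return (is_sys_table ? sys_cache_ : user_cache_).get(table_id, location);
  }

private:
  TinyLruCache sys_cache_;
  TinyLruCache user_cache_;
};
```

With a single shared cache of the same total size, enough user-table inserts would eventually push every system-table entry out; with the split, the system-table entries survive regardless of user-table churn.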

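2. Leader cache

sys_leader_cache_: caches the leader information of the system tenant's system tables in the local cluster.

This cache was introduced so that, in a limited set of scenarios, the leader of a system tenant's system table can be obtained without relying on internal tables. It resolves a deadlock in which a distributed transaction cannot make progress because it cannot obtain the system table leader, and it also slightly speeds up leader lookup for the system tenant's system tables.

leader_cache_: caches the leader information of user tables.

Fetching a location from the KVCache structure has some cost, so an extra leader_cache_ layer caches leader locations to shorten the lookup path of nonblock_get_leader().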
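
The fast-path idea can be sketched as follows. This is an illustration under assumptions, not the real code: LeaderFastPath, ReplicaLocation, the integer pkey, and the fast_hits() counter are hypothetical, and a std::map stands in for the KVCache-backed location cache.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one replica's location (illustration only).
struct ReplicaLocation {
  std::string server;
  bool is_leader;
};

class LeaderFastPath {
public:
  // Store a full location entry, as the main location cache would.
  void put_location(uint64_t pkey, const std::vector<ReplicaLocation> &replicas) {
    assert(!replicas.empty());
    location_cache_[pkey] = replicas;
    leader_cache_.erase(pkey);  // drop any stale leader hint
  }

  // Look in the small leader cache first; fall back to scanning the full
  // location entry, and remember the leader for the next call.
  bool nonblock_get_leader(uint64_t pkey, std::string &leader) {
    auto hit = leader_cache_.find(pkey);
    if (hit != leader_cache_.end()) {
      ++fast_hits_;
      leader = hit->second;
      return true;
    }
    auto entry = location_cache_.find(pkey);
    if (entry == location_cache_.end()) {
      return false;  // a real caller would trigger an asynchronous renew here
    }
    for (const ReplicaLocation &replica : entry->second) {
      if (replica.is_leader) {
        leader = replica.server;
        leader_cache_[pkey] = leader;
        return true;
      }
    }
    return false;
  }

  int64_t fast_hits() const { return fast_hits_; }

private:
  std::map<uint64_t, std::vector<ReplicaLocation>> location_cache_;  // full entries
  std::unordered_map<uint64_t, std::string> leader_cache_;           // leader only
  int64_t fast_hits_ = 0;
};
```

The first lookup for a partition pays for the scan of the full entry; later lookups are answered from the leader-only map until the entry is replaced.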

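What the location cache module offers to callers

Each observer keeps its own cache, storing locally the locations of the physical tables it has accessed. The location cache uses a passive refresh mechanism: when another internal module finds that a cached entry is stale, it must call the refresh interface to renew the cache.

Matching the cached contents, the module lets callers obtain the partition and leader location for a given pkey (pgkey). It is mainly used by the SQL, Proxy, storage, transaction, and clog modules (the last two care mostly about leader information). The interfaces are as follows: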

// Synchronous interfaces:
int ObPartitionLocationCache::get(const uint64_t table_id,
                                  const int64_t partition_id,
                                  ObPartitionLocation &location,
                                  const int64_t expire_renew_time,  // INT64_MAX means force a refresh
                                  bool &is_cache_hit,
                                  const bool auto_update /*= true*/) // get() can also refresh the cache
int ObPartitionLocationCache::get_strong_leader(const common::ObPartitionKey &partition,
                                                common::ObAddr &leader,
                                                const bool force_renew) // obtains the leader through get(), so it can refresh too
// Asynchronous interfaces:
int ObPartitionLocationCache::nonblock_get(const uint64_t table_id,
                                           const int64_t partition_id,
                                           ObPartitionLocation &location,
                                           const int64_t cluster_id)  // nonblocking lookup in the location cache
int ObPartitionLocationCache::nonblock_get_strong_leader(const ObPartitionKey &partition, ObAddr &leader) // checks the leader cache first, then falls back to nonblock_get
int ObPartitionLocationCache::nonblock_renew(const ObPartitionKey &partition,
                                             const int64_t expire_renew_time,
                                             const int64_t specific_cluster_id) // used with the two functions above: if the access fails, refresh the location cache; implemented with ObLocationAsyncUpdateTask

Of these, nonblock_renew() is implemented with task queues.
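
The nonblocking lookup plus queued renew pattern can be sketched as below. Everything here is hypothetical (NonblockCacheSketch, CachedLocation, run_one_task, the integer pkey). In particular, the staleness rule "an entry is stale if it was renewed at or before expire_renew_time" is an assumption of this sketch; it is chosen because it agrees with the documented behavior that INT64_MAX forces a refresh.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <limits>
#include <string>
#include <unordered_map>

struct CachedLocation {
  std::string leader;
  int64_t renew_time;  // when this entry was last refreshed
};

class NonblockCacheSketch {
public:
  // Assumed rule: stale if renewed at or before expire_renew_time.
  static bool needs_renew(const CachedLocation &entry, int64_t expire_renew_time) {
    return entry.renew_time <= expire_renew_time;
  }

  // Never blocks: returns false on a miss instead of fetching.
  bool nonblock_get_leader(uint64_t pkey, std::string &leader) const {
    auto it = cache_.find(pkey);
    if (it == cache_.end()) {
      return false;
    }
    leader = it->second.leader;
    return true;
  }

  // Queues a refresh task unless the cached entry is still fresh enough.
  void nonblock_renew(uint64_t pkey, int64_t expire_renew_time) {
    auto it = cache_.find(pkey);
    if (it != cache_.end() && !needs_renew(it->second, expire_renew_time)) {
      return;
    }
    pending_.push_back(pkey);
  }

  // Stand-in for the background worker: apply one queued task using a
  // freshly fetched leader address.
  bool run_one_task(const std::string &fresh_leader, int64_t now) {
    assert(now >= 0);
    if (pending_.empty()) {
      return false;
    }
    const uint64_t pkey = pending_.front();
    pending_.pop_front();
    cache_[pkey] = CachedLocation{fresh_leader, now};
    return true;
  }

  std::size_t pending_tasks() const { return pending_.size(); }

private:
  std::unordered_map<uint64_t, CachedLocation> cache_;
  std::deque<uint64_t> pending_;
};
```

A caller that misses in nonblock_get_leader() queues a renew and retries later; passing std::numeric_limits<int64_t>::max() as expire_renew_time queues a refresh even when the entry is fresh.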

The refresh tasks are spread across multiple queues by priority, which speeds up recovery in abnormal scenarios:

pall_root_update_queue_; // __all_core_table, __all_root_table, __all_tenant_gts, __all_gts
prs_restart_queue_; // rs restart related sys table
psys_update_queue_; // other sys table in sys tenant
puser_ha_update_queue_; // __all_dummy, __all_tenant_meta_table
ptenant_space_update_queue_; // sys table in tenant space
puser_update_queue_; // user table
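
One way to picture the queue split is the sketch below. The names QueueClass and PrioritizedRenewQueues are hypothetical, and draining the queues in strict priority order is an assumption of this sketch: the article only says the queues are divided by priority, and the real implementation may instead give each queue its own worker threads.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Queue classes in the order listed above; a lower value is drained first
// (the strict ordering is an assumption of this sketch).
enum QueueClass : std::size_t {
  ALL_ROOT = 0,
  RS_RESTART,
  SYS,
  USER_HA,
  TENANT_SPACE,
  USER,
  QUEUE_CLASS_COUNT
};

class PrioritizedRenewQueues {
public:
  void push(QueueClass cls, uint64_t pkey) {
    assert(cls < QUEUE_CLASS_COUNT);
    queues_[cls].push_back(pkey);
  }

  // Pop from the highest-priority non-empty queue.
  bool pop(QueueClass &cls, uint64_t &pkey) {
    for (std::size_t i = 0; i < QUEUE_CLASS_COUNT; ++i) {
      if (!queues_[i].empty()) {
        cls = static_cast<QueueClass>(i);
        pkey = queues_[i].front();
        queues_[i].pop_front();
        return true;
      }
    }
    return false;
  }

private:
  std::array<std::deque<uint64_t>, QUEUE_CLASS_COUNT> queues_;
};
```

Under this scheme a backlog of user-table refreshes cannot delay a refresh of __all_core_table or __all_root_table, which is what matters when the cluster is recovering.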

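How the location cache is refreshed

Main refresh flows:

1. SQL refresh: the original refresh method

As the name suggests, SQL refresh renews the local cache by running a SQL query against the meta table to look up location information. The refresh path closely mirrors the reporting path.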

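SQL refresh depends on the meta table being readable and on the reporting flow working normally. It also has some latency: for example, if a SQL refresh runs while a leader change is still being reported, it returns the old leader. Refreshing again resolves this.

2. RPC refresh: a refresh method that does not depend on the meta table

Why was RPC refresh introduced?

SQL refresh depends on the meta table, the SQL module, and the underlying reporting, so under a network failure the location cache may never get refreshed. For example, in the figure below, server A, which hosts the meta table, is disconnected from servers D, E, and F. When D issues a refresh request, SQL refresh is impossible, but D's location can still be determined from the location information held by the replicas of the same partition on E and F. RPC refresh therefore reduces the dependence on the meta table to some extent, cuts the cost of SQL queries, and speeds up cache refresh.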

[Figure: server A, which hosts the meta table, is disconnected from servers D, E, and F]

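How RPC refresh is implemented

Core idea: use the old cache entry to get the locations of all replicas of the partition in the local region, send an RPC to each of those servers to obtain member_info (including leader, member_list, lower_list, and so on) through the partition service, and compare member_info with the old location information to detect changes in leader and replica type (F->L).

Conditions for a successful refresh: member_list is unchanged and the non_paxos members are unchanged.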
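
The comparison step can be sketched as follows. The types CachedLocation and MemberInfo, the function try_rpc_renew, and the use of plain strings for server addresses are hypothetical. Treating lower_list as the reported set of non_paxos members is also an assumption of this sketch, as is rejecting a reported leader that is not in member_list.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Old cache entry (hypothetical layout).
struct CachedLocation {
  std::string leader;
  std::vector<std::string> member_list;  // paxos members
  std::vector<std::string> non_paxos;    // read-only replicas
};

// What a replica reports over RPC (hypothetical layout).
struct MemberInfo {
  std::string leader;
  std::vector<std::string> member_list;
  std::vector<std::string> lower_list;
};

// Order-insensitive comparison of two server lists.
static bool same_set(std::vector<std::string> a, std::vector<std::string> b) {
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return a == b;
}

// Returns true and updates the cached leader if the RPC result is enough.
// Returns false when membership changed, so the caller must fall back to
// SQL refresh.
bool try_rpc_renew(CachedLocation &cached, const MemberInfo &reported) {
  if (!same_set(cached.member_list, reported.member_list) ||
      !same_set(cached.non_paxos, reported.lower_list)) {
    return false;
  }
  const bool leader_is_member =
      std::find(reported.member_list.begin(), reported.member_list.end(),
                reported.leader) != reported.member_list.end();
  if (!leader_is_member) {
    return false;
  }
  cached.leader = reported.leader;  // follower became leader (F->L)
  return true;
}
```

A pure leader switch among unchanged members is absorbed by the RPC path; any change to either list makes the function return false and leaves the cached entry untouched.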

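Pros and cons of RPC refresh:

Pros: the cost is very low, and changes to the leader and to replicas in the member list are detected faster. It works well in scenarios such as electing a leader when there is none and leader re-election, so RPC refresh is tried first.

Cons: to stay as efficient as possible, RPC refresh only guarantees accuracy for the replica's leader, the paxos member list, and the read-only replicas cascaded directly under paxos members in the local region. It cannot detect changes to read-only replicas cascaded at the second level or deeper, nor changes to read-only replicas cascaded under other regions.

3. Other mechanisms

Forced SQL refresh

Purpose: RPC refresh cannot detect changes to read-only replicas in other regions or to read-only replicas cascaded at the second level or deeper, so a SQL refresh is forced from time to time to pick them up.

Methods:

Periodic SQL refresh, controlled by FORCE_REFRESH_LOCATION_CACHE_INTERVAL
A limit on the number of SQL refreshes per second, controlled by FORCE_REFRESH_LOCATION_CACHE_THRESHOLD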
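
How the two settings might interact is sketched below. ForceSqlRefreshPolicy and should_force_sql are hypothetical names, the microsecond timestamps and the fixed one-second window are assumptions, and the constructor arguments merely play the roles of FORCE_REFRESH_LOCATION_CACHE_INTERVAL and FORCE_REFRESH_LOCATION_CACHE_THRESHOLD; no real default values are implied.

```cpp
#include <cassert>
#include <cstdint>

class ForceSqlRefreshPolicy {
public:
  ForceSqlRefreshPolicy(int64_t interval_us, int64_t max_per_second)
      : interval_us_(interval_us), max_per_second_(max_per_second) {
    assert(interval_us > 0 && max_per_second > 0);
  }

  // Decide whether a renew at time now_us should be forced down the SQL
  // path. last_sql_renew_us is when this entry was last refreshed via SQL.
  bool should_force_sql(int64_t last_sql_renew_us, int64_t now_us) {
    if (now_us - last_sql_renew_us < interval_us_) {
      return false;  // refreshed via SQL recently enough
    }
    const int64_t second = now_us / 1000000;
    if (second != current_second_) {
      current_second_ = second;  // start a new one-second window
      used_in_window_ = 0;
    }
    if (used_in_window_ >= max_per_second_) {
      return false;  // over budget; stay on the RPC path for now
    }
    ++used_in_window_;
    return true;
  }

private:
  const int64_t interval_us_;
  const int64_t max_per_second_;
  int64_t current_second_ = -1;
  int64_t used_in_window_ = 0;
};
```

The interval ensures every entry is eventually checked against the meta table, while the per-second cap keeps a burst of overdue entries from flooding the meta table with queries.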

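Batch refresh

Purpose: speed up location cache refresh in RTO scenarios; everyday SQL execution benefits as well.

Methods:

Group tasks by partition table type
Group by sys partitions and user partitions
Group by tenant
Refresh the location cache of __all_core_table/__all_root_table separately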
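
The grouping rules above can be sketched as a bucketing step. RenewTask, MetaTableType, BatchKey, and group_into_batches are hypothetical names, and the sketch does not model the separate path for __all_core_table/__all_root_table. It only shows that tasks sharing the same partition table type, sys/user class, and tenant end up in one batch, which a single query could then serve.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Which meta table records the partition's location (hypothetical enum).
enum class MetaTableType { CORE_TABLE, ROOT_TABLE, TENANT_META_TABLE };

struct RenewTask {
  MetaTableType meta_type;
  bool is_sys_partition;
  uint64_t tenant_id;
  uint64_t table_id;
  int64_t partition_id;
};

// Batch key: (partition table type, sys or user partition, tenant).
using BatchKey = std::tuple<MetaTableType, bool, uint64_t>;

std::map<BatchKey, std::vector<RenewTask>> group_into_batches(
    const std::vector<RenewTask> &tasks) {
  std::map<BatchKey, std::vector<RenewTask>> batches;
  for (const RenewTask &task : tasks) {
    assert(task.partition_id >= 0);
    const BatchKey key(task.meta_type, task.is_sys_partition, task.tenant_id);
    batches[key].push_back(task);
  }
  return batches;
}
```

Each resulting batch can be refreshed with one round trip instead of one per partition, which is where the speedup in recovery scenarios would come from.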

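Location cache for virtual tables

Virtual tables have no storage entity; they are generated according to specific rules at query time. To keep the SQL layer's query logic uniform, the location cache module supplies special "location information" for virtual table queries.

1. Categories of virtual tables

By how they are distributed, virtual tables fall into three categories:

LOC_DIST_MODE_ONLY_LOCAL: virtual tables executed only on the local server
LOC_DIST_MODE_DISTRIBUTED: virtual tables executed in a distributed way (both cluster-level and tenant-level)
LOC_DIST_MODE_ONLY_RS: virtual tables that can only be executed on the RS

2. How virtual tables obtain a location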

// Key function:
int ObSqlPartitionLocationCache::virtual_get(const uint64_t table_id, const int64_t partition_id, share::ObPartitionLocation &location, const int64_t expire_renew_time, bool &is_cache_hit)
// LOC_DIST_MODE_ONLY_LOCAL: only the server's own address is needed
int ObSqlPartitionLocationCache::build_local_location(uint64_t table_id, ObPartitionLocation &location)
|-replica_location.server_ = self_addr_;
// LOC_DIST_MODE_DISTRIBUTED: essentially the cluster's server_list
int ObSqlPartitionLocationCache::build_distribute_location(uint64_t table_id, const int64_t partition_id, ObPartitionLocation &location)
|-int ObTaskExecutorCtx::get_addr_by_virtual_partition_id(int64_t partition_id, ObAddr &addr)
// LOC_DIST_MODE_ONLY_RS: essentially the location of the RS
int ObSqlPartitionLocationCache::get(const uint64_t table_id, ObIArray<ObPartitionLocation> &locations, const int64_t expire_renew_time, bool &is_cache_hit, const bool auto_update /*=true*/)
|-int ObPartitionLocationCache::get(const uint64_t table_id, ObIArray<ObPartitionLocation> &locations, const int64_t expire_renew_time, bool &is_cache_hit, const bool auto_update /*=true*/)
| |-int ObPartitionLocationCache::vtable_get(const uint64_t table_id, ObIArray<ObPartitionLocation> &locations, const int64_t expire_renew_time, bool &is_cache_hit)
| | |- int renew_vtable_location(const uint64_t table_id, common::ObSArray<ObPartitionLocation> &locations);

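In short, the location cache for virtual tables can be summed up in three sentences:

For virtual tables executed locally, the location cache module returns the address of the local server;
For virtual tables executed in a distributed way, the location cache module returns the cluster's server_list;
For virtual tables executed on the RS, the location cache module returns the address of the server hosting the RS.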
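
These three rules amount to a dispatch on the distribution mode, sketched below. The LOC_DIST_MODE_* names come from the article; ClusterView, virtual_table_locations, and the string addresses are hypothetical. The real distributed path maps a virtual partition_id to one address through get_addr_by_virtual_partition_id, which this sketch simplifies to returning the whole server list.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum LocDistMode {
  LOC_DIST_MODE_ONLY_LOCAL,
  LOC_DIST_MODE_DISTRIBUTED,
  LOC_DIST_MODE_ONLY_RS
};

// Hypothetical snapshot of what the local server knows about the cluster.
struct ClusterView {
  std::string self_addr;
  std::string rs_addr;
  std::vector<std::string> server_list;
};

// Returns the servers a virtual table query should be sent to.
std::vector<std::string> virtual_table_locations(LocDistMode mode,
                                                 const ClusterView &cluster) {
  assert(!cluster.self_addr.empty());
  switch (mode) {
    case LOC_DIST_MODE_ONLY_LOCAL:
      return {cluster.self_addr};  // the local server only
    case LOC_DIST_MODE_DISTRIBUTED:
      return cluster.server_list;  // every server in the cluster
    case LOC_DIST_MODE_ONLY_RS:
      return {cluster.rs_addr};    // the server hosting the RS
  }
  return {};
}
```

Because the result has the same shape as a regular table's location, the SQL layer can plan virtual table queries through the same code path it uses for physical tables.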

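That concludes this installment of the source code walkthrough. Thanks for reading, and stay tuned for the next installment of "OceanBase 带你读源码" (reading the source code with OceanBase)!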
