MySQL 关于slave端Retrieved_Gtid_Set的读取改进初探

本文涉及的产品
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
云数据库 RDS MySQL,高可用系列 2核4GB
RDS MySQL Serverless 高可用系列,价值2615元额度,1个月
简介:

本文为以后学习SLAVE做一个记录不保证正确性。因为算法没看懂。


一、问题由来

今天朋友问我这样一个问题@K.I.S.S,在官方文档中有这样一段描述:

When using GTIDs, the slave tells the master which transactions it has already received, executed, or both. To compute this set, it reads 
the global value of gtid_executed and the value of the Retrieved_gtid_set column from SHOW SLAVE STATUS. The GTID of the last transmitted 
transaction is included in Retrieved_gtid_set only when the full transaction is received. The slave computes the following set:
UNION(@@global.gtid_executed, Retrieved_gtid_set)
Prior to MySQL 5.7.5, the GTID of the last transmitted transaction was included in Retrieved_gtid_set even if the transaction was only 
partially transmitted, and the last received GTID was subtracted from this set. (Bug #17943188) Thus, the slave computed the following set:
UNION(@@global.gtid_executed, Retrieved_gtid_set - last_received_GTID)

问为什么要减去last_received_GTID。

二、查看Bug #17943188的commit

对于这个问题我先查看了一下这个bug到底修复了什么问题如下:

 BUG#17943188 SHOW SLAVE STATUS/RETRIEVED_GTID_SET MAY HAVE PARTIAL TRX
                 OR MISS COMPLETE TRX
    
    Problem:
    =======
    
    The SHOW SLAVE STATUS command contains the column RETRIEVED_GTID_SET.
    This is supposed to contain the set of GTIDs that exist in the relay
    log. However, the field is updated when the slave receiver thread
    (I/O thread) receives a Gtid_log_event, which happens at the beginning
    of the transaction.
    
    If the I/O thread gets disconnected in the middle of a transaction,
    RETRIEVED_GTID_SET can contain a GTID for a transaction that is only
    partially received in the relay log. This transaction will
    subsequently be rolled back, so it is wrong to pretend that the
    transaction is there.
    
    Typical fail-over algorithms use RETRIEVED_GTID_SET to determine which
    slave has received the most transactions to promote the slave to a
    master. This is true for e.g. the mysqlfailover utility.
    
    When RETRIEVED_GTID_SET can contain partially transmitted transactions,
    the fail-over utility can choose the wrong slave to promote. This can
    lead to data corruption later.
    
    This means that even if semi-sync is enabled, transactions that have
    been acknowledged by one slave can be lost.
    
    Fix:
    ===
    
    It was implemented a transaction boundaries parser that will give
    information about transaction boundaries of an event stream based on
    the event types and their queries (when they are Query_log_event).
    
    As events are queued by the I/O thread, it feeds the Master_info
    transaction boundary parser. The slave I/O recovery also uses the
    transaction parser to determine if a given GTID can be added to the
    Retrieved_Gtid_Set or not.
    
    When the event parser is in GTID state because a Gtid_log_event was
    queued, the event's GTID isn't added to the retrieved list yet.
    It is stored in an auxiliary GTID variable.
    
    After flushing an event into the relay log, the IO thread verifies if
    the transaction parser is not inside a transaction anymore (meaning
    that the last event of the transaction has been flushed).
    
    If transaction parser is outside a transaction, the I/O thread
    verifies if a GTID was stored in the start of the transaction, adding
    it to the retrieved list, ensuring that all the transaction has arrived
    and was flushed to the relay log.
    
    Also, before this patch, after the I/O thread flushed a single received
    event into the relaylog, it was possible to rotate the relaylog if the
    current relaylog file size exceeded max_binlog_size/max_relaylog_size.
    After this patch, when GTIDs are enabled we only allow this rotation by
    size if the transaction parser is not in the middle of a transaction.
    
    Note: The current patch removed the changes for BUG#17280176, as it
          also dealt with similar problem in a different way.

大概就是说对于某些I/O线程并没有完整传输的Gtid事物记录到了RETRIEVED_GTID_SET中这会导致比较严重问题,因为某些监控工具根据这个来判断是否切换之类的,因此我们加入了一个事物边界分析器来判断事物是否完整传输,如果完整传输才记录到RETRIEVED_GTID_SET中这是5.7.5过后加入的。总的说来为了进行两个问题的修复:

  • 1、最后一个gtid 事物是否完整。
  • 2、跨越多个relay log的binlog 得到正确的gtid集合。

三、修改了什么

其实修改得非常多,不一一列举,有兴趣可以自己看看commit 9dab9dad975d09b8f37f33bf3c522d36fdf1d0f9,这里列举几个我看了的地方。

1、在 MYSQL_BIN_LOG::init_gtid_sets加入了如下逻辑
/*
      If we use GTIDs and have partial transactions on the relay log,
      must check if it ends on next relay log files.
      We also need to feed the boundary parser with the rest of the
      relay log to put it in the correct state before receiving new
      events from the master in the case of GTID auto positioning be
      disabled.
    */
    if (is_relay_log)
    {
      /*
        Suppose the following relaylog:

         rl-bin.000001 | rl-bin.000002 | rl-bin.000003 | rl-bin-000004
        ---------------+---------------+---------------+---------------
         PREV_GTIDS    | PREV_GTIDS    | PREV_GTIDS    | PREV_GTIDS
         (empty)       | (UUID:1)      | (UUID:1)      | (UUID:1)
        ---------------+---------------+---------------+---------------
         GTID(UUID:1)  | QUERY(INSERT) | QUERY(INSERT) | XID
        ---------------+---------------+---------------+---------------
         QUERY(CREATE  |
         TABLE t1 ...) |
        ---------------+
         GTID(UUID:2)  |
        ---------------+
         QUERY(BEGIN)  |
        ---------------+

        As it is impossible to determine the current Retrieved_Gtid_Set by only
        looking to the PREVIOUS_GTIDS on the last relay log file, and scanning
        events on it, we tried to find a relay log file that contains at least
        one GTID event during the backwards search.

        In the example, we will find a GTID only in rl-bin.000001, as the
        UUID:2 transaction was spanned across 4 relay log files.

        The transaction spanning can be caused by "FLUSH RELAY LOGS" commands
        on slave while it is queuing the transaction.

        So, in order to correctly add UUID:2 into Retrieved_Gtid_Set, we need
        to parse the relay log starting on the file we found the last GTID
        queued to know if the transaction was fully retrieved or not.
      */

      /*
        Adjust the reverse iterator to point to the relaylog file we
        need to start parsing, as it was incremented after generating
        the relay log file name.
      */
      rit--;//回退一个文件 因为前面为rit++做操作  exp:1<2<3<{4}<5<6<7  --->   1<2<3<4<{5}<6<7 
      /* Reset the transaction parser before feeding it with events */
      trx_parser->reset();
      gtid_partial_trx->clear();

      DBUG_PRINT("info", ("Iterating forwards through relay logs, "
                          "updating the Retrieved_Gtid_Set and updating "
                          "IO thread trx parser before start."));
      for (it= find(filename_list.begin(), filename_list.end(), *rit);
           it != filename_list.end(); it++)//从匹配的位置继续向后
      {
        const char *filename= it->c_str();
        DBUG_PRINT("info", ("filename='%s'", filename));
        if (read_gtids_and_update_trx_parser_from_relaylog(filename, all_gtids,
                                                           true, trx_parser,
                                                           gtid_partial_trx))
        {
          error= 1;
          goto end;
        }
      }
    }
  }

其实这一块也说明了解决的什么问题,我们发现一个事物的binlog event 在relay log中是可以跨文件的。而在bin log中是不能跨文件的。仅仅判断最后一个gtid priv event 是不正确的。因此需要这样修改。

2、其次加入了边界分析器Transaction_boundary_parser类

这个类是完全新加入的,这里是其中的一些状态:

num enum_event_boundary_type {
    EVENT_BOUNDARY_TYPE_ERROR= -1,
    /* Gtid_log_event */
    EVENT_BOUNDARY_TYPE_GTID= 0,
    /* Query_log_event(BEGIN), Query_log_event(XA START) */
    EVENT_BOUNDARY_TYPE_BEGIN_TRX= 1,
    /* Xid, Query_log_event(COMMIT), Query_log_event(ROLLBACK), XA_Prepare_log_event */
    EVENT_BOUNDARY_TYPE_END_TRX= 2,
    /* Query_log_event(XA ROLLBACK) */
    EVENT_BOUNDARY_TYPE_END_XA_TRX= 3,
    /* User_var, Intvar and Rand */
    EVENT_BOUNDARY_TYPE_PRE_STATEMENT= 4,
    /*
      All other Query_log_events and all other DML events
      (Rows, Load_data, etc.)
    */
    EVENT_BOUNDARY_TYPE_STATEMENT= 5,
    /*
      All non DDL/DML events: Format_desc, Rotate, Incident,
      Previous_gtids, Stop, etc.
    */
    EVENT_BOUNDARY_TYPE_IGNORE= 6
  };

  /*
    Internal states for parsing a stream of events.

    DDL has the format:
      DDL-1: [GTID]
      DDL-2: [User] [Intvar] [Rand]
      DDL-3: Query

    DML has the format:
      DML-1: [GTID]
      DML-2: Query(BEGIN)
      DML-3: Statements
      DML-4: (Query(COMMIT) | Query([XA] ROLLBACK) | Xid | Xa_prepare)
  */
  enum enum_event_parser_state {
    /* NONE is set after DDL-3 or DML-4 */
    EVENT_PARSER_NONE,
    /* GTID is set after DDL-1 or DML-1 */
    EVENT_PARSER_GTID,
    /* DDL is set after DDL-2 */
    EVENT_PARSER_DDL,
    /* DML is set after DML-2 */
    EVENT_PARSER_DML,
    /* ERROR is set whenever the above pattern is not followed */
    EVENT_PARSER_ERROR
  };
3、read_gtids_and_update_trx_parser_from_relaylog函数新增

这个函数是完全新加入的就是为了完成所说的功能,在read_gtids_and_update_trx_parser_from_relaylog中我看到对文件所有event进行了读取,并且用switch进行了不同event类型的处理,但是具体没有细看。但是最后看到对于对于是否加入retrieve gtid的判断如下:

 /*
        If we reached the end of a transaction after storing it's GTID
        in gtid_partial_trx variable, it is time to add this GTID to the
        retrieved_gtids set because the transaction is complete and there is no
        need for asking this transaction again.
      */
      if (trx_parser->is_not_inside_transaction())
      {
        if (!gtid_partial_trx->is_empty())
        {
          DBUG_PRINT("info", ("Adding Gtid to Retrieved_Gtid_Set as the "
                              "transaction was completed at "
                              "relaylog file '%s': Gtid(%d, %lld).",
                              filename, gtid_partial_trx->sidno,
                              gtid_partial_trx->gno));
          retrieved_gtids->_add_gtid(gtid_partial_trx->sidno,
                                     gtid_partial_trx->gno);
          gtid_partial_trx->clear();
        }
      }

四、初始化的时候retrieve gtid到底如何计算

实际上就在现在看来应该就是读取 relay_log的最后一个gtid事物(gtid event或者gtid priv event)同时需要判断此gtid事物是否完整。对于官方文档给出的UNION(@@global.gtid_executed, Retrieved_gtid_set - last_received_GTID),个人觉得这里的 last_received_GTID应该是经过判断的,如果完整则不减,如果不完整则减去。

作者微信:
微信.jpg

相关实践学习
每个IT人都想学的“Web应用上云经典架构”实战
本实验从Web应用上云这个最基本的、最普遍的需求出发,帮助IT从业者们通过“阿里云Web应用上云解决方案”,了解一个企业级Web应用上云的常见架构,了解如何构建一个高可用、可扩展的企业级应用架构。
MySQL数据库入门学习
本课程通过最流行的开源数据库MySQL带你了解数据库的世界。 &nbsp; 相关的阿里云产品:云数据库RDS MySQL 版 阿里云关系型数据库RDS(Relational Database Service)是一种稳定可靠、可弹性伸缩的在线数据库服务,提供容灾、备份、恢复、迁移等方面的全套解决方案,彻底解决数据库运维的烦恼。 了解产品详情:&nbsp;https://www.aliyun.com/product/rds/mysql&nbsp;
相关文章
|
6月前
|
canal 关系型数据库 MySQL
Canal是怎么伪装成 MySQL slave?
Canal是怎么伪装成 MySQL slave?
|
6月前
|
SQL 关系型数据库 MySQL
【YashanDB知识库】MySQL的FIND_IN_SET如何在YashanDB改写
本文来自YashanDB官网,探讨了MySQL中使用`FIND_IN_SET`函数在YashanDB中不兼容的问题及解决方法。由于YashanDB的`tinyint`类型无法参与条件运算,而MySQL的Boolean类型是`tinyint`的同义词,导致直接迁移时出现错误。文章通过分析原因,提出改写方案:将`where find_in_set(`替换为`where 0 &lt; find_in_set(`,从而实现兼容性调整,方便用户快速迁移和改写SQL语句。
|
存储 SQL 关系型数据库
MySQL 中character_set_server 和collation_server
在MySQL中,`character_set_server` 和 `collation_server` 是两个重要的系统变量,它们分别用于定义服务器级别的字符集和排序规则。 1. **character_set_server**: * 这个变量定义了MySQL服务器使用的默认字符集。字符集是一组字符的编码方式,它决定了如何在数据库中存储和检索字符数据。 * 例如,`utf8` 是一个常用的字符集,它支持大部分Unicode字符,包括中文、英文、阿拉伯文等。 * 你可以通过以下SQL命令查看当前的 `character_set_server` 设置: ``` sql`SHOW V
1864 1
|
存储 JSON 关系型数据库
mysql中find_in_set()函数用法详解及增强函数
总结而言,`FIND_IN_SET()`是MySQL中处理由逗号分隔的字符串列表的一种便捷方法,尤其适用于列表相对较短且不经常更改的场景。然而,对于更为复杂的需要高性能和可扩展性的数据库设计,它可能不是最优选择,应考虑使用更加正规化的数据库结构。
1745 2
mysql中find_in_set()函数用法详解及增强函数
|
11月前
|
存储 关系型数据库 MySQL
MySQL中利用FIND_IN_SET进行包含查询的技巧
`FIND_IN_SET`提供了一种简便的方法来执行包含查询,尤其是当数据以逗号分隔的字符串形式存储时。虽然这个方法的性能可能不如使用专门的关系表,但在某些场景下,它提供了快速简便的解决方案。开发者应该根据具体的应用场景和性能要求,权衡其使用。
420 0
|
存储 关系型数据库 MySQL
【Elasticsearch】在es中实现mysql中的FIND_IN_SET查询条件
【Elasticsearch】在es中实现mysql中的FIND_IN_SET查询条件
262 0
|
SQL 关系型数据库 MySQL
MySQL外键约束行为解析:CASCADE, NO ACTION, RESTRICT, SET NULL
MySQL外键约束行为解析:CASCADE, NO ACTION, RESTRICT, SET NULL
1656 0
|
SQL 关系型数据库 MySQL
mysql从库SHOW SLAVE STATUS字段详解
mysql从库SHOW SLAVE STATUS字段详解
640 0
|
SQL 关系型数据库 MySQL
Mysql主从同步报错解决:Fatal error: The slave I/O thread stops because ..
Mysql主从同步报错解决:Fatal error: The slave I/O thread stops because ..
1384 0

推荐镜像

更多