RM在seata AT模式中如何实现分支事务提交或回滚-阿里云开发者社区

前言

在之前的博客中，我们已经知道了，RM分支事务的提交或回滚是由TC服务下发的指令触发的。

第一种情况：当任意RM的业务处理出现异常，都会触发TM发起全局事务的回滚，相关的回滚指令由TM下发给TC，最终TC把回滚指令依次下发给所有的RM，通过所有分支事务的回滚达到全局事务回滚的目的；
第二种情况：当所有RM都成功提交分支事务后，TM发起全局事务提交指令给到TC服务，TC收到指令后，同样会依次调用所有RM发起分支事务的提交，以便达到全局事务提交的目的；

这篇博客的目的就是介绍RM是如何提交或回滚分支事务的。

TC与RM间的通信

因为RM与TC之间的通信也是基于Netty实现，所以TC下发的提交请求必然是需要对应的处理器来处理的：

public class DefaultRMHandler extends AbstractRMHandler {
  // 接收TC下发的提交请求
  @Override
    public BranchCommitResponse handle(BranchCommitRequest request) {
        MDC.put(RootContext.MDC_KEY_XID, request.getXid());
        MDC.put(RootContext.MDC_KEY_BRANCH_ID, String.valueOf(request.getBranchId()));
        return getRMHandler(request.getBranchType()).handle(request);
    }
    // 接收TC下发的回滚请求
    @Override
    public BranchRollbackResponse handle(BranchRollbackRequest request) {
        MDC.put(RootContext.MDC_KEY_XID, request.getXid());
        MDC.put(RootContext.MDC_KEY_BRANCH_ID, String.valueOf(request.getBranchId()));
        return getRMHandler(request.getBranchType()).handle(request);
    }
}
// 上面的提交或回滚请求，最终会调用到AbstractRMHandler里面的逻辑
public abstract class AbstractRMHandler extends AbstractExceptionHandler
    implements RMInboundHandler, TransactionMessageHandler {
  // 下面使用到了最熟悉的模版模式
  @Override
    public BranchCommitResponse handle(BranchCommitRequest request) {
        BranchCommitResponse response = new BranchCommitResponse();
        // 模版模式
        exceptionHandleTemplate(new AbstractCallback<BranchCommitRequest, BranchCommitResponse>() {
            @Override
            public void execute(BranchCommitRequest request, BranchCommitResponse response)
                throws TransactionException {
                doBranchCommit(request, response);
            }
        }, request, response);
        return response;
    }
  // 回滚也是一样，使用模版模式
    @Override
    public BranchRollbackResponse handle(BranchRollbackRequest request) {
        BranchRollbackResponse response = new BranchRollbackResponse();
        exceptionHandleTemplate(new AbstractCallback<BranchRollbackRequest, BranchRollbackResponse>() {
            @Override
            public void execute(BranchRollbackRequest request, BranchRollbackResponse response)
                throws TransactionException {
                doBranchRollback(request, response);
            }
        }, request, response);
        return response;
    }
}
复制代码

根据上面的源码可知，提交或回滚请求进来后，通过一系列的调用，最终到达了AbstractRMHandler类的处理逻辑中，最核心的逻辑在doBranchCommit(request, response)和doBranchRollback(request, response)中；

RM分支事务提交

AbstractRMHandler.doBranchCommit(request, response)

protected void doBranchCommit(BranchCommitRequest request, BranchCommitResponse response)
        throws TransactionException {
        // 拆解请求参数
        String xid = request.getXid();
        long branchId = request.getBranchId();
        String resourceId = request.getResourceId();
        String applicationData = request.getApplicationData();
        if (LOGGER.isInfoEnabled()) {
            LOGGER.info("Branch committing: " + xid + " " + branchId + " " + resourceId + " " + applicationData);
        }
        // 执行分支事务提交
        BranchStatus status = getResourceManager().branchCommit(request.getBranchType(), xid, branchId, resourceId,
            applicationData);
        // 返回响应结果
        response.setXid(xid);
        response.setBranchId(branchId);
        response.setBranchStatus(status);
        if (LOGGER.isInfoEnabled()) {
            LOGGER.info("Branch commit result: " + status);
        }
    }
复制代码

在上面的代码中，只做了三件事情：

1.先把请求参数拆解出来；

2.调用相应的ResourceManager提交分支事务；

3.返回响应结果；

我们再深入看一下RM是如何提交分支事务的：

public class DataSourceManager extends AbstractResourceManager {
  @Override
    public BranchStatus branchCommit(BranchType branchType, String xid, long branchId, String resourceId,
                                     String applicationData) throws TransactionException {
        // 异步提交
        return asyncWorker.branchCommit(xid, branchId, resourceId);
    }
}
复制代码

为什么说上面的asyncWorker.branchCommit是异步提交呢？因为在asyncWorker中有一个定时任务，每秒中从阻塞队列中取需要提交的分支事务，并完成分支事务的提交；asyncWorker.branchCommit的逻辑只是把需要提交的分支事务放进阻塞队列中；

public class AsyncWorker {
  public AsyncWorker(DataSourceManager dataSourceManager) {
        this.dataSourceManager = dataSourceManager;
        LOGGER.info("Async Commit Buffer Limit: {}", ASYNC_COMMIT_BUFFER_LIMIT);
        commitQueue = new LinkedBlockingQueue<>(ASYNC_COMMIT_BUFFER_LIMIT);
        ThreadFactory threadFactory = new NamedThreadFactory("AsyncWorker", 2, true);
        scheduledExecutor = new ScheduledThreadPoolExecutor(2, threadFactory);
        // 创建定时任务，每秒中调用doBranchCommitSafely提交分支事务
        scheduledExecutor.scheduleAtFixedRate(this::doBranchCommitSafely, 10, 1000, TimeUnit.MILLISECONDS);
    }
  // 异步提交分支事务
    public BranchStatus branchCommit(String xid, long branchId, String resourceId) {
        Phase2Context context = new Phase2Context(xid, branchId, resourceId);
        // 把分支事务放进阻塞队列中
        addToCommitQueue(context);
        return BranchStatus.PhaseTwo_Committed;
    }
}
复制代码

所以，真正的提交逻辑在doBranchCommitSafely中：

void doBranchCommitSafely() {
        try {
            // 提交分支事务
            doBranchCommit();
        } catch (Throwable e) {
            LOGGER.error("Exception occur when doing branch commit", e);
        }
    }
    // 真正干活的代码
    private void doBranchCommit() {
        // 如果阻塞队列中没有需要提交的分支事务，那么直接返回
        if (commitQueue.isEmpty()) {
            return;
        }
        // 这里把阻塞队列中的数据放到了allContexts中，属于CopyOnWrite的思想，下面就可以专注操作allContexts
        List<Phase2Context> allContexts = new LinkedList<>();
        commitQueue.drainTo(allContexts);
        // 通过resourceId对Phase2Context进行分组
        Map<String, List<Phase2Context>> groupedContexts = groupedByResourceId(allContexts);
        // 循环调用dealWithGroupedContexts
        groupedContexts.forEach(this::dealWithGroupedContexts);
    }
private void dealWithGroupedContexts(String resourceId, List<Phase2Context> contexts) {
        if (StringUtils.isBlank(resourceId)) {
            //ConcurrentHashMap required notNull key
            LOGGER.warn("resourceId is empty and will skip.");
            return;
        }
        // 通过resourceId获取DataSourceProxy
        DataSourceProxy dataSourceProxy = dataSourceManager.get(resourceId);
        if (dataSourceProxy == null) {
            LOGGER.warn("failed to find resource for {} and requeue", resourceId);
            // 如果没拿到DataSourceProxy，重新放进阻塞队列中，等待下次提交
            addAllToCommitQueue(contexts);
            return;
        }
        Connection conn = null;
        try {
            // 获取原生Connection，这里就不需要代理Connection了
            conn = dataSourceProxy.getPlainConnection();
            UndoLogManager undoLogManager = UndoLogManagerFactory.getUndoLogManager(dataSourceProxy.getDbType());
            // 按照每次最多1000个Phase2Context来分割所有的Phase2Context
            List<List<Phase2Context>> splitByLimit = Lists.partition(contexts, UNDOLOG_DELETE_LIMIT_SIZE);
            // 删除该分支事务对应的所有的undolog
            for (List<Phase2Context> partition : splitByLimit) {
                deleteUndoLog(conn, undoLogManager, partition);
            }
        } catch (SQLException sqlExx) {
            //如果出了异常，那么重新回到阻塞队列中，等待下次提交
            addAllToCommitQueue(contexts);
            LOGGER.error("failed to get connection for async committing on {} and requeue", resourceId, sqlExx);
        } finally {
            // 释放链接
            IOUtil.close(conn);
        }
    }
// 真正的删除undolog逻辑
private void deleteUndoLog(final Connection conn, UndoLogManager undoLogManager, List<Phase2Context> contexts) {
        Set<String> xids = new LinkedHashSet<>(contexts.size());
        Set<Long> branchIds = new LinkedHashSet<>(contexts.size());
        contexts.forEach(context -> {
            xids.add(context.xid);
            branchIds.add(context.branchId);
        });
        try {
            // 构建删除undolog的sql语句，并执行
            undoLogManager.batchDeleteUndoLog(xids, branchIds, conn);
            if (!conn.getAutoCommit()) {
                // 提交删除结果
                conn.commit();
            }
        } catch (SQLException e) {
            LOGGER.error("Failed to batch delete undo log", e);
            try {
                // 出异常，就回滚，并重新加入待提交队列
                conn.rollback();
                addAllToCommitQueue(contexts);
            } catch (SQLException rollbackEx) {
                LOGGER.error("Failed to rollback JDBC resource after deleting undo log failed", rollbackEx);
            }
        }
    }
复制代码

所以，综上所属，分支事务的提交主要有以下两点：

分支事务的提交其实是定时任务异步提交的，RM中的asyncWorker每秒中会对阻塞队列中的待提交分支事务进行提交；

分支事务的提交其实就是简单地删除对应的undolog日志即可；

RM分支事务回滚

AbstractRMHandler.doBranchRollback(request, response)

/**
     * Do branch rollback.
     *
     * @param request  the request
     * @param response the response
     * @throws TransactionException the transaction exception
     */
    protected void doBranchRollback(BranchRollbackRequest request, BranchRollbackResponse response)
        throws TransactionException {
        String xid = request.getXid();
        long branchId = request.getBranchId();
        String resourceId = request.getResourceId();
        String applicationData = request.getApplicationData();
        if (LOGGER.isInfoEnabled()) {
            LOGGER.info("Branch Rollbacking: " + xid + " " + branchId + " " + resourceId);
        }
        BranchStatus status = getResourceManager().branchRollback(request.getBranchType(), xid, branchId, resourceId,
            applicationData);
        response.setXid(xid);
        response.setBranchId(branchId);
        response.setBranchStatus(status);
        if (LOGGER.isInfoEnabled()) {
            LOGGER.info("Branch Rollbacked result: " + status);
        }
    }
复制代码

同样也是分以下三步：

1.拆解回滚请求参数；

2.调用ResourceManager执行分支事务的回滚；

3.返回响应结果；

public class DataSourceManager extends AbstractResourceManager {
  // 分支事务回滚
  @Override
    public BranchStatus branchRollback(BranchType branchType, String xid, long branchId, String resourceId,
                                       String applicationData) throws TransactionException {
        // 获取DataSourceProxy
        DataSourceProxy dataSourceProxy = get(resourceId);
        if (dataSourceProxy == null) {
            throw new ShouldNeverHappenException(String.format("resource: %s not found",resourceId));
        }
        try {
          // 回滚主要逻辑在undo()里面
            UndoLogManagerFactory.getUndoLogManager(dataSourceProxy.getDbType()).undo(dataSourceProxy, xid, branchId);
        } catch (TransactionException te) {
            StackTraceLogger.info(LOGGER, te,
                "branchRollback failed. branchType:[{}], xid:[{}], branchId:[{}], resourceId:[{}], applicationData:[{}]. reason:[{}]",
                new Object[]{branchType, xid, branchId, resourceId, applicationData, te.getMessage()});
            // 回滚失败，直接返回，并告知是否可以重试
            if (te.getCode() == TransactionExceptionCode.BranchRollbackFailed_Unretriable) {
                return BranchStatus.PhaseTwo_RollbackFailed_Unretryable;
            } else {
                return BranchStatus.PhaseTwo_RollbackFailed_Retryable;
            }
        }
        // 回滚成功
        return BranchStatus.PhaseTwo_Rollbacked;
    }
}
复制代码

所以，我们应该继续深入undo()逻辑里面，看看具体是如何回滚的；

@Override
    public void undo(DataSourceProxy dataSourceProxy, String xid, long branchId) throws TransactionException {
        Connection conn = null;
        ResultSet rs = null;
        PreparedStatement selectPST = null;
        boolean originalAutoCommit = true;
        // 回滚的逻辑使用了死循环，直至回滚成功或失败
        for (; ; ) {
            try {
                // DataSourceProxy主要是为了拿到原生Connection
                conn = dataSourceProxy.getPlainConnection();
                // 改成手动提交
                if (originalAutoCommit = conn.getAutoCommit()) {
                    conn.setAutoCommit(false);
                }
                // 查询undolog
                selectPST = conn.prepareStatement(SELECT_UNDO_LOG_SQL);
                selectPST.setLong(1, branchId);
                selectPST.setString(2, xid);
                rs = selectPST.executeQuery();
                boolean exists = false;
                // 判断是否有undolog
                while (rs.next()) {
                    // 进入while循环，说明一定有undolog
                    exists = true;
                    int state = rs.getInt(ClientTableColumnsName.UNDO_LOG_LOG_STATUS);
                    // 判断当前undolog是否可以操作
                    if (!canUndo(state)) {
                        if (LOGGER.isInfoEnabled()) {
                            LOGGER.info("xid {} branch {}, ignore {} undo_log", xid, branchId, state);
                        }
                        // 不可用操作就返回
                        return;
                    }
                    String contextString = rs.getString(ClientTableColumnsName.UNDO_LOG_CONTEXT);
                    Map<String, String> context = parseContext(contextString);
                    byte[] rollbackInfo = getRollbackInfo(rs);
                    String serializer = context == null ? null : context.get(UndoLogConstants.SERIALIZER_KEY);
                    UndoLogParser parser = serializer == null ? UndoLogParserFactory.getInstance()
                        : UndoLogParserFactory.getInstance(serializer);
                    // 最终通过一系列的解析，最终得到BranchUndoLog实体对象
                    BranchUndoLog branchUndoLog = parser.decode(rollbackInfo);
                    try {
                        // put serializer name to local
                        setCurrentSerializer(parser.getName());
                        List<SQLUndoLog> sqlUndoLogs = branchUndoLog.getSqlUndoLogs();
                        if (sqlUndoLogs.size() > 1) {
                            Collections.reverse(sqlUndoLogs);
                        }
                        // 循环执行回滚操作
                        for (SQLUndoLog sqlUndoLog : sqlUndoLogs) {
                            TableMeta tableMeta = TableMetaCacheFactory.getTableMetaCache(dataSourceProxy.getDbType()).getTableMeta(
                                conn, sqlUndoLog.getTableName(), dataSourceProxy.getResourceId());
                            sqlUndoLog.setTableMeta(tableMeta);
                            AbstractUndoExecutor undoExecutor = UndoExecutorFactory.getUndoExecutor(
                                dataSourceProxy.getDbType(), sqlUndoLog);
                            // 回滚逻辑主要在这里面
                            undoExecutor.executeOn(conn);
                        }
                    } finally {
                        // remove serializer name
                        removeCurrentSerializer();
                    }
                }
                // 回滚完成后，删除undolog，并提交事务
                if (exists) {
                    deleteUndoLog(xid, branchId, conn);
                    conn.commit();
                    if (LOGGER.isInfoEnabled()) {
                        LOGGER.info("xid {} branch {}, undo_log deleted with {}", xid, branchId,
                            State.GlobalFinished.name());
                    }
                } else {
                    // 如果没有找到undolog日志，说明可能在分支事务注册的时候，undolog还没来得及添加，因为超时或者其他分支事务异常触发了TM发起全局事务回滚，所以这里立马插入一条记录，以防undolog被RM插入
                    insertUndoLogWithGlobalFinished(xid, branchId, UndoLogParserFactory.getInstance(), conn);
                    conn.commit();
                    if (LOGGER.isInfoEnabled()) {
                        LOGGER.info("xid {} branch {}, undo_log added with {}", xid, branchId,
                            State.GlobalFinished.name());
                    }
                }
                return;
            } catch (SQLIntegrityConstraintViolationException e) {
                // 产生SQLIntegrityConstraintViolationException异常后，将重新循环执行回滚
                if (LOGGER.isInfoEnabled()) {
                    LOGGER.info("xid {} branch {}, undo_log inserted, retry rollback", xid, branchId);
                }
            } catch (Throwable e) {
                // 如果产生其他异常，那么将回滚前面的操作，并抛出异常跳出循环
                if (conn != null) {
                    try {
                        conn.rollback();
                    } catch (SQLException rollbackEx) {
                        LOGGER.warn("Failed to close JDBC resource while undo ... ", rollbackEx);
                    }
                }
                throw new BranchTransactionException(BranchRollbackFailed_Retriable, String
                    .format("Branch session rollback failed and try again later xid = %s branchId = %s %s", xid,
                        branchId, e.getMessage()), e);
            } finally {
                // 恢复现场
                try {
                    if (rs != null) {
                        rs.close();
                    }
                    if (selectPST != null) {
                        selectPST.close();
                    }
                    if (conn != null) {
                        if (originalAutoCommit) {
                            conn.setAutoCommit(true);
                        }
                        conn.close();
                    }
                } catch (SQLException closeEx) {
                    LOGGER.warn("Failed to close JDBC resource while undo ... ", closeEx);
                }
            }
        }
    }
复制代码

在分支事务的回滚逻辑中，我们还需要额外考虑分支事务注册后，还没来得及插入undolog的情况；

如果找到undolog，那么执行回滚操作；

如果没有undolog，那么先插入一条记录标记，以防undolog被插入；如果标记插入失败，说明undolog已经被其他线程插入成功了，那么重新循环执行回滚操作；

public void executeOn(Connection conn) throws SQLException {
        // 如果开启校验undo日志，并且数据校验失败，不能执行回滚
        // 主要目的是为了确认当前数据和后镜像是否一致，否则就是数据被污染了，不能回滚
        if (IS_UNDO_DATA_VALIDATION_ENABLE && !dataValidationAndGoOn(conn)) {
            return;
        }
        // 下面才是真正的回滚逻辑
        PreparedStatement undoPST = null;
        try {
            // 构建undoSQL，这里用到了策略模式
            String undoSQL = buildUndoSQL();
            // 获取对应的prepareStatement
            undoPST = conn.prepareStatement(undoSQL);
            // 构建反向sql
            TableRecords undoRows = getUndoRows();
            for (Row undoRow : undoRows.getRows()) {
                ArrayList<Field> undoValues = new ArrayList<>();
                List<Field> pkValueList = getOrderedPkList(undoRows, undoRow, getDbType(conn));
                for (Field field : undoRow.getFields()) {
                    if (field.getKeyType() != KeyType.PRIMARY_KEY) {
                        undoValues.add(field);
                    }
                }
                // 设置参数
                undoPrepare(undoPST, undoValues, pkValueList);
                // 执行回滚
                undoPST.executeUpdate();
            }
        } catch (Exception ex) {
            if (ex instanceof SQLException) {
                throw (SQLException) ex;
            } else {
                throw new SQLException(ex);
            }
        }
        finally {
            //important for oracle
            IOUtil.close(undoPST);
        }
    }
复制代码

分支事务的回滚其实就是根据后前后镜像构建反向sql的原理实现的，在这个回滚的逻辑中，需要注意以下几点：

分支事务的回滚不是异步的；

在回滚的过程中，默认是需要校验数据是否被污染；在有完全把握的情况下，开发人员也可以自己配置成回滚不校验以提升效率

小结

根据上述源码分析，我们可以简单归纳为以下几点：

1.分支事务的提交是异步完成的；

2.分支事务的提交只需要删除undolog日志；

3.分支事务的回滚与提交不同，它不是异步的；

4.回滚需要考虑是否存在undolog，以及如何防止undolog被其他线程插入，如果被其他线程插入该如何处理；这里需要结合分支事务的注册逻辑来看；seata使用的方式是在回滚的逻辑中，如果没有发现undolog，为了避免被其他线程插入，自己先插入一条标记占位；如果标记插入失败，说明undolog刚刚被其他线程插入成功了，那么重新循环执行回滚逻辑；

5.在回滚过程中，有一个校验数据是否被污染的逻辑，这里可以让开发人员配置是否需要校验，默认情况下是需要校验afterImage是否和当前数据一致，否则标志着数据已经被污染了，就不能执行回滚了；

6.回滚原理其实就是根据前后镜像生成反向sql，用来还原之前的数据；

RM在seata AT模式中如何实现分支事务提交或回滚

前言

TC与RM间的通信

RM分支事务提交

RM分支事务回滚

小结

热门文章

最新文章

相关电子书

相关实验场景