持久化FileTxnSnapLog-阿里云开发者社区

一、前言
　　前面分析了FileSnap，接着继续分析FileTxnSnapLog源码，其封装了TxnLog和SnapShot，其在持久化过程中是一个帮助类。
二、FileTxnSnapLog源码分析
2.1 类的属性

public class FileTxnSnapLog {
//the direcotry containing the
//the transaction logs
// 日志文件目录
private final File dataDir;
//the directory containing the
//the snapshot directory
// 快照文件目录
private final File snapDir;
// 事务日志
private TxnLog txnLog;
// 快照
private SnapShot snapLog;
// 版本号
public final static int VERSION = 2;
// 版本
public final static String version = "version-";

// Logger
private static final Logger LOG = LoggerFactory.getLogger(FileTxnSnapLog.class);

}
说明：类的属性中包含了TxnLog和SnapShot接口，即对FileTxnSnapLog的很多操作都会转发给TxnLog和SnapLog进行操作，这是一种典型的组合方法。
2.2 内部类
　　FileTxnSnapLog包含了PlayBackListener内部类，用来接收事务应用过程中的回调，在Zookeeper数据恢复后期，会有事务修正过程，此过程会回调PlayBackListener来进行对应的数据修正。其源码如下　
public interface PlayBackListener {
void onTxnLoaded(TxnHeader hdr, Record rec);
}
　说明：在完成事务操作后，会调用到onTxnLoaded方法进行相应的处理。
2.3 构造函数
　　FileTxnSnapLog的构造函数如下
public FileTxnSnapLog(File dataDir, File snapDir) throws IOException {
LOG.debug("Opening datadir:{} snapDir:{}", dataDir, snapDir);
// 在datadir和snapdir下生成version-2目录
this.dataDir = new File(dataDir, version + VERSION);
this.snapDir = new File(snapDir, version + VERSION);
if (!this.dataDir.exists()) { // datadir存在但无法创建目录，则抛出异常
if (!this.dataDir.mkdirs()) {
throw new IOException("Unable to create data directory "

                              + this.dataDir);
    }
}
if (!this.snapDir.exists()) { // snapdir存在但无法创建目录，则抛出异常
    if (!this.snapDir.mkdirs()) {
        throw new IOException("Unable to create snap directory "
                              + this.snapDir);
    }
}
// 给属性赋值
txnLog = new FileTxnLog(this.dataDir);
snapLog = new FileSnap(this.snapDir);

}
说明：对于构造函数而言，其会在传入的datadir和snapdir目录下新生成version-2的目录，并且会判断目录是否创建成功，之后会创建txnLog和snapLog。
2.4 核心函数分析

restore函数　
public long restore(DataTree dt, Map sessions, PlayBackListener listener) throws IOException {
// 根据snap文件反序列化dt和sessions
snapLog.deserialize(dt, sessions);
//
FileTxnLog txnLog = new FileTxnLog(dataDir);
// 获取比最后处理的zxid+1大的log文件的迭代器
TxnIterator itr = txnLog.read(dt.lastProcessedZxid+1);
// 最大的zxid
long highestZxid = dt.lastProcessedZxid;

TxnHeader hdr;
try {

 while (true) {
     // itr在read函数调用后就已经指向第一个合法的事务
     // 获取事务头
     hdr = itr.getHeader();
     if (hdr == null) { // 事务头为空
         // 表示日志文件为空
         return dt.lastProcessedZxid;
     }
     // 事务头的zxid小于snapshot中的最大zxid并且其不为0，则会报错，因为上面获取的是+1
     if (hdr.getZxid() < highestZxid && highestZxid != 0) { 
         LOG.error("{}(higestZxid) > {}(next log) for type {}",
                   new Object[] { highestZxid, hdr.getZxid(),
                                 hdr.getType() });
     } else { 
         // 重新赋值highestZxid
         highestZxid = hdr.getZxid();
     }
     try {
         // 在datatree上处理事务
         processTransaction(hdr,dt,sessions, itr.getTxn());
     } catch(KeeperException.NoNodeException e) {
         throw new IOException("Failed to process transaction type: " +
                               hdr.getType() + " error: " + e.getMessage(), e);
     }
     // 每处理完一个事务都会进行回调
     listener.onTxnLoaded(hdr, itr.getTxn());
     if (!itr.next()) // 已无事务，跳出循环
         break;
 }

} finally {

 if (itr != null) { // 迭代器不为空，则关闭
     itr.close();
 }

}
// 返回最高的zxid
return highestZxid;
}
说明：restore用于恢复datatree和sessions，其步骤大致如下
　　① 根据snapshot文件反序列化datatree和sessions，进入②
　　② 获取比snapshot文件中的zxid+1大的log文件的迭代器，以对log文件中的事务进行迭代，进入③
　　③ 迭代log文件的每个事务，并且将该事务应用在datatree中，同时会调用onTxnLoaded函数进行后续处理，进入④
　　④ 关闭迭代器，返回log文件中最后一个事务的zxid（作为最高的zxid）
其中会调用到FileTxnLog的read函数，read函数在FileTxnLog中已经进行过分析，会调用processTransaction函数，其源码如下　
public void processTransaction(TxnHeader hdr,DataTree dt,

                        Map<Long, Integer> sessions, Record txn)

throws KeeperException.NoNodeException {
// 事务处理结果
ProcessTxnResult rc;
switch (hdr.getType()) { // 确定事务类型

 case OpCode.createSession: // 创建会话
     // 添加进会话
     sessions.put(hdr.getClientId(),
                  ((CreateSessionTxn) txn).getTimeOut());
     if (LOG.isTraceEnabled()) {
         ZooTrace.logTraceMessage(LOG,ZooTrace.SESSION_TRACE_MASK,
                                  "playLog --- create session in log: 0x"
                                  + Long.toHexString(hdr.getClientId())
                                  + " with timeout: "
                                  + ((CreateSessionTxn) txn).getTimeOut());
     }
     // give dataTree a chance to sync its lastProcessedZxid
     // 处理事务
     rc = dt.processTxn(hdr, txn);
     break;
 case OpCode.closeSession: // 关闭会话
     // 会话中移除
     sessions.remove(hdr.getClientId());
     if (LOG.isTraceEnabled()) {
         ZooTrace.logTraceMessage(LOG,ZooTrace.SESSION_TRACE_MASK,
                                  "playLog --- close session in log: 0x"
                                  + Long.toHexString(hdr.getClientId()));
     }
     // 处理事务
     rc = dt.processTxn(hdr, txn);
     break;
 default:
     // 处理事务
     rc = dt.processTxn(hdr, txn);

}

/**

  * Snapshots are lazily created. So when a snapshot is in progress,
  * there is a chance for later transactions to make into the
  * snapshot. Then when the snapshot is restored, NONODE/NODEEXISTS
  * errors could occur. It should be safe to ignore these.
  */

if (rc.err != Code.OK.intValue()) { // 忽略处理结果中可能出现的错误

 LOG.debug("Ignoring processTxn failure hdr:" + hdr.getType()
           + ", error: " + rc.err + ", path: " + rc.path);

}
}
说明：processTransaction会根据事务头中记录的事务类型（createSession、closeSession、其他类型）来进行相应的操作，对于createSession类型而言，其会将会话和超时时间添加至会话map中，对于closeSession而言，会话map会根据客户端的id号删除其会话（因为存储时候key=clientId），同时，所有的操作都会调用到dt.processTxn函数，其源码如下　
public ProcessTxnResult processTxn(TxnHeader header, Record txn)
{
// 事务处理结果
ProcessTxnResult rc = new ProcessTxnResult();
try {

 // 从事务头中解析出相应属性并保存至rc中
 rc.clientId = header.getClientId();
 rc.cxid = header.getCxid();
 rc.zxid = header.getZxid();
 rc.type = header.getType();
 rc.err = 0;
 rc.multiResult = null;
 switch (header.getType()) { // 确定事务类型
     case OpCode.create: // 创建结点
         // 显示转化
         CreateTxn createTxn = (CreateTxn) txn;
         // 获取创建结点路径
         rc.path = createTxn.getPath();
         // 创建结点
         createNode(
             createTxn.getPath(),
             createTxn.getData(),
             createTxn.getAcl(),
             createTxn.getEphemeral() ? header.getClientId() : 0,
             createTxn.getParentCVersion(),
             header.getZxid(), header.getTime());
         break;
     case OpCode.delete: // 删除结点
         // 显示转化
         DeleteTxn deleteTxn = (DeleteTxn) txn;
         // 获取删除结点路径
         rc.path = deleteTxn.getPath();
         // 删除结点
         deleteNode(deleteTxn.getPath(), header.getZxid());
         break;
     case OpCode.setData: // 写入数据
         // 显示转化
         SetDataTxn setDataTxn = (SetDataTxn) txn;
         // 获取写入数据结点路径
         rc.path = setDataTxn.getPath();
         // 写入数据
         rc.stat = setData(setDataTxn.getPath(), setDataTxn
                           .getData(), setDataTxn.getVersion(), header
                           .getZxid(), header.getTime());
         break;
     case OpCode.setACL: // 设置ACL
         // 显示转化
         SetACLTxn setACLTxn = (SetACLTxn) txn;
         // 获取路径
         rc.path = setACLTxn.getPath();
         // 设置ACL
         rc.stat = setACL(setACLTxn.getPath(), setACLTxn.getAcl(),
                          setACLTxn.getVersion());
         break;
     case OpCode.closeSession: // 关闭会话
         // 关闭会话
         killSession(header.getClientId(), header.getZxid());
         break;
     case OpCode.error: // 错误
         // 显示转化
         ErrorTxn errTxn = (ErrorTxn) txn;
         // 记录错误
         rc.err = errTxn.getErr();
         break;
     case OpCode.check: // 检查
         // 显示转化
         CheckVersionTxn checkTxn = (CheckVersionTxn) txn;
         // 获取路径
         rc.path = checkTxn.getPath();
         break;
     case OpCode.multi: // 多个事务
         // 显示转化
         MultiTxn multiTxn = (MultiTxn) txn ;
         // 获取事务列表
         List<Txn> txns = multiTxn.getTxns();
         rc.multiResult = new ArrayList<ProcessTxnResult>();
         boolean failed = false;
         for (Txn subtxn : txns) { // 遍历事务列表
             if (subtxn.getType() == OpCode.error) {
                 failed = true;
                 break;
             }
         }
         boolean post_failed = false;
         for (Txn subtxn : txns) { // 遍历事务列表，确定每个事务类型并进行相应操作
             // 处理事务的数据
             ByteBuffer bb = ByteBuffer.wrap(subtxn.getData());
             Record record = null;
             switch (subtxn.getType()) {
                 case OpCode.create:
                     record = new CreateTxn();
                     break;
                 case OpCode.delete:
                     record = new DeleteTxn();
                     break;
                 case OpCode.setData:
                     record = new SetDataTxn();
                     break;
                 case OpCode.error:
                     record = new ErrorTxn();
                     post_failed = true;
                     break;
                 case OpCode.check:
                     record = new CheckVersionTxn();
                     break;
                 default:
                     throw new IOException("Invalid type of op: " + subtxn.getType());
             }
             assert(record != null);
             // 将bytebuffer转化为record(初始化record的相关属性)
             ByteBufferInputStream.byteBuffer2Record(bb, record);

             if (failed && subtxn.getType() != OpCode.error){ // 失败并且不为error类型
                 int ec = post_failed ? Code.RUNTIMEINCONSISTENCY.intValue() 
                     : Code.OK.intValue();

                 subtxn.setType(OpCode.error);
                 record = new ErrorTxn(ec);
             }

             if (failed) { // 失败
                 assert(subtxn.getType() == OpCode.error) ;
             }

             // 生成事务头
             TxnHeader subHdr = new TxnHeader(header.getClientId(), header.getCxid(),
                                              header.getZxid(), header.getTime(), 
                                              subtxn.getType());
             // 递归调用处理事务
             ProcessTxnResult subRc = processTxn(subHdr, record);
             // 保存处理结果
             rc.multiResult.add(subRc);
             if (subRc.err != 0 && rc.err == 0) {
                 rc.err = subRc.err ;
             }
         }
         break;
 }

} catch (KeeperException e) {

 if (LOG.isDebugEnabled()) {
     LOG.debug("Failed: " + header + ":" + txn, e);
 }
 rc.err = e.code().intValue();

} catch (IOException e) {

 if (LOG.isDebugEnabled()) {
     LOG.debug("Failed: " + header + ":" + txn, e);
 }

}
/*

  * A snapshot might be in progress while we are modifying the data
  * tree. If we set lastProcessedZxid prior to making corresponding
  * change to the tree, then the zxid associated with the snapshot
  * file will be ahead of its contents. Thus, while restoring from
  * the snapshot, the restore method will not apply the transaction
  * for zxid associated with the snapshot file, since the restore
  * method assumes that transaction to be present in the snapshot.
  *
  * To avoid this, we first apply the transaction and then modify
  * lastProcessedZxid.  During restore, we correctly handle the
  * case where the snapshot contains data ahead of the zxid associated
  * with the file.
  */

// 事务处理结果中保存的zxid大于已经被处理的最大的zxid，则重新赋值
if (rc.zxid > lastProcessedZxid) {

 lastProcessedZxid = rc.zxid;

}

  * Snapshots are taken lazily. It can happen that the child
  * znodes of a parent are created after the parent
  * is serialized. Therefore, while replaying logs during restore, a
  * create might fail because the node was already
  * created.
  *
  * After seeing this failure, we should increment
  * the cversion of the parent znode since the parent was serialized
  * before its children.
  *
  * Note, such failures on DT should be seen only during
  * restore.
  */

if (header.getType() == OpCode.create &&

 rc.err == Code.NODEEXISTS.intValue()) { // 处理在恢复数据过程中的结点创建操作
 LOG.debug("Adjusting parent cversion for Txn: " + header.getType() +
           " path:" + rc.path + " err: " + rc.err);
 int lastSlash = rc.path.lastIndexOf('/');
 String parentName = rc.path.substring(0, lastSlash);
 CreateTxn cTxn = (CreateTxn)txn;
 try {
     setCversionPzxid(parentName, cTxn.getParentCVersion(),
                      header.getZxid());
 } catch (KeeperException.NoNodeException e) {
     LOG.error("Failed to set parent cversion for: " +
               parentName, e);
     rc.err = e.code().intValue();
 }

} else if (rc.err != Code.OK.intValue()) {

 LOG.debug("Ignoring processTxn failure hdr: " + header.getType() +
           " : error: " + rc.err);

}
return rc;
}
说明：processTxn用于处理事务，即将事务操作应用到DataTree【参见下面源码】内存数据库中，以恢复成最新的数据。其操作所有数据节点都是借助于DataNode，数据会类似于树状存储，
以删除为例：
public void deleteNode(String path, long zxid) throws KeeperException.NoNodeException {
int lastSlash = path.lastIndexOf('/');
String parentName = path.substring(0, lastSlash);
String childName = path.substring(lastSlash + 1);
DataNode node = nodes.get(path);
if (node == null) {

 throw new KeeperException.NoNodeException();

}
nodes.remove(path);
DataNode parent = nodes.get(parentName);
if (parent == null) {

 throw new KeeperException.NoNodeException();

}
synchronized (parent) {

 parent.removeChild(childName);
 parent.stat.setPzxid(zxid);
 long eowner = node.stat.getEphemeralOwner();
 if (eowner != 0) {
     HashSet<String> nodes = ephemerals.get(eowner);
     if (nodes != null) {
         synchronized (nodes) {
             nodes.remove(path);
         }
     }
 }
 node.parent = null;

}
......
}

save函数　　
public void save(DataTree dataTree, ConcurrentHashMap sessionsWithTimeouts)
throws IOException {
// 获取最后处理的zxid
long lastZxid = dataTree.lastProcessedZxid;
// 生成snapshot文件
File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid));
LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid), snapshotFile);
// 序列化datatree、sessionsWithTimeouts至snapshot文件
snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile);

}
说明：save函数用于将sessions和datatree保存至snapshot文件中，其大致步骤如下
　　① 获取内存数据库中已经处理的最新的zxid，进入②
　　② 根据zxid和快照目录生成snapshot文件，进入③
　　③ 将datatree（内存数据库）、sessionsWithTimeouts序列化至快照文件。
　　其他的函数或多或少都是调用TxnLog和SnapLog中的相应函数，之前已经进行过分析，这里不再累赘。
三、总结
本篇博文分析了FileTxnSnapLog的源码，其主要：
● 封装了TxnLog和SnapLog来进行相应的处理
● 提供了从snapshot文件和log文件中恢复内存数据库的接口
● 其对于事务的维护，或说事务的存储结构是一个个DataTree，代码层面看到的是DataNode，其增删都是追加/删除父子节点，类似于树状操作

持久化FileTxnSnapLog

热门文章

最新文章

相关电子书

探索云世界

热门

云计算

大数据

云原生

人工智能

数据库

开发与运维

活动广场

任务中心

训练营

直播

乘风者计划

下载

镜像站

技术资料

持久化FileTxnSnapLog

热门文章

最新文章

相关电子书