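[Java Technical Guide] "Serialization Series": A Deep Dive into the Features and Internals of FST, a Fast, Memory-Saving Serialization Tool - Alibaba Cloud Developer Community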

Concept and Definition of FST

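FST stands for Fast Serialization Tool. It is a replacement implementation of Java serialization, and it largely fixes the two serious shortcomings of Java serialization discussed in the previous article. FST's features: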

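Roughly 10x faster than the serialization provided by the JDK, with output 3-4x smaller or more
Supports off-heap maps and persistent off-heap maps
Supports serialization to JSON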

Using FST Serialization

FST can be used in two ways: through a shortcut API, or through ObjectOutput and ObjectInput.

Using the serialization and deserialization methods provided directly by FSTConfiguration

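import org.nustaq.serialization.FSTConfiguration;

public class FstQuickSample {
    // User is assumed to be a Serializable bean with name/age setters and a toString()
    public static void serialSample() {
        // Creating an FSTConfiguration is expensive; real code should create it once and reuse it
        FSTConfiguration conf = FSTConfiguration.createAndroidDefaultConfiguration();
        User object = new User();
        object.setName("huaijin");
        object.setAge(30);
        System.out.println("serialization, " + object);
        byte[] bytes = conf.asByteArray(object);
        User newObject = (User) conf.asObject(bytes);
        System.out.println("deserialization, " + newObject);
    }
}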

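FSTConfiguration also provides a method for registering classes. If a class is not registered, its class name is written into the output by default. This shortcut API is easy to use and efficient: it returns a byte[] directly instead of going through a ByteArrayOutputStream.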
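To make the cost of an unregistered class concrete, here is a minimal sketch of the general idea. It is not FST's wire format. The hypothetical ClassTagWriter writes a registered class as a 2-byte id and an unregistered one as a 0 marker followed by its full class name. In FST itself, registration is done on the configuration, e.g. conf.registerClass(User.class).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of class registration; this is NOT FST's actual encoding.
class ClassTagWriter {
    private final Map<Class<?>, Integer> ids = new HashMap<>();

    // Assigns the next id (starting at 1) to a class; id 0 is reserved for "unregistered".
    void register(Class<?> c) {
        ids.putIfAbsent(c, ids.size() + 1);
    }

    // Writes either the registered id, or a 0 marker followed by the full class name.
    byte[] writeClassTag(Class<?> c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            Integer id = ids.get(c);
            if (id != null) {
                out.writeShort(id);          // 2 bytes for a registered class
            } else {
                out.writeShort(0);           // marker: a class name follows
                out.writeUTF(c.getName());   // 2-byte length + modified UTF-8 name
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this writer, the tag for an unregistered java.util.ArrayList takes 23 bytes (2-byte marker, 2-byte length, 19-byte name). Once the class is registered, the tag takes 2 bytes.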

Using ObjectOutput and ObjectInput gives finer control over how serialized data is written and read:

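import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.nustaq.serialization.FSTConfiguration;
import org.nustaq.serialization.FSTObjectInput;
import org.nustaq.serialization.FSTObjectOutput;

public class FstStreamSample {
    static FSTConfiguration conf = FSTConfiguration.createAndroidDefaultConfiguration();
    static void writeObject(OutputStream outputStream, User user) throws IOException {
        FSTObjectOutput out = conf.getObjectOutput(outputStream);
        out.writeObject(user);
        out.flush(); // instances from conf.getObjectOutput() are reused: flush, don't close them
        outputStream.close();
    }
    static User readObject(InputStream inputStream) throws Exception {
        FSTObjectInput input = conf.getObjectInput(inputStream);
        User user = (User) input.readObject(User.class);
        inputStream.close(); // close the raw stream, not the reused FSTObjectInput
        return user;
    }
}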

FST in Dubbo

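Dubbo re-wraps FST's input and output as FstObjectInput and FstObjectOutput. These wrappers fix null-pointer problems during serialization and deserialization.
Dubbo also provides an FstFactory class that creates FstObjectInput and FstObjectOutput instances (factory pattern). FstFactory is itself a singleton, which guarantees a single FSTConfiguration for the whole application. At initialization it registers every class that needs to be serialized with that FSTConfiguration.
Externally, Dubbo exposes a unified serialization interface, FstSerialization, which provides serialize and deserialize capabilities.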
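The null-pointer fix comes down to giving null an explicit encoding instead of dereferencing it. Below is a minimal sketch of that general technique for byte arrays. It assumes a 4-byte length prefix in which -1 stands for null. NullSafeBytesCodec and its methods are hypothetical names, and this is not Dubbo's actual FstObjectOutput/FstObjectInput source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: length-prefixed byte arrays where a length of -1 encodes null.
final class NullSafeBytesCodec {
    private NullSafeBytesCodec() {}

    static void writeBytes(DataOutput out, byte[] v) throws IOException {
        if (v == null) {
            out.writeInt(-1);          // explicit null marker instead of touching v
        } else {
            out.writeInt(v.length);    // length prefix
            out.write(v);
        }
    }

    static byte[] readBytes(DataInput in) throws IOException {
        int len = in.readInt();
        if (len < 0) {
            return null;               // the -1 marker decodes back to null
        }
        byte[] v = new byte[len];
        in.readFully(v);
        return v;
    }

    // Round-trips a sequence of arrays (nulls allowed) through an in-memory stream.
    static byte[][] roundTrip(byte[]... values) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (byte[] v : values) {
                writeBytes(out, v);
            }
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            byte[][] result = new byte[values.length][];
            for (int i = 0; i < values.length; i++) {
                result[i] = readBytes(in);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Round-tripping {1, 2}, null, and an empty array returns the same three values. The null is preserved rather than causing an exception.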

FST Serialization/Deserialization

FST Serialized Storage Format

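Practically every byte-oriented serialization format has a similar structure, whether it is a class file, a .so file, or a dex file. There is little innovation in the layout itself. At most, some compression is applied to field contents, and that is also the idea behind the UTF-8 encoding we use every day.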

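FST's serialized layout likewise has nothing unusual compared with ordinary byte-oriented formats. Take the following FST serialized byte file as an example: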

00000000:  0001 0f63 6f6d 2e66 7374 2e46 5354 4265
00000010:  616e f701 fc05 7630 7374 7200 

Format:

Header | class name length | class name string | field 1 type (1 byte) | [length] | content | field 2 type (1 byte) | [length] | content | …

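0000: object type tag; 00 denotes OBJECT
0001: class name encoding; 00 denotes UTF, 01 denotes ASCII
0002: length of the class name (1 byte) = 15
0003~0011: class name string (15 bytes) = "com.fst.FSTBean"
0012: Integer type tag 0xf7 (the signed byte -9, BIG_INT in the constants below)
0013: Integer value = 1
0014: String type tag 0xfc (the signed byte -4, STRING in the constants below)
0015: String length = 5
0016~001a: String value "v0str"
001b: END (00)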
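As a cross-check of the layout above, the following sketch decodes the example dump by hand. It follows this article's interpretation of each offset and handles only the single-byte Integer and the short String seen here. The class name DumpDecoder is hypothetical; this is not FST's reader.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Decodes the example dump following the byte layout described above (illustration only).
final class DumpDecoder {
    static final byte INT_TAG = (byte) 0xf7;    // -9, BIG_INT in FSTObjectOutput
    static final byte STRING_TAG = (byte) 0xfc; // -4, STRING in FSTObjectOutput

    // Parses a hex string such as "0001 0f63 ..." (whitespace ignored) into bytes.
    static byte[] parseHex(String hex) {
        String s = hex.replaceAll("\\s+", "");
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Returns the class name followed by each decoded field value as text.
    static List<String> decode(byte[] b) {
        List<String> result = new ArrayList<>();
        int pos = 2;                                  // skip the object tag and the name-encoding byte
        int nameLen = b[pos++];                       // 1-byte class name length
        result.add(new String(b, pos, nameLen, StandardCharsets.US_ASCII));
        pos += nameLen;
        while (pos < b.length && b[pos] != 0) {       // 0x00 is treated as the end marker here
            byte tag = b[pos++];
            if (tag == INT_TAG) {
                result.add("Integer=" + b[pos++]);    // a small int fits in one byte
            } else if (tag == STRING_TAG) {
                int len = b[pos++];
                result.add("String=" + new String(b, pos, len, StandardCharsets.US_ASCII));
                pos += len;
            } else {
                throw new IllegalArgumentException("unhandled tag " + tag);
            }
        }
        return result;
    }
}
```

Running decode on the 28 bytes of the dump yields [com.fst.FSTBean, Integer=1, String=v0str].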

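As the dump shows, the Integer takes only one byte after serialization (value 1), not the 4 bytes an int occupies in memory. This means FST compacts values according to certain rules. For the details, see how FSTObjectInput#instantiateSpecialTag reads the different types. The tag byte for each type is defined as a constant in FSTObjectOutput: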

public class FSTObjectOutput implements ObjectOutput {
    private static final FSTLogger LOGGER = FSTLogger.getLogger(FSTObjectOutput.class);
    public static Object NULL_PLACEHOLDER = new Object() {
        public String toString() { return "NULL_PLACEHOLDER"; }
    };
    public static final byte SPECIAL_COMPATIBILITY_OBJECT_TAG = -19; // see issue 52
    public static final byte ONE_OF = -18;
    public static final byte BIG_BOOLEAN_FALSE = -17;
    public static final byte BIG_BOOLEAN_TRUE = -16;
    public static final byte BIG_LONG = -10;
    public static final byte BIG_INT = -9;
    public static final byte DIRECT_ARRAY_OBJECT = -8;
    public static final byte HANDLE = -7;
    public static final byte ENUM = -6;
    public static final byte ARRAY = -5;
    public static final byte STRING = -4;
    public static final byte TYPED = -3; // var class == object written class
    public static final byte DIRECT_OBJECT = -2;
    public static final byte NULL = -1;
    public static final byte OBJECT = 0;
    protected FSTEncoder codec;
    ...
}
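The one-byte Integer in the dump comes from variable-length integer encoding. The sketch below illustrates the idea with a 1/3/5-byte scheme. Values in (-127, 127] are written as a single byte, and two reserved marker bytes announce a 2-byte or 4-byte payload. The marker values (-128 and -127), the big-endian byte order, and the class name CompactInt are assumptions for illustration. Check FST's default encoder source for its exact rules.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of a compact int encoding: 1, 3, or 5 bytes depending on magnitude (assumed markers).
final class CompactInt {
    private static final int SHORT_MARKER = -128; // the next 2 bytes hold a short
    private static final int INT_MARKER = -127;   // the next 4 bytes hold an int

    static byte[] encode(int v) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            if (v > -127 && v <= 127) {
                out.writeByte(v);                  // fits in one byte and cannot collide with a marker
            } else if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
                out.writeByte(SHORT_MARKER);
                out.writeShort(v);
            } else {
                out.writeByte(INT_MARKER);
                out.writeInt(v);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte head = in.readByte();
            if (head == SHORT_MARKER) return in.readShort();
            if (head == INT_MARKER) return in.readInt();
            return head;                            // the byte is the value itself
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under this scheme, encode(1) is a single byte, matching the dump. encode(1000) takes 3 bytes and encode(100000) takes 5.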

How FST Serialization and Deserialization Work

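Serializing an object to bytes amounts to persisting it. If the bean's definition has changed by the time it is deserialized, the deserializer needs a compatibility strategy. In JDK serialization, serialVersionUID plays a central role in version control. FST solves the problem by ordering fields according to the @Version annotation.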

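During serialization and deserialization, FST first obtains all members of the object's class via reflection and sorts them. This ordering is the key to compatibility and is the mechanism behind @Version. FSTClazzInfo defines a comparator, defFieldComparator, that sorts all fields of the bean: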

public final class FSTClazzInfo {
    public static final Comparator<FSTFieldInfo> defFieldComparator = new Comparator<FSTFieldInfo>() {
        @Override
        public int compare(FSTFieldInfo o1, FSTFieldInfo o2) {
            int res = 0;
            if ( o1.getVersion() != o2.getVersion() ) {
                return o1.getVersion() < o2.getVersion() ? -1 : 1;
            }
            // order: version, boolean, primitives, conditionals, object references
            if (o1.getType() == boolean.class && o2.getType() != boolean.class) {
                return -1;
            }
            if (o1.getType() != boolean.class && o2.getType() == boolean.class) {
                return 1;
            }
            if (o1.isConditional() && !o2.isConditional()) {
                res = 1;
            } else if (!o1.isConditional() && o2.isConditional()) {
                res = -1;
            } else if (o1.isPrimitive() && !o2.isPrimitive()) {
                res = -1;
            } else if (!o1.isPrimitive() && o2.isPrimitive())
                res = 1;
//                if (res == 0) // 64 bit / 32 bit issues
//                    res = (int) (o1.getMemOffset() - o2.getMemOffset());
            if (res == 0)
                res = o1.getType().getSimpleName().compareTo(o2.getType().getSimpleName());
            if (res == 0)
                res = o1.getName().compareTo(o2.getName());
            if (res == 0) {
                return o1.getField().getDeclaringClass().getName().compareTo(o2.getField().getDeclaringClass().getName());
            }
            return res;
        }
    };
    ...
}
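To see the resulting order on a concrete bean, here is a simplified re-implementation of the same rules. It uses a hypothetical FieldSpec type instead of reflection. It follows the comparator above in this order: version; boolean fields first; conditional fields after non-conditional ones; primitives before references; then type simple name and field name. It omits the final declaring-class tie-breaker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of defFieldComparator's ordering (hypothetical types, no reflection).
final class FieldOrder {
    static final class FieldSpec {
        final String name;
        final Class<?> type;
        final int version;
        final boolean conditional;

        FieldSpec(String name, Class<?> type, int version, boolean conditional) {
            this.name = name;
            this.type = type;
            this.version = version;
            this.conditional = conditional;
        }
    }

    // Rank within one version: boolean fields first, then non-conditional primitives,
    // then non-conditional references, then conditional fields (primitives before references).
    private static int kindRank(FieldSpec f) {
        if (f.type == boolean.class) return 0;
        int base = f.conditional ? 3 : 1;
        return f.type.isPrimitive() ? base : base + 1;
    }

    static final Comparator<FieldSpec> ORDER =
            Comparator.comparingInt((FieldSpec f) -> f.version)
                    .thenComparingInt(FieldOrder::kindRank)
                    .thenComparing(f -> f.type.getSimpleName())
                    .thenComparing(f -> f.name);

    // Returns the field names in the order this comparator sorts them.
    static List<String> sortedNames(List<FieldSpec> fields) {
        List<FieldSpec> copy = new ArrayList<>(fields);
        copy.sort(ORDER);
        List<String> names = new ArrayList<>();
        for (FieldSpec f : copy) {
            names.add(f.name);
        }
        return names;
    }
}
```

Take boolean flag, int count, int x, and String y at version 0, plus String added at version 1. For these fields, sortedNames returns flag, count, x, y, added. The version-1 field always lands at the end, which is what allows newer fields to be appended after the fields of older versions.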

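The code shows that fields are compared by version first. Next comes the field kind: boolean fields first, conditional fields after non-conditional ones, and primitives before references. Finally, ties are broken by type simple name, field name, and declaring class. In short, the higher the version, the later the field is sorted. To see why the ordering matters, look at FSTObjectInput#instantiateAndReadNoSer: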

public class FSTObjectInput implements ObjectInput {
  protected Object instantiateAndReadNoSer(Class c, FSTClazzInfo clzSerInfo, FSTClazzInfo.FSTFieldInfo referencee, int readPos) throws Exception {
        Object newObj;
        newObj = clzSerInfo.newInstance(getCodec().isMapBased());
        ...
        } else {
            FSTClazzInfo.FSTFieldInfo[] fieldInfo = clzSerInfo.getFieldInfo();
            readObjectFields(referencee, clzSerInfo, fieldInfo, newObj,0,0);
        }
        return newObj;
    }
    protected void readObjectFields(FSTClazzInfo.FSTFieldInfo referencee, FSTClazzInfo serializationInfo, FSTClazzInfo.FSTFieldInfo[] fieldInfo, Object newObj, int startIndex, int version) throws Exception {
        if ( getCodec().isMapBased() ) {
            readFieldsMapBased(referencee, serializationInfo, newObj);
            if ( version >= 0 && newObj instanceof Unknown == false)
                getCodec().readObjectEnd();
            return;
        }
        if ( version < 0 )
            version = 0;
        int booleanMask = 0;
        int boolcount = 8;
        final int length = fieldInfo.length;
        int conditional = 0;
        for (int i = startIndex; i < length; i++) { // note this loop
            try {
                FSTClazzInfo.FSTFieldInfo subInfo = fieldInfo[i];
                if (subInfo.getVersion() > version ) {   // this field belongs to a higher version group
                    int nextVersion = getCodec().readVersionTag();  // next version tag in the object stream
                    if ( nextVersion == 0 ) // old object read
                    {
                        oldVersionRead(newObj);
                        return;
                    }
                    if ( nextVersion != subInfo.getVersion() ) {  // a field's version must not change; versions must stay in sync with the stream
                        throw new RuntimeException("read version tag "+nextVersion+" fieldInfo has "+subInfo.getVersion());
                    }
                    readObjectFields(referencee,serializationInfo,fieldInfo,newObj,i,nextVersion);  // recurse for the next version
                    return;
                }
                if (subInfo.isPrimitive()) {
                  ...
                } else {
                    if ( subInfo.isConditional() ) {
                      ...
                    }
                    // object field: read the value and set it on newObj via FSTFieldInfo
                    Object subObject = readObjectWithHeader(subInfo);
                    subInfo.setObjectValue(newObj, subObject);
        }
        ...

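This logic essentially explains how FST keeps serialization and deserialization compatible. Note the loop: it walks the fields in their sorted order, and each FSTFieldInfo records details such as the field's position in the object stream and its type:

Serialization:

Sort all fields of the bean by version (static and transient members are excluded). A field without an @Version annotation defaults to version=0. Within the same version, boolean fields come first, then primitives, then object references, with @Conditional fields last.
Write the bean's fields to the output stream one by one in this sorted order.
An @Version value may only be increased, never decreased. If a newly added field reuses an existing version, the default ordering rules may place it among the existing fields. The field order in an old stream then no longer matches the in-memory FSTFieldInfo[] array, and values are injected into the wrong fields.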

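Deserialization:

Deserialization parses the object stream according to its format. The order of the fields stored in the stream matches the order of the in-memory FSTFieldInfo array.
Within the same version, a field exists in the stream but is missing from the in-memory bean: an exception may be thrown (a backward-compatibility problem).
The stream contains higher-version fields that the in-memory bean lacks: works normally (the old version is compatible with the new one).
Within the same version, a field is missing from the stream but present in the in-memory bean: an exception is thrown.
The same field has different versions in the stream and in the in-memory bean: an exception is thrown.
The in-memory bean adds a field whose version is not higher than the current maximum version: an exception is thrown.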

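The usage rule follows from this logic. Every newly added field must carry an @Version annotation whose value is one higher than the current maximum version. Fields added in the same release can share that value, as the Javadoc example below shows. Fields must never be deleted.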
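These rules can be exercised with a small model of a versioned stream. The sketch is simplified and is not FST's code: fields are plain name/value strings, and the stream is a token list. The writer emits each higher-version group behind a version tag and finishes with a 0 tag (an assumption about the framing). The reader follows the same decision points as readObjectFields. A 0 tag means an old stream, so the remaining fields keep their defaults. A tag that disagrees with the field's version is an error. The "#v" tag format and the names VersionedFields, write, and read are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of FST-style versioned fields (hypothetical format, not FST's wire format).
final class VersionedFields {
    static final class Field {
        final String name;
        final int version;
        Field(String name, int version) { this.name = name; this.version = version; }
    }

    // fields must already be sorted by ascending version, as defFieldComparator guarantees.
    static List<String> write(List<Field> fields, Map<String, String> values) {
        List<String> stream = new ArrayList<>();
        int version = 0;
        for (Field f : fields) {
            if (f.version > version) {           // entering a higher version group
                version = f.version;
                stream.add("#v" + version);      // version tag
            }
            stream.add(values.get(f.name));
        }
        stream.add("#v0");                       // terminating tag: no further versions
        return stream;
    }

    // Reads values for the reader's own field list; newer fields missing from the stream stay unset.
    static Map<String, String> read(List<Field> fields, List<String> stream) {
        Map<String, String> result = new HashMap<>();
        int pos = 0;
        int version = 0;
        for (Field f : fields) {
            if (f.version > version) {
                int next = Integer.parseInt(stream.get(pos++).substring(2));
                if (next == 0) {
                    return result;               // old stream: the remaining fields keep defaults
                }
                if (next != f.version) {
                    throw new IllegalStateException("read version tag " + next + " fieldInfo has " + f.version);
                }
                version = next;
            }
            result.put(f.name, stream.get(pos++));
        }
        return result;
    }
}
```

Consider a stream written with fields x (version 0) and added (version 1). Reading it with a field list that also contains addedv2 (version 2) leaves addedv2 unset. Changing the version of added from 1 to 2 on the reading side triggers the tag-mismatch error.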

Also look at the Javadoc of the @Version annotation, which states explicitly that it exists for backward compatibility:

package org.nustaq.serialization.annotations;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD})
/**
 * support for adding fields without breaking compatibility to old streams.
 * For each release of your app increment the version value. No Version annotation means version=0.
 * Note that each added field needs to be annotated.
 *
 * e.g.
 *
 * class MyClass implements Serializable {
 *
 *     // fields on initial release 1.0
 *     int x;
 *     String y;
 *
 *     // fields added with release 1.5
 *     @Version(1) String added;
 *     @Version(1) String alsoAdded;
 *
 *     // fields added with release 2.0
 *     @Version(2) String addedv2;
 *     @Version(2) String alsoAddedv2;
 *
 * }
 *
 * If an old class is read, new fields will be set to default values. You can register a VersionConflictListener
 * at FSTObjectInput in order to fill in defaults for new fields.
 *
 * Notes/Limits:
 * - Removing fields will break backward compatibility. You can only Add new fields.
 * - Can slow down serialization over time (if many versions)
 * - does not work for Externalizable or Classes which make use of JDK-special features such as readObject/writeObject
 *   (AKA does not work if fst has to fall back to 'compatible mode' for an object).
 * - in case you use custom serializers, your custom serializer has to handle versioning
 *
 */
public @interface Version {
    byte value();
}

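import java.io.Serializable;

public class FSTBean implements Serializable {
    private static final long serialVersionUID = -2708653783151699375L;
    private Integer v0int;
    private String v0str;
    public void setV0int(Integer v0int) { this.v0int = v0int; }
    public void setV0str(String v0str) { this.v0str = v0str; }
    @Override
    public String toString() { return "FSTBean{v0int=" + v0int + ", v0str=" + v0str + "}"; }
}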

Preparing the serialization and deserialization methods

import java.nio.file.Files;
import java.nio.file.Paths;

// FstSerializer is the author's own wrapper around FSTConfiguration (not shown in the article);
// it is assumed to provide byte[] serialize(Object) and <T> T deserialize(byte[], Class<T>).
public class FSTSerial {
    private static void serialize(FstSerializer fst, String fileName) {
        try {
            FSTBean fstBean = new FSTBean();
            fstBean.setV0int(1);
            fstBean.setV0str("v0str");
            byte[] v1 = fst.serialize(fstBean);
            // Write the serialized bytes to the given file
            Files.write(Paths.get(fileName), v1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    private static void deserialize(FstSerializer fst, String fileName) {
        try {
            // Read the whole file back and deserialize it
            byte[] buf = Files.readAllBytes(Paths.get(fileName));
            FSTBean deserial = fst.deserialize(buf, FSTBean.class);
            System.out.println(deserial);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    public static void main(String[] args) {
        FstSerializer fst = new FstSerializer();
        serialize(fst, "byte.bin");
        deserialize(fst, "byte.bin");
    }
}
