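Foreword
A while ago we published a performance comparison of fury and protostuff: "Performance up 20x!!! The ultra-high-performance protocol framework fury crushes protostuff". So can fury really hold on to the top spot among protocol frameworks? fastjson2 has just released version 2.0.37, and it has reportedly supported serialization and deserialization in its binary JSONB format for quite some time, so this is a good moment for a head-to-head.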
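Brief introduction
Serialization frameworks are a basic building block of system communication and are widely used in distributed systems such as games, big data, AI frameworks, and cloud-native platforms. Whenever objects are transferred across processes, languages, or nodes, or are persisted, read and written as state, or replicated, they have to be serialized, and the serializer's performance and ease of use affect both runtime efficiency and development efficiency.
fastjson2 is a major upgrade of the FASTJSON project whose goal is to provide a high-performance JSON library for the next ten years. It is considerably faster than the old fastjson, and it is also more secure: the autoType whitelist has been removed entirely.
Fury is a multi-language serialization framework based on JIT dynamic compilation and zero-copy. It supports Java/Python/Golang/JavaScript/C++ and other languages, and provides fully automatic multi-language / cross-language object serialization.
As for protostuff, I won't go over it again here; see the previous article. In this post it only serves as the unfortunate baseline.
Today we evaluate performance from four angles: serialization/deserialization throughput, payload compression ratio, GC, and JIT optimization. Note that the comparison is between binary protocol formats, that is, JSONB rather than JSON. Miss this point and the results will be confusing!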
Official sites and dependencies
fastjson2
Official site: none
Source: https://github.com/alibaba/fastjson2
Dependency:
implementation 'com.alibaba.fastjson2:fastjson2:2.0.37'
fury
Official site: https://furyio.org
Source: https://github.com/alipay/fury
Dependency:
implementation 'org.furyio:fury-core:0.1.0'
Hardware, environment, and sample
Hardware
Test machine: Windows 11, 8 cores, 16 GB memory
JDK
openjdk version "11.0.16.1" 2022-08-16
OpenJDK Runtime Environment TencentKonaJDK (build 11.0.16.1+2)
OpenJDK 64-Bit Server VM TencentKonaJDK (build 11.0.16.1+2, mixed mode)
Sample
The sample is a skill response packet that is sent at high frequency in games; its size is 704 bytes:
SkillFire_S2C_Msg[attackerId=2013850838,harmList={HarmDTO[curHp=1061639.1,dead=true,maxHp=972081.06,real=36249,targetId=1711281434,type=84,value=18168.72],HarmDTO[curHp=836323.44,dead=true,maxHp=8546706.0,real=91675,targetId=1527336063,type=22,value=30714.76],HarmDTO[curHp=2022717.6,dead=true,maxHp=8923567.0,real=74008,targetId=1684460215,type=67,value=93250.83]},index=37,param1={7153337,1918282,5243103,1985757,7515730},skillCategory=ATTACK_PASSIVE]
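For orientation, the sample could be modelled by Java classes along the following lines. This is only a sketch: the field names are taken from the dump above, but every field type (int, float, List, int[]) is my guess from the printed values, the enum lists only the one constant visible in the dump, and the wrapper class SampleMessageSketch is invented for illustration. The real definitions live in the benchmark project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of the sample message; all field types are assumed.
final class SampleMessageSketch {
    // Only the constant that appears in the dump; the real enum surely has more.
    enum SkillCategory { ATTACK_PASSIVE }

    static final class HarmDTO {
        float curHp;
        boolean dead;
        float maxHp;
        int real;
        int targetId;
        int type;
        float value;
    }

    static final class SkillFire_S2C_Msg {
        int attackerId;
        List<HarmDTO> harmList = new ArrayList<>();
        int index;
        int[] param1;
        SkillCategory skillCategory;
    }

    // Builds a message holding the first HarmDTO entry from the dump.
    static SkillFire_S2C_Msg sample() {
        HarmDTO harm = new HarmDTO();
        harm.curHp = 1061639.1f;
        harm.dead = true;
        harm.maxHp = 972081.06f;
        harm.real = 36249;
        harm.targetId = 1711281434;
        harm.type = 84;
        harm.value = 18168.72f;

        SkillFire_S2C_Msg msg = new SkillFire_S2C_Msg();
        msg.attackerId = 2013850838;
        msg.harmList.add(harm);
        msg.index = 37;
        msg.param1 = new int[] {7153337, 1918282, 5243103, 1985757, 7515730};
        msg.skillCategory = SkillCategory.ATTACK_PASSIVE;
        return msg;
    }

    public static void main(String[] args) {
        SkillFire_S2C_Msg msg = sample();
        System.out.println("attackerId=" + msg.attackerId + ", harms=" + msg.harmList.size());
    }
}
```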
For a sense of the scenario: behind every dazzling skill effect you see in a game, countless SkillFire_S2C_Msg messages are flying back and forth.
Benchmark data
The benchmark project can be downloaded from:
https://github.com/jiangguilong2000/gamioo-sandbox.git
Payload size
Compression ratios of the serialized payload under various settings (smaller is better):
Protocol | Settings | Compression ratio |
fastjson2 | BeanToArray=false | 41.48% |
fastjson2 | BeanToArray=true | 16.34% |
fury | NumberCompressed=false | 35.94% |
fury | NumberCompressed=true | 27.84% |
fury | NumberCompressed=true,ClassRegistration | 20.45% |
Protostuff | - | 17.90% |
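As a quick sanity check of the table, here is a small Java sketch of how such a ratio can be computed. It assumes the ratio is simply the serialized size divided by the 704-byte sample. The byte counts in it (292, 115, and so on) are not measured values from the benchmark; I back-computed them from the percentages above, and the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: compression ratio = serialized bytes / original bytes (assumed definition).
final class CompressionRatio {
    static final int SAMPLE_BYTES = 704; // sample size stated in the article

    /** Serialized size as a percentage of the original size, two decimals. */
    static String ratio(int serializedBytes, int originalBytes) {
        return String.format(Locale.ROOT, "%.2f%%", 100.0 * serializedBytes / originalBytes);
    }

    public static void main(String[] args) {
        // Byte counts are back-computed from the published percentages, not measured.
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("fastjson2, BeanToArray=false", 292);
        sizes.put("fastjson2, BeanToArray=true", 115);
        sizes.put("fury, NumberCompressed=false", 253);
        sizes.put("fury, NumberCompressed=true", 196);
        sizes.put("fury, NumberCompressed=true,ClassRegistration", 144);
        sizes.put("protostuff", 126);

        for (Map.Entry<String, Integer> e : sizes.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " bytes -> "
                    + ratio(e.getValue(), SAMPLE_BYTES));
        }
    }
}
```

Read this way, the best fastjson2 configuration would put the 704-byte sample on the wire in roughly 115 bytes, against roughly 144 for the best fury configuration.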
Serialization
Benchmark  Mode  Cnt  Score  Error  Units
ProtoSerializeBenchMark.furySerialize  thrpt  10  4405011.951 ± 151916.918  ops/s
ProtoSerializeBenchMark.furySerializeWithClassRegistrationAndNumberCompressed  thrpt  10  5537945.622 ± 245265.769  ops/s
ProtoSerializeBenchMark.jsonSerialize  thrpt  10  2149077.623 ± 81047.066  ops/s
ProtoSerializeBenchMark.jsonSerializeWithBeanToArray  thrpt  10  5309057.521 ± 220991.568  ops/s
ProtoSerializeBenchMark.jsonSerializeWithBeanToArrayAndFieldBase  thrpt  10  5060364.814 ± 342432.492  ops/s
ProtoSerializeBenchMark.protostuffSerialize  thrpt  10  196659.980 ± 7036.993  ops/s
Deserialization
Benchmark  Mode  Cnt  Score  Error  Units
ProtoDeserializeBenchMark.furyDeserialize  thrpt  10  3273154.497 ± 246280.027  ops/s
ProtoDeserializeBenchMark.furyDeserializeWithClassRegistrationAndNumberCompressed  thrpt  10  4343790.775 ± 175190.374  ops/s
ProtoDeserializeBenchMark.jsonDeserialize  thrpt  10  2478522.415 ± 36606.918  ops/s
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBean  thrpt  10  4805905.704 ± 73786.104  ops/s
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBeanAndFieldBase  thrpt  10  4666934.415 ± 262638.869  ops/s
ProtoDeserializeBenchMark.protostuffDeserialize  thrpt  10  192222.309 ± 4843.832  ops/s
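Garbage collection
The following runs add GC metrics (the gc.* rows) to show how much time the garbage collector spends and how much memory is allocated: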
Benchmark  Mode  Cnt  Score  Error  Units
ProtoSerializeBenchMark.furySerialize  thrpt  10  4039378.284 ± 142043.248  ops/s
ProtoSerializeBenchMark.furySerialize:·gc.alloc.rate  thrpt  10  1170.814 ± 41.186  MB/sec
ProtoSerializeBenchMark.furySerialize:·gc.alloc.rate.norm  thrpt  10  304.001 ± 0.001  B/op
ProtoSerializeBenchMark.furySerialize:·gc.count  thrpt  10  75.000  counts
ProtoSerializeBenchMark.furySerialize:·gc.time  thrpt  10  60.000  ms
ProtoSerializeBenchMark.furySerializeWithClassRegistrationAndNumberCompressed  thrpt  10  6338653.047 ± 288140.271  ops/s
ProtoSerializeBenchMark.furySerializeWithClassRegistrationAndNumberCompressed:·gc.alloc.rate  thrpt  10  1015.321 ± 46.153  MB/sec
ProtoSerializeBenchMark.furySerializeWithClassRegistrationAndNumberCompressed:·gc.alloc.rate.norm  thrpt  10  168.001 ± 0.001  B/op
ProtoSerializeBenchMark.furySerializeWithClassRegistrationAndNumberCompressed:·gc.count  thrpt  10  66.000  counts
ProtoSerializeBenchMark.furySerializeWithClassRegistrationAndNumberCompressed:·gc.time  thrpt  10  57.000  ms
ProtoSerializeBenchMark.jsonSerialize  thrpt  10  2433452.831 ± 134796.548  ops/s
ProtoSerializeBenchMark.jsonSerialize:·gc.alloc.rate  thrpt  10  1262.157 ± 69.960  MB/sec
ProtoSerializeBenchMark.jsonSerialize:·gc.alloc.rate.norm  thrpt  10  544.001 ± 0.002  B/op
ProtoSerializeBenchMark.jsonSerialize:·gc.count  thrpt  10  82.000  counts
ProtoSerializeBenchMark.jsonSerialize:·gc.time  thrpt  10  70.000  ms
ProtoSerializeBenchMark.jsonSerializeWithBeanToArray  thrpt  10  4824280.181 ± 355784.630  ops/s
ProtoSerializeBenchMark.jsonSerializeWithBeanToArray:·gc.alloc.rate  thrpt  10  1545.476 ± 113.917  MB/sec
ProtoSerializeBenchMark.jsonSerializeWithBeanToArray:·gc.alloc.rate.norm  thrpt  10  336.001 ± 0.001  B/op
ProtoSerializeBenchMark.jsonSerializeWithBeanToArray:·gc.count  thrpt  10  77.000  counts
ProtoSerializeBenchMark.jsonSerializeWithBeanToArray:·gc.time  thrpt  10  66.000  ms
ProtoSerializeBenchMark.jsonSerializeWithBeanToArrayAndFieldBase  thrpt  10  4994330.560 ± 402699.443  ops/s
ProtoSerializeBenchMark.jsonSerializeWithBeanToArrayAndFieldBase:·gc.alloc.rate  thrpt  10  1599.959 ± 129.010  MB/sec
ProtoSerializeBenchMark.jsonSerializeWithBeanToArrayAndFieldBase:·gc.alloc.rate.norm  thrpt  10  336.001 ± 0.001  B/op
ProtoSerializeBenchMark.jsonSerializeWithBeanToArrayAndFieldBase:·gc.count  thrpt  10  105.000  counts
ProtoSerializeBenchMark.jsonSerializeWithBeanToArrayAndFieldBase:·gc.time  thrpt  10  90.000  ms
ProtoSerializeBenchMark.protostuffSerialize  thrpt  10  196414.101 ± 9192.079  ops/s
ProtoSerializeBenchMark.protostuffSerialize:·gc.alloc.rate  thrpt  10  870.439 ± 40.732  MB/sec
ProtoSerializeBenchMark.protostuffSerialize:·gc.alloc.rate.norm  thrpt  10  4648.018 ± 0.026  B/op
ProtoSerializeBenchMark.protostuffSerialize:·gc.count  thrpt  10  67.000  counts
ProtoSerializeBenchMark.protostuffSerialize:·gc.time  thrpt  10  53.000  ms
Benchmark  Mode  Cnt  Score  Error  Units
ProtoDeserializeBenchMark.furyDeserialize  thrpt  10  3769407.082 ± 153367.315  ops/s
ProtoDeserializeBenchMark.furyDeserialize:·gc.alloc.rate  thrpt  10  2357.655 ± 95.882  MB/sec
ProtoDeserializeBenchMark.furyDeserialize:·gc.alloc.rate.norm  thrpt  10  656.001 ± 0.001  B/op
ProtoDeserializeBenchMark.furyDeserialize:·gc.count  thrpt  10  95.000  counts
ProtoDeserializeBenchMark.furyDeserialize:·gc.time  thrpt  10  93.000  ms
ProtoDeserializeBenchMark.furyDeserializeWithClassRegistrationAndNumberCompressed  thrpt  10  4246958.076 ± 165462.964  ops/s
ProtoDeserializeBenchMark.furyDeserializeWithClassRegistrationAndNumberCompressed:·gc.alloc.rate  thrpt  10  2494.351 ± 97.119  MB/sec
ProtoDeserializeBenchMark.furyDeserializeWithClassRegistrationAndNumberCompressed:·gc.alloc.rate.norm  thrpt  10  616.001 ± 0.001  B/op
ProtoDeserializeBenchMark.furyDeserializeWithClassRegistrationAndNumberCompressed:·gc.count  thrpt  10  96.000  counts
ProtoDeserializeBenchMark.furyDeserializeWithClassRegistrationAndNumberCompressed:·gc.time  thrpt  10  95.000  ms
ProtoDeserializeBenchMark.jsonDeserialize  thrpt  10  2461425.079 ± 71354.374  ops/s
ProtoDeserializeBenchMark.jsonDeserialize:·gc.alloc.rate  thrpt  10  2027.745 ± 58.903  MB/sec
ProtoDeserializeBenchMark.jsonDeserialize:·gc.alloc.rate.norm  thrpt  10  864.001 ± 0.002  B/op
ProtoDeserializeBenchMark.jsonDeserialize:·gc.count  thrpt  10  91.000  counts
ProtoDeserializeBenchMark.jsonDeserialize:·gc.time  thrpt  10  90.000  ms
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBean  thrpt  10  4627064.619 ± 203137.578  ops/s
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBean:·gc.alloc.rate  thrpt  10  3705.782 ± 162.762  MB/sec
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBean:·gc.alloc.rate.norm  thrpt  10  840.001 ± 0.001  B/op
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBean:·gc.count  thrpt  10  119.000  counts
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBean:·gc.time  thrpt  10  120.000  ms
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBeanAndFieldBase  thrpt  10  4523421.319 ± 154096.124  ops/s
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBeanAndFieldBase:·gc.alloc.rate  thrpt  10  3622.814 ± 123.593  MB/sec
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBeanAndFieldBase:·gc.alloc.rate.norm  thrpt  10  840.001 ± 0.001  B/op
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBeanAndFieldBase:·gc.count  thrpt  10  107.000  counts
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBeanAndFieldBase:·gc.time  thrpt  10  110.000  ms
ProtoDeserializeBenchMark.protostuffDeserialize  thrpt  10  189091.087 ± 10938.077  ops/s
ProtoDeserializeBenchMark.protostuffDeserialize:·gc.alloc.rate  thrpt  10  787.513 ± 45.575  MB/sec
ProtoDeserializeBenchMark.protostuffDeserialize:·gc.alloc.rate.norm  thrpt  10  4368.019 ± 0.027  B/op
ProtoDeserializeBenchMark.protostuffDeserialize:·gc.count  thrpt  10  40.000  counts
ProtoDeserializeBenchMark.protostuffDeserialize:·gc.time  thrpt  10  39.000  ms
JIT optimization time
Benchmark  Mode  Cnt  Score  Error  Units
ProtoSerializeBenchMark.furySerialize  thrpt  10  3988034.285 ± 139528.585  ops/s
ProtoSerializeBenchMark.furySerialize:·compiler.time.profiled  thrpt  10  4.000  ms
ProtoSerializeBenchMark.furySerialize:·compiler.time.total  thrpt  10  2932.000  ms
ProtoSerializeBenchMark.furySerializeWithClassRegistrationAndNumberCompressed  thrpt  10  6445600.977 ± 153492.182  ops/s
ProtoSerializeBenchMark.furySerializeWithClassRegistrationAndNumberCompressed:·compiler.time.profiled  thrpt  10  6.000  ms
ProtoSerializeBenchMark.furySerializeWithClassRegistrationAndNumberCompressed:·compiler.time.total  thrpt  10  2624.000  ms
ProtoSerializeBenchMark.jsonSerialize  thrpt  10  2413208.213 ± 77023.998  ops/s
ProtoSerializeBenchMark.jsonSerialize:·compiler.time.profiled  thrpt  10  4.000  ms
ProtoSerializeBenchMark.jsonSerialize:·compiler.time.total  thrpt  10  1986.000  ms
ProtoSerializeBenchMark.jsonSerializeWithBeanToArray  thrpt  10  6272217.689 ± 162504.678  ops/s
ProtoSerializeBenchMark.jsonSerializeWithBeanToArray:·compiler.time.profiled  thrpt  10  4.000  ms
ProtoSerializeBenchMark.jsonSerializeWithBeanToArray:·compiler.time.total  thrpt  10  1948.000  ms
ProtoSerializeBenchMark.jsonSerializeWithBeanToArrayAndFieldBase  thrpt  10  6028406.523 ± 161064.529  ops/s
ProtoSerializeBenchMark.jsonSerializeWithBeanToArrayAndFieldBase:·compiler.time.profiled  thrpt  10  3.000  ms
ProtoSerializeBenchMark.jsonSerializeWithBeanToArrayAndFieldBase:·compiler.time.total  thrpt  10  1938.000  ms
ProtoSerializeBenchMark.protostuffSerialize  thrpt  10  212378.958 ± 13824.911  ops/s
ProtoSerializeBenchMark.protostuffSerialize:·compiler.time.profiled  thrpt  10  4.000  ms
ProtoSerializeBenchMark.protostuffSerialize:·compiler.time.total  thrpt  10  3196.000  ms
Benchmark  Mode  Cnt  Score  Error  Units
ProtoDeserializeBenchMark.furyDeserialize  thrpt  10  3577484.780 ± 180920.346  ops/s
ProtoDeserializeBenchMark.furyDeserialize:·compiler.time.profiled  thrpt  10  6.000  ms
ProtoDeserializeBenchMark.furyDeserialize:·compiler.time.total  thrpt  10  3235.000  ms
ProtoDeserializeBenchMark.furyDeserializeWithClassRegistrationAndNumberCompressed  thrpt  10  4192498.438 ± 176522.218  ops/s
ProtoDeserializeBenchMark.furyDeserializeWithClassRegistrationAndNumberCompressed:·compiler.time.profiled  thrpt  10  4.000  ms
ProtoDeserializeBenchMark.furyDeserializeWithClassRegistrationAndNumberCompressed:·compiler.time.total  thrpt  10  3479.000  ms
ProtoDeserializeBenchMark.jsonDeserialize  thrpt  10  2399044.624 ± 165715.862  ops/s
ProtoDeserializeBenchMark.jsonDeserialize:·compiler.time.profiled  thrpt  10  7.000  ms
ProtoDeserializeBenchMark.jsonDeserialize:·compiler.time.total  thrpt  10  3355.000  ms
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBean  thrpt  10  4677755.187 ± 135508.540  ops/s
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBean:·compiler.time.profiled  thrpt  10  4.000  ms
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBean:·compiler.time.total  thrpt  10  3562.000  ms
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBeanAndFieldBase  thrpt  10  4635990.508 ± 108135.163  ops/s
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBeanAndFieldBase:·compiler.time.profiled  thrpt  10  6.000  ms
ProtoDeserializeBenchMark.jsonDeserializeWithArrayToBeanAndFieldBase:·compiler.time.total  thrpt  10  3464.000  ms
ProtoDeserializeBenchMark.protostuffDeserialize  thrpt  10  205998.280 ± 8099.058  ops/s
ProtoDeserializeBenchMark.protostuffDeserialize:·compiler.time.profiled  thrpt  10  5.000  ms
ProtoDeserializeBenchMark.protostuffDeserialize:·compiler.time.total  thrpt  10  4534.000  ms
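Conclusions
Serialization
With BeanToArray (objects serialized as arrays), fastjson2 reaches 26.996 times the throughput of protostuff.
With reference tracking (RefTracking) off, class registration (ClassRegistration) on, and number compression (NumberCompressed) on, fury reaches 28.160 times the throughput of protostuff.
fury wins!
Deserialization
With SupportArrayToBean (arrays deserialized back into objects), fastjson2 reaches 25.002 times the throughput of protostuff.
With reference tracking (RefTracking) off, class registration (ClassRegistration) on, and number compression (NumberCompressed) on, fury reaches 22.598 times the throughput of protostuff.
fastjson2 wins!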
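The multiples quoted above follow directly from the throughput figures in the serialization and deserialization benchmarks: each one is a framework's ops/s score divided by protostuff's. A minimal sketch of that arithmetic, with class and method names invented for illustration:

```java
import java.util.Locale;

// Sketch: speedup over protostuff = framework throughput / protostuff throughput.
final class Speedup {
    static double speedup(double score, double baseline) {
        return score / baseline;
    }

    static String fmt(double x) {
        return String.format(Locale.ROOT, "%.3f", x);
    }

    public static void main(String[] args) {
        // Scores (ops/s) copied from the serialization and deserialization runs.
        double protostuffSer = 196659.980;
        double protostuffDeser = 192222.309;

        System.out.println("serialize   fastjson2 BeanToArray: "
                + fmt(speedup(5309057.521, protostuffSer)));
        System.out.println("serialize   fury registered+compressed: "
                + fmt(speedup(5537945.622, protostuffSer)));
        System.out.println("deserialize fastjson2 ArrayToBean: "
                + fmt(speedup(4805905.704, protostuffDeser)));
        System.out.println("deserialize fury registered+compressed: "
                + fmt(speedup(4343790.775, protostuffDeser)));
    }
}
```

Note that the error margins in the benchmark output are a few percent of each score, so the gap between 26.996x and 28.160x on serialization is narrower than the three decimals suggest.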
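Payload compression ratio
Taking each framework's best configuration, fury, protostuff, and fastjson2 come in at 20.45% > 17.90% > 16.34%. The smallest payload is the best, so fastjson2 wins!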
API usability
Preparing the data:
byte[] array = FileUtils.getByteArrayFromFile("message.txt");
SkillFire_S2C_Msg skillFire_s2C_msg = JSON.parseObject(array, SkillFire_S2C_Msg.class);
Using fastjson2
// Serialization
byte[] bytes = JSONB.toBytes(skillFire_s2C_msg, JSONWriter.Feature.BeanToArray);
// Deserialization
SkillFire_S2C_Msg message = JSONB.parseObject(bytes, SkillFire_S2C_Msg.class, JSONReader.Feature.SupportArrayToBean);
Using fury
// Initialization
Fury fury = Fury.builder().withLanguage(Language.JAVA)
    .withRefTracking(false).requireClassRegistration(true).withNumberCompressed(true).build();
// Explicit registration of every custom class follows. Skipping it (requireClassRegistration(false)) hurts not only the compression ratio but also performance.
fury.register(SkillFire_S2C_Msg.class);
fury.register(SkillCategory.class);
fury.register(HarmDTO.class);
// Serialization
byte[] bytes = fury.serialize(skillFire_s2C_msg);
// Deserialization
SkillFire_S2C_Msg message = fury.deserializeJavaObject(bytes, SkillFire_S2C_Msg.class);
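Clearly, in terms of API usability fastjson2 is the friendlier and more elegant of the two, though that may also be because I used fastjson for many years. What puzzles me most is that fury requires the custom message classes in use to be registered in advance. Registration can be done at startup by scanning the message package and registering each class via reflection, but it still feels like there is room for improvement, perhaps along the lines of how Spring resolves circular dependencies. I hope the maintainers can optimize this further.
fastjson2 wins!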
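The startup registration mentioned above could look roughly like the sketch below. To keep it free of dependencies it takes the discovered classes as a collection and the registration call as a Consumer, so nothing here is Fury's own API; with Fury the registrar would presumably be fury::register. The helper name registerAll and the name-sorted order are my own choices: as far as I understand, both peers need to register classes in the same order, and a classpath scan does not guarantee one, so sorting is a cheap way to make it deterministic.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

// Sketch: register message classes in a deterministic order at startup.
final class RegistrationSketch {
    /**
     * Sorts the classes by fully qualified name, passes each one to the
     * registrar, and returns the order that was used.
     */
    static List<Class<?>> registerAll(Collection<Class<?>> classes, Consumer<Class<?>> registrar) {
        List<Class<?>> ordered = new ArrayList<>(classes);
        ordered.sort(Comparator.comparing((Class<?> c) -> c.getName()));
        for (Class<?> c : ordered) {
            registrar.accept(c);
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Stand-ins for the classes a package scan would discover.
        List<Class<?>> discovered = new ArrayList<>();
        discovered.add(String.class);
        discovered.add(Integer.class);
        discovered.add(Long.class);

        // With Fury the second argument would be something like fury::register.
        registerAll(discovered, c -> System.out.println("register " + c.getName()));
    }
}
```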
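Multi-language ecosystem
fury currently supports Java/Python/Golang/Rust/JavaScript/C++ and others, but has no C# implementation.
fastjson2's multi-language support is currently very limited: only Java and Kotlin are covered, with no JavaScript or C# implementation.
It is fair to say that whichever framework first offers strong JavaScript and C# support will attract a large number of game developers.
fury wins!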
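Garbage collection
With BeanToArray (objects serialized as arrays), fastjson2 triggered GC 77 times for a total of 66 ms. Heap memory was allocated throughout the run: about 1545.476 MB of data was created per second, which works out to roughly 336.001 bytes allocated per call to jsonSerializeWithBeanToArray.
With reference tracking (RefTracking) off, class registration (ClassRegistration) on, and number compression (NumberCompressed) on, Fury triggered GC 66 times for a total of 57 ms. Heap memory was allocated throughout the run: about 1015.321 MB of data was created per second, which works out to roughly 168.001 bytes allocated per call to furySerializeWithClassRegistrationAndNumberCompressed.
One way to read this: the higher the throughput, the more data is created, and as long as the number of redundant temporary objects stays reasonable, these metrics should not differ much. On this point the two are tied.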
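The link between throughput and allocation rate can be made concrete. Judging from the figures, gc.alloc.rate is roughly throughput (ops/s) multiplied by gc.alloc.rate.norm (B/op), expressed in MiB per second. The MiB interpretation is my own inference from the numbers rather than something the benchmark output states, and the class and method names below are invented for illustration.

```java
import java.util.Locale;

// Sketch: allocation rate ~= throughput (ops/s) * bytes allocated per op.
final class AllocRate {
    /** Allocation rate in MiB per second (assumes "MB" in the output means 1024*1024 bytes). */
    static double allocRateMiBPerSec(double opsPerSec, double bytesPerOp) {
        return opsPerSec * bytesPerOp / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Throughput and B/op copied from the GC run of the serialization benchmark.
        double fury = allocRateMiBPerSec(6338653.047, 168.001);
        double fastjson2 = allocRateMiBPerSec(4824280.181, 336.001);

        System.out.println(String.format(Locale.ROOT,
                "fury: %.1f MiB/s (reported 1015.321 MB/sec)", fury));
        System.out.println(String.format(Locale.ROOT,
                "fastjson2: %.1f MiB/s (reported 1545.476 MB/sec)", fastjson2));
    }
}
```

Both computed values land within about one MB/sec of the reported rates. Seen this way, the per-call figure (B/op) is the one that is independent of throughput, and on it the two configurations differ by a factor of two: 168 bytes for fury against 336 bytes for fastjson2.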
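Time spent in the JIT compiler
fastjson2 does its JIT on top of ASM, using a trimmed-down built-in ASM implementation, while fury does its JIT on top of javac, using JavaCompiler. The two approaches are actually different. Both generate one serializer class and one deserializer class per bean class, so SkillFire_S2C_Msg, SkillCategory, and HarmDTO each get a corresponding Serializer and Deserializer, but each framework goes about it in its own way. For example:
com.alibaba.fastjson2.writer.OWG_1_5_SkillFire_S2C_Msg.writeArrayMappingJSONB(JSONWriter, Object, Object, Type, long) is what fastjson2's JIT step produces.
io.gamioo.sandbox.SkillFire_S2C_MsgFuryCodec_1210966325_1908153060_225832990.write(MemoryBuffer, Object) is what fury's JIT step produces.
With BeanToArray (objects serialized as arrays), fastjson2 spent 4 ms of profiled compilation time and 1948 ms of total compilation time during the run.
With reference tracking (RefTracking) off, class registration (ClassRegistration) on, and number compression (NumberCompressed) on, Fury spent 6 ms of profiled compilation time and 2624 ms of total compilation time during the run.
fastjson2 wins!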
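For readers who want to observe this kind of figure outside the benchmark harness, the JVM exposes its cumulative JIT compilation time through the standard CompilationMXBean. The sketch below assumes that the compiler.time.* rows are derived from that same counter, which is my understanding but is not stated in the output; the class name, method names, and the toy workload are invented for illustration.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Sketch: read the JVM's cumulative JIT compilation time around a workload.
final class JitTimeSketch {
    /** Cumulative JIT compilation time in ms, or -1 if the JVM does not expose it. */
    static long totalCompilationMillis() {
        CompilationMXBean bean = ManagementFactory.getCompilationMXBean();
        if (bean == null || !bean.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return bean.getTotalCompilationTime();
    }

    /**
     * JIT time accumulated while the task ran. Compilation happens on
     * background threads, so this is only an approximation.
     */
    static long compilationMillisDuring(Runnable task) {
        long before = totalCompilationMillis();
        task.run();
        long after = totalCompilationMillis();
        return (before < 0 || after < 0) ? -1 : after - before;
    }

    public static void main(String[] args) {
        long spent = compilationMillisDuring(() -> {
            // Toy hot loop standing in for a serialization benchmark.
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += Integer.toString(i).hashCode();
            }
            System.out.println("checksum " + sum);
        });
        System.out.println("JIT time during workload: " + spent + " ms");
        System.out.println("JIT time since JVM start: " + totalCompilationMillis() + " ms");
    }
}
```

The total counts every method the JVM compiled since startup, not only the generated serializer classes, which is worth keeping in mind when comparing totals of 1948 ms and 2624 ms.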
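Summary
In short, each framework has its strengths and neither clearly beats the other. Beyond continued work on performance and payload size, if both can go a step further on multi-language support and API usability, they will have an even brighter and broader future.
P.S. My knowledge is limited. If anything in this article is used or understood incorrectly, please contact me so I can correct it. Thank you!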