作者:亮 亮言
1 测试覆盖率的几个衡量指标
2 传统数据库测试
2.1 SQLite
2.2 Oracle
3 新兴的NewSQL数据库
3.1 比较常见的设计
3.2 Tidb
3.3 FoundationDB
1 测试覆盖率的几个衡量指标
常见的几种衡量测试覆盖率方法
函数覆盖(Function Coverage)
语句覆盖(Statement Coverage)
决策覆盖(Decision Coverage)Branch Coverage
条件覆盖(Condition Coverage)
Modified Condition/Decision Coverage (DC + CC + 每一个condition能独立影响decision结果的测试用例)
主要用在 safety-critical system (航空航天器件)
int foo( int x ,int y)
{
int z = 0;
if ( (x>0) && (y>0) ) {
z = x;
}
return z;
}
if ( a or b ) and c then
condition/decision criteria 满足:
a = true , b = true, c = true
a = false, b = false, c= false
第一个Test中,b的值 ;第二个test中,c的值 都不会影响 decision的值。
a=false, b=true, c=false
a=false, b=true, c=true
a=false, b=false, c=true
a=true, b=false, c=true
2 传统数据库测试
2.1 SQLite
https://www.sqlite.org/testing.html
源码:128.9 KSLOC of C code
测试代码:91772.0 KSLOC of test code
比例:711:1
PR测试示例: https://sqlite.org/src/info/940f2adc8541a838
主要测试集合列表:
Three independently developed test harnesses
100% branch test coverage in an as-deployed configuration
Millions and millions of test cases
Out-of-memory tests
I/O error tests
Crash and power loss tests
Fuzz tests
Boundary value tests
Disabled optimization tests
Regression tests
Malformed database tests
Extensive use of assert() and run-time checks
Valgrind analysis
Undefined behavior checks
其中三块独立开发的测试用例集合:
1 TCL 脚本写的Test (最开始的测试用例):开发测试时候用
26.1 KSLOC of C code
million 级别的case
2 C语言的测试用例:100% MC/DC test coverage
792.3 KSLOC of C code
hundreds of millions of tests.
3 SQL Logic Test (逻辑测试)
SQLite同时和PostgreSQL, MySQL,Oracle 等数据库跑相同功能的SQL语句,用来确认各个语句的执行结果是一致的。
7.2 million
就算是跑了这么多的测试,还是不能阻止bug的发生,腾讯的安全平台部门发现了SQLite的一个远程代码执行漏洞。
SQLite的作者也谈到了最开始开发SQLite的一段故事:
当SQLite运行的场景越来越多的时候(million 级别的应用,billion个设备上的时候),他会稳定的收到bug报告。
当bug数累计的越来越多的时候,作者花了十个月 (2008-09-25 through 2009-07-25) 的时间编写测试用例,使
SQLite能达到100% MC/DC指标。在这之后,bug报告数目就迅速降低下来了。
SQLite是如何做测试的
https://news.ycombinator.com/item?id=18685296
Remote Code Execution vulnerability in SQLite
100% branch, line coverage means nothing. It's about logical coverage. What are you testing for? You are not testing lines of code, but logic.
Right. The actual standard is called "modified condition/decison coverage" or MC/DC. In languages like C, MC/DC and branch coverage, though not exactly the same, are very close.
Achieving 100% MC/DC does not prove that you always get the right answer. All it means is that your tests are so extensive that you managed to get every machine-code branch to go in both directions at least once. It is a high standard and is difficult to achieve. It does not mean that the software is perfect.
But it does help. A lot. When I was young, I used to think I could right flawless code. Then I wrote SQLite, and it got picked up and used by lots of applications. It will amaze you how many problems will crop up when your code runs on in millions of application on billions of devices.
I was getting a steady stream of bug reports against SQLite. Then I took 10 months (2008-09-25 through 2009-07-25) to write the 100% MC/DC tests for SQLite. And after that, the number of bug reports slowed to a trickle. There still are bugs. But the number of bugs is greatly reduced. (Note that 100% MC/DC was first obtained on 2009-07-25, but the work did not end there. I spend most of my development time adding and enhancing test cases to keep up with changes in the deliverable SQLite code.)
100% MC/DC is just an arbitrary threshold - a high threshold and one that is easy to measure and difficult to cheat - but it is just a threshold at which we say "enough". You could just as easily choose a different threshold, such as 100% line coverage. The higher the threshold, the fewer bugs will slip through. But there will always be bugs.
My experience is that the weird tests you end up having to write just to cause some obscure branch to go one way or another end up finding problems in totally unrelated parts of the system. One of the chief benefits of 100% MC/DC is not so much that every branch is tested, but rather that you have to write so many tests, and such strange, weird, convoluted, and stressful tests, that you randomly stumble across (and fix) lots of problems you would have never thought about otherwise.
Another big advantage of 100% MC/DC is that once they are in place, you can change anything, anywhere in the code, and if the tests all still pass, you have high confidence that you didn't break anything. This enables us to evolve the SQLite code much faster than we could otherwise, using relatively few eyeballs.
Yet another advantage of 100% MC/DC is that you are really testing compiled machine code, not source code. So you worry less about compiler bugs. "Undefined behavior" is a big bugbear with C. We worry less than others about UB because we have tested the output of the compiler and we know that the compiler did what we wanted, even if the official C-language spec didn't require it to. We still avoid UB, and SQLite does not currently contain any UB as far as we know. But is is nice to know that even if we missed some UB in the code someplace, it probably doesn't matter.
2.2 Oracle
https://news.ycombinator.com/item?id=18442941
Oracle Database 12.2
Oracle Database developer 的工作内容:
1 开始处理新的bug
2 花费两周时间理清引起这个bug的代码逻辑,这个涉及到需要了解20多个标志位在不同场景下的组合情况,有时候多达上百个
3 为了处理bug,新增加一个标志位和几行代码规避这个bug
4 将Oracle DB代码打包,提交到有100-200台机器组成的测试集群里测试代码
5 可以回家了,测试需要跑20-30个小时
6 3-4-5的情况要重复几次,如果不顺利的话。
7 在大约两周之后,你终于把几十个标志位的排列组合情况摸清楚了,测试用例有都OK了
8 再给自己新增加的标志位写测试用例,再来一轮测试。然后提交代码review
9 代码review过程持续2周到2个月不等。
所以大概一个bug的解决时间在 2w-2m不等。
A bug's Odyssey
Here is how the life of an Oracle Database developer is:
- Start working on a new bug.
- Spend two weeks trying to understand the 20 different flags
that interact in mysterious ways to cause this bag. - Add one more flag to handle the new special scenario. Add a few more lines of code that
checks this flag and works around the problematic situation and avoids the bug. - Submit the changes to a test farm consisting of about 100 to 200 servers that would compile the code,
build a new Oracle DB, and run the millions of tests in a distributed fashion. - Go home. Come the next day and work on something else. The tests can take 20 hours to 30 hours to complete.
- Go home. Come the next day and check your farm test results. On a good day, there would be about 100 failing tests.
On a bad day, there would be about 1000 failing tests.
Pick some of these tests randomly and try to understand what went wrong with your assumptions.
Maybe there are some 10 more flags to consider to truly understand the nature of the bug.
- Add a few more flags in an attempt to fix the issue. Submit the changes again for testing. Wait another 20 to 30 hours.
- Rinse and repeat for another two weeks until you get the mysterious incantation of the combination of flags right.
- Finally one fine day you would succeed with 0 tests failing.
- Add a hundred more tests for your new change to ensure that the next developer who has the misfortune of touching
this new piece of code never ends up breaking your fix. - Submit the work for one final round of testing. Then submit it for review.
The review itself may take another 2 weeks to 2 months. So now move on to the next bug to work on. - After 2 weeks to 2 months, when everything is complete, the code would be finally merged into the main branch.
The above is a non-exaggerated description of the life of a programmer in Oracle fixing a bug. Now imagine what horror it is going to be to develop a new feature. It takes 6 months to a year (sometimes two years!) to develop a single small feature (say something like adding a new mode of authentication like support for AD authentication).
I don't work for Oracle anymore. Will never work for Oracle again!
3 新兴的NewSQL数据库
3.1 比较常见的设计
单机版的KV store 设计:
在单机KV store基础上 通过分布式一致性协议(Paxos极其变种(Raft,ZK,VR等)
3.2 Tidb
我们现在有六百多万个 Test
1 自动化测试
怎么去自动生成测试Case:
1.1 fault injection 测试异常分支
Hardware : disck,cpu,network card,
Software: os,network protocol,file system
kill -9
1.2 fuzz testing
1.3 上层协议兼容Mysql,把Mysql的测试用例直接迁移过来
1.3 通过解析语法树
https://www.pingcap.com/blog-cn/golang-failpoint/
1.4 Jespen测试:
验证分布式系统一致性的测试框架
https://pingcap.com/blog-cn/tidb-jepsen/
2 所有出现过的 bug,历史上只要出现过一次,你一定要写一个 Test 去 cover 它 目前主流开源社区都在坚持的做法
3.3 FoundationDB
FoundationDB开源产品,后来被Apple收购又不开源了,用在Apple的Icloud存储后台场景下,
现在又被重新开源了。
https://apple.github.io/foundationdb/testing.html
重要思想:Simulation
a deterministic simulation of an entire FoundationDB cluster within a single-threaded process.
用单个进程(single-thread)确定性的(deterministic)模拟出整个集群(entire cluster)
确定性:可以复现问题
同时可以方面的模拟各个层面的Failure Mode
在C++11 基础上加入了actor-based-concurrency的能力,
开发出了一个新的Flow语言
1 high performance
2 actor-based concurrency
3 Simulation
one trillion CPU-hours of simulation on FoundationDB
Reference:
https://www.infoq.cn/article/test-coverage-rate-role
https://www.sqlite.org/testing.html
https://www.sqlite.org/th3/doc/trunk/www/th3.wiki
https://www.pingcap.com/blog-cn/distributed-system-test-1/
https://www.pingcap.com/blog-cn/distributed-system-test-2/
https://www.pingcap.com/blog-cn/distributed-system-test-3/
https://apple.github.io/foundationdb/testing.html
https://apple.github.io/foundationdb/engineering.html#simulation
Remote code execution vulnerability in SQLite
https://news.ycombinator.com/item?id=18685296
Ask HN: What's the largest amount of bad code you have ever seen work?