标签
PostgreSQL , 10.0 , amcheck , 逻辑一致性检测 , 物理存储检测
背景
一些高端存储、包括ZFS文件系统,在使用了RAID后,有块检测和异常块的修复功能。
对于数据库来说,数据的可靠性是非常重要的指标,例如:
1. 写进入是什么,读出来就应该是什么。
2. 当操作系统的collate发生变化时,索引的顺序可能与实际的collate顺序不匹配。造成不稳定现象。
3. 数据块partial write,可能导致数据损坏。
4. 内存页异常,使用到某些异常页时,可能带来问题。
PostgreSQL通过full page write来避免3的问题。另外在数据页上面有checksum提供检测。
PostgreSQL 10.0 提供了一个check接口,可以对数据进行检测,发现以上问题。
amcheck接口
amcheck是一个框架,用于检测数据的一致性。例如PG有heap存储,b-tree,gist,gin,sp-gist,brin索引存储。amcheck可以用于检测各种接口对应数据存储的一致性。
命名为amcheck, am指的是access method,检测的自然是access method相关的。
src/backend/access
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 brin
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 common
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 gin
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 gist
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 hash
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:38 heap
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 index
-rw-r--r-- 1 digoal digoal 321 Apr 14 12:17 Makefile
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 nbtree
-rw-rw-r-- 1 digoal digoal 4759 Apr 14 23:38 objfiles.txt
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 rmgrdesc
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 spgist
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 tablesample
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:38 transam
目前amcheck已经做到可以检测索引的异常(例如前面提到的操作系统collate变化引发的索引的逻辑顺序异常)。未来会扩展更多的检测接口。
10.0也推出了对ICU的支持,从根源上避免了collate的问题。
《PostgreSQL 10.0 preview 功能增强 - 国际化功能增强,支持ICU(International Components for Unicode)》
amcheck 可检测的异常
1. Structural inconsistencies caused by incorrect operator class implementations.
问题可能来自操作系统collate的变化,导致collate变化前后,QUERY输出不一致(顺序)的结果.
检测方法,参考每种access method的一致性校验function
https://www.postgresql.org/docs/devel/static/xindex.html#xindex-support
2. Corruption caused by hypothetical undiscovered bugs in the underlying PostgreSQL access method code or sort code.
3. Filesystem or storage subsystem faults where checksums happen to simply not be enabled.
4. Corruption caused by faulty RAM, and the broader memory subsystem and operating system.
amcheck 检测到的异常修复
不同的异常,修复的方法不一样,通常能直接修复的是REINDEX。(但并不是所有的异常都有方法修复。)
但是社区给出了一个建议,如果是代码的BUG,REINDEX可能是无法修复的,但是通过pageinspect插件,可以帮助进行问题诊断。
amcheck patch介绍
Add amcheck extension to contrib.
author Andres Freund <andres@anarazel.de>
Fri, 10 Mar 2017 07:50:40 +0800 (15:50 -0800)
committer Andres Freund <andres@anarazel.de>
Fri, 10 Mar 2017 08:33:02 +0800 (16:33 -0800)
This is the beginning of a collection of SQL-callable functions to
verify the integrity of data files. For now it only contains code to
verify B-Tree indexes.
This adds two SQL-callable functions, validating B-Tree consistency to
a varying degree. Check the, extensive, docs for details.
The goal is to later extend the coverage of the module to further
access methods, possibly including the heap. Once checks for
additional access methods exist, we'll likely add some "dispatch"
functions that cover multiple access methods.
Author: Peter Geoghegan, editorialized by Andres Freund
Reviewed-By: Andres Freund, Tomas Vondra, Thomas Munro,
Anastasia Lubennikova, Robert Haas, Amit Langote
Discussion: CAM3SWZQzLMhMwmBqjzK+pRKXrNUZ4w90wYMUWfkeV8mZ3Debvw@mail.gmail.com
amcheck b-tree数据检测接口
对于b-tree索引数据,通过这两个接口可以进行检测。
1. bt_index_check(index regclass) returns void
加select一样的accessshared锁。基本无影响。注意,如果被检测的索引页在shared buffer中时,不会扫磁盘。
返回空表示正常。
test=# SELECT bt_index_check(c.oid), c.relname, c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
bt_index_check | relname | relpages
----------------+---------------------------------+----------
| pg_depend_reference_index | 43
| pg_depend_depender_index | 40
| pg_proc_proname_args_nsp_index | 31
| pg_description_o_c_o_index | 21
| pg_attribute_relid_attnam_index | 14
| pg_proc_oid_index | 10
| pg_attribute_relid_attnum_index | 9
| pg_amproc_fam_proc_index | 5
| pg_amop_opr_fam_index | 5
| pg_amop_fam_strat_index | 5
(10 rows)
2. bt_index_parent_check(index regclass) returns void
被检测的索引,以及索引对应的表加ShareLock锁。冲突较大,堵塞INSERT, UPDATE, and DELETE,表的VACUUM,以及更大的锁操作。
HOT STNADBY不允许执行 bt_index_parent_check(index regclass) 。
这个patch的讨论,详见邮件组,本文末尾URL。
PostgreSQL社区的作风非常严谨,一个patch可能在邮件组中讨论几个月甚至几年,根据大家的意见反复的修正,patch合并到master已经非常成熟,所以PostgreSQL的稳定性也是远近闻名的。