PostgreSQL, 10.0, amcheck, logical consistency checking, physical storage checking
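1. Whatever is written must be exactly what is read back.
2. When the operating system's collation changes, the order of entries in an index may no longer match the order the collation now defines, which leads to unstable behavior.
3. A partial write of a data block can corrupt data.
4. Faulty memory pages can cause problems when they are used.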
PostgreSQL uses full page writes to avoid problem 3. In addition, data pages carry a checksum (when checksums are enabled) so that damage can be detected.
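Whether these two protections are active on a given cluster can be inspected from any session. This is a minimal sketch, assuming a standard PostgreSQL server; note that data checksums are optional and are chosen when the cluster is initialized.

```sql
-- Full page writes guard against torn (partially written) pages after a crash.
SHOW full_page_writes;

-- Read-only setting: reports whether the cluster was initialized with data checksums.
SHOW data_checksums;
```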
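PostgreSQL 10.0 adds a checking interface that can verify data and detect problems like those above.
It is named amcheck. "am" stands for access method, so what it checks is naturally access-method related. The access methods PostgreSQL supports can be seen in the src/backend/access directory of the source tree: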
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 brin
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 common
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 gin
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 gist
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 hash
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:38 heap
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 index
-rw-r--r-- 1 digoal digoal 321 Apr 14 12:17 Makefile
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 nbtree
-rw-rw-r-- 1 digoal digoal 4759 Apr 14 23:38 objfiles.txt
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 rmgrdesc
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 spgist
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:37 tablesample
drwxrwxr-x 2 digoal digoal 4096 Apr 14 23:38 transam
Anomalies amcheck can detect
1. Structural inconsistencies caused by incorrect operator class implementations.
For how this is detected, refer to the consistency verification function of each access method.
2. Corruption caused by hypothetical undiscovered bugs in the underlying PostgreSQL access method code or sort code.
3. Filesystem or storage subsystem faults where checksums happen to simply not be enabled.
4. Corruption caused by faulty RAM, and the broader memory subsystem and operating system.
Repairing anomalies detected by amcheck
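The amcheck documentation notes that there is no general method for repairing the problems it reports, and that the root cause should be investigated. Rebuilding the affected index is a common first step. The sketch below assumes a hypothetical index named my_index. A rebuild may not help if the underlying cause (for example faulty hardware or an incorrect operator class) is still present.

```sql
-- my_index is a hypothetical name used only for illustration.
-- REINDEX blocks writes to the parent table while it runs.
REINDEX INDEX my_index;

-- Re-run the check on the rebuilt index; an error here means the problem persists.
SELECT bt_index_check('my_index'::regclass);
```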
About the amcheck patch
Add amcheck extension to contrib.
author Andres Freund <andres@anarazel.de>
Fri, 10 Mar 2017 07:50:40 +0800 (15:50 -0800)
committer Andres Freund <andres@anarazel.de>
Fri, 10 Mar 2017 08:33:02 +0800 (16:33 -0800)
This is the beginning of a collection of SQL-callable functions to
verify the integrity of data files. For now it only contains code to
verify B-Tree indexes.
This adds two SQL-callable functions, validating B-Tree consistency to
a varying degree. Check the, extensive, docs for details.
The goal is to later extend the coverage of the module to further
access methods, possibly including the heap. Once checks for
additional access methods exist, we'll likely add some "dispatch"
functions that cover multiple access methods.
Author: Peter Geoghegan, editorialized by Andres Freund
Reviewed-By: Andres Freund, Tomas Vondra, Thomas Munro,
Anastasia Lubennikova, Robert Haas, Amit Langote
Discussion: CAM3SWZQzLMhMwmBqjzK+pRKXrNUZ4w90wYMUWfkeV8mZ3Debvw@mail.gmail.com
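Because the patch adds amcheck as a contrib extension, it has to be created in each database before its functions can be called. This is a minimal sketch, assuming the contrib modules are installed on the server and the session belongs to a superuser.

```sql
-- Install the extension in the current database.
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Confirm that it is present and see which version was installed.
SELECT extname, extversion FROM pg_extension WHERE extname = 'amcheck';
```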
amcheck B-tree verification interface
1. bt_index_check(index regclass) returns void
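It takes the same AccessShareLock as a SELECT, so it has almost no impact on other sessions. Note that if the index pages being checked are already in shared buffers, they are not read from disk.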
test=# SELECT bt_index_check(c.oid), c.relname, c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
  -- Don't check temp tables, which may be from another session:
  AND c.relpersistence != 't'
  -- Function may throw an error when this is omitted:
  AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
 bt_index_check |             relname             | relpages
----------------+---------------------------------+----------
                | pg_depend_reference_index       |       43
                | pg_depend_depender_index        |       40
                | pg_proc_proname_args_nsp_index  |       31
                | pg_description_o_c_o_index      |       21
                | pg_attribute_relid_attnam_index |       14
                | pg_proc_oid_index               |       10
                | pg_attribute_relid_attnum_index |        9
                | pg_amproc_fam_proc_index        |        5
                | pg_amop_opr_fam_index           |        5
                | pg_amop_fam_strat_index         |        5
(10 rows)
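The same function can also be pointed at a single index. This is a minimal sketch, assuming a hypothetical B-tree index named my_table_pkey. A clean index yields an empty (void) result, while a detected inconsistency makes the call raise an error.

```sql
-- my_table_pkey is a hypothetical index name used only for illustration.
SELECT bt_index_check('my_table_pkey'::regclass);
```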
2. bt_index_parent_check(index regclass) returns void
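It takes a ShareLock on the index being checked and on the table the index belongs to. This conflicts with far more activity: it blocks INSERT, UPDATE, and DELETE, VACUUM of the table, and any operation that needs a stronger lock.
bt_index_parent_check(index regclass) cannot be run on a hot standby.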
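Given these restrictions, a maintenance script might confirm that it is not running on a standby and bound its lock wait before calling the stricter check. This is a minimal sketch, assuming a hypothetical index named my_table_pkey; the 5 second timeout is an arbitrary illustrative value.

```sql
-- Returns true on a standby (server in recovery), where bt_index_parent_check cannot be used.
SELECT pg_is_in_recovery();

-- Give up instead of queueing indefinitely behind a conflicting lock.
SET lock_timeout = '5s';

-- my_table_pkey is a hypothetical index name used only for illustration.
SELECT bt_index_parent_check('my_table_pkey'::regclass);

RESET lock_timeout;
```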