使用pg_resetxlog修复因pg_control损坏导致的PostgreSQL无法启动-开发者社区-阿里云

PostgreSQL 控制文件在$PGDATA/global目录下名为pg_control.

控制文件中记录了以下三部分信息 :

1. initdb时生成的静态信息 :

pg_control version number:            922    
Catalog version number:               201204301    
Database system identifier:           5831753892046499175    
Maximum data alignment:               8    
Database block size:                  8192    
Blocks per segment of large relation: 131072    
WAL block size:                       16384    
Bytes per WAL segment:                16777216    
Maximum length of identifiers:        64    
Maximum columns in an index:          32    
Maximum size of a TOAST chunk:        1996    
Date/time type storage:               64-bit integers    
Float4 argument passing:              by value    
Float8 argument passing:              by value    
2. postgresql.conf中的配置信息 :     
Current wal_level setting:            hot_standby    
Current max_connections setting:      1000    
Current max_prepared_xacts setting:   10    
Current max_locks_per_xact setting:   64    
3. write-ahead logging以及checkpoint的动态信息 :     
Latest checkpoint location:           96E8/5B000020    
Prior checkpoint location:            96E8/5A0C8CC0    
Latest checkpoint's REDO location:    96E8/5B000020    
Latest checkpoint's TimeLineID:       1    
Latest checkpoint's full_page_writes: on    
Latest checkpoint's NextXID:          0/1183664222    
Latest checkpoint's NextOID:          309701    
Latest checkpoint's NextMultiXactId:  1    
Latest checkpoint's NextMultiOffset:  0    
Latest checkpoint's oldestXID:        1006759584    
Latest checkpoint's oldestXID's DB:   1    
Latest checkpoint's oldestActiveXID:  0    
Time of latest checkpoint:            Fri 11 Jan 2013 07:44:19 AM CST    
Minimum recovery ending location:     0/0    
Backup start location:                0/0    
Backup end location:                  0/0    
End-of-backup record required:        no

以上信息可以使用pg_controldata从pg_control获取 :

src/bin/pg_controldata/pg_controldata.c    
 * pg_controldata    
 *    
 * reads the data from $PGDATA/global/pg_control

如果控制文件$PGDATA/global/pg_control损坏或丢失, 数据库将运行异常, 无法启动.

如何修复? 关键在于恢复write-ahead logging以及checkpoint的动态信息.

这些信息可以从pg_xlog, pg_clog, pg_multixact这些目录的文件中解析出来。

pg_xlog的文件名解析可参看, 不同的段大小, 命名大不相同, pg_resetxlog的帮助文件适用16MB的段大小, 如果是其他大小, 需要重新计算名字 :
http://blog.163.com/digoal@126/blog/static/1638770402012914112949546/

接下来介绍一下使用pg_resetxlog重建pg_control的方法.
pg_resetxlog功能如下 :

src/bin/pg_resetxlog/pg_resetxlog.c    
 * pg_resetxlog.c    
 *        A utility to "zero out" the xlog when it's corrupt beyond recovery.    
 *        Can also rebuild pg_control if needed.    
 *    
 * The theory of operation is fairly simple:    
 *        1. Read the existing pg_control (which will include the last    
 *               checkpoint record).  If it is an old format then update to    
 *               current format.    
 *        2. If pg_control is corrupt, attempt to intuit reasonable values,    
 *               by scanning the old xlog if necessary.    
 *        3. Modify pg_control to reflect a "shutdown" state with a checkpoint    
 *               record at the start of xlog.    
 *        4. Flush the existing xlog files and write a new segment with    
 *               just a checkpoint record in it.  The new segment is positioned    
 *               just past the end of the old xlog, so that existing LSNs in    
 *               data pages will appear to be "in the past".    
 * This is all pretty straightforward except for the intuition part of    
 * step 2 ...

pg_resetxlog的用法 :

ocz@db-172-16-3-150-> pg_resetxlog --help    
pg_resetxlog resets the PostgreSQL transaction log.    
    
Usage:    
  pg_resetxlog [OPTION]... DATADIR    
    
Options:    
  -e XIDEPOCH      set next transaction ID epoch    
  -f               force update to be done    
  -l TLI,FILE,SEG  force minimum WAL starting location for new transaction log    
  -m XID           set next multitransaction ID    
  -n               no update, just show extracted control values (for testing)    
  -o OID           set next OID    
  -O OFFSET        set next multitransaction offset    
  -V, --version    output version information, then exit    
  -x XID           set next transaction ID    
  -?, --help       show this help, then exit

参数具体含义 :

-l timelineid,fileid,seg    
  The WAL starting address (-l) should be larger than any WAL segment file name currently existing in the    
directory pg_xlog under the data directory. These names are also in hexadecimal and have three parts. The    
first part is the “timeline ID” and should usually be kept the same. Do not choose a value larger than 255    
(0xFF) for the third part; instead increment the second part and reset the third part to 0. For example, if    
00000001000000320000004A is the largest entry in pg_xlog, -l 0x1,0x32,0x4B will work; but if the largest    
entry is 000000010000003A000000FF, choose -l 0x1,0x3B,0x0 or more.    
    Note    
    pg_resetxlog itself looks at the files in pg_xlog and chooses a default -l setting beyond the last    
    existing file name. Therefore, manual adjustment of -l should only be needed if you are aware of WAL    
    segment files that are not currently present in pg_xlog, such as entries in an offline archive; or if    
    the contents of pg_xlog have been lost entirely.    
    
-e XIDEPOCH    
  The transaction ID epoch is not actually stored anywhere in the database except in the field that is set by    
pg_resetxlog, so any value will work so far as the database itself is concerned. You might need to adjust    
this value to ensure that replication systems such as Slony-I work correctly - if so, an appropriate value    
should be obtainable from the state of the downstream replicated database.    
    
-x XID    
  A safe value for the next transaction ID (-x) can be determined by looking for the numerically largest file    
name in the directory pg_clog under the data directory, adding one, and then multiplying by 1048576. Note    
that the file names are in hexadecimal. It is usually easiest to specify the option value in hexadecimal    
too. For example, if 0011 is the largest entry in pg_clog, -x 0x1200000 will work (five trailing zeroes    
provide the proper multiplier).    
    
-m XID    
  A safe value for the next multitransaction ID (-m) can be determined by looking for the numerically largest    
file name in the directory pg_multixact/offsets under the data directory, adding one, and then multiplying    
by 65536. As above, the file names are in hexadecimal, so the easiest way to do this is to specify the    
option value in hexadecimal and add four zeroes.    
    
-O OFFSET    
   A safe value for the next multitransaction offset (-O) can be determined by looking for the numerically    
largest file name in the directory pg_multixact/members under the data directory, adding one, and then    
multiplying by 65536. As above, the file names are in hexadecimal, so the easiest way to do this is to    
specify the option value in hexadecimal and add four zeroes.    
    
-o OID    
  There is no comparably easy way to determine a next OID that's beyond the largest one in the database, but    
fortunately it is not critical to get the next-OID setting right.    
    
-n    
  no update, just show extracted control values (for testing)    
-f    
  force

测试步骤如下(基于PostgreSQL 9.2.1) :

新建测试数据, 用到with oids的表, 因为OID无法确定, 看看是否会有异常.
关闭数据库
记下pg_controldata信息, 方便修复后进行比对
删除$PGDATA/global/pg_control
开启数据库观察报错输出
touch $PGDATA/global/pg_control
使用pg_resetxlog修复pg_control
记下pg_controldata信息, 与前面的pg_controldata输出进行比对
启动数据库
查看测试数据是否正常, 新插入数据
关闭数据库, 并记下pg_controldata的信息, 看看有何变化.

测试过程 :

1. 测试数据

digoal=> create table oid_test(id int primary key) with oids;    
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "oid_test_pkey" for table "oid_test"    
CREATE TABLE    
digoal=> insert into oid_test select generate_series(1,100000);    
INSERT 0 100000    
digoal=> select min(oid),max(oid) from oid_test ;    
  min  |  max       
-------+--------    
 16397 | 116396    
(1 row)

2. 关闭数据

ocz@db-172-16-3-150-> pg_ctl stop -m fast    
waiting for server to shut down.... done    
server stopped

3. 记下pg_controldata信息, 方便修复后进行比对

ocz@db-172-16-3-150-> pg_controldata     
pg_control version number:            922    
Catalog version number:               201204301    
Database system identifier:           5832000131111550393    
Database cluster state:               shut down    
pg_control last modified:             Fri 11 Jan 2013 09:48:18 AM CST    
Latest checkpoint location:           96E8/5F000020    
Prior checkpoint location:            96E8/5EE5C698    
Latest checkpoint's REDO location:    96E8/5F000020    
Latest checkpoint's TimeLineID:       1    
Latest checkpoint's full_page_writes: on    
Latest checkpoint's NextXID:          0/1183842312    
Latest checkpoint's NextOID:          116414    
Latest checkpoint's NextMultiXactId:  65536    
Latest checkpoint's NextMultiOffset:  65536    
Latest checkpoint's oldestXID:        1006759584    
Latest checkpoint's oldestXID's DB:   1    
Latest checkpoint's oldestActiveXID:  0    
Time of latest checkpoint:            Fri 11 Jan 2013 09:48:18 AM CST    
Minimum recovery ending location:     0/0    
Backup start location:                0/0    
Backup end location:                  0/0    
End-of-backup record required:        no    
Current wal_level setting:            hot_standby    
Current max_connections setting:      1000    
Current max_prepared_xacts setting:   10    
Current max_locks_per_xact setting:   64    
Maximum data alignment:               8    
Database block size:                  8192    
Blocks per segment of large relation: 131072    
WAL block size:                       16384    
Bytes per WAL segment:                16777216    
Maximum length of identifiers:        64    
Maximum columns in an index:          32    
Maximum size of a TOAST chunk:        1996    
Date/time type storage:               64-bit integers    
Float4 argument passing:              by value    
Float8 argument passing:              by value

4. 删除$PGDATA/global/pg_control

ocz@db-172-16-3-150-> rm $PGDATA/global/pg_control     
rm: remove regular file `/data05/ocz/pg_root/global/pg_control'? y

5. 开启数据库观察报错输出

ocz@db-172-16-3-150-> pg_ctl start    
server starting    
ocz@db-172-16-3-150-> postgres: could not find the database system    
Expected to find it in the directory "/data05/ocz/pg_root",    
but could not open file "/data05/ocz/pg_root/global/pg_control": No such file or directory

接下来进行修复 :

6. touch $PGDATA/global/pg_control

ocz@db-172-16-3-150-> touch $PGDATA/global/pg_control    
ocz@db-172-16-3-150-> chmod 600 $PGDATA/global/pg_control

7. 使用pg_resetxlog修复pg_control

首先确定-l timelineid,fileid,seg的信息 :

ocz@db-172-16-3-150-> cd $PGDATA/pg_xlog    
ocz@db-172-16-3-150-> ll    
total 65M    
-rw------- 1 ocz ocz 16M Jan 11 09:39 00000001000096E80000005C    
-rw------- 1 ocz ocz 16M Jan 11 09:39 00000001000096E80000005D    
-rw------- 1 ocz ocz 16M Jan 11 09:48 00000001000096E80000005E    
-rw------- 1 ocz ocz 16M Jan 11 09:48 00000001000096E80000005F    
drwx------ 2 ocz ocz 44K Jan 11 09:48 archive_status

-l timelineid,fileid,seg 的数据来自pg_xlog文件名的三个部分, 分别占用8个16进制位.

段大小为16MB, 所以末端最大为0xFF.

得出-l 0x1,0x96E8,0x60

接下来确定-x XID的信息

来自pg_clog

ocz@db-172-16-3-150-> cd $PGDATA/pg_clog    
ocz@db-172-16-3-150-> ll -t|head -n 5    
total 43M    
-rw------- 1 ocz ocz 8.0K Jan 11 09:48 0469    
-rw------- 1 ocz ocz 216K Jan 10 21:00 0468    
-rw------- 1 ocz ocz 256K Jan 10 12:56 0467    
-rw------- 1 ocz ocz 256K Jan 10 09:35 0466

取最大值加1然后乘以1048576.

转换成16进制的话相当于取最大值加1然后末尾添加5个0

得到-x 0x046A00000

接下来确定-m XID的信息

来自pg_multixact/offsets

ocz@db-172-16-3-150-> cd $PGDATA/pg_multixact/offsets    
ocz@db-172-16-3-150-> ll    
total 0

取最大值加1然后乘以65536.

转换成16进制的话相当于取最大值加1然后末尾添加4个0

没有文件的话使用0加1, 然后末尾添加4个0

得到-m 0x10000

接下来确定-O OFFSET的信息

来自pg_multixact/members

ocz@db-172-16-3-150-> cd $PGDATA/pg_multixact/members    
ocz@db-172-16-3-150-> ll    
total 0

取最大值加1然后乘以65536.

转换成16进制的话相当于取最大值加1然后末尾添加4个0

没有文件的话使用0加1, 然后末尾添加4个0

得到-O 0x10000

最后, 不确定的值有2个 :

-e XIDEPOCH    ，如果么有使用 slony或者londiste这种基于触发器的数据同步软件，则-e意义不大，它实际上是在将32位的xid转换为64位的xid时使用的一个转换系数 。 
-o OID    , 系统会自动跳过已经分配的OID，自动容错，例如OID被别的程序使用掉了，PG会自动生成下一个OID，并且继续判断可用性。知道可用为止。

可以先不管这两个值.

执行pg_resetxlog 如下 :

ocz@db-172-16-3-150-> pg_resetxlog -l 0x1,0x96E8,0x60 -x 0x046A00000 -m 0x10000 -O 0x10000 -f $PGDATA    
pg_resetxlog: pg_control exists but is broken or unknown version; ignoring it    
Transaction log reset

8. 记下pg_controldata信息, 与前面的pg_controldata输出进行比对

ocz@db-172-16-3-150-> pg_controldata     
pg_control version number:            922    
Catalog version number:               201204301    
Database system identifier:           5832008033851373032    
Database cluster state:               shut down    
pg_control last modified:             Fri 11 Jan 2013 10:09:44 AM CST    
Latest checkpoint location:           96E8/60000020    
Prior checkpoint location:            0/0    
Latest checkpoint's REDO location:    96E8/60000020    
Latest checkpoint's TimeLineID:       1    
Latest checkpoint's full_page_writes: off    
Latest checkpoint's NextXID:          0/1184890880    
Latest checkpoint's NextOID:          10000    
Latest checkpoint's NextMultiXactId:  65536    
Latest checkpoint's NextMultiOffset:  65536    
Latest checkpoint's oldestXID:        3479858176    
Latest checkpoint's oldestXID's DB:   0    
Latest checkpoint's oldestActiveXID:  0    
Time of latest checkpoint:            Fri 11 Jan 2013 10:09:44 AM CST    
Minimum recovery ending location:     0/0    
Backup start location:                0/0    
Backup end location:                  0/0    
End-of-backup record required:        no    
Current wal_level setting:            minimal    
Current max_connections setting:      100    
Current max_prepared_xacts setting:   0    
Current max_locks_per_xact setting:   64    
Maximum data alignment:               8    
Database block size:                  8192    
Blocks per segment of large relation: 131072    
WAL block size:                       16384    
Bytes per WAL segment:                16777216    
Maximum length of identifiers:        64    
Maximum columns in an index:          32    
Maximum size of a TOAST chunk:        1996    
Date/time type storage:               64-bit integers    
Float4 argument passing:              by value    
Float8 argument passing:              by value

注意修复后从控制文件读取到的不确定的-e XIDEPOCH和-o OID信息如下 :

也就是initdb后的初始值.

Latest checkpoint's NextXID:          0/1184890880  : XIDEPOCH=0    
Latest checkpoint's NextOID:          10000

与修复pg_control前发生了变化的值如下 :
修复前

Database system identifier:           5832000131111550393    
pg_control last modified:             Fri 11 Jan 2013 09:48:18 AM CST    
Prior checkpoint location:            96E8/5EE5C698    
Latest checkpoint's full_page_writes: on    
Latest checkpoint's NextXID:          0/1183842312    
Latest checkpoint's NextOID:          116414    
Latest checkpoint's oldestXID:        1006759584    
Latest checkpoint's oldestXID's DB:   1    
Time of latest checkpoint:            Fri 11 Jan 2013 09:48:18 AM CST    
Current wal_level setting:            hot_standby    
Current max_connections setting:      1000    
Current max_prepared_xacts setting:   10

修复后

Database system identifier:           5832008033851373032    
pg_control last modified:             Fri 11 Jan 2013 10:09:44 AM CST    
Prior checkpoint location:            0/0    
Latest checkpoint's full_page_writes: off    
Latest checkpoint's NextXID:          0/1184890880    
Latest checkpoint's NextOID:          10000    
Latest checkpoint's oldestXID:        3479858176    
Latest checkpoint's oldestXID's DB:   0    
Time of latest checkpoint:            Fri 11 Jan 2013 10:09:44 AM CST    
Current wal_level setting:            minimal    
Current max_connections setting:      100    
Current max_prepared_xacts setting:   0

9. 启动数据库

ocz@db-172-16-3-150-> pg_ctl start    
server starting    
ocz@db-172-16-3-150-> LOG:  00000: loaded library "pg_stat_statements"    
LOCATION:  load_libraries, miscinit.c:1249

10. 查看测试数据是否正常, 然后新插入数据

ocz@db-172-16-3-150-> psql digoal digoal    
psql (9.2.1)    
Type "help" for help.    
digoal=> select min(oid),max(oid),count(*) from oid_test ;    
  min  |  max   | count      
-------+--------+--------    
 16397 | 116396 | 100000    
(1 row)

数据可以正常访问.

新插入数据 :

digoal=> insert into oid_test select generate_series(100001,200000);    
INSERT 0 100000    
digoal=> select min(oid),max(oid),count(*) from oid_test ;    
  min  |  max   | count      
-------+--------+--------    
 16384 | 116396 | 200000    
(1 row)    
digoal=> select oid,* from oid_test where oid=16397;    
  oid  |   id       
-------+--------    
 16397 |      1    
 16397 | 100014    
(2 rows)

注意oid出现了重复, 印证了PostgreSQL中的说明, OID不确保唯一性.

11. 关闭数据库, 并记下pg_controldata的信息, 看看有何变化.

ocz@db-172-16-3-150-> pg_ctl stop -m fast    
waiting for server to shut down.... done    
server stopped    
ocz@db-172-16-3-150-> pg_controldata     
pg_control version number:            922    
Catalog version number:               201204301    
Database system identifier:           5832008033851373032    
Database cluster state:               shut down    
pg_control last modified:             Fri 11 Jan 2013 10:16:18 AM CST    
Latest checkpoint location:           96E8/61000020    
Prior checkpoint location:            96E8/60DFF470    
Latest checkpoint's REDO location:    96E8/61000020    
Latest checkpoint's TimeLineID:       1    
Latest checkpoint's full_page_writes: on    
Latest checkpoint's NextXID:          0/1184890883    
Latest checkpoint's NextOID:          116385    
Latest checkpoint's NextMultiXactId:  65536    
Latest checkpoint's NextMultiOffset:  65536    
Latest checkpoint's oldestXID:        1006759584    
Latest checkpoint's oldestXID's DB:   1    
Latest checkpoint's oldestActiveXID:  0    
Time of latest checkpoint:            Fri 11 Jan 2013 10:16:18 AM CST    
Minimum recovery ending location:     0/0    
Backup start location:                0/0    
Backup end location:                  0/0    
End-of-backup record required:        no    
Current wal_level setting:            hot_standby    
Current max_connections setting:      1000    
Current max_prepared_xacts setting:   10    
Current max_locks_per_xact setting:   64    
Maximum data alignment:               8    
Database block size:                  8192    
Blocks per segment of large relation: 131072    
WAL block size:                       16384    
Bytes per WAL segment:                16777216    
Maximum length of identifiers:        64    
Maximum columns in an index:          32    
Maximum size of a TOAST chunk:        1996    
Date/time type storage:               64-bit integers    
Float4 argument passing:              by value    
Float8 argument passing:              by value

关闭数据库后与刚修复好时的控制文件信息变化如下 :
开库前 :

pg_control last modified:             Fri 11 Jan 2013 10:09:44 AM CST    
Latest checkpoint location:           96E8/60000020    
Prior checkpoint location:            0/0    
Latest checkpoint's REDO location:    96E8/60000020    
Latest checkpoint's full_page_writes: off    
Latest checkpoint's NextXID:          0/1184890880    
Latest checkpoint's NextOID:          10000    
Latest checkpoint's oldestXID:        3479858176    
Latest checkpoint's oldestXID's DB:   0    
Time of latest checkpoint:            Fri 11 Jan 2013 10:09:44 AM CST    
Current wal_level setting:            minimal    
Current max_connections setting:      100    
Current max_prepared_xacts setting:   0

关库后 :

pg_control last modified:             Fri 11 Jan 2013 10:16:18 AM CST    
Latest checkpoint location:           96E8/61000020    
Prior checkpoint location:            96E8/60DFF470    
Latest checkpoint's REDO location:    96E8/61000020    
Latest checkpoint's full_page_writes: on    
Latest checkpoint's NextXID:          0/1184890883    
Latest checkpoint's NextOID:          116385    
Latest checkpoint's oldestXID:        1006759584    
Latest checkpoint's oldestXID's DB:   1    
Time of latest checkpoint:            Fri 11 Jan 2013 10:16:18 AM CST    
Current wal_level setting:            hot_standby    
Current max_connections setting:      1000    
Current max_prepared_xacts setting:   10

小结

1. 使用pg_resetxlog后, 先检查数据一致性, 必要时将数据导出, 使用initdb新建数据库, 再导入.

2. 如果控制文件丢失, 并且没有备份的话, pg_resetxlog你不知道该填啥, 但是可以从pg_xlog目录中获得大概的redo location, 或者pg_resetxlog 会猜测一些值, 直接-f生成控制文件, 启动数据库后, 可能由于XID回归到以前的XID而致使数据"消失", 你可以使用txid_current()函数不断的消耗XID来得到一致的值.

http://blog.163.com/digoal@126/blog/static/163877040201183043153622/
http://blog.163.com/digoal@126/blog/static/163877040201251911813661/

使用pg_xlogdump从xlog中抽取信息, 包括txid.
http://blog.163.com/digoal@126/blog/static/16387704020134993845555/

pg_resetxlog的版本必须要与数据库集群的版本一致。

注意

控制文件和PostgreSQL的大版本相关，建议用户使用pg_resetxlog前，看一看对应版本的使用说明。

参考

1. man pg_controldata
2. man pg_resetxlog
3. http://blog.163.com/digoal@126/blog/static/163877040201171233710582/
4. http://blog.163.com/digoal@126/blog/static/1638770402012914112949546/
5. http://www.postgresql.org/docs/9.2/static/wal-internals.html
6. http://www.postgresql.org/docs/9.2/static/app-pgresetxlog.html
7. http://www.postgresql.org/docs/9.2/static/pgupgrade.html
8. http://www.postgresql.org/docs/9.2/static/continuous-archiving.html
9. src/bin/pg_resetxlog/pg_resetxlog.c
10. src/include/catalog/pg_control.h
11. src/bin/pg_controldata/pg_controldata.c
12. src/backend/access/transam/clog.c
13. src/include/access/clog.h
14. src/backend/access/transam/xlog.c
15. src/include/access/xlog.h

祝大家玩得开心，欢迎随时来 阿里云促膝长谈业务需求 ，恭候光临。

阿里云的小伙伴们加油，努力 做好内核与服务，打造最贴地气的云数据库 。

使用pg_resetxlog修复PostgreSQL控制文件的方法

小结

注意

参考

关系型数据库

热门文章

最新文章

相关产品

相关课程

相关电子书

推荐镜像