案例解析二、
PostgreSQL HOT STANDBY by stream replication 测试:
一、准备硬件
1. 主节点硬件配置
DISK : 146GB*6
MEM : 14GB
CPU : 2.83GHz*8
2. standby节点硬件配置
DISK : 146GB*4
MEM : 8GB
CPU : 2.0GHz*8
二、准备环境
1. 系统
Red Hat Enterprise Linux Server release 5.5 (Tikanga) x64
2. 时钟同步
8 * * * * /usr/sbin/ntpdate asia.pool.ntp.org && /sbin/hwclock --systohc
3. 配置目录
master节点:
slave节点:
vi /etc/fstab
4. 在主节点配置nfs,将wal归档目录export出去,
(这里没有使用中央日志服务器,有条件的话还是需要一个比较大的日志服务器为好,以便支持更多的slave节点)
/database/pgdata/pg_arch 172.16.3.39/32(rw,no_root_squash,sync)
slave节点mount这个目录.
确保master节点和slave节点的postgres用户gid uid相同,否则可能有权限的问题.
5. 配置内核参数等
6. 配置系统服务
7. 配置防火墙
vi /etc/sysconfig/iptables
PostgreSQL HOT STANDBY by stream replication 测试:
一、准备硬件
1. 主节点硬件配置
DISK : 146GB*6
MEM : 14GB
CPU : 2.83GHz*8
2. standby节点硬件配置
DISK : 146GB*4
MEM : 8GB
CPU : 2.0GHz*8
二、准备环境
1. 系统
Red Hat Enterprise Linux Server release 5.5 (Tikanga) x64
2. 时钟同步
8 * * * * /usr/sbin/ntpdate asia.pool.ntp.org && /sbin/hwclock --systohc
3. 配置目录
mkdir -p /database/pgdata/tbs1
mkdir -p /database/pgdata/tbs2
mkdir -p /database/pgdata/tbs3
mkdir -p /database/pgdata/tbs4
mkdir -p /database/pgdata/tbs5
fdisk
mkfs.ext3
mount /dev/cciss/c0d1p1 /database/pgdata/tbs1
mount /dev/cciss/c0d2p1 /database/pgdata/tbs2
mount /dev/cciss/c0d3p1 /database/pgdata/tbs3
mount /dev/cciss/c0d4p1 /database/pgdata/tbs4
mount /dev/cciss/c0d5p1 /database/pgdata/tbs5
master节点:
[root@db-172-16-3-33 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/cciss/c0d0p1 31G 8.1G 21G 29% /
/dev/cciss/c0d0p3 88G 1.7G 81G 3% /opt
tmpfs 6.9G 0 6.9G 0% /dev/shm
/dev/cciss/c0d1p1 135G 76M 128G 1% /database/pgdata/tbs1
/dev/cciss/c0d2p1 135G 6.1G 122G 5% /database/pgdata/tbs2
/dev/cciss/c0d3p1 135G 3.3G 125G 3% /database/pgdata/tbs3
/dev/cciss/c0d4p1 135G 5.6G 123G 5% /database/pgdata/tbs4
/dev/cciss/c0d5p1 135G 16G 113G 13% /database/pgdata/tbs5
slave节点:
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 31G 3.5G 26G 13% /
/dev/sda3 94G 386M 89G 1% /opt
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/sdb1 134G 76M 128G 1% /database/pgdata/tbs1
/dev/sdc1 134G 188M 127G 1% /database/pgdata/tbs2
/dev/sdd1 134G 2.9G 125G 3% /database/pgdata/tbs3
172.16.3.33:/database/pgdata/pg_arch
135G 5.6G 123G 5% /database/pgdata/pg_arch
vi /etc/fstab
4. 在主节点配置nfs,将wal归档目录export出去,
(这里没有使用中央日志服务器,有条件的话还是需要一个比较大的日志服务器为好,以便支持更多的slave节点)
/database/pgdata/pg_arch 172.16.3.39/32(rw,no_root_squash,sync)
slave节点mount这个目录.
确保master节点和slave节点的postgres用户gid uid相同,否则可能有权限的问题.
5. 配置内核参数等
kernel.shmmni = 4096
kernel.sem = 501000 6412800000 501000 12800
fs.file-max = 767246
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 262144
net.core.wmem_max = 262144
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_max_syn_backlog=4096
net.core.netdev_max_backlog=10000
vm.overcommit_memory=0
net.ipv4.ip_conntrack_max=655360
* soft nofile 131072
* hard nofile 131072
* soft nproc 131072
* hard nproc 131072
* soft core unlimited
* hard core unlimited
* soft memlock 50000000
* hard memlock 50000000
6. 配置系统服务
chkconfig --level 35 nfs on
chkconfig --level 35 portmap pn
7. 配置防火墙
vi /etc/sysconfig/iptables
允许master-slave相互访问nfs, PostgreSQL监听端口
8. 升级操作系统补丁,驱动等
三、安装PostgreSQL 9.0.2
1. postgres user profile:
2. 配置数据库相关目录
2.1 pghome
/opt/pgsql
2.2 pgdata
/database/pgdata/tbs1/pg_root
2.3 pgarchive
/database/pgdata/pg_arch
3. 初始化数据库
initdb -D /database/pgdata/tbs1/pg_root -E UTF8 --locale=C -U postgres -X /database/pgdata/tbs2/pg_xlog -W
四、配置master节点
1. 新建slave用于连接master的数据库超级用户(在v8的版本中建议为每个stream standby新建一个超级用户,在v9中使用一个用户的情况下建议不同的stream standby配置不同的application_name参数值来区分不同的stream standby.)
为了安全的考虑,配置pg_hba.conf,数据库复制的超级用户只允许从使用该用户的stream standby的主机连过来.
连接限制:一个stream standby数据库至少需要一个连接,因为连接有hang住的可能,建议不要配太少了.
create role repuser1 SUPERUSER LOGIN CONNECTION LIMIT 20 ENCRYPTED PASSWORD 'repuser1REPUSER1';
2. pg_hba.conf
3. postgresql.conf
4. 启动主节点.
五、传输基础文件至slave节点,模拟一个正在运行的数据库生成复制库的操作.
1. on the master
select pg_start_backup('replication backup');
2. on the master
scp $PGDATA $SLAVE_IP:$PGDATA
3. on the master
select pg_stop_backup();
六、配置slave节点
1. on the slave
如果有非默认表空间,需要手工处理pg_tblspc
2. 配置pg_hba.conf
允许需要访问的客户端,
3. 配置postgresql.conf
3. 配置recovery.conf
4. 配置~/.pgpass 或$PGPASSFILE (PGPASSFILE specifies the name of the password file to use for lookups. If not set, it defaults to ~/.pgpass)
格式:hostname:port:database:username:password
172.16.3.33:1921:replication:repuser1:repuser1REPUSER1
chmod 400 .pgpass
5. 启动slave节点
启动完后,可以通过top看到slave节点在拼命的恢复pg_start_backup以来的所有wal.
查看slave节点日志:
查看主节点进程:
postgres: wal sender process repuser1 172.16.3.39(18716) startup
恢复完后修改hot_standby = on,重启slave节点
七、测试
1. (on master)新建用户
2. 新建表空间
3. (on master)新建数据库
4. (on master)新建schema
5. (on master)新建表
6. (on master)插入测试数据
测试脚本,开50个进程后台插入.
查看主节点,仅仅有一个sender进程在发送数据,一个standby对应一个sender
查看standby节点,
很快,standby和master最终数据一致.
7. (on master)使用DDL测试冲突
on slave 在slave恢复期间,在slave节点执行 select count(*) from tbl_users;发生等待事件.
on slave $PGDATA/pg_log中查看最近一个日志文件,
主节点执行完后,slave节点的waiting很快消失,不会像log shipping模式可能出现继续等待含有SQL结束的WAL的情况.
8. (on master)测试checkpoint
在PostgreSQL中发生checkpoint后,在此之前的WAL在做数据库恢复时就用不到了,因为确保数据都写入数据文件了.
pg_archivecleanup也是根据checkpoint来判断和删除不需要的WAL的.
9. (on slave)测试cleanarchive
在做checkpoint前,去看$PGARCHIVE目录,已经被apply的文件还存在,并没有被pg_archivecleanup命令清除掉,原因就是这些文件是最近一次checkpoint以来的WAL文件,在数据库恢复时是需要用到的.
如果你手工执行pg_archivecleanup $PGARCHIVE 000000010000000200000031 (假设000000010000000200000031这个是在$PGARCHIVE中的一个WAL的文件名)
这条命令将删除000000010000000200000031以前生成的所有WAL文件,一定要小心操作,万一不小心把最近一次CHECKPOINT以来的WAL删除了,
补救的方法是赶紧到master上做一次checkpoint,让slave知道这次checkpoint,否则的话下次slave启动还会读到000000010000000200000031这个文件以前的文件,那时候就只能找到这些文件或重建slave了.
10. (on slave)测试active slave
激活SLAVE很简单,了解到已经apply了最新的WAL后,执行以下
su - postgres
touch /database/pgdata/tbs1/pg_root/postgresql.trigger.1921
数据库会触发激活的动作,激活后/database/pgdata/tbs1/pg_root/postgresql.trigger.1921这个文件会自动删掉,并且recovery.conf被重命名为recovery.done.
激活后的slave不可逆转为slave了.需要重建.
11. (on slave)测试write操作
postgres=# create table tbl_test (id int);
ERROR: cannot execute CREATE TABLE in a read-only transaction
12. 监控
13. 优化
1. Both the WALSender and WALReceiver will work continuously on any outstanding data to be
replicated until the queue is empty. If there is a quiet period, then the WALReceiver will sleep
for 100ms at a time, and the WALSender will sleep for wal_sender_delay. Typically, the
value of wal_sender_delay need not be altered, because it only affects behavior during
momentary quiet periods. The default value is a good balance between effciency and data
protection. If the Master and Standby are connected by a low bandwidth network, and the
write rate on the Master is high, you may wish to lower this value to perhaps 20ms or 50ms.
Reducing this value will reduce the amount of data loss if the Master becomes permanently
unavailable, though will also marginally increase the cost of streaming the transaction log
data to the Standbys.
2. If the connection drops between Master and Standby, it will take some time for that to be
noticed across an indirect network. To ensure that a dropped connection is noticed as soon
as possible, you may wish to adjust the keepalive settings.
If you want a Standby to notice that the connection to the Master has dropped, you need
to set the keepalives in the primary_conninfo in the recovery.conf on the Standby
as follows:
primary_conninfo = '….keepalives_idle= 60 …'
If you want the Master to notice that a streaming Standby connection has dropped, you can
set the keepalive parameters in postgresql.conf on the Master, such as:
tcp_keepalives_idle = 60 # time before we send keepalives
That setting will then apply to all connections from users and replication. If you want to be very
specifc, and just set that for replication, you must supply this as an option to be passed to the
Master, which is specifed like the following:
primary_conninfo = '….options="-c tcp_keepalives_idle= 60" …'
All of the preceding examples set the length of time the connection will be idle before we start
sending keepalives to be 60 seconds. The default is two hours, and is not recommended.
There are multiple keepalive parameters we can set; I have avoided showing those here
for clarity. A related option is connection_timeout. Remember, you can hide all of this
complexity in a connection service fle, so that primary_conninfo only refers to a single
service name, as described in the First Steps chapter.
3. One thing that is a possibility is to set archive_command only until the end of the catch
up period. After that you can reset it to the dummy value ("cd") and then continue just with
streaming replication. Data is only transferred from the Master to the Standby once that data
has been written (or more precisely, fsynced) to disk. So setting synchronous_commit =
off will not improve the replication delay, even if that improves performance on the Master.
Once WAL data is received by the Standby, the WAL data is fsynced to disk on the Standby to
ensure that it is not lost if the Standby system restarts.
4. For streaming replication, the Master keeps a number of fles that is at least wal_keep_
segments. If the Standby database server has been down for long enough, the Master will have
moved on and will no longer have the data for the last point of transfer. If that should occur, then
the Standby needs to be re-confgured using the same procedure with which we started.
5. You may also wish to increase max_wal_senders, so that it will be possible to reconnect
even before a dropped connection is noted; this allows a manual restart to re-establish
connections more easily. If you do this, then also increase the connection limit for the
replication user.
Data transfer may stop because the connection drops or the Standby server or the Standby
system is shutdown. If replication data transfer stops for any reason, it will attempt to restart
from the point of last transfer.
14. 注意事项
1. 清除归档时需要考虑到master-slave是一对多的情况,使用一对多的PGARCHIVE或者是全局的pg_archivecleanup
八、附pgctl.sh脚本
8. 升级操作系统补丁,驱动等
三、安装PostgreSQL 9.0.2
1. postgres user profile:
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=1921
export PGDATA=/database/pgdata/tbs1/pg_root
export PGARCHIVE=/database/pgdata/pg_arch
export LANG=en_US.utf8
export PGHOME=/opt/pgsql
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
alias rm='rm -i'
alias ll='ls -lh'
2. 配置数据库相关目录
2.1 pghome
/opt/pgsql
2.2 pgdata
/database/pgdata/tbs1/pg_root
2.3 pgarchive
/database/pgdata/pg_arch
3. 初始化数据库
initdb -D /database/pgdata/tbs1/pg_root -E UTF8 --locale=C -U postgres -X /database/pgdata/tbs2/pg_xlog -W
四、配置master节点
1. 新建slave用于连接master的数据库超级用户(在v8的版本中建议为每个stream standby新建一个超级用户,在v9中使用一个用户的情况下建议不同的stream standby配置不同的application_name参数值来区分不同的stream standby.)
为了安全的考虑,配置pg_hba.conf,数据库复制的超级用户只允许从使用该用户的stream standby的主机连过来.
连接限制:一个stream standby数据库至少需要一个连接,因为连接有hang住的可能,建议不要配太少了.
create role repuser1 SUPERUSER LOGIN CONNECTION LIMIT 20 ENCRYPTED PASSWORD 'repuser1REPUSER1';
2. pg_hba.conf
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
# host all all ::1/128 trust
host replication repuser1 172.16.3.39/32 md5
3. postgresql.conf
listen_addresses = '*' # what IP address(es) to listen on;
port = 1921 # (change requires restart)
max_connections = 2000 # (change requires restart)
unix_socket_directory = '/database/pgdata/tbs1/pg_root' # (change requires restart)
unix_socket_permissions = 0700 # begin with 0 to use octal notation
password_encryption = on
shared_buffers = 2048MB # min 128kB
maintenance_work_mem = 2048MB # min 1MB
max_stack_depth = 8MB # min 100kB
wal_level = hot_standby # minimal, archive, or hot_standby
synchronous_commit = off # immediate fsync at commit
wal_sync_method = fdatasync # the default is the first option
wal_buffers = 128000kB # min 32kB
wal_writer_delay = 20ms # 1-10000 milliseconds
checkpoint_segments = 64 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30min # range 30s-1h
archive_mode = on # allows archiving to be done
archive_command = 'cp %p $PGARCHIVE/%f' # command to use to archive a logfile segment
max_wal_senders = 30 # max number of walsender processes
wal_keep_segments = 1000 # in logfile segments, 16MB each; 0 disables;因为本例使用的logfile segment单个=64M,所以这里需要至少64*1000=64GB的pg_xlog目录空间,否则可能空间溢出.
random_page_cost = 2.0 # same scale as above
effective_cache_size = 12800MB
constraint_exclusion = partition # on, off, or partition
log_destination = 'csvlog' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_connections = on # 调试阶段建议主库和standby都打开log_connections,调试完后建议关闭.
log_directory = 'pg_log' # directory where log files are written,
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
log_truncate_on_rotation = on # If on, an existing log file of the
log_rotation_age = 1d # Automatic rotation of logfiles will
log_rotation_size = 10MB # Automatic rotation of logfiles will
log_min_duration_statement = 1000ms # -1 is disabled, 0 logs all statements
log_checkpoints = on
log_lock_waits = on # log lock waits >= deadlock_timeout
log_statement = 'ddl' # none, ddl, mod, all
track_activity_query_size = 2048 # (change requires restart)
autovacuum = on # Enable autovacuum subprocess? 'on'
log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and
check_function_bodies = on
bytea_output = 'escape' # hex, escape
datestyle = 'iso, mdy'
lc_messages = 'C' # locale for system error message
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
deadlock_timeout = 1s
tcp_keepalives_idle = 60
4. 启动主节点.
五、传输基础文件至slave节点,模拟一个正在运行的数据库生成复制库的操作.
1. on the master
select pg_start_backup('replication backup');
2. on the master
scp $PGDATA $SLAVE_IP:$PGDATA
3. on the master
select pg_stop_backup();
六、配置slave节点
1. on the slave
chown -R postgres:postgres $PGDATA
su - postgres
cd $PGDATA
rm postmaster.pid
rm .s.PGSQL.1921.lock
rm -rf pg_xlog
ln -s /database/pgdata/tbs2/pg_xlog ./pg_xlog
如果有非默认表空间,需要手工处理pg_tblspc
2. 配置pg_hba.conf
允许需要访问的客户端,
3. 配置postgresql.conf
listen_addresses = '*' # what IP address(es) to listen on;
port = 1921 # (change requires restart)
max_connections = 2000 # (change requires restart)
unix_socket_directory = '/database/pgdata/tbs1/pg_root' # (change requires restart)
unix_socket_permissions = 0700 # begin with 0 to use octal notation
password_encryption = on
shared_buffers = 2048MB # min 128kB
maintenance_work_mem = 2048MB # min 1MB
max_stack_depth = 8MB # min 100kB
wal_level = hot_standby # minimal, archive, or hot_standby
synchronous_commit = off # immediate fsync at commit
wal_sync_method = fdatasync # the default is the first option
wal_buffers = 128000kB # min 32kB
wal_writer_delay = 20ms # 1-10000 milliseconds
checkpoint_segments = 64 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30min # range 30s-1h
archive_mode = on # allows archiving to be done
archive_command = 'cp %p $PGARCHIVE/%f' # command to use to archive a logfile segment
max_wal_senders = 30 # max number of walsender processes
wal_keep_segments = 1000 # in logfile segments, 16MB each; 0 disables;因为本例使用的logfile segment单个=64M,所以这里需要至少64*1000=64GB的pg_xlog目录空间,否则可能空间溢出.
random_page_cost = 2.0 # same scale as above
effective_cache_size = 12800MB
constraint_exclusion = partition # on, off, or partition
log_destination = 'csvlog' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_connections = on # 调试阶段建议主库和standby都打开log_connections,调试完后建议关闭.
log_directory = 'pg_log' # directory where log files are written,
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
log_truncate_on_rotation = on # If on, an existing log file of the
log_rotation_age = 1d # Automatic rotation of logfiles will
log_rotation_size = 10MB # Automatic rotation of logfiles will
log_min_duration_statement = 1000ms # -1 is disabled, 0 logs all statements
log_checkpoints = on
log_lock_waits = on # log lock waits >= deadlock_timeout
log_statement = 'ddl' # none, ddl, mod, all
track_activity_query_size = 2048 # (change requires restart)
autovacuum = on # Enable autovacuum subprocess? 'on'
log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and
check_function_bodies = on
bytea_output = 'escape' # hex, escape
datestyle = 'iso, mdy'
lc_messages = 'C' # locale for system error message
lc_monetary = 'C' # locale for monetary formatting
lc_numeric = 'C' # locale for number formatting
lc_time = 'C' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
deadlock_timeout = 1s
hot_standby = off # 这个参数在初始化slave的时候关闭是比较明智的选择,在初始同步完成后在开启
tcp_keepalives_idle = 60 #
3. 配置recovery.conf
cp $PGHOME/share/recovery.conf.sample $PGDATA/
cd $PGDATA
mv recovery.conf.sample recovery.conf
vi recovery.conf
restore_command = 'cp $PGARCHIVE/%f %p' # e.g. 'cp /mnt/server/archivedir/%f %p'
archive_cleanup_command = 'pg_archivecleanup $PGARCHIVE %r'
standby_mode = 'on'
trigger_file = '/database/pgdata/tbs1/pg_root/postgresql.trigger.1921'
primary_conninfo = 'host=172.16.3.33 port=1921 user=repuser1 keepalives_idle=60' # application_name也可以加在这里
4. 配置~/.pgpass 或$PGPASSFILE (PGPASSFILE specifies the name of the password file to use for lookups. If not set, it defaults to ~/.pgpass)
格式:hostname:port:database:username:password
172.16.3.33:1921:replication:repuser1:repuser1REPUSER1
chmod 400 .pgpass
5. 启动slave节点
启动完后,可以通过top看到slave节点在拼命的恢复pg_start_backup以来的所有wal.
查看slave节点日志:
2011-01-04 14:51:51.363 CST,,,25950,,4d22c387.655e,1,,2011-01-04 14:51:51 CST,,0,LOG,00000,"database system was shut down in recovery at 2011-01-04 14:51:26 CST",,,,,,,,,""
2011-01-04 14:51:51.363 CST,,,25950,,4d22c387.655e,2,,2011-01-04 14:51:51 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""
2011-01-04 14:51:51.442 CST,,,25950,,4d22c387.655e,3,,2011-01-04 14:51:51 CST,,0,LOG,00000,"restored log file ""00000001000000020000003E"" from archive",,,,,,,,,""
2011-01-04 14:51:51.443 CST,,,25950,,4d22c387.655e,4,,2011-01-04 14:51:51 CST,,0,LOG,00000,"redo starts at 2/F8000020",,,,,,,,,""
2011-01-04 14:51:51.443 CST,,,25950,,4d22c387.655e,5,,2011-01-04 14:51:51 CST,,0,LOG,00000,"consistent recovery state reached at 2/FC000000",,,,,,,,,""
2011-01-04 14:51:51.480 CST,,,25954,,4d22c387.6562,1,,2011-01-04 14:51:51 CST,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,,""
查看主节点进程:
postgres: wal sender process repuser1 172.16.3.39(18716) startup
恢复完后修改hot_standby = on,重启slave节点
七、测试
1. (on master)新建用户
create role digoal nosuperuser login encrypted password 'digoal';
(on slave)查看,比log shipping模式快很多,在master建立好用户后slave马上就可以看到已经复制过来了.
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+-----------------------------------+-----------
digoal | | {}
postgres | Superuser, Create role, Create DB | {}
repuser1 | Superuser +| {}
| 20 connections |
2. 新建表空间
on master
su - postgres
mkdir /database/pgdata/tbs3/tbs_digoal
on slave
su - postgres
mkdir /database/pgdata/tbs3/tbs_digoal
on master
create tablespace tbs_digoal owner digoal location '/database/pgdata/tbs3/tbs_digoal';
on slave (查看)
postgres=# \db
List of tablespaces
Name | Owner | Location
------------+----------+----------------------------------
pg_default | postgres |
pg_global | postgres |
tbs_digoal | digoal | /database/pgdata/tbs3/tbs_digoal
(3 rows)
3. (on master)新建数据库
create database digoal with owner digoal template template0 encoding 'UTF8' tablespace tbs_digoal;
(on slave)查看
postgres=# \l
List of databases
Name | Owner | Encoding | Collation | Ctype | Access privileges
-----------+----------+----------+-----------+-------+-----------------------
digoal | digoal | UTF8 | C | C |
postgres | postgres | UTF8 | C | C |
template0 | postgres | UTF8 | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | C | C | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
4. (on master)新建schema
\c digoal digoal
create schema digoal authorization digoal;
(on slave)查看
postgres=# \c digoal digoal
You are now connected to database "digoal" as user "digoal".
digoal=> \dn
List of schemas
Name | Owner
--------------------+----------
digoal | digoal
information_schema | postgres
pg_catalog | postgres
pg_toast | postgres
pg_toast_temp_1 | postgres
public | postgres
(6 rows)
5. (on master)新建表
\c digoal digoal
create table tbl_users (id int8 , nick varchar(32));
(on slave)查看
digoal=> \c digoal digoal
You are now connected to database "digoal".
digoal=> \d tbl_users
Table "digoal.tbl_users"
Column | Type | Modifiers
--------+-----------------------+-----------
id | bigint |
nick | character varying(32) |
6. (on master)插入测试数据
测试脚本,开50个进程后台插入.
#!/bin/bash
for ((i=0;i<50;i++))
do
psql -h 127.0.0.1 digoal digoal -c "insert into tbl_users select generate_series(1,1000000),'digoal'" &
done
查看主节点,仅仅有一个sender进程在发送数据,一个standby对应一个sender
[root@db-172-16-3-33 ~]# ps -ewf|grep sender
postgres 20921 20622 0 14:56 ? 00:00:02 postgres: wal sender process repuser1 172.16.3.39(18716) streaming 3/70000000
[root@db-172-16-3-33 ~]# netstat -anp|grep 172.16.3.39
tcp 0 0 172.16.3.33:1921 172.16.3.39:18716 ESTABLISHED 20921/EF5BDEC8
查看standby节点,
top
26154 postgres 25 0 2389m 1.0g 1.0g R 100.2 13.3 0:51.64 postgres: startup process recovering 00000001000000030000001D
26159 postgres 15 0 2401m 6052 1652 S 12.9 0.1 0:06.28 postgres: wal receiver process streaming 3/80000000
[root@db-172-16-3-39 ~]# netstat -anp|grep 172.16.3.33
tcp 0 0 172.16.3.39:18716 172.16.3.33:1921 ESTABLISHED 26159/90940000
很快,standby和master最终数据一致.
digoal=> select count(*) from tbl_users ;
count
----------
50000000
(1 row)
7. (on master)使用DDL测试冲突
on master
alter table tbl_users add column first_name varchar(32) default 'zhou';
alter table tbl_users add column last_name varchar(32) default 'digoal';
on slave 在slave恢复期间,在slave节点执行 select count(*) from tbl_users;发生等待事件.
26232 postgres 18 0 2392m 2.0g 2.0g S 0.0 26.3 0:09.78 postgres: digoal digoal 127.0.0.1(39324) SELECT waiting
on slave $PGDATA/pg_log中查看最近一个日志文件,
2011-01-04 15:17:43.268 CST,"digoal","digoal",26232,"127.0.0.1:39324",4d22c709.6678,4,"SELECT waiting",2011-01-04 15:06:49 CST,2/19,0,LOG,00000,"process 26232 still waiting for AccessShareLock on relation 16404 of database 16402 after 1000.302 ms",,,,,,"select count(*) from tbl_users;",22,,"psql"
主节点执行完后,slave节点的waiting很快消失,不会像log shipping模式可能出现继续等待含有SQL结束的WAL的情况.
8. (on master)测试checkpoint
在PostgreSQL中发生checkpoint后,在此之前的WAL在做数据库恢复时就用不到了,因为确保数据都写入数据文件了.
pg_archivecleanup也是根据checkpoint来判断和删除不需要的WAL的.
9. (on slave)测试cleanarchive
在做checkpoint前,去看$PGARCHIVE目录,已经被apply的文件还存在,并没有被pg_archivecleanup命令清除掉,原因就是这些文件是最近一次checkpoint以来的WAL文件,在数据库恢复时是需要用到的.
如果你手工执行pg_archivecleanup $PGARCHIVE 000000010000000200000031 (假设000000010000000200000031这个是在$PGARCHIVE中的一个WAL的文件名)
这条命令将删除000000010000000200000031以前生成的所有WAL文件,一定要小心操作,万一不小心把最近一次CHECKPOINT以来的WAL删除了,
补救的方法是赶紧到master上做一次checkpoint,让slave知道这次checkpoint,否则的话下次slave启动还会读到000000010000000200000031这个文件以前的文件,那时候就只能找到这些文件或重建slave了.
10. (on slave)测试active slave
激活SLAVE很简单,了解到已经apply了最新的WAL后,执行以下
su - postgres
touch /database/pgdata/tbs1/pg_root/postgresql.trigger.1921
数据库会触发激活的动作,激活后/database/pgdata/tbs1/pg_root/postgresql.trigger.1921这个文件会自动删掉,并且recovery.conf被重命名为recovery.done.
激活后的slave不可逆转为slave了.需要重建.
11. (on slave)测试write操作
postgres=# create table tbl_test (id int);
ERROR: cannot execute CREATE TABLE in a read-only transaction
12. 监控
pg_current_xlog_insert_location
pg_current_xlog_location
pg_last_xlog_receive_location
pg_last_xlog_replay_location
top
CREATE OR REPLACE VIEW pg_stat_replication AS
SELECT
S.procpid,
S.usesysid,
U.rolname AS usename,
S.application_name,
S.client_addr,
S.client_port,
S.backend_start
FROM pg_stat_get_activity(NULL) AS S, pg_authid U
WHERE S.usesysid = U.oid AND S.datid = 0;
13. 优化
1. Both the WALSender and WALReceiver will work continuously on any outstanding data to be
replicated until the queue is empty. If there is a quiet period, then the WALReceiver will sleep
for 100ms at a time, and the WALSender will sleep for wal_sender_delay. Typically, the
value of wal_sender_delay need not be altered, because it only affects behavior during
momentary quiet periods. The default value is a good balance between effciency and data
protection. If the Master and Standby are connected by a low bandwidth network, and the
write rate on the Master is high, you may wish to lower this value to perhaps 20ms or 50ms.
Reducing this value will reduce the amount of data loss if the Master becomes permanently
unavailable, though will also marginally increase the cost of streaming the transaction log
data to the Standbys.
2. If the connection drops between Master and Standby, it will take some time for that to be
noticed across an indirect network. To ensure that a dropped connection is noticed as soon
as possible, you may wish to adjust the keepalive settings.
If you want a Standby to notice that the connection to the Master has dropped, you need
to set the keepalives in the primary_conninfo in the recovery.conf on the Standby
as follows:
primary_conninfo = '….keepalives_idle= 60 …'
If you want the Master to notice that a streaming Standby connection has dropped, you can
set the keepalive parameters in postgresql.conf on the Master, such as:
tcp_keepalives_idle = 60 # time before we send keepalives
That setting will then apply to all connections from users and replication. If you want to be very
specifc, and just set that for replication, you must supply this as an option to be passed to the
Master, which is specifed like the following:
primary_conninfo = '….options="-c tcp_keepalives_idle= 60" …'
All of the preceding examples set the length of time the connection will be idle before we start
sending keepalives to be 60 seconds. The default is two hours, and is not recommended.
There are multiple keepalive parameters we can set; I have avoided showing those here
for clarity. A related option is connection_timeout. Remember, you can hide all of this
complexity in a connection service fle, so that primary_conninfo only refers to a single
service name, as described in the First Steps chapter.
3. One thing that is a possibility is to set archive_command only until the end of the catch
up period. After that you can reset it to the dummy value ("cd") and then continue just with
streaming replication. Data is only transferred from the Master to the Standby once that data
has been written (or more precisely, fsynced) to disk. So setting synchronous_commit =
off will not improve the replication delay, even if that improves performance on the Master.
Once WAL data is received by the Standby, the WAL data is fsynced to disk on the Standby to
ensure that it is not lost if the Standby system restarts.
4. For streaming replication, the Master keeps a number of fles that is at least wal_keep_
segments. If the Standby database server has been down for long enough, the Master will have
moved on and will no longer have the data for the last point of transfer. If that should occur, then
the Standby needs to be re-confgured using the same procedure with which we started.
5. You may also wish to increase max_wal_senders, so that it will be possible to reconnect
even before a dropped connection is noted; this allows a manual restart to re-establish
connections more easily. If you do this, then also increase the connection limit for the
replication user.
Data transfer may stop because the connection drops or the Standby server or the Standby
system is shutdown. If replication data transfer stops for any reason, it will attempt to restart
from the point of last transfer.
14. 注意事项
1. 清除归档时需要考虑到master-slave是一对多的情况,使用一对多的PGARCHIVE或者是全局的pg_archivecleanup
八、附pgctl.sh脚本
#!/bin/bash
# environment.
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
export PGHOME=/opt/pgsql
export PATH=$PGHOME/bin:$PATH
export PGDATA=/database/pgdata/tbs1/pg_root
export PGPORT=1921
export LANG='en_US.utf8'
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
RETVAL=1
start() {
su - postgres -c "/usr/bin/nohup $PGHOME/bin/postgres -D $PGDATA -p $PGPORT >/dev/null 2>>$PGDATA/pg_log/start_err.log </dev/null &"
RETVAL=$?
return $RETVAL
}
stop() {
su - postgres -c "$PGHOME/bin/pg_ctl stop -D $PGDATA -m fast"
RETVAL=$?
return $RETVAL
}
reload() {
su - postgres -c "$PGHOME/bin/pg_ctl reload -D $PGDATA"
RETVAL=$?
return $RETVAL
}
# See how we were called.
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
stop
start
;;
reload)
reload
;;
*)
echo $"Usage: $prog {start|stop|restart|reload}"
exit 2
esac
exit $RETVAL
# Auth Digoal.Zhou
# Corp. Sky-Mobi