1、组复制是一种可用于实现容错系统的技术。复制组是一个通过消息传递相互交互的server集群。通信层提供了原子消息和完全有序信息交互等保障机制。MySQL组复制以这些功能和架构为基础,实现了基于复制协议的多主更新。复制组由多个server成员构成,并且组中的每个server成员可以独立地执行事务。但所有读写事务只有在冲突检测成功后才会提交。只读事务不需要在冲突检测,可以立即提交。换句话说,对于任何读写事务,提交操作并不是由始发server单向决定的,而是由组来决定是否提交。准确地说,在始发server上,当事务准备好提交时,该server会广播写入值(已改变的行)和对应的写入集(已更新的行的唯一标识符)。然后会为该事务建立一个全局的顺序。最终,这意味着所有server成员以相同的顺序接收同一组事务。因此,所有server成员以相同的顺序应用相同的更改,以确保组内一致。在不同server上并发执行的事务可能存在冲突。根据组复制的冲突检测机制,对两个不同的并发事务的写集合进行检测。如在不同的server成员执行两个更新同一行的并发事务,则会出现冲突。排在最前面的事务可以在所有server成员上提交,第二个事务在源server上回滚,并在组中的其他server上删除。 这就是分布式的先提交当选规则。
组复制技术的核心是 Paxos 算法实现的,是组复制中保证数据一致性复制的关键, 它充当了组通信系统的引擎。该协议保障了故障检测机制,组成员服务的安全和消息的完全有序传递。
2、部署MySQL GR(group replication)
2.1、环境:
MySQL:5.7.21
Host:IP
gr1:10.10.10.11,gr2:10.10.10.12,gr3:10.10.10.13
Port:3306
2.2、安装依赖包
yum install -y gcc gcc-c++ libaio-devel boost-devel autoconf automake zlib-devel libxml2-devel ncurses-devel libgcrypt-devel libtool-devel openssl-devel bison-devel unzip numactl-devel numactl
2.3、安装MySQL(略)
2.4、配置MySQL参数
[mysql]
prompt = [\\u@\\h][\\d]>\\_
[client]
user = root
password = 111
[mysqld]
# basic settings #
user = mysql
sql_mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER"
autocommit = 1
character_set_server = utf8mb4
transaction_isolation = READ-COMMITTED
explicit_defaults_for_timestamp = 1
max_allowed_packet = 16777216
event_scheduler = 1
basedir = /usr/local/mysql
datadir = /data
auto_increment_increment = 1
auto_increment_offset = 1
lower_case_table_names = 1
# connection #
interactive_timeout = 1800
wait_timeout = 1800
lock_wait_timeout = 1800
skip_name_resolve = 1
max_connections = 512
max_connect_errors = 1000000
# session memory setting #
read_buffer_size = 16777216
read_rnd_buffer_size = 33554432
sort_buffer_size = 33554432
tmp_table_size = 67108864
join_buffer_size = 134217728
# log settings #
log_error = error.log
slow_query_log = 1
slow_query_log_file = slow.log
log_queries_not_using_indexes = 1
log_slow_admin_statements = 1
log_slow_slave_statements = 1
log_throttle_queries_not_using_indexes = 10
expire_logs_days = 90
long_query_time = 2
min_examined_row_limit = 100
binlog-rows-query-log-events = 1
log-bin-trust-function-creators = 1
expire-logs-days = 90
log-slave-updates = 1
# innodb settings #
innodb_page_size = 8192
innodb_buffer_pool_size = 512M
innodb_buffer_pool_instances = 16
innodb_buffer_pool_load_at_startup = 1
innodb_buffer_pool_dump_at_shutdown = 1
innodb_lru_scan_depth = 4096
innodb_lock_wait_timeout = 5
innodb_io_capacity = 10000
innodb_io_capacity_max = 20000
innodb_flush_method = O_DIRECT
innodb_file_format = Barracuda
innodb_file_format_max = Barracuda
innodb_undo_logs = 128
innodb_undo_tablespaces = 3
innodb_flush_neighbors = 0
innodb_log_file_size = 17179869184
innodb_log_files_in_group = 2
innodb_log_buffer_size = 16777216
innodb_purge_threads = 4
innodb_large_prefix = 1
innodb_thread_concurrency = 64
innodb_print_all_deadlocks = 1
innodb_strict_mode = 1
innodb_sort_buffer_size = 67108864
innodb_write_io_threads = 16
innodb_read_io_threads = 16
innodb_file_per_table = 1
innodb_stats_persistent_sample_pages = 64
innodb_autoinc_lock_mode = 2
# replication setting #
master_info_repository = TABLE
relay_log_info_repository = TABLE
sync_binlog = 1
gtid_mode = on
enforce_gtid_consistency = 1
log_slave_updates
binlog_format = ROW
binlog_rows_query_log_events = 1
relay_log = relay.log
relay_log_recovery = 1
binlog_gtid_simple_recovery = 1
slave_skip_errors = ddl_exist_errors
slave-rows-search-algorithms = 'INDEX_SCAN,HASH_SCAN'
# group replication #
transaction_write_set_extraction =XXHASH64
loose-group_replication_group_name = "5e9994a1-84bf-499c-a1cf-4d5f900e793f"
loose-group_replication_start_on_boot = off
loose-group_replication_local_address = "10.10.10.11:53306"
loose-group_replication_group_seeds ="10.10.10.11:53306,10.10.10.12:53306,10.10.10.13:53306"
loose-group_replication_bootstrap_group =off
[mysqld-5.7]
# new innodb setting #
loose_innodb_numa_interleave = 1
innodb_buffer_pool_dump_pct = 40
innodb_page_cleaners = 16
innodb_undo_log_truncate = 1
innodb_max_undo_log_size = 2G
innodb_purge_rseg_truncate_frequency = 128
# new replication setting #
slave-parallel-type = LOGICAL_CLOCK
slave-parallel-workers = 16
slave_preserve_commit_order=1
slave_transaction_retries=128
# other change setting #
binlog_gtid_simple_recovery = 1
log_timestamps = system
show_compatibility_56 = on
说明:
group_replication变量使用的loose-前缀是指示Server启用时尚未加载复制插件也将继续启动
transaction_write_set_extraction = XXHASH64指示Server必须为每个事务收集写集合,并使用XXHASH64哈希算法将其编码为散列
loose-group_replication_group_name="5e9994a1-84bf-499c-a1cf-4d5f900e793f"表示将加入或者创建的复制组命名为5e9994a1-84bf-499c-a1cf-4d5f900e793f
loose-group_replication_start_on_boot=off 设置为Server启动时不自动启动组复制
loose-group_replication_local_address="10.10.10.11:53306" 绑定本地的10.10.10.11及53306端口接受其他组成员的连接,IP地址必须为其他组成员可正常访问
loose-group_replication_group_seeds="10.10.10.11:53306,10.10.10.12:53306,10.10.10.13:53306"本行为告诉服务器当服务器加入组时,应当连接到10.10.10.11:53306,10.10.10.12:53306,10.10.10.13:53306这些种子服务器进行配置。
loose-group_replication_bootstrap_group = off 配置是否自动引导组
2.5、初始化MySQL
mysqld --initialize-insecure --basedir=/usr/local/mysql --datadir=/data
2.6、设置MySQL密码,并启动
mysqld_safe --user=mysql --skip-grant-tables &
update mysql.user set authentication_string=password('111') where user='root';
FLUSH PRIVILEGES;
mysqld --defaults-file=/etc/my.cnf &
2.7、创建复制用户及密码
set sql_log_bin=0;
grant replication slave,replication client on *.* to 'repl_user'@'%' identified by '111';
flush privileges;
set sql_log_bin=1;
2.8、设置复制用户名及密码
change master to master_user='repl_user',master_password='111' for channel 'group_replication_recovery’;
2.9、安装组复制插件
install plugin group_replication soname 'group_replication.so';
#检查插件是否安装成功
show plugins;
2.10、gr1启动组复制
set global group_replication_bootstrap_group=ON;
START group_replication;
set global group_replication_bootstrap_group=OFF;
2.11、gr2和gr3节点
START group_replication;
3、检查GR状态
select * from performance_schema.replication_group_members;
select * from performance_schema.global_status where VARIABLE_NAME='group_replication_primary_member';
select * from performance_schema.replication_connection_status;
select * from performance_schema.replication_applier_status;
4、部署过程中遇到的报错
错误一:
2018-03-15T06:45:02.805490+08:00 0 [ERROR] mysqld: slave_preserve_commit_order is not supported unless both log_bin and log_slave_updates are enabled.
2018-03-15T06:45:02.805523+08:00 0 [Note] Some of the channels are not created/initialized properly. Check for additional messages above. You will not be able to start replication on those channels until the issue is resolved and the server restarted.
2018-03-15T06:45:02.810826+08:00 0 [Note] Event Scheduler: Loaded 0 events
2018-03-15T06:45:02.811042+08:00 0 [Note] mysqld: ready for connections.
Version: '5.7.21-log' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL)
2018-03-15T06:45:02.811081+08:00 1 [Note] Event Scheduler: scheduler thread started with id 1
2018-03-15T06:46:40.451807+08:00 3 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2018-03-15T06:47:48.488841+08:00 3 [ERROR] Plugin group_replication reported: 'Binlog must be enabled for Group Replication'
2018-03-15T06:50:22.444771+08:00 3 [ERROR] Plugin group_replication reported: 'Binlog must be enabled for Group Replication'
解决一:
开启binlog,在参数中添加
log-bin=binlog
错误二:
2018-03-15T06:56:29.282975+08:00 0 [Note] Relay log recovery skipped for group replication channel.
2018-03-15T06:56:29.283010+08:00 0 [Warning] Recovery from master pos 4 and file for channel 'group_replication_recovery'. Previous relay log pos and relay log file had been set to 4, ./relay-group_replication_recovery.000001 respectively.
2018-03-15T06:56:29.294715+08:00 0 [Note] Event Scheduler: Loaded 0 events
2018-03-15T06:56:29.294977+08:00 0 [Note] mysqld: ready for connections.
Version: '5.7.21-log' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL)
2018-03-15T06:56:29.295029+08:00 1 [Note] Event Scheduler: scheduler thread started with id 1
2018-03-15T06:56:54.284784+08:00 3 [ERROR] Plugin group_replication reported: 'binlog_checksum should be NONE for Group Replication'
2018-03-15T07:03:49.230847+08:00 3 [ERROR] Plugin group_replication reported: 'binlog_checksum should be NONE for Group Replication'
解决二:
关闭binlog日志的校验
binlog_checksum = NONE
错误三:
2018-03-15T07:30:19.993097+08:00 0 [Note] Plugin group_replication reported: 'new state x_start'
2018-03-15T07:30:19.998411+08:00 0 [Warning] Plugin group_replication reported: 'read failed'
2018-03-15T07:30:20.007051+08:00 0 [ERROR] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 53306'
2018-03-15T07:31:15.950421+08:00 3 [ERROR] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2018-03-15T07:31:15.950516+08:00 3 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
2018-03-15T07:31:15.950557+08:00 3 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'
2018-03-15T07:31:15.950733+08:00 3 [Note] Plugin group_replication reported: 'auto_increment_increment is reset to 1'
2018-03-15T07:31:15.950763+08:00 3 [Note] Plugin group_replication reported: 'auto_increment_offset is reset to 1'
2018-03-15T07:31:15.950947+08:00 8 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed
2018-03-15T07:31:15.957958+08:00 5 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
解决三:
多个节点的UUID设置成一样
loose-group_replication_group_name = "8673f3d4-3410-4959-91c8-834362fe41a2"
错误四:
2018-03-15T07:45:15.633453+08:00 0 [Note] Plugin group_replication reported: 'connecting to 10.10.10.21 53306'
2018-03-15T07:45:16.636380+08:00 0 [Note] Plugin group_replication reported: 'Getting the peer name failed while connecting to server 10.10.10.21 with error 113 -No route to host.'
2018-03-15T07:45:16.636505+08:00 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to 10.10.10.21:53306 on local port: 53306.'
2018-03-15T07:45:16.636547+08:00 0 [Note] Plugin group_replication reported: 'connecting to 10.10.10.23 53306'
2018-03-15T07:45:16.637354+08:00 0 [Note] Plugin group_replication reported: 'Getting the peer name failed while connecting to server 10.10.10.23 with error 113 -No route to host.'
2018-03-15T07:45:16.637438+08:00 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to 10.10.10.23:53306 on local port: 53306.'
2018-03-15T07:45:16.637496+08:00 0 [Note] Plugin group_replication reported: 'connecting to 10.10.10.21 53306'
2018-03-15T07:45:17.639353+08:00 0 [Note] Plugin group_replication reported: 'Getting the peer name failed while connecting to server 10.10.10.21 with error 113 -No route to host.'
2018-03-15T07:45:17.639478+08:00 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to 10.10.10.21:53306 on local port: 53306.'
2018-03-15T07:45:17.639519+08:00 0 [Note] Plugin group_replication reported: 'connecting to 10.10.10.23 53306'
2018-03-15T07:45:17.640317+08:00 0 [Note] Plugin group_replication reported: 'Getting the peer name failed while connecting to server 10.10.10.23 with error 113 -No route to host.'
2018-03-15T07:45:17.640396+08:00 0 [ERROR] Plugin group_replication reported: '[GCS] Error on opening a connection to 10.10.10.23:53306 on local port: 53306.'
2018-03-15T07:45:17.640450+08:00 0 [ERROR] Plugin group_replication reported: '[GCS] Error connecting to all peers. Member join failed. Local port: 53306'
2018-03-15T07:45:17.640570+08:00 0 [Note] Plugin group_replication reported: 'state 4338 action xa_terminate'
2018-03-15T07:45:17.640602+08:00 0 [Note] Plugin group_replication reported: 'new state x_start'
2018-03-15T07:45:17.640613+08:00 0 [Note] Plugin group_replication reported: 'state 4338 action xa_exit'
2018-03-15T07:45:17.640724+08:00 0 [Note] Plugin group_replication reported: 'Exiting xcom thread'
2018-03-15T07:45:17.640740+08:00 0 [Note] Plugin group_replication reported: 'new state x_start'
2018-03-15T07:45:17.646843+08:00 0 [Warning] Plugin group_replication reported: 'read failed'
2018-03-15T07:45:17.662591+08:00 0 [ERROR] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 53306'
2018-03-15T07:46:13.601337+08:00 3 [ERROR] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2018-03-15T07:46:13.601458+08:00 3 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
2018-03-15T07:46:13.601507+08:00 3 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.
解决四:
关闭iptables和防火墙
/etc/init.d/iptables stop
chkconfig iptables off