
A database technology enthusiast focused on MySQL operations and administration, skilled in performance tuning and system bottleneck analysis, with a love for everything in the data field.
Environment

* OS: CentOS release 6.6 (Final), Linux 2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
* disk: 2*SAS RAID1 + 6*800G SSD RAID5
* MySQL: MySQL 5.6.16
* memory: 128G

Symptoms

1) MySQL hangs: apart from SHOW VARIABLES and SHOW PROCESSLIST, nothing can be executed.
2) If left alone for a while, it crashes on its own, and after that it never comes back up.

How to repeat

M1 -> M2 (log_slave_updates) -> slave

Once M2 (log_slave_updates) starts replicating, it crashes on its own within 1-2 hours. To speed up the crash, run a script that converts the storage engine of the tables in the database back and forth.

Attempts and changes made

1) Disabled AIO. Same result: it still hangs.
2) Disabled the adaptive hash index. Same error reported, then a hang.

https://dev.mysql.com/doc/refman/5.6/en/innodb-adaptive-hash.html
"You can monitor the use of the adaptive hash index and the contention for its use in the SEMAPHORES section of the output of the SHOW ENGINE INNODB STATUS command. If you see many threads waiting on an RW-latch created in btr0sea.c, then it might be useful to disable adaptive hash indexing."

MySQL configuration file

[client]
port = 3306
socket = /tmp/mysql.sock

[mysqld]
basedir = /usr/local/mysql
datadir = /data/mysql_data
port = 3306
socket = /tmp/mysql.sock
init-connect='SET NAMES utf8'
character-set-server = utf8
back_log = 500
max_connections = 3500
max_user_connections = 2000
max_connect_errors = 100000
max_allowed_packet = 16M
binlog_cache_size = 1M
max_heap_table_size = 64M
sort_buffer_size = 8M
join_buffer_size = 8M
thread_cache_size = 100
thread_concurrency = 8
query_cache_type = 0
query_cache_size = 0
ft_min_word_len = 4
thread_stack = 192K
tmp_table_size = 64M

# *** Log related settings
log-bin=/data/mysql.bin/xx
binlog-format=ROW
log-error=xx
relay-log=xx-relay-bin
slow_query_log = 1
slow-query-log-file = xx-slow.log
long_query_time = 0.1
log_queries_not_using_indexes = 1
log_slow_admin_statements = 1
log_slow_slave_statements = 1
#log_throttle_queries_not_using_indexes = 10
min_examined_row_limit = 1000

# *** Replication related settings
server-id = xx
replicate-ignore-db=mysql
replicate-wild-ignore-table=mysql.%
replicate-ignore-db=test
replicate-wild-ignore-table=test.%
##replicate_do_db=c2cdb
##replicate-wild-do-table= c2cdb.%
skip-slave-start
#read_only
log_slave_updates
#innodb_adaptive_hash_index=off

#** Timeout options
wait_timeout = 1800
interactive_timeout = 1800
skip-name-resolve
skip-external-locking
#skip-bdb
#skip-innodb

##*** InnoDB Specific options
default-storage-engine = InnoDB
transaction_isolation = READ-COMMITTED
innodb_file_format=barracuda
innodb_file_format_max=Barracuda
innodb_buffer_pool_size = 95G
innodb_data_file_path = ibdata1:4G:autoextend
innodb_strict_mode = 1
innodb_file_per_table = 1
innodb_write_io_threads=32
innodb_read_io_threads=32
innodb_thread_concurrency = 64
innodb_io_capacity=4000
innodb_io_capacity_max=8000
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 32M
innodb_log_file_size = 4G
innodb_log_files_in_group = 2
innodb_adaptive_flushing = 1
innodb_lock_wait_timeout = 120
innodb_fast_shutdown = 0
##innodb_status_file
##innodb_open_files
##innodb_table_locks

##5.6 new##
sync_master_info = 10000
sync_relay_log = 10000
sync_relay_log_info = 10000
relay_log_info_repository = table
master_info_repository = table
sync_binlog = 1
#explicit_defaults_for_timestamp
innodb_buffer_pool_instances = 8
sysdate-is-now
performance_schema
performance_schema_max_table_instances = 30000
sql_mode=
innodb_flush_neighbors=1
innodb_flush_method=O_DIRECT
innodb_old_blocks_time = 1000
innodb_stats_on_metadata = off
innodb_online_alter_log_max_size = 256M
innodb_stats_persistent = on
innodb_stats_auto_recalc = on
table_definition_cache=4096
table_open_cache = 4096
innodb_open_files=4096

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
default-character-set=utf8
prompt="\\u:\\d> "
pager=more
#tee="/tmp/query.log"
no-auto-rehash

[isamchk]
key_buffer = 512M
sort_buffer_size = 512M
read_buffer = 8M
write_buffer = 8M

[myisamchk]
key_buffer = 512M
sort_buffer_size = 512M
read_buffer = 8M
write_buffer = 8M

[mysqlhotcopy]
interactive-timeout

[mysqld_safe]
open-files-limit = 65535
user = mysql
#nice = -20

Error log

First error:
----------------------------
END OF INNODB MONITOR OUTPUT
============================
InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
2015-12-03 00:23:03 7fddc77c9700  InnoDB: Assertion failure in thread 140590511331072 in file srv0srv.cc line 1748
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.

Second error:

2015-11-30 18:08:42 17070 [Note] Check error log for additional messages. You will not be able to start replication until the issue is resolved and the server restarted.
2015-11-30 18:08:42 17070 [Note] Event Scheduler: Loaded 0 events
2015-11-30 18:08:42 17070 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.6.16-log'  socket: '/tmp/mysql.sock'  port: 3306  MySQL Community Server (GPL)
2015-11-30 18:11:36 17070 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='x', master_port= x, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='x', master_port= 3306, master_log_file='x', master_log_pos= x, master_bind=''.
2015-11-30 18:11:38 17070 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2015-11-30 18:11:38 17070 [Note] Slave SQL thread initialized, starting replication in log 'x.004942' at position 118512044, relay log './x-relay-bin.000001' position: 4 2015-11-30 18:11:39 17070 [Note] Slave I/O thread: connected to master 'repl@x:3306',replication started in log 'db10-049.004942' at position 118512044 InnoDB: Warning: a long semaphore wait: --Thread 139611514595072 has waited at sync0rw.cc line 297 for 241.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 139611483125504 has waited at btr0cur.cc line 545 for 241.00 seconds the semaphore: X-lock (wait_ex) on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139611546064640 has waited at sync0rw.cc line 297 for 241.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 139610916673280 has waited at sync0rw.cc line 270 for 241.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 139611357247232 has waited at btr0cur.cc line 554 for 241.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file 
/export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139611619493632 has waited at btr0cur.cc line 554 for 241.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139611525084928 has waited at sync0rw.cc line 297 for 241.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 139611441166080 has waited at btr0cur.cc line 554 for 241.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139719716267776 has waited at buf0buf.cc line 2457 for 241.00 seconds the semaphore: S-lock on RW-latch at 0x7f098dc11540 created in file buf0buf.cc line 996 a writer (thread id 139719716267776) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file not yet reserved line 0 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/buf/buf0buf.cc line 3579 InnoDB: Warning: a long semaphore wait: --Thread 139610260764416 has waited at 
btr0cur.cc line 554 for 241.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info: InnoDB: Pending preads 0, pwrites 0 ===================================== 2015-11-30 20:33:03 7ef9b3b86700 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 60 seconds ----------------- BACKGROUND THREAD ----------------- srv_master_thread loops: 8135 srv_active, 0 srv_shutdown, 175 srv_idle srv_master_thread log flush and writes: 8309 ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 66048 --Thread 139611514595072 has waited at sync0rw.cc line 297 for 251.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 --Thread 139611483125504 has waited at btr0cur.cc line 545 for 251.00 seconds the semaphore: X-lock (wait_ex) on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 --Thread 139611546064640 has waited at sync0rw.cc line 297 for 251.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 --Thread 139610916673280 has waited at sync0rw.cc line 270 for 251.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 
1472, lock var 1 waiters flag 1 --Thread 139611357247232 has waited at btr0cur.cc line 554 for 251.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 --Thread 139611619493632 has waited at btr0cur.cc line 554 for 251.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 --Thread 139611525084928 has waited at sync0rw.cc line 297 for 251.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 --Thread 139611441166080 has waited at btr0cur.cc line 554 for 251.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 --Thread 139719716267776 has waited at buf0buf.cc line 2457 for 251.00 seconds the semaphore: S-lock on RW-latch at 0x7f098dc11540 created in file buf0buf.cc line 996 a writer (thread id 139719716267776) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read 
locked in file not yet reserved line 0 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/buf/buf0buf.cc line 3579 --Thread 139610260764416 has waited at btr0cur.cc line 554 for 251.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 --Thread 139610895693568 has waited at buf0flu.cc line 1064 for 244.00 seconds the semaphore: S-lock on RW-latch at 0x7f09893f2340 created in file buf0buf.cc line 996 a writer (thread id 139611525084928) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file btr0cur.cc line 265 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/btr/btr0cur.cc line 265 OS WAIT ARRAY INFO: signal count 82316 Mutex spin waits 5723675, rounds 5019676, OS waits 27829 RW-shared spins 68787, rounds 1361576, OS waits 31392 RW-excl spins 47999, rounds 493233, OS waits 5902 Spin rounds per wait: 0.88 mutex, 19.79 RW-shared, 10.28 RW-excl ------------ TRANSACTIONS ------------ Trx id counter 39221621177 Purge done for trx's n:o < 39221621177 undo n:o < 0 state: running but idle History list length 2003 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 39221621175, not started MySQL thread id 2, OS thread handle 0x7f13080b5700, query id 0 Waiting for master to send event ---TRANSACTION 39221620820, ACTIVE 251 sec inserting mysql tables in use 1, locked 1 1 lock struct(s), heap size 360, 0 row lock(s), undo log entries 1 MySQL thread id 3, OS thread handle 0x7ef98bfff700, query id 28601706 System lock 
---TRANSACTION 39221511555, ACTIVE 300 sec fetching rows, thread declared inside InnoDB 1853 mysql tables in use 1, locked 1 27937 lock struct(s), heap size 3405352, 7514649 row lock(s) MySQL thread id 981, OS thread handle 0x7f13080e6700, query id 28351219 localhost dbadmin copy to tmp table alter table user_pool_20150318 engine=MyISAM -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: complete io for buf page (read thread) ev set I/O thread 3 state: waiting for i/o request (read thread) I/O thread 4 state: waiting for i/o request (read thread) I/O thread 5 state: waiting for i/o request (read thread) I/O thread 6 state: waiting for i/o request (read thread) I/O thread 7 state: waiting for i/o request (read thread) I/O thread 8 state: waiting for i/o request (read thread) I/O thread 9 state: complete io for buf page (read thread) ev set I/O thread 10 state: waiting for i/o request (read thread) I/O thread 11 state: complete io for buf page (read thread) ev set I/O thread 12 state: complete io for buf page (read thread) ev set I/O thread 13 state: waiting for i/o request (read thread) I/O thread 14 state: waiting for i/o request (read thread) I/O thread 15 state: complete io for buf page (read thread) ev set I/O thread 16 state: complete io for buf page (read thread) ev set I/O thread 17 state: waiting for i/o request (read thread) I/O thread 18 state: waiting for i/o request (read thread) I/O thread 19 state: complete io for buf page (read thread) ev set I/O thread 20 state: waiting for i/o request (read thread) I/O thread 21 state: waiting for i/o request (read thread) I/O thread 22 state: waiting for i/o request (read thread) I/O thread 23 state: waiting for i/o request (read thread) I/O thread 24 state: waiting for i/o request (read thread) I/O thread 25 state: waiting for i/o request (read thread) I/O thread 26 state: waiting for i/o request 
(read thread) I/O thread 27 state: complete io for buf page (read thread) ev set I/O thread 28 state: waiting for i/o request (read thread) I/O thread 29 state: waiting for i/o request (read thread) I/O thread 30 state: waiting for i/o request (read thread) I/O thread 31 state: waiting for i/o request (read thread) I/O thread 32 state: waiting for i/o request (read thread) I/O thread 33 state: waiting for i/o request (read thread) I/O thread 34 state: waiting for i/o request (write thread) I/O thread 35 state: waiting for i/o request (write thread) I/O thread 36 state: waiting for i/o request (write thread) I/O thread 37 state: waiting for i/o request (write thread) I/O thread 38 state: waiting for i/o request (write thread) I/O thread 39 state: waiting for i/o request (write thread) I/O thread 40 state: waiting for i/o request (write thread) I/O thread 41 state: waiting for i/o request (write thread) I/O thread 42 state: waiting for i/o request (write thread) I/O thread 43 state: waiting for i/o request (write thread) I/O thread 44 state: waiting for i/o request (write thread) I/O thread 45 state: waiting for i/o request (write thread) I/O thread 46 state: waiting for i/o request (write thread) I/O thread 47 state: waiting for i/o request (write thread) I/O thread 48 state: waiting for i/o request (write thread) I/O thread 49 state: waiting for i/o request (write thread) I/O thread 50 state: waiting for i/o request (write thread) I/O thread 51 state: waiting for i/o request (write thread) I/O thread 52 state: waiting for i/o request (write thread) I/O thread 53 state: waiting for i/o request (write thread) I/O thread 54 state: waiting for i/o request (write thread) I/O thread 55 state: waiting for i/o request (write thread) I/O thread 56 state: waiting for i/o request (write thread) I/O thread 57 state: waiting for i/o request (write thread) I/O thread 58 state: waiting for i/o request (write thread) I/O thread 59 state: waiting for i/o request (write thread) I/O 
thread 60 state: waiting for i/o request (write thread) I/O thread 61 state: waiting for i/o request (write thread) I/O thread 62 state: waiting for i/o request (write thread) I/O thread 63 state: waiting for i/o request (write thread) I/O thread 64 state: waiting for i/o request (write thread) I/O thread 65 state: waiting for i/o request (write thread) Pending normal aio reads: 100 [5, 0, 0, 0, 0, 0, 0, 7, 0, 3, 70, 0, 0, 4, 2, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] , ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 270736 OS file reads, 396343 OS file writes, 132039 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- InnoDB: ###### Diagnostic info printed to the standard error stream InnoDB: Warning: a long semaphore wait: --Thread 139611514595072 has waited at sync0rw.cc line 297 for 272.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 139611483125504 has waited at btr0cur.cc line 545 for 272.00 seconds the semaphore: X-lock (wait_ex) on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139611546064640 has waited at sync0rw.cc line 297 for 272.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long 
semaphore wait: --Thread 139610916673280 has waited at sync0rw.cc line 270 for 272.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 139611357247232 has waited at btr0cur.cc line 554 for 272.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139611619493632 has waited at btr0cur.cc line 554 for 272.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139611525084928 has waited at sync0rw.cc line 297 for 272.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 139611441166080 has waited at btr0cur.cc line 554 for 272.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file 
/export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139719716267776 has waited at buf0buf.cc line 2457 for 272.00 seconds the semaphore: S-lock on RW-latch at 0x7f098dc11540 created in file buf0buf.cc line 996 a writer (thread id 139719716267776) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file not yet reserved line 0 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/buf/buf0buf.cc line 3579 InnoDB: Warning: a long semaphore wait: --Thread 139610260764416 has waited at btr0cur.cc line 554 for 272.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139610895693568 has waited at buf0flu.cc line 1064 for 265.00 seconds the semaphore: S-lock on RW-latch at 0x7f09893f2340 created in file buf0buf.cc line 996 a writer (thread id 139611525084928) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file btr0cur.cc line 265 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/btr/btr0cur.cc line 265 InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info: InnoDB: Pending preads 0, pwrites 0 InnoDB: ###### Diagnostic info printed to the standard error stream InnoDB: Warning: a long semaphore wait: --Thread 139611514595072 has waited at sync0rw.cc line 297 for 303.00 seconds the semaphore: Mutex at 
0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 139611483125504 has waited at btr0cur.cc line 545 for 303.00 seconds the semaphore: X-lock (wait_ex) on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139611546064640 has waited at sync0rw.cc line 297 for 303.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 139610916673280 has waited at sync0rw.cc line 270 for 303.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 139611357247232 has waited at btr0cur.cc line 554 for 303.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139611619493632 has waited at btr0cur.cc line 554 for 303.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write 
locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139611525084928 has waited at sync0rw.cc line 297 for 303.00 seconds the semaphore: Mutex at 0x1372a40 created file sync0sync.cc line 1472, lock var 1 waiters flag 1 InnoDB: Warning: a long semaphore wait: --Thread 139611441166080 has waited at btr0cur.cc line 554 for 303.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139719716267776 has waited at buf0buf.cc line 2457 for 303.00 seconds the semaphore: S-lock on RW-latch at 0x7f098dc11540 created in file buf0buf.cc line 996 a writer (thread id 139719716267776) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file not yet reserved line 0 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/buf/buf0buf.cc line 3579 InnoDB: Warning: a long semaphore wait: --Thread 139610260764416 has waited at btr0cur.cc line 554 for 303.00 seconds the semaphore: S-lock on RW-latch at 0xa37b3dc8 created in file dict0dict.cc line 2420 a writer (thread id 139611483125504) has reserved it in mode wait exclusive number of readers 1, waiters flag 1, lock_word: ffffffffffffffff Last time read locked in file btr0cur.cc line 554 Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/ibuf/ibuf0ibuf.cc line 409 InnoDB: Warning: a long semaphore wait: --Thread 139610895693568 has 
waited at buf0flu.cc line 1064 for 296.00 seconds the semaphore:
S-lock on RW-latch at 0x7f09893f2340 created in file buf0buf.cc line 996
a writer (thread id 139611525084928) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file btr0cur.cc line 265
Last time write locked in file /export/home/pb2/build/sb_0-11248666-1389714123.71/mysql-5.6.16/storage/innobase/btr/btr0cur.cc line 265
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0
InnoDB: ###### Diagnostic info printed to the standard error stream

During all this, I wanted to attach strace to the problematic MySQL instance to see what was happening when it failed. With strace attached, however, the failure never occurred; stop strace, and the error came right back.

Attempted fixes:

1) echo "kernel.sem=250 32000 100 128" >> /etc/sysctl.conf
2) set innodb_adaptive_hash_index=OFF

Neither had any effect. At this point I was ready to cry; after pulling myself together, I started simulating the various scenarios.

Test approach

1. M-M(log_slave_update)-S
2. set trx=0 & alter table engine=MyISAM & start slave

Test plan

Round 1 (scattershot testing)

MySQL 5.6.16, ROW format, AIO, converting table engines while cascaded replication runs:

1.0) Tianjin DC: CentOS 6.6, binary install, aio=ON or aio=OFF. Result: crash
1.1) Tianjin DC: CentOS 6.6, source build, aio=ON. Result: crash
1.2) Tianjin DC: CentOS 6.6, binary install, strace attached. Result: no crash, but errors such as:
   * InnoDB: unable to purge a record
   * Enabling keys got errno 127 on aifang_adm.#sql-5d39_12d9, retrying
1.3) Tianjin DC: CentOS 6.6, binary install, DW load stress test. Result: ?
1.4) Tianjin DC: CentOS 6.6, binary install, yum erase libaio. Result:
2.1) Shanghai DC: RedHat 6.x, binary install, aio=ON. Result: ok
2.2) Shanghai DC: RedHat 6.x, source build, aio=OFF. Result: ok

MySQL 5.6.27, ROW format, AIO, converting table engines while cascaded replication runs:

1.0) Tianjin DC: CentOS 6.6, binary install, aio=ON or aio=OFF. Result: ?
1.1) Tianjin DC: CentOS 6.6, source build, aio=ON. Result: ?
1.3) Tianjin DC: CentOS 6.6, binary install, DW load stress test. Result: ?
2.1) 上海机房: redhat6.x 二进制安装 aio=ON 结果:ok
2.2) 上海机房: redhat6.x 源码编译安装 aio=OFF 结果:ok

测试方式 | 机房 | memory | OS版本 | OS内核 | MySQL版本 | MySQL安装方式 | innodb_use_native_aio | libaio是否安装 | 测试结果 | 补充
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.16 | binary | ON or OFF | yes | crash | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.16 | source | ON or OFF | yes | crash | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.16 | binary | ON or OFF | yes | problem | 开启strace后,没有crash
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.16 | binary | OFF | no | 无法启动mysql | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.16 | source | OFF | no | ok | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.23.4.el6.x86_64 | 5.6.16 | binary | ON or OFF | yes | ? | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.23.4.el6.x86_64 | 5.6.16 | source | ON or OFF | yes | ? | -
ROW 模式,大量load数据 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.16 | binary | ON | yes | ? | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 64G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.16 | binary | ON or OFF | yes | crash | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 64G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.16 | source | ON or OFF | yes | crash | -
ROW 模式,边转换表引擎,边级联复制 | 上海 | 128G | redhat6.5 | 2.6.32-504.el6.x86_64 | 5.6.16 | binary | ON or OFF | yes | ok | -
ROW 模式,边转换表引擎,边级联复制 | 上海 | 128G | redhat6.5 | 2.6.32-504.el6.x86_64 | 5.6.16 | source | OFF | yes | ok | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.27 | binary | ON or OFF | yes | ? | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.27 | source | ON or OFF | yes | ? | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.27 | binary | ON or OFF | yes | ok | 开启strace后,没有crash
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.27 | binary | OFF | no | ? | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.27 | source | OFF | no | ? | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.23.4.el6.x86_64 | 5.6.27 | binary | ON or OFF | yes | ? | -
ROW 模式,边转换表引擎,边级联复制 | 天津 | 128G | centos6.6 | 2.6.32-504.23.4.el6.x86_64 | 5.6.27 | source | ON or OFF | yes | ? | -
ROW 模式,大量load数据 | 天津 | 128G | centos6.6 | 2.6.32-504.el6.x86_64 | 5.6.27 | binary | ON | yes | ? | -
ROW 模式,边转换表引擎,边级联复制 | 上海 | 128G | redhat6.5 | 2.6.32-504.el6.x86_64 | 5.6.27 | binary | ON or OFF | yes | ok | -
ROW 模式,边转换表引擎,边级联复制 | 上海 | 128G | redhat6.5 | 2.6.32-504.el6.x86_64 | 5.6.27 | source | OFF | yes | ok | -

经过第一轮测试,大致得到以下结论,初步怀疑是硬件故障:
1. 部分机器dmesg : hpsa 0000:03:00.0: out of memory
网上搜索 & 联系HP官方,得到的答复是:驱动升级
1) https://access.redhat.com/solutions/1248173
2) http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=c04302261&lang=en-us&cc=us
不过,在升级之前,我又做了以下测试,想验证除了硬件故障外,是否还有其他猫腻:

第二波测试:
id | 内存 | OS | aio | 是否有out of memory | MySQL版本 | 测试方法 | 结果 | 补充
1 | 128G | redhat6.5 | innodb_use_native_aio=on | no | 5.6.16 | OLAP压测 | ok | -
2 | 128G | redhat6.5 | innodb_use_native_aio=on | no | 5.6.27 | OLAP压测 | ok | -
3 | 128G | centos6.6 | innodb_use_native_aio=off | no | 5.6.16 | OLAP压测 | ok | -
4 | 128G | centos6.6 | innodb_use_native_aio=off | no | 5.6.27 | OLAP压测 | ok | -
5 | 128G | centos6.6 | innodb_use_native_aio=on | yes | 5.6.16 | OLAP压测 | crash | -
6 | 128G | centos6.6 | innodb_use_native_aio=on | yes | 5.6.27 | OLAP压测 | ok | -
7 | 128G | centos6.6 | innodb_use_native_aio=off | yes | 5.6.27 | OLAP压测 | ok | -
8 | 128G | centos6.6 | innodb_use_native_aio=off | yes | 5.6.16 | OLAP压测 | crash | -
9 | 128G | centos6.6 | innodb_use_native_aio=on | yes | 5.6.16 | 追同步测试 | crash | -
10 | 128G | centos6.6 | innodb_use_native_aio=on | yes | 5.6.27 | 追同步测试 | ok | -
11 | 128G | centos6.6 | innodb_use_native_aio=on | no | 5.6.16 | 追同步测试 | ok | -
12 | 128G | centos6.6 | innodb_use_native_aio=on | no | 5.6.27 | 追同步测试 | ok | -
13 | 128G | centos6.6 | innodb_use_native_aio=off | yes | 5.6.27 | 追同步测试 | ok | -
14 | 128G | centos6.6 | innodb_use_native_aio=on | yes | 5.6.27 | 追同步测试 | ok | -
15 | 128G | centos6.6 | innodb_use_native_aio=off | no,驱动升级 | 5.6.16 | 追同步测试 | crash | -
16 | 128G | centos6.6 | innodb_use_native_aio=on | no,驱动升级 | 5.6.27 | 追同步测试 | ok | -

测试表明:
1. MySQL5.6.27 在各方面都ok,没有报错
2. MySQL5.6.16 在硬件完全没问题的机器上ok,不会报错

总结
1. 硬件知识需要多学习,自动化检测,硬件监控需加强
2. Linux 调优与故障诊断需要加强,善于利用dmesg,对/var/log/kernel.log & /var/log/messages 的信息要保持敏感。
3. 要经常翻MySQL的release note,多关注bug的修复,做到心中有数
4. 测试非常重要,自动化压力测试和基准测试,可以提前发现很多问题
5.
态度、情怀、毅力最重要,要怀抱敬畏之心。

最后方案
从以上测试得知,并非完全是硬件问题,但又和硬件相关。
MySQL 5.6.27 的稳定性要优于 MySQL 5.6.16,所以升级是王道。
但是由于线上还有 5.1、5.5 的服务,所以还是不要用有问题的机器。
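上文复现 crash 的手法之一,是写脚本对库里的表来回转换表引擎(alter table … engine=MyISAM/InnoDB),边转换边做级联复制。下面用 Python 写一个生成这类批量转换语句的小草稿(仅为示意,表名是假设的;真实场景应从 information_schema.tables 取 InnoDB 表清单,再逐条执行):

```python
# 按「边转换表引擎」的复现手法,批量生成 ALTER TABLE 语句(示意)。
# 表名列表是假设的,真实使用时应从 information_schema.tables 查询得到。

def gen_engine_flip_sql(tables, rounds=1):
    """在 MyISAM / InnoDB 之间来回转换,返回 SQL 语句列表。"""
    stmts = []
    engines = ["MyISAM", "InnoDB"]
    for r in range(rounds):
        engine = engines[r % 2]          # 偶数轮转 MyISAM,奇数轮转回 InnoDB
        for t in tables:
            stmts.append("ALTER TABLE %s ENGINE=%s;" % (t, engine))
    return stmts

if __name__ == "__main__":
    for s in gen_engine_flip_sql(["db1.t1", "db1.t2"], rounds=2):
        print(s)
```

生成的语句可以灌给 mysql 客户端循环执行,用来给 candidate master 制造持续的 DDL 压力。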
以下列出的是平常工作中用到的小工具和命令 连接 如何不需要密码的登录,直接mysql my.cnf 或者写在 ~/.my.cnf, 相对安全 [mysql] --表示: 只有mysql命令才能免密码 user=root password=123 socket=/tmp/mysql.sock [mysqladmin] --表示: 只有mysqladmin命令才能免密码 user=root password=123 socket=/tmp/mysql.sock [mysqldump] --表示: 只有mysqldump命令才能免密码 user=root password=123 socket=/tmp/mysql.sock [client] --表示:只要是客户端的命令,都是可以免密码的 user=root password=123 socket=/tmp/mysql.sock MySQL如何查看用户名密码 MySQL5.7.6之前 1. show grants for $user; 2. select host,user,Password from user; MySQL5.7.6+ 1. select host,user,authentication_string,password_lifetime,password_expired,password_last_changed from mysql.user where user='lc_rx'; information_schema相关 如何在线kill掉满足某种条件的session DB_SYS: perl /home/Keithlan/scripts/outage/kill_connection/kill_sleepconn_by_opt.pl -opt $opt PROCESSLIST 分析出当前连接过来的客户端ip的分布情况 select substring_index(host,':', 1) as appip ,count(*) as count from information_schema.PROCESSLIST group by appip order by count desc ; 分析处于Sleep状态的连接分布情况 select substring_index(host,':', 1) as appip ,count(*) as count from information_schema.PROCESSLIST where COMMAND='Sleep' group by appip order by count desc ; 分析哪些DB访问的比较多 select DB ,count(*) as count from information_schema.PROCESSLIST where COMMAND='Sleep' group by DB order by count desc ; 分析哪些用户访问的比较多 select user ,count(*) as count from information_schema.PROCESSLIST where COMMAND='Sleep' group by user order by count desc ; TABLES 列出大于10G以上的表 select TABLE_SCHEMA,TABLE_NAME,TABLE_ROWS,ROUND((INDEX_LENGTH+DATA_FREE+DATA_LENGTH)/1024/1024/1024) as size_G from information_schema.tables where ROUND((INDEX_LENGTH+DATA_FREE+DATA_LENGTH)/1024/1024/1024) > 10 order by size_G desc ; performance_schema相关 performance_schema占用多少内存 http://dev.mysql.com/doc/refman/5.7/en/show-engine.html SHOW ENGINE PERFORMANCE_SCHEMA STATUS; For the Performance Schema as a whole, performance_schema.memory is the sum of all the memory used (the sum of all other memory values). 
performance_schema 瓶颈 1) SHOW VARIABLES LIKE 'perf%'; 2) SHOW STATUS LIKE 'perf%'; 3) SHOW ENGINE PERFORMANCE_SCHEMA STATUS\G 详细细节:http://keithlan.github.io/2015/07/17/22_performance_schema/ 如何查看每个threads当前session变量的值 select * from performance_schema.variables_by_thread as a,(select THREAD_ID,PROCESSLIST_ID,PROCESSLIST_USER,PROCESSLIST_HOST,PROCESSLIST_COMMAND,PROCESSLIST_STATE from performance_schema.threads where PROCESSLIST_USER<>'NULL') as b where a.THREAD_ID = b.THREAD_ID and a.VARIABLE_NAME = 'sql_safe_updates' TOP SQL 相关 能够解决什么问题: 可以找到某个表是否还有业务访问? 能够解决什么问题: 可以确定某个库,某个表的业务是否迁移干净? 能够解决什么问题: 可以用于分析业务是否异常? 能够解决什么问题: 根据TopN 可以分析压力? 能够解决什么问题: 可以用于分析哪些表是热点数据,这些TopN的表才是值得优化的表。只要每一条语句快0.01ms,那么1亿条呢? 实例中: 求SQL 一个实例中查询最多的TopN SQL select SCHEMA_NAME,DIGEST_TEXT,COUNT_STAR,FIRST_SEEN,LAST_SEEN from performance_schema.events_statements_summary_by_digest where DIGEST_TEXT like 'select%' and DIGEST_TEXT not like '%SESSION%' order by COUNT_STAR desc limit 10\G 一个实例中写入最多的TopN SQL select SCHEMA_NAME,DIGEST_TEXT,COUNT_STAR,FIRST_SEEN,LAST_SEEN from performance_schema.events_statements_summary_by_digest where DIGEST_TEXT like 'insert%' or DIGEST_TEXT like 'update%'or DIGEST_TEXT like 'delete%' or DIGEST_TEXT like 'replace%' order by COUNT_STAR desc limit 10\G 库中: 求SQL 一个库中查询最多的TopN SQL 同上 实例中: 求SQL 一个库中写入最多的TopN SQL 同上 实例中: 求SQL 实例中:求table 使用说明 usage: perl xx.pl -i 192.168.1.10 -p 3306 -e read|write|all 2>/dev/null ; opt e: read get select count write get insert,update,delete count all get all sql count opt i: 192.xx.xx.xx ip address opt p: 3306 db port 查看一个实例中,哪个表的SQL语句 访问最多? DB_SYS: perl get_table_from_sql.pl -i $ip -p $port -e all 2> /dev/null 查看一个实例中,哪个表的SQL语句 select【读】最多? DB_SYS: perl get_table_from_sql.pl -i $ip -p $port -e read 2> /dev/null 查看一个实例中,哪个表的SQL语句 insert+update+delete+replace【写】最多? 
DB_SYS: perl get_table_from_sql.pl -i $ip -p $port -e write 2> /dev/null Table IO 相关的监控 库级别 如何查看一个MySQL实例中哪个库的all latency时间最大 select OBJECT_SCHEMA,sum(SUM_TIMER_WAIT) as all_time,sum(SUM_TIMER_READ) as all_read_time,sum(SUM_TIMER_WRITE) as all_write_time,sum(COUNT_STAR) as all_star,sum(COUNT_read) as all_read ,sum(COUNT_WRITE) as all_write,sum(COUNT_FETCH) as all_fetch,sum(COUNT_INSERT) as all_insert,sum(COUNT_UPDATE) as all_update,sum(COUNT_DELETE) as all_delete from performance_schema.table_io_waits_summary_by_table group by OBJECT_SCHEMA order by all_time desc; 如何查看一个MySQL实例中哪个库的read latency时间最大 select OBJECT_SCHEMA,sum(SUM_TIMER_WAIT) as all_time,sum(SUM_TIMER_READ) as all_read_time,sum(SUM_TIMER_WRITE) as all_write_time,sum(COUNT_STAR) as all_star,sum(COUNT_read) as all_read ,sum(COUNT_WRITE) as all_write,sum(COUNT_FETCH) as all_fetch,sum(COUNT_INSERT) as all_insert,sum(COUNT_UPDATE) as all_update,sum(COUNT_DELETE) as all_delete from performance_schema.table_io_waits_summary_by_table group by OBJECT_SCHEMA order by all_read_time desc; 如何查看一个MySQL实例中哪个库的write latency时间最大 select OBJECT_SCHEMA,sum(SUM_TIMER_WAIT) as all_time,sum(SUM_TIMER_READ) as all_read_time,sum(SUM_TIMER_WRITE) as all_write_time,sum(COUNT_STAR) as all_star,sum(COUNT_read) as all_read ,sum(COUNT_WRITE) as all_write,sum(COUNT_FETCH) as all_fetch,sum(COUNT_INSERT) as all_insert,sum(COUNT_UPDATE) as all_update,sum(COUNT_DELETE) as all_delete from performance_schema.table_io_waits_summary_by_table group by OBJECT_SCHEMA order by all_write_time desc; 如何查看一个MySQL实例中哪个库的总访问量最大 select OBJECT_SCHEMA,sum(SUM_TIMER_WAIT) as all_time,sum(COUNT_STAR) as all_star,sum(COUNT_read) as all_read ,sum(COUNT_WRITE) as all_write,sum(COUNT_FETCH) as all_fetch,sum(COUNT_INSERT) as all_insert,sum(COUNT_UPDATE) as all_update,sum(COUNT_DELETE) as all_delete from performance_schema.table_io_waits_summary_by_table group by OBJECT_SCHEMA order by all_star desc; 
如何查看一个MySQL实例中哪个库的查询量(除了select中的fetchs外,还包括update,delete过程中的fetchs)最大 select OBJECT_SCHEMA,sum(SUM_TIMER_WAIT) as all_time,sum(COUNT_STAR) as all_star,sum(COUNT_read) as all_read ,sum(COUNT_WRITE) as all_write,sum(COUNT_FETCH) as all_fetch,sum(COUNT_INSERT) as all_insert,sum(COUNT_UPDATE) as all_update,sum(COUNT_DELETE) as all_delete from performance_schema.table_io_waits_summary_by_table group by OBJECT_SCHEMA order by all_read desc; 如何查看一个MySQL实例中哪个库的写入量最大 select OBJECT_SCHEMA,sum(SUM_TIMER_WAIT) as all_time,sum(COUNT_STAR) as all_star,sum(COUNT_read) as all_read ,sum(COUNT_WRITE) as all_write,sum(COUNT_FETCH) as all_fetch,sum(COUNT_INSERT) as all_insert,sum(COUNT_UPDATE) as all_update,sum(COUNT_DELETE) as all_delete from performance_schema.table_io_waits_summary_by_table group by OBJECT_SCHEMA order by all_write desc; 如何查看一个MySQL实例中哪个库的update量最大 select OBJECT_SCHEMA,sum(SUM_TIMER_WAIT) as all_time,sum(COUNT_STAR) as all_star,sum(COUNT_read) as all_read ,sum(COUNT_WRITE) as all_write,sum(COUNT_FETCH) as all_fetch,sum(COUNT_INSERT) as all_insert,sum(COUNT_UPDATE) as all_update,sum(COUNT_DELETE) as all_delete from performance_schema.table_io_waits_summary_by_table group by OBJECT_SCHEMA order by all_update desc; 如何查看一个MySQL实例中哪个库的insert量最大 select OBJECT_SCHEMA,sum(SUM_TIMER_WAIT) as all_time,sum(COUNT_STAR) as all_star,sum(COUNT_read) as all_read ,sum(COUNT_WRITE) as all_write,sum(COUNT_FETCH) as all_fetch,sum(COUNT_INSERT) as all_insert,sum(COUNT_UPDATE) as all_update,sum(COUNT_DELETE) as all_delete from performance_schema.table_io_waits_summary_by_table group by OBJECT_SCHEMA order by all_insert desc; 如何查看一个MySQL实例中哪个库的delete量最大 select OBJECT_SCHEMA,sum(SUM_TIMER_WAIT) as all_time,sum(COUNT_STAR) as all_star,sum(COUNT_read) as all_read ,sum(COUNT_WRITE) as all_write,sum(COUNT_FETCH) as all_fetch,sum(COUNT_INSERT) as all_insert,sum(COUNT_UPDATE) as all_update,sum(COUNT_DELETE) as all_delete from performance_schema.table_io_waits_summary_by_table group by 
OBJECT_SCHEMA order by all_delete desc; 表级别 表的all latency时间(read + write)最大 select OBJECT_SCHEMA,OBJECT_NAME,SUM_TIMER_WAIT,SUM_TIMER_READ,SUM_TIMER_WRITE,COUNT_STAR,COUNT_read,COUNT_WRITE,COUNT_UPDATE,COUNT_insert,COUNT_delete from performance_schema.table_io_waits_summary_by_table order by SUM_TIMER_WAIT desc limit 10 表的read latency(fetch)时间最大 select OBJECT_SCHEMA,OBJECT_NAME,SUM_TIMER_WAIT,SUM_TIMER_READ,SUM_TIMER_WRITE,COUNT_STAR,COUNT_read,COUNT_WRITE,COUNT_UPDATE,COUNT_insert,COUNT_delete from performance_schema.table_io_waits_summary_by_table order by SUM_TIMER_READ desc limit 10 表的write latency 时间最大 select OBJECT_SCHEMA,OBJECT_NAME,SUM_TIMER_WAIT,SUM_TIMER_READ,SUM_TIMER_WRITE,COUNT_STAR,COUNT_read,COUNT_WRITE,COUNT_UPDATE,COUNT_insert,COUNT_delete from performance_schema.table_io_waits_summary_by_table order by SUM_TIMER_WRITE desc limit 10 表的rows 总访问量最大 select OBJECT_SCHEMA,OBJECT_NAME,SUM_TIMER_WAIT,COUNT_STAR,COUNT_read,COUNT_WRITE,COUNT_UPDATE,COUNT_insert,COUNT_delete from performance_schema.table_io_waits_summary_by_table order by COUNT_STAR desc limit 10 表的rows 查询量最大 select OBJECT_SCHEMA,OBJECT_NAME,SUM_TIMER_WAIT,COUNT_STAR,COUNT_read,COUNT_WRITE,COUNT_UPDATE,COUNT_insert,COUNT_delete from performance_schema.table_io_waits_summary_by_table order by COUNT_read desc limit 10 表的rows 写入量最大 select OBJECT_SCHEMA,OBJECT_NAME,SUM_TIMER_WAIT,COUNT_STAR,COUNT_read,COUNT_WRITE,COUNT_UPDATE,COUNT_insert,COUNT_delete from performance_schema.table_io_waits_summary_by_table order by COUNT_WRITE desc limit 10 表的rows update量最大 select OBJECT_SCHEMA,OBJECT_NAME,SUM_TIMER_WAIT,COUNT_STAR,COUNT_read,COUNT_WRITE,COUNT_UPDATE,COUNT_insert,COUNT_delete from performance_schema.table_io_waits_summary_by_table order by COUNT_update desc limit 10 表的rows insert量最大 select OBJECT_SCHEMA,OBJECT_NAME,SUM_TIMER_WAIT,COUNT_STAR,COUNT_read,COUNT_WRITE,COUNT_UPDATE,COUNT_insert,COUNT_delete from performance_schema.table_io_waits_summary_by_table order by COUNT_insert desc limit 10 
表的rows delete量最大 select OBJECT_SCHEMA,OBJECT_NAME,SUM_TIMER_WAIT,COUNT_STAR,COUNT_read,COUNT_WRITE,COUNT_UPDATE,COUNT_insert,COUNT_delete from performance_schema.table_io_waits_summary_by_table order by COUNT_delete desc limit 10 抓包 * tshark 中的 -e参数有哪些内容请参考 http://www.wireshark.org/docs/dfref/ https://www.wireshark.org/docs/dfref/m/mysql.html https://www.wireshark.org/docs/dfref/t/tcp.html https://www.wireshark.org/docs/dfref/m/memcache.html https://www.wireshark.org/docs/dfref/h/http.html * tshark: 抓取mysql tcp包,以及大小 tshark -i any -R 'tcp.port == 3306 && mysql' -T fields -e tcp.port -e ip.addr -e mysql.query -e mysql.packet_length -e tcp.len tshark 高级版本将 -R 替换成了 -Y * tshark: 抓mysql包 tshark -i any dst host ${ip} and dst port 3306 -l -d tcp.port==3306,mysql -T fields -e frame.time -e 'ip.src' -e 'mysql.query' > yy.tshark --这种方式,会在/tmp/目录下创建很多临时文件,要小心,会产生磁盘报警。 * tshark -i any dst host ${ip} and dst port 3306 -l -d tcp.port==3306,mysql -T fields -e 'ip.src' -e 'tcp.srcport' -e 'mysql.schema' -e 'mysql.query' -w yy.tshark --类似于tcpdump。 nohup tshark -i any dst host ${ip} and dst port 3306 -l -d tcp.port==3306,mysql -a duration:20 -T fields -e mysql.schema -e frame.time -e ip.src -e tcp.srcport -e mysql.query -w xx.sql & -- -a duration 当时间超过 20秒时,停止抓取。 nohup tshark -i any dst host ${ip} and dst port 3306 -l -d tcp.port==3306,mysql -a filesize:2000000 -T fields -e mysql.schema -e frame.time -e ip.src -e tcp.srcport -e mysql.query -w xx.sql & 注:当文件超过2G时,停止抓取。单位是Kilobyte。 * 只抓取MySQL的包,不会有空格之类的了 tshark -i any dst host ${ip} and dst port 3306 -l -a duration:10 -R 'mysql.query' -T fields -e 'ip.src' -e 'mysql.query' ==from gitlab http://gitlab.corp.anjuke.com/_incubator/knowledge/blob/master/tshark.md * thark:解tcpdump包 tshark -r xx.tcpdump -d tcp.port==3306,mysql -T fields -e mysql.schema -e frame.time -e ip.src -e mysql.query > test.tshark * 案例一、 memcache # 需要使用 -d 让 tshark 认为 11213 是使用的 memcache 协议,否则 tshark 默认是将 11211 认为是 memcache 协议 ~ tshark -i eth0 -d 
tcp.port==11213,memcache -R 'tcp.dstport == 11213 && memcache' * 案例二、 mysql ~ tshark -i eth0 -R 'tcp.port == 3306 && mysql.query' -T fields -e frame.time -e 'ip.src' -e 'mysql.query' * 案例三、http ~ tshark -i eth0 -R 'tcp.port == 80 && http' # 这个命令非常有用,当我们的程序非常慢,但是有没有打印任何日志时,我们怀疑可能是某个 http 请求慢了,可以用这个命令检查 # http.time 表示整个 http请求 消耗的时间 # http.response.code 200、403、500 等 # tcp.analysis.initial_rtt tcp 三次握手时间 # tcp.stream tshark 针对每一个5元组,都有一个编号,根据这个编号,可以方便的查到整个会话过程的所有请求 src.ip,src.port,tcp,dst.ip,dst.port,例如在这里,可以根据这个编号,找到请求所对应的 http.response.code ,因为在并发很高的时候,2个记录不一定紧挨着 ~ tshark -i eth0 -R 'http && tcp.port == 80' -T fields -e tcp.analysis.initial_rtt -e frame.time -e ip.addr -e tcp.port -e http.request.uri -e tcp.stream -e http.response.code -e http.time * 案例四、tcp # 检查是否有tcp 包重传 ~ tshark -i eth0 -R 'tcp.analysis.retransmission' slow query优化--切忌:不要在master进行分析和调优,在没有业务的机器上或者etl上分析诊断 1) 先搞清楚时间到底花在哪里&&为什么时间会花在那 (show profile) 1.1 ) 主要工具和方法就是profiling 1.2 ) 整个性能优化,应该花90%的时间在测量上面,只有这样才能够对症下药 1.3 ) 通过show profile 可以知道,时间都花在哪里 1.4 )通过session级别的status,可以知道为什么时间会花在那里 flush status; select xx from tt where ff ; show status where variable_name like 'Handler%' or Variable_name like 'Created%'; 2) 完成一项任务的时间分两个部分 执行时间和等待时间 如何优化执行时间呢 --比较简单? 2.1) 降低子任务数量 2.2) 降低子任务的执行频率 2.3) 提升子任务的执行效率并且判断任务在什么时间执行最长 如何优化等待时间呢 --比较复杂? 
2.4) 一般是由于资源竞争导致,要用合适的工具找到竞争点。
2.5) 判断任务在什么地方被阻塞的时间最长。

3) 通过slow log,可以找到值得优化的SQL
awk '/^# Time:/{print $3, $4, c;c=0}/^# User/{c++}' dbbak10-001-slow.log --可以统计出每个时间点的slow 数量,精度比较细
3.1) 执行总时间最多的SQL
3.2) 单条SQL执行时间最多的SQL

4) 三种轻量级别的SQL抓取 show processlist & tcpdump & slow-query
解析工具可以用:pt-query-digest 解析tcpdump和slow query
mysql -e 'show processlist\G' | grep State: | sort | uniq -c | sort -rn --轻量级 (show processlist && show status)

5) 找到最需要优化的SQL后,可以开始跟踪分析单条SQL来获得更加底层、实际的东西,目前最好的三种方法是 a)show profile b)show status c)slow query条目

a) show profile
SQL> set profiling=1;
SQL> select * from table;
SQL> show profiles;
SQL> show profile for query 1;
格式化输出:
SQL> set @query_id = 1;
SQL> SELECT STATE,SUM(DURATION) AS Total_R, ROUND( 100*SUM(DURATION) / (SELECT SUM(DURATION) FROM INFORMATION_SCHEMA.PROFILING WHERE QUERY_ID = @query_id ), 2) AS Pct_R, COUNT(*) AS CallS, SUM(DURATION) / COUNT(*) AS "R/CALL" FROM INFORMATION_SCHEMA.PROFILING WHERE QUERY_ID = @query_id GROUP BY STATE ORDER BY Total_R DESC;
当然,通过show profile 可以知道时间主要花在什么地方,但是你不知道为什么会花在那些地方,这时候就必须要跟踪堆栈来找到进一步的原因了。
查看使用的是磁盘临时表还是内存临时表:
flush status;
sql;
show status where variable_name like 'Handler%' or variable_name like 'Created%';

b) show status
SQL> 句柄计数器 handler counter,临时文件,表计数器
SQL> flush status ; --刷新会话级别的状态值。
SQL> select * from table;
SQL> show status where variable_name like 'Handler%' or Variable_name like 'Created%'; --可以看到是否利用了磁盘临时表,而explain是无法看到的。

6. 监控点 -- 通过监控状态数据可以发现哪些地方是异常的,然后再具体分析异常时间点的日志。
a) show global status; --开销比较低
b) show processlist | grep state; 或者使用innotop --开销比较低
c) slow query + pt-query-digest
d) show engine innodb status;
e) vmstat
f) iostat

7.
关于索引统计
发生过一件事情:show table status 看到的大小是100M,但是实际物理大小是10G,通过这个发现索引统计有的时候非常不准确。
这里简单介绍下:
innodb_stats_persistent=on , db重启后不会清空,不需要重新收集
innodb_stats_persistent=off, db重启后统计信息清空,需要重新收集统计
1、针对是否持久化统计信息,mysql可以通过innodb_stats_persistent参数来控制
2、针对统计信息的时效性,mysql通过innodb_stats_auto_recalc参数来控制是否自动更新
3、针对统计信息的准确性,mysql通过innodb_stats_persistent_sample_pages 参数来控制采样页数
4、mysql通过analyze table 语句来手动更新统计信息
5、mysql> select * from innodb_table_stats; 的last_update可以查看索引统计的最后更新时间
6、当索引统计不准确的时候,可以通过analyze table来更新索引统计信息,让执行计划更加准确。
如果这样做后,执行计划还是不准确,那么可以试着调大innodb_stats_persistent_sample_pages,让采样的索引页更多,让执行计划更准确

8. 关于索引选择性:
字段1 building_id,字段2 status
单索引字段的索引选择性:
select count(distinct building_id)/count(*) as selectivity from community_units;
组合索引的索引选择性:
select count(distinct (concat(building_id,status)))/count(*) as selectivity from community_units;
组合前缀的索引选择性:
select count(distinct (concat(building_id,left(status,2))))/count(*) as selectivity from community_units;
得到的结果越接近1,效果越好

cpu 模式调节
https://wiki.archlinux.org/index.php/CPU_frequency_scaling
有哪几种模式
Governor | Description
ondemand | Dynamically switch between CPU(s) available if at 95% cpu load
performance | Run the cpu at max frequency
conservative | Dynamically switch between CPU(s) available if at 75% load
powersave | Run the cpu at the minimum frequency
userspace | Run the cpu at user specified frequencies
如何查看当前的cpu模式
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
如何查看cpu支持哪几种模式
cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_available_governors
如何设置
bios里面设置
os设置

NC 传送
传送文件
目的主机监听: nc -l 监听端口<未使用端口> > 要接收的文件名
nc -l 4444 > cache.tar.gz
源主机发起请求: nc 目的主机ip 目的端口 < 要发送的文件
nc 192.168.0.85 4444 < /root/cache.tar.gz
=============================================
==传送文件夹==
接收方的命令: nc -l ${ip} 4444 | tar xf -
传送方的命令: tar -cvf - ppc_* | nc ${ip} 4444

rsync
核心算法:http://www.oschina.net/question/28_54213?fromerr=DHoiMICG
小bug:如果rsync一段时间,突然不传了,且流量中断,不妨加上这个参数试试 /usr/bin/rsync --sockopts=SO_RCVBUF=10485760
配置:/etc/rsyncd.conf
uid = root
gid = root
use chroot = no max connections = 64 pid file = /var/run/rsyncd.pid lock file = /var/run/rsync.lock log file = /var/log/rsyncd.log [dbbak] path = /data/dbbackup use chroot = no ignore errors read only = no list = no [Binlog] path = /data/BINLOG_BACKUP use chroot = no ignore errors read only = no list = no [fullbak] path = /data/FULL_BACKUP use chroot = no ignore errors read only = no list = no 启动: /usr/bin/rsync --daemon 限速100k/s传输 : /usr/bin/rsync -av --progress --update --bwlimit=100 --checksum --compress $file root@$ip::dbbak 正常传输: /usr/bin/rsync -av --progress $file root@$ip::dbbak pigz使用 常用知识普及 错误的写法:nohup tar -cvf - xx_20151129 | pigz -p 24 > xx_20151129.tar.gz & --一定不能加nohup,因为中间有管道符,不能传递下去的 错误的代价: tar: This does not look like a tar archive tar: Skipping to next header tar: Exiting with failure status due to previous errors 以上错误的案例中,为此付出过很大的代价,哭晕在厕所N次了... 正确的写法: tar -cvf - xx_20151129 | pigz -p 24 > xx_20151129.tar.gz & 用法 * 压缩 tar cvf - 目录名 | pigz -9 -p 24 > file.tgz pigz:用法-9是压缩比率比较大,-p是指定cpu的核数。 * 解压1 pigz -d file.tgz tar -xf --format=posix file * 解压2 tar xf file.tgz axel & httpd 多线程数据传输 * axel 下载&安装 wget -c http://pkgs.repoforge.org/axel/axel-2.4-1.el5.rf.x86_64.rpm rpm -ivh axel-2.4-1.el5.rf.x86_64.rpm * axel 核心参数 -n 指定线程数 -o 指定另存为目录 * httpd服务搭建与配置 yum install httpd * httpd配置主目录 /etc/httpd/conf/httpd.conf [xx html]# cat /etc/httpd/conf/httpd.conf | grep DocumentRoot # DocumentRoot: The directory out of which you will serve your #DocumentRoot "/var/www/html" --注释 DocumentRoot "/data/dbbackup/html" --配置成容量大的地址 * 开启httpd服务 service httpd restart * 下载数据 目的地ip shell> nohup axel -n 10 -v -o /data/dbbackup/ http://$数据源ip/xx_20151129.tar.gz & git 基本 1. git add xx 2. git commit -m 'xx' 3. git pull 4. 
git push 如何模拟网络延迟或丢包 模拟网络eth0 timeout 1000ms tc qdisc add dev eth0 root netem delay 1000ms 模拟网络eth0丢包率 10% tc qdisc add dev eth0 root netem loss 10% 删除以上tc命令导致的网络延迟或者丢包规则 tc qdisc del dev eth0 root 如何模拟网络故障 host1 网络断掉,只允许host2 访问 host1> iptables -A INPUT -p tcp -s host2 -j ACCEPT host1> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP host1 网络断掉,只允许host2的22端口访问 host1> iptables -A INPUT -p tcp -s host2 --dport 22 -j ACCEPT host1> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP 恢复host1 网络 host1> service iptables restart ansible 基础知识 文档 官方: http://docs.ansible.com/ 个人: http://sofar.blog.51cto.com/353572/1579894 基础用法 ssh互信 1) 不需要加入key,也能登陆到所有机器 2)前提是: ssh-add --mac本地,线下,将私钥加入到内存 ssh -A root@xx ; --会将私钥传送到远端机器 ssh-add -L 查看下。 --查看私钥是否传送过来 yaml - hosts: etl remote_user: root tasks: - shell: cat /home/mysql/xx.pl - copy: src=files/rsync dest=/usr/bin/ - template: src=files/xx.pl dest=/home/mysql/ hosts [test1] 10.x.x.x bak_dest_ip=10.y.y.y bak_source_port=xx [etl] 10.x.x.x bak_dest_ip=10.y.y.y bak_source_port=xx files *.pl *.file 常用语法 * 命令中如果有管道等多种命令,需要用bash -c ,并且引号起来 * -T:ping延迟时间 -f:线程数 -i:后面接hosts文件,xx标签 -m:command 命令模式 -a:命令内容 ansible -T 2 -f 1 -i ./hosts etl -m command -a "bash -c 'cat /home/mysql/xx.pl |grep bin/rsync'" * playbook方式跑ansible ansible-playbook -i ./hosts rsync.yaml 网络流量诊断 tools * ifstat * iftop iftop -nNP -i tunl1 —看出口流量 iftop -nNP —看看整体的 * 查看ip1 与 ip2 之间的流量 root@ip1> iftop -F $ip2/32 ============= iftop -F $P{ip}/32 * 如何查看一个机器上哪个端口占用的流量最大 1> iftop 进入界面 2> 按 N 3> 按 S vim块操作 [选择] -> 在普通模式下按ctrl+v或者v进入块操作模式 v(小写) 按字符选择,选中按下V时光标所在的字符到当前光标所在字符间的内容 V(大写) 按行选择 [Ctrl]+V 选择矩形字符块 [动作] -> 通过光标移动选中内容,可以进行ydp操作 y:复制选中内容到粘贴板 d:删除选中内容 p:用粘贴板里的内容替换选中的内容 =:对齐选中内容 对于矩阵字符块:[Shift] + i xxx [esc] :把xxx写到每一行的光标前面的位置 [替换] -> 批量缩进或反缩进,类似于文本编辑器中的格式化 选中多行,按I(大写)进入插入模式,写入Tab,之后按ESC,即可完成批量缩进的功能 也可以写入内容,到选中的每一行的光标位置 TGW 接口 TGW相关问题 * 根据vip,vport,找到rsip(不需要固定key,因为不需要访问real-server) wget -O- --post-data 'data={ "operator":"xx_DEV", "rulelist":[ { "vip":"'"$vip"'", "vport":'"$vport"', 
"protocol":"TCP" } ] }' "http://10.126.70.51/cgi-bin/fun_logic/bin/public_api/getrs.cgi" * 将vip 从rsip下线(需要固定key,因为要访问real-server) $del_rs=`wget -O- --post-data 'data={ "client_type" : "x'x_DB", "ignore_exist_error" : false, "operator" : "xx_DEV", "rs_type" : "linux_tunl", "need_setup_rs" : true, "op_type" : "'del'", "rule_list" : [ { "rule_group":[ { "vip":"'$vip'", "vport":'$vport', "protocol":"TCP" } ], "rs_os_type":"linux", "rs_list":[ { "rs_ip":"'$source_ip'", "rs_port":'$source_port', "rs_weight":100 } ] } ], "sync" : true }' 'http://xx/cgi-bin/fun_logic/bin/public_api/op_rs.cgi' 2>/dev/null`; * 将vip 从rsip上线(需要固定key,因为要访问real-server) $add_rs=`wget -O- --post-data 'data={ "client_type" : "xx_DB", "ignore_exist_error" : false, "operator" : "xx_DEV", "rs_type" : "linux_tunl", "need_setup_rs" : true, "op_type" : "'add'", "rule_list" : [ { "rule_group":[ { "vip":"'$vip'", "vport":'$vport', "protocol":"TCP" } ], "rs_os_type":"linux", "rs_list":[ { "rs_ip":"'$target_ip'", "rs_port":'$target_port', "rs_weight":100 } ] } ], "sync" : true }' 'http://xx/cgi-bin/fun_logic/bin/public_api/op_rs.cgi' 2>/dev/null`; * 问题 其实TGW的接口会做两步操作:1,操作TGW server上的配置 2,操作real-server上的配置,这两步应该是原子操作。 > 假设:1 成功,2失败,那么就会导致tgw上的配置,请求均切换了,但是real-server却没做改变,导致两端出现问题。 临时解决方案:2失败了,那么手动执行2的操作。假设在TGW上执行的操作是del_rs,那么可以在read-server上执行 /usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c (将本地的rsip和vip直接的配置关系清理掉) > 假设:1 没有执行,2 执行了,那么就会导致tgw上的配置没变,但是real-server的配置改变了,导致从tgw来的请求均在real-server上找不到,出现问题。 临时解决方案:2执行了,那么手动让2还原到没有执行的状态。假设在read-server上误清理掉相关rs配置(/usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c),那么可以调用add_rs 来恢复。 vip漂移脚本 * 位置: db_sys: /data/online/tools/tgw_vip_shift usage: python vip_shift.py view --vip=$vip --vip_port=$vip_port python vip_shift.py del --vip=$vip --vip_port=$vip_port --src_ip=$src_ip --src_port=$src_port python vip_shift.py add --vip=$vip --vip_port=$vip_port --target_ip=$target_ip --target_port=$target_port python vip_shift.py change --vip=$vip --vip_port=$vip_port 
--src_ip=$src_ip --src_port=$src_port --target_ip=$target_ip --target_port=$target_port [-h] [--vip VIP] [--vip_port VIP_PORT] [--src_ip SRC_IP] [--src_port SRC_PORT] [--target_ip TARGET_IP] [--target_port TARGET_PORT] [-v VERBOSITY] {del,add,change,view} SSH 如何跳过输入密码,只允许认证模式 ssh -o BatchMode=yes -o PasswordAuthentication=no root@ip 如何永久清空一台机器上的history * 立即清空里的history当前历史命令的记录 history -c * 要求bash立即更新history文件 history -w nohup 失效的问题 在secureCRT 或者 iterm2 等类似终端,使用nohup 执行命令,为啥退出后,后台执行的命令也就停止了? * 错误的做法 1. nohup xx_cmd & 2. 点击左上角或者右上角的xx按钮退出 3. 然后发现,刚刚在后台的命令异常终止了 * 正确的做法 1. nohup xx_cmd & 2. 必须显示的 exit 退出shell,接下来,你想干嘛干嘛 3. 然后发现,刚刚在后台的命令,安然无恙,放心睡觉吧 如何让iTerm2 tab页面显示从哪台机器上登陆过来的 sudo vi /bin/go #!/bin/sh if [ "$1" = "" ]; then echo "pleaes input ip" else echo "go ==> ssh -A root@$1" echo "\033]0;$1\007" ssh -A root@$1 # ssh -A Keithlan@$堡垒机 -t "ssh root@xx" fi 如何查看memcache/redis当前哪个链接数最多 ss | grep '$ip:$port' | awk '{print $5}' | awk -F ':' '{print $1}' | sort -nr | uniq -c | sort -nr kibana简单语法 * 地址:http://opses.corp.anjuke.com/ * 注意:选择搜索的时间段,右上角 * filter: 语法: message:(+SQLSTATE +connection) 每个关键字用+号,不能有空格 * 选择log name: ops-user-userlog* ops-xinfang-userlog* ops-broker-userlog* * 哪些关键字跟DB紧密相关 SQLSTATE connection time out too many connection max_user_connections 定位系统问题的工具和方法 * perf top -G : 当CPU性能出现问题的时候,使用最佳 --注意: 会卡住,导致linux宕机,小心 : http://blog.51cto.com/1152313/1767927 [ ] perf record -g --保留文件,稍后可以用 perf report分析 [ ] 如果需要分析某一个进程,可以加 -p , perf record -g -p $pid [ ] perf top -g 实时分析,不保留数据到文件 [ ] 如果需要分析某一个进程,可以加 -p , perf top -g -p $pid * pidstat 1 5 :分析cpu问题的好工具 * dstat * pstack : 当进程卡住的时候,使用效果最佳 * ss -tnlp * nstat 1. 检查back_log 是否设置合理,如果不合理,那么就会看到很多如下信息,代表客户端的请求会connect timeout linux> nstat -a | grep -i 'drops\|Overflow' TcpExtListenOverflows 208539 0.0 TcpExtListenDrops 236999 0.0 * top : 1. top -Hp $pid 2. 
top , 然后输入f,然后输入p和y , 就可以在top显示中多出2列, p对应的是swap(查看swap的进程),y对应的是wchan(Sleeping in Function),很实用

* gdb
https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64
gdb -p $id
info thread
thread $id
bt

* strace
第一种: strace -o /data/dbbackup/strace.log -fp $pid
第二种: 跟踪某些具体的操作 strace -o /data/dbbackup/strace.log -T -tt -f -e trace=read,open -p $pid

* other
http://blog.donghao.org/2014/04/24/%E8%BF%BD%E8%B8%AAcpu%E8%B7%91%E6%BB%A1/
如果perf都用不了,可以尝试 echo t > /proc/sysrq-trigger , 然后dmesg 或者查看kernel日志
如果上述方法还不行, 可以尝试 /proc/{pid}/wchan

atop的使用方法
查看历史的top
atop -r /var/log/atop/atop_20180906 -b 4:00 -e 5:00 --查看某台机器凌晨4点~5点的top日志, t 下一页,T 上一页

如何优化swap被占用的情况
处理原则
1. 如果swap占用的内存比较小(500M以内),那么通过 swapoff -a && swapon -a 可以快速释放掉(此操作有风险,谨慎)
2. 如果swap占用的内存比较大,则需要保证两点
2.1 必须保证linux的空闲内存 大于 swap占用空间
2.2 然后通过下面的方法找到占用swap最多的进程,优化处理进程,让其达到第一点后再释放swap
发现swap占用最多的进程
1. for i in $(ls /proc | grep "^[0-9]" | awk '$0>100'); do awk '/Swap:/{a=a+$2}END{print '"$i"',a/1024"M"}' /proc/$i/smaps;done| sort -k2nr | head
有些linux无法跑上面的程序,可参考下一条命令
2. for i in $(ll /proc | awk '{print $9}' | grep "^[0-9]" | awk '$0>100'); do awk '/Swap:/{a=a+$2}END{print '"$i"',a/1024"M"}' /proc/$i/smaps;done| sort -k2nr | head

查看机器有哪些服务
ss -tpnl

如何释放OS的cache
cat /proc/sys/vm/drop_caches
sync;sync;sync; sync;sync;sync; sync;sync;sync;
echo 3 > /proc/sys/vm/drop_caches
sync;sync;sync; sync;sync;sync;
echo 0 > /proc/sys/vm/drop_caches
sync;sync;sync; sync;sync;sync;
cat /proc/sys/vm/drop_caches
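上面统计各进程 swap 占用的 shell 循环,核心逻辑就是把 /proc/$pid/smaps 里所有 Swap: 行的 kB 数值求和。用 Python 重写一遍会更直观,也便于排错(仅为示意,字段格式以实际内核的 smaps 输出为准):

```python
import os
import re

def swap_kb(smaps_text):
    """把 smaps 内容里所有 'Swap: N kB' 行的数值求和,返回 kB。"""
    total = 0
    for line in smaps_text.splitlines():
        m = re.match(r"Swap:\s+(\d+)\s+kB", line)
        if m:
            total += int(m.group(1))
    return total

def top_swap_processes(n=10):
    """遍历 /proc,按 swap 占用从大到小返回 (pid, kB) 列表。"""
    result = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/smaps" % pid) as f:
                result.append((pid, swap_kb(f.read())))
        except (IOError, OSError):
            continue  # 进程可能已退出,或无权限读取
    return sorted(result, key=lambda x: x[1], reverse=True)[:n]

if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, kb in top_swap_processes():
            print(pid, "%.1fM" % (kb / 1024.0))
```

和 shell 版一样,输出按占用从大到小排序,单位换算成 M 方便肉眼比较。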
一、背景

GTID的原理这篇文章不再展开,有兴趣的同学可以关注之前的GTID原理、GTID实战、GTID运维实战等文章。
如果每个实例的GTID相同,那么可以大概率说明数据是一致的。
所以,我们要保证slave的GTID一定是master的子集,因为基于复制原理,slave一般是延后master的。
于是,我们就实现了一个监控,如果slave的GTID不是master的子集,那么告警出来,截图如下:
上图列出的GTID就是有问题的,不是master的子集。
一开始,这么做主要是出于自己的洁癖,以及对规范的强要求和依赖。后来有好多小朋友跟我说,这个监控没有任何意义:
1) slave切换一下,就不一致了
2) 即便不是子集,在slave进行了操作,比如:flush 等操作,只要不影响数据一致性,也没关系的
balabala好多类似的理由。当时,我也没有太好的理由说服他们,只能在自己负责的业务上默默遵从。
后来再仔细想想GTID的原理,结合实战,对这个监控有了新的认识。

二、故障复现和原理剖析

先简单说说结论:
如果candidate master的非子集GTID对应的binlog日志被purge了,那么MHA切换的时候,会导致从库IO线程失败。
报错如下:
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
故障复现的步骤:
candidate master: flush slow logs; --产生一些非子集的gtid event
candidate master: purge binary logs to xx --将刚刚产生的非子集gtid所在的binlog给删除掉
master : 模拟切换
报错产生
原理剖析:
a) 当备选master晋升为new master时,其他的实例会获取cm_uuid:1这个gtid
b) 如果cm_uuid:1 已经被purge了,那么就会报错。
回到开头,为什么说这个监控价值一个亿呢?
如果slave没有业务,其实问题不大。
如果slave有业务呢?现在很多架构是读写分离的,如果不能及时修复主从关系,那么延迟的数据造成的损失就不能简简单单用钱来衡量了。

三、解决方案

方案其实很简单:巡检出问题,修复问题,最终一定要保证slave是master的子集。
如何修复gtid呢:如果确定slave上的gtid不影响数据的一致性,那么手动reset gtid来修复即可。

四、Q&A

Q1: 通过在slave 设置 read_only 可以避免吧。
A1: 不能,因为flush 命令是可以绕过read only并产生binlog的。
Q2: 假如从库start slave失败,我也可以手动修复吧。
A2: 如果只是切换一次,我相信你可以;如果切换5次、10次呢?如果只是今天在slave上操作了,你姑且可以记住;如果是半年前的操作呢?你怎么确定这个日志是可以skip的?
Q3: 从库的binlog怎么会被purge呢?
A3: 一般互联网公司的binlog日志,在线不会保留太长时间,保留1个月已经算是谢天谢地了。即便不是人为的purge,也会通过expire_logs_days来删掉的。

这个原理非常简单,但是越简单的事情,却不容易做到。
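文中「保证 slave 的 GTID 一定是 master 的子集」的巡检,本质上就是一个 GTID 集合的包含判断(在服务端也可以直接用 select GTID_SUBSET(slave_set, master_set) 来做)。下面是一个本地判断的极简 Python 草稿,只处理 uuid:N 和 uuid:M-N 这种最常见的写法,不考虑更复杂的语法,仅为示意:

```python
def parse_gtid_set(s):
    """把 'uuid:1-5:7,uuid2:1-3' 解析成 {uuid: set(事务号)}(简化版)。"""
    result = {}
    for part in s.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        fields = part.split(":")
        uuid, intervals = fields[0], fields[1:]
        txns = result.setdefault(uuid, set())
        for iv in intervals:
            if "-" in iv:
                lo, hi = iv.split("-")
                txns.update(range(int(lo), int(hi) + 1))
            else:
                txns.add(int(iv))
    return result

def is_gtid_subset(slave_set, master_set):
    """slave 的每个 uuid 的事务集合,都必须包含在 master 对应集合里。"""
    slave, master = parse_gtid_set(slave_set), parse_gtid_set(master_set)
    return all(txns <= master.get(uuid, set()) for uuid, txns in slave.items())
```

巡检脚本只要分别取出 master / slave 的 Executed_Gtid_Set,丢给 is_gtid_subset,返回 False 就告警。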
大家都知道,slow query系统做的好不好,直接决定了解决slow query的效率问题 一个数据库管理平台,拥有一个好的slow query系统,基本上就拥有了解锁性能问题的钥匙 但是今天主要分享的并不是平台,而是在平台中看到的奇葩指数五颗星的slow issue 好了,关子卖完了,直接进入正题 一、症状 一堆如下慢查询 # User@Host: cra[cra] @ [xx] Id: 3352884621 # Query_time: 0.183673 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0 use xx_db; SET timestamp=1549900927; # administrator command: Prepare; # Time: 2019-02-12T00:02:07.516803+08:00 # User@Host: cra[cra] @ [xx] Id: 3351119968 # Query_time: 0.294081 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0 SET timestamp=1549900927; # administrator command: Prepare; 从我们的监控图上可以看到,每天不定时间段的slow query 总数在攀升,但是却看不到任何query 语句 这是我接触到的slow query优化案例中从来没有过的情况,比较好奇,也比较兴奋,至此决心要好好看看这个问题 二、排查 要解决这个问题,首先想到的是,如何复现这个问题,如何模拟复现这个症状 MySQL客户端 模拟prepare * 模拟 root:xx> prepare stmt1 from 'select * from xx_operation_log where id = ?'; Query OK, 0 rows affected (0.00 sec) Statement prepared * 结果 # Time: 2019-02-14T14:14:50.937462+08:00 # User@Host: root[root] @ localhost [] Id: 369 # Query_time: 0.000105 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0 SET timestamp=1550124890; prepare stmt1 from 'select * from xx_operation_log where id = ?'; 结论是: MySQL client 模拟出来的prepare 并不是我们期待的,并没有得到我们想要的 administrator command: Prepare perl 模拟prepare #!/usr/bin/perl use DBI; my $dsn = "dbi:mysql:database=${db_name};hostname=${db_host};port=${db_port}";#数据源 #获取数据库句柄 my $dbh = DBI->connect("DBI:mysql:database=xx;host=xx", "xx", "xx", {'RaiseError' => 1}); my $sql = qq{select * from xx_operation_log where id in (?)}; my $sth = $dbh->prepare($sql); $sth->bind_param (1, '100'); sleep 3; $sth->execute(); 结论是:跟MySQL客户端一样,同样是看不到administrator command: Prepare php 模拟prepare 1. 
官方网址: https://dev.mysql.com/doc/apis-php/en/apis-php-mysqli-stmt.prepare.html

<?php
$link = mysqli_connect("xx", "dba", "xx", "xx_db");

/* check connection */
if (mysqli_connect_errno()) {
    printf("Connect failed: %s\n", mysqli_connect_error());
    exit();
}

$city = '1';

/* create a prepared statement */
$stmt = mysqli_stmt_init($link);
if (mysqli_stmt_prepare($stmt, 'select * from xx_operation_log where id in (1,2,3)')) {
    /* bind parameters for markers(本例SQL没有占位符,故注释掉) */
    // mysqli_stmt_bind_param($stmt, "s", $city);

    /* execute query */
    mysqli_stmt_execute($stmt);

    /* bind result variables */
    mysqli_stmt_bind_result($stmt, $district);

    /* fetch value */
    mysqli_stmt_fetch($stmt);

    printf("%s is in district %s\n", $city, $district);

    /* close statement */
    mysqli_stmt_close($stmt);
}

/* close connection */
mysqli_close($link);
?>

php模拟得到的slow 结果
[root@xx 20190211]# cat xx-slow.log | grep 'administrator command: Prepare' -B4 | grep 'User@Host' | grep 'xx_rx' | wc -l
7891
[root@xx 20190211]# cat xx-slow.log | grep 'administrator command: Prepare' -B4 | grep 'User@Host' | wc -l
7908

结论: 通过php代码,我们成功模拟出了想要的结果
那就顺藤摸瓜,抓取这段时间有相同session id的整个sql执行过程

MySQL开启long_query_time=0的抓包模式
可以定位到同一个session id(3415357118) 的 prepare + execute + close stmt

# User@Host: xx_rx[xx_rx] @ [xx.xxx.xxx.132] Id: 3415357118
# Query_time: 0.401453 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 0
use xx_db;
SET timestamp=1550017125;
# administrator command: Prepare;
# Time: 2019-02-13T08:18:45.624671+08:00
--
# User@Host: xx_rx[xx_rx] @ [xx.xxx.xxx.132] Id: 3415357118
# Query_time: 0.001650 Lock_time: 0.000102 Rows_sent: 0 Rows_examined: 1
use xx_db;
SET timestamp=1550017125;
update `xx` set `updated_at` = '2019-02-13 08:18:45', `has_sales_office_phone` = 1, `has_presale_permit` = 1 where `id` = 28886;
# Time: 2019-02-13T08:18:45.626138+08:00
--
# User@Host: xx_rx[xx_rx] @ [xx.xxx.xxx.132] Id: 3415357118
# Query_time: 0.000029 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 1
use xx_db;
SET timestamp=1550017125;
# administrator command:
Close stmt;
# Time: 2019-02-13T08:18:45.626430+08:00

Conclusion: the Prepare phase really is slow, yet the SQL statement itself executes very quickly, which is awkward. The original goal of the capture was to verify our guess that the prepared statements were huge or had very complex conditions, which would make server-side prepare slow; instead, the queries turned out to be very simple. So we sat down with the application team to look at how they prepare statements, found they use PHP PDO, and made the following discovery.

The two PDO prepare modes
http://php.net/manual/zh/pdo.prepare.php

1. Local (emulated) prepare: $dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES,true); nothing is sent to the MySQL server at prepare time.
2. Server-side prepare: $dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES,false); the prepare is sent to the MySQL server.

Verifying the two prepare modes

Server-side prepare (ATTR_EMULATE_PREPARES = false)

<?php
$dbms = 'mysql';   // database type
$host = 'xxx';     // database host
$dbName = 'test';  // database to use
$user = 'xx';      // connection user
$pass = '123456';  // password
$dsn = "$dbms:host=$host;dbname=$dbName";
try {
    $pdo = new PDO($dsn, $user, $pass);  // initialize a PDO object
    $pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
    echo "----- prepare begin -----\n";
    $stmt = $pdo->prepare("select * from test.chanpin where id = ?");
    echo "----- prepare after -----\n";
    $stmt->execute([333333]);
    echo "----- execute after -----\n";
    $rs = $stmt->fetchAll();
} catch (PDOException $e) {
    die ("Error!: " . $e->getMessage() . "<br/>");
}

Tracing with strace -s200 -f php mysql1.php shows that in this mode the query plus placeholders is sent to the server at prepare time.

Local prepare (ATTR_EMULATE_PREPARES = true)

<?php
$dbms = 'mysql';   // database type
$host = 'xx';      // database host
$dbName = 'test';  // database to use
$user = 'xx';      // connection user
$pass = '123456';  // password
$dsn = "$dbms:host=$host;dbname=$dbName";
try {
    $pdo = new PDO($dsn, $user, $pass);  // initialize a PDO object
    $pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, true);
    echo "----- prepare begin -----\n";
    $stmt = $pdo->prepare("select * from test.chanpin where id = ?");
    echo "----- prepare after -----\n";
    $stmt->execute([333333]);
    echo "----- execute after -----\n";
    $rs = $stmt->fetchAll();
} catch (PDOException $e) {
    die ("Error!: " . $e->getMessage() . "<br/>");
}

Tracing with strace -s200 -f php mysql1.php shows that in this mode nothing is sent to the server at prepare time; the query only goes out on execute.

After checking with the application team: they use the latter, i.e. they had changed the default to server-side prepare, hoping to improve database performance since a prepared statement only needs parameters sent afterwards. But that does not fit our workload: we open and close connections frequently, so the prepared statement is essentially never reused. The documentation also notes that prepared statements can perform poorly in such cases.

Adjusting and verifying

How do we verify the application really switched prepare back to local?

dba:(none)> show global status like 'Com_stmt_prepare';
+------------------+-----------+
| Variable_name    | Value     |
+------------------+-----------+
| Com_stmt_prepare | 716836596 |
+------------------+-----------+
1 row in set (0.00 sec)

This counter stopped increasing, which shows the change took effect.

Summary

Advantages of prepare:
1. Prevents SQL injection.
2. Improves performance in specific scenarios. The specific scenario is: place the statement with placeholders on the server once, then repeatedly send only the parameter values to fill in the blanks. The more executions per prepare, the more the performance benefit shows.

Drawbacks of prepare:
1. Server-side prepare has its own cost; under high concurrency with frequent prepares it becomes a performance problem.
2. Server-side prepare also makes troubleshooting and slow-query optimization harder, because most of the time you cannot see the real query.
3. Prefer $pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES,true), i.e. prepare locally, and avoid putting extra pressure on the server.

Recommendations:
1. By default, stick with PDO's default configuration, i.e. local prepare. It still protects against SQL injection, and performance will not suffer much.
2. Only consider server-side prepare if you really have the specific scenario described above, and test it properly first.
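To make the difference between the two PDO modes concrete, here is a minimal Python sketch (not real driver code) of what emulated prepare does on the client side, versus the protocol round trips a server-side prepared statement costs. The escaping is deliberately simplified; `emulate_prepare` and `server_side_roundtrips` are hypothetical helper names.

```python
def emulate_prepare(sql: str, params: list) -> str:
    """Client-side 'prepare' as PDO does with ATTR_EMULATE_PREPARES=true:
    quote each parameter and splice it into its placeholder, then send ONE
    ordinary query to the server (no COM_STMT_PREPARE round trip at all)."""
    def quote(v):
        if isinstance(v, (int, float)):
            return str(v)
        # minimal escaping for the sketch; a real driver handles charsets etc.
        return "'" + str(v).replace("\\", "\\\\").replace("'", "\\'") + "'"
    parts = sql.split("?")
    assert len(parts) == len(params) + 1, "placeholder/param count mismatch"
    out = parts[0]
    for p, rest in zip(params, parts[1:]):
        out += quote(p) + rest
    return out

def server_side_roundtrips(n_executes: int) -> int:
    """With ATTR_EMULATE_PREPARES=false each statement costs one
    COM_STMT_PREPARE, n COM_STMT_EXECUTE and one COM_STMT_CLOSE."""
    return 1 + n_executes + 1

print(emulate_prepare("select * from test.chanpin where id = ?", [333333]))
# select * from test.chanpin where id = 333333
print(server_side_roundtrips(1))  # 3 protocol commands for a single execute
```

The sketch shows why our connect-query-disconnect workload loses with server-side prepare: every statement pays the extra Prepare/Close commands but never amortizes them over repeated executes.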
https://dev.mysql.com/doc/refman/8.0/en/group-replication-technical-details.html 这一章主要描述MGR的更多细节 18.10.1 Group Replication Plugin Architecture MGR是一个MySQL插件,它是构建在MySQL复制架构上的,因此就拥有了它的很多优秀的特性比如: binog、row-based、GTID等它也整合了现在MySQL的一些组件如:performance schema 、plugin、service的架构下面一张图可以很好的展示MGR的整体结构和架构 Figure 18.9 Group Replication Plugin Block Diagram MGR包括了一系列的API如:capture, apply, and lifecycle,这些东西控制这个plugin如何与MySQL server进行协助这些接口令信息从server到plugin进行流动这,反之亦然这些接口将MySQL Server和Group进行了隔离在某一方面,从server到pugin,有一些事件通知信息如:server的开启、server的恢复、server接收请求连接、提交事务等在另一方面,plugin通知server完成相关动作,如:commit事务,或拒绝即将来临的事务,让事务排队等 下面一层,又是一些MGR组件capture组件 负责 执行并与相关的事务持续保持联系applier组件 负责 执行远程接收的事务recovery组件 负责 管理分布式恢复,以及管理成员的加入、新成员的日志同步,处理相关donor失败等情况 继续往下,replication protocol模块,包含了具体的逻辑复制协议他负责处理组复制的事务冲突、竞争 最后2层是:Group Communication System (GCS) API 和 communication engine (XCom)【基于Paxos的实现】Group Communication System (GCS) API 高层的API,负责抽象复制状态机所需要的属性communication engine(XCom)主要处理组成员之间的协作和交流 18.10.2 The Group 在MGR中,一堆servers组成了复制group一个由UUID组成的名字的group这个group是动态的,且servers可在任何时间自由(不管是主动,还是被动)加入和离开 如果一个server加入了一个group,他会自动的从donar中catch没有的事务,这其实就是异步复制机制如果一个server离开了group,剩下的server会意识到它的离开,并自动重新更新配置 18.10.3 Data Manipulation Statements 任何事务都可以自由执行事务不用协调,但是在commit的时候,需要其他server一起协调来做决定这个事务的命运这种协调有两种目的:1)检测这个事务是应该commit,还是不应该commit2)传递这个changes,以至于其他的servers可以很好的应用它 由于事务是通过原子广播的形式来传递,所以要么所有server都能接收到,要么全部都接收不到如果他们接收到了原子广播信息: 那么他们都将以同样的顺序接收到由于冲突检测需要比对事务写集,因此他们是在row-level层面上进行检测冲突检测的解决方案是:谁第一个提交,谁获胜的方式(first committer wins rule)假设:t1和t2同时提交,那么总有一个在前面,如果t1在前,t2在后,那么t1就会赢得提交权,t2就会被拒绝或rollback 18.10.4 Data Definition Statements 在MGR中,DDL是需要大家关注的 虽然8.0介绍说已经支持原子DDL,就是完全的DDL语句作为一个原子事务一样,要么提交,要么rollback但是,DDL语句,原子的,非原子的,都会隐式提交当前session的任何活跃事务也就是说:DDL无法跟其他事务组合使用 MGR是基于乐观复制的模式,也就是先执行,如有必要在rollback的模式在multi-primary模式下,DDL和DML作用在同一个对象上,会造成数据不一致的情况,所以需要引起大家足够的关心如果是single-primary,这种问题就不会发生,因为所有事务更新都在同一个server完成,那就是primary 18.10.5 Distributed Recovery 当一个成员加入group,需要追上现有成员的事务日志,这个过程叫做Distributed 
Recovery这一节,主要描述Distributed Recovery 18.10.5.1 Distributed Recovery Basics Distributed Recovery的基础是:异步复制主要分2阶段: phase1: 一个server要加入一个group,首先会选择一个成员作为donar,它主要提供新成员所需要的所有事务日志除此之外,它还会cache住这个group的其他exchange事务一旦从donar的复制结束,对于donar的异步复制通道就会关闭 ,然后这个server 开启第二个步骤,catch up phase2:这个阶段,它会执行之前cache住的exchange,直到这个queue的队列为0,最后宣布这个成员为 online 在恢复过程中,如果在phase1的时候,遇到donar server的错误,那么就会换一个server作为donar开始同步数据如果phase1 donar结束connection的阶段有问题,那么直接开启一个新的connection指向新的donar即可,这都是自动的 18.10.5.2 Recovering From a Point-in-time GTID可以提供哪些日志需要恢复,server已经处理哪些事务,但是它没办法做到标记一个具体的point(组成员进行catch up),也没办法传送certification信息这是binlog view marker做的事情,它可以在binlog stream中标记一个view,也可以打上额外的元数据信息标记(缺失的certification信息) 18.10.5.3 View Changes 这一节主要描述 view change identifier内部是如何在binary log 事件中协调工作的 Begin: Stable Group 所有的成员都是online,且正在处理即将要来的事务有些成员可能落后,但是最后都会追上 View Change: a Member Joins 当一个新的成员需要加入时,这个view就change了,每一个server都在queue一个view change 同时,S4 选择需要在online列表中选择一个server作为donar每一个online server 都讲view change事 写入到了 binlog State Transfer: Catching Up 一旦这个server选择了s2作为donar,那么就会创建一个异步复制通道(之前说过的phase1)来同步数据,直到之前的view change 事件(VC4) 换句话说就是:新成员从donar(s2)中复制数据,直到view change 事件结束(vc4) 当server正从donar中同步数据的同时,它也会cache住从group传来的事务(temporary Applier Buffer)。一旦从donar同步结束,就会切换,选择来应用之前cache的事务 Finish: Caught Up 在 catch up (phase 2) 阶段中,一旦cached事务队列数量变成0,他就会变成online,正式成为其中一员 18.10.5.4 Usage Advice and Limitations of Distributed Recovery 分布式恢复也有一些限制。 由于phase1阶段需要同步大量数据,所以推荐做法是,在加入group的server,应该要选择一个合适的Snapchat或备份(rencent),这样会减少catch-up的phase1的时间 18.10.6 Observability 由于MGR内部很多机制都是自动的,所以你需要了解其中的原理和场景。这样看来,Performance Schema是非常重要的,因为他可以监控和查询相关的MGR场景和状态 18.10.7 Group Replication Performance 这一节主要描述怎么样配置才能让MGR达到最好的性能 18.10.7.1 Fine Tuning the Group Communication Thread 当MGR插件load的时候, group communication thread (GCT)就在循环跑起来了 如果想要强制让GCT来等待,可以使用 group_replication_poll_spin_loops mysql> SET GLOBAL group_replication_poll_spin_loops= 10000; 18.10.7.2 Message Compression 当网络带宽是瓶颈的时候,消息压缩可能提升30-40%的吞吐 
Compression is enabled by default, using LZ4 with a threshold of 1000000 bytes. To change the threshold:

STOP GROUP_REPLICATION;
SET GLOBAL group_replication_compression_threshold= 2097152;
START GROUP_REPLICATION;

This sets the threshold to 2MB: when a transaction generates a message larger than 2MB, the message is compressed. To disable compression, set group_replication_compression_threshold=0.

18.10.7.3 Flow Control

MGR only lets a transaction commit after most members have acknowledged receiving it and agreed on the order of all transactions. This works well as long as the total write load stays below the capacity of every member of the group; once some members are pushed past their limit, they start to lag behind the others. When members lag, consistency problems can appear, in particular reads against a lagging member may return stale data. To address this, the replication protocol includes a mechanism: flow control.

Flow control involves two queues: 1) the certification queue and 2) the binlog applier queue, and two mechanisms: 1) monitoring and 2) throttling.

18.10.7.3.1 Probes and Statistics

The monitoring mechanism plants probes across the group, periodically collects data, and reports it at intervals so all members share these metrics. The probe data includes:

certifier queue size
replication applier queue size
total number of transactions certified
total number of remote transactions applied (per member)
total number of local transactions

18.10.7.3.2 MGR Throttling

Once the certification queue or the binlog applier queue reaches its limit, the throttling mechanism kicks in.
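The throttling decision described above can be sketched as a simple check of the two queue sizes against their thresholds. The constants below mirror the documented defaults of group_replication_flow_control_certifier_threshold and group_replication_flow_control_applier_threshold (25000 each); the function itself is a toy model, not MGR's actual implementation.

```python
# Defaults of group_replication_flow_control_certifier_threshold and
# group_replication_flow_control_applier_threshold.
CERTIFIER_THRESHOLD = 25000
APPLIER_THRESHOLD = 25000

def should_throttle(certifier_queue: int, applier_queue: int) -> bool:
    """Toy model of the flow-control trigger: throttle writers for the next
    period when either queue reported by the probes exceeds its threshold."""
    return (certifier_queue > CERTIFIER_THRESHOLD
            or applier_queue > APPLIER_THRESHOLD)

print(should_throttle(100, 100))    # healthy member, no throttling
print(should_throttle(30000, 0))    # certifier backlog triggers flow control
```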
https://dev.mysql.com/doc/refman/8.0/en/group-replication-frequently-asked-questions.html 一、MGR的成员数量最大是多少 最大9个 二、group中的成员是如何连接的 他们直接是通过peer-to-peer TCP连接,主要用作内部交流和信息传递通过group_replication_local_address 可以设置相关的地址 三、group_replication_bootstrap_group主要用作什么用途 bootstrap flag,主要用作创建一个group,然后扮演一个初始化server的角色第二个成员加入到组,需要问bootstrap server来动态调整配置,以便自己能够顺利加入该组 一个成员bootstrap一个组的场景大概2个: 第一次初始化创建group shutdown,然后重启整个group 四、为了恢复,如何设置credentials 提前配置一个GR的恢复通道credentials,使用CHANGE MASTER TO 语句 五、可以使用MGR来scale-out我的写压力么 a)并不是直接的扩展方式,因为MGR的每一个成员都有完整的数据copy b)但是,其他server并不是做完全一样的写动作,因为MGR通过ROW模式复制,其他server只需要apply row即可,并不是re-executed事务了,因此会快且压力小很多 c)更进一步讲,row-based应用都是经过压缩过的,可以减少很多IO动作,相比master上的执行压力会小很多的 d)总结,你可以scale-out写,在没有写冲突事务的时候在多台服务器上执行事务是可以做到scale-out的。 六、相比普通复制,在相同的负载下,MGR需要更多的网络带宽和cpu计算资源吗 是会有一些额外的压力产生,因为MGR需要不断的沟通协作来保证同步的目的,但是很难计算出高出多少资源 七、可以在广域网部署MGR吗 可以,但是要保证他们的可靠和合适的网络性能低延迟、高吞吐是MGR的基本配备条件 如果网络带宽是问题,可以使用 Section 18.10.7.2, “Message Compression” 方法来降低带宽的所需但是如果网络丢包,导致的数据重传会严重影响性能 八、如果网络临时有问题,组成员会自动重新加入group吗 这取决于是什么网络问题如果网络问题是短暂的,瞬间的,那么MGR的错误检测机制根本还没来得及探测到此问题,那么该成员是不会被移除出组的如果是长时间的问题,那么错误检测机制最终会认为它除了问题,会将此server移除出组 一旦移除出组,你就需要让他重新加入一次,换句话说,你需要手工来处理,或用脚本来自动处理 九、什么时候成员会被排除(excluded)在外 如果一个server变成了孤岛,其他成员会从组配置中将其移除出组一般这种情况发生在 server挂了,或网络disconnect了 在指定的timeout后,这个错误被检测出来,然后一个新的没有该成员的配置会重新生成 十、如果一个节点严重延迟,会产生什么问题 没有一个很好的策略来自动判断什么时候去驱逐一个成员你需要找到为什么它会延迟,并解决它,或移除它否则,当一个server慢到触发流控,然后整个group都会变的慢下来流控可以根据你喜好来配置 十一、有没有一个特殊的成员来负责触发重新更新配置来踢出某个member 没有。每个member都是一样的,你无法控制和设置 十二、是否可以用MGR来sharding 无法对MRG成员进行sharding,但是你可以设计,以MGR作为sharding的一个分片,即: MGR1 是一个分片,MGR2是另外一个分片 十三、是否可以在selinux和iptables环境下使用MGR 可以,需要额外配置和过滤 十四、作为组成员,如何恢复relay-log in replication channel STOP GROUP_REPLICATION, START GROUP_REPLICATION,这样MGR会再次创建一个group_replication_applier 通道 十五、为什么MGR使用2个绑定地址 MGR使用两个绑定地址,主要是为了区分 SQL地址(业务应用ip来连接server) 和 group_replication_local_address (成员内部通信)主要是为了隔离和安全 十六、如何找到primary 如果是single-primary,你可以使用Section 18.4.1.3, “Finding the Primary”的方法,轻松找到primary 方法一、 sql> SELECT MEMBER_HOST, 
MEMBER_ROLE FROM performance_schema.replication_group_members;
+-------------------------+-------------+
| MEMBER_HOST             | MEMBER_ROLE |
+-------------------------+-------------+
| remote1.example.com     | PRIMARY     |
| remote2.example.com     | SECONDARY   |
| remote3.example.com     | SECONDARY   |
+-------------------------+-------------+

Method 2:

mysql> SHOW STATUS LIKE 'group_replication_primary_member';
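If you fetch that replication_group_members result from a script, picking out the primary is a one-liner. A small sketch, with rows shaped like the output above (`find_primary` is a hypothetical helper):

```python
# Rows mimicking the performance_schema.replication_group_members output.
members = [
    {"MEMBER_HOST": "remote1.example.com", "MEMBER_ROLE": "PRIMARY"},
    {"MEMBER_HOST": "remote2.example.com", "MEMBER_ROLE": "SECONDARY"},
    {"MEMBER_HOST": "remote3.example.com", "MEMBER_ROLE": "SECONDARY"},
]

def find_primary(rows):
    """Return the hosts whose MEMBER_ROLE is PRIMARY (one in single-primary
    mode, possibly several in multi-primary mode)."""
    return [r["MEMBER_HOST"] for r in rows if r["MEMBER_ROLE"] == "PRIMARY"]

print(find_primary(members))  # ['remote1.example.com']
```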
https://dev.mysql.com/doc/refman/8.0/en/group-replication-requirements-and-limitations.html 关于Group Replication System Variables这一节没有讲,主要是变量属于工具类,需要查看的时候去搜一下即可 https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html 18.8.1 Group Replication Requirements 需要使用MGR的实例必须满足如下要求 基础设施 InnoDB存储引擎 主键 网络性能 实例配置 开启binlog log-slave-update=on binlog必须是row格式 GTID=on 复制信息必须以table存储 --master-info-repository=TABLE and --relay-log-info-repository=TABLE 事务写集 --transaction-write-set-extraction=XXHASH64 多线程复制开启 1. slave_parallel_type=LOGICAL_CLOCK 2. slave_preserve_commit_order=1 3. slave_parallel_workers= (0~1024) ## 可以配置使用多线程,也可以不使用多线程 18.8.2 Group Replication Limitations 下面列了一些已知的MGR的限制 注意:由于MGR是在GTID的基础上构建的,所以GTID的限制也同样是MGR的限制 Section 17.1.3.6, “Restrictions on Replication with GTIDs”. 复制event的checksums --binlog-checksum=NONE 由于设计的问题,MGR不能使用event的checksums --binlog-checksum=NONE 必须这样设置 Gap locks , 建议设置隔离级别为 READ COMMITTED 由于认证阶段无法使用gap lock,所以建议使用隔离级别为READ COMMITTED,READ COMMITTED 不适用gap locks SERIALIZABLE , MGR不支持SERIALIZABLE隔离级别 并发DDL和DML在同一个对象上的操作,会有问题 举例: A实例 表t进行DDL B实例 表t进行dml 会导致冲突无法检测到,会有很高的风险 这种情况一般在multi-primary模式下容易遇到(因为多实例写嘛的原因嘛),所以DDL要特别小心 外键级联约束 大事务 在5秒钟的世界窗口中如果无法将事务copy到其他成员的话,那么MGR的通信会失败,重传,会有严重影响 建议切分、限制 事务大小 multi-primary的死锁检测 多主模式下,如果使用SELECT .. FOR UPDATE 会导致死锁 主要是lock无法跨越多服务器 复制过滤 MGR中不要使用任何复制的filter
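Before starting Group Replication it is worth checking the settings listed in 18.8.1/18.8.2 programmatically. A sketch: the variable names and required values come straight from the list above; the checker function itself (`check_mgr_requirements`) and its input dict, which mimics a SHOW GLOBAL VARIABLES result, are hypothetical.

```python
# Variable -> required value, per sections 18.8.1 and 18.8.2.
REQUIRED = {
    "gtid_mode": "ON",
    "enforce_gtid_consistency": "ON",
    "binlog_format": "ROW",
    "log_slave_updates": "ON",
    "master_info_repository": "TABLE",
    "relay_log_info_repository": "TABLE",
    "transaction_write_set_extraction": "XXHASH64",
    "binlog_checksum": "NONE",  # MGR cannot use replication event checksums
}

def check_mgr_requirements(server_vars: dict) -> list:
    """Return (name, expected, actual) for every requirement not met."""
    return [(k, v, server_vars.get(k))
            for k, v in REQUIRED.items() if server_vars.get(k) != v]

bad = check_mgr_requirements({"gtid_mode": "ON", "binlog_checksum": "CRC32"})
for name, expected, actual in bad:
    print(f"{name}: expected {expected}, got {actual}")
```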
https://dev.mysql.com/doc/refman/8.0/en/group-replication-upgrade.html 这个章节主要描述升级MGR的计划基本的升级MGR成员的方法基本跟单独的实例升级一样(可参考 Section 2.11.1, “Upgrading MySQL”)选择in-place,还是logical方式升级取决于数据量大小通常in-place升级会非常的快速,因此也是官方最推荐的由于MGR是分布式的环境,所以在升级的时候有一些考虑,比如:成员升级的顺序问题等 如果你的MGR环境可以允许offline,那么就参考下列的Group Replication Offline Upgrade 方法如果你的MGR环境需要在online进行,参考Group Replication Online Upgrade方法(极小的downtime) 18.6.1 Group Replication Offline Upgrade 对一个MGR进行offline升级的时候,你需要将成员从group中分别移除掉,然后升级成员,然后重启这个group在 multi-primary环境下,你可以按照任何顺序shutdown组成员在 single-primary环境下,先shutdown secondary成员节点,最后shutdown primary节点如何移除成员节点,你可以参考 Section 18.6.2.3, “Upgrading a Group Replication Member” 一点group变成offline,你可以就想升级单独的实例一样升级他们,参考 Section 2.11.1, “Upgrading MySQL”所有成员升级完毕后,在重启成员 18.6.2 Group Replication Online Upgrade 当你需要在线升级MGR,且不影响你的application,那么你就需要考虑下自己的方法了这一节主要描述online升级的一些考虑,方法,和步骤 18.6.2.1 Online Upgrade Considerations 当你需要online升级的时候,需要考虑如下几个点: 不管哪种升级group的方法,对组成员停写是至关重要的一步,直到它重新加入group 当一个组成员stop的时候,super_read_only 会自动设置成on,但是这个改变不会被写入配置文件,并不持久 当5.7.22或者8.0.11想要加入5.7.21或更低版本的group的时候会失败,因为5.7.21不会发送lower_case_table_names变量的值 18.6.2.2 Combining Group Replication Versions 不同版本的MySQL组合的GROUP可能会存在着不兼容性,这一章主要描述不同组合的最佳实践 如何查看版本 SELECT MEMBER_HOST,MEMBER_PORT,MEMBER_VERSION FROM performance_schema.replication_group_members; +-------------+-------------+----------------+ | member_host | member_port | member_version | +-------------+-------------+----------------+ | example.com | 3306 | 8.0.13 | +-------------+-------------+----------------+ 不同大版本group中的组合成员的规则如下: 如果你跑着一个8.0版本的GR,你需要添加一个成员为5.7的,这样就不行 如果你跑着一个5.7版本的GR,你需要添加一个8.0的成员是可以的,它必须保持read-only模式 不同小版本group中的组合成员的规则如下: 如果是小版本的之间的差异,是可以的随时加入进来的,且可读可写。如果是single-primary group,添加的成员默认是read-only模式 18.6.2.3 Upgrading a Group Replication Member 这一小节主要描述升级组成员版本的基本步骤这里面的步骤是Section 18.6.2.4, “Group Replication Online Upgrade Methods”. 
提到步骤的一部分 升级组成员版本的步骤包括:将成员移除组,接下来选择你要升级的方法,然后重新加入升级过成员的groupsingle-primary模式下的推荐升级方法是: 先升级所有的secondaries,然后再primary节点 升级一个组成员的方法: 连接一个成员,然后敲 STOP GROUP_REPLICATION. 在此之前,要确认下该成员状态是offline 通过 replication_group_members 表 设置group_replication_start_on_boot=0 防止成员已启动就自动加入,会有安全隐患(在你还没upgrade mysql之前就自动加入了 等等情况) 使用 mysqladmin shutdown关闭该成员,其他成员继续保持running 使用in-place方式升级该成员,由于你没有设置group_replication_start_on_boot=1,所以重新启动升级过的成员时,它不会自动加入MGR 一旦你使用mysql_upgrade升级成功后,再将 group_replication_start_on_boot 设置为1,这样可以确保之后重启服务器的时候可以自动加入进来 链接到升级成功过后的该成员,敲 START GROUP_REPLICATION.重新加入group。该server的元数据会自动重新配置,且开始追数据,一旦数据追完,它将变成online状态 当升级成功的成员加入到group中的时候,只要group中还有早期的的版本成员在,那么该成员都会自动被设置成 super_read_only=on,不管它是primary还是secondary这样可以保证升级后的成员不会有写,直到所有的版本全部一致但是如果是multi-primary模式的group,一旦确认升级成功,这个group就可以处理事务,所以该模式下人工配置哪个可以写,哪个不可以写是非常重要的步骤 SET GLOBAL read_only=off; 18.6.2.4 Group Replication Online Upgrade Methods Rolling In-Group Upgrade 对于single-primary的group,一旦所有secondary节点都升级了,然后primary节点从group中移除出去升级的时候,一个新的primary节点会自动被选择出来对于multi-primary的group,直到所有成员都被升级了,你才需要手动的给所有成员设置 super_read_only=OFF对于multi-primary的group,在上述过程中,所有primaries被降级,会降低可用性。但是在single-primary模式中,就不会有影响 Rolling Migration Upgrade 这个方法就是:你从组成员中移除成员,然后升级,然后用他们创建第二个group对于multi-primary的group,在上述过程中,所有primaries被降级,会降低可用性。但是在single-primary模式中,就不会有影响 升级过程中,由于新版本的group为了追上老版本group的数据,因此在新版本的group中需要配置成老版本group中的slave角色对于single-primary的group,该slave的角色,也必须是新版本group的primary角色对于multi-primary的group,该slave的橘色,可以是任何一个primary角色 方法基本如下: 在origin group中一个个的移除成员,参考 Section 18.6.2.3, “Upgrading a Group Replication Member” 升级成员的版本 , 参考 Section 2.11.1, “Upgrading MySQL”. 
使用升级过的成员,创建一个新的group。你需要配置一个新的group name,因为老的name还在运行。 创建一个异步复制通道在新老group中。老的group的priamry作为master,新的group成员作为GTID-based slave 在你切换应用之前,你必须确保你的新的group有比较合适的成员数量敲SELECT * FROM performance_schema.replication_group_members来比对新老成员数量大小最后,如果数据都同步完了,那么就可以停止复制,切换应用了 Rolling Duplication Upgrade 这个方法主要描述的是,如果在不减少原来group数量的同时,构建新group因为很多时候,multi-primary都在提供业务,是不允许减少节点的 该处理方法是: 部署合适数量成员的新group 对老group的成员进行备份 使用这个备份进行升级,参考 Section 18.6.2.5, “Group Replication Upgrade with mysqlbackup” 使用升级好server进行构建一个新的group 创建一个异步复制通道在新老group中。老的group的priamry作为master,新的group成员作为GTID-based slave 一旦新老group直接的数据差异越来越小,小到很快就能追上,那么就可以重新指向业务application 18.6.2.5 Group Replication Upgrade with mysqlbackup 步骤如下: 使用mysqlbackup来备份老group的成员 部署一个跟备份一样版本的成员实例 使用mysqlbackup来恢复一个新成员实例 在新实例上升级
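The upgrade flows above all rely on one ordering rule for single-primary groups: upgrade every secondary first, and the primary last. A toy sketch of that rule (the `upgrade_order` helper is hypothetical, not part of MySQL):

```python
def upgrade_order(members):
    """members: list of (host, role) tuples. Returns hosts in the order they
    should be upgraded in a single-primary group: secondaries first, then
    the primary (a new primary is elected when it leaves the group)."""
    secondaries = [h for h, r in members if r == "SECONDARY"]
    primaries = [h for h, r in members if r == "PRIMARY"]
    return secondaries + primaries

group = [("s1", "PRIMARY"), ("s2", "SECONDARY"), ("s3", "SECONDARY")]
print(upgrade_order(group))  # ['s2', 's3', 's1']
```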
https://dev.mysql.com/doc/refman/8.0/en/group-replication-security.html 18.5.1 IP Address Whitelisting MGR有个配置项可以决定哪些server可以被GR接受,它就是group_replication_ip_whitelist如果你在s1设置了这个选项,然后s2给s1发送连接想要加入这个组,那么s1就会去whitelist中检查,如果s2在选项中,那么允许,如果不在选项中,那么拒绝 如果你没有显示的配置whitelist,那么MGR引擎(XCom)就会自动根据host的活跃接口自动配置一个私有的子网这些地址和localhost ip地址会被自动创建自动生产的whitelist因此包括host中发现的任意网段 IPv4 (as defined in RFC 1918) 10/8 prefix (10.0.0.0 - 10.255.255.255) - Class A 172.16/12 prefix (172.16.0.0 - 172.31.255.255) - Class B 192.168/16 prefix (192.168.0.0 - 192.168.255.255) - Class C IPv6 (as defined in RFC 4193 and RFC 5156) fc00:/7 prefix - unique-local addresses fe80::/10 prefix - link-local unicast addresses 127.0.0.1 - localhost for IPv4 ::1 - localhost for IPv6 如果你要手动指定某些whitelist,就需要使用 group_replication_ip_whitelist 选项,必须先停止MGR mysql> STOP GROUP_REPLICATION; mysql> SET GLOBAL group_replication_ip_whitelist="192.0.2.21/24,198.51.100.44,203.0.113.0/24,2001:db8:85a3:8d3:1319:8a2e:370:7348,example.org,www.example.com/24"; mysql> START GROUP_REPLICATION; 为了方便,推荐每个服务器都配置相同的whitelist,并且都包括所有 group_replication_group_seeds 的成员 18.5.2 Secure Socket Layer Support (SSL) 这个用的不多,暂时不讲,有需要自己看官方文档
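The whitelist check described above (accept a joining member only if its address falls inside some whitelist entry) can be sketched with the standard `ipaddress` module. The entries below are adapted from the manual's example; hostname entries such as example.org are ignored in this sketch, and `is_whitelisted` is a hypothetical helper, not XCom's real code.

```python
import ipaddress

# IP/CIDR entries adapted from the group_replication_ip_whitelist example.
WHITELIST = ["192.0.2.0/24", "198.51.100.44", "203.0.113.0/24", "127.0.0.1"]

def is_whitelisted(addr: str) -> bool:
    """Accept addr if it falls inside any whitelist entry (host entries
    become /32 networks; strict=False tolerates host bits in a CIDR)."""
    ip = ipaddress.ip_address(addr)
    for entry in WHITELIST:
        if ip in ipaddress.ip_network(entry, strict=False):
            return True
    return False

print(is_whitelisted("192.0.2.21"))  # inside 192.0.2.0/24 -> accepted
print(is_whitelisted("10.1.2.3"))    # not listed -> connection rejected
```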
https://dev.mysql.com/doc/refman/8.0/en/group-replication-monitoring.html 使用Perfomance Schema来监控MGR MGR主要添加了这两个表 performance_schema.replication_group_member_stats performance_schema.replication_group_members 关于MGR复制相关的表 performance_schema.replication_connection_status performance_schema.replication_applier_status MGR创建了两个复制通道 group_replication_recovery: 主要是分布式恢复阶段的replication changes group_replication_applier:主要用作来组group的 incoming changes 18.3.1 Group Replication Server States 如果servers之间协作正常,那么看到的state都是一样的但是,一旦发生网络分区,或者有server挂掉并脱离group,那么不同信息就会被报告出来如果一个server离开了这个group,那么它就不能上报其他server的状态信息如果发生了网络分区,那么仲裁法定人数就缺少,servers之间就不能很好的协作,他们只能猜测其他server的状态并报告为unreachable Table 18.1 Server State 字段 描述 组同步 ONLINE 用户可正常连接和执行事务 yes RECOVERING 正在从donar服务器同步数据 no OFFLINE 插件已经装载,但是该成员不属于任何组 no ERROR 无论是recovery阶段,还是应用事务更新,表示遇到错误了 no UNREACHABLE 失联了 no 重要:一旦实例的状态变成了ERROR,super_read_only 会被设置成on如果ERROR状态消失,需要人工介入调整super_read_only=OFF 注意:MGR不是强同步的,但是最终会一致的确切的说:事务会按照相同的顺序发送给这个group的所有成员,但是事务的执行、commit完全由成员自行处理,并不是同步进行的 18.3.2 The replication_group_members Table performance_schema.replication_group_members 这个表主要用来监控不同成员的状态表里面的信息会自动更新,如果有新成员的加入或离开每个成员的元数据信息都是共享的,可以被其他成员随时查到这个表主要是在比较高的层面来看复制group的一些状态信息,比如: SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION | +---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+ | group_replication_applier | 041f26d8-f3f3-11e8-adff-080027337932 | example1 | 3306 | ONLINE | SECONDARY | 8.0.13 | | group_replication_applier | f60a3e10-f3f2-11e8-8258-080027337932 | example2 | 3306 | ONLINE | PRIMARY | 8.0.13 | | group_replication_applier | fc890014-f3f2-11e8-a9fd-080027337932 | example3 | 3306 | ONLINE | SECONDARY | 
8.0.13 | +---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+ 从上面的输出可以看出: 这个组由3个成员组成,每个成员的host、port、server-uuid一清二楚MEMBER_STATE显示他们都是online状态MEMBER_ROLE这列显示 有2个secondaries,1个primary,因此这个group是一个single-primary 模式的GRMEMBER_VERSION这列在某些场景对你非常重要,比如:你需要升级一个group,或者将不同mysql版本的server组合在一起的时候 18.3.3 Replication_group_member_stats 每个组的成员认证和执行事务两步关于认证和执行事务的一些统计信息对明白applier queue(有多少冲突被发现了,多少事务被check了,哪些事务被commit了 等等)的增长非常有用 performance_schema.replication_group_member_stats 提供了group-level 级别的认证、统计等很多信息这里面的信息是所有成员共享的,任何成员都能查得到值得注意的是:刷新远程成员的统计信息是根据 group_replication_flow_control_period选项,所以在本地查的信息可能互相有点延迟,有点差异是正常现象 Table 18.2 replication_group_member_stats 字段 描述 Channel_name GR通道的名称 View_id group的当前view id Member_id 成员的uuid Count_transactions_in_queue 需要被检测的冲突事务数量 Count_transactions_checked 已经被检测为冲突的事务数量 Count_conflicts_detected 没有通过冲突检测的事务数量 Count_transactions_rows_validating 冲突检测数据库的大小 Transactions_committed_all_members 所有成员都commit成功的事务集 Last_conflict_free_transaction The transaction identifier of the last conflict free transaction checked. Count_transactions_remote_in_applier_queue 有多少远程事务在队列里面需要被执行 Count_transactions_remote_applied 已经被执行过的远程事务数量 Count_transactions_local_proposed 本地产生的需要被其他远程成员执行的事务数量 Count_transactions_local_rollback 本地产生的事务,有多少是发送给其他成员,后面又被自己rollback的事务数量 这些信息对监控MGR非常重要举个例子:假设这个group中的一个成员延迟了,无法跟上其他成员,那么你会看到queue里面有很多事务通过以上信息,你可以决定是要移除这个成员,还是延迟在其他成员中处理这些事务来减少这个队列中的事务数量通过以上信息,也能帮助你决定是否需要开启MGR的流控措施
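The member_stats counters above lend themselves to simple health rules, e.g. "how often does certification reject a transaction" and "is the applier queue backing up". A sketch: the column names come from Table 18.2, but the queue limit and both helper functions are made up for illustration.

```python
def certification_conflict_rate(stats: dict) -> float:
    """Fraction of checked transactions that failed conflict detection."""
    checked = stats["Count_transactions_checked"]
    return stats["Count_conflicts_detected"] / checked if checked else 0.0

def is_lagging(stats: dict, queue_limit: int = 10000) -> bool:
    """Flag a member whose remote-transaction applier queue is backing up
    (the 'delayed member' scenario discussed above)."""
    return stats["Count_transactions_remote_in_applier_queue"] > queue_limit

s = {"Count_transactions_checked": 200,
     "Count_conflicts_detected": 10,
     "Count_transactions_remote_in_applier_queue": 80000}
print(certification_conflict_rate(s))  # 0.05
print(is_lagging(s))                   # True
```

A member flagged this way is exactly the case where you would weigh removing it against letting flow control slow the whole group down.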
https://dev.mysql.com/doc/refman/8.0/en/group-replication-getting-started.html MGR 作为一个Server插件提供支持的,每个group的server都需要配置和加载这个插件这一章主要教大家在三节点的MGR环境下,怎么一步步搭建起来的 18.2.1 Deploying Group Replication in Single-Primary Mode 每个group中的实例既可以是在单独的物理机部署,也可以在同一台物理机部署这一节主要描述怎么在同一个物理机部署MGR Figure 18.4 Group Architecture 这个教程主要描述如何部署MGR,在构建MGR前如何配置每个实例,以及如何使用Performance Schema 来监控MGR正确运行 18.2.1.1 Deploying Instances for Group Replication 第一步:部署三个MySQL实例由于接下来的步骤是在同一台物理机搭建多个实例的,因此每个MySQL实例都必须要指定一个特定的数据目录 mkdir data mysql-8.0/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-8.0 --datadir=$PWD/data/s1 mysql-8.0/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-8.0 --datadir=$PWD/data/s2 mysql-8.0/bin/mysqld --initialize-insecure --basedir=$PWD/mysql-8.0 --datadir=$PWD/data/s3 在data/s1,data/s2,data/s3 里面都是初始化好的数据目录,里面有mysql 系统库等等 warnings: 不要在生产环境使用--initialize-insecure ,这里只是用来简化教程的,详情请看: Section 18.5, “Group Replication Security”. 18.2.1.2 Configuring an Instance for Group Replication Group Replication Server Settings 安装和使用MGR插件,你必须正确配置MySQL Server才行下面的配置将是你的MGR第一个实例的配置S1 [mysqld] # server configuration datadir=<full_path_to_data>/data/s1 basedir=<full_path_to_bin>/mysql-8.0/ port=24801 socket=<full_path_to_sock_dir>/s1.sock 如果你的三个实例都在一个机器上,那么你应该配置report_host=127.0.0.1 , 让其互相可联系 Replication Framework 接下来的配置就是复制 所需要的 server_id=1 gtid_mode=ON enforce_gtid_consistency=ON binlog_checksum=NONE 如果你使用的版本低于8.0.3(8.0.3默认配置可以满足复制要求),那么需要在配置如下选项 log_bin=binlog log_slave_updates=ON binlog_format=ROW master_info_repository=TABLE relay_log_info_repository=TABLE Group Replication Settings 接下来的配置,就是组复制所需要的了 transaction_write_set_extraction=XXHASH64 group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" group_replication_start_on_boot=off group_replication_local_address= "127.0.0.1:24901" group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903" group_replication_bootstrap_group=off a) transaction_write_set_extraction=XXHASH64 : 表示使用XXHASH64算法来编码write set b) 
group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" : 告诉插件这个组已经创建了,它的名字是aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa group_replication_group_name 的值必须是UUID,可以使用select UUID()来生产 c) group_replication_start_on_boot=off :表示当server开启的时候,并不自动开启MGR。 d)group_replication_local_address= "127.0.0.1:24901" : 告诉插件用127.0.0.1:24901进行内部通信,不是用来给业务查询的哦 推荐的端口是:33061 ,教程中是24901,因为是部署在同一台机器上 e)group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903": 这里面列的ip,port是给该组新成员使用的,叫做种子成员。 在performance_schema.replication_group_members能查到 当开启组复制的时候,它是不会使用group_replication_group_seeds选项的,因为该机器是负责引导这个组的 换句话说,任何引导server的数据都是给其他加入成员的server服务的 第二个加入到组成员的server都必须询问,只有组成员列表的成员才能加入,任何缺少的数据都可以问负责引导的成员server获取,随后就加入到了这组group 第三个server可以询问前两个server成员中的任意一个询问、并同步数据 随后的server都是以同样的步骤来加入组 一个即将加入的成员必须跟种子成员(group_replication_group_seeds)进行通信 f)group_replication_bootstrap_group=off: 说明插件是否进行引导 重要: 这个选项只能使用一次,否则会出现脑裂的可能。当第一个server引导成online后,应该讲其从on变为off 配置这个group的其他server实例跟以上的方法非常相似,需要改变下特殊的选项如(server_id, datadir, group_replication_local_address) 18.2.1.3 User Credentials MGR需要一个group_replication_recovery的复制通道来完成节点之间的数据恢复以及补偿所以,这一节主要讲group_replication_recovery 开启server使用这个配置文件: mysql-8.0/bin/mysqld --defaults-file=data/s1/s1.cnf 创建一个mysql用户,具有 REPLICATION-SLAVE 权限如果你想避免这个grant操作在其他server也发生,可以如下配置 mysql> SET SQL_LOG_BIN=0; 相关创建用户的命令: mysql> CREATE USER rpl_user@'%' IDENTIFIED BY 'password'; mysql> GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%'; mysql> FLUSH PRIVILEGES; 如果之前这是了sql-log-bin,那么现在需要恢复原状 mysql> SET SQL_LOG_BIN=1; 使用change master命令来配置group_replication_recovery mysql> CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='password' \\ FOR CHANNEL 'group_replication_recovery'; 分布式recovery是加入一个组的第一步,用来获取自己没有的事务如果这个group_replication_recovery通道没有配置正确,那么此server将不能从donar member中获取事务来进行数据同步恢复,因此就加入组失败 18.2.1.4 Launching Group Replication s1配置正确后,接下来在sever执行如下命令 INSTALL PLUGIN group_replication SONAME 'group_replication.so'; 重要:在你load 
MGR前,mysql.session(8.0.2引入)用户必须要存在,如果你的数据字典表是老版本,那么需要mysql_upgrade,否则会报错 There was an error when trying to access the server with user: mysql.session@localhost. Make sure the user is present in the server and that mysql_upgrade was ran after a server update.. 可以通过如下命令来检测pugin是否正确 mysql> SHOW PLUGINS; +----------------------------+----------+--------------------+----------------------+-------------+ | Name | Status | Type | Library | License | +----------------------------+----------+--------------------+----------------------+-------------+ | binlog | ACTIVE | STORAGE ENGINE | NULL | PROPRIETARY | (...) | group_replication | ACTIVE | GROUP REPLICATION | group_replication.so | PROPRIETARY | +----------------------------+----------+--------------------+----------------------+-------------+ 开启group,在s1作为引导server,并开启MGR引导过程,只能在一个server上设置,而且只能一次这就是为什么配置文件中设置为off的原因 SET GLOBAL group_replication_bootstrap_group=ON; START GROUP_REPLICATION; SET GLOBAL group_replication_bootstrap_group=OFF; 一旦START GROUP_REPLICATION;成功,这个group就算启动成功了 , 你可以这样来check mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+---------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+---------------+ | group_replication_applier | ce9be252-2b71-11e6-b8f4-00212844f856 | myhost | 24801 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+---------------+ 为了论证它确实OK了,可以如下: mysql> CREATE DATABASE test; mysql> USE test; mysql> CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 TEXT NOT NULL); mysql> INSERT INTO t1 VALUES (1, 'Luis'); mysql> SELECT * FROM t1; +----+------+ | c1 | c2 | +----+------+ | 1 | Luis | +----+------+ mysql> SHOW BINLOG EVENTS; 
+---------------+-----+----------------+-----------+-------------+--------------------------------------------------------------------+ | Log_name | Pos | Event_type | Server_id | End_log_pos | Info | +---------------+-----+----------------+-----------+-------------+--------------------------------------------------------------------+ | binlog.000001 | 4 | Format_desc | 1 | 123 | Server ver: 8.0.2-gr080-log, Binlog ver: 4 | | binlog.000001 | 123 | Previous_gtids | 1 | 150 | | | binlog.000001 | 150 | Gtid | 1 | 211 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1' | | binlog.000001 | 211 | Query | 1 | 270 | BEGIN | | binlog.000001 | 270 | View_change | 1 | 369 | view_id=14724817264259180:1 | | binlog.000001 | 369 | Query | 1 | 434 | COMMIT | | binlog.000001 | 434 | Gtid | 1 | 495 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:2' | | binlog.000001 | 495 | Query | 1 | 585 | CREATE DATABASE test | | binlog.000001 | 585 | Gtid | 1 | 646 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:3' | | binlog.000001 | 646 | Query | 1 | 770 | use `test`; CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 TEXT NOT NULL) | | binlog.000001 | 770 | Gtid | 1 | 831 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:4' | | binlog.000001 | 831 | Query | 1 | 899 | BEGIN | | binlog.000001 | 899 | Table_map | 1 | 942 | table_id: 108 (test.t1) | | binlog.000001 | 942 | Write_rows | 1 | 984 | table_id: 108 flags: STMT_END_F | | binlog.000001 | 984 | Xid | 1 | 1011 | COMMIT /* xid=38 */ | +---------------+-----+----------------+-----------+-------------+--------------------------------------------------------------------+ 18.2.1.5 Adding Instances to the Group 现在,group已经有一个成员s1了,也有一些数据在里面。现在是时候在给这个group扩展之前配置的server了 18.2.1.5.1 Adding a Second Instance 添加第二个实例 为了给这个group添加第二个实例S2,首先要创建一个配置文件这个配置文件跟s1类似,除了一些位置和目录信息、port、serverid 之外 [mysqld] # server configuration datadir=<full_path_to_data>/data/s2 basedir=<full_path_to_bin>/mysql-8.0/ 
port=24802 socket=<full_path_to_sock_dir>/s2.sock # # Replication configuration parameters # server_id=2 gtid_mode=ON enforce_gtid_consistency=ON binlog_checksum=NONE # # Group Replication configuration # transaction_write_set_extraction=XXHASH64 group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" group_replication_start_on_boot=off group_replication_local_address= "127.0.0.1:24902" group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903" group_replication_bootstrap_group= off 开启server mysql-8.0/bin/mysqld --defaults-file=data/s2/s2.cnf 给group_replication_recovery 配置 recovery credentials SET SQL_LOG_BIN=0; CREATE USER rpl_user@'%' IDENTIFIED BY 'password'; GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%'; SET SQL_LOG_BIN=1; CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='password' \\ FOR CHANNEL 'group_replication_recovery'; 安装MGR 插件 mysql> INSTALL PLUGIN group_replication SONAME 'group_replication.so'; 将s2加入到group , 跟之前不一样的是:s2不需要设置group_replication_bootstrap_group=on了,因为s1已经引导过一次了 mysql> START GROUP_REPLICATION; 检测MGR是否加入了s2 mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+---------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+---------------+ | group_replication_applier | 395409e1-6dfa-11e6-970b-00212844f856 | myhost | 24801 | ONLINE | | group_replication_applier | ac39f1e6-6dfa-11e6-a69d-00212844f856 | myhost | 24802 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+---------------+ 如果s2标记为online,那么它必须要跟s1的数据自动保持一致。 请如下确认下 mysql> SHOW DATABASES LIKE 'test'; +-----------------+ | Database (test) | +-----------------+ | test | +-----------------+ mysql> SELECT * FROM test.t1; +----+------+ | c1 | c2 | +----+------+ | 1 | 
Luis | +----+------+ mysql> SHOW BINLOG EVENTS; +---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+ | Log_name | Pos | Event_type | Server_id | End_log_pos | Info | +---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+ | binlog.000001 | 4 | Format_desc | 2 | 123 | Server ver: 8.0.3-log, Binlog ver: 4 | | binlog.000001 | 123 | Previous_gtids | 2 | 150 | | | binlog.000001 | 150 | Gtid | 1 | 211 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1' | | binlog.000001 | 211 | Query | 1 | 270 | BEGIN | | binlog.000001 | 270 | View_change | 1 | 369 | view_id=14724832985483517:1 | | binlog.000001 | 369 | Query | 1 | 434 | COMMIT | | binlog.000001 | 434 | Gtid | 1 | 495 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:2' | | binlog.000001 | 495 | Query | 1 | 585 | CREATE DATABASE test | | binlog.000001 | 585 | Gtid | 1 | 646 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:3' | | binlog.000001 | 646 | Query | 1 | 770 | use `test`; CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 TEXT NOT NULL) | | binlog.000001 | 770 | Gtid | 1 | 831 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:4' | | binlog.000001 | 831 | Query | 1 | 890 | BEGIN | | binlog.000001 | 890 | Table_map | 1 | 933 | table_id: 108 (test.t1) | | binlog.000001 | 933 | Write_rows | 1 | 975 | table_id: 108 flags: STMT_END_F | | binlog.000001 | 975 | Xid | 1 | 1002 | COMMIT /* xid=30 */ | | binlog.000001 | 1002 | Gtid | 1 | 1063 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:5' | | binlog.000001 | 1063 | Query | 1 | 1122 | BEGIN | | binlog.000001 | 1122 | View_change | 1 | 1261 | view_id=14724832985483517:2 | | binlog.000001 | 1261 | Query | 1 | 1326 | COMMIT | 
+---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+ 如果s2的数据和s1的数据一样,那么说明s2真的加入成功了 18.2.1.5.2 Adding Additional Instances 添加其他的实例 添加第三个和其他的server加入到group的步骤跟添加s2是一模一样的,除了一些变量之外下面罗列下步骤 1) Create the configuration file [mysqld] # server configuration datadir=<full_path_to_data>/data/s3 basedir=<full_path_to_bin>/mysql-8.0/ port=24803 socket=<full_path_to_sock_dir>/s3.sock # # Replication configuration parameters # server_id=3 gtid_mode=ON enforce_gtid_consistency=ON binlog_checksum=NONE # # Group Replication configuration # group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" group_replication_start_on_boot=off group_replication_local_address= "127.0.0.1:24903" group_replication_group_seeds= "127.0.0.1:24901,127.0.0.1:24902,127.0.0.1:24903" group_replication_bootstrap_group= off 2) Start the server mysql-8.0/bin/mysqld --defaults-file=data/s3/s3.cnf 3) Configure the recovery credentials for the group_replication_recovery channel. SET SQL_LOG_BIN=0; CREATE USER rpl_user@'%' IDENTIFIED BY 'password'; GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%'; FLUSH PRIVILEGES; SET SQL_LOG_BIN=1; CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='password' \\ FOR CHANNEL 'group_replication_recovery'; 4) Install the Group Replication plugin and start it. 
INSTALL PLUGIN group_replication SONAME 'group_replication.so'; START GROUP_REPLICATION; 5) 检查 mysql> SELECT * FROM performance_schema.replication_group_members; +---------------------------+--------------------------------------+-------------+-------------+---------------+ | CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | +---------------------------+--------------------------------------+-------------+-------------+---------------+ | group_replication_applier | 395409e1-6dfa-11e6-970b-00212844f856 | myhost | 24801 | ONLINE | | group_replication_applier | 7eb217ff-6df3-11e6-966c-00212844f856 | myhost | 24803 | ONLINE | | group_replication_applier | ac39f1e6-6dfa-11e6-a69d-00212844f856 | myhost | 24802 | ONLINE | +---------------------------+--------------------------------------+-------------+-------------+---------------+ 6) 确认数据是否ok mysql> SHOW DATABASES LIKE 'test'; +-----------------+ | Database (test) | +-----------------+ | test | +-----------------+ mysql> SELECT * FROM test.t1; +----+------+ | c1 | c2 | +----+------+ | 1 | Luis | +----+------+ mysql> SHOW BINLOG EVENTS; +---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+ | Log_name | Pos | Event_type | Server_id | End_log_pos | Info | +---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+ | binlog.000001 | 4 | Format_desc | 3 | 123 | Server ver: 8.0.3-log, Binlog ver: 4 | | binlog.000001 | 123 | Previous_gtids | 3 | 150 | | | binlog.000001 | 150 | Gtid | 1 | 211 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1' | | binlog.000001 | 211 | Query | 1 | 270 | BEGIN | | binlog.000001 | 270 | View_change | 1 | 369 | view_id=14724832985483517:1 | | binlog.000001 | 369 | Query | 1 | 434 | COMMIT | | binlog.000001 | 434 | Gtid | 1 | 495 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:2' | | 
binlog.000001 | 495 | Query | 1 | 585 | CREATE DATABASE test | | binlog.000001 | 585 | Gtid | 1 | 646 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:3' | | binlog.000001 | 646 | Query | 1 | 770 | use `test`; CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 TEXT NOT NULL) | | binlog.000001 | 770 | Gtid | 1 | 831 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:4' | | binlog.000001 | 831 | Query | 1 | 890 | BEGIN | | binlog.000001 | 890 | Table_map | 1 | 933 | table_id: 108 (test.t1) | | binlog.000001 | 933 | Write_rows | 1 | 975 | table_id: 108 flags: STMT_END_F | | binlog.000001 | 975 | Xid | 1 | 1002 | COMMIT /* xid=29 */ | | binlog.000001 | 1002 | Gtid | 1 | 1063 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:5' | | binlog.000001 | 1063 | Query | 1 | 1122 | BEGIN | | binlog.000001 | 1122 | View_change | 1 | 1261 | view_id=14724832985483517:2 | | binlog.000001 | 1261 | Query | 1 | 1326 | COMMIT | | binlog.000001 | 1326 | Gtid | 1 | 1387 | SET @@SESSION.GTID_NEXT= 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:6' | | binlog.000001 | 1387 | Query | 1 | 1446 | BEGIN | | binlog.000001 | 1446 | View_change | 1 | 1585 | view_id=14724832985483517:3 | | binlog.000001 | 1585 | Query | 1 | 1650 | COMMIT | +---------------+------+----------------+-----------+-------------+--------------------------------------------------------------------+
https://dev.mysql.com/doc/refman/8.0/en/group-replication-background.html

这一章主要描述一些组复制的背景。

构建一个容错系统最常用的方法就是让组件冗余。换句话说,就是组件即便被移除掉,整个系统还是能够正常对外提供服务。这无疑在不同层面上提出了更多的挑战。需要注意的是,带复制结构的数据库系统必须面对的一个事实是:它们需要维护和管理一堆不同的server。此外,它们还必须解决分布式系统所面临的问题,比如脑裂、网络分区等等。

因此,最大的挑战就是把数据库的逻辑和数据复制的逻辑融合起来,保证数据复制的一致性。换句话说,为了让不同server对系统的状态达成一致,每一台server上的数据修改都必须验证一致。这就意味着它们需要像一个(分布式)状态机一样运作。

MySQL Group Replication提供了一套分布式状态机复制的管理方法。对于要提交的事务,这个group采取多数派原则来投票,让事务全局有序。决定commit还是拒绝这个事务由每台server自行判断,但所有server都会做出一样的决定。如果网络产生了分区,脑裂导致成员之间无法达成一致投票,那么这个系统会停止运行,直到这个问题被解决。所以,它有一个内置、自动的脑裂防护机制在运行。

以上所有的功能都是由Group Communication System (GCS) 协议来保证的。它提供错误检测机制、组成员通信服务、安全可靠的顺序一致消息分发,这些特性是搭建一个数据完全一致的系统的关键要素。在最核心的技术点上,是Paxos算法的实现,它扮演着组复制通信引擎的角色,至关重要。

18.1.1 Replication Technologies

在了解MGR内幕之前,这里先介绍下相关的背景概念和概述。这一节主要告诉我们:MGR需要什么,以及传统的异步复制和MGR之间的一些区别。

18.1.1.1 Primary-Secondary Replication

传统的复制提供了一个简单的主从复制架构(Primary-Secondary)。primary就是master,secondary就是slave,可以有多个slave。master执行事务、commit事务,然后异步地将这些事务发送到slave,让它们re-execute一遍(STATEMENT模式)或者重新apply(ROW模式)。它是share-nothing架构,即所有server都有一份完整的数据copy。

还有一种传统复制叫半同步复制。它意味着:在commit之前,master会一直等待,直到slave给master一个确认收到事务的ack,master才继续完成commit操作。

在上面的两幅图中,你能看到传统异步复制协议的基本架构,箭头代表client消息的流动和转变。

18.1.1.2 Group Replication

组复制是一种实现了容错系统的技术。组复制集群就是一堆机器,它们之间通过消息进行沟通。communication层提供了一系列保障机制:atomic message(原子广播)、total order message delivery(全局有序消息分发)。

MGR在此基础上构建并实现了一个multi-master的复制协议,它可以在任何server上写数据。集群的本质就是多server,每个server可以独立地处理事务。但是所有的读写(RW)事务都必须经过集群的审核,所有的只读(RO)事务不受任何影响。换句话说,对于RW事务,group需要决定它应该commit还是拒绝commit,因此事务的commit并不是origin server单方面的决定。确切地说,当origin server准备进行事务commit的时候,这个server会自动广播这个写集。然后,一个全局有序的事务序列就产生了。这意味着,所有的server都按同样的顺序接收同样的事务集。由于是有序的,所有server按相同顺序应用相同的写集,因此它们的数据也是一致的。

然而,在不同server上并发写的场景会遇到冲突。因此,对于这种情况需要进行冲突检测,这个过程叫做认证(certification)。如果两个并发事务在不同server上同时执行,并且更新了相同的row,那么它们就是冲突的。解决方案是:排在前面的事务会被标记commit,排在后面的会被拒绝(这个事务在origin server会回滚,在其他server上会被丢弃)。

最后,MGR也是一种share-nothing架构,每个server都有一份完整的数据copy。

18.1.2 Group Replication Use Cases
组复制提供了一个高容错性的系统:即使一些机器宕机,只要不是所有或者大多数机器不可用,那么整个系统还是可用状态。总结下来,MGR保证数据库持续可用。

18.2.1.2.1 Examples of Use Case Scenarios

以下就是典型的MGR使用案例:

Elastic Replication 可伸缩的复制
Highly Available Shards 高可用的分片
Alternative to Master-Slave replication 可替代master-slave的架构
Autonomic Systems 完全自动化的系统

18.1.3 Group Replication Details

18.1.3.1 Failure Detection

它提供一个错误检测机制,可以找到并报告出哪些server没有回应、哪些server挂了。从更高的层次来讲,错误检测机制就是一个分布式服务,用于提供哪些server挂掉或可能挂掉的情报信息。之后,如果组成员通过某种协议一致认定这个嫌疑犯(可能挂掉的家伙)的确挂了,那么组的其他成员就会一致决定将它踢出集群。

当server A在指定time-out时间内没有收到来自server B的回应,那么B就会被提升为嫌疑犯。如果一个server被其他group成员隔离,那么它就会怀疑所有其他的成员都挂了。由于它不能达成投票的一致性认可(没有达到法定人数的确认),所以它认为的嫌疑犯就不能被确认为failed。如果一个server在这种情况下被隔离,那么它是不能执行local事务的。

18.1.3.2 Group Membership

MGR依赖内置的组成员服务(Group Membership Service)。它定义了哪些server是online并加入了这个group,这份online server的列表通常被称为view。因此,这个组里面的任一online成员都有一个一致的view。

如果各server同意让一个新server加入到这个group中来,那么这个group就会自动重新配置,并触发形成一个新的view。如果一个server非自愿地离开了group,那么错误检测机制会识别出来,同样会重新配置出一个新的view。以上这些都需要一个协议,并且需要大多数成员参与并认可。如果这个group没有满足达成协议的要求,那么自动重新配置将不会发生,并且该系统会被阻塞,以防止脑裂的产生。这意味着,管理员需要手动介入来解决这个问题。

18.1.3.3 Fault-tolerance

MGR是基于Paxos分布式算法构建实现的,因此它需要满足大多数活跃成员参与投票的策略。有一个公式:n = 2 x f + 1,n代表group的成员数,f代表允许挂掉的成员数。在这个公式下,整个集群是安全的。

如果n=3,那么允许挂掉的server是1,也能满足要求;但是如果再挂一个,问题就非常大了。

集群成员数量n  majority  可允许挂掉的server数量
1             1         0
2             2         0
3             2         1
4             3         1
5             3         2
6             4         2
7             4         3
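上面的公式和对照表可以用一小段Python验证(演示用的草稿,函数名为自拟):

```python
def group_tolerance(n):
    """根据 n = 2*f + 1 的多数派原则,
    计算 n 个成员的 MGR 集群的多数派人数和最多可容忍挂掉的成员数 f。"""
    majority = n // 2 + 1        # 超过半数即为多数派
    f = (n - 1) // 2             # 剩余成员仍能构成多数派的最大故障数
    return majority, f

# 打印与正文表格相同的对照关系
for n in range(1, 8):
    majority, f = group_tolerance(n)
    print(f"n={n}  majority={majority}  可挂掉={f}")
```

可以看到 n=3 时 f=1,再挂一个(只剩 1 台)就凑不齐多数派,集群会阻塞,与正文结论一致。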
https://dev.mysql.com/doc/refman/8.0/en/group-replication.html 目录 18.1 Group Replication Background 18.2 Getting Started 18.3 Monitoring Group Replication 18.4 Group Replication Operations 18.5 Group Replication Security 18.6 Upgrading Group Replication 18.7 Group Replication System Variables 18.8 Requirements and Limitations 18.9 Frequently Asked Questions 18.10 Group Replication Technical Details 本文是基于MySQL8.0 官方文档翻译而成,大概10个章节 这个章节主要描述MySQL组复制以及如何安装、配置和监控。MySQL组复制是Server层的一个plugin,它是一个灵活、高可用、高容错的复制技术 Groups可以是一个 single-primary 模式,它提供自动选主,且同时只能有一个server是可写状态,其他都是read-only另外,对于更高端的用户,Groups还提供了 multi-primary 模式,即多个server可写, 即便这些写是并发的也没关系 Group Replication提供了一个内置的group membership服务,它可以保证这个组中的每一个server在任意时间点都是一致和可用的Servers 可以主动加入和主动退出这个组,相应的这个view也会随之自动更改有的时候,servers遇到故障会被迫自动退出group,那么这个view也会更改,这全是自动的 这个章节的结构如下: Section 18.1, “组复制的背景” 介绍组复制是如何工作的 Section 18.2, “组复制的开始” 介绍如何配置多实例MySQL来创建一个group Section 18.3, “组复制的监控” 介绍如何监控一个group Section 18.4, “组复制的操作” 介绍如何使用组复制 Section 18.5, “组复制的安全” 介绍如何让一个组复制更加安全 Section 18.6, “组复制的升级” 如何升级组复制 Section 18.10, “组复制的详细技术细节” 介绍组复制的核心理论
最近很长一段时间,陆续有不少朋友跟我说他们的MySQL经常重启、卡住,然后抛了一堆报错信息。

正好,自己之前也遇到过大批量的MySQL hang和innobackupex备份卡住的问题,一直没时间写,现在就分享下自己遇到的问题,希望后面的人可以避免。

好了,直接上图实战

症状

MySQL hang
Innobackupex hang

相关跟踪

结论

如果是centos 2.6.32-504 内核 + MySQL社区版本,那么大概率会出现这个问题(内核的futex_wait bug)。

修复方案:yum update kernel,小版本升级到 2.6.32-504.23.4 可以解决此问题

相关链接:
https://ma.ttias.be/linux-futex_wait-bug/
https://blogs.oracle.com/poonam/hung-jvm-due-to-the-threads-stuck-in-pthreadcondtimedwait
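巡检脚本里可以先判断内核版本是否已包含修复,低于 2.6.32-504.23.4 的机器再安排升级。下面是一个演示用的简化版本比较(只比较数字段,忽略 el6、x86_64 这类后缀,函数名为自拟):

```python
import re

def kernel_at_least(current, required):
    """把 '2.6.32-504.23.4' 这类版本号拆成数字段逐段比较。
    演示用的简化实现,不处理 rc 等特殊后缀。"""
    def parts(v):
        # 只取开头连续的 数字/./- 部分,避免 el6、x86_64 干扰比较
        m = re.match(r'[\d.\-]+', v)
        return [int(x) for x in re.findall(r'\d+', m.group(0) if m else '')]
    a, b = parts(current), parts(required)
    n = max(len(a), len(b))
    a += [0] * (n - len(a))      # 段数补齐后按列表字典序比较
    b += [0] * (n - len(b))
    return a >= b

print(kernel_at_least('2.6.32-504', '2.6.32-504.23.4'))      # False,受影响,需升级
print(kernel_at_least('2.6.32-504.23.4', '2.6.32-504.23.4')) # True,已包含修复
```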
一、环境 MySQL版本:MySQL5.7.22 表结构: CREATE TABLE `crm_report_accounting_income` ( `id` int(10) NOT NULL AUTO_INCREMENT, `contract_id` int(10) NOT NULL, `contract_no` varchar(50) NOT NULL, `date` int(8) NOT NULL, `city_id` int(11) NOT NULL DEFAULT '0' COMMENT '城市id', `city_name` varchar(50) DEFAULT NULL, `adviser_id` int(10) NOT NULL, `adviser_name` varchar(50) DEFAULT NULL, `accounting` decimal(15,2) NOT NULL COMMENT 'xx', `receivable` decimal(15,2) NOT NULL DEFAULT '0.00' COMMENT '当xx', `contract_type` tinyint(1) NOT NULL DEFAULT '1' COMMENT '1:xx合同;2:xx合同;3:xx合同', PRIMARY KEY (`id`), KEY `contract_id` (`contract_id`), KEY `date` (`date`), KEY `city_id` (`city_id`) ) ENGINE=InnoDB AUTO_INCREMENT=734525 DEFAULT CHARSET=utf8 二、业务问题 * 基本信息,由于合同号太多,所以这边就以一个有重复数据的合同id为例 dba:aif_db> select contract_id,contract_no,receivable,date from crm_report_accounting_income_2015_online where contract_id = 27310; +-------------+----------------------------+------------+----------+ | contract_id | contract_no | receivable | date | +-------------+----------------------------+------------+----------+ | 27310 | A00-SHEN-05-2018-06-004613 | 2941.18 | 20180628 | | 27310 | A00-SHEN-05-2018-06-004613 | 5882.36 | 20180629 | | 27310 | A00-SHEN-05-2018-06-004613 | 8823.54 | 20180630 | | 27310 | A00-SHEN-05-2018-06-004613 | 11764.72 | 20180701 | | 27310 | A00-SHEN-05-2018-06-004613 | 14705.90 | 20180702 | | 27310 | A00-SHEN-05-2018-06-004613 | 17647.08 | 20180703 | | 27310 | A00-SHEN-05-2018-06-004613 | 20588.26 | 20180704 | | 27310 | A00-SHEN-05-2018-06-004613 | 23529.44 | 20180705 | | 27310 | A00-SHEN-05-2018-06-004613 | 26470.62 | 20180706 | | 27310 | A00-SHEN-05-2018-06-004613 | 29411.80 | 20180707 | | 27310 | A00-SHEN-05-2018-06-004613 | 32352.98 | 20180708 | | 27310 | A00-SHEN-05-2018-06-004613 | 35294.16 | 20180709 | +-------------+----------------------------+------------+----------+ 12 rows in set (0.00 sec) * 查询每个最新合同的信息,由于合同号太多,所以这边就以一个有重复数据的合同id为例 select contract_no, contract_id, 
city_name, receivable,date from (select * from crm_report_accounting_income_2015_online where contract_id = 27310 ORDER BY `date` desc) p GROUP BY contract_id +----------------------------+-------------+-----------+------------+----------+ | contract_no | contract_id | city_name | receivable | date | +----------------------------+-------------+-----------+------------+----------+ | A00-xxxx-05-2018-06-xxxxxx | xxxxx | 沈阳 | 2941.18 | 20180628 | +----------------------------+-------------+-----------+------------+----------+ 1 row in set (0.00 sec) 以上看到的写法,是通过子查询写的,5.6查询没问题,5.7就变成了以上的结果,很明显得到的答案不是业务想要的 究其原因还是因为,MySQL5.7 sql mode更加严格了,如果设置sql_mode = ONLY_FULL_GROUP_BY, 那么以上SQL就会报错 因为sql_mode = ONLY_FULL_GROUP_BY 要求符合SQL 92标准,即:select列表里只能出现分组列(即group by后面的列)和聚合函数(max,min等等) 然而为了兼容5.6,我们设置sql_mode='', 所以我们的Group by 在子查询中就跟5.6就不一致了 当然,我们应该避免不标准的SQL写法,这样的问题,我们的解法就是调整业务的SQL语句,改写成SQL 92标准的语法 那么以上SQL语句应该调整为: select contract_no, e.contract_id, city_name, receivable, date from crm_report_accounting_income_2015_online e, ( select contract_id , max(date) max_date from crm_report_accounting_income_2015_online where contract_id = 27310 group by contract_id ) t where e.contract_id = t.contract_id and e.date = t.max_date +----------------------------+-------------+-----------+------------+----------+ | contract_no | contract_id | city_name | receivable | date | +----------------------------+-------------+-----------+------------+----------+ | A00-xxxx-05-2018-06-004613 | 27310 | xxxx | 35294.16 | 20180709 | +----------------------------+-------------+-----------+------------+----------+ 1 row in set (0.00 sec) 以上都还是需要业务代码修改,这样如果没有提前发现问题,岂不是会导致业务出错了?有没有更好的办法? 
MySQL方面其实还是可以配置相关的参数的:

dba:aif_db> set optimizer_switch='derived_merge=off';
Query OK, 0 rows affected (0.00 sec)

dbadmin:aifangcrm_db> select contract_no, contract_id, city_name, receivable,date from
    -> (select * from crm_report_accounting_income_2015_online where contract_id = 27310 ORDER BY `date` desc) p GROUP BY contract_id
    -> ;
+----------------------------+-------------+-----------+------------+----------+
| contract_no                | contract_id | city_name | receivable | date     |
+----------------------------+-------------+-----------+------------+----------+
| A00-xxxx-05-2018-06-004613 |       27310 | xxxx      |   35294.16 | 20180709 |
+----------------------------+-------------+-----------+------------+----------+
1 row in set (0.00 sec)

三、总结

SQL语法应该按照标准的SQL92来写
数据库升级到5.7之后,应该提前监控出 group by + 子查询的情况,提前告知业务修改业务代码
设置参数也能解决问题,但是这个参数毕竟是5.7新增的,关闭后以后会不会引出其他的bug就不知晓了
最后,还是希望能够把query语句改成标准语法;如果线上已经出现业务问题,可以先让业务修改参数快速止损,然后再逐步改写语句
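改写后SQL想表达的"每个合同取最新一行"的语义,可以用一小段Python说清楚(示例数据是虚构的,只保留一个contract_id):

```python
# 虚构的示例行,模拟 crm_report_accounting_income 中同一合同的多行
rows = [
    {'contract_id': 27310, 'receivable': 2941.18,  'date': 20180628},
    {'contract_id': 27310, 'receivable': 35294.16, 'date': 20180709},
    {'contract_id': 27310, 'receivable': 14705.90, 'date': 20180702},
]

def latest_per_contract(rows):
    """与改写后的SQL等价:先求每个 contract_id 的 max(date),
    再取该日期对应的整行;而不是依赖 GROUP BY 取"第一行"的未定义行为。"""
    best = {}
    for r in rows:
        cid = r['contract_id']
        if cid not in best or r['date'] > best[cid]['date']:
            best[cid] = r
    return best

print(latest_per_contract(rows)[27310])
# {'contract_id': 27310, 'receivable': 35294.16, 'date': 20180709}
```

依赖 GROUP BY 挑行的写法,结果取决于优化器如何展开子查询(derived_merge),所以5.6和5.7会给出不同答案;按上面的语义显式写出来,任何版本都稳定。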
安装xtrabackup 下载2.4版本即可 运行报错 /data/Keithlan/pt_backup/bin/innobackupex: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /data/Keithlan/pt_backup/bin/innobackupex) 解决问题: strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX GLIBCXX_3.4 GLIBCXX_3.4.1 GLIBCXX_3.4.2 GLIBCXX_3.4.3 GLIBCXX_3.4.4 GLIBCXX_3.4.5 GLIBCXX_3.4.6 GLIBCXX_3.4.7 GLIBCXX_3.4.8 GLIBCXX_3.4.9 GLIBCXX_3.4.10 GLIBCXX_3.4.11 GLIBCXX_3.4.12 GLIBCXX_3.4.13 GLIBCXX_FORCE_NEW GLIBCXX_DEBUG_MESSAGE_LENGTH 发现的确没有GLIBCXX_3.4.15 那么需要下载相关文件并替代 http://ftp.de.debian.org/debian/pool/main/g/gcc-4.7/libstdc++6_4.7.2-5_amd64.deb 去这里下载 然后: ar -x libstdc++6_4.7.2-5_amd64.deb && tar xvf data.tar.gz cp -p libstdc++.so.6.0.17 /usr/lib64 cd /usr/lib64/ rm libstdc++.so.6 ln -s libstdc++.so.6.0.17 libstdc++.so.6 此问题解决 安装qpress 1. 这是解压*.qp的工具,如果你使用了--compress 压缩备份的话 2. 官方地址:http://www.quicklz.com/ 3. tar xf qpress-11-linux-x64.tar 一、生产环境默认参数 常用参数 * 基本option: --defaults-file=[MY.CNF]: 指定配置文件,一定要放在最参数最前面 --user=$backupUser --password=$backupPwd --socket=$socket --ibbackup=$ibbackup 主要用于指定xtrabackup二进制文件的,在某些场景特别有用:比如多个xtrabackup文件,比如xtrabackup没有在环境变量中 等等 --stream=xbstream 流式备份的格式,目前常用的是xbstream --rsync 主要是用来优化本地文件传输的 使用rsync代替cp,用来传输所有non-innodb文件,可以更加快速的传输文件 --kill-long-queries-timeout=30 在FLUSH TABLES WITH READ LOCK 开始之前,xtrabackup等待指定秒数进行kill掉其他query 主要目的就是要保证FLUSH TABLES WITH READ LOCK能够执行成功,所以xtrabackup用户需要process和super权限 --kill-long-query-type=all 什么类型的query会被kill掉,配合上面的参数一起使用 --parallel=8 备份文件的时候并行多少个进程,用来加快备份速度,如果你的ibd文件越多,作用越大。如果你就一个大表,那么不起作用 --compress: 创建一个压缩的备份 --compress-threads=#: 派生出几个工作线程用于压缩 --no-timestamp 指定该参数后,备份目录BACKUP-ROOT-DIR不会有时间戳 --slave-info 这个参数非常有用,尤其是在slave进行备份 将binlog位置和master的binlog写到xtrabackup_slave_info,作为change master的命令 --tmpdir=$tmpDIR 临时文件存放的目录 redo日志就是先存放在这个临时目录,然后在拷贝到remote host的。 --copy-back 拷贝所有备份的文件到original目录 它不会覆盖已经存在的文件,除非你指定了 --force-non-empty-directories --move-back 跟copy back类型,但是它会删掉老的文件 , 使用前请谨慎 传输速度: 61440 KB/S /usr/bin/rsync -auvP --bwlimit=$bwlimit 
$dataDir 主要适用于针对local备份,然后rsync到remote server用的 * 其他option --apply-log: 将redo日志(xtrabackup_logfile)apply到备份中,并且根据backup-my.cnf重新创建redo日志 innobackupex –apply-log 使用的是backup-my.cnf,或者你显示指定了–defaults-file ,它主要是初始化innodb_page_size,innodb_log_block_size等等,所以不要随意添加配置文件 --backup-locks: 备份锁是在percona server独有的,如果MySQL Server不支持back lock,那么会忽略掉,使用原生态的FLUSH TABLES WITH READ LOCK --compact: 对所有溢出的二级索引页创建一个紧凑的(compact)格式,一般情况下有碎片和稀疏 --rebuild-indexes: 这个参数用于--apply-log阶段,在每次apply完日志后,会重建所有二级索引 所以这个参数主要是用来跟上面--compact对应的,用于创建compact backups --rebuild-threads=NUMBER-OF-THREADS: 重建二级索引的线程数 --decompress: 解压缩--compress的.qp扩展的文件 默认xtrabackup不会自动删除*.qp文件,所以如果你需要clean up这些文件,需要自己手动接入进来 --export: 主要用于export独立的表(而不是整个备份),用于恢复到另外的server上去 --sshopt=SSH-OPTION: 主要用于传输相关ssh的参数,尤其是指定了 --remost-host xtrabackup备份相关的常用文件 * backup-my.cnf 跟原始my.cnf不一样,这是xtrabackup创建出来的my.cnf 只包含了备份需要的选项 [mysqld] innodb_checksum_algorithm=crc32 innodb_log_checksum_algorithm=strict_crc32 innodb_data_file_path=ibdata1:4G:autoextend innodb_log_files_in_group=2 innodb_log_file_size=4294967296 innodb_fast_checksum=false innodb_page_size=16384 innodb_log_block_size=512 innodb_undo_directory=./ innodb_undo_tablespaces=0 server_id=1261261646 redo_log_version=1 server_uuid=f085ef25-dbf5-11e8-8813-ecf4bbf1f518 master_key_id=0 * xtrabackup_checkpoints 包含了LSN和备份的类型 backup_type = full-backuped from_lsn = 0 to_lsn = 74398727925 last_lsn = 74398729153 compact = 0 recover_binlog_info = 0 * xtrabackup_info 备份相关的日志信息 uuid = dcfccff9-f174-11e8-a83b-ecf4bbf1f518 name = tool_name = innobackupex tool_command = --defaults-file=/etc/my.cnf --user=pt_kill --password=... 
--socket=/tmp/mysql.sock --rsync --kill-long-queries-timeout=30 --kill-long-query-type=all --parallel=8 --tmpdir=/data/dbbackup --backup /data/dbbackup/ tool_version = 2.4.12 ibbackup_version = 2.4.12 server_version = 5.7.21-log start_time = 2018-11-26 20:14:34 end_time = 2018-11-26 20:14:46 lock_time = 0 binlog_pos = filename 'xx.bin.000001', position '18667', GTID of the last change 'f085ef25-dbf5-11e8-8813-ecf4bbf1f518:1-51' innodb_from_lsn = 0 innodb_to_lsn = 74398957203 partial = N incremental = N format = file compact = N compressed = N encrypted = N * xtrabackup_binlog_info 包含了备份时刻的binlog位置,主要用于在master上的备份,这个位置可以用来搭建new slave xx.bin.000818 3465069 f085ef25-dbf5-11e8-8813-ecf4bbf1f518:1-51 * xtrabackup_slave_info 包含了show slave 的信息,可用于未来的搭建新的slave,change master用 SET GLOBAL gtid_purged='f085ef25-dbf5-11e8-8813-ecf4bbf1f518:1-1531'; CHANGE MASTER TO MASTER_AUTO_POSITION=1; * xtrabackup_logfile 就是在备份的过程中,拷贝的redo log日志 这个日志会用在后面的 --apply-log中 local * non-gtid $innobackupex --defaults-file=$defaultFile --user=$backupUser --password=$backupPwd --socket=$socket --ibbackup=$ibbackup $backupOptions $dataDIR >>$logFile 2>>$xtrabackupLogFile * gtid $innobackupex24 --defaults-file=$defaultFile --user=$backupUser --password=$backupPwd --socket=$socket --ibbackup=$ibbackup24 $backupOptions $dataDIR >>$logFile 2>>$xtrabackupLogFile remote * non-gtid $innobackupex --defaults-file=$defaultFile --user=$backupUser --password=$backupPwd --socket=$socket --ibbackup=$ibbackup $backupOptions --stream=xbstream --tmpdir=$tmpDIR $tmpDIR 2>>$xtrabackupLogFile | ssh root@${destIP} "$xtrabackupDIR/xbstream -x -C ${dataDIR}" >>$logFile 2>>$xtrabackupLogFile * gtid $innobackupex24 --defaults-file=$defaultFile --user=$backupUser --password=$backupPwd --socket=$socket --ibbackup=$ibbackup24 $backupOptions --stream=xbstream --tmpdir=$tmpDIR $tmpDIR 2>>$xtrabackupLogFile | ssh root@${destIP} "$xtrabackupDIR24/xbstream -x -C ${dataDIR}" >>$logFile 2>>$xtrabackupLogFile 二、各种场景实战 2.1 gtid和non-gtid 
gtid和non-gtid的备份都差不多,唯一的区别就在还原上:
non-gtid,是change master到file、position;gtid,是set global gtid_purged='',然后再change master而已。
接下来的案例全部以GTID为例,因为GTID是未来趋势

2.2 生产环境常用

2.2.1 备份

gtid: 如何在master进行local备份

/data/Keithlan/pt_backup/bin/innobackupex --defaults-file=/etc/my.cnf --user=pt_kill --password=pt_kill --socket=/tmp/mysql.sock --rsync --kill-long-queries-timeout=30 --kill-long-query-type=all --parallel=8 --tmpdir=/data/dbbackup --backup --slave-info /data/dbbackup

/data/dbbackup 下会生成一个timestamp(2018-11-27_13-40-33)的备份

gtid: 如何在master进行local + compress备份

/data/Keithlan/pt_backup/bin/innobackupex --defaults-file=/etc/my.cnf --user=pt_kill --password=pt_kill --socket=/tmp/mysql.sock --rsync --kill-long-queries-timeout=30 --kill-long-query-type=all --parallel=8 --tmpdir=/data/dbbackup --backup --compress --compress-threads=8 --slave-info /data/dbbackup

/data/dbbackup 下会生成一个timestamp(2018-11-27_13-40-33)的压缩过后的备份

gtid: 如何在master进行remote备份

/data/Keithlan/pt_backup/bin/innobackupex --defaults-file=/etc/my.cnf --user=pt_kill --password=pt_kill --socket=/tmp/mysql.sock --stream=xbstream --kill-long-queries-timeout=30 --kill-long-query-type=all --parallel=8 --tmpdir=/data/dbbackup --backup --slave-info /data/dbbackup | ssh root@${destIP} "/data/Keithlan/pt_backup/bin/xbstream -x -C /data/dbbackup/remote_backup"

/data/dbbackup/remote_backup 目录必须在目标机器上存在,否则报错

gtid: 如何在master/slave进行remote的[部分库、表]备份

之前我一直想这样的备份有什么用?现在想想还是有些应用场景的。
场景一、有些数据业务不要了,但是不确定能不能删,所以希望备份这些不需要的库、表,如果有异常,还能够恢复并查看解决问题

/data/Keithlan/pt_backup/bin/innobackupex --defaults-file=/etc/my.cnf --user=pt_kill --password=pt_kill --socket=/tmp/mysql.sock --stream=xbstream --kill-long-queries-timeout=30 --kill-long-query-type=all --parallel=8 --tmpdir=/data/dbbackup --databases="mysql lc performance_schema sys" --slave-info /data/dbbackup | ssh root@$dest_ip "/data/Keithlan/pt_backup/bin/xbstream -x -C /data/dbbackup/remote_partial_backup"

更多的【部分库、表】备份选项,请参考
https://www.percona.com/doc/percona-xtrabackup/2.4/innobackupex/partial_backups_innobackupex.html 简单的示例: * USING THE --INCLUDE OPTION $ innobackupex --include='^mydatabase[.]mytable' /path/to/backup * USING THE --TABLES-FILE OPTION $ echo "mydatabase.mytable" > /tmp/tables.txt $ innobackupex --tables-file=/tmp/tables.txt /path/to/backup * USING THE --DATABASES OPTION $ innobackupex --databases="mydatabase.mytable mysql" /path/to/backup gtid: 如果再master进行remote + compress 备份 --推荐 /data/Keithlan/pt_backup/bin/innobackupex --defaults-file=/etc/my.cnf --user=pt_kill --password=pt_kill --socket=/tmp/mysql.sock --stream=xbstream --kill-long-queries-timeout=30 --kill-long-query-type=all --parallel=8 --tmpdir=/data/dbbackup --backup --compress --compress-threads=8 --slave-info /data/dbbackup | ssh root@xx.xx.126.166 "/data/Keithlan/pt_backup/bin/xbstream -x -C /data/dbbackup/remote_compress_backup" /data/dbbackup/remote_compress_backup 目录必须在目标机器上存在,否则报错 gtid:如何再slave上进行local备份 跟再master进行local备份一致,请参考 gtid: 如何再slave上进行remote备份 跟在master上进行remote备份一样,请参考 gtid: 如何再slave上进行local + compress备份 --推荐 跟在master上进行local+compress备份一致,请参考 gtid: 如何再slave上进行remote + compress备份 --推荐 跟在master上进行remote+compress备份一致,请参考 2.2.2 还原备份 gtid:有一个完整的全备,如何快速还原整个实例 重点: /etc/my.cnf 要存在,这个需要自己备份 innobackupex --defaults-file=/etc/my.cnf --apply-log 2018-11-26_20-20-21 innobackupex --defaults-file=/etc/my.cnf --copy-back 2018-11-26_20-20-21 chown -R mysql:mysql /data/mysql_data mysqld_safe --user=mysql & cat xtrabackup_binlog_info : 如果是再master进行备份的话 cat xtrabackup_slave_info : 如果是在slave进行备份的话 reset slave all reset master set global gtid_purged='$gtid_xtrabackup_$binlog_$slave_info' CHANGE MASTER TO MASTER_HOST = '$master_ip' ,MASTER_PORT = 3306 , MASTER_USER = 'repl',MASTER_PASSWORD = 'xx' ,MASTER_AUTO_POSITION = 1 start slave; gtid:有一个完整的压缩全备,如何快速还原整个实例 重点: /etc/my.cnf 要存在,这个需要自己备份 innobackupex --decompress --parallel=16 --remove-original remote_compress_backup innobackupex --defaults-file=/etc/my.cnf 
--apply-log remote_compress_backup innobackupex --defaults-file=/etc/my.cnf --copy-back remote_compress_backup chown -R mysql:mysql /data/mysql_data mysqld_safe --user=mysql & cat xtrabackup_binlog_info : 如果是再master进行备份的话 cat xtrabackup_slave_info : 如果是在slave进行备份的话 reset slave all reset master set global gtid_purged='$gtid_xtrabackup_$binlog_$slave_info' CHANGE MASTER TO MASTER_HOST = '$master_ip' ,MASTER_PORT = 3306 , MASTER_USER = 'repl',MASTER_PASSWORD = 'xx' ,MASTER_AUTO_POSITION = 1 start slave; gtid:有一个完整的压缩全备,如何快速还原单个表 * step1 创建一个部分表目录,用来存放未来需要恢复的表 mkdir partial_backup/ * step2 拷贝需要恢复的库、表到刚刚新建的目录中 cp -pr remote_compress_backup/*.qp partial_backup/ --xtrabackup恢复需要的基本表 cp -pr remote_compress_backup/xtrabackup_checkpoints partial_backup/ --xtrabackup恢复需要的基本表 cp -pr remote_compress_backup/mysql partial_backup/ --MySQL系统表 cp -pr remote_compress_backup/sys partial_backup/ --MySQL系统表 cp -pr remote_compress_backup/lc partial_backup/ --用户需要恢复的表,我这里lc库就是要恢复的单库 * step3 解压缩,并行16个线程 /data/Keithlan/pt_backup/bin/innobackupex --decompress --parallel=16 --remove-original partial_backup * step4 prepare阶段,应用差异redo日志,让备份恢复到FTWRL的点 /data/Keithlan/pt_backup/bin/innobackupex --defaults-file=/etc/my.cnf --apply-log partial_backup * step5 restore阶段,copy_back 将数据拷贝到配置文件所指定的目录 /data/Keithlan/pt_backup/bin/innobackupex --defaults-file=/etc/my.cnf --copy-back partial_backup * step6 赋予权限 chown -R mysql:mysql /data/mysql_data * step7 启动MySQL mysqld_safe --user=mysql & * last 如果需要恢复到指定的点位的话,还需要设置配置文件,并且set global $gtid_purged='',change master xx auto_master=1, start slave until $gtid_set xtrabackup_binlog_info : 如果是再master进行备份的话 xtrabackup_slave_info : 如果是在slave进行备份的话 重要的是:配置文件需要设置过滤条件 replicate_do_DB: 如果是恢复一个库 replicate-do-table: 如果是恢复一个表 gtid: 如果给你一个只备份了部分库、表的备份(不是全部库的备份哦),应该如何恢复呢? 跟之前正常的套路一样,没啥区别,但是这是野路子,可能会有跟之前数据不一致的情况,但是基本可以放心用 1. /data/Keithlan/pt_backup/bin/innobackupex --defaults-file=/etc/my.cnf --apply-log remote_partial_backup 2. 
/data/Keithlan/pt_backup/bin/innobackupex --defaults-file=/etc/my.cnf --copy-back remote_partial_backup
3. chown -R mysql:mysql /data/mysql_data
4. mysqld_safe --user=mysql &

非常严谨的方法可以尝试用官方推荐的做法(Restoring Individual Tables),但是我觉得麻烦,不乐意。
https://www.percona.com/doc/percona-xtrabackup/2.4/innobackupex/restoring_individual_tables_ibk.html

三、已知的限制和不足

known issues:
1. 5.1、5.5 版本有一些已知的还没修复的bug,比如:压缩表上的redo-logging有很多问题。bug #16267120,5.6.12 修复
2. 5.6 版本,对于compress的innodb表,不推荐使用innodb_log_compressed_pages=OFF,由于压缩算法的原因,这样会导致backup失败
3. 如果backup和OPTIMIZE TABLE or ALTER TABLE ... TABLESPACE 同时发生,那么备份将还原失败
4. Compact Backups 目前不支持,有bug #1192834
5. Error 24: 'Too many open files' 如果表很多,可能发生这样的错误,所以需要调整/etc/security/limits.conf,目前最大为1048576
6. throttle参数建议不要使用 https://zhuanlan.zhihu.com/p/43913304

limitation
1. xtrabackup_logfile 如果超过4GB,那么--prepare在32位操作系统上会失败
2. xtrabackup 无法识别 --set-variable 这样古老的my.cnf语法,所以最好不要这样使用它

四、频繁问到的问题
一、常见的几种方案

1.1 MySQL原生的IN-PLACE ONLINE DDL

5.5,5.6 开始支持
5.7 支持的更好,有更多ddl操作支持online
8.0 支持快速加列功能

1.2 第三方工具
1. pt-online-schema-change
2. gh-ost

1.3 slave 先ddl,后切换主从

二、方案剖析

2.1 MySQL原生的IN-PLACE ONLINE DDL

原理

原理比较复杂,不一一解读。但是中间有几个重要的过程:
1. 加一会排它锁,开启战场,并释放排它锁
2. 记录ddl期间产生的增量dml(大小由innodb_online_alter_log_max_size控制)
3. 应用这些增量dml
4. 再加一会排它锁,清理战场,释放排它锁

这里关心的问题:
1. 如果在ddl期间,innodb_online_alter_log_max_size的大小被占满,会有怎样的后果?
2. 如果在ddl期间,DDL被强行终止了,会有怎样的后果?

优点
1. 官方出品,原生态,品质有保障

缺点
1. 有锁等待风险
2. innodb_online_alter_log_max_size 是有限制的
3. 有可能造成主从延迟
4. 不是所有的ddl都是online的,对ddl类型有要求

哪些DDL可以online(基于5.7的官方文档;8.0 可以支持快速加列)

类型 | 操作 | 是否需要copy数据,重新rebuild表 | 是否允许并发DML | 是否只修改元数据 | 备注
索引相关 | 创建、添加二级索引 | NO | YES | NO | -
索引相关 | 删除索引 | NO | YES | YES | -
索引相关 | 重命名索引 | NO | YES | YES | -
索引相关 | 添加FULLTEXT索引 | NO* | NO | NO | -
索引相关 | 添加SPATIAL索引 | NO | NO | NO | -
索引相关 | 改变索引类型(USING {BTREE or HASH}) | NO | YES | YES | -
主键相关 | 添加主键 | YES* | YES | NO | -
主键相关 | 删除主键 | YES | NO | NO | -
主键相关 | 删除主键并且又添加主键 | YES | YES | NO | -
列操作相关 | 添加列 | YES | YES* | NO | -
列操作相关 | 删除列 | YES | YES | NO | -
列操作相关 | 重命名列 | NO | YES* | YES | -
列操作相关 | 重新排列列(use FIRST or AFTER) | YES | YES | NO | -
列操作相关 | 设置列的默认值 | NO | YES | YES | -
列操作相关 | 修改列的数据类型 | YES | NO | NO | -
列操作相关 | 扩展varchar列的长度 | NO | YES | YES | 0~255 , 256 ~ 256+ 这两个区间内可以in-place
列操作相关 | 删除列的默认值 | NO | YES | YES | -
列操作相关 | 修改auto-increment的值 | NO | YES | NO* | -
列操作相关 | 使某列修改成NULL | YES* | YES | NO | -
列操作相关 | 使某列修改成NOT NULL | YES* | YES | NO | -
列操作相关 | 修改列定义为ENUM、SET | NO | YES | YES | -
表相关操作 | optimizing table | YES | YES | NO | -
表相关操作 | Rebuilding with the FORCE option | YES | YES | NO | -
表相关操作 | Renaming a table | NO | YES | YES | -

三、第三方工具

3.0 第三方工具大致原理

先创建一个临时表 old_table_tmp
给临时表变更结构 alter old_table_tmp ...
然后呢就是关键了:
将增量数据 和 原表的数据 都拷贝到临时表
当原表数据拷贝完毕后,对原表加锁,进行切换
打扫战场,结束

好了,这里pt-online-schema-change 是通过触发器的方式来同步增量数据的,gh-ost 是通过模拟slave、监听binlog并应用binlog来完成增量数据同步的,这是两者的主要区别。

所以,不管哪种方式,都需要解决一个时序问题(因为rowcopy和row_apply是并行的,不知道哪个先哪个后)。我们暂且把拷贝原表数据叫做 rowcopy,把拷贝增量数据并应用叫做 row_apply。由于rowcopy从时序上来说都是老数据,所以它的优先级是最低的,于是将rowcopy的动作转换为insert ignore,这意味着row_apply是可以覆盖rowcopy数据的,这样理解没问题吧。

好了,上面的问题解决了,其他的基本就不是问题了

3.1 pt-online-schema-change

优点
1. percona 出品,必属精品
2. 经过多年的生产环境验证,质量可靠
3.
支持并发DML操作

缺点
1. 原表不能有触发器
2. 由于触发器的原因,对master的性能消耗比较大
3. 处理外键有一定的风险,需要特殊处理
4. 原表中至少要有主键或者唯一键
   工具会检查是否具有主键或者唯一索引,如果都没有,这一步会报错,提示 The new table `xx`.`_xx_new` does not have a PRIMARY KEY or a unique index which is required for the DELETE trigger.
5. ddl不能有添加唯一索引的操作
   如果对表增加唯一索引的话,会存在丢数据的风险。具体原因是:pt-osc在copy已有数据时会使用insert ignore将老表中的数据插入到新表中,因为新表已经增加了unique index,所以重复的数据会被ignore掉
   --check-unique-key-change 可以避免,默认yes

原理
1. 创建一张新表
2. alter新表
3. 原表创建insert,update,delete三种触发器
4. 原表开始拷贝数据到新表,且触发器也开始映射到新表
5. 处理外键(如果没有忽略)
6. 重命名新表和原表
7. 清理战场

重要:
insert触发器 =SQL转换=> replace into
update触发器 =SQL转换=> delete ignore + replace into(大于3.0.2版本) 或 replace into(低于3.0.2版本,所以低版本会有问题:如果这时候对老的主键做修改,那么修改之前的值不会被去掉,从而多出一些异常数据)
delete触发器 =SQL转换=> delete ignore
copy rows =SQL转换=> insert ignore into

最佳实践
1. innodb_autoinc_lock_mode 设置成 2,否则会经常出现死锁(autoinc锁)
2. 如果中途ddl失败,需要先删除触发器,再删除新的临时表

3.2 gh-ost

优点
1. 无触发器设计
2. cut-over方案设计
3. 对主库性能几乎无影响
4. 可以暂停

缺点
1. 原表不能有外键
2. 原表不能有触发器
3. 强制要求binlog为row格式
4. 原表不能有字母大小写不同的同名表
5. 当并发写入多的时候,在应用binlog阶段由于是单线程,所以会非常慢,影响ddl性能和进度

原理
原理基本都一样,主要的区别就在row_apply这里:pt-osc是触发器,gh-ost是监听master binlog并应用日志,其余的差别不大,这里不再赘述

四、 slave 先ddl,后切换主从

如果其余方式都不行,只能祭出大招:slave先ddl,然后主从切换了

优点
1. slave上操作,不影响master

缺点
1. 需要主从切换,主从切换越平滑,此方案就越好
2. 有几点需要考虑和处理下:
2.1 add column after|before,这样的操作slave先做是否有影响
2.2 slave先新增字段,可能会导致主从同步停掉,需要设置某些参数

五、 ONLINE DDL 最佳方案选型

如果是创建索引、修改默认值这样 online ddl 快速且无影响的操作,尽量优先选择online ddl
如果当前服务器写入量不高、负载不高,且原表没有触发器、没有外键,且此表有主键,尽量优先选择pt-online-schema-change
其余情况,选择主从切换
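3.0 节说的"rowcopy用insert ignore、row_apply用replace into,因此两者时序无关"可以用几行Python模拟(新表用dict模拟,key是主键,仅为示意):

```python
# 模拟 rowcopy(老数据,insert ignore)与 row_apply(增量数据,replace into)
# 并行时的两种时序,验证最终结果一致。

def insert_ignore(table, pk, row):
    """rowcopy:老数据优先级最低,主键已存在则放弃写入。"""
    table.setdefault(pk, row)

def replace_into(table, pk, row):
    """row_apply:增量数据总是覆盖已有行。"""
    table[pk] = row

# 场景:原表 id=1 的行在 rowcopy 期间被业务 update 成了新值
t1 = {}
replace_into(t1, 1, 'new-value')    # 增量先到
insert_ignore(t1, 1, 'old-value')   # rowcopy 后到,被 ignore

t2 = {}
insert_ignore(t2, 1, 'old-value')   # rowcopy 先到
replace_into(t2, 1, 'new-value')    # 增量后到,覆盖

print(t1[1], t2[1])  # new-value new-value,两种时序最终结果一致
```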
pt的详细步骤

Step 1: Create the new table.
Step 2: Alter the new, empty table. This should be very quick, or die if the user specified a bad alter statement.
Step 3: Create the triggers to capture changes on the original table and apply them to the new table.
Step 4: Copy rows.
Step 5: Rename tables: orig -> old, new -> orig
Step 6: Update foreign key constraints if there are child tables.
Step 7: Drop the old table.
DROP TABLE IF EXISTS `_xx_old`
DROP TRIGGER IF EXISTS `pt_osc_xx_xx_del`;
DROP TRIGGER IF EXISTS `pt_osc_xx_xx_upd`;
DROP TRIGGER IF EXISTS `pt_osc_xx_xx_ins`;
done

一、常用参数解读

1.0 生产环境使用的参数

inception调用pt-online-schema-change,相关参数如下:
inception_osc_alter_foreign_keys_method = rebuild_constraints
inception_osc_check_alter = on
inception_osc_check_interval = 5
inception_osc_check_replication_filters = OFF
inception_osc_chunk_size = 1000
inception_osc_chunk_size_limit = 4
inception_osc_chunk_time = 1
inception_osc_critical_thread_connected = 4000
inception_osc_critical_thread_running = 300
inception_osc_drop_new_table = on
inception_osc_drop_old_table = on
inception_osc_max_lag = 3
inception_osc_max_thread_connected = 2500
inception_osc_max_thread_running = 200
inception_osc_min_table_size = 16
inception_osc_recursion_method = none

以上inception参数对应的pt-online-schema-change的命令参数如下:
pt-online-schema-change --alter " xx " --alter-foreign-keys-method=rebuild_constraints --check-alter=yes --check-interval=5 --check-replication-filters=no --chunk-size=1000 --chunk-size-limit=4 --chunk-time=1 --critical-load=thread_connected:4000,thread_running:300 --max-load=thread_connected:2500,thread_running:200 --drop-new-table=yes --drop-old-table=yes --max-lag=3 --recursion-method=none

1.1 基本用法

pt-online-schema-change [OPTIONS] DSN
pt-online-schema-change --alter "ADD COLUMN c1 INT" D=sakila,t=actor
pt-online-schema-change --alter "ENGINE=InnoDB" D=sakila,t=actor

1.2 安全的pt-online-schema-change

默认情况下,pt-online-schema-change 是不会修改表的,除非你显式地指定了 --execute。pt-online-schema-change 有一系列动作来阻止不期望的后果发生,包括自动检测复制,以及以下相关措施:

大部分情况下,pt-online-schema-change会拒绝对没有主键和唯一键的表做操作,可以参考 --alter 了解更多信息
如果检测到复制过滤(ignore-db,do-db等),pt-online-schema-change会拒绝操作,可以参考 --[no]check-replication-filters 了解更多信息
如果发现复制延迟得厉害,那么会暂停copy数据,可以参考 --max-lag 了解更多信息
如果发现服务器负载非常高,那么也会暂停或者停止相关操作,可以参考 --max-load and --critical-load 了解更多信息
该工具默认会设置 innodb_lock_wait_timeout=1 和 lock_wait_timeout=60 来减少竞争,参考 --set-vars 了解更多信息
如果有外键约束,那么禁止改表,除非你指定了 --alter-foreign-keys-method
Percona XtraDB Cluster中禁止修改MyISAM的表

1.3 常用参数

--dry-run and --execute,这两个是互斥的参数,一个是打印,一个是执行

--alter
通过这个选项,就不需要alter table关键字了。你可以通过逗号来指定多个修改操作。
* 以下列出--alter中的一些限制,大家谨记和避免
1. 原表必须要有主键或唯一键,因为delete触发器需要用到,否则会报错
2. rename子句不允许给表重命名
2.1 也不能通过删除一列、再新增一列的方式来完成对列的重命名操作
3. 新增字段,如果这个字段是NOT NULL,必须要指定default值,否则报错
4. 如果是DROP FOREIGN KEY constraint_name,那么必须指定 _ 加上 constraint_name,而不是 constraint_name。
举例: CONSTRAINT `fk_foo` FOREIGN KEY (`foo_id`) REFERENCES `bar` (`foo_id`)
你必须指定: --alter "DROP FOREIGN KEY _fk_foo" 而不是 --alter "DROP FOREIGN KEY fk_foo".
5.
必须确保数据库高于5.0版本,因为5.0版本转换MYSIAM到InnoDB会出错 [no]check-alter 默认yes, 给--alter 做一些检测 * 列的重命名 在之前的版本 CHANGE COLUMN name new_name 这个操作是会丢失数据的,现在的工具修复了 但是,由于pt代码并不是full-blown SQL parser,所以,你应该先 --dry-run and --print , 确认下renamed的列名是否正确,以确保无误 * 删除主键 删除主键是很危险的事情,尽量不要做这样的动作 --alter-foreign-keys-method 我们的规范不允许有外键,如果有外键,我们采取其他方式DDL 如何把外键引用到新表?需要特殊处理带有外键约束的表,以保证它们可以应用到新表.当重命名表的时候,外键关系会带到重命名后的表上。 该工具有两种方法,可以自动找到子表,并修改约束关系。 auto: 在rebuild_constraints和drop_swap两种处理方式中选择一个。 rebuild_constraints:使用 ALTER TABLE语句先删除外键约束,然后再添加.如果子表很大的话,会导致长时间的阻塞。 drop_swap: 执行FOREIGN_KEY_CHECKS=0,禁止外键约束,删除原表,再重命名新表。这种方式很快,也不会产生阻塞,但是有风险: 1, 在删除原表和重命名新表的短时间内,表是不存在的,程序会返回错误。 2, 如果重命名表出现错误,也不能回滚了.因为原表已经被删除。 none: 类似"drop_swap"的处理方式,但是它不删除原表,并且外键关系会随着重命名转到老表上面。 --host=xxx --user=xxx --password=xxx 连接实例信息,缩写-h xxx -u xxx -p xxx,密码可以使用参数--ask-pass 手动输入。 D=db_name,t=table_name 指定要ddl的数据库名和表名 --charset 最好设置为MySQL默认字符集: utf8 --check-interval 默认1秒,检测--max-lag --[no]check-replication-filters 默认yes 如果发现任何服务器有 binlog_ignore_db and replicate_do_db , 那么就报错 --check-slave-lag 指定一个从库的DSN连接地址,如果从库超过--max-lag参数设置的值,就会暂停操作。 --[no]swap-tables 默认yes。交换原始表和新表,除非你禁止--[no]drop-old-table。 --max-lag 默认1s。 每个chunk拷贝完成后,会查看所有复制Slave的延迟情况。 要是延迟大于该值,则暂停复制数据,直到所有从的滞后小于这个值,使用Seconds_Behind_Master。 如果有任何从滞后超过此选项的值,则该工具将睡眠--check-interval指定的时间,再检查。 如果从被停止,将会永远等待,直到从开始同步,并且延迟小于该值。 如果指定--check-slave-lag,该工具只检查该服务器的延迟,而不是所有服务器。 --max-load 默认为Threads_running=25。 每个chunk拷贝完后,会检查SHOW GLOBAL STATUS的内容,检查指标是否超过了指定的阈值。 如果超过,则先暂停。 这里可以用逗号分隔,指定多个条件, 每个条件格式: status指标=MAX_VALUE 或者 status指标:MAX_VALUE。 如果不指定MAX_VALUE,那么工具会设置其为当前值的120%。 --critical-load 默认为Threads_running=50。 用法基本与--max-load类似,如果不指定MAX_VALUE,那么工具会这只其为当前值的200%。 如果超过指定值,则工具直接退出,而不是暂停。 --print 打印SQL语句到标准输出。指定此选项可以让你看到该工具所执行的语句,和--dry-run配合最佳。 --progress 复制数据的时候打印进度报告,二部分组成:第一部分是百分比,第二部分是时间。 --set-vars 设置MySQL变量,多个用逗号分割。 默认该工具设置的是: wait_timeout=10000 innodb_lock_wait_timeout=1 lock_wait_timeout=60 --recursion-method 默认是show processlist,发现从的方法,也可以是host,但需要在从上指定report_host,通过show slave 
hosts 来找到,可以指定 none 来不检查 slave。
METHOD       USES
===========  ==================
processlist  SHOW PROCESSLIST
hosts        SHOW SLAVE HOSTS
dsn=DSN      DSNs from a table
none         Do not find slaves
指定 none 则表示不在乎从库的延迟。
--pause-file 可以指定文件来暂停 pt-online-schema-change

二、使用限制
哪些 DDL 是不可以做的,做了容易出错:
1. 禁止创建唯一索引,会丢失数据,更加不允许添加 --alter-check=no、--check-unique-key-change=no 来绕过检测
2. 如果原表没有主键,也没有唯一索引,这些表是不允许用 pt 做 DDL 的
3. 禁止对带外键的表进行 pt DDL
4. 禁止对表进行重命名
5. 禁止对列进行重命名,如果一定要做,也必须先 print 出来检测清楚列名是否正确
6. 新增字段,NOT NULL 必须要指定默认值
7. 不允许删除主键
由于 pt 触发器原理,row copy 会产生大量的 binlog,所以做之前要检测 binlog 空间是否够用,也要检测数据空间多一倍表空间是否够用
禁止在业务高峰期进行 pt-online-schema-change 操作
原表不能有触发器
MySQL 最好设置为 innodb_autoinc_lock_mode=2,否则在高并发的写入情况下,很容易产生锁等待以及死锁
master 的表结构必须跟 slave 的表结构一致,不允许异构。因为 pt-online-schema-change 最终会 rename 新表,slave 上不一致的表结构会被 master 覆盖,谨记

三、关于触发器
3.0.2 之前的 update 触发器:
REPLACE INTO `lc`.`_hb_new` (`id`, `ts`, `ts2`, `c1`) VALUES (NEW.`id`, NEW.`ts`, NEW.`ts2`, NEW.`c1`)
3.0.2 之后的 update 触发器:
BEGIN
DELETE IGNORE FROM `lc`.`_hb_new` WHERE !(OLD.`id` <=> NEW.`id`) AND `lc`.`_hb_new`.`id` <=> OLD.`id`;
REPLACE INTO `lc`.`_hb_new` (`id`, `ts`, `ts2`) VALUES (NEW.`id`, NEW.`ts`, NEW.`ts2`);
END
原理:
update 触发器 =SQL转换=> delete ignore + replace into(大于 3.0.2 版本)
update 触发器 =SQL转换=> replace into(低于 3.0.2 版本,所以这个版本会有问题:如果这时候对老的主键修改,那么修改之前的值不会去掉,从而多了一些异常数据)
举例:t 表中有三条数据,第一列 id 是主键
------
1 lc --row1
2 lc --row2
3 lc --row3
------
pt-online-schema-change 的原理大致四个阶段:
1. 创建临时表 _t_new
2. 创建触发器
3. 老数据 row copy
4. swap table
好了,我们来举个例子:
1. 创建临时表 _t_new
2. 创建触发器
3. 老数据 row copy
3.1 拷贝数据 row1、row2 完毕
3.2 这时候业务有一个 update 语句:update t set id = 10 where id=1;
3.3 拷贝数据 row3
4.
swap table
这时候脑补一下原表和新表的示意图,此时已经执行到 3.1 阶段:
老表
------------
1 lc --row1
2 lc --row2
3 lc --row3
------------
新表
-----------
1 lc
2 lc
-----------
这时候脑补一下原表和新表的示意图,此时已经执行到 3.3 阶段:
老表(update t set id = 10 where id=1)
------------
10 lc --row1
2 lc --row2
3 lc --row3
------------
新表(3.0.2 之前版本的触发器,没有 delete 映射,所以最终结果如下,跟老表相比已经不一致了,多了一条数据 1,lc),触发器执行 replace into _t_new(id,name) values(10,'lc')
-----------
1 lc
2 lc
10 lc
3 lc
-----------
新表(3.0.2 之后版本的触发器,有 delete 映射,所以最终结果如下,与老表的数据一致),触发器执行 delete ignore from _t_new where id = 1; replace into _t_new(id,name) values(10,'lc');
-----------
2 lc
10 lc
3 lc
-----------
四、错误处理
遇到错误后,继续补充完整
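上面 3.0.2 前后 update 触发器行为的差异,可以用一段 Python 小脚本模拟出来体会一下(纯示意:用 dict 充当表,函数名都是为演示自拟的,并非 pt-osc 的真实实现):

```python
# 模拟 pt-osc row copy 期间,业务执行 update t set id=10 where id=1 时
# 两种 update 触发器对新表 _t_new 的影响(示意,非真实实现)

def apply_update_old_trigger(new_table, old_row, new_row):
    # 3.0.2 之前: 只有 replace into,老主键对应的行不会被删除
    new_table[new_row["id"]] = new_row

def apply_update_new_trigger(new_table, old_row, new_row):
    # 3.0.2 之后: 先 delete ignore 老主键,再 replace into
    if old_row["id"] != new_row["id"]:
        new_table.pop(old_row["id"], None)
    new_table[new_row["id"]] = new_row

def simulate(trigger):
    old_table = {1: {"id": 1, "name": "lc"},
                 2: {"id": 2, "name": "lc"},
                 3: {"id": 3, "name": "lc"}}
    new_table = {}
    # 3.1 已拷贝 row1, row2
    for pk in (1, 2):
        new_table[pk] = dict(old_table[pk])
    # 3.2 业务 update: id 1 -> 10,触发器同步到新表
    old_row = old_table.pop(1)
    new_row = dict(old_row, id=10)
    old_table[10] = new_row
    trigger(new_table, old_row, new_row)
    # 3.3 继续拷贝 row3
    new_table[3] = dict(old_table[3])
    return sorted(new_table)

print(simulate(apply_update_old_trigger))  # [1, 2, 3, 10] 老触发器多出脏数据 id=1
print(simulate(apply_update_new_trigger))  # [2, 3, 10] 新触发器与老表一致
```

可以看到老触发器的结果比老表多了一条 id=1 的记录,正对应上图中不一致的情况。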
列出几种常用场景,并进行分析实战测试 前提 * pt-kill必须获得的权限 1. PROCESS , SUPER 2. 否则,你只能看到打印出kill id,但是实际上并没有被kill掉 特殊、 打印出执行时间超过3秒的connection,仅仅打印,不kill 每2秒循环一次,超过10秒就退出pt-kill程序 pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --print --ignore-self --busy-time=3 --interval 2 --run-time=10 重点注意: 这里的--busy-time=3,指的是Command=Query的连接,其他的并不会被匹配哦 , 所以一般情况下删除的都是比较安全的用户thread 重点注意2: ddl,dml,select,都是属于Command=Query +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------+ | 3 | repl | xx.xxx.126.166:60528 | NULL | Binlog Dump GTID | 492777 | Master has sent all binlog to slave; waiting for more updates | NULL | | 4 | repl | xx.xxx.126.165:48604 | NULL | Binlog Dump GTID | 492765 | Master has sent all binlog to slave; waiting for more updates | NULL | | 502 | job_heartbeat | xx.xxx.2.217:34626 | heartbeat_db | Sleep | 0 | | NULL | | 1053 | dbadmin | localhost | heartbeat_db | Query | 1 | altering table | alter table heartbeat add column ts2 date | | 1055 | pt_kill | xx.xxx.126.166:63167 | NULL | Sleep | 1 | | NULL | | 1056 | dbadmin | localhost | NULL | Query | 0 | starting | show processlist | +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------+ +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | 
+------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------------+ | 3 | repl | xx.xxx.126.166:60528 | NULL | Binlog Dump GTID | 492716 | Master has sent all binlog to slave; waiting for more updates | NULL | | 4 | repl | xx.xxx.126.165:48604 | NULL | Binlog Dump GTID | 492704 | Master has sent all binlog to slave; waiting for more updates | NULL | | 502 | job_heartbeat | xx.xxx.2.217:34626 | heartbeat_db | Sleep | 0 | | NULL | | 1053 | dbadmin | localhost | heartbeat_db | Query | 3 | updating | update heartbeat set ts = '2018-09-06 00:07:58' | | 1055 | pt_kill | xx.xxx.126.166:63167 | NULL | Sleep | 0 | | NULL | | 1056 | dbadmin | localhost | NULL | Query | 0 | starting | show processlist | +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------------------------------------+ +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------+ | Id | User | Host | db | Command | Time | State | Info | +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------+ | 3 | repl | xx.xxx.126.166:60528 | NULL | Binlog Dump GTID | 492613 | Master has sent all binlog to slave; waiting for more updates | NULL | | 4 | repl | xx.xxx.126.165:48604 | NULL | Binlog Dump GTID | 492601 | Master has sent all binlog to slave; waiting for more updates | NULL | | 502 | job_heartbeat | xx.xxx.2.217:34626 | heartbeat_db | Sleep | 0 | | NULL | | 1053 | dbadmin | localhost | heartbeat_db | Query | 3 | User sleep | select 1,sleep(4) | | 1055 | pt_kill | xx.xxx.126.166:63167 | NULL | Sleep | 1 | | NULL | | 1056 | dbadmin | 
localhost | NULL | Query | 0 | starting | show processlist | +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+-------------------+ +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+------------------+ | 3 | repl | xx.xxx.126.166:60528 | NULL | Binlog Dump GTID | 492740 | Master has sent all binlog to slave; waiting for more updates | NULL | | 4 | repl | xx.xxx.126.165:48604 | NULL | Binlog Dump GTID | 492728 | Master has sent all binlog to slave; waiting for more updates | NULL | | 502 | job_heartbeat | xx.xxx.2.217:34626 | heartbeat_db | Sleep | 0 | | NULL | | 1053 | dbadmin | localhost | heartbeat_db | Query | 4 | starting | rollback | | 1055 | pt_kill | xx.xxx.126.166:63167 | NULL | Sleep | 0 | | NULL | | 1056 | dbadmin | localhost | NULL | Query | 0 | starting | show processlist | +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+------------------+ 一、打印出sleep时间超过3秒的connection,仅仅打印,不kill 每2秒循环一次,无限循环下去 pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --print --ignore-self --idle-time=3 --interval 2 每2秒循环一次,超过10秒就退出pt-kill程序 pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --print --ignore-self --idle-time=3 --interval 2 --run-time=10 二、kill掉query语句中带有sleep关键字(不区分大小写)的connection, 且Time超过3秒 pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --print --ignore-self --interval 2 --match-info "(?i-xsm:(sleep))" --busy-time=3 --kill --victims all 三、kill掉非系统用户的select开头,且执行时间超过3秒的 
connection pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --match-info "(?i-xsm:^(select))" --ignore-user="root|repl" --busy-time=3 --kill --victims all 四、kill掉非系统用户的select,update,delete开头,且执行时间超过3秒的 connection pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --match-info "(?i-xsm:^(select))|(?i-xsm:^(update))|(?i-xsm:^(delete))" --ignore-user="root|repl" --busy-time=3 --kill --victims all 五、kill掉指定特征的query语句 kill掉非系统用户,且query语句中同时包含heartbeat 和 where ,且heartbeat在前where在后,且执行时间超过3秒的 connection pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --match-info ".*heartbeat.*where.*" --ignore-user="root|repl" --busy-time=3 --kill --victims all # 2018-11-15T10:38:03 KILL 1053 (Query 27 sec) select *,sleep(10) from heartbeat where id < '1000000000' # 2018-11-15T10:38:05 KILL 1053 (Query 29 sec) select *,sleep(10) from heartbeat where id < '1000000000' # 2018-11-15T10:38:07 KILL 1053 (Query 31 sec) select *,sleep(10) from heartbeat where id < '1000000000' # 2018-11-15T10:38:09 KILL 1053 (Query 33 sec) select *,sleep(10) from heartbeat where id < '1000000000' 六、kill掉非系统库的select开头,且执行时间超过3秒的 connection pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --match-info "(?i-xsm:^(select))" --ignore-db="mysql|information_schema" --ignore-user="root|repl" --busy-time=3 --kill --victims all 七、kill掉非系统用户的select,update,delete开头,且执行时间超过3秒,且不是被locked住 的 connection pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --match-info "(?i-xsm:^(select))|(?i-xsm:^(update))|(?i-xsm:^(delete))" --ignore-state="Locked" --ignore-user="root|repl" --busy-time=3 --kill --victims all 八、kill掉非系统用户,指定state(Locked、login、Updating、Sorting for order等状态),且执行时间超过3秒 的 connection 指定Locked的connection删除掉 pt-kill --host xx.xxx.126.164 --port 3306 --user 
pt_kill --ask-pass --ignore-self --print --interval 2 --match-info "(?i-xsm:^(select))|(?i-xsm:^(update))|(?i-xsm:^(delete))" --match-state="Locked" --ignore-user="root|repl" --busy-time=3 --kill --victims all 九、kill掉非系统用户,指定Command(Query、Sleep、Binlog Dump、Connect等状态),且执行时间超过3秒 的 connection kill掉指定Connect的command connection pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --match-info "(?i-xsm:^(select))|(?i-xsm:^(update))|(?i-xsm:^(delete))" --match-command="Connect" --ignore-user="root|repl" --busy-time=3 --kill --victims all 十、kill掉指定来源host ip ,且select开头的,且执行时间超过3s的connection pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --match-info "(?i-xsm:^(select))" --ignore-host="x.x.x.x" --ignore-user="root|repl" --busy-time=3 --kill --victims all 十一、kill掉非系统用户,Command=Sleep,且空闲时间为3s的connection pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --ignore-user="root|repl" --idle-time=3 --kill --victims all 十二、kill掉非系统用户,指定特征的query,在后台运行,并打印日志 pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --match-info ".*heartbeat.*where.*" --ignore-user="root|repl" --busy-time=3 --daemonize --log='/root/kill.log' --kill --victims all 十三、--victims的用法 背景 pid:1103 , 是最早开启事务的空闲进程 T1 pid:1095 , 是第二早开启事务的ddl进程 T2 pid:502 , 是最后一个开启事务的dml进程 T3 事务顺序是: T1 锁住了 T2,T3, T2锁住了T3 , T3被T1,T2锁住 +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------------+ | 3 | repl | 
xx.xxx.126.166:60528 | NULL | Binlog Dump GTID | 507504 | Master has sent all binlog to slave; waiting for more updates | NULL | | 4 | repl | xx.xxx.126.165:48604 | NULL | Binlog Dump GTID | 507492 | Master has sent all binlog to slave; waiting for more updates | NULL | | 502 | job_heartbeat | xx.xxx.2.217:34626 | heartbeat_db | Query | 328 | Waiting for table metadata lock | insert into heartbeat(ts) values('2018-11-15 13:43:50') | | 1095 | dbadxxx | localhost | heartbeat_db | Query | 329 | Waiting for table metadata lock | alter table heartbeat add column ts3 date | | 1103 | dbadxxx | localhost | heartbeat_db | Sleep | 341 | | NULL | | 1104 | dbadxxx | localhost | NULL | Query | 0 | starting | show processlist | +------+---------------+----------------------+--------------+------------------+--------+---------------------------------------------------------------+---------------------------------------------------------+ 6 rows in set (0.00 sec) --match-command="Query|Sleep" --victims oldest --busy-time=3 只kill最老的command为Query|Sleep的最老的链接 pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --ignore-user="root|repl" --match-command="Query|Sleep" --busy-time=3 --victims oldest Enter MySQL password: # 2018-11-15T13:49:07 KILL 1103 (Sleep 330 sec) NULL # 2018-11-15T13:49:09 KILL 1103 (Sleep 332 sec) NULL --match-command="Query|Sleep" --victims all --busy-time=3 kill 所有command="Query|Sleep" 的所有链接 Enter MySQL password: # 2018-11-15T13:48:41 KILL 1103 (Sleep 304 sec) NULL # 2018-11-15T13:48:41 KILL 1095 (Query 292 sec) alter table heartbeat add column ts3 date # 2018-11-15T13:48:41 KILL 502 (Query 291 sec) insert into heartbeat(ts) values('2018-11-15 13:43:50') # 2018-11-15T13:48:41 KILL 1104 (Sleep 262 sec) NULL --match-command="Query" --victims all --busy-time=3 kill 所有 command="Query"(默认不填也就是Query)的所有链接 Enter MySQL password: # 2018-11-15T13:46:59 KILL 1095 (Query 190 sec) alter table heartbeat add column ts3 date # 
2018-11-15T13:46:59 KILL 502 (Query 189 sec) insert into heartbeat(ts) values('2018-11-15 13:43:50') --match-command="Query" --victims oldest --busy-time=3 kill 所有 command="Query"(默认不填也就是Query)的最老的链接 Enter MySQL password: # 2018-11-15T13:59:01 KILL 1095 (Query 912 sec) alter table heartbeat add column ts3 date 十四、kill掉非系统用户,指定库的所有链接 举例一、kill掉非root和repl账号,使用open_db或xf_loupan_db的所有链接 pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --ignore-user="root|repl" --match-db="open_db|xf_loupan_db" --victims all --kill dbadmin:xf_loupan_db> select *,sleep(100) from loupan_grade_photo; ERROR 2013 (HY000): Lost connection to MySQL server during query dbadmin:xf_loupan_db> select *,sleep(100) from loupan_grade_photo; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 245 Current database: xf_loupan_db ERROR 2013 (HY000): Lost connection to MySQL server during query Enter MySQL password: # 2018-12-26T12:34:46 KILL 243 (Query 7 sec) select *,sleep(100) from loupan_grade_photo # 2018-12-26T12:35:00 KILL 245 (Query 1 sec) select *,sleep(100) from loupan_grade_photo 特别注意: 这里面只针对db这一列是open_db或xf_loupan_db才会有效果,如果这个用户拥有多个库的权限 那么他在xx库去 select * from xf_loupan_db.loupan_grade_photo 是不会生效的 所以,要保证这个库账号只能对应一个库,才会有效果 比如这样是没有效果的: root:(none)> show processlist; +-----+---------+----------------------+------+---------+------+------------+----------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+---------+----------------------+------+---------+------+------------+----------------------------------------------------------+ | 248 | root | localhost | NULL | Query | 0 | starting | show processlist | | 252 | pt_kill | xx.xxx.126.166:33088 | NULL | Sleep | 1 | | NULL | | 257 | dbadmin | localhost | sys | Query | 19 | User sleep | select *,sleep(100) from xf_loupan_db.loupan_grade_photo | 
+-----+---------+----------------------+------+---------+------+------------+----------------------------------------------------------+ 3 rows in set (0.00 sec) 十五、kill掉指定用户的所有链接 一般我们什么时候使用它呢? 一般是在迁移某个DB,需要将这个DB上的账号链接全部清理掉,防止长连接 举例:kill掉所有lc_rx 和 rc_ronly 两个账号的所有链接 pt-kill --host xx.xxx.126.164 --port 3306 --user pt_kill --ask-pass --ignore-self --print --interval 2 --match-user="lc_rx|rc_ronly" --victims all --kill
https://www.percona.com/doc/percona-toolkit/LATEST/pt-kill.html
一、NAME
pt-kill 字面意思就是:kill 掉 MySQL 满足某些特征的 query 语句
二、大纲:使用方法
pt-kill [OPTIONS] [DSN]
pt-kill 用来 kill MySQL 的连接。如果 pt-kill 没有指定文件的话,它连接到 MySQL Server,然后通过 show processlist 来得到查询语句;如果参数指定了文件,则可以从包含 show processlist 输出的文件中读取 query 语句并分析处理,默认从 STDIN 获取。
kill 掉执行时间超过 60s 的 query:
pt-kill --busy-time 60 --kill
打印出执行时间超过 60s 的 query,仅仅是打印,不会 kill:
pt-kill --busy-time 60 --print
每 10s 去检查 Sleep 状态的 query,并 kill 掉:
pt-kill --match-command Sleep --kill --victims all --interval 10
打印所有 login 状态的 query:
pt-kill --match-state login --print --victims all
通过文件分析哪些 query 满足 match 条件:
mysql -e "SHOW PROCESSLIST" > proclist.txt
pt-kill --test-matching proclist.txt --busy-time 60 --print
三、风险
任何软件都有风险,在使用这个工具前,如下建议请关注:
* 仔细阅读此工具的说明书
* review 此工具的已知 BUGS
* 在非生产环境进行测试
* 做好备份并检查你的备份是否可用
四、说明
pt-kill 从 show processlist 中获取 query,并进行过滤,然后要么 kill,要么 print。在某种场景下,这也是公认的另一种 slow query 终结者。主要目的就是观察那些有可能使用非常多资源的 query,然后进行 kill 来保护数据库。
通常 pt-kill 是通过连接 MySQL,然后 show processlist 来获取 query,但也还有另一种方法,就是通过指定 file。在指定文件这种场景下,pt-kill 中的参数 --kill 就不起作用了,你应该使用 --print。当你指定 --test-matching 的时候,才表示你从文件获取 query,然后测试是否满足相关匹配条件。
接下来,你还有很多规则需要遵守,比如:不要将 replication thread 给 kill 了,千万别 kill 掉一些比较重要的 thread。
两个重要的 options:--busy-time 和 --victims
--busy-time 指的是 query 的执行时间(需要测试 --match-command 和 --busy-time 都指定的话,是或的关系,还是且的关系)
--victims 指的是满足条件的 query 是否都需要 kill,是 kill oldest query,还是所有的都 kill
通常,你至少需要指定一个 --match option,否则没有 query 会被匹配。你也可以指定 --match-all 去匹配所有 query(不包括 --ignore 忽略的)。
五、GROUP, MATCH AND KILL
query 语句是如何经过层层筛选,最终得到精确的语句的呢?接下来我们具体来看看详细的流程。
第一步:group query into classes
1. --group-by 选项就是控制 grouping 的
2. 默认 --group-by 没有值,表示所有 queries 都被分在默认的 class 中
3. 第二步中的 matching 规则将会应用在每个 class 中,如果你不想全部应用的话,就需要单独分组 group
第二步:matching
Matching implies filtering since if a query doesn't match some criteria, it is removed from its class. Matching happens for each class. First, queries are filtered from their class by the various Query Matches options like --match-user.
Then, entire classes are filtered by the various Class Matches options like --query-count.
第三步:KILL
最后一步其实就是 victim selection:你是想 kill oldest query 还是 all queries,由 --victims 决定。
The fourth and final step is to take some action on all matching queries from all classes.
action 有这些,按照 --print, --execute-command, --kill/--kill-query 的顺序执行
六、OUTPUT
如果仅仅指定了 --kill,那么不会有 output;如果仅仅指定了 --print,那么你会看到这样的 output:
# 2009-07-15T15:04:01 KILL 8 (Query 42 sec) SELECT * FROM huge_table
这一行显示了时间戳,query 的 id 是 8,时间是 42 秒,Info 是 query SQL 本身
如果同时指定了 --kill --print,那么匹配后的 query 会被 kill,且会打印出来
七、OPTIONS
至少要指定这些参数里面的一个:--kill, --kill-query, --print, --execute-command or --stop
--any-busy-time 和 --each-busy-time 这两个参数是互斥的,二者只能取其一
--kill 和 --kill-query 这两个参数是互斥的,二者只能取其一
--daemonize 和 --test-matching 这两个参数是互斥的,二者只能取其一
它还可以接受命令行参数:
--ask-pass: 连接 MySQL 的时候手动输入密码
--charset: 默认字符集
--config: Read this comma-separated list of config files; if specified, this must be the first option on the command line.
--create-log-table: 创建一个 --log-dsn 指定的表
--daemonize: 后台运行
--database: 数据库名
--defaults-file: 给定绝对路径,仅从这个文件去获取 MySQL 的 options
--filter: 这个不常用,不做多解释,用的时候再来看
--group-by: 可以根据 show processlist 的不同字段分组,比如 Info,可以根据 SQL 语句分组,这样的用法也不常见,有需求的时候再细看
--help: 查看帮助
--host: ip
--interval: 检查频率。如果 --busy-time 没有指定,那么默认的 interval 就是 30 秒;否则,interval 就是 --busy-time 的一半;如果同时指定,频率就以显式指定的 --interval 为准
--log: 当后台运行时,output 打印到指定日志
--log-dsn: 存储每一个被 kill 的 query 到 DSN(数据库表)
--password: 数据库密码
--pid: 指定 pid 文件,如果该文件存在,则此工具不会运行
--port: 端口
--run-time: pt-kill 工具可以运行多长时间,默认是永久
--sentinel: 当某个文件存在时,pt-kill 自动停止运行
--slave-user & --slave-password: slave 相关的选项,以更小的权限访问 slave 而已
--set-vars: 在 MySQL 中设置某些变量,比如 wait_timeout=10000
--socket: socket file to use for connection
--stop: Stop running instances by creating the --sentinel file
--[no]strip-comments: 删除掉 query 后面的 comment
--version: 显示 pt-kill 的版本
--user: 用户名
--[no]version-check: 版本检查
--victims: 默认是 oldest,其他选项为 all、all-but-oldest
八、QUERY MATCHES
默认是区分大小写的,可以通过 regex 不区分大小写,比如:(?i-xsm:select)
--busy-time: type: time; group: Query Matches
状态:Command=Query,执行时间超过 --busy-time=N 秒
--idle-time: type: time; group: Query Matches
状态:Command=Sleep,空闲时间超过 --idle-time=N 秒
--ignore-command: type: string; group: Query Matches
忽略的 command,支持正则
--ignore-db: type: string; group: Query Matches
忽略的 DB,支持正则匹配
--ignore-host: type: string; group: Query Matches
Ignore queries whose Host matches this Perl regex.
--ignore-info: type: string; group: Query Matches
Ignore queries whose Info (query) matches this Perl regex.
--[no]ignore-self: default: yes; group: Query Matches
Don't kill pt-kill's own connection. 默认不会 kill pt-kill 自己的连接
--ignore-state: type: string; group: Query Matches; default: Locked
Ignore queries whose State matches this Perl regex. The default is to keep threads from being killed if they are locked waiting for another thread. 默认如果被锁住,那么是不会被 kill 掉的
--ignore-user: type: string; group: Query Matches
Ignore queries whose user matches this Perl regex.
--match-all
如果没有指定 --ignore,那么匹配所有 query(不包括 replication thread,除非指定 --replication-threads)
--match-command: type: string; group: Query Matches
Match only queries whose Command matches this Perl regex.
常用的 Command 如下:
Query
Sleep
Binlog Dump
Connect
Delayed insert
Execute
Fetch
Init DB
Kill
Prepare
Processlist
Quit
Reset stmt
Table Dump
See http://dev.mysql.com/doc/refman/5.1/en/thread-commands.html for a full list and description of Command values.
--match-db: type: string; group: Query Matches
Match only queries whose db (database) matches this Perl regex.
--match-host: type: string; group: Query Matches
Match only queries whose Host matches this Perl regex. The Host value often time includes the port like "host:port".
--match-info: type: string; group: Query Matches
Match only queries whose Info (query) matches this Perl regex. The Info column of the processlist shows the query that is being executed or NULL if no query is being executed.
--match-state: type: string; group: Query Matches
Match only queries whose State matches this Perl regex.
常用的 State 如下:
Locked
login
copy to tmp table
Copying to tmp table
Copying to tmp table on disk
Creating tmp table
executing
Reading from net
Sending data
Sorting for order
Sorting result
Table lock
Updating
See http://dev.mysql.com/doc/refman/5.1/en/general-thread-states.html for a full list and description of State values.
--match-user: type: string; group: Query Matches
Match only queries whose User matches this Perl regex.
--replication-threads: group: Query Matches
Allow matching and killing replication threads. By default, matches do not apply to replication threads; i.e. replication threads are completely ignored. Specifying this option allows matches to match (and potentially kill) replication threads on masters and slaves. 默认是不允许 kill 复制线程的,除非显式指定了这个选项
--test-matching: type: array; group: Query Matches
Files with processlist snapshots to test matching options against. Since the matching options can be complex, you can save snapshots of processlist in files, then test matching options against queries in those files. This option disables --run-time, --interval, and --[no]ignore-self. 指定一个文件,根据文件中的 show processlist 来匹配,而不是连接数据库
九、CLASS MATCHES
忽略
十、ACTIONS
默认的执行顺序是:--print, --execute-command, --kill/--kill-query
--execute-command 当 query 匹配后,执行这个 command
--kill 当 query 匹配后,执行 kill,断开 connection
--kill-busy-commands 默认是 kill Command 为 Query 的连接,但是如果你想 kill 其他 command,怎么办呢?--kill-busy-commands=Query,Execute
--kill-query 只 kill query,不 kill connection
--print 打印被 kill 的语句
十一、DSN OPTIONS
忽略
十二、ENVIRONMENT
忽略
十三、SYSTEM REQUIREMENTS
You need Perl, DBI, DBD::mysql, and some core packages that ought to be installed in any reasonably new version of Perl.
十四、BUGS
http://www.percona.com/bugs/pt-kill
十五、DOWNLOADING
http://www.percona.com/software/percona-toolkit/
十六、VERSION
pt-kill 3.0.12
十七、作者
Baron Schwartz and Daniel Nichter
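文中多次用到 (?i-xsm:^(select)) 这类 Perl 正则,上线前可以先离线验证一下匹配范围,避免误杀。下面用 Python 的 re 模块(同样支持这种局部 flag 语法)做个示意,query 样例是假设数据:

```python
import re

# pt-kill 的 --match-info 接受 Perl 正则;(?i-xsm:^(select)) 表示
# 对括号内启用不区分大小写(i),并关闭 x/s/m 修饰符。
# Python 3.6+ 的 re 同样支持局部 flag 分组,可用来离线验证匹配效果(示意)。
pattern = re.compile(r"(?i-xsm:^(select))|(?i-xsm:^(update))|(?i-xsm:^(delete))")

queries = [
    "SELECT * FROM heartbeat",        # 匹配: select 开头,忽略大小写
    "update heartbeat set ts=now()",  # 匹配: update 开头
    "show processlist",               # 不匹配
    "insert into t values (1)",       # 不匹配
]
for q in queries:
    print(q, "->", bool(pattern.match(q)))
```

确认正则行为符合预期后,再把同样的表达式交给 pt-kill 的 --match-info。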
一、MySQL为什么会延迟
数据延迟:是指 master 执行了 N 个事务,slave 却只执行了 N-M 个事务,说明 master 和 slave 之间产生了延迟。
延迟原因:延迟的原因有很多种,大部分情况下是 slave 的处理能力跟不上 master 导致。接下来,我们从各种角度分析下延迟的原因。
1.1 MySQL复制的架构
通过架构图,可以直观地看到数据延迟的点有哪些,当然也就可以知道如何优化了。
1.2 大事务导致的延迟
大家都知道,binlog 的写入时机是在 commit 的时候,redo 的写入时机是在事务执行阶段就开始。Oracle 是通过物理复制,我们姑且认为是 redo 的复制,因为 redo 是事务执行阶段就开始写入的,所以 Oracle 的复制几乎没有延迟。
MySQL 是基于 binlog 复制的,如果有一个非常大的事务,需要执行 1 个小时,那么 master 在 1 小时后才会生成 binlog,而此时,slave 就比 master 慢了至少 1 个小时,还不算 binlog 的传输时间。这是第一种延迟原因,破解方法后面说。
PS: DDL 虽然不是事务,但是特性跟大事务一样,都是在 master 上执行了一个巨大无比的操作后才写 binlog。
1.3 IO线程导致的延迟
根据复制的架构,master 写完 binlog 后,需要通过网络传输给 slave(这部分需要网络的支持),然后 IO thread 会将 binlog 写到 slave 的 relay log 中,这部分工作由 IO thread 完成。
好了,这里我们分析下瓶颈:
* io thread 是单线程的
* io thread 写入 relay log 的速度
经过分析以及大量的实战,IO thread 并不是我们的瓶颈,因为 relay log 是顺序写入,非常快,几乎碰不到瓶颈。
1.4 SQL线程导致的延迟
master 上面的事务是可以并发执行的,然而 binlog 传输到 slave 后,slave 却以单线程的模式读取和执行 relay log,这是典型的消费能力不足。
1.5 网络问题导致的延迟
网络问题不用多说了吧,如果要复制良好,一个稳定的网络环境是必不可少的。
1.6 硬件问题导致的延迟
如果 master 是 SSD,但是 slave 还是机械硬盘,这样的架构存在延迟也不足为奇。
二、延迟场景的解决方案
2.1 DDL
2.1.1 DDL的最佳实践
* 通过 pt-osc 或者 gh-ost 来让 DDL 拆分成一个个小事务,并且还有流控功能
* 在 slave 上先 DDL,然后 master-slave 切换,再在 old master 上进行 DDL,从而完美地解决了这个问题
2.2 大事务
2.2.1 大事务拆小事务
如果说大事务对于 binlog 的产生有极大的影响,那么我们就应该把大事务拆成小事务,大事务不允许执行。配合大事务的监控(可以基于时间,也可以基于数据量),监控到不符合规范的 trx 自动 kill。
2.3 大量并发事务
2.3.1 调整安全参数
sync_binlog = 0 && innodb_flush_log_at_trx_commit = 0
可以极大地提高事务处理的吞吐量,因为 IO fsync 的次数变少了,可以非常有效地降低数据延迟。
风险:如果 slave 挂了,需要重做 slave。
2.3.2 MTS(enhanced multi-threaded slave)
之前有深入讨论过 MTS 的文章,它主要的功能就是让 slave 拥有不弱于 master 的并行回放能力,从而有效地缩短延迟,甚至无延迟。
终极大招
半同步:半同步可以让延迟为 0,但是半同步有自动切换为异步复制的可能
全同步:MySQL 的 group replication 就是这类的代表,这个话题以后再聊
最后,以上就是关于 MySQL 延迟优化的方法,几乎涵盖了 90% 的方案,如果大家还有更好的方案,不妨拿出来大家一起探讨。
一、并行复制的背景
首先,为什么会有并行复制这个概念呢?
1. DBA 都应该知道,MySQL 的复制是基于 binlog 的。
2. MySQL 复制包括两部分:IO 线程和 SQL 线程。
3. IO 线程主要用于拉取接收 master 传递过来的 binlog,并将其写入到 relay log。
4. SQL 线程主要负责解析 relay log,并应用到 slave 中。
5. 不管怎么说,IO 和 SQL 线程都是单线程的,而 master 却是多线程的,所以难免会有延迟。为了解决这个问题,多线程应运而生了。
6. IO 多线程?
6.1 IO 没必要多线程,因为 IO 线程并不是瓶颈啊
7. SQL 多线程?
7.1 没错,目前最新的 5.6、5.7、8.0 都是在 SQL 线程上实现了多线程,来提升 slave 的并发度
接下来,我们就来一窥 MySQL 在并行复制上的努力和成果吧
二、重点
是否能够并行,关键在于多事务之间是否有锁冲突。下面的并行复制原理,就是在看如何避免锁冲突。
三、MySQL5.6 基于schema的并行复制
slave-parallel-type=DATABASE(不同库的事务,没有锁冲突)
之前说过,并行复制的目的就是要让 slave 尽可能地多线程跑起来,当然基于库级别的多线程也是一种方式(不同库的事务,没有锁冲突)。
先说说优点:实现相对来说简单,对用户来说使用起来也简单。
再说说缺点:由于是基于库的,并行的粒度非常粗。现在很多公司的架构是一库一实例,针对这样的架构,5.6 的并行复制无能为力。当然还有就是主从事务的先后顺序,对于 5.6 也是个大问题。
话不多说,来张图好了
四、MySQL5.7 基于group commit的并行复制
slave-parallel-type=LOGICAL_CLOCK: Commit-Parent-Based 模式(同一组的事务 [last_committed 相同],没有锁冲突。同一组,肯定没有冲突,否则没办法成为同一组)
slave-parallel-type=LOGICAL_CLOCK: Lock-Based 模式(即便不是同一组的事务,只要事务之间没有锁冲突 [prepare 阶段],就可以并发。不在同一组,只要 N 个事务 prepare 阶段可以重叠,说明没有锁冲突)
group commit 之前的文章有详细描述,这里不多解释。MySQL5.7 在组提交的时候,还为每一组的事务打上了标记,现在想想就是为了方便进行 MTS 吧。
我们先看一组 binlog:
last_committed=0 sequence_number=1
last_committed=1 sequence_number=2
last_committed=2 sequence_number=3
last_committed=3 sequence_number=4
last_committed=4 sequence_number=5
last_committed=4 sequence_number=6
last_committed=4 sequence_number=7
last_committed=6 sequence_number=8
last_committed=6 sequence_number=9
last_committed=9 sequence_number=10
4.1 Commit-Parent-Based模式
4.2 Lock-Based模式
五、MySQL8.0 基于write-set的并行复制
关于 write-set 的并行复制,看姜老师的文章《基于WRITESET的MySQL并行复制》可以快速理解,再详细的自己去看源码即可。我这里简短地对里面的几个重要概念做些解读,这些是我当时理解的时候有偏差的地方。
如何启用 write-set 并行复制
MySQL 5.7.22+ 支持基于 write-set 的并行复制
# master
loose-binlog_transaction_dependency_tracking = WRITESET
loose-transaction_write_set_extraction = XXHASH64
binlog_transaction_dependency_history_size = 25000 #默认
# slave
slave-parallel-type = LOGICAL_CLOCK
slave-parallel-workers = 32
核心原理
# master
master 端记录 binlog 的 last_committed 的方式变了:
基于 commit-order 的方式中,last_committed 表示同一组的事务拥有同一个 parent_commit
基于 write-set 的方式中,last_committed 的含义是保证冲突事务(修改相同记录)不能拥有同样的 last_committed 值。
当事务每次提交时,会计算修改的每个行记录的 WriteSet 值,然后查找哈希表中是否已经存在同样的 WriteSet:
1. 若无,WriteSet 插入到哈希表,写入二进制日志的 last_committed 值保持不变。意味着上一个事务跟当前事务的 last_committed 相等,那么在 slave 就可以并行执行
2. 若有,更新哈希表中对应 WriteSet 的 value 为当前的 sequence_number,并且写入到二进制日志的 last_committed 值也要更新为该 sequence_number。意味着,相同记录(冲突事务)回放时,last_committed 值必然不同,必须等待之前的那条记录回放完成后才能执行
# slave
slave 的逻辑跟以前一样没有变化,last_committed 相同的事务可以并行执行
并行复制如何备份
1. slave 的执行顺序如果不一致,如何备份呢?
1.1 对于 non-gtid 的 gap 情况,xtrabackup 拷贝的时候应该会通过某种方式记录某一个一致点,否则无法进行 change master
1.2 对于 gtid,gtid 模式本身的机制就可以解决 gap 的问题
要不要开启并行复制呢?
1. 基于 order-commit 的模式,本身并行复制已经很好了。如果并发量非常高,那么 order-commit 可以有很好的表现;如果并发量低,order-commit 体现不了并行的优势。但是大家想想,并发量低的 MySQL,根本也不需要并行复制吧
2. 基于 write-set 的模式,这是目前并发度最高的并行复制了,基本可以解决大部分场景。如果并发量高,或者新搭建的 slave 需要快速追主库,这是最好的办法
3. 单线程复制 + 安全参数双 0,这种模式同样拥有不俗的表现,一般压力均可应付
以上三种情况,是目前解决延迟的最普遍的方法,目前我用的最多的是最后一种。
后面的事务比前面的事务先执行,有什么影响?
1. slave 的 gtid 会产生 gap
2. 事务在某个时刻是不一致的,但是最终是一致的,满足最终一致性
3. 相同记录的修改,会按照顺序执行;不同记录的修改,可以产生并行,并无数据一致性风险
总结:基本没啥影响
六、如何让slave的并行复制和master的事务执行的顺序一致呢
5.7.19 之后,可以通过设置 slave_preserve_commit_order = 1
官方解释:
For multithreaded slaves, enabling this variable ensures that transactions are externalized on the slave in the same order as they appear in the slave's relay log. Setting this variable has no effect on slaves for which multithreading is not enabled. All replication threads (for all replication channels if you are using multiple replication channels) must be stopped before changing this variable. --log-bin and --log-slave-updates must be enabled on the slave. In addition --slave-parallel-type must be set to LOGICAL_CLOCK. Once a multithreaded slave has been started, transactions can begin to execute in parallel. With slave_preserve_commit_order enabled, the executing thread waits until all previous transactions are committed before committing. While the slave thread is waiting for other workers to commit their transactions it reports its status as Waiting for preceding transaction to commit.
大致实现原理就是:execution 阶段可以并行执行;binlog flush 的时候,按顺序进行;引擎层提交的时候,根据 binlog_order_commits 也是按排队顺序完成。
换句话说,如果设置了这个参数,master 是怎么并行的,slave 就怎么并行。
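上面 write-set 计算 last_committed 的"若无/若有"两个分支,可以用一段 Python 把核心逻辑抽象出来(纯示意:用 dict 充当哈希表,行的 WriteSet 值直接用行标识代替 XXHASH64 哈希,函数名为演示自拟,并非 MySQL 真实实现):

```python
# 示意 write-set 依赖追踪: 冲突事务(修改相同行)不能拿到相同的 last_committed
# 假设 writeset_history 即受 binlog_transaction_dependency_history_size 限制的哈希表
writeset_history = {}   # row_key -> 最近修改该行的事务 sequence_number

def commit_transaction(seq, rows):
    """返回该事务写入 binlog 的 last_committed(简化模型)"""
    last_committed = 0
    for row in rows:
        if row in writeset_history:
            # 有冲突: last_committed 必须排在上一个修改该行的事务之后
            last_committed = max(last_committed, writeset_history[row])
        writeset_history[row] = seq
    return last_committed

# T1 改 a 行, T2 改 b 行, T3 又改 a 行
print(commit_transaction(1, ["a"]))  # 0 -> 与 T2 的 last_committed 相同,可并行
print(commit_transaction(2, ["b"]))  # 0 -> 与 T1 可并行
print(commit_transaction(3, ["a"]))  # 1 -> 冲突,必须等 T1 回放完
```

可以看到只有真正修改相同行的 T3 被排到了 T1 之后,这正是 write-set 模式并发度高于纯 commit-order 的原因。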
一、大纲
一阶段提交
二阶段提交
三阶段提交
组提交总结
二、一阶段提交
2.1 什么是一阶段提交
先了解下含义。其实官方并没有定义啥是一阶段,这里只是我为了上下文好理解,自己定义的一阶段 commit 流程。这里的一阶段,是针对 MySQL 没有开启 binlog 为前提的,因为没有 binlog,所以 MySQL commit 的时候就相对简单很多。
解释几个概念:
execution stage 做什么事情呢
在内存修改相关数据,比如 DML 的修改
prepare stage 做什么事情呢
1. write() redo 日志
1.1 最重要的操作,记住这个时候就开始刷新 redo 了(依赖操作系统 sync 到磁盘)。很多同学在这个地方都不太清楚,以为 flush redo 在最后 commit 阶段才开始
1.2 这一步可以进行多事务的 prepare,也就意味着可以多个 redo 一起 flush、sync 到磁盘,这里是 redo 的组提交。在此说明 MySQL5.6+ 的 redo 是可以进行组提交的,之后我们讨论的重点是 binlog,就不再提及 redo 的组提交了
2. 更新 undo 状态
3. 等等
innodb commit stage 做什么事情呢
1. 更新 undo 的状态
2. fsync redo & undo(强制 sync 到磁盘)
3. 写入最终的 commit log,代表事务结束
4. 等等
由于这里面只涉及 redo 日志,所以我们称之为一阶段 commit。
2.2 为什么要有一阶段提交
一阶段提交,主要是为了 crash safe。
如果在 execution stage mysqld crash:当 MySQL 重启后,因为没有记录 redo,此事务回滚
如果在 prepare stage:
1. redo log write() 了,但是还没有 fsync() 到磁盘前,mysqld crash 了。此时:事务回滚
2. redo log write() 了,fsync() 也落盘了,mysqld crash 了。此时:事务还是回滚
如果在 commit stage:
commit log fsync 到磁盘了,此时:事务提交成功,否则事务回滚
2.3 一阶段提交的弊端
缺点也很明显:
缺点一:为什么 redo fsync 到磁盘后,还是要回滚呢?
缺点二:没有开启 binlog,性能非常高,但是 binlog 是用来搭建 slave 的,否则就是单节点,不适合生产环境
三、二阶段提交
3.1 什么是二阶段提交
继续解释几个概念:
execution stage 做什么事情呢
在内存修改相关数据,比如 DML 的修改
prepare stage 做什么事情呢
1. write() redo 日志 --最重要的操作,记住这个时候就开始刷新 redo 了(依赖操作系统 sync 到磁盘),很多同学在这个地方都不太清楚,以为 flush redo 在最后 commit 阶段才开始
2. 更新 undo 状态
3. 等等
binlog stage 做什么事情呢
1. write binlog: flush binlog 内存日志到磁盘缓存
2. fsync binlog: sync 磁盘缓存的 binlog 日志到磁盘持久化
innodb commit stage 做什么事情呢
1. 更新 undo 的状态
2. fsync redo & undo(强制 sync 到磁盘)
3. 写入最终的 commit log,代表事务结束
4. 等等
由于这里的流程中包含了 binlog 和 redo 日志刷新的协调一致性,我们称之为二阶段。
3.2 为什么要有二阶段提交
当 binlog 开启的情况下,我们需要引入另一套流程来保证 redo 和 binlog 的一致性,以及 crash safe,所以我们用这套二阶段来实现。
在 prepare 阶段,如果 mysqld crash,由于事务未写入 binlog 且 innodb 存储引擎未提交,所以将该事务回滚掉。
在 binlog 阶段:
1. binlog flush 到磁盘缓存,但是没有永久 fsync 到磁盘:如果 mysqld crash,此事务回滚
2. binlog 永久 fsync 到磁盘,但是 innodb commit log 还未提交:如果 mysqld crash,MySQL 进行 recover,从 binlog 的 xid 提取已写入 binlog 的事务进行重做并 commit,来保证 binlog 和 redo 保持一致
在 commit 阶段:如果 innodb commit log 已经提交,事务成功结束。
那为什么要保证 redo 和 binlog 的一致性呢?
物理热备的问题
多事务中,如果无法保证多事务的 redo 和 binlog 一致性,则会有如下问题。
commit 提交阶段包含的事情:1. prepare 2. write binlog & fsync binlog 3. commit
T1 (---prepare-----write 100[pos1]-------fsync 100--------------------------------------online-backup[pos3:因为热备取的是最近的提交事务位置]-------commit)
T2 (------prepare------write 200[pos2]---------fsync 200------commit)
T3 (-----------prepare-------write 300[pos3]--------fsync 300--------commit)
解析:
事务的开始顺序:T1 -> T2 -> T3
事务的提交结束顺序:T2 -> T3 -> T1
binlog 的写入顺序:T1 -> T2 -> T3
结论:T2、T3 引擎层提交结束,T1 fsync binlog 100 也已经结束,但是 T1 引擎层没有提交成功,所以这时候 online-backup 记录的 binlog 位置是 pos3(也就是 T3 提交后的位置)。如果拿着备份重新恢复 slave,由于热备是不会备份 binlog 的,所以事务 T1 会回滚掉,那么 change master to pos3 的时候,因为 T1 的位置是 pos1(在 pos3 之前),所以 T1 事务被 slave 完美地漏掉了。
多事务中,可以通过三阶段提交(下面一节讲)保证 redo 和 binlog 的一致性,则备份无问题。接下来看一个多事务中,事务日志和 binlog 日志一致的情况:
commit 提交阶段包含的事情:1. prepare 2. write binlog & fsync binlog 3. commit
T1 (---prepare-----write 100[pos1]-------fsync 100-------------commit)
T2 (------prepare------write 200[pos2]---------fsync 200----------------online-backup[pos1:因为热备取的是最近的提交事务位置]---commit)
T3 (-----------prepare-------write 300[pos3]--------fsync 300----------------------------------------------------------------------------commit)
解析:
事务的开始顺序:T1 -> T2 -> T3
事务的提交结束顺序:T1 -> T2 -> T3
binlog 的写入顺序:T1 -> T2 -> T3
ps: 以上的事务和 binlog 完全按照顺序一致运行
结论:T1 引擎层提交结束,T2 fsync binlog 200 也已经结束,但是 T2 引擎层没有提交成功,所以这时候 online-backup 记录的 binlog 位置是 pos1(也就是 T1 提交后的位置)。如果拿着备份重新恢复 slave,由于热备是不会备份 binlog 的,所以事务 T2 会回滚掉,那么 change master to pos1 的时候,因为 T1 的位置是 pos1(在 pos2 之前),所以 T2、T3 事务会被重做,最终保持一致。
总结:以上的问题,主要原因是热备份工具无法备份 binlog,导致根据备份恢复的 slave 发生回滚,而产生这样的问题最后还是要归结于引擎层的日志没有提交。所以,xtrabackup 意识到了这一点,最后多了一步 flush no_write_to_binlog engine logs,表示将 innodb 层的 redo 全部持久化到磁盘后再进行备份。再通俗地说,就是图例上的 T2 一定提交成功后,才会再继续进行拷贝备份。那么如果是这样,图例上的 T2 在恢复的时候,就不会被回滚了,所以就自然不会丢失事务啦。
主从数据不一致问题
如果 redo 和 binlog 不是一致的,那么有可能 master 执行事务的顺序和 slave 执行事务的顺序不一样。那么不一样会导致什么问题呢?在一些依赖事务顺序的场景尤其重要,比如我们看一个例子。
master 节点按照以下顺序提交 T1 和 T2 事务:
State0: x=1, y=1 --初始值
T1: { x := Read(y);
      x := x+1;
      Write(x);
      Commit; }
State1: x=2, y=1
T2: { y := Read(x);
      y := y + 1;
      Write(y);
      Commit; }
State2: x = 2, y = 3

With commit order T1 -> T2 on the master, the final results are:
State1: x = 2, y = 1
State2: x = 2, y = 3

What happens if the slave executes the transactions in the opposite order?

State0: x = 1, y = 1 -- initial values
T2: { y := Read(x);
      y := y + 1;
      Write(y);
      Commit; }
State1: x = 1, y = 2
T1: { x := Read(y);
      x := x + 1;
      Write(x);
      Commit; }
State2: x = 3, y = 2

With order T2 -> T1, the final results are:
State1: x = 1, y = 2
State2: x = 3, y = 2

Conclusion: to keep master and slave consistent, the slave must apply transactions in the same order as the master; a different order risks divergent data. Under single-threaded replication, redo/binlog consistency is thus another guarantee of master/slave consistency; multi-threaded replication additionally depends on the MTS settings. So MySQL must guarantee redo/binlog consistency, i.e. the engine-layer commit order must match the server-layer binlog fsync order, and two-phase commit is exactly that mechanism.

3.3 Drawbacks of two-phase commit

Two-phase commit guarantees redo/binlog ordering within a single transaction, but it cannot guarantee a consistent commit order across multiple transactions.

4. Three-phase commit

4.1 What is three-phase commit

A few more concepts:

Execution stage: modify the relevant data in memory, e.g. the changes made by DML.

Prepare stage:
1. write() the redo log -- the most important step; redo flushing starts here (relying on the OS to sync to disk), not in the final commit stage as many assume.
2. Update the undo state.
3. And so on.

Binlog stage:
1. write binlog -- an ordered group of binlog entries: flush the in-memory binlog to the OS disk cache.
2. fsync binlog -- an ordered group of binlog entries: sync the cached binlog to disk for durability.

InnoDB commit stage:
1. Update the undo state.
2. fsync redo & undo (force the sync to disk).
3. Write the final commit record, marking the end of the transaction -- an ordered group of commit records, committed in order.
4.
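The x/y example above is small enough to run directly. A minimal sketch: T1 computes x from y, T2 computes y from x, and the final state depends entirely on the order in which they commit.

```python
# The x/y ordering example above, executed in code (illustrative).
# T1: x = y + 1; T2: y = x + 1. Final state depends on commit order.

def t1(state):
    state["x"] = state["y"] + 1

def t2(state):
    state["y"] = state["x"] + 1

def run(order):
    state = {"x": 1, "y": 1}   # State0
    for txn in order:
        txn(state)
    return state

master = run([t1, t2])   # master commit order: T1 -> T2
slave = run([t2, t1])    # slave replays in the opposite order
```

Same initial state, same two transactions, divergent results: this is why the engine commit order and the binlog order must agree.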
And so on.

Here the whole commit process is divided into three major stages:

InnoDB, Prepare: the SQL statement has executed successfully and the corresponding redo log has been generated.
Binlog, Flush Stage (group) -- stage one: write the binlog into the binlog cache.
Binlog, Sync Stage (group) -- stage two: sync the cached binlog to disk.
InnoDB, Commit Stage (group) -- stage three: the leader commits the group's transactions in the storage engine, in order.

Important parameters:
binlog_group_commit_sync_delay=N: wait N microseconds before fsyncing the binlog.
binlog_group_commit_sync_no_delay_count=N: fsync the binlog as soon as N transactions have queued.

4.2 Why three-phase commit exists

The goal is to guarantee redo/binlog ordering across multiple transactions, and to add group commit so that both redo and binlog can be fsynced as groups (ordered sets), improving concurrency.

4.3 More on MySQL group commit

Queues, and a group-commit walkthrough:
(1) Transaction T1 is the first to enter the first stage, FLUSH; being first, it becomes the leader and waits (per the grouping algorithm).
(2) Transaction T2 enters FLUSH second; being second, it is a follower and waits to be scheduled by the leader.
(3) When the FLUSH queue's wait ends, the group enters the SYNC stage: T1 leads T2 through a single fsync, after which the group enters the commit stage and commits in order. That, in brief, is one group commit.
(4) Prepares can run in parallel, which implies the transactions have no conflicts; conflicting prepares cannot enter the same queue.
(5) Different queues can all run in parallel with one another.

5. Summary

The core idea of group commit: a single fsync() call flushes the redo logs (redo group commit) and binlogs (binlog group commit) of N transactions.
The ultimate goal of group commit is to reduce frequent IO flushes and improve concurrency; it is also the foundation of the later multi-threaded replication.
Under group commit, sync_binlog=1 & innodb_flush_log_at_trx_commit=1 no longer refer to a single transaction, but to a group of transactions and that group's binlog.
Under group commit, binlog is fsynced in order and transactions commit in order; MySQL 5.7 marks this ordering explicitly (last_committed, sequence_number).

6. Open questions

How can the slave execute the transactions of one binlog group in the same order as the master?
If a later transaction within a group executes first on the slave, how should the slave's GTID be represented?
How is in-group ordering guaranteed on the slave at all?
If the slave crashes mid-group, is the half-applied group rolled back entirely or partially?
If partially, how do we know which parts rolled back and which did not, and how does MySQL automatically repair the rolled-back part?
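The leader/follower walkthrough above can be condensed into a toy model. This is an illustrative sketch, not MySQL internals: the first transaction to queue becomes the leader, the rest follow, one fsync serves the whole group, and engine commits happen in queue order.

```python
# Toy model of binlog group commit (illustrative, not MySQL internals).
# The first transaction in the FLUSH queue is the leader; the leader does
# one fsync for the whole group, then the group commits in queue order.

class GroupCommit:
    def __init__(self):
        self.fsync_calls = 0
        self.commit_order = []

    def commit_group(self, queue):
        leader, followers = queue[0], queue[1:]
        self.fsync_calls += 1            # one fsync() for the whole group
        for txn in queue:                # engine commit, in queue order
            self.commit_order.append(txn)
        return leader, followers

gc = GroupCommit()
leader, followers = gc.commit_group(["T1", "T2", "T3"])
```

Three transactions, one fsync: that ratio is the entire performance argument for group commit, and the preserved queue order is what (last_committed, sequence_number) later exposes to multi-threaded replication.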
MySQL bottlenecks generally fall into two classes: IO-bound and CPU-bound. CPU problems are comparatively rare; we recently hit a fairly serious one, and that deserves its own article later. Today's topic: in IO-bound workloads, how to quickly pin down who is consuming the IO.

Background

Environment
1. MySQL 5.7+ (older versions are out of scope here; like companies still running SAS disks, they cost more effort than they are worth; MySQL 5.7+ should be the standard)
2. InnoDB storage engine
3. CentOS 6

In practice

Which IO monitoring tools come to mind? iostat, dstat, iotop. All of them are excellent, and iotop in particular points straight at the process consuming the most IO resources.

(screenshot omitted) From a picture like that, all you learn is that some MySQL IO thread is using a lot of disk. Then what? Then you go digging inside MySQL: an experienced DBA checks the slow log, or hunts through the processlist for the offending SQL. In practice that usually means staring blankly at a wall of MySQL queries and sifting through a pile of slow-log entries; like fishing a needle out of the sea, locating the problem is tedious and inefficient.

If you run MySQL 5.7+, however, you own a sharper tool (yes, I keep saying it) that locates the problem quickly and precisely.

How to quickly locate where the IO bottleneck is: iotop + performance_schema.threads

dba:lc> select * from performance_schema.threads where thread_os_id=37012\G
*************************** 1. row ***************************
THREAD_ID: 96
NAME: thread/sql/one_connection
TYPE: FOREGROUND
PROCESSLIST_ID: 15
PROCESSLIST_USER: dba
PROCESSLIST_HOST: NULL
PROCESSLIST_DB: sbtest
PROCESSLIST_COMMAND: Query
PROCESSLIST_TIME: 0
PROCESSLIST_STATE: query end
PROCESSLIST_INFO: INSERT INTO sbtest1(k, c, pad) VALUES(25079106, '33858784348-81663287461-16031064329-06006952037-79426243027-69964324491-90950423034-40185804987-62166137368-06259615216', '47186118229-42754696460-81034599900-41836403072-66805611739'),(24907169, '77074724245-16833049423-38868029911-54850236074-63700733526-39699866447-52646750572-85552352492-59476301007-32196580154', '79013412600-99031855741-69698796712-65630963686-19653514942'),(24896311, '28403978193-66350947863-03931166713-97714847962-65299790981-39948912629-14070597101-63277652140-34421148430-61801121402', '05239379274-22840441238-37771744512-92347741972-52847679847'),(18489383, '89292717216-01584483614-67433536730-45584233994-29817613740-77179131661-10692787267-83942773303-14971155500-36206705010', '55201342831-85536327239-84383935287-06948377235-96437333726'),(24790463, '99362943588-41160434740-62783664419-16002619743-04761662097-94273988379-52564232648-19738707042-79143532768-89687113917', '09717575620-89781830996-88443720661-19001024583-14971953687'),(2
PARENT_THREAD_ID: NULL
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: Socket
THREAD_OS_ID: 37012
1 row in set (0.00 sec)

See? The SQL statement eating the resources appears right in front of you, just like that.

And what is shown above is only the tip of the iceberg; many more uses are waiting to be unlocked. The problem located here is also a simple one; trickier IO problems, such as oversized binlog writes, excessive binlog scanning, blocked replication threads, heavy IO from temporary tables and so on, can all be examined with the same tool.

Summary

MySQL 5.7 quietly ships a great number of practical tools and new features, waiting for DBAs to dig into and explore. Turn those seemingly plain features into secret weapons and you can become one of those shining top-5 MySQLers. To do good work, one must first sharpen one's tools.
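The iotop-to-SQL hop is mechanical enough to script. A sketch with a hypothetical helper (`thread_lookup_sql` is made up for illustration): take the TID column that iotop shows and build the `performance_schema.threads` lookup used above; executing it would require a live MySQL connection, which is out of scope here.

```python
# Hypothetical helper: given the TID shown by iotop, build the
# performance_schema.threads lookup demonstrated above. Produces query
# text only; running it needs a MySQL connection.

def thread_lookup_sql(os_tid):
    if not isinstance(os_tid, int) or os_tid <= 0:
        raise ValueError("expected a positive OS thread id from iotop")
    return ("SELECT processlist_id, processlist_user, processlist_db, "
            "processlist_info FROM performance_schema.threads "
            f"WHERE thread_os_id = {os_tid}")

sql = thread_lookup_sql(37012)
```

The key fact making this work is that `THREAD_OS_ID` in `performance_schema.threads` is the same identifier iotop and `ps -eL` report, so no guesswork is needed to join the OS view with the MySQL view.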
Long transactions

Background

Have you ever run into these situations?
- A SQL statement runs extremely slowly, so the whole transaction stays in the running state.
- A session's SQL has finished executing, but it never commits and sits in the sleep state.
- A session sits in lock wait and never finishes.

Most of these are caused by long (large) transactions. Let's take a proper look at the topic.

Environment
1. MySQL 5.7.22 (older versions are out of scope here; like companies still running SAS disks, they cost more effort than they are worth; MySQL 5.7+ should be the standard)
2. InnoDB storage engine
3. CentOS 6

Characteristics of long transactions
1. A very long time from transaction start to finish (we use 10s as the example here)
2. Transactions still executing
3. Transactions not yet committed

In practice

How to monitor transactions that are currently executing:
1. select * from sys.processlist
2. show processlist
3. select * from information_schema.processlist
4. select * from sys.session
5. select * from information_schema.innodb_trx
6. select * from performance_schema.events_statements_current

How to monitor transactions that have not committed:
select * from information_schema.innodb_trx

How to combine the two:

select trx_id,INNODB_TRX.trx_state,INNODB_TRX.trx_started,se.conn_id as processlist_id,trx_lock_memory_bytes,se.user,se.command,se.state,se.current_statement,se.last_statement from information_schema.INNODB_TRX,sys.session as se where trx_mysql_thread_id=conn_id;
+---------+-----------+---------------------+----------------+-----------------------+------+---------+----------+-----------------------------------+-----------------------------------+
| trx_id  | trx_state | trx_started         | processlist_id | trx_lock_memory_bytes | user | command | state    | current_statement                 | last_statement                    |
+---------+-----------+---------------------+----------------+-----------------------+------+---------+----------+-----------------------------------+-----------------------------------+
| 1592104 | LOCK WAIT | 2018-06-26 11:51:17 |              3 |                  1136 | NULL | Query   | updating | update lc_1 set id=4 where id = 1 | NULL                              |
| 1592100 | RUNNING   | 2018-06-26 11:49:08 |              2 |                  1136 | NULL | Sleep   | NULL     | NULL                              | update lc_1 set id=3 where id = 1 |
+---------+-----------+---------------------+----------------+-----------------------+------+---------+----------+-----------------------------------+-----------------------------------+

As you can see, this shows at a glance which transactions are in the running state and which are in lock wait.
So what should you do when you hit this? You would, of course, chase trx_started for clues, but in a production environment that is complicated, busy work. No matter; we have another tool at hand.

How to quickly resolve a lock wait

dba:sys> select * from sys.innodb_lock_waits\G
*************************** 1. row ***************************
wait_started: 2018-06-26 11:49:58
wait_age: 00:00:03
wait_age_secs: 3
locked_table: `lc`.`lc_1`
locked_index: GEN_CLUST_INDEX
locked_type: RECORD
waiting_trx_id: 1592102
waiting_trx_started: 2018-06-26 11:49:58
waiting_trx_age: 00:00:03
waiting_trx_rows_locked: 2
waiting_trx_rows_modified: 0
waiting_pid: 3
waiting_query: update lc_1 set id=4 where id = 1
waiting_lock_id: 1592102:32:3:4
waiting_lock_mode: X
blocking_trx_id: 1592100
blocking_pid: 2
blocking_query: NULL
blocking_lock_id: 1592100:32:3:4
blocking_lock_mode: X
blocking_trx_started: 2018-06-26 11:49:08
blocking_trx_age: 00:00:53
blocking_trx_rows_locked: 1
blocking_trx_rows_modified: 1
sql_kill_blocking_query: KILL QUERY 2
sql_kill_blocking_connection: KILL 2

MySQL, ever considerate, even generates the KILL statements for you; copy, paste, done.

The careful reader will notice that innodb_lock_waits shows only the blocked statement, not the query holding the lock. Why is that? No suspense: the lock-holding transaction may contain several queries, or may have finished executing without committing, so there is no way to list all of its queries. What then? If you are lucky, the current_statement / last_statement columns in the earlier example give you the answer. And put another way: even without finding that query, nothing stops you from resolving the problem at hand.

Summary

MySQL 5.7 quietly ships a great number of practical tools and new features, waiting for DBAs to dig into and explore. Turn those seemingly plain features into secret weapons and you can become one of those shining top-5 MySQLers. To do good work, one must first sharpen one's tools.
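What `sys.innodb_lock_waits` pre-computes in its `sql_kill_blocking_*` columns is trivial to reproduce. A sketch with a hypothetical helper (`kill_statement` is made up): given a lock-wait row, emit the KILL statement for the blocking session.

```python
# Hypothetical helper mirroring the sql_kill_blocking_query /
# sql_kill_blocking_connection columns of sys.innodb_lock_waits:
# build the KILL statement from the blocking session's processlist id.

def kill_statement(lock_wait_row, query_only=False):
    pid = lock_wait_row["blocking_pid"]
    return f"KILL QUERY {pid}" if query_only else f"KILL {pid}"

# The row from the example above, reduced to the relevant fields
row = {"waiting_pid": 3, "blocking_pid": 2,
       "waiting_query": "update lc_1 set id=4 where id = 1"}
stmt = kill_statement(row)
```

`KILL QUERY` aborts only the statement the blocking session is running (useless here, since its query is NULL), while a plain `KILL` drops the connection and rolls back its open transaction, which is what actually releases the lock in this case.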
binlog_gtid_simple_recovery: what is it?

Official explanation

This variable controls how binary log files are iterated during the search for GTIDs when MySQL starts or restarts. In MySQL version 5.7.5, this variable was added as simplified_binlog_gtid_recovery and in MySQL version 5.7.6 it was renamed to binlog_gtid_simple_recovery.

When binlog_gtid_simple_recovery=FALSE, the method of iterating the binary log files is:

To initialize gtid_executed, binary log files are iterated from the newest file, stopping at the first binary log that has any Previous_gtids_log_event. All GTIDs from Previous_gtids_log_event and Gtid_log_events are read from this binary log file. This GTID set is stored internally and called gtids_in_binlog. The value of gtid_executed is computed as the union of this set and the GTIDs stored in the mysql.gtid_executed table. This process could take a long time if you had a large number of binary log files without GTID events, for example created when gtid_mode=OFF.

To initialize gtid_purged, binary log files are iterated from the oldest to the newest, stopping at the first binary log that contains either a Previous_gtids_log_event that is nonempty (that has at least one GTID) or that has at least one Gtid_log_event. From this binary log it reads Previous_gtids_log_event. This GTID set is subtracted from gtids_in_binlog and the result stored in the internal variable gtids_in_binlog_not_purged. The value of gtid_purged is initialized to the value of gtid_executed, minus gtids_in_binlog_not_purged.

When binlog_gtid_simple_recovery=TRUE, which is the default in MySQL 5.7.7 and later, the server iterates only the oldest and the newest binary log files and the values of gtid_purged and gtid_executed are computed based only on Previous_gtids_log_event or Gtid_log_event found in these files. This ensures only two binary log files are iterated during server restart or when binary logs are being purged.
Official caveats

Note
If this option is enabled, gtid_executed and gtid_purged may be initialized incorrectly in the following situations:
- The newest binary log was generated by MySQL 5.7.5 or older, and gtid_mode was ON for some binary logs but OFF for the newest binary log.
- A SET GTID_PURGED statement was issued on a MySQL version prior to 5.7.7, and the binary log that was active at the time of the SET GTID_PURGED has not yet been purged.
If an incorrect GTID set is computed in either situation, it will remain incorrect even if the server is later restarted, regardless of the value of this option.

My understanding and summary

1. This variable controls the algorithm by which binlog files are iterated when MySQL searches for GTIDs at startup or restart.
2. When binlog_gtid_simple_recovery=FALSE:
   To initialize gtid_executed: iterate from newest_binlog toward oldest_binlog, and stop at the first file that contains a Previous_gtids_log_event.
   To initialize gtid_purged: iterate from oldest_binlog toward newest_binlog, and stop at the first file that contains a non-empty Previous_gtids_log_event, or at least one Gtid_log_event.
3. When binlog_gtid_simple_recovery=TRUE:
   To initialize gtid_executed: read only the newest_binlog.
   To initialize gtid_purged: read only the oldest_binlog.
4.
When binlog_gtid_simple_recovery=TRUE is set on a MySQL version below 5.7.7, the computed GTID sets may be wrong; see the official documentation above for the details.

Given this algorithm, a large number of non-GTID binlogs should hurt performance badly. Let's test exactly that scenario.

Test cases

The focus is the mixed non-GTID/GTID case: "This process could take a long time if you had a large number of binary log files without GTID events, for example created when gtid_mode=OFF." We also test how the gtid_purged value is recomputed when binlogs are deleted.

Environment
- MySQL 5.7.13
- binlog_gtid_simple_recovery = false => this is the point
- GTID upgrade: non-GTID -> GTID, then purge binary logs => this is also the point

binlog
-rw-r----- 1 mysql mysql 177 May 3 11:23 tjtx-126-164.000001
-rw-r----- 1 mysql mysql 1074589597 May 3 11:29 tjtx-126-164.000002
-rw-r----- 1 mysql mysql 1074589060 May 3 11:30 tjtx-126-164.000003
-rw-r----- 1 mysql mysql 1074589063 May 3 11:31 tjtx-126-164.000004
-rw-r----- 1 mysql mysql 1074589065 May 3 11:32 tjtx-126-164.000005
-rw-r----- 1 mysql mysql 1074589051 May 3 11:33 tjtx-126-164.000006
-rw-r----- 1 mysql mysql 1074589045 May 3 11:33 tjtx-126-164.000007
-rw-r----- 1 mysql mysql 1074589047 May 3 11:34 tjtx-126-164.000008
-rw-r----- 1 mysql mysql 1074589050 May 3 11:35 tjtx-126-164.000009
-rw-r----- 1 mysql mysql 1074589052 May 3 11:36 tjtx-126-164.000010
-rw-r----- 1 mysql mysql 1074589062 May 3 11:37 tjtx-126-164.000011
-rw-r----- 1 mysql mysql 1074589068 May 3 11:37 tjtx-126-164.000012
-rw-r----- 1 mysql mysql 1074589045 May 3 11:38 tjtx-126-164.000013
-rw-r----- 1 mysql mysql 1074589038 May 3 11:39 tjtx-126-164.000014
-rw-r----- 1 mysql mysql 1074589055 May 3 11:40 tjtx-126-164.000015
-rw-r----- 1 mysql mysql 1074589050 May 3 11:41 tjtx-126-164.000016
-rw-r----- 1 mysql mysql 1074589063 May 3 11:41 tjtx-126-164.000017
-rw-r----- 1 mysql mysql 1074589055 May 3 11:42 tjtx-126-164.000018
-rw-r----- 1 mysql mysql 1074589048 May 3 11:43 tjtx-126-164.000019
-rw-r----- 1 mysql mysql 1074515950 May 3 11:45 tjtx-126-164.000020
-rw-r----- 1 mysql mysql 1074589069 May 3 11:46 tjtx-126-164.000021
-rw-r----- 1 mysql mysql 1074589051 May 3 11:47 tjtx-126-164.000022
-rw-r----- 1 mysql mysql 1074589063 May 3
11:47 tjtx-126-164.000023
-rw-r----- 1 mysql mysql 1074589051 May 3 11:48 tjtx-126-164.000024
-rw-r----- 1 mysql mysql 321034919 May 3 13:53 tjtx-126-164.000025
-rw-r----- 1 mysql mysql 204 May 3 13:53 tjtx-126-164.000026
-rw-r----- 1 mysql mysql 204 May 3 13:53 tjtx-126-164.000027
-rw-r----- 1 mysql mysql 1092 May 3 13:55 tjtx-126-164.000028
-rw-r----- 1 mysql mysql 194 May 3 13:55 tjtx-126-164.000029

tjtx-126-164.000001 ~ tjtx-126-164.000028: Previous-GTIDs # [empty]
tjtx-126-164.000029: #180503 13:55:05 server id 1261261646 end_log_pos 194 CRC32 0xb77b80b7 Previous-GTIDs # 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3

Test begins

<master> dba:lc> purge binary logs to 'tjtx-126-164.000005';
Query OK, 0 rows affected (1 min 14.41 sec) -- the statement took well over a minute
dba:lc> insert into t select 300; -- the transaction on the master stalls
Query OK, 1 row affected (1 min 9.42 sec)
Records: 1 Duplicates: 0 Warnings: 0

strace trace: the server walks the binlogs from start to finish, once again confirming the algorithm described above.

63639 14:07:50.394945 read(55, "05488963387-39206410793-66801786"..., 8192) = 8192 <0.000011>
63639 14:07:50.395005 read(55, "-66498258471-55447794725-7620591"..., 8192) = 8192 <0.000010>
63639 14:07:50.395065 read(55, "7;10709822844-35491948145-283531"..., 8192) = 8192 <0.000012>
63639 14:07:50.395129 read(55, "17-05336385032;74931753923-32217"..., 8192) = 8192 <0.000011>
63639 14:07:50.395191 read(55, "053-81565945575-96536403914;8342"..., 8192) = 8192 <0.000011>
63639 14:07:50.395250 read(55, "7139-77543559499-90858749831-907"..., 8192) = 8192 <0.000010>
63639 14:07:50.395310 read(55, "07981-10898305107-65423962210-93"..., 8192) = 8192 <0.000011>
63639 14:07:50.395371 read(55, "009985-68038808770-60998915978-7"..., 8192) = 8192 <0.000010>
63639 14:07:50.395430 read(55, "3665266-98504623794-11513728759-"..., 8192) = 8192 <0.000011>
63639 14:07:50.395491 read(55, "54495717-21332716078-74081433759"..., 8192) = 8192 <0.000010>
63639 14:07:50.395550 read(55, "873221923-40252274459-8633934300"..., 8192) = 8192 <0.000010>
63639 14:07:50.395610 read(55,
"2609904861-91693621073-471178324"..., 8192) = 8192 <0.000010> 63639 14:07:50.125372 open("/data/mysql.bin/tjtx-126-164.~rec~", O_RDWR|O_CREAT, 0640) = 53 <0.000039> 63639 14:07:50.125769 open("/data/mysql.bin/tjtx-126-164.index_crash_safe", O_RDWR|O_CREAT, 0640) = 55 <0.000031> 63639 14:07:50.126150 open("/data/mysql.bin/tjtx-126-164.index", O_RDWR|O_CREAT, 0640) = 3 <0.000013> 。。。。。。。。。。。。。。。。。 63639 14:07:50.126554 open("/data/mysql.bin/tjtx-126-164.000005", O_RDONLY) = 55 <0.000012> 63639 14:07:53.857069 open("/data/mysql.bin/tjtx-126-164.000006", O_RDONLY) = 55 <0.000018> 63639 14:07:57.516826 open("/data/mysql.bin/tjtx-126-164.000007", O_RDONLY) = 55 <0.000016> 63639 14:08:01.169413 open("/data/mysql.bin/tjtx-126-164.000008", O_RDONLY) = 55 <0.000018> 63639 14:08:04.815608 open("/data/mysql.bin/tjtx-126-164.000009", O_RDONLY) = 55 <0.000015> 63639 14:08:08.473808 open("/data/mysql.bin/tjtx-126-164.000010", O_RDONLY) = 55 <0.000015> 63639 14:08:12.449964 open("/data/mysql.bin/tjtx-126-164.000011", O_RDONLY) = 55 <0.000018> 63639 14:08:16.251054 open("/data/mysql.bin/tjtx-126-164.000012", O_RDONLY) = 55 <0.000019> 63639 14:08:19.686003 open("/data/mysql.bin/tjtx-126-164.000013", O_RDONLY) = 55 <0.000015> 63639 14:08:23.341291 open("/data/mysql.bin/tjtx-126-164.000014", O_RDONLY) = 55 <0.000017> 63639 14:08:27.014210 open("/data/mysql.bin/tjtx-126-164.000015", O_RDONLY) = 55 <0.000016> 63639 14:08:30.625242 open("/data/mysql.bin/tjtx-126-164.000016", O_RDONLY) = 55 <0.000016> 63639 14:08:34.192385 open("/data/mysql.bin/tjtx-126-164.000017", O_RDONLY) = 55 <0.000015> 63639 14:08:37.862750 open("/data/mysql.bin/tjtx-126-164.000018", O_RDONLY) = 55 <0.000016> 63639 14:08:41.533869 open("/data/mysql.bin/tjtx-126-164.000019", O_RDONLY) = 55 <0.000016> 63639 14:08:45.202949 open("/data/mysql.bin/tjtx-126-164.000020", O_RDONLY) = 55 <0.000017> 63639 14:08:48.792088 open("/data/mysql.bin/tjtx-126-164.000021", O_RDONLY) = 55 <0.000017> 63639 14:08:52.266700 
open("/data/mysql.bin/tjtx-126-164.000022", O_RDONLY) = 55 <0.000017>
63639 14:08:55.932879 open("/data/mysql.bin/tjtx-126-164.000023", O_RDONLY) = 55 <0.000017>
63639 14:08:59.594761 open("/data/mysql.bin/tjtx-126-164.000024", O_RDONLY) = 55 <0.000015>
63639 14:09:03.256451 open("/data/mysql.bin/tjtx-126-164.000025", O_RDONLY) = 55 <0.000015>
63639 14:09:04.349108 open("/data/mysql.bin/tjtx-126-164.000026", O_RDONLY) = 55 <0.000014>
63639 14:09:04.349280 open("/data/mysql.bin/tjtx-126-164.000027", O_RDONLY) = 55 <0.000010>
63639 14:09:04.349434 open("/data/mysql.bin/tjtx-126-164.000028", O_RDONLY) = 55 <0.000010>

Check which file is behind descriptor 55:
[root@tjtx-126-164 tmp]# ll /proc/62382/fd | grep 55
lr-x------ 1 root root 64 May 3 14:17 55 -> /data/mysql.bin/tjtx-126-164.000009
[root@tjtx-126-164 tmp]# ll /proc/62382/fd | grep 55
lr-x------ 1 root root 64 May 3 14:17 55 -> /data/mysql.bin/tjtx-126-164.000010
[root@tjtx-126-164 tmp]# ll /proc/62382/fd | grep 55
lr-x------ 1 root root 64 May 3 14:17 55 -> /data/mysql.bin/tjtx-126-164.000010
[root@tjtx-126-164 tmp]# ll /proc/62382/fd | grep 55
lr-x------ 1 root root 64 May 3 14:17 55 -> /data/mysql.bin/tjtx-126-164.000010

Test 2

Environment
- MySQL 5.7.13
- binlog_gtid_simple_recovery = true
- non-GTID -> GTID, then purge binary logs

Test begins

dba:(none)> purge binary logs to 'tjtx-126-164.000007';
Query OK, 0 rows affected (4.06 sec) -- very fast
dba:(none)> show global variables like '%gtid%';
+----------------------------------+------------------------------------------+
| Variable_name                    | Value                                    |
+----------------------------------+------------------------------------------+
| binlog_gtid_simple_recovery      | ON                                       |
| enforce_gtid_consistency         | ON                                       |
| gtid_executed                    | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-7 |
| gtid_executed_compression_period | 1000                                     |
| gtid_mode                        | ON                                       |
| gtid_owned                       |                                          |
| gtid_purged                      |                                          |
| session_track_gtids              | OFF                                      |
+----------------------------------+------------------------------------------+
8 rows in set (0.00 sec)

strace analysis: only the oldest binlog file is read

115529
14:31:31.096480 open("/data/mysql.bin/tjtx-126-164.~rec~", O_RDWR|O_CREAT, 0640) = 51 <0.000031>
115529 14:31:31.096777 open("/data/mysql.bin/tjtx-126-164.index_crash_safe", O_RDWR|O_CREAT, 0640) = 52 <0.000029>
115529 14:31:31.097111 open("/data/mysql.bin/tjtx-126-164.index", O_RDWR|O_CREAT, 0640) = 3 <0.000023>
115529 14:31:31.097502 open("/data/mysql.bin/tjtx-126-164.000007", O_RDONLY) = 52 <0.000012>

dba:(none)> purge binary logs to 'tjtx-126-164.000029';
Query OK, 0 rows affected (0.00 sec)
dba:(none)> show global variables like '%gtid%';
+----------------------------------+------------------------------------------+
| Variable_name                    | Value                                    |
+----------------------------------+------------------------------------------+
| binlog_gtid_simple_recovery      | ON                                       |
| enforce_gtid_consistency         | ON                                       |
| gtid_executed                    | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-7 |
| gtid_executed_compression_period | 1000                                     |
| gtid_mode                        | ON                                       |
| gtid_owned                       |                                          |
| gtid_purged                      | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3 | -- gtid_purged stays empty until binlog 000029 itself is read
| session_track_gtids              | OFF                                      |
+----------------------------------+------------------------------------------+
8 rows in set (0.00 sec)

Algorithm summary

1. On MySQL restart:
When binlog_gtid_simple_recovery=FALSE:
To initialize gtid_executed: iterate from newest_binlog toward oldest_binlog, and stop at the first file containing a Previous_gtids_log_event.
To initialize gtid_purged: iterate from oldest_binlog toward newest_binlog, and stop at the first file containing a non-empty Previous_gtids_log_event or at least one Gtid_log_event.
When binlog_gtid_simple_recovery=TRUE:
To initialize gtid_executed: read only the newest_binlog; empty if nothing is found there.
To initialize gtid_purged: read only the oldest_binlog; empty if nothing is found there.
2.
On binlog rotation (expire_logs_days, purge binary logs to '...', and so on):
When binlog_gtid_simple_recovery=FALSE:
To initialize gtid_purged: iterate from oldest_binlog toward newest_binlog, and stop at the first file containing a non-empty Previous_gtids_log_event or at least one Gtid_log_event.
When binlog_gtid_simple_recovery=TRUE:
To initialize gtid_purged: read only the oldest_binlog; empty if nothing is found there.

Points to note
- During an online GTID upgrade, binlog_gtid_simple_recovery = TRUE must be enabled; otherwise binlog deletion can block the server.
- During an online GTID upgrade, back up the non-GTID binlogs and then delete them, to avoid baffling errors later.
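The cost difference measured above (74 seconds vs 4 seconds for the same PURGE) follows directly from how many files each algorithm must open. A small simulation of the iteration rules summarized above, under the stated assumptions (each binlog modeled as a name, its Previous-GTIDs content, and whether it holds any Gtid_log_event; this counts file opens, it does not parse real binlogs):

```python
# Simulation of the two iteration algorithms summarized above (illustrative).
# Each binlog: (name, previous_gtids, has_gtid_event), where previous_gtids
# is None (event absent), "" (event present but empty, as in non-GTID files
# written after the upgrade path), or a GTID set string.

def files_scanned(binlogs, simple_recovery):
    if simple_recovery:
        return 2 if len(binlogs) > 1 else 1   # oldest + newest only
    scanned = 0
    # gtid_executed: newest -> oldest, stop at first Previous_gtids event
    for _, prev, _ in reversed(binlogs):
        scanned += 1
        if prev is not None:
            break
    # gtid_purged: oldest -> newest, stop at a NON-EMPTY Previous_gtids
    # event or at a file holding a Gtid_log_event
    for _, prev, has_gtid in binlogs:
        scanned += 1
        if (prev not in (None, "")) or has_gtid:
            break
    return scanned

# The test scenario: 28 non-GTID files (empty Previous-GTIDs) + one GTID file
old_files = [(f"bin.{i:06d}", "", False) for i in range(1, 29)]
binlogs = old_files + [("bin.000029", "uuid:1-3", True)]

slow = files_scanned(binlogs, simple_recovery=False)  # walks every file
fast = files_scanned(binlogs, simple_recovery=True)
```

With FALSE, the gtid_executed pass stops after one file (the newest has a Previous-GTIDs event), but the gtid_purged pass must walk all 29 files because the 28 pre-upgrade files carry only empty Previous-GTIDs events; with TRUE, two opens suffice regardless of history. That asymmetry is exactly what the strace traces showed.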
Search the web for these keywords and most results compare delete, truncate, and drop. Today's content, however, is a real case from our production environment. Internet companies generally partition large tables or shard them physically; what follows concerns problems encountered while dropping such shard tables.

Case 1

Environment
MySQL 5.1.54, InnoDB, 128G memory, innodb_adaptive_hash_index=1, CentOS 6.6

Reproduction
Business context: disk usage keeps growing, so the DBA periodically drops tables the business no longer needs, e.g. automatically deleting data older than one year. When the DROP TABLE ran, the instance's slow-query rate jumped from 5/s to 3000/s and latencies from 10ms to 200ms (the table was about 10G). The slow SQL came in many varieties, and it was not the dropped table that slowed down but queries against other tables. How do I know? Our slow-log monitoring system told us. So the question: why would DROP TABLE cause this? We did not know at the time, but we changed approach: drop table = truncate table + drop table. Magically, the slow queries disappeared.

Case 2

Environment
MySQL 5.6.27, InnoDB, 128G memory, innodb_adaptive_hash_index=1, CentOS 6.6

Reproduction
Same business context. Based on the 5.1 experience, our script kept the same recipe: drop table = truncate table + drop table. This time the slow queries appeared during TRUNCATE TABLE, with exactly the same symptoms as DROP TABLE had shown on 5.1. We then switched back to a direct DROP TABLE, and, magically, the slow queries disappeared again.

Case 3

The last, and bloodiest, case.

Environment
MySQL 5.6.27 / MySQL 5.7.21, InnoDB, 128G memory, innodb_adaptive_hash_index=1, CentOS 6.6
M(5.6) -> S1(5.6), S2(5.6), S3(5.7)

Problem
* Business context: one master, three slaves; the slaves run two versions, two on 5.6.27 and one on 5.7.21. TRUNCATE TABLE xx was executed on the master; a while after it finished, all three slaves started lagging, and after lagging for some time two of them crashed and auto-restarted. Quite the spectacle.

* Error log
mysql tables in use 1, locked 1
0 lock struct(s), heap size 1136, 0 row lock(s)
MySQL thread id 24952, OS thread handle 140332353423104, query id 70663631 System lock
truncate table broker_lifecycle_status
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 88024435
--Thread 140329857865472 has waited at buf0flu.cc line 1209 for 0.00 seconds the semaphore:
SX-lock on RW-latch at 0x7faf43804cb8 created in file buf0buf.cc line 1460
a writer (thread id 140332353423104) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time read locked in file row0sel.cc line 3751
Last time write locked in file /export/home/pb2/build/sb_0-26514852-1514433850.9/mysql-5.7.21/storage/innobase/fsp/fsp0fsp.cc line 656
--Thread 140332352624384 has waited at ha_innodb.cc line 5582 for 240.00 seconds the semaphore:
Mutex at 0x8af31638, Mutex DICT_SYS created
dict0dict.cc:1172, lock var 1
--Thread 140332512077568 has waited at srv0srv.cc line 1982 for 270.00 seconds the semaphore:
X-lock on RW-latch at 0x8af31598 created in file dict0dict.cc line 1183
a writer (thread id 140332353423104) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /export/home/pb2/build/sb_0-26514852-1514433850.9/mysql-5.7.21/storage/innobase/row/row0trunc.cc line 1835
OS WAIT ARRAY INFO: signal count 232035265
RW-shared spins 0, rounds 279022143, OS waits 12780361
RW-excl spins 0, rounds 1878235251, OS waits 18129012
RW-sx spins 129404128, rounds 2720581091, OS waits 36295215
Spin rounds per wait: 279022143.00 RW-shared, 1878235251.00 RW-excl, 21.02 RW-sx
InnoDB: ###### Diagnostic info printed to the standard error stream
2018-04-17T11:59:30.590002+08:00 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2018-04-17 11:59:30 0x7fa1b6c88700 InnoDB: Assertion failure in thread 140332533057280 in file ut0ut.cc line 942
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.

* strace output
91079 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
91071 futex(0x8af31764, FUTEX_WAIT_PRIVATE, 3508, NULL <unfinished ...>
90833 futex(0x1e73a84, FUTEX_WAIT_PRIVATE, 295237, NULL <unfinished ...>
44265 restart_syscall(<... resuming interrupted call ...> <unfinished ...>
14421 futex(0x1e73a84, FUTEX_WAIT_PRIVATE, 295236, NULL <unfinished ...>
25300 restart_syscall(<...
resuming interrupted call ...> <unfinished ...> 192855 futex(0x8af31764, FUTEX_WAIT_PRIVATE, 3507, NULL <unfinished ...> 60765 restart_syscall(<... resuming interrupted call ...> <unfinished ...> 51043 restart_syscall(<... resuming interrupted call ...> <unfinished ...> 36890 futex(0x1e41a44, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> 36889 rt_sigtimedwait([HUP QUIT TERM], NULL, NULL, 8 <unfinished ...> 36886 futex(0x3b99394, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> 36885 restart_syscall(<... resuming interrupted call ...> <unfinished ...> 36884 restart_syscall(<... resuming interrupted call ...> <unfinished ...> 36883 futex(0x3b99274, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> 36882 futex(0x3b990c4, FUTEX_WAIT_PRIVATE, 170512183, NULL <unfinished ...> 36881 futex(0x3b99034, FUTEX_WAIT_PRIVATE, 230090551, NULL <unfinished ...> 36880 futex(0x3b98fa4, FUTEX_WAIT_PRIVATE, 302280849, NULL <unfinished ...> 36879 futex(0x3b98f14, FUTEX_WAIT_PRIVATE, 72214487, NULL <unfinished ...> 36878 futex(0x8af0b5c4, FUTEX_WAIT_PRIVATE, 23749, NULL <unfinished ...> 36877 restart_syscall(<... resuming interrupted call ...> <unfinished ...> 36876 restart_syscall(<... resuming interrupted call ...> <unfinished ...> 36875 restart_syscall(<... 
resuming interrupted call ...> <unfinished ...>
36873 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270083, NULL <unfinished ...>
36872 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270082, NULL <unfinished ...>
36871 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270080, NULL <unfinished ...>
36870 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270081, NULL <unfinished ...>
36869 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270077, NULL <unfinished ...>
36868 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270079, NULL <unfinished ...>
36867 futex(0x8af01f64, FUTEX_WAIT_PRIVATE, 12270078, NULL <unfinished ...>
36866 futex(0x33dd8ae4, FUTEX_WAIT_PRIVATE, 6225, NULL <unfinished ...>
36865 io_getevents(140422104584192, 1, 256, <unfinished ...>
36864 io_getevents(140422104596480, 1, 256, <unfinished ...>
36863 io_getevents(140422104608768, 1, 256, <unfinished ...>
36862 io_getevents(140422104621056, 1, 256, <unfinished ...>
36861 io_getevents(140422104633344, 1, 256, <unfinished ...>
36860 io_getevents(140422104645632, 1, 256, <unfinished ...>

Analysis
1. Clearly, the truncate was replayed on the slave and got stuck there.
2. All three slaves reported the same error and hung for a long time, with "ACTIVE 252 sec truncating table" climbing steadily.
3. Two of the slaves (one 5.6, one 5.7) restarted after "Semaphore wait has lasted > 600 seconds".
4. On the remaining slave, the semaphore wait ended just as it approached 600s. We therefore suspect a MySQL bug (fixed in MySQL 8.0): during a TRUNCATE the purge thread processing is too slow.
5. Similar failures have been reported elsewhere: "Truncation was executed on master which executed it as expected without failure but immediate slave crashed afterwards, heavy read activity is mostly happening on slave" https://bugs.mysql.com/bug.php?id=68184 https://bugs.launchpad.net/percona-server/+bug/1633869
6.
Some reports blame innodb_purge_threads=4, but our MySQL 5.6 had it set to 1, so that explanation does not hold here. https://blog.pythian.com/mysql-crashes-ddl-statement-lesson-purge-threads/

Summary

These cases thoroughly confused us: different versions give TRUNCATE and DROP quite different behavior for their users. A careful read of the official documentation later revealed the catch hiding inside; interested readers can dig further. https://dev.mysql.com/doc/refman/5.7/en/truncate-table.html

"On a system with a large InnoDB buffer pool and innodb_adaptive_hash_index enabled, TRUNCATE TABLE operations may cause a temporary drop in system performance due to an LRU scan that occurs when removing an InnoDB table's adaptive hash index entries. The problem was addressed for DROP TABLE in MySQL 5.5.23 (Bug #13704145, Bug #64284) but remains a known issue for TRUNCATE TABLE (Bug #68184)."

Best practice
- Below MySQL 5.6: use truncate table + drop table instead of a bare drop table.
- MySQL 5.6 and above: use drop table directly.
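The best practice above is simple enough to encode in an automatic cleanup script. A sketch (the helper name is made up for illustration): choose the statement sequence based on the server version string.

```python
# Sketch of the version-dependent table-removal recipe derived above
# (illustrative helper, not an official API). Returns the statements a
# cleanup script should run, in order.

def drop_strategy(version):
    major, minor = (int(x) for x in version.split(".")[:2])
    if (major, minor) >= (5, 6):
        return ["DROP TABLE"]                    # 5.6+: drop directly
    return ["TRUNCATE TABLE", "DROP TABLE"]      # below 5.6: truncate first

old_plan = drop_strategy("5.1.54")   # the Case 1 server
new_plan = drop_strategy("5.7.21")   # the Case 3 server
```

Encoding the decision this way keeps the cleanup script safe across a mixed fleet: the same job can run against 5.1 and 5.7 instances and pick the recipe that avoids the adaptive-hash-index LRU scan on each.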
MHA failover, non-GTID edition

Using masterha_master_switch as the backdrop, this walks through the various scenarios you may encounter.

Assumed environment (the classic three nodes):
host_1(host_1:3306) (current master)
+--host_2(host_2:3306 slave [candidate master])
+--host_3(host_3:3306 etl)

1. Master: MySQL down

1.1 etl lagging by 8 hours
Add no_check_delay=0 to the configuration file to ignore the delay error.

1.2 The slave (candidate master) even further behind than etl
1.2.1 Part of the master's binlog has not yet reached either slave when MySQL on the master dies

### Simulated scene: binlog state of the three DBs

* master host_2
dba:lc> show master status;
+---------------------+----------+--------------+------------------+-------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------------+----------+--------------+------------------+-------------------+
| host_2.000002       |     1920 |              |                  |                   |
+---------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

* slave (candidate master) host_1
Master_Log_File: host_2.000002
Read_Master_Log_Pos: 150
Exec_Master_Log_Pos: 150

* etl (other slave)
Master_Log_File: host_2.000002
Read_Master_Log_Pos: 1035
Exec_Master_Log_Pos: 1035

### Switch log
masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error
Mon Nov 13 16:24:03 2017 - [info] MHA::MasterFailover version 0.56.
Mon Nov 13 16:24:03 2017 - [info] Starting master failover.
Mon Nov 13 16:24:03 2017 - [info]
Mon Nov 13 16:24:03 2017 - [info] * Phase 1: Configuration Check Phase..
Mon Nov 13 16:24:03 2017 - [info]
Mon Nov 13 16:24:03 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Mon Nov 13 16:24:03 2017 - [info] Binlog server host_2 is reachable.
Mon Nov 13 16:24:06 2017 - [info] HealthCheck: SSH to host_3 is reachable.
Mon Nov 13 16:24:06 2017 - [info] Binlog server host_3 is reachable.
Mon Nov 13 16:24:06 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306) Mon Nov 13 16:24:06 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306) Mon Nov 13 16:24:06 2017 - [info] GTID failover mode = 0 Mon Nov 13 16:24:06 2017 - [info] Dead Servers: Mon Nov 13 16:24:06 2017 - [info] host_2(host_2:3306) Mon Nov 13 16:24:06 2017 - [info] Checking master reachability via MySQL(double check)... Mon Nov 13 16:24:06 2017 - [info] ok. Mon Nov 13 16:24:06 2017 - [info] Alive Servers: Mon Nov 13 16:24:06 2017 - [info] host_1(host_1:3306) Mon Nov 13 16:24:06 2017 - [info] host_3(host_3:3306) Mon Nov 13 16:24:06 2017 - [info] Alive Slaves: Mon Nov 13 16:24:06 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Mon Nov 13 16:24:06 2017 - [info] Replicating from host_2(host_2:3306) Mon Nov 13 16:24:06 2017 - [info] Primary candidate for the new Master (candidate_master is set) Mon Nov 13 16:24:06 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Mon Nov 13 16:24:06 2017 - [info] Replicating from host_2(host_2:3306) Mon Nov 13 16:24:06 2017 - [info] Not candidate for the new Master (no_master is set) Mon Nov 13 16:24:06 2017 - [info] Starting SQL thread on host_1(host_1:3306) .. Mon Nov 13 16:24:06 2017 - [info] done. Mon Nov 13 16:24:06 2017 - [info] Starting SQL thread on host_3(host_3:3306) .. Mon Nov 13 16:24:06 2017 - [info] done. Mon Nov 13 16:24:06 2017 - [info] Starting Non-GTID based failover. Mon Nov 13 16:24:06 2017 - [info] Mon Nov 13 16:24:06 2017 - [info] ** Phase 1: Configuration Check Phase completed. Mon Nov 13 16:24:06 2017 - [info] Mon Nov 13 16:24:06 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Mon Nov 13 16:24:06 2017 - [info] Mon Nov 13 16:24:06 2017 - [info] HealthCheck: SSH to host_2 is reachable. 
Mon Nov 13 16:24:06 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Nov 13 16:24:06 2017 - [info] Executing master IP deactivation script:
Mon Nov 13 16:24:06 2017 - [info]   /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root
=================== swift vip : tgw_vip from host_2 is deleted ==============================
--2017-11-13 16:24:07--  http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi
Connecting to tgw_server:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'STDOUT'
0K 11.2M=0s
2017-11-13 16:24:11 (11.2 MB/s) - written to STDOUT [38]
Mon Nov 13 16:24:11 2017 - [info]  done.
Mon Nov 13 16:24:11 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Nov 13 16:24:11 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Nov 13 16:24:11 2017 - [info]
Mon Nov 13 16:24:11 2017 - [info] * Phase 3: Master Recovery Phase..
Mon Nov 13 16:24:11 2017 - [info]
Mon Nov 13 16:24:11 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Nov 13 16:24:11 2017 - [info]
Mon Nov 13 16:24:11 2017 - [info] The latest binary log file/position on all slaves is host_2.000002:1035
Mon Nov 13 16:24:11 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Nov 13 16:24:11 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Mon Nov 13 16:24:11 2017 - [info]     Replicating from host_2(host_2:3306)
Mon Nov 13 16:24:11 2017 - [info]     Not candidate for the new Master (no_master is set)
Mon Nov 13 16:24:11 2017 - [info] The oldest binary log file/position on all slaves is host_2.000002:150
Mon Nov 13 16:24:11 2017 - [info] Oldest slaves:
Mon Nov 13 16:24:11 2017 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Mon Nov 13 16:24:11 2017 - [info]     Replicating from host_2(host_2:3306)
Mon Nov 13 16:24:11 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Nov 13 16:24:11 2017 - [info]
Mon Nov 13 16:24:11 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Mon Nov 13 16:24:11 2017 - [info]
Mon Nov 13 16:24:11 2017 - [info] Fetching dead master's binary logs..
Mon Nov 13 16:24:11 2017 - [info] Executing command on the dead master host_2(host_2:3306): save_binary_logs --command=save --start_file=host_2.000002 --start_pos=1035 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
Creating /var/log/masterha/mha_test if not exists.. ok.
 Concat binary/relay logs from host_2.000002 pos 1035 to host_2.000002 EOF into /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog ..
 Dumping binlog format description event, from position 0 to 150.. ok.
 Dumping effective binlog data from /data/mysql.bin/host_2.000002 position 1035 to tail(1939).. ok.
 Concat succeeded.
Mon Nov 13 16:24:12 2017 - [info] scp from root@host_2:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog to local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog succeeded. Mon Nov 13 16:24:12 2017 - [info] HealthCheck: SSH to host_1 is reachable. Mon Nov 13 16:24:12 2017 - [info] HealthCheck: SSH to host_3 is reachable. Mon Nov 13 16:24:13 2017 - [info] Mon Nov 13 16:24:13 2017 - [info] * Phase 3.3: Determining New Master Phase.. Mon Nov 13 16:24:13 2017 - [info] Mon Nov 13 16:24:13 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Mon Nov 13 16:24:13 2017 - [info] Checking whether host_3 has relay logs from the oldest position.. Mon Nov 13 16:24:13 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_2.000002 --latest_rmlp=1035 --target_mlf=host_2.000002 --target_rmlp=150 --server_id=1261261666 --workdir=/var/log/masterha/mha_test --timestamp=20171113162403 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000005 : Relay log found at /data/mysql_data, up to host_3-relay-bin.000005 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_3-relay-bin.000005, start_pos:269. Target relay log FOUND! Mon Nov 13 16:24:13 2017 - [info] OK. host_3 has all relay logs. Mon Nov 13 16:24:13 2017 - [info] Searching new master from slaves.. 
Mon Nov 13 16:24:13 2017 - [info] Candidate masters from the configuration file:
Mon Nov 13 16:24:13 2017 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Mon Nov 13 16:24:13 2017 - [info]     Replicating from host_2(host_2:3306)
Mon Nov 13 16:24:13 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Nov 13 16:24:13 2017 - [info] Non-candidate masters:
Mon Nov 13 16:24:13 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Mon Nov 13 16:24:13 2017 - [info]     Replicating from host_2(host_2:3306)
Mon Nov 13 16:24:13 2017 - [info]     Not candidate for the new Master (no_master is set)
Mon Nov 13 16:24:13 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Mon Nov 13 16:24:13 2017 - [info] Not found.
Mon Nov 13 16:24:13 2017 - [info] Searching from all candidate_master slaves..
Mon Nov 13 16:24:13 2017 - [info] New master is host_1(host_1:3306)
Mon Nov 13 16:24:13 2017 - [info] Starting master failover..
Mon Nov 13 16:24:13 2017 - [info]
From:
host_2(host_2:3306) (current master)
 +--host_1(host_1:3306)
 +--host_3(host_3:3306)

To:
host_1(host_1:3306) (new master)
 +--host_3(host_3:3306)
Mon Nov 13 16:24:13 2017 - [info]
Mon Nov 13 16:24:13 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Mon Nov 13 16:24:13 2017 - [info]
Mon Nov 13 16:24:13 2017 - [info]  Server host_1 received relay logs up to: host_2.000002:150
Mon Nov 13 16:24:13 2017 - [info]  Need to get diffs from the latest slave(host_3) up to: host_2.000002:1035 (using the latest slave's relay logs)
Mon Nov 13 16:24:13 2017 - [info] Connecting to the latest slave host host_3, generating diff relay log files..
Mon Nov 13 16:24:13 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_1 --latest_mlf=host_2.000002 --latest_rmlp=1035 --target_mlf=host_2.000002 --target_rmlp=150 --server_id=1261261666 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171113162403 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000005 Mon Nov 13 16:24:15 2017 - [info] Relay log found at /data/mysql_data, up to host_3-relay-bin.000005 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_3-relay-bin.000005, start_pos:269. Concat binary/relay logs from host_3-relay-bin.000005 pos 269 to host_3-relay-bin.000005 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog .. Dumping binlog format description event, from position 0 to 269.. ok. Dumping effective binlog data from /data/mysql_data/host_3-relay-bin.000005 position 269 to tail(1154).. ok. Concat succeeded. Generating diff relay log succeeded. Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog . scp host_3.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog to root@host_1(22) succeeded. Mon Nov 13 16:24:15 2017 - [info] Generating diff files succeeded. Mon Nov 13 16:24:15 2017 - [info] Sending binlog.. Mon Nov 13 16:24:16 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog to root@host_1:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog succeeded. Mon Nov 13 16:24:16 2017 - [info] Mon Nov 13 16:24:16 2017 - [info] * Phase 3.4: Master Log Apply Phase.. 
Mon Nov 13 16:24:16 2017 - [info] Mon Nov 13 16:24:16 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Mon Nov 13 16:24:16 2017 - [info] Starting recovery on host_1(host_1:3306).. Mon Nov 13 16:24:16 2017 - [info] Generating diffs succeeded. Mon Nov 13 16:24:16 2017 - [info] Waiting until all relay logs are applied. Mon Nov 13 16:24:16 2017 - [info] done. Mon Nov 13 16:24:16 2017 - [info] Getting slave status.. Mon Nov 13 16:24:16 2017 - [info] This slave(host_1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000002:150). No need to recover from Exec_Master_Log_Pos. Mon Nov 13 16:24:16 2017 - [info] Connecting to the target slave host host_1, running recover script.. Mon Nov 13 16:24:16 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_1 --slave_ip=host_1 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171113162403 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Mon Nov 13 16:24:16 2017 - [info] Concat all apply files to /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171113162403.binlog .. Copying the first binlog file /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog to /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171113162403.binlog.. ok. Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog.. dumped up to pos 150. ok. /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog has effective binlog events from pos 150. 
 Dumping effective binlog data from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog position 150 to tail(1054).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171113162403.binlog .
MySQL client version is 5.7.13. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171113162403.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog on host_1:3306. This may take long time...
Applying log files succeeded.
Mon Nov 13 16:24:16 2017 - [info] All relay logs were successfully applied.
Mon Nov 13 16:24:16 2017 - [info] Getting new master's binlog name and position..
Mon Nov 13 16:24:16 2017 - [info]  host_1.000001:3232
Mon Nov 13 16:24:16 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_LOG_FILE='host_1.000001', MASTER_LOG_POS=3232, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Nov 13 16:24:16 2017 - [info] Executing master IP activate script:
Mon Nov 13 16:24:16 2017 - [info]   /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --new_master_host=host_1 --new_master_ip=host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Unknown option: new_master_user
Unknown option: new_master_password
=================== swift vip : tgw_vip to host_1 is added ==============================
Mon Nov 13 16:24:21 2017 - [info]  OK.
Mon Nov 13 16:24:21 2017 - [info] Setting read_only=0 on host_1(host_1:3306)..
Mon Nov 13 16:24:21 2017 - [info]  ok.
Mon Nov 13 16:24:21 2017 - [info] ** Finished master recovery successfully.
Mon Nov 13 16:24:21 2017 - [info] * Phase 3: Master Recovery Phase completed.
Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] * Phase 4: Slaves Recovery Phase.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 104955. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171113162403.log if it takes time.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] Log messages from host_3 ... Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Mon Nov 13 16:24:21 2017 - [info] End of log messages from host_3. Mon Nov 13 16:24:21 2017 - [info] -- host_3(host_3:3306) has the latest relay log events. Mon Nov 13 16:24:21 2017 - [info] Generating relay diff files from the latest slave succeeded. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 104966. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171113162403.log if it takes time.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] Log messages from host_3 ... Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] Sending binlog.. Mon Nov 13 16:24:21 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog to root@host_3:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog succeeded. Mon Nov 13 16:24:21 2017 - [info] Starting recovery on host_3(host_3:3306).. Mon Nov 13 16:24:21 2017 - [info] Generating diffs succeeded. Mon Nov 13 16:24:21 2017 - [info] Waiting until all relay logs are applied. 
Mon Nov 13 16:24:21 2017 - [info] done. Mon Nov 13 16:24:21 2017 - [info] Getting slave status.. Mon Nov 13 16:24:21 2017 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000002:1035). No need to recover from Exec_Master_Log_Pos. Mon Nov 13 16:24:21 2017 - [info] Connecting to the target slave host host_3, running recover script.. Mon Nov 13 16:24:21 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171113162403 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Mon Nov 13 16:24:21 2017 - [info] MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171113162403.binlog on host_3:3306. This may take long time... Applying log files succeeded. Mon Nov 13 16:24:21 2017 - [info] All relay logs were successfully applied. Mon Nov 13 16:24:21 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306).. Mon Nov 13 16:24:21 2017 - [info] Executed CHANGE MASTER. Mon Nov 13 16:24:21 2017 - [info] Slave started. Mon Nov 13 16:24:21 2017 - [info] End of log messages from host_3. Mon Nov 13 16:24:21 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded. Mon Nov 13 16:24:21 2017 - [info] All new slave servers recovered successfully. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] * Phase 5: New master cleanup phase.. Mon Nov 13 16:24:21 2017 - [info] Mon Nov 13 16:24:21 2017 - [info] Resetting slave info on the new master.. Mon Nov 13 16:24:21 2017 - [info] host_1: Resetting slave info succeeded. 
Mon Nov 13 16:24:21 2017 - [info] Master failover to host_1(host_1:3306) completed successfully.
Mon Nov 13 16:24:21 2017 - [info]

----- Failover Report -----

bak_mha_test: MySQL Master failover host_2(host_2:3306) to host_1(host_1:3306) succeeded

Master host_2(host_2:3306) is down!

Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_2(host_2:3306)
The latest slave host_3(host_3:3306) has all relay logs for recovery.
Selected host_1(host_1:3306) as a new master.
host_1(host_1:3306): OK: Applying all logs succeeded.
host_1(host_1:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_1(host_1:3306)
host_1(host_1:3306): Resetting slave info succeeded.
Master failover to host_1(host_1:3306) completed successfully.
Mon Nov 13 16:24:21 2017 - [info] Sending mail..
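Phase 3.1 above ("Getting Latest Slaves") boils down to an ordering on each slave's (Master_Log_File, Read_Master_Log_Pos) pair. Here is a minimal Python sketch of that comparison, using the positions from this scenario; this is illustrative only, not MHA's actual Perl implementation, and the function names are invented:

```python
# Sketch of MHA's Phase 3.1 "latest/oldest slave" selection (illustrative).
# A slave's progress is ordered first by the binlog file's numeric suffix
# (host_2.000002 sorts after host_2.000001), then by the read position.

def binlog_key(status):
    """Turn (Master_Log_File, Read_Master_Log_Pos) into a sortable key."""
    log_file, read_pos = status
    seq = int(log_file.rsplit('.', 1)[1])  # '000002' -> 2
    return (seq, read_pos)

def latest_and_oldest(slaves):
    """slaves: {name: (Master_Log_File, Read_Master_Log_Pos)} -> (latest, oldest)."""
    latest = max(slaves, key=lambda s: binlog_key(slaves[s]))
    oldest = min(slaves, key=lambda s: binlog_key(slaves[s]))
    return latest, oldest

# The state captured in scenario 1.2.1:
slaves = {
    'host_1': ('host_2.000002', 150),   # candidate master, far behind
    'host_3': ('host_2.000002', 1035),  # etl, has the most relay log
}
print(latest_and_oldest(slaves))  # ('host_3', 'host_1')
```

Because host_3 comes out as the latest slave, MHA generates the diff relay log (pos 150 to 1035) from host_3 and replays it on host_1 before promotion, which is exactly what Phase 3.3 in the log shows.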
1.2.2 All of the master's binlog has already reached one etl slave when MySQL on the master dies

### Simulated scene: binlog state of the three DBs

* master host_2

dba:lc> show master status;
+---------------------+----------+--------------+------------------+-------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------------+----------+--------------+------------------+-------------------+
| host_2.000010       |     1694 |              |                  |                   |
+---------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 164 | 1    |
| 165 | 2    |
| 166 | 3    |
+-----+------+
3 rows in set (0.00 sec)

* slave (candidate master) host_1

Master_Log_File: host_2.000010
Exec_Master_Log_Pos: 806

dba:lc> select * from t_char_2;
Empty set (0.00 sec)

* etl (other slave) host_3

Master_Log_File: host_2.000010
Exec_Master_Log_Pos: 1694

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 164 | 1    |
| 165 | 2    |
| 166 | 3    |
+-----+------+
3 rows in set (0.00 sec)

### Switch log

Wed Nov 15 10:25:50 2017 - [info] MHA::MasterFailover version 0.56.
Wed Nov 15 10:25:50 2017 - [info] Starting master failover.
Wed Nov 15 10:25:50 2017 - [info]
Wed Nov 15 10:25:50 2017 - [info] * Phase 1: Configuration Check Phase..
Wed Nov 15 10:25:50 2017 - [info]
Wed Nov 15 10:25:50 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306)
Wed Nov 15 10:25:50 2017 - [info] GTID failover mode = 0
Wed Nov 15 10:25:50 2017 - [info] Dead Servers:
Wed Nov 15 10:25:50 2017 - [info]   host_2(host_2:3306)
Wed Nov 15 10:25:50 2017 - [info] Checking master reachability via MySQL(double check)...
Wed Nov 15 10:25:50 2017 - [info]  ok.
Wed Nov 15 10:25:50 2017 - [info] Alive Servers:
Wed Nov 15 10:25:50 2017 - [info]   host_1(host_1:3306)
Wed Nov 15 10:25:50 2017 - [info]   host_3(host_3:3306)
Wed Nov 15 10:25:50 2017 - [info] Alive Slaves:
Wed Nov 15 10:25:50 2017 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Nov 15 10:25:50 2017 - [info]     Replicating from host_2(host_2:3306)
Wed Nov 15 10:25:50 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Nov 15 10:25:50 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Nov 15 10:25:50 2017 - [info]     Replicating from host_2(host_2:3306)
Wed Nov 15 10:25:50 2017 - [info]     Not candidate for the new Master (no_master is set)
Wed Nov 15 10:25:50 2017 - [info] Starting SQL thread on host_1(host_1:3306) ..
Wed Nov 15 10:25:50 2017 - [info]  done.
Wed Nov 15 10:25:50 2017 - [info] Starting Non-GTID based failover.
Wed Nov 15 10:25:50 2017 - [info]
Wed Nov 15 10:25:50 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Nov 15 10:25:50 2017 - [info]
Wed Nov 15 10:25:50 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Nov 15 10:25:50 2017 - [info]
Wed Nov 15 10:25:50 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Wed Nov 15 10:25:51 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Nov 15 10:25:51 2017 - [info] Executing master IP deactivation script:
Wed Nov 15 10:25:51 2017 - [info]   /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root
=================== swift vip : tgw_vip from host_2 is deleted ==============================
--2017-11-15 10:25:51--  http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi
Connecting to tgw_server:80... connected.
HTTP request sent, awaiting response...
200 OK
Length: unspecified [text/html]
Saving to: 'STDOUT'
0K 11.1M=0s
2017-11-15 10:25:53 (11.1 MB/s) - written to STDOUT [38]
Wed Nov 15 10:25:53 2017 - [info]  done.
Wed Nov 15 10:25:53 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Nov 15 10:25:53 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Nov 15 10:25:53 2017 - [info]
Wed Nov 15 10:25:53 2017 - [info] * Phase 3: Master Recovery Phase..
Wed Nov 15 10:25:53 2017 - [info]
Wed Nov 15 10:25:53 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Nov 15 10:25:53 2017 - [info]
Wed Nov 15 10:25:53 2017 - [info] The latest binary log file/position on all slaves is host_2.000010:1694
Wed Nov 15 10:25:53 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Nov 15 10:25:53 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Nov 15 10:25:53 2017 - [info]     Replicating from host_2(host_2:3306)
Wed Nov 15 10:25:53 2017 - [info]     Not candidate for the new Master (no_master is set)
Wed Nov 15 10:25:53 2017 - [info] The oldest binary log file/position on all slaves is host_2.000010:806
Wed Nov 15 10:25:53 2017 - [info] Oldest slaves:
Wed Nov 15 10:25:53 2017 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Nov 15 10:25:53 2017 - [info]     Replicating from host_2(host_2:3306)
Wed Nov 15 10:25:53 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Nov 15 10:25:53 2017 - [info]
Wed Nov 15 10:25:53 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Nov 15 10:25:53 2017 - [info]
Wed Nov 15 10:25:53 2017 - [info] Fetching dead master's binary logs..
Wed Nov 15 10:25:53 2017 - [info] Executing command on the dead master host_2(host_2:3306): save_binary_logs --command=save --start_file=host_2.000010 --start_pos=1694 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 Creating /var/log/masterha/mha_test if not exists.. ok. Concat binary/relay logs from host_2.000010 pos 1694 to host_2.000010 EOF into /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog .. Dumping binlog format description event, from position 0 to 150.. ok. Dumping effective binlog data from /data/mysql.bin/host_2.000010 position 1694 to tail(1713).. ok. Concat succeeded. Wed Nov 15 10:25:53 2017 - [info] scp from root@host_2:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog to local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog succeeded. Wed Nov 15 10:25:54 2017 - [info] HealthCheck: SSH to host_1 is reachable. Wed Nov 15 10:25:54 2017 - [info] HealthCheck: SSH to host_3 is reachable. Wed Nov 15 10:25:54 2017 - [info] Wed Nov 15 10:25:54 2017 - [info] * Phase 3.3: Determining New Master Phase.. Wed Nov 15 10:25:54 2017 - [info] Wed Nov 15 10:25:54 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Wed Nov 15 10:25:54 2017 - [info] Checking whether host_3 has relay logs from the oldest position.. 
Wed Nov 15 10:25:54 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_2.000010 --latest_rmlp=1694 --target_mlf=host_2.000010 --target_rmlp=806 --server_id=1261261666 --workdir=/var/log/masterha/mha_test --timestamp=20171115102550 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000004 : Relay log found at /data/mysql_data, up to host_3-relay-bin.000004 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_3-relay-bin.000004, start_pos:1017. Target relay log FOUND! Wed Nov 15 10:25:54 2017 - [info] OK. host_3 has all relay logs. Wed Nov 15 10:25:54 2017 - [info] Searching new master from slaves.. Wed Nov 15 10:25:54 2017 - [info] Candidate masters from the configuration file: Wed Nov 15 10:25:54 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:25:54 2017 - [info] Replicating from host_2(host_2:3306) Wed Nov 15 10:25:54 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed Nov 15 10:25:54 2017 - [info] Non-candidate masters: Wed Nov 15 10:25:54 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:25:54 2017 - [info] Replicating from host_2(host_2:3306) Wed Nov 15 10:25:54 2017 - [info] Not candidate for the new Master (no_master is set) Wed Nov 15 10:25:54 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Wed Nov 15 10:25:54 2017 - [info] Not found. Wed Nov 15 10:25:54 2017 - [info] Searching from all candidate_master slaves.. Wed Nov 15 10:25:54 2017 - [info] New master is host_1(host_1:3306) Wed Nov 15 10:25:54 2017 - [info] Starting master failover.. 
Wed Nov 15 10:25:54 2017 - [info]
From:
host_2(host_2:3306) (current master)
 +--host_1(host_1:3306)
 +--host_3(host_3:3306)

To:
host_1(host_1:3306) (new master)
 +--host_3(host_3:3306)
Wed Nov 15 10:25:54 2017 - [info]
Wed Nov 15 10:25:54 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Nov 15 10:25:54 2017 - [info]
Wed Nov 15 10:25:54 2017 - [info]  Server host_1 received relay logs up to: host_2.000010:806
Wed Nov 15 10:25:54 2017 - [info]  Need to get diffs from the latest slave(host_3) up to: host_2.000010:1694 (using the latest slave's relay logs)
Wed Nov 15 10:25:55 2017 - [info] Connecting to the latest slave host host_3, generating diff relay log files..
Wed Nov 15 10:25:55 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_1 --latest_mlf=host_2.000010 --latest_rmlp=1694 --target_mlf=host_2.000010 --target_rmlp=806 --server_id=1261261666 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171115102550 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000004
Wed Nov 15 10:25:55 2017 - [info]
    Relay log found at /data/mysql_data, up to host_3-relay-bin.000004
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:host_3-relay-bin.000004, start_pos:1017.
 Concat binary/relay logs from host_3-relay-bin.000004 pos 1017 to host_3-relay-bin.000004 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog ..
 Dumping binlog format description event, from position 0 to 361.. ok.
 Dumping effective binlog data from /data/mysql_data/host_3-relay-bin.000004 position 1017 to tail(1905).. ok.
 Concat succeeded.
 Generating diff relay log succeeded.
Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog . scp host_3.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog to root@host_1(22) succeeded. Wed Nov 15 10:25:55 2017 - [info] Generating diff files succeeded. Wed Nov 15 10:25:55 2017 - [info] Sending binlog.. Wed Nov 15 10:25:56 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog to root@host_1:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog succeeded. Wed Nov 15 10:25:56 2017 - [info] Wed Nov 15 10:25:56 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Wed Nov 15 10:25:56 2017 - [info] Wed Nov 15 10:25:56 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Wed Nov 15 10:25:56 2017 - [info] Starting recovery on host_1(host_1:3306).. Wed Nov 15 10:25:56 2017 - [info] Generating diffs succeeded. Wed Nov 15 10:25:56 2017 - [info] Waiting until all relay logs are applied. Wed Nov 15 10:25:56 2017 - [info] done. Wed Nov 15 10:25:56 2017 - [info] Getting slave status.. Wed Nov 15 10:25:56 2017 - [info] This slave(host_1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000010:806). No need to recover from Exec_Master_Log_Pos. Wed Nov 15 10:25:56 2017 - [info] Connecting to the target slave host host_1, running recover script.. 
Wed Nov 15 10:25:56 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_1 --slave_ip=host_1 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171115102550 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Wed Nov 15 10:25:56 2017 - [info] Concat all apply files to /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171115102550.binlog .. Copying the first binlog file /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog to /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171115102550.binlog.. ok. Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog.. dumped up to pos 150. ok. /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog has effective binlog events from pos 150. Dumping effective binlog data from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog position 150 to tail(169).. ok. Concat succeeded. All apply target binary logs are concatinated at /var/log/masterha/mha_test/total_binlog_for_host_1_3306.20171115102550.binlog . MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171115102550.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog on host_1:3306. This may take long time... Applying log files succeeded. Wed Nov 15 10:25:56 2017 - [info] All relay logs were successfully applied. Wed Nov 15 10:25:56 2017 - [info] Getting new master's binlog name and position.. 
Wed Nov 15 10:25:56 2017 - [info]  host_1.000010:2060
Wed Nov 15 10:25:56 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_LOG_FILE='host_1.000010', MASTER_LOG_POS=2060, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Wed Nov 15 10:25:56 2017 - [info] Executing master IP activate script:
Wed Nov 15 10:25:56 2017 - [info]   /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --new_master_host=host_1 --new_master_ip=host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Unknown option: new_master_user
Unknown option: new_master_password
=================== swift vip : tgw_vip to host_1 is added ==============================
Wed Nov 15 10:25:59 2017 - [info]  OK.
Wed Nov 15 10:25:59 2017 - [info] Setting read_only=0 on host_1(host_1:3306)..
Wed Nov 15 10:25:59 2017 - [info]  ok.
Wed Nov 15 10:25:59 2017 - [info] ** Finished master recovery successfully.
Wed Nov 15 10:25:59 2017 - [info] * Phase 3: Master Recovery Phase completed.
Wed Nov 15 10:25:59 2017 - [info]
Wed Nov 15 10:25:59 2017 - [info] * Phase 4: Slaves Recovery Phase..
Wed Nov 15 10:25:59 2017 - [info]
Wed Nov 15 10:25:59 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Nov 15 10:25:59 2017 - [info]
Wed Nov 15 10:25:59 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 125962. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171115102550.log if it takes time..
Wed Nov 15 10:25:59 2017 - [info]
Wed Nov 15 10:25:59 2017 - [info] Log messages from host_3 ...
Wed Nov 15 10:25:59 2017 - [info]
Wed Nov 15 10:25:59 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Nov 15 10:25:59 2017 - [info] End of log messages from host_3.
Wed Nov 15 10:25:59 2017 - [info] -- host_3(host_3:3306) has the latest relay log events.
Wed Nov 15 10:25:59 2017 - [info] Generating relay diff files from the latest slave succeeded.
Wed Nov 15 10:25:59 2017 - [info]
Wed Nov 15 10:25:59 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Nov 15 10:25:59 2017 - [info]
Wed Nov 15 10:25:59 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 125967. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171115102550.log if it takes time..
Wed Nov 15 10:25:59 2017 - [info]
Wed Nov 15 10:25:59 2017 - [info] Log messages from host_3 ...
Wed Nov 15 10:25:59 2017 - [info]
Wed Nov 15 10:25:59 2017 - [info] Sending binlog..
Wed Nov 15 10:25:59 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog to root@host_3:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog succeeded.
Wed Nov 15 10:25:59 2017 - [info] Starting recovery on host_3(host_3:3306)..
Wed Nov 15 10:25:59 2017 - [info]  Generating diffs succeeded.
Wed Nov 15 10:25:59 2017 - [info] Waiting until all relay logs are applied.
Wed Nov 15 10:25:59 2017 - [info]  done.
Wed Nov 15 10:25:59 2017 - [info] Getting slave status..
Wed Nov 15 10:25:59 2017 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000010:1694). No need to recover from Exec_Master_Log_Pos.
Wed Nov 15 10:25:59 2017 - [info] Connecting to the target slave host host_3, running recover script..
Wed Nov 15 10:25:59 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171115102550 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Wed Nov 15 10:25:59 2017 - [info]
MySQL client version is 5.7.13. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171115102550.binlog on host_3:3306. This may take long time...
Applying log files succeeded.
Wed Nov 15 10:25:59 2017 - [info] All relay logs were successfully applied.
Wed Nov 15 10:25:59 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306)..
Wed Nov 15 10:25:59 2017 - [info] Executed CHANGE MASTER.
Wed Nov 15 10:25:59 2017 - [info] Slave started.
Wed Nov 15 10:25:59 2017 - [info] End of log messages from host_3.
Wed Nov 15 10:25:59 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Wed Nov 15 10:25:59 2017 - [info] All new slave servers recovered successfully.
Wed Nov 15 10:25:59 2017 - [info]
Wed Nov 15 10:25:59 2017 - [info] * Phase 5: New master cleanup phase..
Wed Nov 15 10:25:59 2017 - [info]
Wed Nov 15 10:25:59 2017 - [info] Resetting slave info on the new master..
Wed Nov 15 10:25:59 2017 - [info] host_1: Resetting slave info succeeded.
Wed Nov 15 10:25:59 2017 - [info] Master failover to host_1(host_1:3306) completed successfully.
Wed Nov 15 10:25:59 2017 - [info]

----- Failover Report -----

bak_mha_test: MySQL Master failover host_2(host_2:3306) to host_1(host_1:3306) succeeded

Master host_2(host_2:3306) is down!

Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on host_2(host_2:3306)
The latest slave host_3(host_3:3306) has all relay logs for recovery.
Selected host_1(host_1:3306) as a new master.
host_1(host_1:3306): OK: Applying all logs succeeded.
host_1(host_1:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_1(host_1:3306)
host_1(host_1:3306): Resetting slave info succeeded.
Master failover to host_1(host_1:3306) completed successfully.

Wed Nov 15 10:25:59 2017 - [info] Sending mail..

### Results after the switchover

* The new master and the new etl have identical data

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 164 | 1    |
| 165 | 2    |
| 166 | 3    |
+-----+------+
3 rows in set (0.00 sec)

1.3 The slave (candidate master) has the latest logs, ahead of the etl

1.3.1 Part of the master's binlogs has not yet reached the two slaves when MySQL on the master crashes

### Simulated scenario: binlog state of the 3 DBs

* master host_2

dba:lc> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| host_2.000004 |     4577 |              |                  |                   |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 114 | 1    |
| 115 | 2    |
| 116 | 3    |
| 117 | 4    |
| 118 | 5    |
| 119 | 6    |
| 120 | 7    |
| 121 | 8    |
| 122 | 10   |
| 123 | 11   |
| 124 | 12   |
| 125 | 13   |
| 126 | 14   |
| 127 | 15   |
+-----+------+
14 rows in set (0.00 sec)

* slave (candidate master) host_1

Master_Log_File: host_2.000004
Exec_Master_Log_Pos: 3683

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 114 | 1    |
| 115 | 2    |
| 116 | 3    |
| 117 | 4    |
| 118 | 5    |
| 119 | 6    |
| 120 | 7    |
| 121 | 8    |
| 122 | 10   |
| 123 | 11   |
| 124 | 12   |
+-----+------+
11 rows in set (0.00 sec)

* etl (other slave) host_3

Master_Log_File: host_2.000004
Exec_Master_Log_Pos: 2789

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 114 | 1    |
| 115 | 2    |
| 116 | 3    |
| 117 | 4    |
| 118 | 5    |
| 119 | 6    |
| 120 | 7    |
| 121 | 8    |
+-----+------+
8 rows in set (0.00 sec)

### Failover log

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Tue Nov 14 17:13:08 2017 - [info] MHA::MasterFailover version 0.56.
Tue Nov 14 17:13:08 2017 - [info] Starting master failover.
Tue Nov 14 17:13:08 2017 - [info]
Tue Nov 14 17:13:08 2017 - [info] * Phase 1: Configuration Check Phase..
Tue Nov 14 17:13:08 2017 - [info]
Tue Nov 14 17:13:08 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306)
Tue Nov 14 17:13:08 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Tue Nov 14 17:13:08 2017 - [info] GTID failover mode = 0
Tue Nov 14 17:13:08 2017 - [info] Dead Servers:
Tue Nov 14 17:13:08 2017 - [info]   host_2(host_2:3306)
Tue Nov 14 17:13:08 2017 - [info] Checking master reachability via MySQL(double check)...
Tue Nov 14 17:13:08 2017 - [info]  ok.
Tue Nov 14 17:13:08 2017 - [info] Alive Servers:
Tue Nov 14 17:13:08 2017 - [info]   host_1(host_1:3306)
Tue Nov 14 17:13:08 2017 - [info]   host_3(host_3:3306)
Tue Nov 14 17:13:08 2017 - [info] Alive Slaves:
Tue Nov 14 17:13:08 2017 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 14 17:13:08 2017 - [info]     Replicating from host_2(host_2:3306)
Tue Nov 14 17:13:08 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 14 17:13:08 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 14 17:13:08 2017 - [info]     Replicating from host_2(host_2:3306)
Tue Nov 14 17:13:08 2017 - [info]     Not candidate for the new Master (no_master is set)
Tue Nov 14 17:13:08 2017 - [info] Starting SQL thread on host_1(host_1:3306) ..
Tue Nov 14 17:13:08 2017 - [info]  done.
Tue Nov 14 17:13:08 2017 - [info] Starting SQL thread on host_3(host_3:3306) ..
Tue Nov 14 17:13:08 2017 - [info]  done.
Tue Nov 14 17:13:08 2017 - [info] Starting Non-GTID based failover.
Tue Nov 14 17:13:08 2017 - [info]
Tue Nov 14 17:13:08 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Nov 14 17:13:08 2017 - [info]
Tue Nov 14 17:13:08 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Nov 14 17:13:08 2017 - [info]
Tue Nov 14 17:13:08 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Tue Nov 14 17:13:09 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Nov 14 17:13:09 2017 - [info] Executing master IP deactivation script:
Tue Nov 14 17:13:09 2017 - [info]   /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root
=================== swift vip : tgw_vip from host_2 is deleted ==============================
--2017-11-14 17:13:09--  http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi
Connecting to tgw_server:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'STDOUT'

0K 7.19M=0s

2017-11-14 17:13:11 (7.19 MB/s) - written to stdout [38]

Tue Nov 14 17:13:11 2017 - [info]  done.
Tue Nov 14 17:13:11 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Nov 14 17:13:11 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Nov 14 17:13:11 2017 - [info]
Tue Nov 14 17:13:11 2017 - [info] * Phase 3: Master Recovery Phase..
Tue Nov 14 17:13:11 2017 - [info]
Tue Nov 14 17:13:11 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Nov 14 17:13:11 2017 - [info]
Tue Nov 14 17:13:11 2017 - [info] The latest binary log file/position on all slaves is host_2.000004:3683
Tue Nov 14 17:13:11 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Nov 14 17:13:11 2017 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 14 17:13:11 2017 - [info]     Replicating from host_2(host_2:3306)
Tue Nov 14 17:13:11 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 14 17:13:11 2017 - [info] The oldest binary log file/position on all slaves is host_2.000004:2789
Tue Nov 14 17:13:11 2017 - [info] Oldest slaves:
Tue Nov 14 17:13:11 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 14 17:13:11 2017 - [info]     Replicating from host_2(host_2:3306)
Tue Nov 14 17:13:11 2017 - [info]     Not candidate for the new Master (no_master is set)
Tue Nov 14 17:13:11 2017 - [info]
Tue Nov 14 17:13:11 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Nov 14 17:13:11 2017 - [info]
Tue Nov 14 17:13:11 2017 - [info] Fetching dead master's binary logs..
Tue Nov 14 17:13:11 2017 - [info] Executing command on the dead master host_2(host_2:3306): save_binary_logs --command=save --start_file=host_2.000004 --start_pos=3683 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
Creating /var/log/masterha/mha_test if not exists.. ok.
Concat binary/relay logs from host_2.000004 pos 3683 to host_2.000004 EOF into /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog ..
Dumping binlog format description event, from position 0 to 150.. ok.
Dumping effective binlog data from /data/mysql.bin/host_2.000004 position 3683 to tail(4596).. ok.
Concat succeeded.
Tue Nov 14 17:13:12 2017 - [info] scp from root@host_2:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog to local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog succeeded.
Tue Nov 14 17:13:12 2017 - [info] HealthCheck: SSH to host_1 is reachable.
Tue Nov 14 17:13:12 2017 - [info] HealthCheck: SSH to host_3 is reachable.
Tue Nov 14 17:13:12 2017 - [info]
Tue Nov 14 17:13:12 2017 - [info] * Phase 3.3: Determining New Master Phase..
Tue Nov 14 17:13:12 2017 - [info]
Tue Nov 14 17:13:12 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Nov 14 17:13:12 2017 - [info] Checking whether host_1 has relay logs from the oldest position..
Tue Nov 14 17:13:12 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_2.000004 --latest_rmlp=3683 --target_mlf=host_2.000004 --target_rmlp=2789 --server_id=1261261646 --workdir=/var/log/masterha/mha_test --timestamp=20171114171308 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_1-relay-bin.000003 :
Relay log found at /data/mysql_data, up to host_1-relay-bin.000003
Fast relay log position search succeeded.
Target relay log file/position found. start_file:host_1-relay-bin.000003, start_pos:269.
Target relay log FOUND!
Tue Nov 14 17:13:13 2017 - [info] OK. host_1 has all relay logs.
Tue Nov 14 17:13:13 2017 - [info] Searching new master from slaves..
Tue Nov 14 17:13:13 2017 - [info] Candidate masters from the configuration file:
Tue Nov 14 17:13:13 2017 - [info]   host_1(host_1:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 14 17:13:13 2017 - [info]     Replicating from host_2(host_2:3306)
Tue Nov 14 17:13:13 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Nov 14 17:13:13 2017 - [info] Non-candidate masters:
Tue Nov 14 17:13:13 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Tue Nov 14 17:13:13 2017 - [info]     Replicating from host_2(host_2:3306)
Tue Nov 14 17:13:13 2017 - [info]     Not candidate for the new Master (no_master is set)
Tue Nov 14 17:13:13 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Nov 14 17:13:13 2017 - [info] New master is host_1(host_1:3306)
Tue Nov 14 17:13:13 2017 - [info] Starting master failover..
Tue Nov 14 17:13:13 2017 - [info]
From:
host_2(host_2:3306) (current master)
 +--host_1(host_1:3306)
 +--host_3(host_3:3306)

To:
host_1(host_1:3306) (new master)
 +--host_3(host_3:3306)
Tue Nov 14 17:13:13 2017 - [info]
Tue Nov 14 17:13:13 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Nov 14 17:13:13 2017 - [info]
Tue Nov 14 17:13:13 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Nov 14 17:13:13 2017 - [info] Sending binlog..
Tue Nov 14 17:13:13 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog to root@host_1:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog succeeded.
Tue Nov 14 17:13:13 2017 - [info]
Tue Nov 14 17:13:13 2017 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Nov 14 17:13:13 2017 - [info]
Tue Nov 14 17:13:13 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Nov 14 17:13:13 2017 - [info] Starting recovery on host_1(host_1:3306)..
Tue Nov 14 17:13:13 2017 - [info]  Generating diffs succeeded.
Tue Nov 14 17:13:13 2017 - [info] Waiting until all relay logs are applied.
Tue Nov 14 17:13:13 2017 - [info]  done.
Tue Nov 14 17:13:13 2017 - [info] Getting slave status..
Tue Nov 14 17:13:13 2017 - [info] This slave(host_1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000004:3683). No need to recover from Exec_Master_Log_Pos.
Tue Nov 14 17:13:13 2017 - [info] Connecting to the target slave host host_1, running recover script..
Tue Nov 14 17:13:13 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_1 --slave_ip=host_1 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171114171308 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Nov 14 17:13:13 2017 - [info]
MySQL client version is 5.7.13. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog on host_1:3306. This may take long time...
Applying log files succeeded.
Tue Nov 14 17:13:13 2017 - [info] All relay logs were successfully applied.
Tue Nov 14 17:13:13 2017 - [info] Getting new master's binlog name and position..
Tue Nov 14 17:13:13 2017 - [info] host_1.000003:2347
Tue Nov 14 17:13:13 2017 - [info] All other slaves should start replication from here.
Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_LOG_FILE='host_1.000003', MASTER_LOG_POS=2347, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Nov 14 17:13:13 2017 - [info] Executing master IP activate script:
Tue Nov 14 17:13:13 2017 - [info]   /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --new_master_host=host_1 --new_master_ip=host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Unknown option: new_master_user
Unknown option: new_master_password
=================== swift vip : tgw_vip to host_1 is added ==============================
Tue Nov 14 17:13:16 2017 - [info]  OK.
Tue Nov 14 17:13:16 2017 - [info] Setting read_only=0 on host_1(host_1:3306)..
Tue Nov 14 17:13:16 2017 - [info]  ok.
Tue Nov 14 17:13:16 2017 - [info] ** Finished master recovery successfully.
Tue Nov 14 17:13:16 2017 - [info] * Phase 3: Master Recovery Phase completed.
Tue Nov 14 17:13:16 2017 - [info]
Tue Nov 14 17:13:16 2017 - [info] * Phase 4: Slaves Recovery Phase..
Tue Nov 14 17:13:16 2017 - [info]
Tue Nov 14 17:13:16 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Nov 14 17:13:16 2017 - [info]
Tue Nov 14 17:13:16 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 29885. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171114171308.log if it takes time..
Tue Nov 14 17:13:17 2017 - [info]
Tue Nov 14 17:13:17 2017 - [info] Log messages from host_3 ...
Tue Nov 14 17:13:17 2017 - [info]
Tue Nov 14 17:13:16 2017 - [info] Server host_3 received relay logs up to: host_2.000004:2789
Tue Nov 14 17:13:16 2017 - [info] Need to get diffs from the latest slave(host_1) up to: host_2.000004:3683 (using the latest slave's relay logs)
Tue Nov 14 17:13:16 2017 - [info] Connecting to the latest slave host host_1, generating diff relay log files..
Tue Nov 14 17:13:16 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_3 --latest_mlf=host_2.000004 --latest_rmlp=3683 --target_mlf=host_2.000004 --target_rmlp=2789 --server_id=1261261646 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171114171308 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_1-relay-bin.000003
Tue Nov 14 17:13:17 2017 - [info]
Relay log found at /data/mysql_data, up to host_1-relay-bin.000003
Fast relay log position search succeeded.
Target relay log file/position found. start_file:host_1-relay-bin.000003, start_pos:269.
Concat binary/relay logs from host_1-relay-bin.000003 pos 269 to host_1-relay-bin.000003 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog ..
Dumping binlog format description event, from position 0 to 269.. ok.
Dumping effective binlog data from /data/mysql_data/host_1-relay-bin.000003 position 269 to tail(1163).. ok.
Concat succeeded.
Generating diff relay log succeeded. Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog .
scp host_1.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog to root@host_3(22) succeeded.
Tue Nov 14 17:13:17 2017 - [info] Generating diff files succeeded.
Tue Nov 14 17:13:17 2017 - [info] End of log messages from host_3.
Tue Nov 14 17:13:17 2017 - [info] -- Slave diff log generation on host host_3(host_3:3306) succeeded.
Tue Nov 14 17:13:17 2017 - [info] Generating relay diff files from the latest slave succeeded.
Tue Nov 14 17:13:17 2017 - [info]
Tue Nov 14 17:13:17 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Nov 14 17:13:17 2017 - [info]
Tue Nov 14 17:13:17 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 31393. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171114171308.log if it takes time..
Tue Nov 14 17:13:17 2017 - [info]
Tue Nov 14 17:13:17 2017 - [info] Log messages from host_3 ...
Tue Nov 14 17:13:17 2017 - [info]
Tue Nov 14 17:13:17 2017 - [info] Sending binlog..
Tue Nov 14 17:13:17 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog to root@host_3:/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog succeeded.
Tue Nov 14 17:13:17 2017 - [info] Starting recovery on host_3(host_3:3306)..
Tue Nov 14 17:13:17 2017 - [info]  Generating diffs succeeded.
Tue Nov 14 17:13:17 2017 - [info] Waiting until all relay logs are applied.
Tue Nov 14 17:13:17 2017 - [info]  done.
Tue Nov 14 17:13:17 2017 - [info] Getting slave status..
Tue Nov 14 17:13:17 2017 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000004:2789). No need to recover from Exec_Master_Log_Pos.
Tue Nov 14 17:13:17 2017 - [info] Connecting to the target slave host host_3, running recover script..
Tue Nov 14 17:13:17 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171114171308 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Tue Nov 14 17:13:17 2017 - [info] Concat all apply files to /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171114171308.binlog ..
Copying the first binlog file /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog to /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171114171308.binlog.. ok.
Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog.. dumped up to pos 150. ok.
/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog has effective binlog events from pos 150.
Dumping effective binlog data from /var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog position 150 to tail(1063).. ok.
Concat succeeded.
All apply target binary logs are concatinated at /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171114171308.binlog .
MySQL client version is 5.7.13. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171114171308.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_2_3306_20171114171308.binlog on host_3:3306. This may take long time...
Applying log files succeeded.
Tue Nov 14 17:13:17 2017 - [info] All relay logs were successfully applied.
Tue Nov 14 17:13:17 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306)..
Tue Nov 14 17:13:17 2017 - [info] Executed CHANGE MASTER.
Tue Nov 14 17:13:17 2017 - [info] Slave started.
Tue Nov 14 17:13:17 2017 - [info] End of log messages from host_3.
Tue Nov 14 17:13:17 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Tue Nov 14 17:13:17 2017 - [info] All new slave servers recovered successfully.
Tue Nov 14 17:13:17 2017 - [info]
Tue Nov 14 17:13:17 2017 - [info] * Phase 5: New master cleanup phase..
Tue Nov 14 17:13:17 2017 - [info]
Tue Nov 14 17:13:17 2017 - [info] Resetting slave info on the new master..
Tue Nov 14 17:13:17 2017 - [info] host_1: Resetting slave info succeeded.
Tue Nov 14 17:13:17 2017 - [info] Master failover to host_1(host_1:3306) completed successfully.
Tue Nov 14 17:13:17 2017 - [info]

----- Failover Report -----

bak_mha_test: MySQL Master failover host_2(host_2:3306) to host_1(host_1:3306) succeeded

Master host_2(host_2:3306) is down!

Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_2(host_2:3306)
The latest slave host_1(host_1:3306) has all relay logs for recovery.
Selected host_1(host_1:3306) as a new master.
host_1(host_1:3306): OK: Applying all logs succeeded.
host_1(host_1:3306): OK: Activated master IP address.
host_3(host_3:3306): Generating differential relay logs up to host_1(host_1:3306) succeeded.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_1(host_1:3306)
host_1(host_1:3306): Resetting slave info succeeded.
Master failover to host_1(host_1:3306) completed successfully.

Tue Nov 14 17:13:17 2017 - [info] Sending mail..
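The failover above boils down to simple binlog-position arithmetic: MHA picks the slave with the largest Read_Master_Log_Pos on the dead master's binlog as the "latest" slave, saves the dead master's binlog from that position to its tail, and generates a relay-log diff for every slave that is behind. The sketch below is illustrative only, not MHA code: the `recovery_plan` helper is hypothetical, the positions are the ones from scenario 1.3.1 above, and real MHA additionally compares binlog file names, candidate_master/no_master flags, and so on.

```python
# Illustrative sketch of MHA's recovery-plan arithmetic (not MHA code).
# Positions come from scenario 1.3.1: the dead master host_2's binlog
# ends at host_2.000004:4596; slave values are Read_Master_Log_Pos.

def recovery_plan(master_tail, slaves):
    """slaves: dict of host -> Read_Master_Log_Pos on the dead master's binlog.
    Assumes all slaves read the same binlog file; ties and file rollover
    are not handled here."""
    latest_host = max(slaves, key=slaves.get)
    latest_pos = slaves[latest_host]
    return {
        "latest_slave": latest_host,
        # Byte range only the dead master has; fetched by save_binary_logs.
        "saved_master_binlog": (latest_pos, master_tail),
        # Each lagging slave needs a diff cut from the latest slave's relay logs.
        "relay_diffs": {
            host: (pos, latest_pos)
            for host, pos in slaves.items()
            if pos < latest_pos
        },
    }

plan = recovery_plan(master_tail=4596, slaves={"host_1": 3683, "host_3": 2789})
print(plan)
```

This matches the log: host_1 (pos 3683) is chosen as the latest slave and new master, save_binary_logs dumps host_2.000004 from 3683 to tail(4596), and host_3 receives a diff covering 2789 to 3683 before the saved master binlog is applied on top.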
## Switchover result

* new master and new etl

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 114 | 1    |
| 115 | 2    |
| 116 | 3    |
| 117 | 4    |
| 118 | 5    |
| 119 | 6    |
| 120 | 7    |
| 121 | 8    |
| 122 | 10   |
| 123 | 11   |
| 124 | 12   |
| 125 | 13   |
| 126 | 14   |
| 127 | 15   |
+-----+------+
14 rows in set (0.00 sec)

1.3.2 All of the master's binlogs have already reached the slave when MySQL on the master crashes

### Simulated scenario: binlog state of the 3 DBs

* master host_1

dba:lc> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| host_1.000010 |     3341 |              |                  |                   |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 167 | 1    |
| 168 | 2    |
| 169 | 3    |
+-----+------+
3 rows in set (0.00 sec)

* slave host_2

Master_Log_File: host_1.000010
Exec_Master_Log_Pos: 3341

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 167 | 1    |
| 168 | 2    |
| 169 | 3    |
+-----+------+
3 rows in set (0.00 sec)

* etl host_3

Master_Log_File: host_1.000010
Exec_Master_Log_Pos: 2381

### Failover log

Wed Nov 15 10:39:36 2017 - [info] MHA::MasterFailover version 0.56.
Wed Nov 15 10:39:36 2017 - [info] Starting master failover.
Wed Nov 15 10:39:36 2017 - [info]
Wed Nov 15 10:39:36 2017 - [info] * Phase 1: Configuration Check Phase..
Wed Nov 15 10:39:36 2017 - [info]
Wed Nov 15 10:39:36 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Wed Nov 15 10:39:36 2017 - [info] GTID failover mode = 0
Wed Nov 15 10:39:36 2017 - [info] Dead Servers:
Wed Nov 15 10:39:36 2017 - [info]   host_1(host_1:3306)
Wed Nov 15 10:39:36 2017 - [info] Checking master reachability via MySQL(double check)...
Wed Nov 15 10:39:36 2017 - [info]  ok.
Wed Nov 15 10:39:36 2017 - [info] Alive Servers:
Wed Nov 15 10:39:36 2017 - [info]   host_2(host_2:3306)
Wed Nov 15 10:39:36 2017 - [info]   host_3(host_3:3306)
Wed Nov 15 10:39:36 2017 - [info] Alive Slaves:
Wed Nov 15 10:39:36 2017 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Nov 15 10:39:36 2017 - [info]     Replicating from host_1(host_1:3306)
Wed Nov 15 10:39:36 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Nov 15 10:39:36 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Nov 15 10:39:36 2017 - [info]     Replicating from host_1(host_1:3306)
Wed Nov 15 10:39:36 2017 - [info]     Not candidate for the new Master (no_master is set)
Wed Nov 15 10:39:36 2017 - [info] Starting SQL thread on host_3(host_3:3306) ..
Wed Nov 15 10:39:36 2017 - [info]  done.
Wed Nov 15 10:39:36 2017 - [info] Starting Non-GTID based failover.
Wed Nov 15 10:39:36 2017 - [info]
Wed Nov 15 10:39:36 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Nov 15 10:39:36 2017 - [info]
Wed Nov 15 10:39:36 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Nov 15 10:39:36 2017 - [info]
Wed Nov 15 10:39:36 2017 - [info] HealthCheck: SSH to host_1 is reachable.
Wed Nov 15 10:39:37 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Nov 15 10:39:37 2017 - [info] Executing master IP deactivation script:
Wed Nov 15 10:39:37 2017 - [info]   /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root
=================== swift vip : tgw_vip from host_1 is deleted ==============================
--2017-11-15 10:39:37--  http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi
Connecting to tgw_server:80... connected.
HTTP request sent, awaiting response...
200 OK
Length: unspecified [text/html]
Saving to: 'STDOUT'

0K 11.4M=0s

2017-11-15 10:39:39 (11.4 MB/s) - written to stdout [38]

Wed Nov 15 10:39:39 2017 - [info]  done.
Wed Nov 15 10:39:39 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Nov 15 10:39:39 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Nov 15 10:39:39 2017 - [info]
Wed Nov 15 10:39:39 2017 - [info] * Phase 3: Master Recovery Phase..
Wed Nov 15 10:39:39 2017 - [info]
Wed Nov 15 10:39:39 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Nov 15 10:39:39 2017 - [info]
Wed Nov 15 10:39:39 2017 - [info] The latest binary log file/position on all slaves is host_1.000010:3341
Wed Nov 15 10:39:39 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Nov 15 10:39:39 2017 - [info]   host_2(host_2:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Nov 15 10:39:39 2017 - [info]     Replicating from host_1(host_1:3306)
Wed Nov 15 10:39:39 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Nov 15 10:39:39 2017 - [info] The oldest binary log file/position on all slaves is host_1.000010:2381
Wed Nov 15 10:39:39 2017 - [info] Oldest slaves:
Wed Nov 15 10:39:39 2017 - [info]   host_3(host_3:3306)  Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Wed Nov 15 10:39:39 2017 - [info]     Replicating from host_1(host_1:3306)
Wed Nov 15 10:39:39 2017 - [info]     Not candidate for the new Master (no_master is set)
Wed Nov 15 10:39:39 2017 - [info]
Wed Nov 15 10:39:39 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Nov 15 10:39:39 2017 - [info]
Wed Nov 15 10:39:39 2017 - [info] Fetching dead master's binary logs..
Wed Nov 15 10:39:39 2017 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1.000010 --start_pos=3341 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
Creating /var/log/masterha/mha_test if not exists.. ok.
Concat binary/relay logs from host_1.000010 pos 3341 to host_1.000010 EOF into /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 154.. ok.
Dumping effective binlog data from /data/mysql.bin/host_1.000010 position 3341 to tail(3364).. ok.
Binlog Checksum enabled
Concat succeeded.
Wed Nov 15 10:39:39 2017 - [info] scp from root@host_1:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog to local:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog succeeded.
Wed Nov 15 10:39:40 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Wed Nov 15 10:39:40 2017 - [info] HealthCheck: SSH to host_3 is reachable.
Wed Nov 15 10:39:40 2017 - [info]
Wed Nov 15 10:39:40 2017 - [info] * Phase 3.3: Determining New Master Phase..
Wed Nov 15 10:39:40 2017 - [info]
Wed Nov 15 10:39:40 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Nov 15 10:39:40 2017 - [info] Checking whether host_2 has relay logs from the oldest position..
Wed Nov 15 10:39:40 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_1.000010 --latest_rmlp=3341 --target_mlf=host_1.000010 --target_rmlp=2381 --server_id=1261261656 --workdir=/var/log/masterha/mha_test --timestamp=20171115103936 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2-relay-bin.000002 : Relay log found at /data/mysql_data, up to host_2-relay-bin.000002 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_2-relay-bin.000002, start_pos:636. Target relay log FOUND! Wed Nov 15 10:39:40 2017 - [info] OK. host_2 has all relay logs. Wed Nov 15 10:39:40 2017 - [info] Searching new master from slaves.. Wed Nov 15 10:39:40 2017 - [info] Candidate masters from the configuration file: Wed Nov 15 10:39:40 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:39:40 2017 - [info] Replicating from host_1(host_1:3306) Wed Nov 15 10:39:40 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed Nov 15 10:39:40 2017 - [info] Non-candidate masters: Wed Nov 15 10:39:40 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Wed Nov 15 10:39:40 2017 - [info] Replicating from host_1(host_1:3306) Wed Nov 15 10:39:40 2017 - [info] Not candidate for the new Master (no_master is set) Wed Nov 15 10:39:40 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Wed Nov 15 10:39:40 2017 - [info] New master is host_2(host_2:3306) Wed Nov 15 10:39:40 2017 - [info] Starting master failover.. Wed Nov 15 10:39:40 2017 - [info] From: host_1(host_1:3306) (current master) +--host_2(host_2:3306) +--host_3(host_3:3306) To: host_2(host_2:3306) (new master) +--host_3(host_3:3306) Wed Nov 15 10:39:40 2017 - [info] Wed Nov 15 10:39:40 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. 
Wed Nov 15 10:39:40 2017 - [info] Wed Nov 15 10:39:40 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. Wed Nov 15 10:39:40 2017 - [info] Sending binlog.. Wed Nov 15 10:39:41 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog to root@host_2:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog succeeded. Wed Nov 15 10:39:41 2017 - [info] Wed Nov 15 10:39:41 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Wed Nov 15 10:39:41 2017 - [info] Wed Nov 15 10:39:41 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Wed Nov 15 10:39:41 2017 - [info] Starting recovery on host_2(host_2:3306).. Wed Nov 15 10:39:41 2017 - [info] Generating diffs succeeded. Wed Nov 15 10:39:41 2017 - [info] Waiting until all relay logs are applied. Wed Nov 15 10:39:41 2017 - [info] done. Wed Nov 15 10:39:41 2017 - [info] Getting slave status.. Wed Nov 15 10:39:41 2017 - [info] This slave(host_2)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_1.000010:3341). No need to recover from Exec_Master_Log_Pos. Wed Nov 15 10:39:41 2017 - [info] Connecting to the target slave host host_2, running recover script.. Wed Nov 15 10:39:41 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_2 --slave_ip=host_2 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171115103936 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Wed Nov 15 10:39:41 2017 - [info] MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog on host_2:3306. This may take long time... 
Applying log files succeeded. Wed Nov 15 10:39:41 2017 - [info] All relay logs were successfully applied. Wed Nov 15 10:39:41 2017 - [info] Getting new master's binlog name and position.. Wed Nov 15 10:39:41 2017 - [info] host_2.000011:1307 Wed Nov 15 10:39:41 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2.000011', MASTER_LOG_POS=1307, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Wed Nov 15 10:39:41 2017 - [info] Executing master IP activate script: Wed Nov 15 10:39:41 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Unknown option: new_master_user Unknown option: new_master_password =================== swift vip : tgw_vip to host_2 is added ============================== Wed Nov 15 10:39:44 2017 - [info] OK. Wed Nov 15 10:39:44 2017 - [info] Setting read_only=0 on host_2(host_2:3306).. Wed Nov 15 10:39:44 2017 - [info] ok. Wed Nov 15 10:39:44 2017 - [info] ** Finished master recovery successfully. Wed Nov 15 10:39:44 2017 - [info] * Phase 3: Master Recovery Phase completed. Wed Nov 15 10:39:44 2017 - [info] Wed Nov 15 10:39:44 2017 - [info] * Phase 4: Slaves Recovery Phase.. Wed Nov 15 10:39:44 2017 - [info] Wed Nov 15 10:39:44 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Wed Nov 15 10:39:44 2017 - [info] Wed Nov 15 10:39:44 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 11760. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171115103936.log if it takes time.. Wed Nov 15 10:39:45 2017 - [info] Wed Nov 15 10:39:45 2017 - [info] Log messages from host_3 ... 
Wed Nov 15 10:39:45 2017 - [info] Wed Nov 15 10:39:44 2017 - [info] Server host_3 received relay logs up to: host_1.000010:2381 Wed Nov 15 10:39:44 2017 - [info] Need to get diffs from the latest slave(host_2) up to: host_1.000010:3341 (using the latest slave's relay logs) Wed Nov 15 10:39:44 2017 - [info] Connecting to the latest slave host host_2, generating diff relay log files.. Wed Nov 15 10:39:44 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_3 --latest_mlf=host_1.000010 --latest_rmlp=3341 --target_mlf=host_1.000010 --target_rmlp=2381 --server_id=1261261656 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171115103936 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2-relay-bin.000002 Wed Nov 15 10:39:45 2017 - [info] Relay log found at /data/mysql_data, up to host_2-relay-bin.000002 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_2-relay-bin.000002, start_pos:636. Concat binary/relay logs from host_2-relay-bin.000002 pos 636 to host_2-relay-bin.000002 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog .. Binlog Checksum enabled Dumping binlog format description event, from position 0 to 315.. ok. Dumping effective binlog data from /data/mysql_data/host_2-relay-bin.000002 position 636 to tail(1596).. ok. Binlog Checksum enabled Concat succeeded. Generating diff relay log succeeded. Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog . scp host_2.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog to root@host_3(22) succeeded. Wed Nov 15 10:39:45 2017 - [info] Generating diff files succeeded. 
Wed Nov 15 10:39:45 2017 - [info] End of log messages from host_3. Wed Nov 15 10:39:45 2017 - [info] -- Slave diff log generation on host host_3(host_3:3306) succeeded. Wed Nov 15 10:39:45 2017 - [info] Generating relay diff files from the latest slave succeeded. Wed Nov 15 10:39:45 2017 - [info] Wed Nov 15 10:39:45 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Wed Nov 15 10:39:45 2017 - [info] Wed Nov 15 10:39:45 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 12881. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171115103936.log if it takes time.. Wed Nov 15 10:40:45 2017 - [info] Wed Nov 15 10:40:45 2017 - [info] Log messages from host_3 ... Wed Nov 15 10:40:45 2017 - [info] Wed Nov 15 10:39:45 2017 - [info] Sending binlog.. Wed Nov 15 10:39:45 2017 - [info] scp from local:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog to root@host_3:/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog succeeded. Wed Nov 15 10:39:45 2017 - [info] Starting recovery on host_3(host_3:3306).. Wed Nov 15 10:39:45 2017 - [info] Generating diffs succeeded. Wed Nov 15 10:39:45 2017 - [info] Waiting until all relay logs are applied. Wed Nov 15 10:39:45 2017 - [info] done. Wed Nov 15 10:39:45 2017 - [info] Getting slave status.. Wed Nov 15 10:39:45 2017 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_1.000010:2381). No need to recover from Exec_Master_Log_Pos. Wed Nov 15 10:39:45 2017 - [info] Connecting to the target slave host host_3, running recover script.. 
Wed Nov 15 10:39:45 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171115103936 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Wed Nov 15 10:40:45 2017 - [info] Concat all apply files to /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171115103936.binlog .. Copying the first binlog file /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog to /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171115103936.binlog.. ok. Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog.. Binlog Checksum enabled dumped up to pos 154. ok. /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog has effective binlog events from pos 154. Dumping effective binlog data from /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog position 154 to tail(177).. ok. Concat succeeded. All apply target binary logs are concatinated at /var/log/masterha/mha_test/total_binlog_for_host_3_3306.20171115103936.binlog . MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171115103936.binlog,/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171115103936.binlog on host_3:3306. This may take long time... Applying log files succeeded. Wed Nov 15 10:40:45 2017 - [info] All relay logs were successfully applied. 
Wed Nov 15 10:40:45 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306).. Wed Nov 15 10:40:45 2017 - [info] Executed CHANGE MASTER. Wed Nov 15 10:40:45 2017 - [info] Slave started. Wed Nov 15 10:40:45 2017 - [info] End of log messages from host_3. Wed Nov 15 10:40:45 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded. Wed Nov 15 10:40:45 2017 - [info] All new slave servers recovered successfully. Wed Nov 15 10:40:45 2017 - [info] Wed Nov 15 10:40:45 2017 - [info] * Phase 5: New master cleanup phase.. Wed Nov 15 10:40:45 2017 - [info] Wed Nov 15 10:40:45 2017 - [info] Resetting slave info on the new master.. Wed Nov 15 10:40:45 2017 - [info] host_2: Resetting slave info succeeded. Wed Nov 15 10:40:45 2017 - [info] Master failover to host_2(host_2:3306) completed successfully. Wed Nov 15 10:40:45 2017 - [info] ----- Failover Report ----- bak_mha_test: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded Master host_1(host_1:3306) is down! Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details. Started automated(non-interactive) failover. Invalidated master IP address on host_1(host_1:3306) The latest slave host_2(host_2:3306) has all relay logs for recovery. Selected host_2(host_2:3306) as a new master. host_2(host_2:3306): OK: Applying all logs succeeded. host_2(host_2:3306): OK: Activated master IP address. host_3(host_3:3306): Generating differential relay logs up to host_2(host_2:3306)succeeded. Generating relay diff files from the latest slave succeeded. host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306) host_2(host_2:3306): Resetting slave info succeeded. Master failover to host_2(host_2:3306) completed successfully. Wed Nov 15 10:40:45 2017 - [info] Sending mail.. 
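The failover log above prints the exact statement the other slaves need ("All other slaves should start replication from here. Statement should be: CHANGE MASTER TO ..."). A minimal sketch of pulling that statement back out of the manager log so it can be replayed on a slave; the log path matches this test setup and is an assumption for your environment, and the password still has to be substituted by hand, since MHA masks it as 'xxx':

```shell
# Extract the last CHANGE MASTER statement that the MHA manager wrote into
# its log, for replay on any slave still pointing at the old master.
# Log path is the one used in this test; adjust to your setup.
LOG=${1:-/var/log/masterha/mha_test/mha_test.log}

# MHA prints the statement on a single line ending with ';'
stmt=$(grep -o "CHANGE MASTER TO .*;" "$LOG" | tail -n 1)
echo "$stmt"

# To replay it on a slave (password must be filled in manually):
# mysql -h <slave_host> -u dba -p -e "STOP SLAVE; ${stmt} START SLAVE;"
```

This is only a convenience for the manual-recovery case; when the failover completes normally, MHA issues the CHANGE MASTER on the surviving slaves itself.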
### Failover result

The new master and the new etl have consistent data:

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 167 | 1    |
| 168 | 2    |
| 169 | 3    |
+-----+------+

1.4 A big transaction is running on the slave (candidate master)

A 1000-second query was running during the switch. No impact; the failover completed normally.

1.5 If MHA fails partway through, can the failover be re-executed?

In roughly 90% of scenarios the failover can simply be re-run.

* Issue No.1: the VIP on the dead master has already been deleted, so the next failover attempt fails at that step.
  Workarounds: a) re-add the VIP on the dead master, or b) skip the delete-dead-master-VIP step altogether.

In the remaining ~5% of scenarios, re-running fails with an error. This usually means the failover had already reached the final CHANGE MASTER stage, so the replication topology has changed (candidate slave <==rep== etl) and MHA cannot replay the whole procedure. Fortunately the CHANGE MASTER statement has already been generated; the remaining slaves only need to execute the CHANGE MASTER recorded in the log.

Thu Nov 9 16:49:39 2017 - [info] GTID failover mode = 0
Thu Nov 9 16:49:39 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln169] Detected dead master host_1(host_1:3306) does not match with specified dead master host_2(host_2:3306)!
Thu Nov 9 16:49:39 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53

1.7 Master: MySQL down (summary)

1. The final failover command:

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=$dead_master_ip --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

II. Master: Server down

2.1 The etl lags by 8 hours: same conclusion as 1.1.

2.2 The slave (candidate master) lags even further behind than the etl

2.2.1 The master's server dies while part of its binlog has not yet reached either slave

### Binlog state of the three DBs

* master

dba:lc> show master status;
+---------------------+----------+--------------+------------------+-------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------------+----------+--------------+------------------+-------------------+
| host_2.000012       | 1651     |              |                  |                   |
+---------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 172 | 1    |
| 173 | 2    |
| 174 | 3    |
| 175 | 4    |
+-----+------+
4 rows in set (0.00 sec)

* slave

Master_Log_File: host_2.000012
Exec_Master_Log_Pos: 467

dba:lc> select * from t_char_2;
Empty set (0.00 sec)

* etl

Master_Log_File: host_2.000012
Exec_Master_Log_Pos: 1059

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 172 | 1    |
| 173 | 2    |
+-----+------+
2 rows in set (0.00 sec)

### Simulating the failure

* Isolate the master's network, making it equivalent to a crashed server:

master> iptables -A INPUT -p tcp -s other_ip --dport 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

### Failover log

Thu Nov 16 16:54:40 2017 - [info] MHA::MasterFailover version 0.56.
Thu Nov 16 16:54:40 2017 - [info] Starting master failover.
Thu Nov 16 16:54:40 2017 - [info]
Thu Nov 16 16:54:40 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov 16 16:54:40 2017 - [info]
Thu Nov 16 16:54:40 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306)
Thu Nov 16 16:54:40 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Thu Nov 16 16:54:40 2017 - [info] GTID failover mode = 0
Thu Nov 16 16:54:40 2017 - [info] Dead Servers:
Thu Nov 16 16:54:40 2017 - [info] host_2(host_2:3306)
Thu Nov 16 16:54:40 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Nov 16 16:54:41 2017 - [info] ok.
Thu Nov 16 16:54:41 2017 - [info] Alive Servers: Thu Nov 16 16:54:41 2017 - [info] host_1(host_1:3306) Thu Nov 16 16:54:41 2017 - [info] host_3(host_3:3306) Thu Nov 16 16:54:41 2017 - [info] Alive Slaves: Thu Nov 16 16:54:41 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:54:41 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:54:41 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 16 16:54:41 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:54:41 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:54:41 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 16 16:54:41 2017 - [info] Starting SQL thread on host_1(host_1:3306) .. Thu Nov 16 16:54:41 2017 - [info] done. Thu Nov 16 16:54:41 2017 - [info] Starting SQL thread on host_3(host_3:3306) .. Thu Nov 16 16:54:41 2017 - [info] done. Thu Nov 16 16:54:41 2017 - [info] Starting Non-GTID based failover. Thu Nov 16 16:54:41 2017 - [info] Thu Nov 16 16:54:41 2017 - [info] ** Phase 1: Configuration Check Phase completed. Thu Nov 16 16:54:41 2017 - [info] Thu Nov 16 16:54:41 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Thu Nov 16 16:54:41 2017 - [info] Thu Nov 16 16:55:31 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_2! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342. Thu Nov 16 16:55:31 2017 - [info] Forcing shutdown so that applications never connect to the current master.. 
Thu Nov 16 16:55:31 2017 - [info] Executing master IP deactivation script:
Thu Nov 16 16:55:31 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --command=stop
ssh: connect to host host_2 port 22: Connection timed out
=================== swift vip : tgw_vip from host_2 is deleted ==============================
--2017-11-16 16:55:38-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi
Connecting to tgw_server:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: "STDOUT"
0K 12.1M=0s
2017-11-16 16:57:36 (12.1 MB/s) - written to STDOUT [38]
Thu Nov 16 16:57:36 2017 - [info] done.
Thu Nov 16 16:57:36 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Nov 16 16:57:36 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Nov 16 16:57:36 2017 - [info]
Thu Nov 16 16:57:36 2017 - [info] * Phase 3: Master Recovery Phase..
Thu Nov 16 16:57:36 2017 - [info]
Thu Nov 16 16:57:36 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [info] The latest binary log file/position on all slaves is host_2.000012:1059 Thu Nov 16 16:57:36 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Thu Nov 16 16:57:36 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:57:36 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:57:36 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 16 16:57:36 2017 - [info] The oldest binary log file/position on all slaves is host_2.000012:467 Thu Nov 16 16:57:36 2017 - [info] Oldest slaves: Thu Nov 16 16:57:36 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:57:36 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:57:36 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost. Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [info] * Phase 3.3: Determining New Master Phase.. Thu Nov 16 16:57:36 2017 - [info] Thu Nov 16 16:57:36 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves.. Thu Nov 16 16:57:37 2017 - [info] HealthCheck: SSH to host_3 is reachable. Thu Nov 16 16:57:37 2017 - [info] Checking whether host_3 has relay logs from the oldest position.. 
Thu Nov 16 16:57:37 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_2.000012 --latest_rmlp=1059 --target_mlf=host_2.000012 --target_rmlp=467 --server_id=1261261666 --workdir=/var/log/masterha/mha_test --timestamp=20171116165440 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000004 : Relay log found at /data/mysql_data, up to host_3-relay-bin.000004 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_3-relay-bin.000004, start_pos:678. Target relay log FOUND! Thu Nov 16 16:57:37 2017 - [info] OK. host_3 has all relay logs. Thu Nov 16 16:57:37 2017 - [info] HealthCheck: SSH to host_1 is reachable. Thu Nov 16 16:57:37 2017 - [info] Searching new master from slaves.. Thu Nov 16 16:57:37 2017 - [info] Candidate masters from the configuration file: Thu Nov 16 16:57:37 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:57:37 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:57:37 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 16 16:57:37 2017 - [info] Non-candidate masters: Thu Nov 16 16:57:37 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 16 16:57:37 2017 - [info] Replicating from host_2(host_2:3306) Thu Nov 16 16:57:37 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 16 16:57:37 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Thu Nov 16 16:57:37 2017 - [info] Not found. Thu Nov 16 16:57:37 2017 - [info] Searching from all candidate_master slaves.. Thu Nov 16 16:57:37 2017 - [info] New master is host_1(host_1:3306) Thu Nov 16 16:57:37 2017 - [info] Starting master failover.. 
Thu Nov 16 16:57:37 2017 - [info] From: host_2(host_2:3306) (current master) +--host_1(host_1:3306) +--host_3(host_3:3306) To: host_1(host_1:3306) (new master) +--host_3(host_3:3306) Thu Nov 16 16:57:37 2017 - [info] Thu Nov 16 16:57:37 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase.. Thu Nov 16 16:57:37 2017 - [info] Thu Nov 16 16:57:37 2017 - [info] Server host_1 received relay logs up to: host_2.000012:467 Thu Nov 16 16:57:37 2017 - [info] Need to get diffs from the latest slave(host_3) up to: host_2.000012:1059 (using the latest slave's relay logs) Thu Nov 16 16:57:38 2017 - [info] Connecting to the latest slave host host_3, generating diff relay log files.. Thu Nov 16 16:57:38 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_1 --latest_mlf=host_2.000012 --latest_rmlp=1059 --target_mlf=host_2.000012 --target_rmlp=467 --server_id=1261261666 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171116165440 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_3-relay-bin.000004 Thu Nov 16 16:57:38 2017 - [info] Relay log found at /data/mysql_data, up to host_3-relay-bin.000004 Fast relay log position search succeeded. Target relay log file/position found. start_file:host_3-relay-bin.000004, start_pos:678. Concat binary/relay logs from host_3-relay-bin.000004 pos 678 to host_3-relay-bin.000004 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog .. Dumping binlog format description event, from position 0 to 361.. ok. Dumping effective binlog data from /data/mysql_data/host_3-relay-bin.000004 position 678 to tail(1270).. ok. Concat succeeded. Generating diff relay log succeeded. 
Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog . scp host_3.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog to root@host_1(22) succeeded. Thu Nov 16 16:57:38 2017 - [info] Generating diff files succeeded. Thu Nov 16 16:57:38 2017 - [info] Thu Nov 16 16:57:38 2017 - [info] * Phase 3.4: Master Log Apply Phase.. Thu Nov 16 16:57:38 2017 - [info] Thu Nov 16 16:57:38 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed. Thu Nov 16 16:57:38 2017 - [info] Starting recovery on host_1(host_1:3306).. Thu Nov 16 16:57:38 2017 - [info] Generating diffs succeeded. Thu Nov 16 16:57:38 2017 - [info] Waiting until all relay logs are applied. Thu Nov 16 16:57:38 2017 - [info] done. Thu Nov 16 16:57:38 2017 - [info] Getting slave status.. Thu Nov 16 16:57:38 2017 - [info] This slave(host_1)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_2.000012:467). No need to recover from Exec_Master_Log_Pos. Thu Nov 16 16:57:38 2017 - [info] Connecting to the target slave host host_1, running recover script.. Thu Nov 16 16:57:38 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_1 --slave_ip=host_1 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171116165440 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx Thu Nov 16 16:57:39 2017 - [info] MySQL client version is 5.7.13. Using --binary-mode. Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_1_3306_20171116165440.binlog on host_1:3306. This may take long time... Applying log files succeeded. Thu Nov 16 16:57:39 2017 - [info] All relay logs were successfully applied. 
Thu Nov 16 16:57:39 2017 - [info] Getting new master's binlog name and position.. Thu Nov 16 16:57:39 2017 - [info] host_1.000012:1310 Thu Nov 16 16:57:39 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_1', MASTER_PORT=3306, MASTER_LOG_FILE='host_1.000012', MASTER_LOG_POS=1310, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Thu Nov 16 16:57:39 2017 - [info] Executing master IP activate script: Thu Nov 16 16:57:39 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --new_master_host=host_1 --new_master_ip=host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Unknown option: new_master_user Unknown option: new_master_password =================== swift vip : tgw_vip to host_1 is added ============================== Thu Nov 16 16:57:41 2017 - [info] OK. Thu Nov 16 16:57:41 2017 - [info] Setting read_only=0 on host_1(host_1:3306).. Thu Nov 16 16:57:41 2017 - [info] ok. Thu Nov 16 16:57:41 2017 - [info] ** Finished master recovery successfully. Thu Nov 16 16:57:41 2017 - [info] * Phase 3: Master Recovery Phase completed. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] * Phase 4: Slaves Recovery Phase.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 123011. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171116165440.log if it takes time.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] Log messages from host_3 ... Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave. 
Thu Nov 16 16:57:41 2017 - [info] End of log messages from host_3. Thu Nov 16 16:57:41 2017 - [info] -- host_3(host_3:3306) has the latest relay log events. Thu Nov 16 16:57:41 2017 - [info] Generating relay diff files from the latest slave succeeded. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 123044. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171116165440.log if it takes time.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] Log messages from host_3 ... Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] Starting recovery on host_3(host_3:3306).. Thu Nov 16 16:57:41 2017 - [info] This server has all relay logs. Waiting all logs to be applied.. Thu Nov 16 16:57:41 2017 - [info] done. Thu Nov 16 16:57:41 2017 - [info] All relay logs were successfully applied. Thu Nov 16 16:57:41 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_1(host_1:3306).. Thu Nov 16 16:57:41 2017 - [info] Executed CHANGE MASTER. Thu Nov 16 16:57:41 2017 - [info] Slave started. Thu Nov 16 16:57:41 2017 - [info] End of log messages from host_3. Thu Nov 16 16:57:41 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded. Thu Nov 16 16:57:41 2017 - [info] All new slave servers recovered successfully. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] * Phase 5: New master cleanup phase.. Thu Nov 16 16:57:41 2017 - [info] Thu Nov 16 16:57:41 2017 - [info] Resetting slave info on the new master.. Thu Nov 16 16:57:41 2017 - [info] host_1: Resetting slave info succeeded. Thu Nov 16 16:57:41 2017 - [info] Master failover to host_1(host_1:3306) completed successfully. 
Thu Nov 16 16:57:41 2017 - [info]

----- Failover Report -----

bak_mha_test: MySQL Master failover host_2(host_2:3306) to host_1(host_1:3306) succeeded

Master host_2(host_2:3306) is down!

Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_2(host_2:3306)
The latest slave host_3(host_3:3306) has all relay logs for recovery.
Selected host_1(host_1:3306) as a new master.
host_1(host_1:3306): OK: Applying all logs succeeded.
host_1(host_1:3306): OK: Activated master IP address.
host_3(host_3:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_1(host_1:3306)
host_1(host_1:3306): Resetting slave info succeeded.
Master failover to host_1(host_1:3306) completed successfully.

### Result after the failover

The new_master and the new_etl are consistent. The binlog events that never left the old master are lost for good, so the new cluster is missing that data as well.

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 172 | 1    |
| 173 | 2    |
+-----+------+
2 rows in set (0.00 sec)

### The last step matters

If the dead master comes back to life afterwards, run this on it:

dead_master> /usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c

http://gitlab.corp.anjuke.com/_dba/architecture/blob/master/personal/Keithlan/other/share/tools/always_used_command.md => see the tgw chapter for details

2.2.2 The master's server dies after all of its binlog has reached one etl

Test omitted; essentially the same as 2.2.1.
Conclusion: since every log event on the master had already reached the etl, no data from the master is lost.

2.3 The slave (candidate master) has the most recent log, ahead of the etl

2.3.1 The master's server dies while part of its binlog has not yet reached either slave

### Binlog state of the three DBs

* master

dba:lc> show master status;
+---------------------+----------+--------------+------------------+-------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------------+----------+--------------+------------------+-------------------+
| host_1.000012       | 3860     |              |                  |                   |
+---------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 176 | 1    |
| 177 | 2    |
| 178 | 10   |
| 179 | 20   |
+-----+------+
4 rows in set (0.00 sec)

* slave

Master_Log_File: host_1.000012
Exec_Master_Log_Pos: 3216

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 176 | 1    |
| 177 | 2    |
+-----+------+
2 rows in set (0.00 sec)

* etl

Master_Log_File: host_1.000012
Exec_Master_Log_Pos: 2576

dba:lc> select * from t_char_2;
Empty set (0.00 sec)

### Simulating the failure

* Isolate the master's network, making it equivalent to a crashed server:

master> iptables -A INPUT -p tcp -s other_ip --dport 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

### Failover log

Thu Nov 16 17:17:59 2017 - [info] MHA::MasterFailover version 0.56.
Thu Nov 16 17:17:59 2017 - [info] Starting master failover.
Thu Nov 16 17:17:59 2017 - [info]
Thu Nov 16 17:17:59 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov 16 17:17:59 2017 - [info]
Thu Nov 16 17:17:59 2017 - [warning] SQL Thread is stopped(no error) on host_2(host_2:3306)
Thu Nov 16 17:17:59 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Thu Nov 16 17:17:59 2017 - [info] GTID failover mode = 0
Thu Nov 16 17:17:59 2017 - [info] Dead Servers:
Thu Nov 16 17:17:59 2017 - [info] host_1(host_1:3306)
Thu Nov 16 17:17:59 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Nov 16 17:18:00 2017 - [info] ok.
Thu Nov 16 17:18:00 2017 - [info] Alive Servers:
Thu Nov 16 17:18:00 2017 - [info] host_2(host_2:3306)
Thu Nov 16 17:18:00 2017 - [info] host_3(host_3:3306)
Thu Nov 16 17:18:00 2017 - [info] Alive Slaves:
Thu Nov 16 17:18:00 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 16 17:18:00 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 16 17:18:00 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 16 17:18:00 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 16 17:18:00 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 16 17:18:00 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 16 17:18:00 2017 - [info] Starting SQL thread on host_2(host_2:3306) ..
Thu Nov 16 17:18:00 2017 - [info] done.
Thu Nov 16 17:18:00 2017 - [info] Starting SQL thread on host_3(host_3:3306) ..
Thu Nov 16 17:18:00 2017 - [info] done.
Thu Nov 16 17:18:00 2017 - [info] Starting Non-GTID based failover.
Thu Nov 16 17:18:00 2017 - [info]
Thu Nov 16 17:18:00 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Nov 16 17:18:00 2017 - [info]
Thu Nov 16 17:18:00 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Nov 16 17:18:00 2017 - [info]
Thu Nov 16 17:18:50 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Thu Nov 16 17:18:50 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Nov 16 17:18:50 2017 - [info] Executing master IP deactivation script:
Thu Nov 16 17:18:50 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --command=stop
ssh: connect to host host_1 port 22: Connection timed out
=================== swift vip : tgw_vip from host_1 is deleted ==============================
--2017-11-16 17:18:57-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi
Connecting to tgw_server:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'STDOUT'

0K 8.61M=0s

2017-11-16 17:20:55 (8.61 MB/s) - written to stdout [38]

Thu Nov 16 17:20:55 2017 - [info] done.
Thu Nov 16 17:20:55 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Nov 16 17:20:55 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Nov 16 17:20:55 2017 - [info]
Thu Nov 16 17:20:55 2017 - [info] * Phase 3: Master Recovery Phase..
Thu Nov 16 17:20:55 2017 - [info]
Thu Nov 16 17:20:55 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Nov 16 17:20:55 2017 - [info]
Thu Nov 16 17:20:55 2017 - [info] The latest binary log file/position on all slaves is host_1.000012:3216
Thu Nov 16 17:20:55 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Nov 16 17:20:55 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 16 17:20:55 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 16 17:20:55 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 16 17:20:55 2017 - [info] The oldest binary log file/position on all slaves is host_1.000012:2576
Thu Nov 16 17:20:55 2017 - [info] Oldest slaves:
Thu Nov 16 17:20:55 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 16 17:20:55 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 16 17:20:55 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 16 17:20:55 2017 - [info]
Thu Nov 16 17:20:55 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Nov 16 17:20:55 2017 - [info]
Thu Nov 16 17:20:55 2017 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Thu Nov 16 17:20:55 2017 - [info]
Thu Nov 16 17:20:55 2017 - [info] * Phase 3.3: Determining New Master Phase..
Thu Nov 16 17:20:55 2017 - [info]
Thu Nov 16 17:20:55 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Nov 16 17:20:55 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Thu Nov 16 17:20:55 2017 - [info] Checking whether host_2 has relay logs from the oldest position..
Thu Nov 16 17:20:55 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=host_1.000012 --latest_rmlp=3216 --target_mlf=host_1.000012 --target_rmlp=2576 --server_id=1261261656 --workdir=/var/log/masterha/mha_test --timestamp=20171116171759 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2-relay-bin.000002 :
Relay log found at /data/mysql_data, up to host_2-relay-bin.000002
Fast relay log position search succeeded.
Target relay log file/position found. start_file:host_2-relay-bin.000002, start_pos:1581.
Target relay log FOUND!
Thu Nov 16 17:20:56 2017 - [info] OK. host_2 has all relay logs.
Thu Nov 16 17:20:56 2017 - [info] HealthCheck: SSH to host_3 is reachable.
Thu Nov 16 17:20:56 2017 - [info] Searching new master from slaves..
Thu Nov 16 17:20:56 2017 - [info] Candidate masters from the configuration file:
Thu Nov 16 17:20:56 2017 - [info] host_2(host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 16 17:20:56 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 16 17:20:56 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 16 17:20:56 2017 - [info] Non-candidate masters:
Thu Nov 16 17:20:56 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 16 17:20:56 2017 - [info] Replicating from host_1(host_1:3306)
Thu Nov 16 17:20:56 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 16 17:20:56 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Nov 16 17:20:56 2017 - [info] New master is host_2(host_2:3306)
Thu Nov 16 17:20:56 2017 - [info] Starting master failover..
Thu Nov 16 17:20:56 2017 - [info]
From:
host_1(host_1:3306) (current master)
 +--host_2(host_2:3306)
 +--host_3(host_3:3306)

To:
host_2(host_2:3306) (new master)
 +--host_3(host_3:3306)

Thu Nov 16 17:20:56 2017 - [info]
Thu Nov 16 17:20:56 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Nov 16 17:20:56 2017 - [info]
Thu Nov 16 17:20:56 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Nov 16 17:20:56 2017 - [info]
Thu Nov 16 17:20:56 2017 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Nov 16 17:20:56 2017 - [info]
Thu Nov 16 17:20:56 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Nov 16 17:20:56 2017 - [info] Starting recovery on host_2(host_2:3306)..
Thu Nov 16 17:20:56 2017 - [info] This server has all relay logs. Waiting all logs to be applied..
Thu Nov 16 17:20:56 2017 - [info] done.
Thu Nov 16 17:20:56 2017 - [info] All relay logs were successfully applied.
Thu Nov 16 17:20:56 2017 - [info] Getting new master's binlog name and position..
Thu Nov 16 17:20:56 2017 - [info] host_2.000012:3959
Thu Nov 16 17:20:56 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='host_2', MASTER_PORT=3306, MASTER_LOG_FILE='host_2.000012', MASTER_LOG_POS=3959, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Nov 16 17:20:56 2017 - [info] Executing master IP activate script:
Thu Nov 16 17:20:56 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host=host_1 --orig_master_ip=host_1 --orig_master_port=3306 --new_master_host=host_2 --new_master_ip=host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Unknown option: new_master_user
Unknown option: new_master_password
=================== swift vip : tgw_vip to host_2 is added ==============================
Thu Nov 16 17:20:59 2017 - [info] OK.
Thu Nov 16 17:20:59 2017 - [info] ** Finished master recovery successfully.
Thu Nov 16 17:20:59 2017 - [info] * Phase 3: Master Recovery Phase completed.
Thu Nov 16 17:20:59 2017 - [info]
Thu Nov 16 17:20:59 2017 - [info] * Phase 4: Slaves Recovery Phase..
Thu Nov 16 17:20:59 2017 - [info]
Thu Nov 16 17:20:59 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Nov 16 17:20:59 2017 - [info]
Thu Nov 16 17:20:59 2017 - [info] -- Slave diff file generation on host host_3(host_3:3306) started, pid: 77007. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171116171759.log if it takes time..
Thu Nov 16 17:21:00 2017 - [info]
Thu Nov 16 17:21:00 2017 - [info] Log messages from host_3 ...
Thu Nov 16 17:21:00 2017 - [info]
Thu Nov 16 17:20:59 2017 - [info] Server host_3 received relay logs up to: host_1.000012:2576
Thu Nov 16 17:20:59 2017 - [info] Need to get diffs from the latest slave(host_2) up to: host_1.000012:3216 (using the latest slave's relay logs)
Thu Nov 16 17:20:59 2017 - [info] Connecting to the latest slave host host_2, generating diff relay log files..
Thu Nov 16 17:20:59 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=host_3 --latest_mlf=host_1.000012 --latest_rmlp=3216 --target_mlf=host_1.000012 --target_rmlp=2576 --server_id=1261261656 --diff_file_readtolatest=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog --workdir=/var/log/masterha/mha_test --timestamp=20171116171759 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --relay_dir=/data/mysql_data --current_relay_log=host_2-relay-bin.000002
Thu Nov 16 17:21:00 2017 - [info]
Relay log found at /data/mysql_data, up to host_2-relay-bin.000002
Fast relay log position search succeeded.
Target relay log file/position found. start_file:host_2-relay-bin.000002, start_pos:1581.
Concat binary/relay logs from host_2-relay-bin.000002 pos 1581 to host_2-relay-bin.000002 EOF into /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 315.. ok.
Dumping effective binlog data from /data/mysql_data/host_2-relay-bin.000002 position 1581 to tail(2221).. ok.
Binlog Checksum enabled
Concat succeeded.
Generating diff relay log succeeded. Saved at /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog .
scp host_2.58os.org:/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog to root@host_3(22) succeeded.
Thu Nov 16 17:21:00 2017 - [info] Generating diff files succeeded.
Thu Nov 16 17:21:00 2017 - [info] End of log messages from host_3.
Thu Nov 16 17:21:00 2017 - [info] -- Slave diff log generation on host host_3(host_3:3306) succeeded.
Thu Nov 16 17:21:00 2017 - [info] Generating relay diff files from the latest slave succeeded.
Thu Nov 16 17:21:00 2017 - [info]
Thu Nov 16 17:21:00 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Nov 16 17:21:00 2017 - [info]
Thu Nov 16 17:21:00 2017 - [info] -- Slave recovery on host host_3(host_3:3306) started, pid: 78627. Check tmp log /var/log/masterha/mha_test/host_3_3306_20171116171759.log if it takes time..
Thu Nov 16 17:21:00 2017 - [info]
Thu Nov 16 17:21:00 2017 - [info] Log messages from host_3 ...
Thu Nov 16 17:21:00 2017 - [info]
Thu Nov 16 17:21:00 2017 - [info] Starting recovery on host_3(host_3:3306)..
Thu Nov 16 17:21:00 2017 - [info] Generating diffs succeeded.
Thu Nov 16 17:21:00 2017 - [info] Waiting until all relay logs are applied.
Thu Nov 16 17:21:00 2017 - [info] done.
Thu Nov 16 17:21:00 2017 - [info] Getting slave status..
Thu Nov 16 17:21:00 2017 - [info] This slave(host_3)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(host_1.000012:2576). No need to recover from Exec_Master_Log_Pos.
Thu Nov 16 17:21:00 2017 - [info] Connecting to the target slave host host_3, running recover script..
Thu Nov 16 17:21:00 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='dba' --slave_host=host_3 --slave_ip=host_3 --slave_port=3306 --apply_files=/var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog --workdir=/var/log/masterha/mha_test --target_version=5.7.13-log --timestamp=20171116171759 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Thu Nov 16 17:21:00 2017 - [info]
MySQL client version is 5.7.13. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/mha_test/relay_from_read_to_latest_host_3_3306_20171116171759.binlog on host_3:3306. This may take long time...
Applying log files succeeded.
Thu Nov 16 17:21:00 2017 - [info] All relay logs were successfully applied.
Thu Nov 16 17:21:00 2017 - [info] Resetting slave host_3(host_3:3306) and starting replication from the new master host_2(host_2:3306)..
Thu Nov 16 17:21:00 2017 - [info] Executed CHANGE MASTER.
Thu Nov 16 17:21:00 2017 - [info] Slave started.
Thu Nov 16 17:21:00 2017 - [info] End of log messages from host_3.
Thu Nov 16 17:21:00 2017 - [info] -- Slave recovery on host host_3(host_3:3306) succeeded.
Thu Nov 16 17:21:00 2017 - [info] All new slave servers recovered successfully.
Thu Nov 16 17:21:00 2017 - [info]
Thu Nov 16 17:21:00 2017 - [info] * Phase 5: New master cleanup phase..
Thu Nov 16 17:21:00 2017 - [info]
Thu Nov 16 17:21:00 2017 - [info] Resetting slave info on the new master..
Thu Nov 16 17:21:00 2017 - [info] host_2: Resetting slave info succeeded.
Thu Nov 16 17:21:00 2017 - [info] Master failover to host_2(host_2:3306) completed successfully.
Thu Nov 16 17:21:00 2017 - [info]

----- Failover Report -----

bak_mha_test: MySQL Master failover host_1(host_1:3306) to host_2(host_2:3306) succeeded

Master host_1(host_1:3306) is down!

Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_1(host_1:3306)
The latest slave host_2(host_2:3306) has all relay logs for recovery.
Selected host_2(host_2:3306) as a new master.
host_2(host_2:3306): OK: Applying all logs succeeded.
host_2(host_2:3306): OK: Activated master IP address.
host_3(host_3:3306): Generating differential relay logs up to host_2(host_2:3306) succeeded.
Generating relay diff files from the latest slave succeeded.
host_3(host_3:3306): OK: Applying all logs succeeded. Slave started, replicating from host_2(host_2:3306)
host_2(host_2:3306): Resetting slave info succeeded.
Master failover to host_2(host_2:3306) completed successfully.

### Post-failover result

new_master and new_etl are consistent. The binlog events that never left the dead master are lost for good, so the new cluster is missing that data as well.

dba:lc> select * from t_char_2;
+-----+------+
| id  | name |
+-----+------+
| 176 | 1    |
| 177 | 2    |
+-----+------+
2 rows in set (0.00 sec)

### The last step matters

If the dead master comes back to life afterwards, this must be run on it:

dead_master> /usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c

http://gitlab.corp.anjuke.com/_dba/architecture/blob/master/personal/Keithlan/other/share/tools/always_used_command.md ==> see the tgw chapter for details

Conclusion: the master died before its final events reached any other server, so the transactions that never left the master are lost.

2.3.2 The master server dies after all of its binlog has reached the slave

Test omitted; essentially the same as 2.3.1.

Conclusion: since every binlog event on the master had already reached the slave, no data from the master is lost.

2.4 A big transaction is running on the slave (candidate master)

A 1000s query: same conclusion as 1.4.
flush tables with read lock: same conclusion as 1.4.

2.6 If MHA fails partway through, can the failover be re-run?

Same conclusion as 1.6.

三、Pitfalls encountered

3.1 In interactive mode, the switch is aborted if you do not type 'YES' in time

3.2 On machines in non-GTID mode, what happens if a binlog server is configured?
No impact.

3.3 Do not plant fake log files in the relay-log directory

xx-relay-bin.000001
xx-relay-bin.000002
xx-relay-bin.000002.lc    -- the fake one; at the time it was produced by parsing the relay log by hand:

mysqlbinlog -vv xx-relay-bin.000002 > xx-relay-bin.000002.lc

During the switch, MHA fails with:

Reading xx-relay-bin.000002.lc
Event too large: pos: 4, event_length: 1163083840, event_type: 32 at /usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm line 103
Tue Nov 14 10:39:08 2017 - [warning] xx doesn't have all relay logs. Maybe some logs were purged.
Tue Nov 14 10:39:08 2017 - [warning] None of latest servers have enough relay logs from oldest position. We can't recover oldest slaves.
Tue Nov 14 10:39:08 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln947] None of the latest slaves has enough relay logs for recovery.
Tue Nov 14 10:39:08 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53

3.4 flush tables with read lock blocks the switch

dba:(none)> show processlist;
+----+------+------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host             | db   | Command | Time | State                        | Info                                                                                                 |
+----+------+------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+
| 63 | dba  | localhost        | NULL | Query   |    0 | starting                     | show processlist                                                                                     |
| 65 | dba  | xx:11164         | NULL | Sleep   |  121 |                              | NULL                                                                                                 |
| 83 | dba  | new master:49022 | NULL | Query   |  176 | Waiting for global read lock | BINLOG ' GpAKWhNYUy1LMAAAAGYHAAAAAG0AAAAAAAEAAmxjAAh0X2NoYXJfMgACAw8CLAEC GpAKWh5YUy1LJwAAAI0HAAAAAG |
+----+------+------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

The switch hangs at this step:

Tue Nov 14 15:16:23 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
--------------Waiting for global read lock-----------------

and every step from here on is blocked:

* Phase 4.2: Starting Parallel Slave Log Apply Phase..
-- send the diff log between the slave being recovered and the latest slave to the slave being recovered
-- send the diff log between the latest slave and the dead master to the slave being recovered
-- apply those diff logs, in order, on the slave being recovered
-- Resetting slave, Executed CHANGE MASTER to new_master
* Phase 5: New master cleanup phase..
Resetting slave info on the new master.

3.5 Fake binlog files

Tue Nov 14 17:23:10 2017 - [info] Executing command on the dead master host_1(host_1:3306): save_binary_logs --command=save --start_file=host_1.000003 --start_pos=4042 --binlog_dir=/data/mysql.bin --output_file=/var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171114172304.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
Creating /var/log/masterha/mha_test if not exists.. ok.
Concat binary/relay logs from host_1.000003 pos 4042 to host_1.000050 EOF into /var/log/masterha/mha_test/saved_master_binlog_from_host_1_3306_20171114172304.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 154.. ok.
Dumping effective binlog data from /data/mysql.bin/host_1.000003 position 4042 to tail(4065).. ok.
Failed to save binary log: Target file /data/mysql.bin/host_1.000004 not found! at /usr/bin/save_binary_logs line 176
Tue Nov 14 17:23:10 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln760] Failed to save binary log events from the orig master. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Tests:

with flush + no binlog.xx + 1 fake file = save binlog -- succeeds
with flush + binlog.xx present + 1 fake file = save binlog -- fails
no flush + no binlog.xx + 1 fake file = save binlog -- succeeds
no flush + binlog.xx present + 1 fake file = save binlog -- fails

That is: if a fake binlog exists whose numeric suffix is larger than the real ones, the save step errors out.

Analysis:

/usr/bin/save_binary_logs ==> generate_diff_binary_log
=> /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm ==> concat_all_binlogs_from()
=> start_num, end_num (a fake binlog in the sequence triggers the error)

四、Summary

With MHA in NON-GTID mode, the key configuration and usage are:

1. command

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=$dead_master_ip --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

2. tgw

If the dead master can still be brought back up, you must run on it:

/usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c

For the reason, see: http://gitlab.corp.anjuke.com/_dba/architecture/blob/master/personal/Keithlan/other/share/tools/always_used_command.md ==> TGW chapter

五、Process overview

* Phase 1: Configuration Check Phase..
HealthCheck: is SSH reachable on each of the N DBs?
Binlog server: is each of the N DBs reachable?
GTID failover mode = ?
Dead Servers is ?
Primary candidate for the new Master (candidate_master is set)?

* Phase 2: Dead Master Shutdown Phase..
Executing master IP deactivation script: delete the TGW vip
shutdown_script: ?

* Phase 3: Master Recovery Phase..

* Phase 3.1: Getting Latest Slaves Phase..
Latest slaves, file position?
Oldest slaves, file position?

* Phase 3.2: Saving Dead Master's Binlog Phase..
Executing command on the dead master: save_binary_logs --command=save
Concat binary/relay logs from latest_slave file_pos to master's binlog EOF
scp from root@master_ip: binlog to local: xx.binlog -- ship the master's newest binlog events, which the latest slave is missing, to the manager host

* Phase 3.3: Determining New Master Phase..
Choose which slave becomes the new master

* Phase 3.3: New Master Diff Log Generation Phase..
Need to get diffs from the latest slave up to: xx (using the latest slave's relay logs) -- generate the diff log between the new master and the latest slave
scp latest_ip:xx to new_master:xx -- send the diff between the new master and the latest slave to the new master
scp from local: xx to new_master: xx -- send the diff between the latest slave and the dead master to the new master

* Phase 3.4: Master Log Apply Phase..
The new master merges the two diff logs above, then applies them
All other slaves should start replication from here -- generate the CHANGE MASTER statement
Executing master IP activate script: add the TGW vip and set readonly=0

* Phase 4: Slaves Recovery Phase.. (runs in parallel)

* Phase 4.1: Starting Parallel Slave Diff Log Generation Phase.. -- generate the diff logs between each remaining slave and the latest slave

* Phase 4.2: Starting Parallel Slave Log Apply Phase..
-- send the diff log between the slave being recovered and the latest slave to the slave being recovered
-- send the diff log between the latest slave and the dead master to the slave being recovered
-- apply those diff logs, in order, on the slave being recovered
-- Resetting slave, Executed CHANGE MASTER to new_master

* Phase 5: New master cleanup phase..
Resetting slave info on the new master.
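The analysis in 3.5 says concat_all_binlogs_from() derives a start_num and end_num from the binlog file names and fails as soon as the sequence has a hole. A minimal sketch of that check, written in Python rather than MHA's Perl (the function `concat_range` and its interface are illustrative, not MHA's actual code; only the file names and the error text mirror the log above):

```python
import re

def concat_range(binlog_files, start_file):
    """Walk binlogs from start_file to the newest one, in the spirit of
    MHA's concat_all_binlogs_from(): every intermediate file must exist."""
    m = re.match(r"(.+)\.(\d+)$", start_file)
    if not m:
        raise ValueError("unexpected binlog name: %s" % start_file)
    prefix, start_num = m.group(1), int(m.group(2))
    # end_num is derived from the largest numeric suffix present,
    # so a hand-made file such as host_1.000050 inflates it.
    nums = {int(f[len(prefix) + 1:]) for f in binlog_files
            if re.match(re.escape(prefix) + r"\.\d+$", f)}
    end_num = max(nums)
    missing = [n for n in range(start_num, end_num + 1) if n not in nums]
    if missing:
        # mirrors "Target file ... not found!" from save_binary_logs
        raise RuntimeError("Target file %s.%06d not found!" % (prefix, missing[0]))
    return ["%s.%06d" % (prefix, n) for n in range(start_num, end_num + 1)]

# A contiguous sequence concatenates fine:
ok = concat_range(["host_1.000003", "host_1.000004"], "host_1.000003")

# A fake file with a large suffix creates a gap and fails immediately:
try:
    concat_range(["host_1.000003", "host_1.000050"], "host_1.000003")
except RuntimeError as e:
    print(e)  # Target file host_1.000004 not found!
```

Because end_num comes from the largest suffix present, planting any file whose number is bigger than the real tail widens the range, and the first missing file in between produces the same failure seen in the 3.5 log.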
MHA failover: the GTID case

Again with masterha_master_switch as the backdrop, this part walks through the scenarios you may run into.

Assumed environment (the classic three nodes):

host_1(host_1:3306) (current master)
 +--host_2(host_2:3306 slave[candidate master])
 +--host_3(host_3:3306 etl)

一、Master : MySQL down

1.1 The etl lags by 8 hours

Adding no_check_delay=0 to the configuration file is enough to ignore the error.

1.2 The slave (candidate master) is even further behind than the etl

1.2.1 MySQL on the master dies while part of its binlog has not reached either slave

### Snapshot of the scene: GTID state of the three DBs

* master host_2

dba:lc> show master status;
+---------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                        |
+---------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| host_2.000002 |     2885 |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446362 |
+---------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

* slave (candidate master) host_1

Retrieved_Gtid_Set: ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446353
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446353
Auto_Position: 1

* etl (other slave) host_3

Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:4-16, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446353-446356
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446356
Auto_Position: 1

### Failover log

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host=host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Thu Nov 9 10:43:49 2017 - [info] MHA::MasterFailover version 0.56.
Thu Nov 9 10:43:49 2017 - [info] Starting master failover.
Thu Nov 9 10:43:49 2017 - [info]
Thu Nov 9 10:43:49 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov 9 10:43:49 2017 - [info]
Thu Nov 9 10:43:50 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Thu Nov 9 10:43:50 2017 - [info] Binlog server host_2 is reachable.
Thu Nov 9 10:43:50 2017 - [info] HealthCheck: SSH to host_1 is reachable.
Thu Nov 9 10:43:50 2017 - [info] Binlog server host_1 is reachable.
Thu Nov 9 10:43:50 2017 - [info] HealthCheck: SSH to host_3 is reachable.
Thu Nov 9 10:43:50 2017 - [info] Binlog server host_3 is reachable.
Thu Nov 9 10:43:51 2017 - [warning] SQL Thread is stopped(no error) on host_1(host_1:3306)
Thu Nov 9 10:43:51 2017 - [warning] SQL Thread is stopped(no error) on host_3(host_3:3306)
Thu Nov 9 10:43:51 2017 - [info] GTID failover mode = 1
Thu Nov 9 10:43:51 2017 - [info] Dead Servers:
Thu Nov 9 10:43:51 2017 - [info] host_2(host_2:3306)
Thu Nov 9 10:43:51 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Nov 9 10:43:51 2017 - [info] ok.
Thu Nov 9 10:43:51 2017 - [info] Alive Servers:
Thu Nov 9 10:43:51 2017 - [info] host_1(host_1:3306)
Thu Nov 9 10:43:51 2017 - [info] host_3(host_3:3306)
Thu Nov 9 10:43:51 2017 - [info] Alive Slaves:
Thu Nov 9 10:43:51 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 10:43:51 2017 - [info] GTID ON
Thu Nov 9 10:43:51 2017 - [info] Replicating from host_2(host_2:3306)
Thu Nov 9 10:43:51 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 9 10:43:51 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 10:43:51 2017 - [info] GTID ON
Thu Nov 9 10:43:51 2017 - [info] Replicating from host_2(host_2:3306)
Thu Nov 9 10:43:51 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 9 10:43:51 2017 - [info] Starting SQL thread on host_1(host_1:3306) ..
Thu Nov 9 10:43:51 2017 - [info] done.
Thu Nov 9 10:43:51 2017 - [info] Starting SQL thread on host_3(host_3:3306) ..
Thu Nov 9 10:43:51 2017 - [info] done.
Thu Nov 9 10:43:51 2017 - [info] Starting GTID based failover.
Thu Nov 9 10:43:51 2017 - [info]
Thu Nov 9 10:43:51 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Nov 9 10:43:51 2017 - [info]
Thu Nov 9 10:43:51 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Nov 9 10:43:51 2017 - [info]
Thu Nov 9 10:43:51 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Thu Nov 9 10:43:51 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Nov 9 10:43:51 2017 - [info] Executing master IP deactivation script:
Thu Nov 9 10:43:51 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host=host_2 --orig_master_ip=host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root
Thu Nov 9 10:43:53 2017 - [info] done.
Thu Nov 9 10:43:53 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Nov 9 10:43:53 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Nov 9 10:43:53 2017 - [info]
Thu Nov 9 10:43:53 2017 - [info] * Phase 3: Master Recovery Phase..
Thu Nov 9 10:43:53 2017 - [info]
Thu Nov 9 10:43:53 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Nov 9 10:43:53 2017 - [info]
Thu Nov 9 10:43:53 2017 - [info] The latest binary log file/position on all slaves is host_2.000002:1115
Thu Nov 9 10:43:53 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:4-16,
Thu Nov 9 10:43:53 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Nov 9 10:43:53 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 10:43:53 2017 - [info] GTID ON
Thu Nov 9 10:43:53 2017 - [info] Replicating from host_2(host_2:3306)
Thu Nov 9 10:43:53 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 9 10:43:53 2017 - [info] The oldest binary log file/position on all slaves is host_2.000002:230
Thu Nov 9 10:43:53 2017 - [info] Retrieved Gtid Set: ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446353
Thu Nov 9 10:43:53 2017 - [info] Oldest slaves:
Thu Nov 9 10:43:53 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 10:43:53 2017 - [info] GTID ON
Thu Nov 9 10:43:53 2017 - [info] Replicating from host_2(host_2:3306)
Thu Nov 9 10:43:53 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 9 10:43:53 2017 - [info]
Thu Nov 9 10:43:53 2017 - [info] * Phase 3.3: Determining New Master Phase..
Thu Nov 9 10:43:53 2017 - [info]
Thu Nov 9 10:43:53 2017 - [info] Searching new master from slaves..
Thu Nov 9 10:43:53 2017 - [info] Candidate masters from the configuration file:
Thu Nov 9 10:43:53 2017 - [info] host_1(host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 10:43:53 2017 - [info] GTID ON
Thu Nov 9 10:43:53 2017 - [info] Replicating from host_2(host_2:3306)
Thu Nov 9 10:43:53 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 9 10:43:53 2017 - [info] Non-candidate masters:
Thu Nov 9 10:43:53 2017 - [info] host_3(host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 10:43:53 2017 - [info] GTID ON
Thu Nov 9 10:43:53 2017 - [info] Replicating from host_2(host_2:3306)
Thu Nov 9 10:43:53 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 9 10:43:53 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Nov 9 10:43:53 2017 - [info] Not found.
Thu Nov 9 10:43:53 2017 - [info] Searching from all candidate_master slaves..
Thu Nov 9 10:43:53 2017 - [info] New master is host_1(host_1:3306)
Thu Nov 9 10:43:53 2017 - [info] Starting master failover..
Thu Nov 9 10:43:53 2017 - [info]
Thu Nov 9 10:43:53 2017 - [info]
Thu Nov 9 10:43:53 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Nov 9 10:43:53 2017 - [info]
Thu Nov 9 10:43:53 2017 - [info] Waiting all logs to be applied..
Thu Nov 9 10:43:53 2017 - [info] done.
Thu Nov 9 10:43:53 2017 - [info] Replicating from the latest slave host_3(host_3:3306) and waiting to apply..
Thu Nov 9 10:43:53 2017 - [info] Waiting all logs to be applied on the latest slave..
Thu Nov 9 10:43:53 2017 - [info] Resetting slave host_1(host_1:3306) and starting replication from the new master host_3(host_3:3306)..
Thu Nov 9 10:43:53 2017 - [info] Executed CHANGE MASTER.
Thu Nov 9 10:43:54 2017 - [info] Slave started.
Thu Nov 9 10:43:54 2017 - [info] Waiting to execute all relay logs on host_1(host_1:3306)..
Thu Nov 9 10:43:54 2017 - [info] master_pos_wait(host_3.000049:18041) completed on host_1(host_1:3306). Executed 0 events.
Thu Nov 9 10:43:54 2017 - [info] done.
Thu Nov 9 10:43:54 2017 - [info] done.
Thu Nov 9 10:43:54 2017 - [info] -- Saving binlog from host host_2 started, pid: 150294
Thu Nov 9 10:43:54 2017 - [info] -- Saving binlog from host host_1 started, pid: 150295
Thu Nov 9 10:43:54 2017 - [info] -- Saving binlog from host host_3 started, pid: 150297
Thu Nov 9 10:43:54 2017 - [info]
Thu Nov 9 10:43:54 2017 - [info] Log messages from host_1 ...
Thu Nov 9 10:43:54 2017 - [info]
Thu Nov 9 10:43:54 2017 - [info] Fetching binary logs from binlog server host_1..
Thu Nov 9 10:43:54 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_2.000002 --start_pos=1115 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171109104349.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Thu Nov 9 10:43:54 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Thu Nov 9 10:43:54 2017 - [info] End of log messages from host_1.
Thu Nov 9 10:43:54 2017 - [warning] Got error from host_1.
Thu Nov 9 10:43:54 2017 - [info]
Thu Nov 9 10:43:54 2017 - [info] Log messages from host_3 ...
Thu Nov 9 10:43:54 2017 - [info]
Thu Nov 9 10:43:54 2017 - [info] Fetching binary logs from binlog server host_3..
Thu Nov 9 10:43:54 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_2.000002 --start_pos=1115 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171109104349.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Thu Nov 9 10:43:54 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Thu Nov 9 10:43:54 2017 - [info] End of log messages from host_3.
Thu Nov 9 10:43:54 2017 - [warning] Got error from host_3.
Thu Nov 9 10:43:55 2017 - [info]
Thu Nov 9 10:43:55 2017 - [info] Log messages from host_2 ...
Thu Nov 9 10:43:55 2017 - [info]
Thu Nov 9 10:43:54 2017 - [info] Fetching binary logs from binlog server host_2..
Thu Nov 9 10:43:54 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file=host_2.000002 --start_pos=1115 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171109104349.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Thu Nov 9 10:43:55 2017 - [info] scp from root@host_2:/var/log/masterha/mha_test/saved_binlog_binlog1_20171109104349.binlog to local:/var/log/masterha/mha_test/saved_binlog_host_2_binlog1_20171109104349.binlog succeeded.
Thu Nov 9 10:43:55 2017 - [info] End of log messages from host_2.
Thu Nov 9 10:43:55 2017 - [info] Saved mysqlbinlog size from host_2 is 6047 bytes.
Thu Nov 9 10:43:55 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_host_2_binlog1_20171109104349.binlog ..
Thu Nov 9 10:43:55 2017 - [info] Differential log apply from binlog server succeeded.
Thu Nov 9 10:43:55 2017 - [info] Getting new master's binlog name and position..
Thu Nov 9 10:43:55 2017 - [info] host_1.000053:3624 Thu Nov 9 10:43:55 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_1', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Thu Nov 9 10:43:55 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_1.000053, 3624, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16, Thu Nov 9 10:43:55 2017 - [info] Executing master IP activate script: Thu Nov 9 10:43:55 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --new_master_host= host_1 --new_master_ip= host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Thu Nov 9 10:43:57 2017 - [info] OK. Thu Nov 9 10:43:57 2017 - [info] Setting read_only=0 on host_1( host_1:3306).. Thu Nov 9 10:43:57 2017 - [info] ok. Thu Nov 9 10:43:57 2017 - [info] ** Finished master recovery successfully. Thu Nov 9 10:43:57 2017 - [info] * Phase 3: Master Recovery Phase completed. Thu Nov 9 10:43:57 2017 - [info] Thu Nov 9 10:43:57 2017 - [info] * Phase 4: Slaves Recovery Phase.. Thu Nov 9 10:43:57 2017 - [info] Thu Nov 9 10:43:57 2017 - [info] Thu Nov 9 10:43:57 2017 - [info] * Phase 4.1: Starting Slaves in parallel.. Thu Nov 9 10:43:57 2017 - [info] Thu Nov 9 10:43:57 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 155162. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171109104349.log if it takes time.. Thu Nov 9 10:43:58 2017 - [info] Thu Nov 9 10:43:58 2017 - [info] Log messages from host_3 ... Thu Nov 9 10:43:58 2017 - [info] Thu Nov 9 10:43:57 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_1( host_1:3306).. Thu Nov 9 10:43:57 2017 - [info] Executed CHANGE MASTER. Thu Nov 9 10:43:58 2017 - [info] Slave started. 
Thu Nov 9 10:43:58 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16, Thu Nov 9 10:43:58 2017 - [info] End of log messages from host_3. Thu Nov 9 10:43:58 2017 - [info] -- Slave on host host_3( host_3:3306) started. Thu Nov 9 10:43:58 2017 - [info] All new slave servers recovered successfully. Thu Nov 9 10:43:58 2017 - [info] Thu Nov 9 10:43:58 2017 - [info] * Phase 5: New master cleanup phase.. Thu Nov 9 10:43:58 2017 - [info] Thu Nov 9 10:43:58 2017 - [info] Resetting slave info on the new master.. Thu Nov 9 10:43:58 2017 - [info] host_1: Resetting slave info succeeded. Thu Nov 9 10:43:58 2017 - [info] Master failover to host_1( host_1:3306) completed successfully. Thu Nov 9 10:43:58 2017 - [info] Thu Nov 9 10:43:58 2017 - [info] Sending mail..

1.2.2 When all of the master's binlog has already reached one etl slave, and MySQL on the master then dies

### Simulated scenario: GTID state of the three DBs

* master host_1

dba:lc> show master status;
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| host_1.000053 | 5229 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-21, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446362 |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

* slave (candidate master) host_2

Retrieved_Gtid_Set:
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446362
Auto_Position: 1

* etl (other slave) host_3

Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:17-21, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446357-446362
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-21, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446362
Auto_Position: 1

### Failover log

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host= host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Thu Nov 9 10:59:14 2017 - [info] MHA::MasterFailover version 0.56. Thu Nov 9 10:59:14 2017 - [info] Starting master failover. Thu Nov 9 10:59:14 2017 - [info] Thu Nov 9 10:59:14 2017 - [info] * Phase 1: Configuration Check Phase.. Thu Nov 9 10:59:14 2017 - [info] Thu Nov 9 10:59:15 2017 - [info] HealthCheck: SSH to host_2 is reachable. Thu Nov 9 10:59:15 2017 - [info] Binlog server host_2 is reachable. Thu Nov 9 10:59:15 2017 - [info] HealthCheck: SSH to host_1 is reachable. Thu Nov 9 10:59:15 2017 - [info] Binlog server host_1 is reachable. Thu Nov 9 10:59:15 2017 - [info] HealthCheck: SSH to host_3 is reachable. Thu Nov 9 10:59:16 2017 - [info] Binlog server host_3 is reachable. Thu Nov 9 10:59:16 2017 - [warning] SQL Thread is stopped(no error) on host_2( host_2:3306) Thu Nov 9 10:59:16 2017 - [info] GTID failover mode = 1 Thu Nov 9 10:59:16 2017 - [info] Dead Servers: Thu Nov 9 10:59:16 2017 - [info] host_1( host_1:3306) Thu Nov 9 10:59:16 2017 - [info] Checking master reachability via MySQL(double check)... Thu Nov 9 10:59:16 2017 - [info] ok.
Thu Nov 9 10:59:16 2017 - [info] Alive Servers: Thu Nov 9 10:59:16 2017 - [info] host_2( host_2:3306) Thu Nov 9 10:59:16 2017 - [info] host_3( host_3:3306) Thu Nov 9 10:59:16 2017 - [info] Alive Slaves: Thu Nov 9 10:59:16 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 10:59:16 2017 - [info] GTID ON Thu Nov 9 10:59:16 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 10:59:16 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 9 10:59:16 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 10:59:16 2017 - [info] GTID ON Thu Nov 9 10:59:16 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 10:59:16 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 9 10:59:16 2017 - [info] Starting SQL thread on host_2( host_2:3306) .. Thu Nov 9 10:59:16 2017 - [info] done. Thu Nov 9 10:59:16 2017 - [info] Starting GTID based failover. Thu Nov 9 10:59:16 2017 - [info] Thu Nov 9 10:59:16 2017 - [info] ** Phase 1: Configuration Check Phase completed. Thu Nov 9 10:59:16 2017 - [info] Thu Nov 9 10:59:16 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Thu Nov 9 10:59:16 2017 - [info] Thu Nov 9 10:59:16 2017 - [info] HealthCheck: SSH to host_1 is reachable. Thu Nov 9 10:59:16 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Thu Nov 9 10:59:16 2017 - [info] Executing master IP deactivation script: Thu Nov 9 10:59:16 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root Thu Nov 9 10:59:20 2017 - [info] done. Thu Nov 9 10:59:20 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. 
Thu Nov 9 10:59:20 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Thu Nov 9 10:59:20 2017 - [info] Thu Nov 9 10:59:20 2017 - [info] * Phase 3: Master Recovery Phase.. Thu Nov 9 10:59:20 2017 - [info] Thu Nov 9 10:59:20 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Thu Nov 9 10:59:20 2017 - [info] Thu Nov 9 10:59:20 2017 - [info] The latest binary log file/position on all slaves is host_1.000053:5229 Thu Nov 9 10:59:20 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:17-21, Thu Nov 9 10:59:20 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Thu Nov 9 10:59:20 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 10:59:20 2017 - [info] GTID ON Thu Nov 9 10:59:20 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 10:59:20 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 9 10:59:20 2017 - [info] The oldest binary log file/position on all slaves is host_1.000053:3624 Thu Nov 9 10:59:20 2017 - [info] Oldest slaves: Thu Nov 9 10:59:20 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 10:59:20 2017 - [info] GTID ON Thu Nov 9 10:59:20 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 10:59:20 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 9 10:59:20 2017 - [info] Thu Nov 9 10:59:20 2017 - [info] * Phase 3.3: Determining New Master Phase.. Thu Nov 9 10:59:20 2017 - [info] Thu Nov 9 10:59:20 2017 - [info] Searching new master from slaves.. 
Thu Nov 9 10:59:20 2017 - [info] Candidate masters from the configuration file: Thu Nov 9 10:59:20 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 10:59:20 2017 - [info] GTID ON Thu Nov 9 10:59:20 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 10:59:20 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 9 10:59:20 2017 - [info] Non-candidate masters: Thu Nov 9 10:59:20 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 10:59:20 2017 - [info] GTID ON Thu Nov 9 10:59:20 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 10:59:20 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 9 10:59:20 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Thu Nov 9 10:59:20 2017 - [info] Not found. Thu Nov 9 10:59:20 2017 - [info] Searching from all candidate_master slaves.. Thu Nov 9 10:59:20 2017 - [info] New master is host_2( host_2:3306) Thu Nov 9 10:59:20 2017 - [info] Starting master failover.. Thu Nov 9 10:59:20 2017 - [info] Thu Nov 9 10:59:20 2017 - [info] Thu Nov 9 10:59:20 2017 - [info] * Phase 3.3: New Master Recovery Phase.. Thu Nov 9 10:59:20 2017 - [info] Thu Nov 9 10:59:20 2017 - [info] Waiting all logs to be applied.. Thu Nov 9 10:59:20 2017 - [info] done. Thu Nov 9 10:59:20 2017 - [info] Replicating from the latest slave host_3( host_3:3306) and waiting to apply.. Thu Nov 9 10:59:20 2017 - [info] Waiting all logs to be applied on the latest slave.. Thu Nov 9 10:59:20 2017 - [info] Resetting slave host_2( host_2:3306) and starting replication from the new master host_3( host_3:3306).. Thu Nov 9 10:59:20 2017 - [info] Executed CHANGE MASTER. Thu Nov 9 10:59:21 2017 - [info] Slave started. Thu Nov 9 10:59:21 2017 - [info] Waiting to execute all relay logs on host_2( host_2:3306).. 
Thu Nov 9 10:59:21 2017 - [info] master_pos_wait( host_3.000049:22035) completed on host_2( host_2:3306). Executed 0 events. Thu Nov 9 10:59:21 2017 - [info] done. Thu Nov 9 10:59:21 2017 - [info] done. Thu Nov 9 10:59:21 2017 - [info] -- Saving binlog from host host_2 started, pid: 184482 Thu Nov 9 10:59:21 2017 - [info] -- Saving binlog from host host_1 started, pid: 184483 Thu Nov 9 10:59:21 2017 - [info] -- Saving binlog from host host_3 started, pid: 184487 Thu Nov 9 10:59:21 2017 - [info] Thu Nov 9 10:59:21 2017 - [info] Log messages from host_2 ... Thu Nov 9 10:59:21 2017 - [info] Thu Nov 9 10:59:21 2017 - [info] Fetching binary logs from binlog server host_2.. Thu Nov 9 10:59:21 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000053 --start_pos=5229 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171109105914.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Thu Nov 9 10:59:21 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt? Thu Nov 9 10:59:21 2017 - [info] End of log messages from host_2. Thu Nov 9 10:59:21 2017 - [warning] Got error from host_2. Thu Nov 9 10:59:21 2017 - [info] Thu Nov 9 10:59:21 2017 - [info] Log messages from host_3 ... Thu Nov 9 10:59:21 2017 - [info] Thu Nov 9 10:59:21 2017 - [info] Fetching binary logs from binlog server host_3.. 
Thu Nov 9 10:59:21 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000053 --start_pos=5229 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171109105914.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Thu Nov 9 10:59:21 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt? Thu Nov 9 10:59:21 2017 - [info] End of log messages from host_3. Thu Nov 9 10:59:21 2017 - [warning] Got error from host_3. Thu Nov 9 10:59:22 2017 - [info] Thu Nov 9 10:59:22 2017 - [info] Log messages from host_1 ... Thu Nov 9 10:59:22 2017 - [info] Thu Nov 9 10:59:21 2017 - [info] Fetching binary logs from binlog server host_1.. Thu Nov 9 10:59:21 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000053 --start_pos=5229 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171109105914.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Thu Nov 9 10:59:22 2017 - [info] scp from root@ host_1:/var/log/masterha/mha_test/saved_binlog_binlog2_20171109105914.binlog to local:/var/log/masterha/mha_test/saved_binlog_ host_1_binlog2_20171109105914.binlog succeeded. Thu Nov 9 10:59:22 2017 - [info] End of log messages from host_1. Thu Nov 9 10:59:22 2017 - [info] Saved mysqlbinlog size from host_1 is 800 bytes. Thu Nov 9 10:59:22 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_ host_1_binlog2_20171109105914.binlog .. Thu Nov 9 10:59:22 2017 - [info] Differential log apply from binlog server succeeded. Thu Nov 9 10:59:22 2017 - [info] Getting new master's binlog name and position.. 
Thu Nov 9 10:59:22 2017 - [info] host_2.000003:1680 Thu Nov 9 10:59:22 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Thu Nov 9 10:59:22 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_2.000003, 1680, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-21, Thu Nov 9 10:59:22 2017 - [info] Executing master IP activate script: Thu Nov 9 10:59:22 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --new_master_host= host_2 --new_master_ip= host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Thu Nov 9 10:59:24 2017 - [info] OK. Thu Nov 9 10:59:24 2017 - [info] Setting read_only=0 on host_2( host_2:3306).. Thu Nov 9 10:59:24 2017 - [info] ok. Thu Nov 9 10:59:24 2017 - [info] ** Finished master recovery successfully. Thu Nov 9 10:59:24 2017 - [info] * Phase 3: Master Recovery Phase completed. Thu Nov 9 10:59:24 2017 - [info] Thu Nov 9 10:59:24 2017 - [info] * Phase 4: Slaves Recovery Phase.. Thu Nov 9 10:59:24 2017 - [info] Thu Nov 9 10:59:24 2017 - [info] Thu Nov 9 10:59:24 2017 - [info] * Phase 4.1: Starting Slaves in parallel.. Thu Nov 9 10:59:24 2017 - [info] Thu Nov 9 10:59:24 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 189393. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171109105914.log if it takes time.. Thu Nov 9 10:59:25 2017 - [info] Thu Nov 9 10:59:25 2017 - [info] Log messages from host_3 ... Thu Nov 9 10:59:25 2017 - [info] Thu Nov 9 10:59:24 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_2( host_2:3306).. Thu Nov 9 10:59:24 2017 - [info] Executed CHANGE MASTER. Thu Nov 9 10:59:25 2017 - [info] Slave started. 
Thu Nov 9 10:59:25 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-21, Thu Nov 9 10:59:25 2017 - [info] End of log messages from host_3. Thu Nov 9 10:59:25 2017 - [info] -- Slave on host host_3( host_3:3306) started. Thu Nov 9 10:59:25 2017 - [info] All new slave servers recovered successfully. Thu Nov 9 10:59:25 2017 - [info] Thu Nov 9 10:59:25 2017 - [info] * Phase 5: New master cleanup phase.. Thu Nov 9 10:59:25 2017 - [info] Thu Nov 9 10:59:25 2017 - [info] Resetting slave info on the new master.. Thu Nov 9 10:59:25 2017 - [info] host_2: Resetting slave info succeeded. Thu Nov 9 10:59:25 2017 - [info] Master failover to host_2( host_2:3306) completed successfully. Thu Nov 9 10:59:25 2017 - [info] Thu Nov 9 10:59:25 2017 - [info] Sending mail..

1.3 The slave (candidate master) has the newest logs, more than the etl

1.3.1 When part of the master's binlog has not yet reached the two slaves, and MySQL on the master then dies

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host= host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Tue Nov 7 17:11:29 2017 - [info] MHA::MasterFailover version 0.56. Tue Nov 7 17:11:29 2017 - [info] Starting master failover. Tue Nov 7 17:11:29 2017 - [info] Tue Nov 7 17:11:29 2017 - [info] * Phase 1: Configuration Check Phase.. Tue Nov 7 17:11:29 2017 - [info] Tue Nov 7 17:11:29 2017 - [info] HealthCheck: SSH to host_2 is reachable. Tue Nov 7 17:11:29 2017 - [info] Binlog server host_2 is reachable. Tue Nov 7 17:11:29 2017 - [info] HealthCheck: SSH to host_1 is reachable. Tue Nov 7 17:11:30 2017 - [info] Binlog server host_1 is reachable. Tue Nov 7 17:11:30 2017 - [info] HealthCheck: SSH to host_3 is reachable. Tue Nov 7 17:11:30 2017 - [info] Binlog server host_3 is reachable.
Tue Nov 7 17:11:30 2017 - [warning] SQL Thread is stopped(no error) on host_2( host_2:3306) Tue Nov 7 17:11:30 2017 - [warning] SQL Thread is stopped(no error) on host_3( host_3:3306) Tue Nov 7 17:11:30 2017 - [info] GTID failover mode = 1 Tue Nov 7 17:11:30 2017 - [info] Dead Servers: Tue Nov 7 17:11:30 2017 - [info] host_1( host_1:3306) Tue Nov 7 17:11:30 2017 - [info] Checking master reachability via MySQL(double check)... Tue Nov 7 17:11:30 2017 - [info] ok. Tue Nov 7 17:11:30 2017 - [info] Alive Servers: Tue Nov 7 17:11:30 2017 - [info] host_2( host_2:3306) Tue Nov 7 17:11:30 2017 - [info] host_3( host_3:3306) Tue Nov 7 17:11:30 2017 - [info] Alive Slaves: Tue Nov 7 17:11:30 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 17:11:30 2017 - [info] GTID ON Tue Nov 7 17:11:30 2017 - [info] Replicating from host_1( host_1:3306) Tue Nov 7 17:11:30 2017 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 7 17:11:30 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 17:11:30 2017 - [info] GTID ON Tue Nov 7 17:11:30 2017 - [info] Replicating from host_1( host_1:3306) Tue Nov 7 17:11:30 2017 - [info] Not candidate for the new Master (no_master is set) Tue Nov 7 17:11:30 2017 - [info] Starting SQL thread on host_2( host_2:3306) .. Tue Nov 7 17:11:30 2017 - [info] done. Tue Nov 7 17:11:30 2017 - [info] Starting SQL thread on host_3( host_3:3306) .. Tue Nov 7 17:11:30 2017 - [info] done. Tue Nov 7 17:11:30 2017 - [info] Starting GTID based failover. Tue Nov 7 17:11:30 2017 - [info] Tue Nov 7 17:11:30 2017 - [info] ** Phase 1: Configuration Check Phase completed. Tue Nov 7 17:11:30 2017 - [info] Tue Nov 7 17:11:30 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Tue Nov 7 17:11:30 2017 - [info] Tue Nov 7 17:11:30 2017 - [info] HealthCheck: SSH to host_1 is reachable. 
Tue Nov 7 17:11:31 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Tue Nov 7 17:11:31 2017 - [info] Executing master IP deactivation script: Tue Nov 7 17:11:31 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root Tue Nov 7 17:11:33 2017 - [info] done. Tue Nov 7 17:11:33 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Tue Nov 7 17:11:33 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] * Phase 3: Master Recovery Phase.. Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] The latest binary log file/position on all slaves is host_1.000051:13508 Tue Nov 7 17:11:33 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:3-8 Tue Nov 7 17:11:33 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Tue Nov 7 17:11:33 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 17:11:33 2017 - [info] GTID ON Tue Nov 7 17:11:33 2017 - [info] Replicating from host_1( host_1:3306) Tue Nov 7 17:11:33 2017 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 7 17:11:33 2017 - [info] The oldest binary log file/position on all slaves is host_1.000051:11918 Tue Nov 7 17:11:33 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:2-3, Tue Nov 7 17:11:33 2017 - [info] Oldest slaves: Tue Nov 7 17:11:33 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 17:11:33 2017 - [info] GTID ON Tue Nov 7 17:11:33 2017 - [info] Replicating from host_1( 
host_1:3306) Tue Nov 7 17:11:33 2017 - [info] Not candidate for the new Master (no_master is set) Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] * Phase 3.3: Determining New Master Phase.. Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] Searching new master from slaves.. Tue Nov 7 17:11:33 2017 - [info] Candidate masters from the configuration file: Tue Nov 7 17:11:33 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 17:11:33 2017 - [info] GTID ON Tue Nov 7 17:11:33 2017 - [info] Replicating from host_1( host_1:3306) Tue Nov 7 17:11:33 2017 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 7 17:11:33 2017 - [info] Non-candidate masters: Tue Nov 7 17:11:33 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 17:11:33 2017 - [info] GTID ON Tue Nov 7 17:11:33 2017 - [info] Replicating from host_1( host_1:3306) Tue Nov 7 17:11:33 2017 - [info] Not candidate for the new Master (no_master is set) Tue Nov 7 17:11:33 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Tue Nov 7 17:11:33 2017 - [info] New master is host_2( host_2:3306) Tue Nov 7 17:11:33 2017 - [info] Starting master failover.. Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] * Phase 3.3: New Master Recovery Phase.. Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] Waiting all logs to be applied.. Tue Nov 7 17:11:33 2017 - [info] done. Tue Nov 7 17:11:33 2017 - [info] -- Saving binlog from host host_2 started, pid: 54677 Tue Nov 7 17:11:33 2017 - [info] -- Saving binlog from host host_1 started, pid: 54681 Tue Nov 7 17:11:33 2017 - [info] -- Saving binlog from host host_3 started, pid: 54683 Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] Log messages from host_3 ... 
Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] Fetching binary logs from binlog server host_3.. Tue Nov 7 17:11:33 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000051 --start_pos=13508 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171107171129.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Tue Nov 7 17:11:33 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt? Tue Nov 7 17:11:33 2017 - [info] End of log messages from host_3. Tue Nov 7 17:11:33 2017 - [warning] Got error from host_3. Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] Log messages from host_2 ... Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] Fetching binary logs from binlog server host_2.. Tue Nov 7 17:11:33 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000051 --start_pos=13508 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171107171129.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Tue Nov 7 17:11:33 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt? Tue Nov 7 17:11:33 2017 - [info] End of log messages from host_2. Tue Nov 7 17:11:33 2017 - [warning] Got error from host_2. Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] Log messages from host_1 ... Tue Nov 7 17:11:33 2017 - [info] Tue Nov 7 17:11:33 2017 - [info] Fetching binary logs from binlog server host_1.. 
Tue Nov 7 17:11:33 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000051 --start_pos=13508 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171107171129.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Tue Nov 7 17:11:33 2017 - [info] scp from root@ host_1:/var/log/masterha/mha_test/saved_binlog_binlog2_20171107171129.binlog to local:/var/log/masterha/mha_test/saved_binlog_ host_1_binlog2_20171107171129.binlog succeeded. Tue Nov 7 17:11:33 2017 - [info] End of log messages from host_1. Tue Nov 7 17:11:33 2017 - [info] Saved mysqlbinlog size from host_1 is 8578 bytes. Tue Nov 7 17:11:33 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_ host_1_binlog2_20171107171129.binlog .. Tue Nov 7 17:11:33 2017 - [info] Differential log apply from binlog server succeeded. Tue Nov 7 17:11:33 2017 - [info] Getting new master's binlog name and position.. Tue Nov 7 17:11:33 2017 - [info] host_2.000001:5048 Tue Nov 7 17:11:33 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Tue Nov 7 17:11:33 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_2.000001, 5048, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16, Tue Nov 7 17:11:33 2017 - [info] Executing master IP activate script: Tue Nov 7 17:11:33 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --new_master_host= host_2 --new_master_ip= host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Tue Nov 7 17:11:36 2017 - [info] OK. 
Tue Nov 7 17:11:36 2017 - [info] Setting read_only=0 on host_2( host_2:3306).. Tue Nov 7 17:11:36 2017 - [info] ok. Tue Nov 7 17:11:36 2017 - [info] ** Finished master recovery successfully. Tue Nov 7 17:11:36 2017 - [info] * Phase 3: Master Recovery Phase completed. Tue Nov 7 17:11:36 2017 - [info] Tue Nov 7 17:11:36 2017 - [info] * Phase 4: Slaves Recovery Phase.. Tue Nov 7 17:11:36 2017 - [info] Tue Nov 7 17:11:36 2017 - [info] Tue Nov 7 17:11:36 2017 - [info] * Phase 4.1: Starting Slaves in parallel.. Tue Nov 7 17:11:36 2017 - [info] Tue Nov 7 17:11:36 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 58422. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171107171129.log if it takes time.. Tue Nov 7 17:11:37 2017 - [info] Tue Nov 7 17:11:37 2017 - [info] Log messages from host_3 ... Tue Nov 7 17:11:37 2017 - [info] Tue Nov 7 17:11:36 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_2( host_2:3306).. Tue Nov 7 17:11:36 2017 - [info] Executed CHANGE MASTER. Tue Nov 7 17:11:37 2017 - [info] Slave started. Tue Nov 7 17:11:37 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-16, Tue Nov 7 17:11:37 2017 - [info] End of log messages from host_3. Tue Nov 7 17:11:37 2017 - [info] -- Slave on host host_3( host_3:3306) started. Tue Nov 7 17:11:37 2017 - [info] All new slave servers recovered successfully. Tue Nov 7 17:11:37 2017 - [info] Tue Nov 7 17:11:37 2017 - [info] * Phase 5: New master cleanup phase.. Tue Nov 7 17:11:37 2017 - [info] Tue Nov 7 17:11:37 2017 - [info] Resetting slave info on the new master.. Tue Nov 7 17:11:37 2017 - [info] host_2: Resetting slave info succeeded. Tue Nov 7 17:11:37 2017 - [info] Master failover to host_2( host_2:3306) completed successfully. Tue Nov 7 17:11:37 2017 - [info] Tue Nov 7 17:11:37 2017 - [info] Sending mail.. 
1.3.2 All of the master's binlogs have already been delivered to the slaves when MySQL on the master dies

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host= host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Tue Nov 7 15:56:11 2017 - [info] MHA::MasterFailover version 0.56.
Tue Nov 7 15:56:11 2017 - [info] Starting master failover.
Tue Nov 7 15:56:11 2017 - [info]
Tue Nov 7 15:56:11 2017 - [info] * Phase 1: Configuration Check Phase..
Tue Nov 7 15:56:11 2017 - [info]
Tue Nov 7 15:56:11 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Tue Nov 7 15:56:12 2017 - [info] Binlog server host_2 is reachable.
Tue Nov 7 15:56:12 2017 - [info] HealthCheck: SSH to host_1 is reachable.
Tue Nov 7 15:56:12 2017 - [info] Binlog server host_1 is reachable.
Tue Nov 7 15:56:12 2017 - [info] HealthCheck: SSH to host_3 is reachable.
Tue Nov 7 15:56:13 2017 - [info] Binlog server host_3 is reachable.
Tue Nov 7 15:56:13 2017 - [warning] SQL Thread is stopped(no error) on host_1( host_1:3306)
Tue Nov 7 15:56:13 2017 - [warning] SQL Thread is stopped(no error) on host_3( host_3:3306)
Tue Nov 7 15:56:13 2017 - [info] GTID failover mode = 1
Tue Nov 7 15:56:13 2017 - [info] Dead Servers:
Tue Nov 7 15:56:13 2017 - [info] host_2( host_2:3306)
Tue Nov 7 15:56:13 2017 - [info] Checking master reachability via MySQL(double check)...
Tue Nov 7 15:56:13 2017 - [info] ok.
Tue Nov 7 15:56:13 2017 - [info] Alive Servers: Tue Nov 7 15:56:13 2017 - [info] host_1( host_1:3306) Tue Nov 7 15:56:13 2017 - [info] host_3( host_3:3306) Tue Nov 7 15:56:13 2017 - [info] Alive Slaves: Tue Nov 7 15:56:13 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 15:56:13 2017 - [info] GTID ON Tue Nov 7 15:56:13 2017 - [info] Replicating from host_2( host_2:3306) Tue Nov 7 15:56:13 2017 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 7 15:56:13 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 15:56:13 2017 - [info] GTID ON Tue Nov 7 15:56:13 2017 - [info] Replicating from host_2( host_2:3306) Tue Nov 7 15:56:13 2017 - [info] Not candidate for the new Master (no_master is set) Tue Nov 7 15:56:13 2017 - [info] Starting SQL thread on host_1( host_1:3306) .. Tue Nov 7 15:56:13 2017 - [info] done. Tue Nov 7 15:56:13 2017 - [info] Starting SQL thread on host_3( host_3:3306) .. Tue Nov 7 15:56:13 2017 - [info] done. Tue Nov 7 15:56:13 2017 - [info] Starting GTID based failover. Tue Nov 7 15:56:13 2017 - [info] Tue Nov 7 15:56:13 2017 - [info] ** Phase 1: Configuration Check Phase completed. Tue Nov 7 15:56:13 2017 - [info] Tue Nov 7 15:56:13 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Tue Nov 7 15:56:13 2017 - [info] Tue Nov 7 15:56:13 2017 - [info] HealthCheck: SSH to host_2 is reachable. Tue Nov 7 15:56:13 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Tue Nov 7 15:56:13 2017 - [info] Executing master IP deactivation script: Tue Nov 7 15:56:13 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root Tue Nov 7 15:56:16 2017 - [info] done. Tue Nov 7 15:56:16 2017 - [warning] shutdown_script is not set. 
Skipping explicit shutting down of the dead master. Tue Nov 7 15:56:16 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Tue Nov 7 15:56:16 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] * Phase 3: Master Recovery Phase.. Tue Nov 7 15:56:16 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Tue Nov 7 15:56:16 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] The latest binary log file/position on all slaves is host_2.000049:11291 Tue Nov 7 15:56:16 2017 - [info] Retrieved Gtid Set: ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:3-446352 Tue Nov 7 15:56:16 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Tue Nov 7 15:56:16 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 15:56:16 2017 - [info] GTID ON Tue Nov 7 15:56:16 2017 - [info] Replicating from host_2( host_2:3306) Tue Nov 7 15:56:16 2017 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 7 15:56:16 2017 - [info] The oldest binary log file/position on all slaves is host_2.000049:10703 Tue Nov 7 15:56:16 2017 - [info] Retrieved Gtid Set: ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:3-446350 Tue Nov 7 15:56:16 2017 - [info] Oldest slaves: Tue Nov 7 15:56:16 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 15:56:16 2017 - [info] GTID ON Tue Nov 7 15:56:16 2017 - [info] Replicating from host_2( host_2:3306) Tue Nov 7 15:56:16 2017 - [info] Not candidate for the new Master (no_master is set) Tue Nov 7 15:56:16 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] * Phase 3.3: Determining New Master Phase.. Tue Nov 7 15:56:16 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] Searching new master from slaves.. 
Tue Nov 7 15:56:16 2017 - [info] Candidate masters from the configuration file: Tue Nov 7 15:56:16 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 15:56:16 2017 - [info] GTID ON Tue Nov 7 15:56:16 2017 - [info] Replicating from host_2( host_2:3306) Tue Nov 7 15:56:16 2017 - [info] Primary candidate for the new Master (candidate_master is set) Tue Nov 7 15:56:16 2017 - [info] Non-candidate masters: Tue Nov 7 15:56:16 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Tue Nov 7 15:56:16 2017 - [info] GTID ON Tue Nov 7 15:56:16 2017 - [info] Replicating from host_2( host_2:3306) Tue Nov 7 15:56:16 2017 - [info] Not candidate for the new Master (no_master is set) Tue Nov 7 15:56:16 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Tue Nov 7 15:56:16 2017 - [info] New master is host_1( host_1:3306) Tue Nov 7 15:56:16 2017 - [info] Starting master failover.. Tue Nov 7 15:56:16 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] * Phase 3.3: New Master Recovery Phase.. Tue Nov 7 15:56:16 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] Waiting all logs to be applied.. Tue Nov 7 15:56:16 2017 - [info] done. Tue Nov 7 15:56:16 2017 - [info] -- Saving binlog from host host_2 started, pid: 79759 Tue Nov 7 15:56:16 2017 - [info] -- Saving binlog from host host_1 started, pid: 79768 Tue Nov 7 15:56:16 2017 - [info] -- Saving binlog from host host_3 started, pid: 79770 Tue Nov 7 15:56:17 2017 - [info] Tue Nov 7 15:56:17 2017 - [info] Log messages from host_1 ... Tue Nov 7 15:56:17 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_1.. 
Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Tue Nov 7 15:56:17 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt? Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_1. Tue Nov 7 15:56:17 2017 - [warning] Got error from host_1. Tue Nov 7 15:56:17 2017 - [info] Tue Nov 7 15:56:17 2017 - [info] Log messages from host_3 ... Tue Nov 7 15:56:17 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_3.. Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Tue Nov 7 15:56:17 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt? Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_3. Tue Nov 7 15:56:17 2017 - [warning] Got error from host_3. Tue Nov 7 15:56:17 2017 - [info] Tue Nov 7 15:56:17 2017 - [info] Log messages from host_2 ... Tue Nov 7 15:56:17 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_2.. 
Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Tue Nov 7 15:56:17 2017 - [info] scp from root@ host_2:/var/log/masterha/mha_test/saved_binlog_binlog1_20171107155611.binlog to local:/var/log/masterha/mha_test/saved_binlog_ host_2_binlog1_20171107155611.binlog succeeded. Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_2. Tue Nov 7 15:56:17 2017 - [info] Saved mysqlbinlog size from host_2 is 768 bytes. Tue Nov 7 15:56:17 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_ host_2_binlog1_20171107155611.binlog .. Tue Nov 7 15:56:17 2017 - [info] Differential log apply from binlog server succeeded. Tue Nov 7 15:56:17 2017 - [info] Getting new master's binlog name and position.. Tue Nov 7 15:56:17 2017 - [info] host_1.000051:11449 Tue Nov 7 15:56:17 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_1', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Tue Nov 7 15:56:17 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_1.000051, 11449, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1, Tue Nov 7 15:56:17 2017 - [info] Executing master IP activate script: Tue Nov 7 15:56:17 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --new_master_host= host_1 --new_master_ip= host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Tue Nov 7 15:56:20 2017 - [info] OK. 
Tue Nov 7 15:56:20 2017 - [info] Setting read_only=0 on host_1( host_1:3306).. Tue Nov 7 15:56:20 2017 - [info] ok. Tue Nov 7 15:56:20 2017 - [info] ** Finished master recovery successfully. Tue Nov 7 15:56:20 2017 - [info] * Phase 3: Master Recovery Phase completed. Tue Nov 7 15:56:20 2017 - [info] Tue Nov 7 15:56:20 2017 - [info] * Phase 4: Slaves Recovery Phase.. Tue Nov 7 15:56:20 2017 - [info] Tue Nov 7 15:56:20 2017 - [info] Tue Nov 7 15:56:20 2017 - [info] * Phase 4.1: Starting Slaves in parallel.. Tue Nov 7 15:56:20 2017 - [info] Tue Nov 7 15:56:20 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 85941. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171107155611.log if it takes time.. Tue Nov 7 15:56:21 2017 - [info] Tue Nov 7 15:56:21 2017 - [info] Log messages from host_3 ... Tue Nov 7 15:56:21 2017 - [info] Tue Nov 7 15:56:20 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_1( host_1:3306).. Tue Nov 7 15:56:20 2017 - [info] Executed CHANGE MASTER. Tue Nov 7 15:56:21 2017 - [info] Slave started. Tue Nov 7 15:56:21 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1, Tue Nov 7 15:56:21 2017 - [info] End of log messages from host_3. Tue Nov 7 15:56:21 2017 - [info] -- Slave on host host_3( host_3:3306) started. Tue Nov 7 15:56:21 2017 - [info] All new slave servers recovered successfully. Tue Nov 7 15:56:21 2017 - [info] Tue Nov 7 15:56:21 2017 - [info] * Phase 5: New master cleanup phase.. Tue Nov 7 15:56:21 2017 - [info] Tue Nov 7 15:56:21 2017 - [info] Resetting slave info on the new master.. Tue Nov 7 15:56:21 2017 - [info] host_1: Resetting slave info succeeded. Tue Nov 7 15:56:21 2017 - [info] Master failover to host_1( host_1:3306) completed successfully. Tue Nov 7 15:56:21 2017 - [info] Tue Nov 7 15:56:21 2017 - [info] Sending mail.. 
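Each successful failover log above ends with the exact CHANGE MASTER statement the remaining slaves should use. When these logs are archived, the new master's coordinates can be pulled back out programmatically; a small sketch, where the regex is an assumption based on the MHA 0.56 wording shown in this post rather than any documented interface:

```python
import re

# Sketch: extract the new master's host and port from a saved MHA
# failover log. The pattern tolerates the leading space that the
# anonymized hostnames carry in these logs.
STMT_RE = re.compile(
    r"CHANGE MASTER TO MASTER_HOST='\s*(?P<host>[^']+)',"
    r" MASTER_PORT=(?P<port>\d+)")

def new_master_coords(log_text):
    """Return (host, port) from the first CHANGE MASTER hint, else None."""
    m = STMT_RE.search(log_text)
    return (m.group("host").strip(), int(m.group("port"))) if m else None

sample = ("All other slaves should start replication from here. "
          "Statement should be: CHANGE MASTER TO MASTER_HOST=' host_2', "
          "MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', "
          "MASTER_PASSWORD='xxx';")
print(new_master_coords(sample))  # ('host_2', 3306)
```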
1.4 A large transaction running on the slave (candidate master)

A 1000-second long query: no impact, the switchover completes normally.

flush tables with read lock:

dba:(none)> show processlist;
+----+------+----------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host                 | db   | Command | Time | State                        | Info                                                                                                 |
+----+------+----------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+
| 63 | dba  | localhost            | NULL | Query   |    0 | starting                     | show processlist                                                                                     |
| 65 | dba  | xx:11164             | NULL | Sleep   |  121 |                              | NULL                                                                                                 |
| 83 | dba  | new master:49022     | NULL | Query   |  176 | Waiting for global read lock | BINLOG ' GpAKWhNYUy1LMAAAAGYHAAAAAG0AAAAAAAEAAmxjAAh0X2NoYXJfMgACAw8CLAEC GpAKWh5YUy1LJwAAAI0HAAAAAG |
+----+------+----------------------+------+---------+------+------------------------------+------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

1.5 binlog server tests under different scenarios

The tail of the dead master's binlog reached neither the slave nor the etl node, and the slave's logs also lag behind the etl node's (the harshest case).

binlog server set on all 3 hosts

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host= host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Tue Nov 7 15:56:17 2017 - [info] Log messages from host_1 ...
Tue Nov 7 15:56:17 2017 - [info]
Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_1..
Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Tue Nov 7 15:56:17 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt? Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_1. Tue Nov 7 15:56:17 2017 - [warning] Got error from host_1. Tue Nov 7 15:56:17 2017 - [info] Tue Nov 7 15:56:17 2017 - [info] Log messages from host_3 ... Tue Nov 7 15:56:17 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_3.. Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Tue Nov 7 15:56:17 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt? Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_3. Tue Nov 7 15:56:17 2017 - [warning] Got error from host_3. Tue Nov 7 15:56:17 2017 - [info] Tue Nov 7 15:56:17 2017 - [info] Log messages from host_2 ... Tue Nov 7 15:56:17 2017 - [info] Tue Nov 7 15:56:16 2017 - [info] Fetching binary logs from binlog server host_2.. 
Tue Nov 7 15:56:16 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000049 --start_pos=11291 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171107155611.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Tue Nov 7 15:56:17 2017 - [info] scp from root@ host_2:/var/log/masterha/mha_test/saved_binlog_binlog1_20171107155611.binlog to local:/var/log/masterha/mha_test/saved_binlog_ host_2_binlog1_20171107155611.binlog succeeded.
Tue Nov 7 15:56:17 2017 - [info] End of log messages from host_2.
Tue Nov 7 15:56:17 2017 - [info] Saved mysqlbinlog size from host_2 is 768 bytes.
Tue Nov 7 15:56:17 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_ host_2_binlog1_20171107155611.binlog ..
Tue Nov 7 15:56:17 2017 - [info] Differential log apply from binlog server succeeded.

binlog server set on the master only

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host= host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Thu Nov 9 11:20:04 2017 - [info] -- Saving binlog from host host_2 started, pid: 117389
Thu Nov 9 11:20:05 2017 - [info]
Thu Nov 9 11:20:05 2017 - [info] Log messages from host_2 ...
Thu Nov 9 11:20:05 2017 - [info]
Thu Nov 9 11:20:04 2017 - [info] Fetching binary logs from binlog server host_2..
Thu Nov 9 11:20:04 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000004 --start_pos=1115 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171109111957.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Thu Nov 9 11:20:05 2017 - [info] scp from root@ host_2:/var/log/masterha/mha_test/saved_binlog_binlog1_20171109111957.binlog to local:/var/log/masterha/mha_test/saved_binlog_ host_2_binlog1_20171109111957.binlog succeeded.
Thu Nov 9 11:20:05 2017 - [info] End of log messages from host_2.
Thu Nov 9 11:20:05 2017 - [info] Saved mysqlbinlog size from host_2 is 4444 bytes.
Thu Nov 9 11:20:05 2017 - [info] Applying differential binlog /var/log/masterha/mha_test/saved_binlog_ host_2_binlog1_20171109111957.binlog ..
Thu Nov 9 11:20:05 2017 - [info] Differential log apply from binlog server succeeded.

binlog server set on the slave only

### GTID state of the three servers

* master host_1

dba:lc> show master status;
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                        |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| host_1.000055       | 6016     |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446369 |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

* slave host_2

Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-21,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446369
Auto_Position: 1

* etl host_3

Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:22-25,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446366-446369
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-25,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446369
Auto_Position: 1

### Switchover log

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host= host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Thu Nov 9 15:00:09 2017 - [info] MHA::MasterFailover version 0.56.
Thu Nov 9 15:00:09 2017 - [info] Starting master failover.
Thu Nov 9 15:00:09 2017 - [info]
Thu Nov 9 15:00:09 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov 9 15:00:09 2017 - [info]
Thu Nov 9 15:00:09 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Thu Nov 9 15:00:09 2017 - [info] Binlog server host_2 is reachable.
Thu Nov 9 15:00:10 2017 - [warning] SQL Thread is stopped(no error) on host_2( host_2:3306)
Thu Nov 9 15:00:10 2017 - [warning] SQL Thread is stopped(no error) on host_3( host_3:3306)
Thu Nov 9 15:00:10 2017 - [info] GTID failover mode = 1
Thu Nov 9 15:00:10 2017 - [info] Dead Servers:
Thu Nov 9 15:00:10 2017 - [info] host_1( host_1:3306)
Thu Nov 9 15:00:10 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Nov 9 15:00:10 2017 - [info] ok.
Thu Nov 9 15:00:10 2017 - [info] Alive Servers: Thu Nov 9 15:00:10 2017 - [info] host_2( host_2:3306) Thu Nov 9 15:00:10 2017 - [info] host_3( host_3:3306) Thu Nov 9 15:00:10 2017 - [info] Alive Slaves: Thu Nov 9 15:00:10 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 15:00:10 2017 - [info] GTID ON Thu Nov 9 15:00:10 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 15:00:10 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 9 15:00:10 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 15:00:10 2017 - [info] GTID ON Thu Nov 9 15:00:10 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 15:00:10 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 9 15:00:10 2017 - [info] Starting SQL thread on host_2( host_2:3306) .. Thu Nov 9 15:00:10 2017 - [info] done. Thu Nov 9 15:00:10 2017 - [info] Starting SQL thread on host_3( host_3:3306) .. Thu Nov 9 15:00:10 2017 - [info] done. Thu Nov 9 15:00:10 2017 - [info] Starting GTID based failover. Thu Nov 9 15:00:10 2017 - [info] Thu Nov 9 15:00:10 2017 - [info] ** Phase 1: Configuration Check Phase completed. Thu Nov 9 15:00:10 2017 - [info] Thu Nov 9 15:00:10 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Thu Nov 9 15:00:10 2017 - [info] Thu Nov 9 15:00:10 2017 - [info] HealthCheck: SSH to host_1 is reachable. Thu Nov 9 15:00:10 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Thu Nov 9 15:00:10 2017 - [info] Executing master IP deactivation script: Thu Nov 9 15:00:10 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --command=stopssh --ssh_user=root Thu Nov 9 15:00:17 2017 - [info] done. Thu Nov 9 15:00:17 2017 - [warning] shutdown_script is not set. 
Skipping explicit shutting down of the dead master. Thu Nov 9 15:00:17 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Thu Nov 9 15:00:17 2017 - [info] Thu Nov 9 15:00:17 2017 - [info] * Phase 3: Master Recovery Phase.. Thu Nov 9 15:00:17 2017 - [info] Thu Nov 9 15:00:17 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. Thu Nov 9 15:00:17 2017 - [info] Thu Nov 9 15:00:17 2017 - [info] The latest binary log file/position on all slaves is host_1.000055:4090 Thu Nov 9 15:00:17 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:22-25, Thu Nov 9 15:00:17 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Thu Nov 9 15:00:17 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 15:00:17 2017 - [info] GTID ON Thu Nov 9 15:00:17 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 15:00:17 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 9 15:00:17 2017 - [info] The oldest binary log file/position on all slaves is host_1.000055:2806 Thu Nov 9 15:00:17 2017 - [info] Oldest slaves: Thu Nov 9 15:00:17 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 15:00:17 2017 - [info] GTID ON Thu Nov 9 15:00:17 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 15:00:17 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 9 15:00:17 2017 - [info] Thu Nov 9 15:00:17 2017 - [info] * Phase 3.3: Determining New Master Phase.. Thu Nov 9 15:00:17 2017 - [info] Thu Nov 9 15:00:17 2017 - [info] Searching new master from slaves.. 
Thu Nov 9 15:00:17 2017 - [info] Candidate masters from the configuration file: Thu Nov 9 15:00:17 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 15:00:17 2017 - [info] GTID ON Thu Nov 9 15:00:17 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 15:00:17 2017 - [info] Primary candidate for the new Master (candidate_master is set) Thu Nov 9 15:00:17 2017 - [info] Non-candidate masters: Thu Nov 9 15:00:17 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Thu Nov 9 15:00:17 2017 - [info] GTID ON Thu Nov 9 15:00:17 2017 - [info] Replicating from host_1( host_1:3306) Thu Nov 9 15:00:17 2017 - [info] Not candidate for the new Master (no_master is set) Thu Nov 9 15:00:17 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Thu Nov 9 15:00:17 2017 - [info] Not found. Thu Nov 9 15:00:17 2017 - [info] Searching from all candidate_master slaves.. Thu Nov 9 15:00:17 2017 - [info] New master is host_2( host_2:3306) Thu Nov 9 15:00:17 2017 - [info] Starting master failover.. Thu Nov 9 15:00:17 2017 - [info] Thu Nov 9 15:00:17 2017 - [info] Thu Nov 9 15:00:17 2017 - [info] * Phase 3.3: New Master Recovery Phase.. Thu Nov 9 15:00:17 2017 - [info] Thu Nov 9 15:00:17 2017 - [info] Waiting all logs to be applied.. Thu Nov 9 15:00:17 2017 - [info] done. Thu Nov 9 15:00:17 2017 - [info] Replicating from the latest slave host_3( host_3:3306) and waiting to apply.. Thu Nov 9 15:00:17 2017 - [info] Waiting all logs to be applied on the latest slave.. Thu Nov 9 15:00:17 2017 - [info] Resetting slave host_2( host_2:3306) and starting replication from the new master host_3( host_3:3306).. Thu Nov 9 15:00:17 2017 - [info] Executed CHANGE MASTER. Thu Nov 9 15:00:18 2017 - [info] Slave started. Thu Nov 9 15:00:18 2017 - [info] Waiting to execute all relay logs on host_2( host_2:3306).. 
Thu Nov 9 15:00:18 2017 - [info] master_pos_wait( host_3.000049:25843) completed on host_2( host_2:3306). Executed 0 events. Thu Nov 9 15:00:18 2017 - [info] done. Thu Nov 9 15:00:18 2017 - [info] done. Thu Nov 9 15:00:18 2017 - [info] -- Saving binlog from host host_2 started, pid: 175683 Thu Nov 9 15:00:18 2017 - [info] Thu Nov 9 15:00:18 2017 - [info] Log messages from host_2 ... Thu Nov 9 15:00:18 2017 - [info] Thu Nov 9 15:00:18 2017 - [info] Fetching binary logs from binlog server host_2.. Thu Nov 9 15:00:18 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000055 --start_pos=4090 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171109150009.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Thu Nov 9 15:00:18 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt? Thu Nov 9 15:00:18 2017 - [info] End of log messages from host_2. Thu Nov 9 15:00:18 2017 - [warning] Got error from host_2. Thu Nov 9 15:00:18 2017 - [info] Getting new master's binlog name and position.. Thu Nov 9 15:00:18 2017 - [info] host_2.000005:1390 Thu Nov 9 15:00:18 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Thu Nov 9 15:00:18 2017 - [info] Master Recovery succeeded. 
File:Pos:Exec_Gtid_Set: host_2.000005, 1390, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-25, Thu Nov 9 15:00:18 2017 - [info] Executing master IP activate script: Thu Nov 9 15:00:18 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --new_master_host= host_2 --new_master_ip= host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Thu Nov 9 15:00:22 2017 - [info] OK. Thu Nov 9 15:00:22 2017 - [info] Setting read_only=0 on host_2( host_2:3306).. Thu Nov 9 15:00:22 2017 - [info] ok. Thu Nov 9 15:00:22 2017 - [info] ** Finished master recovery successfully. Thu Nov 9 15:00:22 2017 - [info] * Phase 3: Master Recovery Phase completed. Thu Nov 9 15:00:22 2017 - [info] Thu Nov 9 15:00:22 2017 - [info] * Phase 4: Slaves Recovery Phase.. Thu Nov 9 15:00:22 2017 - [info] Thu Nov 9 15:00:22 2017 - [info] Thu Nov 9 15:00:22 2017 - [info] * Phase 4.1: Starting Slaves in parallel.. Thu Nov 9 15:00:22 2017 - [info] Thu Nov 9 15:00:22 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 180681. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171109150009.log if it takes time.. Thu Nov 9 15:00:23 2017 - [info] Thu Nov 9 15:00:23 2017 - [info] Log messages from host_3 ... Thu Nov 9 15:00:23 2017 - [info] Thu Nov 9 15:00:22 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_2( host_2:3306).. Thu Nov 9 15:00:22 2017 - [info] Executed CHANGE MASTER. Thu Nov 9 15:00:23 2017 - [info] Slave started. Thu Nov 9 15:00:23 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-25, Thu Nov 9 15:00:23 2017 - [info] End of log messages from host_3. Thu Nov 9 15:00:23 2017 - [info] -- Slave on host host_3( host_3:3306) started. Thu Nov 9 15:00:23 2017 - [info] All new slave servers recovered successfully. 
Thu Nov 9 15:00:23 2017 - [info]
Thu Nov 9 15:00:23 2017 - [info] * Phase 5: New master cleanup phase..
Thu Nov 9 15:00:23 2017 - [info]
Thu Nov 9 15:00:23 2017 - [info] Resetting slave info on the new master..
Thu Nov 9 15:00:23 2017 - [info] host_2: Resetting slave info succeeded.
Thu Nov 9 15:00:23 2017 - [info] Master failover to host_2( host_2:3306) completed successfully.
Thu Nov 9 15:00:23 2017 - [info]
Thu Nov 9 15:00:23 2017 - [info] Sending mail..

Conclusion: because the master was not configured as a binlog server, any transactions the master had not yet shipped to a replica were lost. Fortunately, MHA makes the slave and the etl node CHANGE MASTER to each other during recovery, so even though the slave (candidate master) lagged behind, it ultimately filled in its missing transactions from the etl node's logs.

binlog server not set on any host

### GTID state of the three servers

* master host_2

dba:lc> show master status;
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                        |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| host_2.000005       | 5785     |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446378 |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

* slave host_1

Retrieved_Gtid_Set:
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446369
Auto_Position: 1

* etl host_3

Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:26-31,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446370-446372
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31,
ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446372
Auto_Position: 1

### Switchover log

Thu Nov 9 16:22:41 2017 - [info] MHA::MasterFailover version 0.56.
Thu Nov 9 16:22:41 2017 - [info] Starting master failover.
Thu Nov 9 16:22:41 2017 - [info]
Thu Nov 9 16:22:41 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov 9 16:22:41 2017 - [info]
Thu Nov 9 16:22:41 2017 - [warning] SQL Thread is stopped(no error) on host_1( host_1:3306)
Thu Nov 9 16:22:41 2017 - [warning] SQL Thread is stopped(no error) on host_3( host_3:3306)
Thu Nov 9 16:22:41 2017 - [info] GTID failover mode = 1
Thu Nov 9 16:22:41 2017 - [info] Dead Servers:
Thu Nov 9 16:22:41 2017 - [info] host_2( host_2:3306)
Thu Nov 9 16:22:41 2017 - [info] Checking master reachability via MySQL(double check)...
Thu Nov 9 16:22:41 2017 - [info] ok.
Thu Nov 9 16:22:41 2017 - [info] Alive Servers:
Thu Nov 9 16:22:41 2017 - [info] host_1( host_1:3306)
Thu Nov 9 16:22:41 2017 - [info] host_3( host_3:3306)
Thu Nov 9 16:22:41 2017 - [info] Alive Slaves:
Thu Nov 9 16:22:41 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 16:22:41 2017 - [info] GTID ON
Thu Nov 9 16:22:41 2017 - [info] Replicating from host_2( host_2:3306)
Thu Nov 9 16:22:41 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 9 16:22:41 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 16:22:41 2017 - [info] GTID ON
Thu Nov 9 16:22:41 2017 - [info] Replicating from host_2( host_2:3306)
Thu Nov 9 16:22:41 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 9 16:22:41 2017 - [info] Starting SQL thread on host_1( host_1:3306) ..
Thu Nov 9 16:22:41 2017 - [info] done.
Thu Nov 9 16:22:41 2017 - [info] Starting SQL thread on host_3( host_3:3306) ..
Thu Nov 9 16:22:41 2017 - [info] done.
Thu Nov 9 16:22:41 2017 - [info] Starting GTID based failover.
Thu Nov 9 16:22:41 2017 - [info]
Thu Nov 9 16:22:41 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Nov 9 16:22:41 2017 - [info]
Thu Nov 9 16:22:41 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Nov 9 16:22:41 2017 - [info]
Thu Nov 9 16:22:42 2017 - [info] HealthCheck: SSH to host_2 is reachable.
Thu Nov 9 16:22:42 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Nov 9 16:22:42 2017 - [info] Executing master IP deactivation script:
Thu Nov 9 16:22:42 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --command=stopssh --ssh_user=root
=================== swift vip : tgw_vip from host_2 is deleted ==============================
--2017-11-09 16:22:42-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi
Connecting to tgw_server:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: "STDOUT"

0K 9.79M=0s

2017-11-09 16:22:44 (9.79 MB/s) - written to STDOUT [38]

Thu Nov 9 16:22:44 2017 - [info] done.
Thu Nov 9 16:22:44 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Nov 9 16:22:44 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Nov 9 16:22:44 2017 - [info]
Thu Nov 9 16:22:44 2017 - [info] * Phase 3: Master Recovery Phase..
Thu Nov 9 16:22:44 2017 - [info]
Thu Nov 9 16:22:44 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Nov 9 16:22:44 2017 - [info]
Thu Nov 9 16:22:44 2017 - [info] The latest binary log file/position on all slaves is host_2.000005:4015
Thu Nov 9 16:22:44 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:26-31, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446370-446372
Thu Nov 9 16:22:44 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Nov 9 16:22:44 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 16:22:44 2017 - [info] GTID ON
Thu Nov 9 16:22:44 2017 - [info] Replicating from host_2( host_2:3306)
Thu Nov 9 16:22:44 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 9 16:22:44 2017 - [info] The oldest binary log file/position on all slaves is host_2.000005:3130
Thu Nov 9 16:22:44 2017 - [info] Oldest slaves:
Thu Nov 9 16:22:44 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 16:22:44 2017 - [info] GTID ON
Thu Nov 9 16:22:44 2017 - [info] Replicating from host_2( host_2:3306)
Thu Nov 9 16:22:44 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 9 16:22:44 2017 - [info]
Thu Nov 9 16:22:44 2017 - [info] * Phase 3.3: Determining New Master Phase..
Thu Nov 9 16:22:44 2017 - [info]
Thu Nov 9 16:22:44 2017 - [info] Searching new master from slaves..
Thu Nov 9 16:22:44 2017 - [info] Candidate masters from the configuration file:
Thu Nov 9 16:22:44 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 16:22:44 2017 - [info] GTID ON
Thu Nov 9 16:22:44 2017 - [info] Replicating from host_2( host_2:3306)
Thu Nov 9 16:22:44 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 9 16:22:44 2017 - [info] Non-candidate masters:
Thu Nov 9 16:22:44 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Thu Nov 9 16:22:44 2017 - [info] GTID ON
Thu Nov 9 16:22:44 2017 - [info] Replicating from host_2( host_2:3306)
Thu Nov 9 16:22:44 2017 - [info] Not candidate for the new Master (no_master is set)
Thu Nov 9 16:22:44 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Nov 9 16:22:44 2017 - [info] Not found.
Thu Nov 9 16:22:44 2017 - [info] Searching from all candidate_master slaves..
Thu Nov 9 16:22:44 2017 - [info] New master is host_1( host_1:3306)
Thu Nov 9 16:22:44 2017 - [info] Starting master failover..
Thu Nov 9 16:22:44 2017 - [info]
From:
host_2( host_2:3306) (current master)
 +-- host_1( host_1:3306)
 +-- host_3( host_3:3306)

To:
host_1( host_1:3306) (new master)
 +-- host_3( host_3:3306)

Thu Nov 9 16:22:44 2017 - [info]
Thu Nov 9 16:22:44 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Nov 9 16:22:44 2017 - [info]
Thu Nov 9 16:22:44 2017 - [info] Waiting all logs to be applied..
Thu Nov 9 16:22:44 2017 - [info] done.
Thu Nov 9 16:22:44 2017 - [info] Replicating from the latest slave host_3( host_3:3306) and waiting to apply..
Thu Nov 9 16:22:44 2017 - [info] Waiting all logs to be applied on the latest slave..
Thu Nov 9 16:22:44 2017 - [info] Resetting slave host_1( host_1:3306) and starting replication from the new master host_3( host_3:3306)..
Thu Nov 9 16:22:44 2017 - [info] Executed CHANGE MASTER.
Thu Nov 9 16:22:45 2017 - [info] Slave started.
Thu Nov 9 16:22:45 2017 - [info] Waiting to execute all relay logs on host_1( host_1:3306)..
Thu Nov 9 16:22:45 2017 - [info] master_pos_wait( host_3.000049:28663) completed on host_1( host_1:3306). Executed 0 events.
Thu Nov 9 16:22:45 2017 - [info] done.
Thu Nov 9 16:22:45 2017 - [info] done.
Thu Nov 9 16:22:45 2017 - [info] Getting new master's binlog name and position..
Thu Nov 9 16:22:45 2017 - [info] host_1.000056:1170
Thu Nov 9 16:22:45 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_1', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Nov 9 16:22:45 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_1.000056, 1170, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446372
Thu Nov 9 16:22:45 2017 - [info] Executing master IP activate script:
Thu Nov 9 16:22:45 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --new_master_host= host_1 --new_master_ip= host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Unknown option: new_master_user
Unknown option: new_master_password
=================== swift vip : tgw_vip to host_1 is added ==============================
Thu Nov 9 16:22:47 2017 - [info] OK.
Thu Nov 9 16:22:47 2017 - [info] Setting read_only=0 on host_1( host_1:3306)..
Thu Nov 9 16:22:47 2017 - [info] ok.
Thu Nov 9 16:22:47 2017 - [info] ** Finished master recovery successfully.
Thu Nov 9 16:22:47 2017 - [info] * Phase 3: Master Recovery Phase completed.
Thu Nov 9 16:22:47 2017 - [info]
Thu Nov 9 16:22:47 2017 - [info] * Phase 4: Slaves Recovery Phase..
Thu Nov 9 16:22:47 2017 - [info]
Thu Nov 9 16:22:47 2017 - [info]
Thu Nov 9 16:22:47 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
Thu Nov 9 16:22:47 2017 - [info]
Thu Nov 9 16:22:47 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 112317. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171109162241.log if it takes time..
Thu Nov 9 16:22:48 2017 - [info]
Thu Nov 9 16:22:48 2017 - [info] Log messages from host_3 ...
Thu Nov 9 16:22:48 2017 - [info]
Thu Nov 9 16:22:47 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_1( host_1:3306)..
Thu Nov 9 16:22:47 2017 - [info] Executed CHANGE MASTER.
Thu Nov 9 16:22:48 2017 - [info] Slave started.
Thu Nov 9 16:22:48 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-31, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446372) completed on host_3( host_3:3306). Executed 0 events.
Thu Nov 9 16:22:48 2017 - [info] End of log messages from host_3.
Thu Nov 9 16:22:48 2017 - [info] -- Slave on host host_3( host_3:3306) started.
Thu Nov 9 16:22:48 2017 - [info] All new slave servers recovered successfully.
Thu Nov 9 16:22:48 2017 - [info]
Thu Nov 9 16:22:48 2017 - [info] * Phase 5: New master cleanup phase..
Thu Nov 9 16:22:48 2017 - [info]
Thu Nov 9 16:22:48 2017 - [info] Resetting slave info on the new master..
Thu Nov 9 16:22:49 2017 - [info] host_1: Resetting slave info succeeded.
Thu Nov 9 16:22:49 2017 - [info] Master failover to host_1( host_1:3306) completed successfully.
Thu Nov 9 16:22:49 2017 - [info]

----- Failover Report -----

bak_mha_test: MySQL Master failover host_2( host_2:3306) to host_1( host_1:3306) succeeded

Master host_2( host_2:3306) is down!

Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_2( host_2:3306)
Selected host_1( host_1:3306) as a new master.
host_1( host_1:3306): OK: Applying all logs succeeded.
host_1( host_1:3306): OK: Activated master IP address.
host_3( host_3:3306): OK: Slave started, replicating from host_1( host_1:3306)
host_1( host_1:3306): Resetting slave info succeeded.
Master failover to host_1( host_1:3306) completed successfully.

Conclusion: because the master was not configured as a binlog server, the transactions the master had not yet shipped are lost. Fortunately, MHA makes the slave and the etl node CHANGE MASTER to each other, so even though the slave's (candidate master's) logs were behind, the etl node's logs ultimately backfilled the slave's missing transactions.

1.6 If MHA fails partway through, can the failover be re-run?

In 99% of cases it can be re-run. In the remaining 1% it cannot, and re-running it reports an error. That case is typically this one: the failover has already reached the final CHANGE MASTER stage, so the replication topology has already changed and MHA cannot walk through the procedure again. Even so, failing at this step means the master's logs have already been backfilled; since this is GTID mode, simply run CHANGE MASTER on the remaining slaves yourself to point them at the newest master, then activate the new VIP and set read_only=0 on the new master.

Thu Nov 9 16:49:39 2017 - [info] MHA::MasterFailover version 0.56.
Thu Nov 9 16:49:39 2017 - [info] Starting master failover.
Thu Nov 9 16:49:39 2017 - [info]
Thu Nov 9 16:49:39 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Nov 9 16:49:39 2017 - [info]
Thu Nov 9 16:49:39 2017 - [info] GTID failover mode = 1
Thu Nov 9 16:49:39 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln169] Detected dead master host_1( host_1:3306) does not match with specified dead master host_2( host_2:3306)!
Thu Nov 9 16:49:39 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line 53

1.7 Master: MySQL down, summary

1. Final failover command:

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host= host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

2.
Binlog server recommendation: configuring the master is enough:

[binlog1]
$master_ip

Configuring only the slaves, or configuring nothing at all, loses the portion of the transaction logs that was never shipped from the master.

Part 2. Master: Server down

2.1 etl lagging by 8 hours

Same conclusion as 1.1.

2.2 The slave (candidate master) lags even further behind than etl

2.2.1 The master server dies while part of the master's logs has not yet been shipped to the two slaves

### GTID state of the three DBs

* master host_2

dba:lc> show master status;
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                        |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| host_2.000008       | 5445     |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392 |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

* slave host_1

Retrieved_Gtid_Set:
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446385
Auto_Position: 1

* etl host_3

Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:46-50, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446386-446388
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446388
Auto_Position: 1

### Simulating the failure

* Isolate the master's network, making it equivalent to a downed server:

master> iptables -A INPUT -p tcp -s other_host --dport 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

### Failover log

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host= host_2 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Fri Nov 10 11:12:38 2017 - [info] MHA::MasterFailover version 0.56.
Fri Nov 10 11:12:38 2017 - [info] Starting master failover.
Fri Nov 10 11:12:38 2017 - [info]
Fri Nov 10 11:12:38 2017 - [info] * Phase 1: Configuration Check Phase..
Fri Nov 10 11:12:38 2017 - [info]
Fri Nov 10 11:13:28 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_2! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Nov 10 11:13:28 2017 - [warning] Failed to SSH to binlog server host_2
Fri Nov 10 11:13:29 2017 - [info] HealthCheck: SSH to host_1 is reachable.
Fri Nov 10 11:13:29 2017 - [info] Binlog server host_1 is reachable.
Fri Nov 10 11:13:29 2017 - [info] HealthCheck: SSH to host_3 is reachable.
Fri Nov 10 11:13:29 2017 - [info] Binlog server host_3 is reachable.
Fri Nov 10 11:13:29 2017 - [warning] SQL Thread is stopped(no error) on host_1( host_1:3306)
Fri Nov 10 11:13:29 2017 - [warning] SQL Thread is stopped(no error) on host_3( host_3:3306)
Fri Nov 10 11:13:29 2017 - [info] GTID failover mode = 1
Fri Nov 10 11:13:29 2017 - [info] Dead Servers:
Fri Nov 10 11:13:29 2017 - [info] host_2( host_2:3306)
Fri Nov 10 11:13:29 2017 - [info] Checking master reachability via MySQL(double check)...
Fri Nov 10 11:13:30 2017 - [info] ok.
Fri Nov 10 11:13:30 2017 - [info] Alive Servers:
Fri Nov 10 11:13:30 2017 - [info] host_1( host_1:3306)
Fri Nov 10 11:13:30 2017 - [info] host_3( host_3:3306)
Fri Nov 10 11:13:30 2017 - [info] Alive Slaves:
Fri Nov 10 11:13:30 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:13:30 2017 - [info] GTID ON
Fri Nov 10 11:13:30 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:13:30 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Nov 10 11:13:30 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:13:30 2017 - [info] GTID ON
Fri Nov 10 11:13:30 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:13:30 2017 - [info] Not candidate for the new Master (no_master is set)
Fri Nov 10 11:13:30 2017 - [info] Starting SQL thread on host_1( host_1:3306) ..
Fri Nov 10 11:13:30 2017 - [info] done.
Fri Nov 10 11:13:30 2017 - [info] Starting SQL thread on host_3( host_3:3306) ..
Fri Nov 10 11:13:30 2017 - [info] done.
Fri Nov 10 11:13:30 2017 - [info] Starting GTID based failover.
Fri Nov 10 11:13:30 2017 - [info]
Fri Nov 10 11:13:30 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Fri Nov 10 11:13:30 2017 - [info]
Fri Nov 10 11:13:30 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Fri Nov 10 11:13:30 2017 - [info]
Fri Nov 10 11:14:20 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_2! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Nov 10 11:14:20 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Fri Nov 10 11:14:20 2017 - [info] Executing master IP deactivation script:
Fri Nov 10 11:14:20 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --command=stop
ssh: connect to host host_2 port 22: Connection timed out
=================== swift vip : tgw_vip from host_2 is deleted ==============================
--2017-11-10 11:14:27-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi
Connecting to tgw_server:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: "STDOUT"

0K 11.4M=0s

2017-11-10 11:16:27 (11.4 MB/s) - written to STDOUT [38]

Fri Nov 10 11:16:27 2017 - [info] done.
Fri Nov 10 11:16:27 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Fri Nov 10 11:16:27 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] * Phase 3: Master Recovery Phase..
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] The latest binary log file/position on all slaves is host_2.000008:4265
Fri Nov 10 11:16:27 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:46-50, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446386-446388
Fri Nov 10 11:16:27 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Fri Nov 10 11:16:27 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:16:27 2017 - [info] GTID ON
Fri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:16:27 2017 - [info] Not candidate for the new Master (no_master is set)
Fri Nov 10 11:16:27 2017 - [info] The oldest binary log file/position on all slaves is host_2.000008:3380
Fri Nov 10 11:16:27 2017 - [info] Oldest slaves:
Fri Nov 10 11:16:27 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:16:27 2017 - [info] GTID ON
Fri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:16:27 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] * Phase 3.3: Determining New Master Phase..
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] Searching new master from slaves..
Fri Nov 10 11:16:27 2017 - [info] Candidate masters from the configuration file:
Fri Nov 10 11:16:27 2017 - [info] host_1( host_1:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:16:27 2017 - [info] GTID ON
Fri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:16:27 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Nov 10 11:16:27 2017 - [info] Non-candidate masters:
Fri Nov 10 11:16:27 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled
Fri Nov 10 11:16:27 2017 - [info] GTID ON
Fri Nov 10 11:16:27 2017 - [info] Replicating from host_2( host_2:3306)
Fri Nov 10 11:16:27 2017 - [info] Not candidate for the new Master (no_master is set)
Fri Nov 10 11:16:27 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Fri Nov 10 11:16:27 2017 - [info] Not found.
Fri Nov 10 11:16:27 2017 - [info] Searching from all candidate_master slaves..
Fri Nov 10 11:16:27 2017 - [info] New master is host_1( host_1:3306)
Fri Nov 10 11:16:27 2017 - [info] Starting master failover..
Fri Nov 10 11:16:27 2017 - [info]
From:
host_2( host_2:3306) (current master)
 +-- host_1( host_1:3306)
 +-- host_3( host_3:3306)

To:
host_1( host_1:3306) (new master)
 +-- host_3( host_3:3306)

Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Fri Nov 10 11:16:27 2017 - [info]
Fri Nov 10 11:16:27 2017 - [info] Waiting all logs to be applied..
Fri Nov 10 11:16:27 2017 - [info] done.
Fri Nov 10 11:16:27 2017 - [info] Replicating from the latest slave host_3( host_3:3306) and waiting to apply..
Fri Nov 10 11:16:27 2017 - [info] Waiting all logs to be applied on the latest slave..
Fri Nov 10 11:16:27 2017 - [info] Resetting slave host_1( host_1:3306) and starting replication from the new master host_3( host_3:3306)..
Fri Nov 10 11:16:27 2017 - [info] Executed CHANGE MASTER.
Fri Nov 10 11:16:28 2017 - [info] Slave started.
Fri Nov 10 11:16:28 2017 - [info] Waiting to execute all relay logs on host_1( host_1:3306)..
Fri Nov 10 11:16:28 2017 - [info] master_pos_wait( host_3.000049:40136) completed on host_1( host_1:3306). Executed 0 events.
Fri Nov 10 11:16:28 2017 - [info] done.
Fri Nov 10 11:16:28 2017 - [info] done.
Fri Nov 10 11:16:28 2017 - [info] -- Saving binlog from host host_2 started, pid: 43038
Fri Nov 10 11:16:28 2017 - [info] -- Saving binlog from host host_1 started, pid: 43039
Fri Nov 10 11:16:28 2017 - [info] -- Saving binlog from host host_3 started, pid: 43041
Fri Nov 10 11:16:28 2017 - [info]
Fri Nov 10 11:16:28 2017 - [info] Log messages from host_2 ...
Fri Nov 10 11:16:28 2017 - [info] End of log messages from host_2.
Fri Nov 10 11:16:28 2017 - [warning] SSH is not reachable on host_2. Skipping
Fri Nov 10 11:16:28 2017 - [info]
Fri Nov 10 11:16:28 2017 - [info] Log messages from host_1 ...
Fri Nov 10 11:16:28 2017 - [info]
Fri Nov 10 11:16:28 2017 - [info] Fetching binary logs from binlog server host_1..
Fri Nov 10 11:16:28 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000008 --start_pos=4265 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog2_20171110111238.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Failed to save binary log: Binlog not found from /data/mysql.bin! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
at /usr/bin/save_binary_logs line 123
eval {...} called at /usr/bin/save_binary_logs line 70
main::main() called at /usr/bin/save_binary_logs line 66
Fri Nov 10 11:16:28 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Fri Nov 10 11:16:28 2017 - [info] End of log messages from host_1.
Fri Nov 10 11:16:28 2017 - [warning] Got error from host_1.
Fri Nov 10 11:16:28 2017 - [info]
Fri Nov 10 11:16:28 2017 - [info] Log messages from host_3 ...
Fri Nov 10 11:16:28 2017 - [info]
Fri Nov 10 11:16:28 2017 - [info] Fetching binary logs from binlog server host_3..
Fri Nov 10 11:16:28 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_2.000008 --start_pos=4265 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171110111238.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin
Failed to save binary log: Binlog not found from /data/mysql.bin! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again.
at /usr/bin/save_binary_logs line 123
eval {...} called at /usr/bin/save_binary_logs line 70
main::main() called at /usr/bin/save_binary_logs line 66
Fri Nov 10 11:16:28 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt?
Fri Nov 10 11:16:28 2017 - [info] End of log messages from host_3.
Fri Nov 10 11:16:28 2017 - [warning] Got error from host_3.
Fri Nov 10 11:16:28 2017 - [info] Getting new master's binlog name and position..
Fri Nov 10 11:16:28 2017 - [info] host_1.000058:4059
Fri Nov 10 11:16:28 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_1', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Fri Nov 10 11:16:28 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: host_1.000058, 4059, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446388
Fri Nov 10 11:16:28 2017 - [info] Executing master IP activate script:
Fri Nov 10 11:16:28 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_2 --orig_master_ip= host_2 --orig_master_port=3306 --new_master_host= host_1 --new_master_ip= host_1 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba'
Unknown option: new_master_user
Unknown option: new_master_password
=================== swift vip : tgw_vip to host_1 is added ==============================
Fri Nov 10 11:16:30 2017 - [info] OK.
Fri Nov 10 11:16:30 2017 - [info] ** Finished master recovery successfully.
Fri Nov 10 11:16:30 2017 - [info] * Phase 3: Master Recovery Phase completed.
Fri Nov 10 11:16:30 2017 - [info]
Fri Nov 10 11:16:30 2017 - [info] * Phase 4: Slaves Recovery Phase..
Fri Nov 10 11:16:30 2017 - [info]
Fri Nov 10 11:16:30 2017 - [info]
Fri Nov 10 11:16:30 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
Fri Nov 10 11:16:30 2017 - [info]
Fri Nov 10 11:16:30 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 46878. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171110111238.log if it takes time..
Fri Nov 10 11:16:31 2017 - [info]
Fri Nov 10 11:16:31 2017 - [info] Log messages from host_3 ...
Fri Nov 10 11:16:31 2017 - [info]
Fri Nov 10 11:16:30 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_1( host_1:3306)..
Fri Nov 10 11:16:30 2017 - [info] Executed CHANGE MASTER.
Fri Nov 10 11:16:31 2017 - [info] Slave started.
Fri Nov 10 11:16:31 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446388) completed on host_3( host_3:3306). Executed 0 events.
Fri Nov 10 11:16:31 2017 - [info] End of log messages from host_3.
Fri Nov 10 11:16:31 2017 - [info] -- Slave on host host_3( host_3:3306) started.
Fri Nov 10 11:16:31 2017 - [info] All new slave servers recovered successfully.
Fri Nov 10 11:16:31 2017 - [info]
Fri Nov 10 11:16:31 2017 - [info] * Phase 5: New master cleanup phase..
Fri Nov 10 11:16:31 2017 - [info]
Fri Nov 10 11:16:31 2017 - [info] Resetting slave info on the new master..
Fri Nov 10 11:16:31 2017 - [info] host_1: Resetting slave info succeeded.
Fri Nov 10 11:16:31 2017 - [info] Master failover to host_1( host_1:3306) completed successfully.
Fri Nov 10 11:16:31 2017 - [info]

----- Failover Report -----

bak_mha_test: MySQL Master failover host_2( host_2:3306) to host_1( host_1:3306) succeeded

Master host_2( host_2:3306) is down!

Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on host_2( host_2:3306)
Selected host_1( host_1:3306) as a new master.
host_1( host_1:3306): OK: Applying all logs succeeded.
host_1( host_1:3306): OK: Activated master IP address.
host_3( host_3:3306): OK: Slave started, replicating from host_1( host_1:3306)
host_1( host_1:3306): Resetting slave info succeeded.
Master failover to host_1( host_1:3306) completed successfully.
Fri Nov 10 11:16:31 2017 - [info] Sending mail..
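The "what is lost vs. what gets backfilled" reasoning in this scenario comes down to GTID-set arithmetic. A minimal sketch (this is not MHA's code; `m1`/`m2` are stand-ins for the two server UUIDs above, and interval parsing is deliberately simplified to plain `uuid:a-b` sets):

```python
def parse_gtid_set(s):
    """Parse a simplified GTID set like 'm1:1-50,m2:1-446392' into {uuid: set_of_txn_ids}."""
    result = {}
    for part in s.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        ids = result.setdefault(uuid, set())
        for r in ranges:
            lo, _, hi = r.partition("-")
            ids.update(range(int(lo), int(hi or lo) + 1))
    return result

def missing(a, b):
    """Transactions present in set a but absent from set b."""
    return {u: sorted(ids - b.get(u, set()))
            for u, ids in a.items() if ids - b.get(u, set())}

# Executed_Gtid_Set values from scenario 2.2.1, with shortened UUIDs
master = parse_gtid_set("m1:1-50,m2:1-446392")
slave  = parse_gtid_set("m1:1-50,m2:1-446385")   # candidate master, furthest behind
etl    = parse_gtid_set("m1:1-50,m2:1-446388")   # most advanced survivor

print(missing(etl, slave))    # what etl can backfill on the slave: m2 446386-446388
print(missing(master, etl))   # what died with the master: m2 446389-446392
```

In production you would let MySQL do this with `GTID_SUBTRACT()` on the real `Executed_Gtid_Set` strings instead of hand-rolled parsing.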
### The last step is important

If the dead master later comes back to life, this step must be run:

dead_master> /usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c

http://gitlab.corp.anjuke.com/_dba/architecture/blob/master/personal/Keithlan/other/share/tools/always_used_command.md ==> see the tgw section there for details

Conclusion: the master is dead and its last logs were never shipped to any other server, so the transactions the master did not ship are lost. Fortunately, the slave and the etl node CHANGE MASTER to each other, so even though the slave's (candidate master's) logs were behind, the etl node's logs ultimately backfilled the slave's missing transactions.

2.2.2 The master server dies after all of the master's logs have been shipped to one etl node

Test omitted; essentially the same as 2.2.1.

Conclusion: since all of the master's logs were shipped to the etl node, no data from the master is lost in the end.

2.3 The slave's (candidate master's) logs are the newest, ahead of etl's

2.3.1 The master server dies while part of the master's logs has not yet been shipped to the two slaves

Test omitted; essentially the same as 2.2.1.

Conclusion: the master is dead and its last logs were never shipped to any other server, so the transactions the master did not ship are lost. Fortunately, the slave and the etl node CHANGE MASTER to each other, so whichever node's logs lag behind is ultimately backfilled from the one with the newer logs (here, the etl node catches up from the slave, the candidate master).

2.3.2 The master server dies after all of the master's logs have been shipped to the slave

Test omitted; essentially the same as 2.2.1.

Conclusion: since all of the master's logs were shipped to the slave, no data from the master is lost in the end.

2.4 A big transaction is running on the slave (candidate master)

* A big 1000-second query: same conclusion as 1.4.
* FLUSH TABLES WITH READ LOCK: same conclusion as 1.4.

2.5 Binlog server tests under different scenarios

Scenario: the dead master's last logs were shipped to neither the slave nor the etl node, and the slave's logs also lag behind the etl node's (the harshest case).

Binlog server: all three hosts configured

2.2.1 tested exactly this case; see 2.2.1 for the detailed failover log.

Conclusion: three binlog servers are configured, but since the master's server is dead, the logs cannot be fetched from the master's binlog server, so the transactions the master never shipped are lost.

Binlog server: only the master configured

### GTID state of the three DBs

* master host_1

dba:lc> show master status;
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| File                | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                                                                        |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
| host_1.000058       | 8517     |              |                  | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-60, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392 |
+---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

* slave host_2

Retrieved_Gtid_Set:
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392
Auto_Position: 1

* etl host_3

Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:51-55, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446389-446392
Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-55, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392
Auto_Position: 1

### Simulating the failure

master> iptables -A INPUT -p tcp -s other_host --dport 22 -j ACCEPT
master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP

### Failover

masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host= host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error

Fri Nov 10 14:15:51 2017 - [info] MHA::MasterFailover version 0.56.
Fri Nov 10 14:15:51 2017 - [info] Starting master failover.
Fri Nov 10 14:15:51 2017 - [info]
Fri Nov 10 14:15:51 2017 - [info] * Phase 1: Configuration Check Phase..
Fri Nov 10 14:15:51 2017 - [info]
Fri Nov 10 14:16:41 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342.
Fri Nov 10 14:16:41 2017 - [warning] Failed to SSH to binlog server host_1
Fri Nov 10 14:16:41 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln239] Binlog Server is defined but there is no alive server.
Fri Nov 10 14:16:41 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/share/perl5/vendor_perl/MHA/MasterFailover.pm line 2082 结论: binlog server 必须要配置一个活的 server,如果只配置master,如果master挂了,那么就等于一个都没有,MHA不会切换 binlog server 只写slave ### 3台DB的gtid 状态 * master host_1 dba:lc> show master status; +---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+ | host_1.000058 | 8517 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-60, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392 | +---------------------+----------+--------------+------------------+------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) * slave host_2 Retrieved_Gtid_Set: Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392 Auto_Position: 1 * etl host_3 Retrieved_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:51-55, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446389-446392 Executed_Gtid_Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-55, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392 Auto_Position: 1 ### 模拟故障 master> iptables -A INPUT -p tcp -s other_host --dport 22 -j ACCEPT master> iptables -A INPUT -p tcp -s 0.0.0.0/0 -j DROP ### 故障切换 Fri Nov 10 14:29:50 2017 - [info] MHA::MasterFailover version 0.56. Fri Nov 10 14:29:50 2017 - [info] Starting master failover. Fri Nov 10 14:29:50 2017 - [info] Fri Nov 10 14:29:50 2017 - [info] * Phase 1: Configuration Check Phase.. Fri Nov 10 14:29:50 2017 - [info] Fri Nov 10 14:29:50 2017 - [info] HealthCheck: SSH to host_2 is reachable. Fri Nov 10 14:29:50 2017 - [info] Binlog server host_2 is reachable. 
Fri Nov 10 14:29:50 2017 - [info] HealthCheck: SSH to host_3 is reachable. Fri Nov 10 14:29:50 2017 - [info] Binlog server host_3 is reachable. Fri Nov 10 14:29:50 2017 - [warning] SQL Thread is stopped(no error) on host_2( host_2:3306) Fri Nov 10 14:29:50 2017 - [warning] SQL Thread is stopped(no error) on host_3( host_3:3306) Fri Nov 10 14:29:50 2017 - [info] GTID failover mode = 1 Fri Nov 10 14:29:50 2017 - [info] Dead Servers: Fri Nov 10 14:29:50 2017 - [info] host_1( host_1:3306) Fri Nov 10 14:29:50 2017 - [info] Checking master reachability via MySQL(double check)... Fri Nov 10 14:29:51 2017 - [info] ok. Fri Nov 10 14:29:51 2017 - [info] Alive Servers: Fri Nov 10 14:29:51 2017 - [info] host_2( host_2:3306) Fri Nov 10 14:29:51 2017 - [info] host_3( host_3:3306) Fri Nov 10 14:29:51 2017 - [info] Alive Slaves: Fri Nov 10 14:29:51 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Fri Nov 10 14:29:51 2017 - [info] GTID ON Fri Nov 10 14:29:51 2017 - [info] Replicating from host_1( host_1:3306) Fri Nov 10 14:29:51 2017 - [info] Primary candidate for the new Master (candidate_master is set) Fri Nov 10 14:29:51 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Fri Nov 10 14:29:51 2017 - [info] GTID ON Fri Nov 10 14:29:51 2017 - [info] Replicating from host_1( host_1:3306) Fri Nov 10 14:29:51 2017 - [info] Not candidate for the new Master (no_master is set) Fri Nov 10 14:29:51 2017 - [info] Starting SQL thread on host_2( host_2:3306) .. Fri Nov 10 14:29:51 2017 - [info] done. Fri Nov 10 14:29:51 2017 - [info] Starting SQL thread on host_3( host_3:3306) .. Fri Nov 10 14:29:52 2017 - [info] done. Fri Nov 10 14:29:52 2017 - [info] Starting GTID based failover. Fri Nov 10 14:29:52 2017 - [info] Fri Nov 10 14:29:52 2017 - [info] ** Phase 1: Configuration Check Phase completed. 
Fri Nov 10 14:29:52 2017 - [info] Fri Nov 10 14:29:52 2017 - [info] * Phase 2: Dead Master Shutdown Phase.. Fri Nov 10 14:29:52 2017 - [info] Fri Nov 10 14:30:42 2017 - [warning] HealthCheck: Got timeout on checking SSH connection to host_1! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342. Fri Nov 10 14:30:42 2017 - [info] Forcing shutdown so that applications never connect to the current master.. Fri Nov 10 14:30:42 2017 - [info] Executing master IP deactivation script: Fri Nov 10 14:30:42 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --command=stop ssh: connect to host host_1 port 22: Connection timed out =================== swift vip : tgw_vip from host_1 is deleted ============================== --2017-11-10 14:30:49-- http://tgw_server/cgi-bin/fun_logic/bin/public_api/op_rs.cgi 正在连接 tgw_server:80... 已连接。 已发出 HTTP 请求,正在等待回应... 200 OK 长度:未指定 [text/html] 正在保存至: “STDOUT” 0K 12.1M=0s 2017-11-10 14:32:47 (12.1 MB/s) - 已写入标准输出 [38] Fri Nov 10 14:32:47 2017 - [info] done. Fri Nov 10 14:32:47 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master. Fri Nov 10 14:32:47 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed. Fri Nov 10 14:32:47 2017 - [info] Fri Nov 10 14:32:47 2017 - [info] * Phase 3: Master Recovery Phase.. Fri Nov 10 14:32:47 2017 - [info] Fri Nov 10 14:32:47 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase.. 
Fri Nov 10 14:32:47 2017 - [info] Fri Nov 10 14:32:47 2017 - [info] The latest binary log file/position on all slaves is host_1.000058:6912 Fri Nov 10 14:32:47 2017 - [info] Retrieved Gtid Set: 0923e916-3c36-11e6-82a5-ecf4bbf1f518:51-55, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:446389-446392 Fri Nov 10 14:32:47 2017 - [info] Latest slaves (Slaves that received relay log files to the latest): Fri Nov 10 14:32:47 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Fri Nov 10 14:32:47 2017 - [info] GTID ON Fri Nov 10 14:32:47 2017 - [info] Replicating from host_1( host_1:3306) Fri Nov 10 14:32:47 2017 - [info] Not candidate for the new Master (no_master is set) Fri Nov 10 14:32:47 2017 - [info] The oldest binary log file/position on all slaves is host_1.000058:5307 Fri Nov 10 14:32:47 2017 - [info] Oldest slaves: Fri Nov 10 14:32:47 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Fri Nov 10 14:32:47 2017 - [info] GTID ON Fri Nov 10 14:32:47 2017 - [info] Replicating from host_1( host_1:3306) Fri Nov 10 14:32:47 2017 - [info] Primary candidate for the new Master (candidate_master is set) Fri Nov 10 14:32:47 2017 - [info] Fri Nov 10 14:32:47 2017 - [info] * Phase 3.3: Determining New Master Phase.. Fri Nov 10 14:32:47 2017 - [info] Fri Nov 10 14:32:47 2017 - [info] Searching new master from slaves.. 
Fri Nov 10 14:32:47 2017 - [info] Candidate masters from the configuration file: Fri Nov 10 14:32:47 2017 - [info] host_2( host_2:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Fri Nov 10 14:32:47 2017 - [info] GTID ON Fri Nov 10 14:32:47 2017 - [info] Replicating from host_1( host_1:3306) Fri Nov 10 14:32:47 2017 - [info] Primary candidate for the new Master (candidate_master is set) Fri Nov 10 14:32:47 2017 - [info] Non-candidate masters: Fri Nov 10 14:32:47 2017 - [info] host_3( host_3:3306) Version=5.7.13-log (oldest major version between slaves) log-bin:enabled Fri Nov 10 14:32:47 2017 - [info] GTID ON Fri Nov 10 14:32:47 2017 - [info] Replicating from host_1( host_1:3306) Fri Nov 10 14:32:47 2017 - [info] Not candidate for the new Master (no_master is set) Fri Nov 10 14:32:47 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events.. Fri Nov 10 14:32:47 2017 - [info] Not found. Fri Nov 10 14:32:47 2017 - [info] Searching from all candidate_master slaves.. Fri Nov 10 14:32:47 2017 - [info] New master is host_2( host_2:3306) Fri Nov 10 14:32:47 2017 - [info] Starting master failover.. Fri Nov 10 14:32:47 2017 - [info] From: host_1( host_1:3306) (current master) +-- host_2( host_2:3306) +-- host_3( host_3:3306) To: host_2( host_2:3306) (new master) +-- host_3( host_3:3306) Fri Nov 10 14:32:47 2017 - [info] Fri Nov 10 14:32:47 2017 - [info] * Phase 3.3: New Master Recovery Phase.. Fri Nov 10 14:32:47 2017 - [info] Fri Nov 10 14:32:47 2017 - [info] Waiting all logs to be applied.. Fri Nov 10 14:32:47 2017 - [info] done. Fri Nov 10 14:32:47 2017 - [info] Replicating from the latest slave host_3( host_3:3306) and waiting to apply.. Fri Nov 10 14:32:47 2017 - [info] Waiting all logs to be applied on the latest slave.. Fri Nov 10 14:32:47 2017 - [info] Resetting slave host_2( host_2:3306) and starting replication from the new master host_3( host_3:3306).. 
Fri Nov 10 14:32:47 2017 - [info] Executed CHANGE MASTER. Fri Nov 10 14:32:48 2017 - [info] Slave started. Fri Nov 10 14:32:48 2017 - [info] Waiting to execute all relay logs on host_2( host_2:3306).. Fri Nov 10 14:32:48 2017 - [info] master_pos_wait( host_3.000049:42954) completed on host_2( host_2:3306). Executed 0 events. Fri Nov 10 14:32:48 2017 - [info] done. Fri Nov 10 14:32:48 2017 - [info] done. Fri Nov 10 14:32:48 2017 - [info] -- Saving binlog from host host_2 started, pid: 76664 Fri Nov 10 14:32:48 2017 - [info] -- Saving binlog from host host_3 started, pid: 76665 Fri Nov 10 14:32:48 2017 - [info] Fri Nov 10 14:32:48 2017 - [info] Log messages from host_2 ... Fri Nov 10 14:32:48 2017 - [info] Fri Nov 10 14:32:48 2017 - [info] Fetching binary logs from binlog server host_2.. Fri Nov 10 14:32:48 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000058 --start_pos=6912 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog1_20171110142950.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Failed to save binary log: Binlog not found from /data/mysql.bin! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again. at /usr/bin/save_binary_logs line 123 eval {...} called at /usr/bin/save_binary_logs line 70 main::main() called at /usr/bin/save_binary_logs line 66 Fri Nov 10 14:32:48 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt? Fri Nov 10 14:32:48 2017 - [info] End of log messages from host_2. Fri Nov 10 14:32:48 2017 - [warning] Got error from host_2. 
Fri Nov 10 14:32:48 2017 - [info] Fri Nov 10 14:32:48 2017 - [info] Log messages from host_3 ... Fri Nov 10 14:32:48 2017 - [info] Fri Nov 10 14:32:48 2017 - [info] Fetching binary logs from binlog server host_3.. Fri Nov 10 14:32:48 2017 - [info] Executing binlog save command: save_binary_logs --command=save --start_file= host_1.000058 --start_pos=6912 --output_file=/var/log/masterha/mha_test/saved_binlog_binlog3_20171110142950.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.56 --oldest_version=5.7.13-log --binlog_dir=/data/mysql.bin Failed to save binary log: Binlog not found from /data/mysql.bin! If you got this error at MHA Manager, please set "master_binlog_dir=/path/to/binlog_directory_of_the_master" correctly in the MHA Manager's configuration file and try again. at /usr/bin/save_binary_logs line 123 eval {...} called at /usr/bin/save_binary_logs line 70 main::main() called at /usr/bin/save_binary_logs line 66 Fri Nov 10 14:32:48 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln660] Failed to save binary log events from the binlog server. Maybe disks on binary logs are not accessible or binary log itself is corrupt? Fri Nov 10 14:32:48 2017 - [info] End of log messages from host_3. Fri Nov 10 14:32:48 2017 - [warning] Got error from host_3. Fri Nov 10 14:32:48 2017 - [info] Getting new master's binlog name and position.. Fri Nov 10 14:32:48 2017 - [info] host_2.000008:6895 Fri Nov 10 14:32:48 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST=' host_2', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'; Fri Nov 10 14:32:48 2017 - [info] Master Recovery succeeded. 
File:Pos:Exec_Gtid_Set: host_2.000008, 6895, 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-55, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392 Fri Nov 10 14:32:48 2017 - [info] Executing master IP activate script: Fri Nov 10 14:32:48 2017 - [info] /data/online/agent/MHA/masterha/bak_mha_test/master_ip_failover_mha_test --command=start --ssh_user=root --orig_master_host= host_1 --orig_master_ip= host_1 --orig_master_port=3306 --new_master_host= host_2 --new_master_ip= host_2 --new_master_port=3306 --new_master_user='dba' --new_master_password='dba' Unknown option: new_master_user Unknown option: new_master_password =================== swift vip : tgw_vip to host_2 is added ============================== Fri Nov 10 14:32:51 2017 - [info] OK. Fri Nov 10 14:32:51 2017 - [info] ** Finished master recovery successfully. Fri Nov 10 14:32:51 2017 - [info] * Phase 3: Master Recovery Phase completed. Fri Nov 10 14:32:51 2017 - [info] Fri Nov 10 14:32:51 2017 - [info] * Phase 4: Slaves Recovery Phase.. Fri Nov 10 14:32:51 2017 - [info] Fri Nov 10 14:32:51 2017 - [info] Fri Nov 10 14:32:51 2017 - [info] * Phase 4.1: Starting Slaves in parallel.. Fri Nov 10 14:32:51 2017 - [info] Fri Nov 10 14:32:51 2017 - [info] -- Slave recovery on host host_3( host_3:3306) started, pid: 80398. Check tmp log /var/log/masterha/mha_test/ host_3_3306_20171110142950.log if it takes time.. Fri Nov 10 14:32:52 2017 - [info] Fri Nov 10 14:32:52 2017 - [info] Log messages from host_3 ... Fri Nov 10 14:32:52 2017 - [info] Fri Nov 10 14:32:51 2017 - [info] Resetting slave host_3( host_3:3306) and starting replication from the new master host_2( host_2:3306).. Fri Nov 10 14:32:51 2017 - [info] Executed CHANGE MASTER. Fri Nov 10 14:32:52 2017 - [info] Slave started. Fri Nov 10 14:32:52 2017 - [info] gtid_wait(0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-55, ebd9ff93-c5b2-11e6-b21d-ecf4bbf1f42c:1-446392) completed on host_3( host_3:3306). Executed 0 events. 
Fri Nov 10 14:32:52 2017 - [info] End of log messages from host_3. Fri Nov 10 14:32:52 2017 - [info] -- Slave on host host_3( host_3:3306) started. Fri Nov 10 14:32:52 2017 - [info] All new slave servers recovered successfully. Fri Nov 10 14:32:52 2017 - [info] Fri Nov 10 14:32:52 2017 - [info] * Phase 5: New master cleanup phase.. Fri Nov 10 14:32:52 2017 - [info] Fri Nov 10 14:32:52 2017 - [info] Resetting slave info on the new master.. Fri Nov 10 14:32:52 2017 - [info] host_2: Resetting slave info succeeded. Fri Nov 10 14:32:52 2017 - [info] Master failover to host_2( host_2:3306) completed successfully. Fri Nov 10 14:32:52 2017 - [info] ----- Failover Report ----- bak_mha_test: MySQL Master failover host_1( host_1:3306) to host_2( host_2:3306) succeeded Master host_1( host_1:3306) is down! Check MHA Manager logs at tjtx135-2-217.58os.org:/var/log/masterha/mha_test/mha_test.log for details. Started automated(non-interactive) failover. Invalidated master IP address on host_1( host_1:3306) Selected host_2( host_2:3306) as a new master. host_2( host_2:3306): OK: Applying all logs succeeded. host_2( host_2:3306): OK: Activated master IP address. host_3( host_3:3306): OK: Slave started, replicating from host_2( host_2:3306) host_2( host_2:3306): Resetting slave info succeeded. Master failover to host_2( host_2:3306) completed successfully. Fri Nov 10 14:32:52 2017 - [info] Sending mail.. 结论: binlog server 配置成多台slave,这是正确的方案。 由于master 挂了,master没有传递过来的binlog会丢失,这是没办法的. 好在,其余slave自动补齐现有日志 binlog server 啥都不写 会切换成功,由于master 挂了,master没有传递过来的binlog会丢失 好在,其余slave自动补齐现有日志 2.6 如果MHA过程中失败,是否可以重新执行MHA的failover呢? 同1.6结论 三、遇到的坑 3.1 交互模式下,如果没有及时敲'YES',则终止切换 四、总结 MHA + GTID 模式,重点配置和用法如下: 1. command masterha_master_switch --global_conf=/data/online/agent/MHA/conf/masterha_default.cnf --conf=/data/online/agent/MHA/conf/bak_mha_test.cnf --dead_master_host= host_1 --dead_master_port=3306 --master_state=dead --interactive=0 --ignore_last_failover --ignore_binlog_server_error 2. 
binlog server: 在配置文件中,将 master、slave、etl 都配置为 binlog server。综合考虑 MySQL down 和 DB server down 两种情况,建议这样配置。

3. tgw 清理

dead master 如果还可以起来,那么必须在上面执行:

/usr/local/realserver/RS_TUNL0/etc/setup_rs.sh -c

原因可参看:http://gitlab.corp.anjuke.com/_dba/architecture/blob/master/personal/Keithlan/other/share/tools/always_used_command.md ==> TGW 章节

五、流程简介

* Phase 1: Configuration Check Phase..
  HealthCheck: SSH 检查N台DB是否reachable
  Binlog server: 检查N台DB是否reachable
  GTID failover mode = ?
  Dead Servers is ?
  Primary candidate for the new Master (candidate_master is set) ?

* Phase 2: Dead Master Shutdown Phase..
  Executing master IP deactivation script: TGW-vip delete操作
  shutdown_script: ?

* Phase 3: Master Recovery Phase..

  * Phase 3.1: Getting Latest Slaves Phase..
    Latest slaves, file position ?
    Oldest slaves, file position ?

  * Phase 3.3: Determining New Master Phase..
    选择哪个slave为new master

  * Phase 3.3: New Master Recovery Phase..
    Replicating from the latest slave and waiting to apply.. --让new master change master 到 latest slave
    Waiting all logs to be applied on the latest slave.. --让new master跟latest slave的日志保持一致
    Saving binlog from binlog server... --根据配置的binlog server,生成latest slave和dead master之间的diff日志
    Applying differential binlog --将这些差异日志apply到new master,让new master执行完所有缺失的日志
    Getting new master's binlog name and position.. --获取new master现在的binlog file和pos
    All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST, MASTER_PORT=3306, MASTER_AUTO_POSITION=1
    Executing master IP activate script: --TGW-vip 激活操作,并且设置readonly=0

* Phase 4: Slaves Recovery Phase.. (并行操作)
  Resetting slave and starting replication from the new master

* Phase 5: New master cleanup phase..
  Resetting slave info on the new master.
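流程简介里 Phase 3.1「Getting Latest Slaves」的核心动作,就是比较各台 slave 的 GTID 集合,选出日志最新的一台。下面用一小段 Python 做个示意(假设 GTID 集合都是从 1 开始的连续区间;真实的 GTID 集合语法更复杂,这也不是 MHA 的真实实现):

```python
# 解析形如 "uuid:1-55" 或 "uuid1:1-50, uuid2:1-446392" 的 GTID 集合,
# 比较哪台机器收到/执行的事务最多(纯示意)
def parse_gtid_set(s):
    txns = {}
    for part in s.replace("\n", " ").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, intervals = part.split(":", 1)
        count = 0
        for iv in intervals.split(":"):
            lo, _, hi = iv.partition("-")          # "1-55" -> (1, 55);单个gtid "5" -> (5, 5)
            count += (int(hi) if hi else int(lo)) - int(lo) + 1
        txns[uuid] = count
    return txns

def latest(slaves):
    # slaves: {host: Executed_Gtid_Set 字符串},返回事务数最多的 host
    return max(slaves, key=lambda h: sum(parse_gtid_set(slaves[h]).values()))
```

用上文的数据验证:host_3(etl)的 `0923e916...:1-55` 比 host_2 的 `1-50` 多 5 个事务,所以 latest slave 是 host_3,与 MHA 日志中的判断一致。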
背景 一个MySQL实例中,如何验证一个账号上面是否还有访问? 一个MySQL实例中,如何验证某个业务ip是否还有访问? 倔强青铜级别 打开general log 优点: 全量 缺点: 性能差 秩序白银级别 打开slow log,设置long_query_time = 0 优点: 全量 缺点: 性能比较差 荣耀黄金级别 tshark | tcpdump | tcpcopy tshark -i any dst host ${ip} and dst port 3306 -l -d tcp.port==3306,mysql -T fields -e frame.time -e 'ip.src' -e 'mysql.query' -e 'mysql.user' -e 'mysql.schema' 优点:全量*95% 缺点:性能比较差,使用不方便 尊贵铂金级别 使用P_S * 使用案例 dba:performance_schema> select USER,EVENT_NAME,COUNT_STAR,now() as time from events_statements_summary_by_user_by_event_name where EVENT_NAME in ('statement/sql/select','statement/sql/update','statement/sql/delete','statement/sql/insert','statement/sql/replace') and COUNT_STAR > 0; +------+----------------------+------------+---------------------+ | USER | EVENT_NAME | COUNT_STAR | time | +------+----------------------+------------+---------------------+ | dba | statement/sql/select | 143 | 2017-09-04 18:02:33 | | repl | statement/sql/select | 10 | 2017-09-04 18:02:33 | +------+----------------------+------------+---------------------+ 2 rows in set (0.00 sec) dba:performance_schema> select HOST,EVENT_NAME,COUNT_STAR,now() as time from events_statements_summary_by_host_by_event_name where EVENT_NAME in ('statement/sql/select','statement/sql/update','statement/sql/delete','statement/sql/insert','statement/sql/replace') and COUNT_STAR > 0; +-----------+----------------------+------------+---------------------+ | HOST | EVENT_NAME | COUNT_STAR | time | +-----------+----------------------+------------+---------------------+ | localhost | statement/sql/select | 22 | 2017-09-04 18:02:35 | +-----------+----------------------+------------+---------------------+ 1 row in set (0.00 sec) 对比 优点:全量,性能基本无影响 缺点:无法抓到对应的SQL 永恒钻石级别 巧用P_S 将每1分钟,5分钟,10分钟的P_S快照映射到对应的table,永久存下来,进行统计分析 优点:全量,性能基本无影响,且时间更加细粒度化 缺点:无法抓到对应的SQL,需要额外开发成本 最强王者 巧用P_S + tshark 1. P_S分段,找到具体有访问的时间段 $time 2. 在$time时间段内,去用tshark 抓取SQL相关info
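「永恒钻石级别」的关键点在于:P_S 统计表里的 COUNT_STAR 是自实例启动以来的累计值,必须对前后两次快照做差,才能知道某个时间窗口内账号/ip 是否真的有访问。下面是做差逻辑的极简示意(快照采集和建表落地部分省略;函数名与数据结构为示意假设):

```python
# prev / curr: {(user, event_name): count_star},
# 分别是 events_statements_summary_by_user_by_event_name 的前后两次采样
def diff_snapshots(prev, curr):
    delta = {}
    for key, count in curr.items():
        d = count - prev.get(key, 0)
        if d > 0:   # 实例重启会使计数器清零(出现负数),这里直接忽略
            delta[key] = d
    return delta
```

每 1/5/10 分钟采样一次,把 delta 落到统计表里,就能把「某账号是否还有访问」细化到具体时间段。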
一、背景

最近凌晨05:00总是接到来自SQL防火墙的告警:

group_name id user host db command time info state
BASE 1059712468 xx xx.xx.xx.xx aea Query 34 UPDATE approve SET operator = '0',operator_name = 'system',comment = '离职',status = '1' WHERE (id = '48311') updating

当第一次看到这个数据的时候,第一反应是:它可能只是受影响(被阻塞)的SQL,没花时间关注。但是后面好几次的凌晨告警,就不得不对它进行深入分析。

症状特点
* 主键更新
* 状态为updating
* 执行时间30多秒
* command为Query

这一切看上去都特别正常。

二、环境

MySQL版本
mysql Ver 14.14 Distrib 5.6.16, for linux-glibc2.5 (x86_64) using EditLine wrapper

表结构
CREATE TABLE `approve` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `reim_id` int(11) NOT NULL DEFAULT '0',
  `user_name` varchar(20) NOT NULL DEFAULT '',
  `user_ids` varchar(100) NOT NULL DEFAULT '',
  `user_email` text COMMENT '用于mail',
  `status` tinyint(1) NOT NULL DEFAULT '0',
  `stagesub` smallint(3) NOT NULL DEFAULT '0',
  `stage` smallint(3) NOT NULL DEFAULT '0',
  `flag` tinyint(1) NOT NULL DEFAULT '0',
  `operator` int(11) NOT NULL DEFAULT '0',
  `operator_name` varchar(20) NOT NULL DEFAULT '',
  `comment` text,
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `cs_userid` int(11) NOT NULL DEFAULT '0',
  `cs_status` tinyint(4) NOT NULL DEFAULT '0',
  `is_deficit` tinyint(1) NOT NULL DEFAULT '1',
  `approve_type` tinyint(4) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  KEY `list` (`user_ids`,`status`),
  KEY `next` (`flag`,`status`),
  KEY `detail` (`reim_id`),
  KEY `ix_userid` (`cs_userid`)
) ENGINE=InnoDB AUTO_INCREMENT=464885 DEFAULT CHARSET=utf8

三、分析过程

SQL语句本身的分析
1. 这条语句再正常不过了,而且还是主键更新,执行计划一切都很正常。
2. show processlist中的状态: command=Query,state=updating

手动执行
没有任何问题,瞬间执行完毕:
dba> UPDATE approve SET `operator` = '0',`operator_name` = 'system',`comment` = '离职',`status` = '1' WHERE (`id` = '49384');
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

可能的问题原因
1. SQL语句的拼接问题,会不会凌晨的时候SQL里有特殊字符,导致全表扫描更新呢?
1.1 为了排查这个问题,模拟了N遍,且将所有特殊字符都打印出来,这个问题排除。
2. 服务器压力问题,有没有可能是在凌晨的时候io、cpu压力特别大,造成updating慢呢?
2.1 查看当时的监控,一切指标都正常,故也排除。
3. 数据库本身的问题,MySQL出现bug了?
3.1 目前也没有搜到关于这方面的bug信息。
4.
锁的问题,SQL语句当时被锁住了? 4.1 show processlist中没有看到任何lock的字样啊 锁相关排除 1. 一开始,所有的故障排除全部来自监控系统和show processlist,然后查看锁的神器没有使用,就是show engine innodb status \G ---TRANSACTION 51055827249, ACTIVE 20 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 360, 1 row lock(s) MySQL thread id 1060068541, OS thread handle 0x7fba06c6c700, query id 55990809665 xx aea updating UPDATE approve SET `operator` = '0',`operator_name` = 'system',`comment` = '离职',`status` = '1' WHERE (`id` = '49384') ------- TRX HAS BEEN WAITING 20 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 746 page no 624 n bits 216 index `PRIMARY` of table `aea`.`approve` trx id 51055827249 lock_mode X locks rec but not gap waiting Record lock, heap no 148 PHYSICAL RECORD: n_fields 19; compact format; info bits 0 0: len 4; hex 8000c0e8; asc ;; 1: len 6; hex 000be32a10cb; asc * ;; 2: len 7; hex 7a000004540557; asc z T W;; 3: len 4; hex 80002884; asc ( ;; 4: len 6; hex e69da8e58b87; asc ;; 5: len 6; hex 3b363430353b; asc ;6405;;; 6: len 19; hex 7979616e6740616e6a756b65696e632e636f6d; asc yy.com;; 7: len 1; hex 81; asc ;; 8: len 2; hex 8015; asc ;; 9: len 2; hex 8001; asc ;; 10: len 1; hex 80; asc ;; 11: len 4; hex 80000001; asc ;; 12: len 6; hex 73797374656d; asc system;; 13: len 6; hex e7a6bbe8818c; asc ;; 14: len 4; hex 59a4c993; asc Y ;; 15: len 4; hex 80000000; asc ;; 16: len 1; hex 80; asc ;; 17: len 1; hex 81; asc ;; 18: len 1; hex 81; asc ;; ------------------ ---TRANSACTION 51055825099, ACTIVE 21 sec 2 lock struct(s), heap size 360, 1 row lock(s), undo log entries 1 MySQL thread id 1060025172, OS thread handle 0x7fba05ad0700, query id 55990809629 xx aea cleaning up 2. 
通过以上片段信息可以得知如下结论:
2.1 UPDATE approve 语句在等待主键索引上的record lock(lock_mode X locks rec but not gap),space id 746 page no 624,记录为主键49384的row。
2.2 TRANSACTION 51055827249, ACTIVE 20 sec:这个事务已持续20秒。
2.3 TRANSACTION 51055825099, ACTIVE 21 sec:这个事务已持续21秒。根据这个信息,很有可能是这个事务持有UPDATE approve需要的record lock。
2.4 TRANSACTION 51055825099, 1 row lock(s):根据这个信息,可以进一步推论出该事务(thread id 1060025172)持有该记录锁。
3. 很可惜,并不知道它执行的是什么SQL语句,说明语句已经执行完毕。

验证
1. 如何验证上面的推论呢?
2. 如何找到是哪条SQL持有锁呢?
3. 首先我们从表入手,查找该表有哪些写入SQL?通过Performance Schema,发现了两种形迹可疑的SQL:

digest sql count db dbgroup date
0c95e7f2105d7a3e655b8b4462251bf2 UPDATE approve SET operator = ? , operator_name = ? , comment = ? , status = ? WHERE ( id = ? ) 15 xx BASE 20170829
591226ca0ece89fe74bc6894ad193d71 UPDATE approve SET STATUS = ? , operator = ? , operator_name = ? , COMMENT = ? WHERE approve . id = ? 15 xx BASE 20170829

进一步验证
1. 通过上述SQL,如果他们更新的是同一个id,那么很有可能就会导致锁等待。
2. 要满足上面的推测,还必须满足一个必要条件:下面那个语句必须在上面语句之前执行,且没有commit。
3. 我们去服务器上进行tcpdump抓包,发现如下:

Capturing on Pseudo-device that captures on all interfaces
Aug 29, 2017 10:20:23.560491000 xx.xx.xx.xx UPDATE approve SET status=1, operator=1, operator_name='system', comment='\xe7\xa6\xbb\xe8\x81\x8c' WHERE approve.id = 49384
Aug 29, 2017 10:20:23.589586000 xx.xx.xx.xx UPDATE approve SET `operator` = '0',`operator_name` = 'system',`comment` = '\xe7\xa6\xbb\xe8\x81\x8c',`status` = '1' WHERE (`id` = '49384')

正好验证我们的推论。
4. 手动模拟了这种情况,得到的现象和我们的故障一致,至此问题原因已经找到。

问题解决
1. 拿到开发的代码,发现是python的代码,并没有设置auto_commit的选项。
2. 第一个事务并没有结束,没有commit,导致下一个事务在等待锁资源。
3. 为什么需要对同一条记录进行两次更新,这个还需要进一步了解代码和业务。

四、总结
1. 下次遇到类似问题,不要被processlist的表面现象所迷惑,要善于利用锁机制和锁信息进行排查。
2. 写代码的时候,尽量做到小事务,一般用auto_commit就好;如果需要显式开启事务,也应该做到用完尽快commit。
3. MySQL如果能够在show processlist中直接打印出lock、waiting lock状态,会更加人性化。
4. 在show engine innodb status的时候,为什么看不到是哪个SQL语句持有的锁呢?MySQL如果能够提供这样的信息,可以更好地帮助DBA诊断问题。
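上面的排查过程可以半自动化:从 show engine innodb status 的文本里把「等锁超过 N 秒」的事务抓出来,再和 show processlist 对照。下面是一个极简的解析示意(只覆盖上文日志里出现的几行格式,假设输出与 5.6 版本一致;需要更多字段时要自行扩展正则):

```python
import re

# 从 SHOW ENGINE INNODB STATUS 文本中抓出等锁超过 min_wait 秒的事务,
# 返回 [(trx_id, active_secs, wait_secs), ...]
def waiting_trx(status_text, min_wait=10):
    result = []
    for block in status_text.split("---TRANSACTION")[1:]:
        head = re.match(r" (\d+), ACTIVE (\d+) sec", block)
        wait = re.search(r"TRX HAS BEEN WAITING (\d+) SEC FOR THIS LOCK", block)
        if head and wait and int(wait.group(1)) >= min_wait:
            result.append((head.group(1), int(head.group(2)), int(wait.group(1))))
    return result
```

把它接到定时任务里,就能在锁等待发生的第一时间记录下 trx id,而不必等告警之后再去翻日志。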
能学到什么 什么是死锁 死锁有什么危害 典型的死锁案例剖析 如何避免死锁 一、什么是死锁 1.必须满足的条件 1. 必须有两个或者两个以上的事务 2. 不同事务之间都持有对方需要的锁资源。 A事务需要B的资源,B事务需要A的资源,这就是典型的AB-BA死锁 2.死锁相关的参数 * innodb_print_all_deadlocks 1. 如果这个参数打开,那么死锁相关的信息都会打印输出到error log * innodb_lock_wait_timeout 1. 当MySQL获取row lock的时候,如果wait了innodb_lock_wait_timeout=N的时间,会报以下错误 ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction * innodb_deadlock_detect 1. innodb_deadlock_detect = off 可以关闭掉死锁检测,那么就发生死锁的时候,用锁超时来处理。 2. innodb_deadlock_detect = on (默认选项)开启死锁检测,数据库自动回滚 * innodb_status_lock_output = on 1. 可以看到更加详细的锁信息 二、死锁有什么危害 死锁,即表明有多个事务之间需要互相争夺资源而互相等待。 如果没有死锁检测,那么就会互相卡死,一直hang死 如果有死锁检测机制,那么数据库会自动根据代价来评估出哪些事务可以被回滚掉,用来打破这个僵局 所以说:死锁并没有啥坏处,反而可以保护数据库和应用 那么出现死锁,而且非常频繁,我们应该调整业务逻辑,让其避免产生死锁方为上策 三、典型的死锁案例剖析 3.1 死锁案例一 典型的 AB-BA 死锁 session 1: select * from tb_b where id_2 = 1 for update (A) session 2: select * from tb_a where id = 2 for update (B) session 1: select * from tb_a where id = 2 for update (B) session 2: select * from tb_b where id_2 = 1 for update (A) ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction 1213的死锁错误,mysql会自动回滚 哪个回滚代价最小,回滚哪个(根据undo判断) ------------------------ LATEST DETECTED DEADLOCK ------------------------ 2017-06-22 16:39:50 0x7f547dd02700 *** (1) TRANSACTION: TRANSACTION 133601982, ACTIVE 48 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s) MySQL thread id 11900, OS thread handle 140000866637568, query id 25108 localhost dba statistics select * from tb_a where id = 2 for update -----session1 持有tb_a中记录为2的锁 *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 303 page no 3 n bits 72 index PRIMARY of table `lc_5`.`tb_a` trx id 133601982 lock_mode X locks rec but not gap waiting Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 4; hex 80000002; asc ;; --session 1 需要tb_a中记录为2的锁( session1 -> session2 ) 1: len 6; hex 000007f69ab2; asc ;; 2: len 7; hex dc000027100110; 
asc ' ;; *** (2) TRANSACTION: TRANSACTION 133601983, ACTIVE 28 sec starting index read, thread declared inside InnoDB 5000 mysql tables in use 1, locked 1 4 lock struct(s), heap size 1136, 2 row lock(s) MySQL thread id 11901, OS thread handle 140000864773888, query id 25109 localhost dba statistics select * from tb_b where id_2 = 1 for update *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 303 page no 3 n bits 72 index PRIMARY of table `lc_5`.`tb_a` trx id 133601983 lock_mode X locks rec but not gap Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 4; hex 80000002; asc ;; --session 2 持有tb_a中记录等于2的锁 1: len 6; hex 000007f69ab2; asc ;; 2: len 7; hex dc000027100110; asc ' ;; *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 304 page no 3 n bits 72 index PRIMARY of table `lc_5`.`tb_b` trx id 133601983 lock_mode X locks rec but not gap waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 4; hex 80000001; asc ;; --session 2 需要tb_b中记录为1的锁 ( session2 -> session1 ) 1: len 6; hex 000007f69ab8; asc ;; 2: len 7; hex e0000027120110; asc ' ;; 最终的结果: 死锁路径:[session1 -> session2 , session2 -> session1] ABBA死锁产生 3.2 死锁案例二 同一个事务中,S-lock 升级为 X-lock 不能直接继承 * session 1: mysql> CREATE TABLE t (i INT) ENGINE = InnoDB; Query OK, 0 rows affected (1.07 sec) mysql> INSERT INTO t (i) VALUES(1); Query OK, 1 row affected (0.09 sec) mysql> START TRANSACTION; Query OK, 0 rows affected (0.00 sec) mysql> SELECT * FROM t WHERE i = 1 LOCK IN SHARE MODE; --获取S-lock +------+ | i | +------+ | 1 | +------+ * session 2: mysql> START TRANSACTION; Query OK, 0 rows affected (0.00 sec) mysql> DELETE FROM t WHERE i = 1; --想要获取X-lock,但是被session1的S-lock 卡住,目前处于waiting lock阶段 * session 1: mysql> DELETE FROM t WHERE i = 1; --想要获取X-lock,session1本身拥有S-lock,但是由于session 2 获取X-lock再前,所以session1不能够从S-lock 提升到 X-lock,需要等待session2 释放才可以获取,所以造成死锁 ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting 
transaction 死锁路径: session2 -> session1 , session1 -> session2 3.3 死锁案例三 唯一键死锁 (delete + insert)关键点在于:S-lock dba:lc_3> show create table uk; +-------+--------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+--------------------------------------------------------------------------------------------------------------+ | uk | CREATE TABLE `uk` ( `a` int(11) NOT NULL, UNIQUE KEY `uniq_a` (`a`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | +-------+--------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:lc_3> select * from uk; +---+ | a | +---+ | 1 | +---+ 1 row in set (0.00 sec) session 1: dba:lc_3> begin; Query OK, 0 rows affected (0.00 sec) dba:lc_3> delete from uk where a=1; Query OK, 1 row affected (0.00 sec) session 2: dba:(none)> use lc_3; Database changed dba:lc_3> insert into uk values(1); --wait lock(想要加S-lock,却被sesson1的X-lock卡住) sesson 3: dba:(none)> use lc_3; Database changed dba:lc_3> insert into uk values(1); --wait lock(想要加S-lock,却被sesson1的X-lock卡住) session 1: commit; --session2和session3 都获得了S-lock,然后都想要去给记录1 加上X-lock,却互相被对方的S-lock卡住,死锁产生 再来看session 2 和 session 3 的结果: session2: Query OK, 1 row affected (7.36 sec) session3: ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction 总结: 试想想,如果session 1 不是commit,而是rollback会是怎么样呢? 大家去测测就会发现,结果肯定是唯一键冲突啊 3.4 死锁案例四 主键和二级索引的死锁 * primary key 1 2 3 4 --primary key col1 10 30 20 40 --idx_key2 col2 100 200 300 400 --idx_key3 col3 * idx_key2 select * from t where col2 > 10: 锁二级索引顺序为:20 =》30 , 对应锁主键的顺序为:3 =》2 10 20 30 40 1 3 2 4 * idx_key3 select * from t where col3 > 100:锁二级索引顺序为:200 =》300 , 对应锁主键的顺序为:2 =》3 100 200 300 400 1 2 3 4 死锁路径: 由于二级索引引起的主键加锁顺序: 3 =》2 由于二级索引引起的主键加锁顺序: 2 =》3 这个要求并发,且刚好 session 1 加锁3的时候 session 2 要加锁2. session 1 加锁2的时候 session 3 要加锁3. 
这样就产生了 AB-BA 死锁。

3.5 死锁案例五 purge + unique key 引发的死锁

A表的记录: id = 1 10 40 100 200 500 800 900

session 1 : delete from a where id = 10; ???
session 2 : delete from a where id = 800; ???
session 1 : insert into a select 800; ???
session 2 : insert into a select 10; ???

* 如果大家去跑这两种SQL语句的并发测试,是可以导致死锁的。
* 如何验证是由于purge导致的问题呢?本想用mysqld-debug模式关闭purge线程来验证,但是很遗憾没能模拟出来……

3.6 死锁案例六 REPLACE INTO问题

* 这个问题模拟起来非常简单,原理却非常复杂,这里不过多解释
* 详情请看姜老师的文章,据说看懂了年薪都100w了: http://www.innomysql.com/26186-2/
* 解决方案: 用insert into ... on duplicate key update 代替 replace into,此方案亲测有效

四、如何避免死锁

产生死锁的原因
1. 事务之间互相占用资源

方法和总结
1. 降低隔离级别,修改 RR -> RC,这样调整后可以避免掉60%的死锁场景和奇怪的锁等待
2. 调整业务逻辑和SQL,让其都按照顺序执行操作
3. 减少unique索引,大部分死锁场景都是由unique索引导致的
4. 尽量不用replace into,用insert into ... on duplicate key update 代替
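死锁检测的本质,就是在事务的 wait-for(等待)图中找环:案例一里 session1 -> session2、session2 -> session1 正好构成一个环。下面用几行 Python 做个示意(假设每个事务同一时刻最多等待一个锁;InnoDB 的真实实现要复杂得多,还要计算回滚代价来挑选牺牲者):

```python
# graph: {事务: 它正在等待的事务},出现环即判定死锁
def has_deadlock(graph):
    for start in graph:
        seen, cur = set(), start
        while cur in graph:
            nxt = graph[cur]
            if nxt == start or nxt in seen:   # 沿等待链走回了起点/走进了环
                return True
            seen.add(cur)
            cur = nxt
    return False
```

例如 `has_deadlock({"session1": "session2", "session2": "session1"})` 为 True,正对应案例一的 AB-BA 死锁路径。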
Agenda
- Pitfalls we hit
- Test cases
- Conclusions
- Practical use

1. Pitfalls we hit

- Set the slow log threshold, yet could not capture the SQL we expected.
- Set read_only, yet writes still came in.
- Set sql_safe_update, yet a full-table delete still went through.
- A flawed test method made it look as if, after setting read_only, inserts sometimes succeeded and sometimes failed.

Too many problems like these -- so we decided to get to the bottom of it.

2. Test cases

Testing whether a parameter actually takes effect after it is set.

2.1 The official documentation

https://dev.mysql.com/doc/refman/5.7/en/set-variable.html

* Key passage:
If you change a session system variable, the value remains in effect within your session until you change the variable to a different value or the session ends. The change has no effect on other sessions. If you change a global system variable, the value is remembered and used for new sessions until you change the variable to a different value or the server exits. The change is visible to any client that accesses the global variable. However, the change affects the corresponding session variable only for clients that connect after the change. The global variable change does not affect the session variable for any current client sessions (not even the session within which the SET GLOBAL statement occurred).

The documentation is explicit: setting a global variable affects only the sessions that connect afterwards; it does not affect the current session or any pre-existing session. Let us test that carefully.

2.2 The Scope of system variables

1. Global:
set global variables = xx; --correct
set variables = xx; --error (scope=Global, so the session variable cannot be set)
2. Session:
set variables = xx; --correct
set global variables = xx; --error (scope=Session, so the global variable cannot be set)
3. Both:
3.1 Global: set global variables = xx; --correct (scope=Both, so both the global and the session variable can be set)
3.2 Session: set variables = xx; --correct (scope=Both, so both the global and the session variable can be set)

2.3 Session-level test

1. A representative session-level variable: sql_log_bin
2. Setting a variable of this type affects only the current session; other sessions are untouched.

2.4 Global-level test
Representative global-level variables: read_only, log_queries_not_using_indexes

Test one

* processlist_id = 100:
lc_rx:lc> select @@global.log_queries_not_using_indexes;
+----------------------------------------+
| @@global.log_queries_not_using_indexes |
+----------------------------------------+
| 0 |
+----------------------------------------+
1 row in set (0.00 sec)
lc_rx:lc> select * from lc_1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
5 rows in set (0.00 sec)

At this point the slow log shows nothing.

* processlist_id = 120:
dba:(none)> set global log_queries_not_using_indexes=on;
Query OK, 0 rows affected (0.00 sec)

* processlist_id = 100:
lc_rx:lc> select @@global.log_queries_not_using_indexes;
+----------------------------------------+
| @@global.log_queries_not_using_indexes |
+----------------------------------------+
| 1 |
+----------------------------------------+
1 row in set (0.00 sec)
lc_rx:lc> select * from lc_1;
+------+
| id |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+------+
5 rows in set (0.00 sec)

Now the slow log has an entry:
# Time: 2017-08-04T16:05:04.303005+08:00
# User@Host: lc_rx[lc_rx] @ localhost [] Id: 296
# Query_time: 0.000149 Lock_time: 0.000081 Rows_sent: 5 Rows_examined: 5
SET timestamp=1501833904;
select * from lc_1;

* Conclusion
For a global-only variable, the change takes effect immediately for every session, whether that session connected before or after the SET.

Test two

dba:(none)> show processlist;
+-----+-------+----------------------+------+------------------+---------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+-------+----------------------+------+------------------+---------+---------------------------------------------------------------+------------------+
| 303 | lc_rx | localhost | lc | Sleep | 83 | | NULL |
| 304 | dba | localhost | NULL | Query | 0 | starting | show processlist |
+-----+-------+----------------------+------+------------------+---------+---------------------------------------------------------------+------------------+
3 rows in set
(0.00 sec)

* PROCESSLIST_ID=303
lc_rx:lc> select @@global.read_only;
+--------------------+
| @@global.read_only |
+--------------------+
| 0 |
+--------------------+
1 row in set (0.00 sec)
lc_rx:lc> insert into lc_1 select 2;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0

* PROCESSLIST_ID=304
dba:(none)> set global read_only=on;
Query OK, 0 rows affected (0.00 sec)

* PROCESSLIST_ID=303
lc_rx:lc> select @@global.read_only;
+--------------------+
| @@global.read_only |
+--------------------+
| 1 |
+--------------------+
1 row in set (0.00 sec)
lc_rx:lc> insert into lc_1 select 3;
ERROR 1290 (HY000): The MySQL server is running with the --read-only option so it cannot execute this statement

* Conclusion: the parameter set by PROCESSLIST_ID=304 took effect for PROCESSLIST_ID=303 as well.

2.5 How do we see the value of a system variable in every current session?

In 5.7 we can. The only pity: it shows only Both- and Session-scope variables; scope=Global variables cannot be inspected this way (nor do they need to be, since they take effect immediately).

dba:(none)> select * from performance_schema.variables_by_thread as a,\
-> (select THREAD_ID,PROCESSLIST_ID,PROCESSLIST_USER,PROCESSLIST_HOST,PROCESSLIST_COMMAND,PROCESSLIST_STATE from performance_schema.threads where PROCESSLIST_USER<>'NULL') as b\
-> where a.THREAD_ID = b.THREAD_ID and a.VARIABLE_NAME = 'sql_safe_updates';
+-----------+------------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
| THREAD_ID | VARIABLE_NAME | VARIABLE_VALUE | THREAD_ID | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_COMMAND | PROCESSLIST_STATE |
+-----------+------------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
| 313 | sql_safe_updates | OFF | 313 | 232 | repl | xx.xxx.xxx.xxx | Binlog Dump GTID | Master has sent all binlog to slave; waiting for more updates |
| 381 | sql_safe_updates | ON | 381 | 300 | dba | localhost | Query | Sending data |
+-----------+------------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
2 rows in set (0.00 sec)

2.6 Both-level test

With what we have just learned, verification is now faster and more reliable.

Representative variables:
1. Representative Both-level variables: sql_safe_updates, long_query_time

Test

* First check of long_query_time: PROCESSLIST_ID=307, 308 and 309 all show the same value, 300s.
dba:(none)> select * from performance_schema.variables_by_thread as a, (select THREAD_ID,PROCESSLIST_ID,PROCESSLIST_USER,PROCESSLIST_HOST,PROCESSLIST_COMMAND,PROCESSLIST_STATE from performance_schema.threads where PROCESSLIST_USER<>'NULL') as b where a.THREAD_ID = b.THREAD_ID and a.VARIABLE_NAME = 'long_query_time';
+-----------+-----------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
| THREAD_ID | VARIABLE_NAME | VARIABLE_VALUE | THREAD_ID | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_COMMAND | PROCESSLIST_STATE |
+-----------+-----------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
| 388 | long_query_time | 300.000000 | 388 | 307 | dba | localhost | Sleep | NULL |
| 389 | long_query_time | 300.000000 | 389 | 308 | dba | localhost | Query | Sending data |
| 390 | long_query_time | 300.000000 | 390 | 309 | dba | localhost | Sleep | NULL |
+-----------+-----------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
4 rows in set (0.00 sec)

* Next we run SET GLOBAL long_query_time=100 from the session with PROCESSLIST_ID=308. Afterwards, every existing session still shows 300: the change has not taken effect for any of them.
dba:(none)> set global long_query_time=100;
Query OK, 0 rows affected (0.00 sec)
dba:(none)> select * from performance_schema.variables_by_thread as a, (select
THREAD_ID,PROCESSLIST_ID,PROCESSLIST_USER,PROCESSLIST_HOST,PROCESSLIST_COMMAND,PROCESSLIST_STATE from performance_schema.threads where PROCESSLIST_USER<>'NULL') as b where a.THREAD_ID = b.THREAD_ID and a.VARIABLE_NAME = 'long_query_time';
+-----------+-----------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
| THREAD_ID | VARIABLE_NAME | VARIABLE_VALUE | THREAD_ID | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_COMMAND | PROCESSLIST_STATE |
+-----------+-----------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
| 388 | long_query_time | 300.000000 | 388 | 307 | dba | localhost | Sleep | NULL |
| 389 | long_query_time | 300.000000 | 389 | 308 | dba | localhost | Query | Sending data |
| 390 | long_query_time | 300.000000 | 390 | 309 | dba | localhost | Sleep | NULL |
+-----------+-----------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
4 rows in set (0.00 sec)

* Next, we disconnect session 309 and reconnect. The new processlist id should be 310, and for that new session the value is now 100s. This shows that after a SET GLOBAL, only sessions that connect afterwards pick up the new value; the current session and all pre-existing sessions keep the old one.
dba:(none)> select * from performance_schema.variables_by_thread as a, (select THREAD_ID,PROCESSLIST_ID,PROCESSLIST_USER,PROCESSLIST_HOST,PROCESSLIST_COMMAND,PROCESSLIST_STATE from performance_schema.threads where PROCESSLIST_USER<>'NULL') as b where a.THREAD_ID = b.THREAD_ID and a.VARIABLE_NAME = 'long_query_time';
+-----------+-----------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
| THREAD_ID | VARIABLE_NAME | VARIABLE_VALUE | THREAD_ID | PROCESSLIST_ID |
PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_COMMAND | PROCESSLIST_STATE |
+-----------+-----------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
| 388 | long_query_time | 300.000000 | 388 | 307 | dba | localhost | Sleep | NULL |
| 389 | long_query_time | 300.000000 | 389 | 308 | dba | localhost | Query | Sending data |
| 391 | long_query_time | 100.000000 | 391 | 310 | dba | localhost | Sleep | NULL |
+-----------+-----------------+----------------+-----------+----------------+------------------+------------------+---------------------+---------------------------------------------------------------+
4 rows in set (0.00 sec)

3. Conclusions

The official documentation is not entirely reliable either, and leaves something to be desired in places. Do the testing yourself, and prepare a proper test plan beforehand, so that an oversight does not wreck the test and lead you to a wrong conclusion.

4. Practical use

4.1 Project background

a. We wanted to set sql_safe_update=on in production. That has several hard parts, and one of them is how to make it take effect for every session.

4.2 Solutions

MySQL 5.7+
Combining today's knowledge with the performance_schema.variables_by_thread and performance_schema.threads tables, we can tell exactly which sessions have picked up the new value and which have not.

Before MySQL 5.7
1. If you understood the Both-scope discussion above, there is an obvious workaround.
2. Run:
2.1 set global $both_scope_variables = on|off
2.2 select max(ID) from information_schema.PROCESSLIST;
3. Kill every session whose processlist ID is smaller than that max(ID).
3.1 Of course, do not kill system threads, and sessions belonging to read-only accounts need not be killed.
3.2 The rest is left as an exercise for the reader.
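The pre-5.7 workaround above can be sketched as a small filter. This is an illustrative helper under stated assumptions (the names `sessions_to_kill` and `PROTECTED_USERS` are mine, not a MySQL API): we already ran SET GLOBAL, then read max(ID) from information_schema.PROCESSLIST; every session with a smaller ID predates the change and still holds the old Both-scope value, except the system/replication threads and read-only accounts we choose to leave alone.

```python
# Never kill replication or server-internal threads.
PROTECTED_USERS = {"repl", "system user", "event_scheduler"}

def sessions_to_kill(processlist, boundary_id, readonly_users=()):
    """processlist: list of (id, user) pairs from SHOW PROCESSLIST.
    boundary_id: max(ID) taken right after the SET GLOBAL.
    Returns the session IDs that still carry the old value and
    should be killed so the new setting covers everyone."""
    skip = PROTECTED_USERS | set(readonly_users)
    return sorted(sid for sid, user in processlist
                  if sid < boundary_id and user not in skip)

plist = [(231, "repl"), (232, "app_rw"), (240, "app_ro"),
         (250, "app_rw"), (260, "dba")]
print(sessions_to_kill(plist, boundary_id=260, readonly_users={"app_ro"}))
# [232, 250]
```

In practice each returned ID would be fed to a KILL <id> statement; the session running the SET GLOBAL itself (here the max ID) is excluded by the strict `<` comparison, matching step 3 above.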
1. Topics

- The test plan before going to production
- How to roll the change out smoothly in production
- The difficulties we hit in production
- Our solutions
- Value and meaning

2. Background

This project originated from N accidental data deletions in production -- hence this sequel to its companion article, "一个神奇的参数".

3. The test plan before going to production

3.1 Why a test plan?

1. As everyone knows, sql_safe_update=1 rejects plenty of SQL you would never expect it to. That can break the application and interrupt service, with very serious impact.
2. We needed to find out which SQL statements get rejected.
3. We needed to know which of the statements already running in production would get rejected.

In short: we needed a seamless upgrade. How do we tighten safety without hurting the business? That is the story of upgrading our SQL firewall.

3.2 How we tested

Many thanks to 袁俊敏 of the DBA team for the meticulous testing.

1. Based on the official documentation and the lessons from earlier stumbles, we designed a detailed matrix of SQL cases:
a. Single-column index
a.1 update statements
a.2 delete statements
a.3 the replace into family
a.4 with limit
a.5 without limit
a.6 with a where clause
a.7 without a where clause
a.8 implicit type conversion
a.9 SQL containing functions
b. Composite index
b.1 update statements
b.2 delete statements
b.3 the replace into family
b.4 with limit
b.5 without limit
b.6 with a where clause
b.7 without a where clause
b.8 implicit type conversion
b.9 SQL containing functions
and so on.

3.3 Which statements trigger an sql_safe_update error?

1. A where clause that uses no index, and no limit --triggers
2. No where clause, with limit, delete --triggers
3. No where clause, no limit, delete or update --triggers

Summary: any DML statement that does not use an index triggers it.

4. How to roll it out smoothly in production

log_queries_not_using_indexes has no long-connection problem: it takes effect for all connections immediately. So with log_queries_not_using_indexes=on plus long_query_time=10000 we can capture exactly the DML we care about; once those statements are fixed, our goal is reached.

5. The difficulties we hit in production

One typical pit: do you really believe that setting sql_safe_updates guarantees rejecting every SQL statement that uses no index?

1. First: log_queries_not_using_indexes=on does indeed capture all DML that uses no index.
2. But: if a connection already existed before sql_safe_updates=1 was set, the setting has no effect on that pre-existing connection.

An unforeseen problem:
1. Online, long_query_time was 0.1. The plan was to raise it on the spot to long_query_time=10000 to filter out the slow-query noise, then set log_queries_not_using_indexes=on to capture all SQL using no index.
2. To our surprise, DML that did use indexes was captured as well.
3. The slow log showed those DML all took more than 100ms, which led to the conclusion: the long-lived connections were still running with long_query_time=0.1 and ignored the freshly set value.
4. So long_query_time does not take effect for long-lived connections.

Our summary so far -- two key parameters do not take effect for long-lived connections:
1. long_query_time
2. sql_safe_updates

Incident one
1. Because the setting did not apply to long-lived connections on the master, a full-table DML ran on the master but could not run on the slave, breaking replication (MIXED, STATEMENT).
2. ROW format is unaffected, because row-based events already describe each changed row and cannot be unsafe in this way.

Incident two
1. Because the setting did not apply to long-lived connections, the full-table DML ran on the master anyway. You thought MySQL was protected; you were only fooling yourself.

6. Our solutions

Solving the long-connection problem.

Kill all connections?
1. Some say: easy, just kill every connection. It would work, but isn't that a bit brutal?
2. Then have the application teams restart every long-connection service? They would be devastated, and stopping all long-connection services is simply not possible.

Kill only the long connections that hold DML privileges.

* How do we find the long connections?
1. The defining trait of a long connection: it is long-lived.
2. MySQL's show processlist exposes two very important attributes:
Id: the session id
Time: the time spent in the current command state
3. A common misconception: many people identify long connections by Time, which means they do not understand what Time means. It is not the age of the connection; it is only the duration of the current command state.
4. As you will have guessed, the final approach identifies long connections by session id:
4.0 First set sql_safe_update=on on the master.
4.1 Then, at say 10:00, run show processlist and record all the ids.
4.2 At 10:00 the next day, run show processlist again and match against the recorded ids. Any id present in both snapshots has been connected for a day, so we can treat it as a long connection.
4.3 Check the privileges of the users behind those ids; read-only accounts can be ignored.
4.4 Kill those long connections (careful: do not kill repl or "system user" threads by mistake, or it will be too late for tears).
4.5 Use host:port to tell the application owners which connections are affected, and coordinate the restarts and post-kill observation together.

Value and meaning

We have now completed the rollout on N DB clusters. Many people ask:
- Is it worth paying such a price just to set one parameter?
- If it goes wrong, aren't you just asking for trouble?
- Will the developers support this? Will your boss?

My view: it was extremely hard at the start, but after countless tests and rehearsals of the technical plan we decided to brave the gunfire, all for the sake of future data safety. Everything centers on the user, and our professional judgment obliges us to be responsible to the user.

final: I regard this as a deed "done in our generation, benefiting a thousand to come" ('功在当代,利在千秋').
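Steps 4.1-4.2 above can be sketched as a simple snapshot intersection. This is an illustrative helper (the name `long_lived_sessions` and the sample data are mine): connection IDs are handed out monotonically and are not reused while a session stays alive, so an Id present in two snapshots taken a day apart has been connected for at least that day. (Strictly, the counter can wrap on a very long-running server, so a production script might also cross-check user/host.)

```python
def long_lived_sessions(snapshot_day1, snapshot_day2):
    """Each snapshot maps session Id -> (user, host) as captured from
    SHOW PROCESSLIST at the same wall-clock time on consecutive days.
    Returns the Ids alive in both captures: the long connections."""
    return sorted(set(snapshot_day1) & set(snapshot_day2))

day1 = {101: ("app_rw", "10.0.0.5:4831"),
        102: ("app_ro", "10.0.0.6:4913"),
        103: ("repl", "10.0.0.9:3306")}
day2 = {101: ("app_rw", "10.0.0.5:4831"),
        103: ("repl", "10.0.0.9:3306"),
        211: ("app_rw", "10.0.0.5:5200")}

print(long_lived_sessions(day1, day2))  # [101, 103]
```

Note that 103 is the replication thread: exactly the kind of long-lived session that step 4.4 says must be filtered out by user before anything is killed.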
What you will learn

- The relationship between isolation levels and locks
- The locking algorithm under the RR isolation level, in depth
- A list of the most typical locking scenarios
- A deep dive into the locking logic of insert
- Walking through the full locking process on real examples
- Why InnoDB locks the way it does

Isolation levels and algorithms

repeatable-read
1. Uses next-key locking
2. next-key lock = record lock + gap lock

read-committed
1. Uses record locks
2. In special cases (purge + unique key) it takes gap locks too

Everything below assumes the RR isolation level, since RC is simpler.

The general locking algorithm under RR
1. Locks are implemented on indexes.
2. Suppose a key has 5 records: 1,3,5,7,9. For where id<5, what gets locked is not the single interval (-∞,5) but the combination of intervals (-∞,1],(1,3],(3,5].
3. RR uses the next-key lock algorithm: it locks the record itself plus the gap before it.
4. A next-key lock degrades to a record lock when the index is unique and the query matches exactly one record by equality (not by range). Typical case: where primary_key = 1 degrades; where primary_key < 10 does not, because the result set is more than one record.
5. Locks are taken not only on the primary index but also on the secondary indexes involved -- this point matters a great deal.

Case-by-case analysis of the locking algorithm

RR isolation level. Table structure:

dba:lc_3> show create table a;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| a | CREATE TABLE `a` ( `a` int(11) NOT NULL, `b` int(11) DEFAULT NULL, `c` int(11) DEFAULT NULL, `d` int(11) DEFAULT NULL, PRIMARY KEY (`a`), UNIQUE KEY `idx_b` (`b`), KEY `idx_c` (`c`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:lc_3> select * from a;
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 1 | 3 | 5 | 7 |
| 3 | 5 | 7 | 9 |
| 5 | 7 | 9 | 11 |
| 7 | 9 | 11 | 13 |
+---+------+------+------+
4 rows in set (0.00 sec)

* Set the RR isolation level:
set tx_isolation =
'repeatable-read';

Equality query, non-unique index: locking logic

dba:lc_3> begin;
Query OK, 0 rows affected (0.00 sec)
dba:lc_3> select * from a where c=9 for update;
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 5 | 7 | 9 | 11 |
+---+------+------+------+
1 row in set (0.00 sec)

TABLE LOCK table `lc_3`.`a` trx id 133601815 lock mode IX
RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133601815 lock_mode X
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000009; asc ;; 1: len 4; hex 80000005; asc ;;
RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133601815 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000005; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d012a; asc ' *;; 3: len 4; hex 80000007; asc ;; 4: len 4; hex 80000009; asc ;; 5: len 4; hex 8000000b; asc ;;
RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133601815 lock_mode X locks gap before rec
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 8000000b; asc ;; 1: len 4; hex 80000007; asc ;;

The lock structure:
On the secondary index idx_c:
1. Next-key locks ((7,3),(9,5)] and ((9,5),(11,7)]. Reading ((7,3),(9,5)]: 7 is the secondary-index key, 3 is its corresponding primary key.
2. That notation is hard to read, so from here on the primary-key part is omitted: next-key lock = (7,9],(9,11]
On the primary index:
Record lock on [5]

Equality query, unique key: locking logic

dba:lc_3> select * from a where b=9 for update;
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 7 | 9 | 11 | 13 |
+---+------+------+------+
1 row in set (0.00 sec)

TABLE LOCK table `lc_3`.`a` trx id 133601816 lock mode IX
RECORD LOCKS space id 281 page no 4 n bits 72 index idx_b of table `lc_3`.`a` trx id 133601816 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000009; asc ;; 1: len 4; hex 80000007; asc ;;
RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133601816 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000007; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d0137; asc ' 7;; 3: len 4; hex 80000009; asc ;; 4: len 4; hex 8000000b; asc ;; 5: len 4; hex 8000000d; asc ;;

The lock structure:
On the secondary index idx_b:
1. Record lock on [9]
On the primary index:
Record lock on [7]

>=, non-unique index: locking logic

dba:lc_3> select * from a where c>=9 for update;
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 5 | 7 | 9 | 11 |
| 7 | 9 | 11 | 13 |
+---+------+------+------+
2 rows in set (0.00 sec)

TABLE LOCK table `lc_3`.`a` trx id 133601817 lock mode IX
RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133601817 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000009; asc ;; 1: len 4; hex 80000005; asc ;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 8000000b; asc ;; 1: len 4; hex 80000007; asc ;;
RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133601817 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000005; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d012a; asc ' *;; 3: len 4; hex 80000007; asc ;; 4: len 4; hex 80000009; asc ;; 5: len 4; hex 8000000b; asc ;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000007; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d0137; asc ' 7;; 3: len 4; hex 80000009; asc ;; 4: len 4; hex 8000000b; asc ;; 5: len 4; hex 8000000d; asc ;;

The lock structure:
On the secondary index idx_c:
1. Next-key locks (7,9],(9,11],(11,∞]
On the primary index:
Record locks on [5],[7]

>=, unique index: locking logic

dba:lc_3> select * from a where b>=7 for update;
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 5 | 7 | 9 | 11 |
| 7 | 9 | 11 | 13 |
+---+------+------+------+
2 rows in set (0.00 sec)

TABLE LOCK table `lc_3`.`a` trx id 133601820 lock mode IX
RECORD LOCKS space id 281 page no 4 n bits 72 index idx_b of table `lc_3`.`a` trx id 133601820 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000007; asc ;; 1: len 4; hex 80000005; asc ;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000009; asc ;; 1: len 4; hex 80000007; asc ;;
RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133601820 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000005; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d012a; asc ' *;; 3: len 4; hex 80000007; asc ;; 4: len 4; hex 80000009; asc ;; 5: len 4; hex 8000000b; asc ;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000007; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d0137; asc ' 7;; 3: len 4; hex 80000009; asc ;; 4: len 4; hex 8000000b; asc ;; 5: len 4; hex 8000000d; asc ;;

The lock structure:
On the secondary index idx_b:
1. Next-key locks (5,7],(7,9],(9,∞]
On the primary index:
Record locks on [5],[7]

<=, non-unique index: locking logic

dba:lc_3> select * from a where c<=7 for update;
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 1 | 3 | 5 | 7 |
| 3 | 5 | 7 | 9 |
+---+------+------+------+
2 rows in set (0.00 sec)

TABLE LOCK table `lc_3`.`a` trx id 133601822 lock mode IX
RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133601822 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000005; asc ;; 1: len 4; hex 80000001; asc ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000007; asc ;; 1: len 4; hex 80000003; asc ;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000009; asc ;; 1: len 4; hex 80000005; asc ;;
RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133601822 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000001; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d0110; asc ' ;; 3: len 4; hex 80000003; asc ;; 4: len 4; hex 80000005; asc ;; 5: len 4; hex 80000007; asc ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000003; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d011d; asc ' ;; 3: len 4; hex 80000005; asc ;; 4: len 4; hex 80000007; asc ;; 5: len 4; hex 80000009; asc ;;

The lock structure:
On the secondary index idx_c:
1. Next-key locks (-∞,5],(5,7],(7,9]
On the primary index:
Record locks on [1],[3]

<=, unique index: locking logic

dba:lc_3> select * from a where b<=5 for update;
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 1 | 3 | 5 | 7 |
| 3 | 5 | 7 | 9 |
+---+------+------+------+
2 rows in set (0.00 sec)

TABLE LOCK table `lc_3`.`a` trx id 133601823 lock mode IX
RECORD LOCKS space id 281 page no 4 n bits 72 index idx_b of table `lc_3`.`a` trx id 133601823 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000003; asc ;; 1: len 4; hex 80000001; asc ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000005; asc ;; 1: len 4; hex 80000003; asc ;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000007; asc ;; 1: len 4; hex 80000005; asc ;;
RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133601823 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000001; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d0110; asc ' ;; 3: len 4; hex 80000003; asc ;; 4: len 4; hex 80000005; asc ;; 5: len 4; hex 80000007; asc ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000003; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d011d; asc ' ;; 3: len 4; hex 80000005; asc ;; 4: len 4; hex 80000007; asc ;; 5: len 4; hex 80000009; asc ;;

The lock structure:
On the secondary index idx_b:
1. Next-key locks (-∞,3],(3,5],(5,7]
On the primary index:
Record locks on [1],[3]

>, non-unique index: locking logic

dba:lc_3> select * from a where c>9 for update;
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 7 | 9 | 11 | 13 |
+---+------+------+------+
1 row in set (0.00 sec)

RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133601825 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 8000000b; asc ;; 1: len 4; hex 80000007; asc ;;
RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133601825 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000007; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d0137; asc ' 7;; 3: len 4; hex 80000009; asc ;; 4: len 4; hex 8000000b; asc ;; 5: len 4; hex 8000000d; asc ;;

The lock structure:
On the secondary index idx_c:
1. Next-key locks (9,11],(11,∞]
On the primary index:
Record lock on [7]

>, unique index: locking logic

dba:lc_3> select * from a where b>7 for update;
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 7 | 9 | 11 | 13 |
+---+------+------+------+
1 row in set (0.00 sec)

TABLE LOCK table `lc_3`.`a` trx id 133601826 lock mode IX
RECORD LOCKS space id 281 page no 4 n bits 72 index idx_b of table `lc_3`.`a` trx id 133601826 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 5 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000009; asc ;; 1: len 4; hex 80000007; asc ;;
RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133601826 lock_mode X locks rec but not gap
Record lock, heap no 5 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000007; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d0137; asc ' 7;; 3: len 4; hex 80000009; asc ;; 4: len 4; hex 8000000b; asc ;; 5: len 4; hex 8000000d; asc ;;

The lock structure:
On the secondary index idx_b:
1. Next-key locks (7,9],(9,∞]
On the primary index:
Record lock on [7]

<, non-unique index: locking logic

dba:lc_3> select * from a where c<7 for update;
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 1 | 3 | 5 | 7 |
+---+------+------+------+
1 row in set (0.00 sec)

TABLE LOCK table `lc_3`.`a` trx id 133601827 lock mode IX
RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133601827 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000005; asc ;; 1: len 4; hex 80000001; asc ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000007; asc ;; 1: len 4; hex 80000003; asc ;;
RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133601827 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000001; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d0110; asc ' ;; 3: len 4; hex 80000003; asc ;; 4: len 4; hex 80000005; asc ;; 5: len 4; hex 80000007; asc ;;

The lock structure:
On the secondary index idx_c:
1. Next-key locks (-∞,5],(5,7]
On the primary index:
Record lock on [1]

<, unique index: locking logic

dba:lc_3> select * from a where b<5 for update;
+---+------+------+------+
| a | b | c | d |
+---+------+------+------+
| 1 | 3 | 5 | 7 |
+---+------+------+------+
1 row in set (0.00 sec)

TABLE LOCK table `lc_3`.`a` trx id 133601828 lock mode IX
RECORD LOCKS space id 281 page no 4 n bits 72 index idx_b of table `lc_3`.`a` trx id 133601828 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000003; asc ;; 1: len 4; hex 80000001; asc ;;
Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 80000005; asc ;; 1: len 4; hex 80000003; asc ;;
RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133601828 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 80000001; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d0110; asc ' ;; 3: len 4; hex 80000003; asc ;; 4: len 4; hex 80000005; asc ;; 5: len 4; hex 80000007; asc ;;

The lock structure:
On the secondary index idx_b:
1. Next-key locks (-∞,3],(3,5]
On the primary index:
1. Record lock on [1]

Summary of the locking logic so far

* Notation:
1. select * from xx where col <comparison operator> M for update
2. M->next-rec: the record after M
3. M->pre-rec: the record before M

######## First-round summary ########
* Equality on M, non-unique index: (M->pre-rec,M],(M,M->next-rec]
* Equality on M, unique key: [M] -- the next-key lock degrades to a record lock
* >=, non-unique index: (M->pre-rec,M],(M,M->next-rec] ... (∞]
* >=, unique index: (M->pre-rec,M],(M,M->next-rec] ... (∞]
* <=, non-unique index: (-∞] ... (M,M->next-rec]
* <=, unique index: (-∞] ... (M,M->next-rec]
* >, non-unique index: (M,M->next-rec] ... (∞]
* >, unique index: (M,M->next-rec] ... (∞]
* <, non-unique index: (-∞] ... (M->pre-rec,M]
* <, unique index: (-∞] ...
(M->pre-rec,M]

######## Second-round summary, merged ########
* Equality on M, non-unique index: (M->pre-rec,M],(M,M->next-rec]
* Equality on M, unique key: [M] -- the next-key lock degrades to a record lock
(Recall the general rule stated earlier: a next-key lock degrades to a record lock when the index is unique and the query matches exactly one record by equality, not by range.)
* >=: (M->pre-rec,M],(M,M->next-rec] ... (∞]
* >: (M,M->next-rec] ... (∞]
* <=: (-∞] ... (M,M->next-rec]
* <: (-∞] ... (M->pre-rec,M]

######## A final question ########
1. Question: why lock M->next-rec or M->pre-rec at all?
1. Answer: to prevent phantom reads.

The locking logic of insert

RR isolation level. Table structures:

dba:lc_3> show create table tb_non_uk;
+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tb_non_uk | CREATE TABLE `tb_non_uk` ( `id` int(11) NOT NULL AUTO_INCREMENT, `id_2` int(11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `idx_id2` (`id_2`) ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 |
+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:lc_3> show create table tb_uk;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tb_uk | CREATE TABLE `tb_uk` ( `id` int(11) NOT NULL AUTO_INCREMENT,
`id_2` int(11) DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE KEY `uniq_idx` (`id_2`) ) ENGINE=InnoDB AUTO_INCREMENT=36 DEFAULT CHARSET=utf8 |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:lc_3> select * from tb_non_uk;
+----+------+
| id | id_2 |
+----+------+
| 1 | 100 |
| 2 | 200 |
+----+------+
2 rows in set (0.00 sec)

dba:lc_3> select * from tb_uk;
+----+------+
| id | id_2 |
+----+------+
| 1 | 10 |
| 2 | 20 |
| 33 | 30 |
+----+------+
3 rows in set (0.00 sec)

A plain insert: before the insert, no other transaction holds any lock on the next record

dba:lc_3> insert into tb_uk select 100,200;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0

The lock structure:
MySQL thread id 11888, OS thread handle 140000862643968, query id 24975 localhost dba cleaning up
TABLE LOCK table `lc_3`.`tb_uk` trx id 133601936 lock mode IX

Apart from the table-level intention lock, which is taken by almost any access to the table, no lock shows up at all. Does insert really take no locks? Of course not -- it does lock, but the lock it takes is implicit.

Unique constraint; before the insert, another transaction already holds a gap lock on the next record

* session 1: select * from tb_uk where id_2 >= 30 for update;
TABLE LOCK table `lc_3`.`tb_uk` trx id 133601951 lock mode IX
RECORD LOCKS space id 301 page no 4 n bits 72 index uniq_idx of table `lc_3`.`tb_uk` trx id 133601951 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
0: len 8; hex 73757072656d756d; asc supremum;;
Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 4; hex 8000001e; asc ;; 1: len 4; hex 80000021; asc !;;
RECORD LOCKS space id 301 page no 3 n bits 72 index PRIMARY of table `lc_3`.`tb_uk` trx id 133601951 lock_mode X locks rec but not gap
Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
0: len 4; hex 80000021; asc !;; 1: len 6; hex 000007f69a77; asc w;; 2: len 7; hex ad00000d010110; asc ;; 3: len 4; hex 8000001e; asc ;;

Locked: (20,30],(30,∞) -- record 30 is covered by a gap lock.
session 2: dba:lc_3> insert into tb_uk select 3,25; Query OK, 1 row affected (6.30 sec) Records: 1 Duplicates: 0 Warnings: 0
* session 1: rollback;
TABLE LOCK table `lc_3`.`tb_uk` trx id 133601952 lock mode IX RECORD LOCKS space id 301 page no 4 n bits 72 index uniq_idx of table `lc_3`.`tb_uk` trx id 133601952 lock_mode X locks gap before rec insert intention Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 8000001e; asc ;; 1: len 4; hex 80000021; asc !;;
当session 2 插入25的时候,会被session 1 的Gap锁卡住。session 1 rollback释放gap lock后,session 2 就持有插入意向锁并完成插入: lock_mode X locks gap before rec insert intention
有唯一键约束,insert之前,其他事务对其next-record加了record lock
* session 1: dba:lc_3> select * from tb_uk where id_2 = 30 for update; +----+------+ | id | id_2 | +----+------+ | 33 | 30 | +----+------+ 1 row in set (0.00 sec)
TABLE LOCK table `lc_3`.`tb_uk` trx id 133601943 lock mode IX RECORD LOCKS space id 301 page no 4 n bits 72 index uniq_idx of table `lc_3`.`tb_uk` trx id 133601943 lock_mode X locks rec but not gap Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 8000001e; asc ;; 1: len 4; hex 80000021; asc !;; RECORD LOCKS space id 301 page no 3 n bits 72 index PRIMARY of table `lc_3`.`tb_uk` trx id 133601943 lock_mode X locks rec but not gap Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 80000021; asc !;; 1: len 6; hex 000007f69a77; asc w;; 2: len 7; hex ad00000d010110; asc ;; 3: len 4; hex 8000001e; asc ;;
* session 2: dba:lc_3> insert into tb_uk select 3,25; Query OK, 1 row affected (0.00 sec) Records: 1 Duplicates: 0 Warnings: 0
锁结构: 说明有唯一键约束时,insert之前,其他事务对其next-record加的record lock不会阻塞insert。此时的insert,也不会产生insert intention lock
有唯一键约束,insert 记录之后,发现原来的表有重复值的情况
* session 1: dba:lc_3> select * from tb_uk where id_2 = 30 for update; +----+------+ | id | id_2 | +----+------+ | 33 | 30 | +----+------+ 1 row in set (0.00 sec) dba:lc_3> delete from tb_uk where
id_2 = 20; Query OK, 1 row affected (0.00 sec) 这时候的锁结构如下: TABLE LOCK table `lc_3`.`tb_uk` trx id 133601943 lock mode IX RECORD LOCKS space id 301 page no 4 n bits 72 index uniq_idx of table `lc_3`.`tb_uk` trx id 133601943 lock_mode X locks rec but not gap Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32 0: len 4; hex 80000014; asc ;; 1: len 4; hex 80000002; asc ;; Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 8000001e; asc ;; 1: len 4; hex 80000021; asc !;; RECORD LOCKS space id 301 page no 3 n bits 72 index PRIMARY of table `lc_3`.`tb_uk` trx id 133601943 lock_mode X locks rec but not gap Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 32 0: len 4; hex 80000002; asc ;; 1: len 6; hex 000007f69a97; asc ;; 2: len 7; hex 460000403f090b; asc F @? ;; 3: len 4; hex 80000014; asc ;; Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 80000021; asc !;; 1: len 6; hex 000007f69a77; asc w;; 2: len 7; hex ad00000d010110; asc ;; 3: len 4; hex 8000001e; asc ;; 对二级索引uniq_idx : 1. 加record lock , [20],[30] 对主键索引: 1. 加record lock,[2],[33] * session 2: dba:lc_3> insert into tb_uk select 3,20; ...............waiting................. 
这时候,我们再来看看锁结构: TABLE LOCK table `lc_3`.`tb_uk` trx id 133601949 lock mode IX RECORD LOCKS space id 301 page no 4 n bits 72 index uniq_idx of table `lc_3`.`tb_uk` trx id 133601949 lock mode S waiting Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32 0: len 4; hex 80000014; asc ;; 1: len 4; hex 80000002; asc ;; ---TRANSACTION 133601943, ACTIVE 490 sec 3 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 1 MySQL thread id 11889, OS thread handle 140000878618368, query id 25018 localhost dba cleaning up TABLE LOCK table `lc_3`.`tb_uk` trx id 133601943 lock mode IX RECORD LOCKS space id 301 page no 4 n bits 72 index uniq_idx of table `lc_3`.`tb_uk` trx id 133601943 lock_mode X locks rec but not gap Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 32 0: len 4; hex 80000014; asc ;; 1: len 4; hex 80000002; asc ;; Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 8000001e; asc ;; 1: len 4; hex 80000021; asc !;; RECORD LOCKS space id 301 page no 3 n bits 72 index PRIMARY of table `lc_3`.`tb_uk` trx id 133601943 lock_mode X locks rec but not gap Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 32 0: len 4; hex 80000002; asc ;; 1: len 6; hex 000007f69a97; asc ;; 2: len 7; hex 460000403f090b; asc F @? 
;; 3: len 4; hex 80000014; asc ;; Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 80000021; asc !;; 1: len 6; hex 000007f69a77; asc w;; 2: len 7; hex ad00000d010110; asc ;; 3: len 4; hex 8000001e; asc ;; info bits 32 表示这条记录已经标记为删除状态 这里面的session 2 : insert into tb_uk select 3,20; 被阻塞了 因为,这条insert 语句需要对 uniq_idx中的20加lock mode S , 但是发现session 1 已经对其加了lock_mode X locks rec but not gap,而这条记录被标记为删除状态 所以发生锁等待,因为S lock 和 X lock 冲突 没有唯一键约束,insert之前,其他事务对其next-record加了Gap-lock * session 1: dba:lc_3> select * from tb_non_uk where id_2>=100 for update; +----+------+ | id | id_2 | +----+------+ | 1 | 100 | | 2 | 200 | +----+------+ 2 rows in set (0.00 sec) 锁结构: TABLE LOCK table `lc_3`.`tb_non_uk` trx id 133601939 lock mode IX RECORD LOCKS space id 302 page no 4 n bits 72 index idx_id2 of table `lc_3`.`tb_non_uk` trx id 133601939 lock_mode X Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;; Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 800000c8; asc ;; 1: len 4; hex 80000002; asc ;; RECORD LOCKS space id 302 page no 3 n bits 72 index PRIMARY of table `lc_3`.`tb_non_uk` trx id 133601939 lock_mode X locks rec but not gap Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 80000002; asc ;; 1: len 6; hex 000007f69a6b; asc k;; 2: len 7; hex a500000d360110; asc 6 ;; 3: len 4; hex 800000c8; asc ;; 对idx_id2二级索引: (100,200],(200,∞] 对主键索引: [2] * session 2: dba:lc_3> insert into tb_non_uk select 3,150; ......waiting..... 
---TRANSACTION 133601940, ACTIVE 3 sec inserting mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1 MySQL thread id 11888, OS thread handle 140000862643968, query id 24996 localhost dba executing insert into tb_non_uk select 3,150 ------- TRX HAS BEEN WAITING 3 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 302 page no 4 n bits 72 index idx_id2 of table `lc_3`.`tb_non_uk` trx id 133601940 lock_mode X locks gap before rec insert intention waiting Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 800000c8; asc ;; 1: len 4; hex 80000002; asc ;; ------------------ TABLE LOCK table `lc_3`.`tb_non_uk` trx id 133601940 lock mode IX RECORD LOCKS space id 302 page no 4 n bits 72 index idx_id2 of table `lc_3`.`tb_non_uk` trx id 133601940 lock_mode X locks gap before rec insert intention waiting Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 800000c8; asc ;; 1: len 4; hex 80000002; asc ;; ---TRANSACTION 133601939, ACTIVE 311 sec 3 lock struct(s), heap size 1136, 3 row lock(s) MySQL thread id 11889, OS thread handle 140000878618368, query id 24994 localhost dba cleaning up TABLE LOCK table `lc_3`.`tb_non_uk` trx id 133601939 lock mode IX RECORD LOCKS space id 302 page no 4 n bits 72 index idx_id2 of table `lc_3`.`tb_non_uk` trx id 133601939 lock_mode X Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;; Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 800000c8; asc ;; 1: len 4; hex 80000002; asc ;; RECORD LOCKS space id 302 page no 3 n bits 72 index PRIMARY of table `lc_3`.`tb_non_uk` trx id 133601939 lock_mode X locks rec but not gap Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 4; hex 80000002; asc ;; 1: len 6; hex 000007f69a6b; asc k;; 2: len 7; hex 
a500000d360110; asc 6 ;; 3: len 4; hex 800000c8; asc ;; 锁结构: 多了一个插入意向锁 lock_mode X locks gap before rec insert intention 总结Insert 操作的加锁流程 * insert 的流程(没有唯一索引的情况): insert N 1. 找到大于N的第一条记录M 2. 如果M上面没有gap , next-key locking的话,可以插入 , 否则等待 (对其next-rec加insert intension lock,由于有gap锁,所以等待) * insert 的流程(有唯一索引的情况): insert N 1. 找到大于N的第一条记录M,以及前一条记录P 2. 如果M上面没有gap , next-key locking的话,进入第三步骤 , 否则等待(对其next-rec加insert intension lock,由于有gap锁,所以等待) 3. 检查p: 判断p是否等于n: 如果不等: 则完成插入(结束) 如果相等: 再判断P 是否有锁, 如果没有锁: 报1062错误(duplicate key) --说明该记录已经存在,报重复值错误 加S-lock --说明该记录被标记为删除, 事务已经提交,还没来得及purge 如果有锁: 则加S-lock --说明该记录被标记为删除,事务还未提交. * insert intension lock 有什么用呢?锁的兼容矩阵是啥? 1. insert intension lock 是一种特殊的Gap lock,记住非常特殊哦 2. insert intension lock 和 insert intension lock 是兼容的,其次都是不兼容的 3. Gap lock 是为了防止insert, insert intension lock 是为了insert并发更快,两者是有区别的 4. 什么情况下会出发insert intension lock ? 当insert的记录M的 next-record 加了Gap lock才会发生,record lock并不会触发 实战案例 RR 隔离级别最后来一个比较复杂的案例作为结束通过这几个案例,可以复习下之前讲过的理论,锁不仅对主键加,还要考虑二级索引哦 环境 set tx_isolation = 'repeatable-read'; CREATE TABLE `a` ( `a` int(11) NOT NULL, `b` int(11) DEFAULT NULL, `c` int(11) DEFAULT NULL, `d` int(11) DEFAULT NULL, PRIMARY KEY (`a`), UNIQUE KEY `idx_b` (`b`), KEY `idx_c` (`c`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 dba:lc_3> select * from a; +---+------+------+------+ | a | b | c | d | +---+------+------+------+ | 1 | 3 | 5 | 7 | | 3 | 5 | 7 | 9 | | 5 | 7 | 9 | 11 | | 7 | 9 | 11 | 13 | +---+------+------+------+ 4 rows in set (0.00 sec) 加锁语句 select * from a where c<9 for update; 锁结构: TABLE LOCK table `lc_3`.`a` trx id 133601957 lock mode IX RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133601957 lock_mode X Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 80000005; asc ;; 1: len 4; hex 80000001; asc ;; Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 80000007; asc ;; 1: len 4; hex 80000003; asc ;; Record lock, heap no 4 
PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 80000009; asc ;; 1: len 4; hex 80000005; asc ;; RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133601957 lock_mode X locks rec but not gap Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0 0: len 4; hex 80000001; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d0110; asc ' ;; 3: len 4; hex 80000003; asc ;; 4: len 4; hex 80000005; asc ;; 5: len 4; hex 80000007; asc ;; Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0 0: len 4; hex 80000003; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d011d; asc ' ;; 3: len 4; hex 80000005; asc ;; 4: len 4; hex 80000007; asc ;; 5: len 4; hex 80000009; asc ;; 二级索引idx_c 加锁 next-key lock: (-∞,5],(5,7],(7,9] primary key 加锁 record lock: [1]和[3] 案例一 insert into a select 4,40,9,90 大家觉得能够插入成功吗? dba:lc_3> insert into a select 4,40,9,90; ^C^C -- query aborted ERROR 1317 (70100): Query execution was interrupted ...................waiting................. 显然是被锁住了 TABLE LOCK table `lc_3`.`a` trx id 133601961 lock mode IX RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133601961 lock_mode X locks gap before rec insert intention waiting Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 80000009; asc ;; 1: len 4; hex 80000005; asc ;; 案例二 insert into a select 6,40,9,90; 大家觉得能够插入成功吗? dba:lc_3> insert into a select 6,40,9,90; Query OK, 1 row affected (0.00 sec) Records: 1 Duplicates: 0 Warnings: 0 显然是插入成功了
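前面『总结Insert 操作的加锁流程』一节的判重逻辑,可以用一段 Python 草图来梳理(仅为示意模型,记录结构、字段名和返回的动作描述均为本文假设,并非 InnoDB 的真实实现):

```python
# 示意:有唯一索引时 insert N 的加锁/判重决策
# records 为按 key 升序排列的记录列表,每条记录是一个 dict:
#   key           索引键值
#   gap_locked    该记录前方的间隙是否被其他事务加了 Gap lock
#   locked        该记录是否被其他事务持有记录锁
#   delete_marked 该记录是否处于 delete mark 状态(尚未 purge)

def unique_insert_decision(records, n):
    # 步骤1/2: 找到第一条 key > n 的记录 M,若其间隙有 Gap lock 则等待
    nxt = next((r for r in records if r["key"] > n), None)
    if nxt is not None and nxt.get("gap_locked"):
        return "wait: gap lock blocks insert intention lock"
    # 步骤3: 检查是否存在 key == n 的记录 P(重复值判断)
    dup = next((r for r in records if r["key"] == n), None)
    if dup is None:
        return "insert ok"
    if not dup.get("locked") and not dup.get("delete_marked"):
        return "error 1062 duplicate key"
    # 记录被标记删除或仍被其他事务持锁: 加 S-lock 等待
    return "acquire S-lock"
```

例如对上文的 tb_uk(已有键值 20、30,且 30 前的间隙被 Gap 锁住)插入 25,会得到等待插入意向锁的结果。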
agenda 我们能学到什么 什么是MVCC MVCC能解决什么问题 MVCC的实现原理
一、什么是MVCC
名词解释
英文名:Multi Version Concurrency Control 中文名:多版本并发控制
应用场景
大家有没有这样的疑问,线上的表一直被更新,可是为什么还可以去select呢? 我的更新事务还没有提交,为什么另外一个事务可以读到数据呢? 我的更新事务已经提交,另一个事务又是怎么选择数据返回给用户呢?
二、MVCC能解决什么问题
解决的问题
1. snapshot查询不会加锁,读和读,读和写之间互不影响,提高数据库的并发能力
2. 隔离级别的实现
三、MVCC的实现原理
3.1 row的记录格式 记住这个格式,很重要
3.2 row 和 undo
3.3 readview
3.4 可见性判断
实现原理
readview = 活跃事务列表
readview(RR): 事务开始时产生readview
readview(RC): 每条语句都会产生readview
如何判断可见性: 假设:活跃事务列表为(3,4,5,6)=readview,当前事务id为10,且修改了这条记录,那么这条记录上的db_trx_id=10
流程如下: 当前事务(trx_id=10)拿着刚刚产生的readview=(3[active_trx_min],4,5,6[active_trx_max])去查看记录:
1. 如果row上的db_trx_id in (活跃事务列表),那么说明此记录还未提交,这条记录对于此事务不可见,需要调用上一个undo,用同样的判断标准过滤,循环
2. 如果row上的db_trx_id < 活跃事务列表最小值,那么说明已经提交,这条记录对于此事务可见
3. 如果row上的db_trx_id > 活跃事务列表最大值,那么说明该记录在当前事务之后提交,这条记录对于此事务不可见,需要调用上一个undo,用同样的判断标准过滤,循环
这里有个问题: 当前事务(id=10)更新后,会锁住该记录并更新db_trx_id=10,那么该记录上的trx_id肯定是<=当前事务id(10)的,那既然这样,怎么会产生db_trx_id > 活跃事务列表最大值呢? 原因:当前事务不仅仅是读取这条被锁住的记录,可能还需要读取其他记录(这些记录当然可能被其他更靠后的事务id更新了),那么这时候其他记录上的db_trx_id>=10就再正常不过了。
创建readview的位置,不是begin的那个位置,而是begin后面第一条SQL语句的位置。(换句话说:begin的时候不会分配事务id,只有执行了sql之后才会分配事务id) 如果想在开启transaction的时候就产生readview、分配事务id,那么可以这样操作:start transaction with consistent snapshot
percona 版本中可以看到这样的信息,官方版本没有: Trx read view will not see trx with id >= 413 , sees < 411
案例剖析
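上面第三节的可见性判断流程,可以用一小段 Python 草图来表达(仅为示意,版本链和 readview 的数据结构都是本文的简化假设,并非 InnoDB 真实代码):

```python
# 示意:readview 可见性判断
# row_versions: 行的版本链,从最新到最旧,每个元素为 (db_trx_id, 值),
#               旧版本即通过回滚指针沿 undo 链找到的历史记录
# read_view:    事务开启时的活跃事务 id 列表

def mvcc_read(row_versions, read_view, current_trx_id):
    low, high = min(read_view), max(read_view)
    for trx_id, value in row_versions:
        if trx_id == current_trx_id:
            return value                # 本事务自己的修改,可见
        if trx_id < low:
            return value                # 早于所有活跃事务,已提交,可见
        if trx_id > high or trx_id in read_view:
            continue                    # 之后开启或尚未提交,沿 undo 链回溯
        return value                    # 在区间内且不在活跃列表,已提交,可见
    return None                         # 没有任何可见版本
```

按文中的例子,readview=(3,4,5,6)、当前事务 id=10:若最新版本的 db_trx_id=10 则直接可见;若为 4(仍活跃)则回溯 undo 链取旧版本。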
一、隔离级别
事务的隔离级别有4种(SQL-1992),但是我只想介绍其中两种,因为其他两个根本就用不上
1.1 什么叫一致性锁定读 和 一致性非锁定读
一致性锁定读
1. 读数据的时候,会去加S-lock、X-lock
2. eg: select ... for update , select ... lock in share mode
3. dml语句
一致性非锁定读
1. 读数据的时候,不加任何的锁,快照读(snapshot read)
2. eg: select ... 最普通的查询语句
1.2 什么是幻读(不可重复读)
概念
一个事务内的同一条【一致性锁定读】SQL多次执行,读到的结果不一致,我们称之为幻读。
实战
* set global tx_isolation='READ-COMMITTED'
> 事务一: root:test> begin;select * from lc for update; +------+ | id | +------+ | 1 | | 2 | +------+
> 事务二: root:test> begin; insert into lc values(3); Query OK, 1 row affected (0.00 sec) root:test> commit ; Query OK, 0 rows affected (0.00 sec)
> 事务一: root:test> select * from lc for update; +------+ | id | +------+ | 1 | | 2 | | 3 | +------+ 3 rows in set (0.00 sec)
* 同一个事务一中,同一条select * from lc for update (一致性锁定读) 执行两次,得到的结果不一致,说明产生了幻读
* 同一个事务一中,同一条select * from lc (一致性非锁定读) 执行两次,得到的结果不一致,说明产生了幻读
* 我们姑且认为,幻读和不可重复读为一个概念,实际上也差不多是一个概念。
1.3 什么是脏读
1. 这个大家都很好理解,就是事务一还没有提交的修改,却被事务二读到了,这就是脏读
1.4 repeatable-read(RR)
什么是RR
1. 学名: 可重复读
2. 顾名思义:一个事务内的同一条【一致性锁定读】SQL多次执行,读到的结果一致,我们称之为可重复读。
3.
解决了幻读的问题 1.5 read-committed (RC) * 学名:可提交读 * 顾名思义: 只要其他事务提交了,我就能读到 * 解决了脏读的问题,没有解决幻读的问题 二、隔离级别是如何实现的 就拿上面那个简单的例子来佐证好了 环境 dba:lc_4> show create table lc; +-------+--------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+--------------------------------------------------------------------------------------------------------+ | lc | CREATE TABLE `lc` ( `id` int(11) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | +-------+--------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:lc_4> select * from lc; +----+ | id | +----+ | 1 | | 2 | | 3 | +----+ 3 rows in set (0.00 sec) 2.1 RR RR 如何解决幻读问题?RR 的锁算法:next-key lock 解决幻读的案例 dba:lc_4> set tx_isolation='repeatable-read'; Query OK, 0 rows affected (0.00 sec) dba:lc_4> select * from lc for update ; +----+ | id | +----+ | 1 | | 2 | | 3 | +----+ 3 rows in set (0.00 sec) 这时候,查看下锁的情况: ------------ TRANSACTIONS ------------ Trx id counter 133588361 Purge done for trx's n:o < 133588356 undo n:o < 0 state: running but idle History list length 892 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 421565826150000, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421565826149088, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 133588360, ACTIVE 4 sec 2 lock struct(s), heap size 1136, 4 row lock(s) MySQL thread id 135, OS thread handle 140001104295680, query id 1176 localhost dba cleaning up TABLE LOCK table `lc_4`.`lc` trx id 133588360 lock mode IX RECORD LOCKS space id 289 page no 3 n bits 72 index PRIMARY of table `lc_4`.`lc` trx id 133588360 lock_mode X --next key lock , 锁记录和范围 Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;; --next-key lock, 锁住正无穷大 Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 
0 0: len 4; hex 80000001; asc ;; --next-key lock, 锁住1和1之前的区间,包括记录 (negtive,1] 1: len 6; hex 000007f6657e; asc e~;; 2: len 7; hex e5000040220110; asc @" ;; Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 4; hex 80000002; asc ;; --next-key lock, 锁住2和1之前的区间,包括记录 (1,2] 1: len 6; hex 000007f6657f; asc e ;; 2: len 7; hex e6000040330110; asc @3 ;; Record lock, heap no 4 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 4; hex 80000003; asc ;; --next-key lock, 锁住3和2之间的区间,包括记录 (2,3] 1: len 6; hex 000007f66584; asc e ;; 2: len 7; hex e9000040240110; asc @$ ;; * 总结下来就是: 1. (negtive bounds,1] , (1,2] , (2,3],(3,positive bounds) --锁住的记录和范围,相当于表锁 2. 这时候,session 2 插入任何一条记录,会被锁住,所以幻读可以避免,尤其彻底解决了幻读的问题 2.2 RC RC 的锁算法:record locks幻读对线上影响大吗? oracle默认就是RC隔离级别 不解决幻读的案例 dba:lc_4> set tx_isolation='read-committed'; Query OK, 0 rows affected (0.00 sec) dba:lc_4> select * from lc for update ; +----+ | id | +----+ | 1 | | 2 | | 3 | +----+ 3 rows in set (0.00 sec) * 查看锁的信息如下 ------------ TRANSACTIONS ------------ Trx id counter 133588362 Purge done for trx's n:o < 133588356 undo n:o < 0 state: running but idle History list length 892 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 421565826150000, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 421565826149088, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 133588361, ACTIVE 3 sec 2 lock struct(s), heap size 1136, 3 row lock(s) MySQL thread id 138, OS thread handle 140001238955776, query id 1192 localhost dba cleaning up TABLE LOCK table `lc_4`.`lc` trx id 133588361 lock mode IX RECORD LOCKS space id 289 page no 3 n bits 72 index PRIMARY of table `lc_4`.`lc` trx id 133588361 lock_mode X locks rec but not gap --记录锁,只锁记录 Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 4; hex 80000001; asc ;; -- 记录锁,锁住1 1: len 6; hex 000007f6657e; asc e~;; 2: len 7; hex e5000040220110; asc @" ;; Record lock, 
heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 4; hex 80000002; asc ;; -- 记录锁,锁住2 1: len 6; hex 000007f6657f; asc e ;; 2: len 7; hex e6000040330110; asc @3 ;; Record lock, heap no 4 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 4; hex 80000003; asc ;; -- 记录锁,锁住3 1: len 6; hex 000007f66584; asc e ;; 2: len 7; hex e9000040240110; asc @$ ;; * 总结下来 1. 锁住的是哪些? [1,2,3] 这些记录被锁住 2. 那么session 2 除了1,2,3 不能插入之外,其他的记录都能,比如; insert into lc select 4 , 那么再次select * from lc for udpate 的时候,就是4条记录了,由此产生幻读 2.3 RC vs RR 安全性 RC 和 binlog 1. RC 模式,binlog 必须使用Row 模式 为什么RC的binlog必须使用Row * session 1: begin; delete from tb_1 where id > 0; * session 2: begin; insert into tb_1 select 100; commit; * session 1: commit; * 如果RC模式下的binlog是statement模式,结果会是怎么样呢? master : 结果是 100 slave : 结果是 空 这样就导致master和slave结果不一致了: 因为在slave上,先执行insert into tb_1 select 100; 再执行delete from tb_1 where id > 0; 当然等于空咯 * 如果RC模式下的binlog是ROW模式,结果会是怎么样呢? master : 结果是 100 slave : 结果是 100 主从结果一致,因为binlog是row模式,slave并不是逻辑的执行上述sql,而记录的都是行的变化 2.4 总结 RC 的优点 1. 由于降低了隔离级别,那么实现起来简单,对锁的开销小,基本上不会有Gap lock,那么导致死锁和锁等待的可能就小 2. 当然RC也不是完全没有Gap lock,当purge 和 唯一性索引存在的时候会产生特殊的Gap lock,这个后面会具体讲 RC 的缺点 1. 会有幻读发生 2. 事务内的每条select,都会产生新的read-view,造成资源浪费 RR 的优点 1. 一个事务,只有再开始的时候才会产生read-view,有且只有一个,所以这块消耗比较小 2. 解决了幻读的问题, 实现了真正意义上的隔离级别 RR 的缺点 1. 由于RR的实现,是通过Gap-lock实现,经常会锁定一个范围,那么导致死锁和所等待的概率非常大 我们的选择 一般我们生产环境的标配,都是RC+Row 模式,谁用谁知道哦
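上面 RR 与 RC 两个案例中 select * from lc for update 的加锁范围差异,可以用一个小函数直观对比(仅为示意,锁区间用字符串表示,和 InnoDB 内部的锁结构无关):

```python
# 示意:对已有键值做全表 for update 时,两种隔离级别锁住的范围
def lock_ranges(keys, isolation):
    keys = sorted(keys)
    if isolation == "RC":
        # RC: 只有 record lock,逐条锁记录
        return [f"[{k}]" for k in keys]
    # RR: next-key lock,左开右闭区间,最后通过 supremum 锁到正无穷
    ranges, prev = [], "-inf"
    for k in keys:
        ranges.append(f"({prev},{k}]")
        prev = k
    ranges.append(f"({prev},+inf)")
    return ranges
```

对键值 1、2、3,RR 返回 (-inf,1],(1,2],(2,3],(3,+inf),相当于表锁;RC 只返回 [1],[2],[3],因此 4 仍可被其他事务插入,产生幻读。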
什么是undo
1) redo 记录的是对页的重做日志,undo 记录的是对事务的逆向操作
2) undo 会产生redo,undo的产生也会伴随着redo的产生,因为重启恢复的时候,可以通过redo还原这些undo的操作,以达到回滚的目的
undo有什么用
1) 用于对事务的回滚
2) 用于MVCC
undo的存储结构
rollback segment
* 在MySQL5.1的年代,一个MySQL实例,就只有一个rollback segment
* 在MySQL5.1+ 的年代,一个MySQL实例里面,可以有128个rollback segment
undo segment
* 一个segment 有 1024 个 undo slot,一个undo slot 对应一个undo log
* 一个事务(dml)对应一个undo log
总结 据此推断:
1) 5.1 最多能够承载的并发事务(dml),1 * 1024 = 1024
2) 5.1+ 最多能够承载的并发事务(dml),128 * 1024 = 131072,约13万
从此可以看出,5.1 之后的版本支持的并发写入事务数更多,性能更好
undo的格式
insert_undo
1) insert操作产生的undo
2) 为什么要单独出来,因为insert的undo可以立马释放(不需要purge),不需要判断是否有其他事务引用,本来insert的事务提交之前也没有任何其他事务可以看见它
update_undo
1) delete 或者 update 操作产生的undo日志
2) 判断undo是否可以被删除,必须看这个undo上面是否被其他事务所引用
3) 如果没有任何事务引用,那么可以由后台线程purge掉这个undo
如何判断undo日志是否有其他事务引用呢
1. 每一个undo log中都有一个DB_trx_id , 这个id记录的是该undo最近一次被更新的事务id
2. 如果这个id 不在readview(活跃事务列表) 里面,就可以认为没有事务引用,即可删除
undo存放在哪里
1) 5.6之前的版本,undo都是存放在ibdata,也就是所谓的共享表空间里面的
2) 5.6以及之后的版本,可以配置存放在单独的undo表空间中
什么是purge
1) delete语句操作后,只会对其进行delete mark,这些被标记为删除的记录只能通过purge来进行物理的删除,但是并不回收空间
2) undo log,如果没有任何事务再引用,那么也只能通过purge线程来进行物理的删除,但是并不回收空间
purge后空间就释放了吗
1) undo page里面可以存放多个undo log日志
2) 只有当undo page里面的所有undo log日志都被purge掉之后,这个页的空间才可能被释放掉,否则这些undo page可以被重用
DML的相关物理实现算法
主键索引
1. 对于delete --需要undo绑定该记录才能进行回滚,所以只能打上标记(delete mark),否则undo指向哪里呢
2. 对于update --原记录可以物理删除,因为可以在新插入进来的地方进行undo绑定
* 如果不能原地更新: delete(注意:这里是直接delete,而不是delete mark) + insert
* 如果可以原地更新,那么直接update就好
二级索引
1. 对于delete --不能直接被物理删除,因为二级索引没有undo,只能通过打标记(delete mark),然后回滚。否则如果被物理删除,则无法回滚
2. 对于update --不能直接被物理删除,同样只能打标记再插入,即 delete mark + insert
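据上面的存储结构做个简单算术(纯粹的容量推算示意):

```python
# 粗算不同版本能承载的最大并发写事务数
# 依据: 一个 rollback segment 有 1024 个 undo slot, 一个事务(dml)占一个 undo log
SLOTS_PER_ROLLBACK_SEGMENT = 1024

def max_concurrent_trx(rollback_segments):
    return rollback_segments * SLOTS_PER_ROLLBACK_SEGMENT

legacy = max_concurrent_trx(1)     # 5.1: 单个 rollback segment
modern = max_concurrent_trx(128)   # 5.1+: 128 个 rollback segment, 即 131072
```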
环境
1. DB: Server version: 5.7.18-log MySQL Community Server (GPL)
2. OS: CentOS release 6.6 (Final)
问题描述
问题要害
1. 不定时的磁盘util 100%
2. 每次持续时间就几秒钟
问题分析
第一反应
1. 看到这个问题,我的第一反应就是去看看mysql slow query
2. 结果通过omega系统里面的智能slow query系统得到的答案是:无明显slow
问题到这,基本上根据经验已经无法快速得到答案,然后继续思考
看各项监控
1. cpu 正常,历史曲线一致
2. load 正常,历史曲线一致
3. InnoDB 核心监控正常,历史曲线一致
4. 网络正常,历史曲线一致
看下来都很正常,唯独磁盘io不正常
既然是io压力,那么很自然的查看iostat和iotop
1. iostat 经过一段时间的iostat观察(问题时间不规律,必须等待问题复现)发现,磁盘io 100% 的时候,基本上wio=2000,wMB/s 800M左右
2. iotop 经过一段时间的观察,盯到一个系统进程[flush-8:16]的io占用特别高
到这里,是不是就基本锁定问题了呢?去查了下,[flush-8:16] 就是OS内核刷新脏页数据的进程。到这一步,基本上快接近真相了,但是剧情并没有按照想象的发展,那么问题来了,这个刷新是谁导致的呢?最后的凶手又是谁呢?
回顾问题
1. 基本确定是内核刷新数据导致,排除掉硬件故障
2. 是系统自己刷新?还是MySQL在刷新?
3. io 100% 为什么以前没有发生这样的现象,唯独最近一周发生,这一周我们做了哪些改变 a) MySQL 从5.6 升级到 5.7 b) MySQL的参数从5.6优化到 5.7,参数优化的变量因子还是挺多的,没办法一一去掉后排查 c) 最近由于机器问题,切换了一次master d) 启动了压缩表功能
那就分析下是os自己的刷新,还是MySQL内核的刷新
分析下是否MySQL的脏页刷新导致
1. MySQL 刷新数据,io占用比较高的地方有哪些 a) binlog: binlog 并不在出问题的分区上,所以binlog 可以排除 b) redo log : b.1) redo log 是顺序写,checkpoint的age在800M左右,大小上来看非常温和,但是要记住,这仅仅是age,并不是一次性要刷新这么多 b.2) redo log 是没有o_direct的,所以可能导致操作系统刷新数据 b.3) redo log的刷新条件和触发时机之一是:每秒钟都刷新,每一次commit都刷新,所以更加可以排除掉redo造成的问题,因为一个commit在一秒内不可能有这么大的日志量 c) data file : c.1) data file 如果要刷新800M,那至少要刷新好几万个page,如果要刷新那么多页,MySQL估计就已经hang住了 c.2) data file 我们设置的是: flush_method=O_DIRECT,这表示InnoDB自己管理内存刷新 c.3) checkpoint的触发时机:当checkpoint_age达到redo总大小的75%时触发,然而远远没有达到 c.4) 查看modified pages 的频率,并没有明显的异常
所以,排除掉是MySQL的刷新问题
分析下是否系统产生的脏页导致的问题
while true; do cat /proc/vmstat |grep nr_dir; date; sleep 1; done
Wed Jun 7 15:59:18 CST 2017 nr_dirty 182832
Wed Jun 7 15:59:19 CST 2017 nr_dirty 494958
Wed Jun 7 15:59:20 CST 2017 nr_dirty 815964
Wed Jun 7 15:59:21 CST 2017 nr_dirty 1140783
Wed Jun 7 15:59:22 CST 2017 nr_dirty 1474413
Wed Jun 7 15:59:23 CST 2017 nr_dirty 1382764
Wed Jun 7 15:59:24 CST 2017
当脏页非常多的时候,过几秒,io 100%就必现
基本可以断定,是操作系统的刷新导致的问题
再次iotop
1) 这一次的iotop,由于目不转睛的人肉扫描,终于发现另一个可疑进程 cp xx.err xx.err.bak
2) 然后查看了下这个xx.err文件,竟然有8G的大小
3) 然后问题终于定位成功
总结&改进
为什么MySQL error log 会这么大呢?
1) 5.7上开启了参数innodb_print_all_deadlocks=1,这样可以打印出详细的死锁日志
2) 然后线上产生了一堆死锁
3) 导致error log日志非常大
4) 然后我们自己的监控会定期cp error log
5) 然后问题就发生了
至于为什么有那么多的死锁信息,后面会有MySQL锁的专题文章专门介绍
改进方案:
1) 去掉这个参数,没必要打印出所有的死锁信息,当有死锁的时候,实时查看也可以
2) 增加error log的日志大小监控
3) 把cp的方式优化掉
为什么iotop 一开始没有发现这个cp 进程呢?
1) 由于cp的时间就几秒,非常短,所以当我们看到的时候,已经是在flush 阶段了
有什么先进的工具可以直接定位到哪个文件所占用的io最大呢?
1) pt-ioprofile 优点:可以分析出文件的io占比 缺点:比较重,是用ptrace的方式来实现,可能会对mysql有影响,且貌似只监控MySQL打开的文件
最后
问题的难点:问题出现时间短,无规律
这个问题的解决,其实也非常简单,可是为什么没能一开始就找到答案呢,说明自己水平还是有限的,还需要多总结增加经验和不断学习
这次的收获也非常多,分析问题的过程中对MySQL的体系结构又再次深入了解学习了下
要把这种细小的问题分析透彻,是需要一定的坚持和固执的,因为就偶尔几秒钟的io 100%,很可能不会引起大家的关注
最后非常感谢姜承尧和宋武斌的帮助,才能够最终彻底发现这样的问题
背景 锁系列第一期的时候介绍的锁,我们要如何去解读呢? 在哪里能够看到这些锁? 锁信息解读 工欲善其事必先利其器show engine innodb status 关于锁的信息是最详细的 案例一(有索引的情况) 前期准备 dba:lc_3> dba:lc_3> dba:lc_3> show create table a; +-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------+ | Table | Create Table | +-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------+ | a | CREATE TABLE `a` ( `a` int(11) NOT NULL, `b` int(11) DEFAULT NULL, `c` int(11) DEFAULT NULL, `d` int(11) DEFAULT NULL, PRIMARY KEY (`a`), UNIQUE KEY `idx_b` (`b`), KEY `idx_c` (`c`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | +-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:lc_3> dba:lc_3> select * from a; +---+------+------+------+ | a | b | c | d | +---+------+------+------+ | 1 | 3 | 5 | 7 | | 3 | 5 | 7 | 9 | | 5 | 7 | 9 | 11 | | 7 | 9 | 11 | 13 | +---+------+------+------+ 4 rows in set (0.00 sec) 产生锁的语句 dba:lc_3> set tx_isolation = 'repeatable-read'; --事务隔离级别为repeatable-read,以后介绍 Query OK, 0 rows affected (0.00 sec) begin; select * from a where c=7 for update; show engine innodb status ------------ TRANSACTIONS ------------ Trx id counter 133588132 Purge done for trx's n:o < 133588131 undo n:o < 0 state: running but idle History list length 836 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 421565826149088, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 133588131, ACTIVE 8 sec 4 lock struct(s), heap size 1136, 3 row lock(s) MySQL thread 
id 116, OS thread handle 140001238423296, query id 891 localhost dba cleaning up TABLE LOCK table `lc_3`.`a` trx id 133588131 lock mode IX RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133588131 lock_mode X Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 80000007; asc ;; 1: len 4; hex 80000003; asc ;; RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133588131 lock_mode X locks rec but not gap Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0 0: len 4; hex 80000003; asc ;; 1: len 6; hex 000007f66444; asc dD;; 2: len 7; hex fc0000271d011d; asc ' ;; 3: len 4; hex 80000005; asc ;; 4: len 4; hex 80000007; asc ;; 5: len 4; hex 80000009; asc ;; RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133588131 lock_mode X locks gap before rec Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 80000009; asc ;; 1: len 4; hex 80000005; asc ;; show engine innodb status 解读 * Trx id counter 133588132 描述的是:下一个事务的id为133588132 * Purge done for trx's n:o < 133588131 undo n:o < 0 state: running but idle Purge线程已经将trxid小于133588131的事务都purge了,目前purge线程的状态为idle Purge线程无法控制 * History list length 836 undo中未被清除的事务数量,如果这个值非常大,说明系统来不及回收undo,需要人工介入了。 疑问:上面的purge都已经刷新完了,为什么History list length 不等于0,这是一个有意思的问题 * ---TRANSACTION 133588131, ACTIVE 8 sec 当前事务id为133588131 * 4 lock struct(s), heap size 1136, 3 row lock(s) 产生了4个锁对象结构,占用内存大小1136字节,3条记录被锁住(1个表锁,3个记录锁) * TABLE LOCK table `lc_3`.`a` trx id 133588131 lock mode IX 在a表上面有一个表锁,这个锁的模式为IX(排他意向锁) * RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133588131 lock_mode X 在space id=281(a表的表空间),page no=5的页上,对表a上的idx_c索引加了记录锁,锁模式为:next-key 锁(这个在上一节中有告知) 该页上面的位图锁占有72bits * 具体锁了哪些记录 Record lock, heap no 3 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 -- heap no 3 的记录被锁住了 0: len 4; hex 
80000007; asc ;; --这是一个二级索引上的锁,7被锁住 1: len 4; hex 80000003; asc ;; --二级索引上面还会自带一个主键,所以主键值3也会被锁住 RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133588131 lock_mode X locks rec but not gap(这是一个记录锁,在主键上锁住的) Record lock, heap no 3 PHYSICAL RECORD: n_fields 6; compact format; info bits 0 0: len 4; hex 80000003; asc ;; --第一个字段是主键3,占用4个字节,被锁住了 1: len 6; hex 000007f66444; asc dD;; --该字段为6个字节的事务id,这个id表示最近一次被更新的事务id 2: len 7; hex fc0000271d011d; asc ' ;; --该字段为7个字节的回滚指针,用于mvcc 3: len 4; hex 80000005; asc ;; --该字段表示的是此记录的第二个字段5 4: len 4; hex 80000007; asc ;; --该字段表示的是此记录的第三个字段7 5: len 4; hex 80000009; asc ;; --该字段表示的是此记录的第四个字段9 RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133588131 lock_mode X locks gap before rec Record lock, heap no 4 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 4; hex 80000009; asc ;; --这是一个二级索引上的锁,9被锁住 1: len 4; hex 80000005; asc ;; --二级索引上面还会自带一个主键,所以主键值5被锁住 案例二(无索引的情况) 前期准备 dba:lc_3> show create table t; +-------+------------------------------------------------------------------------------------+ | Table | Create Table | +-------+------------------------------------------------------------------------------------+ | t | CREATE TABLE `t` ( `i` int(11) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 | +-------+------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:lc_3> select * from t; +------+ | i | +------+ | 1 | | 2 | | 3 | | 4 | | 5 | | 5 | | 5 | | 5 | | 5 | | 5 | | 5 | | 5 | | 5 | | 5 | | 5 | | 5 | +------+ 16 rows in set (0.00 sec) 产生锁语句 dba:lc_3> set tx_isolation = 'repeatable-read'; Query OK, 0 rows affected (0.00 sec) dba:lc_3> select * from t where i=1 for update; +------+ | i | +------+ | 1 | +------+ 1 row in set (0.00 sec) show engine innodb status ------------ TRANSACTIONS ------------ Trx id counter 133588133 Purge done for trx's n:o < 133588131 undo n:o < 0 state: running but 
idle History list length 836 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 421565826149088, not started 0 lock struct(s), heap size 1136, 0 row lock(s) ---TRANSACTION 133588132, ACTIVE 6 sec 2 lock struct(s), heap size 1136, 17 row lock(s) MySQL thread id 118, OS thread handle 140001238955776, query id 904 localhost dba cleaning up TABLE LOCK table `lc_3`.`t` trx id 133588132 lock mode IX RECORD LOCKS space id 278 page no 3 n bits 88 index GEN_CLUST_INDEX of table `lc_3`.`t` trx id 133588132 lock_mode X Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;; Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff05; asc ;; 1: len 6; hex 000007f66397; asc c ;; 2: len 7; hex fb0000271c0110; asc ' ;; 3: len 4; hex 80000001; asc ;; Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff06; asc ;; 1: len 6; hex 000007f663ea; asc c ;; 2: len 7; hex bb000027340110; asc '4 ;; 3: len 4; hex 80000002; asc ;; Record lock, heap no 4 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff07; asc ;; 1: len 6; hex 000007f66426; asc d&;; 2: len 7; hex e4000040210110; asc @! 
;; 3: len 4; hex 80000003; asc ;; Record lock, heap no 5 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff08; asc ;; 1: len 6; hex 000007f66427; asc d';; 2: len 7; hex e5000040220110; asc @" ;; 3: len 4; hex 80000004; asc ;; Record lock, heap no 6 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff09; asc ;; 1: len 6; hex 000007f6642c; asc d,;; 2: len 7; hex e8000040230110; asc @# ;; 3: len 4; hex 80000005; asc ;; Record lock, heap no 7 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff0a; asc ;; 1: len 6; hex 000007f6642d; asc d-;; 2: len 7; hex e9000040240110; asc @$ ;; 3: len 4; hex 80000005; asc ;; Record lock, heap no 8 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff0b; asc ;; 1: len 6; hex 000007f66432; asc d2;; 2: len 7; hex ec0000273f0110; asc '? ;; 3: len 4; hex 80000005; asc ;; Record lock, heap no 9 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff0c; asc ;; 1: len 6; hex 000007f66433; asc d3;; 2: len 7; hex ed000040020110; asc @ ;; 3: len 4; hex 80000005; asc ;; Record lock, heap no 10 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff0d; asc ;; 1: len 6; hex 000007f66434; asc d4;; 2: len 7; hex ee000040030110; asc @ ;; 3: len 4; hex 80000005; asc ;; Record lock, heap no 11 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff0e; asc ;; 1: len 6; hex 000007f66435; asc d5;; 2: len 7; hex ef000040040110; asc @ ;; 3: len 4; hex 80000005; asc ;; Record lock, heap no 12 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff0f; asc ;; 1: len 6; hex 000007f66436; asc d6;; 2: len 7; hex f0000040050110; asc @ ;; 3: len 4; hex 80000005; asc ;; Record lock, heap no 13 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff10; asc ;; 1: len 6; hex 000007f66437; asc d7;; 2: len 7; hex f1000040060110; 
asc @ ;; 3: len 4; hex 80000005; asc ;; Record lock, heap no 14 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff11; asc ;; 1: len 6; hex 000007f66438; asc d8;; 2: len 7; hex f2000027130110; asc ' ;; 3: len 4; hex 80000005; asc ;; Record lock, heap no 15 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff12; asc ;; 1: len 6; hex 000007f66439; asc d9;; 2: len 7; hex f3000027140110; asc ' ;; 3: len 4; hex 80000005; asc ;; Record lock, heap no 16 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff13; asc ;; 1: len 6; hex 000007f6643a; asc d:;; 2: len 7; hex f4000027150110; asc ' ;; 3: len 4; hex 80000005; asc ;; Record lock, heap no 17 PHYSICAL RECORD: n_fields 4; compact format; info bits 0 0: len 6; hex 0000000dff14; asc ;; 1: len 6; hex 000007f6643b; asc d;;; 2: len 7; hex f5000027160110; asc ' ;; 3: len 4; hex 80000005; asc ;; 锁解读 1. 这里只列出跟第一个案例不同的地方解读,其他的都一样 2. RECORD LOCKS space id 278 page no 3 n bits 88 index GEN_CLUST_INDEX of table `lc_3`.`t` trx id 133588132 lock_mode X 由于表定义没有显式的索引,而InnoDB又是索引组织表,会自动创建一个索引,这里叫index GEN_CLUST_INDEX 3. 由于没有索引,那么会对每条记录都加上lock_mode X (next-key lock) 4. 这里有一个明显不一样的是: Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;; supremum 指的是页里面的最后一条记录(伪记录,通过select查不到,并不是真实的记录),heap no=1 ; Infimum 表示的是页里面的第一条记录(伪记录) 可以简单地认为: supremum 为 upper bound,正无穷大; Infimum 为 lower bound,负无穷大 那这里加锁的意思就是:通过supremum 锁住index GEN_CLUST_INDEX的最大值到正无穷大的区间,这样就可以锁住全部记录以及全部间隙,相当于表锁 锁开销 锁10条记录和锁1条记录的开销是成正比的吗? 1. 由于锁的内存对象针对的是页而不是记录,所以开销并不是非常大 2. 锁10条记录和锁1条记录的内存开销是一样的,都是heap size=1136个字节 最后 这里的select * from a where c=7 for update; 明明只锁一条记录,为什么却看到4把锁呢?看到这里是不是有点晕?没关系,这个问题后面会慢慢揭晓答案
背景 锁是MySQL里面最难理解的知识,但是又无处不在。一开始接触锁的时候,感觉被各种锁类型和名词弄得晕头转向,就别说其他了。本文是通过DBA的视角(非InnoDB内核开发)来分析和窥探锁的奥秘,并解决实际工作当中遇到的问题 锁的种类&概念 想要啃掉这块最难的大骨头,必须先画一个框架,先了解其全貌,才能逐个击破 Shared and Exclusive Locks * Shared lock: 共享锁,官方描述:permits the transaction that holds the lock to read a row eg:select * from xx where a=1 lock in share mode * Exclusive Locks:排他锁: permits the transaction that holds the lock to update or delete a row eg: select * from xx where a=1 for update Intention Locks 1. 这个锁是加在table上的,表示要对下一个层级(记录)进行加锁 2. Intention shared (IS):Transaction T intends to set S locks on individual rows in table t 3. Intention exclusive (IX): Transaction T intends to set X locks on those rows 4. 在数据库层看到的结果是这样的: TABLE LOCK table `lc_3`.`a` trx id 133588125 lock mode IX Record Locks 1. 在数据库层看到的结果是这样的: RECORD LOCKS space id 281 page no 3 n bits 72 index PRIMARY of table `lc_3`.`a` trx id 133588125 lock_mode X locks rec but not gap 2. 该锁是加在索引上的(从上面的index PRIMARY of table `lc_3`.`a` 就能看出来) 3. 记录锁可以有两种类型:lock_mode X locks rec but not gap && lock_mode S locks rec but not gap Gap Locks 1. 在数据库层看到的结果是这样的: RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133588125 lock_mode X locks gap before rec 2. Gap锁是用来防止insert的 3. Gap锁,中文名间隙锁,锁住的不是记录,而是范围,比如:(negative infinity, 10),(10, 11)区间,这里都是开区间哦 Next-Key Locks 1. 在数据库层看到的结果是这样的: RECORD LOCKS space id 281 page no 5 n bits 72 index idx_c of table `lc_3`.`a` trx id 133588125 lock_mode X 2. Next-Key Locks = Gap Locks + Record Locks 的结合, 不仅仅锁住记录,还会锁住间隙,比如: (negative infinity, 10】,(10, 11】区间,这些右边都是闭区间哦 Insert Intention Locks 1. 在数据库层看到的结果是这样的: RECORD LOCKS space id 279 page no 3 n bits 72 index PRIMARY of table `lc_3`.`t1` trx id 133587907 lock_mode X insert intention waiting 2. Insert Intention Locks 可以理解为特殊的Gap锁的一种,用以提升并发写入的性能 AUTO-INC Locks 1. 在数据库层看到的结果是这样的: TABLE LOCK table xx trx id 7498948 lock mode AUTO-INC waiting 2. 属于表级别的锁 3. 
自增锁的详细情况可以参考之前的一篇文章: http://keithlan.github.io/2017/03/03/auto_increment_lock/ 显式锁 vs 隐式锁 * 显式锁(explicit lock) 显式地加锁,在show engine innodb status 中能够看到,会在内存中产生锁对象,占用内存 eg: select ... for update , select ... lock in share mode * 隐式锁(implicit lock) implicit lock 是在索引中对记录逻辑上加锁,实际上不产生锁对象,不占用内存空间 * 哪些语句会产生implicit lock 呢? eg: insert into xx values(xx) eg: update xx set t=t+1 where id = 1 ; 会对辅助索引加implicit lock * implicit lock 在什么情况下会转换成 explicit lock 呢? 只有implicit lock 产生冲突的时候,才会自动转换成explicit lock,这样做的好处就是降低锁的开销 比如:我插入了一条记录10,这条记录本身带着implicit lock,如果这时候有人再去更新这条记录10,那么就会自动转换成explicit lock * 数据库怎么知道implicit lock的存在呢?如何实现锁的转化呢? 1. 对于聚集索引上的记录,有db_trx_id,如果该事务id在活跃事务列表中,说明还没有提交,那么implicit lock存在 2. 对于非聚集索引:记录上没有事务id,可以先定位到对应的主键记录,再通过主键记录上的事务id来判断,不过算法非常复杂,这里不做介绍 metadata lock 1. 这是Server 层实现的锁,跟引擎层无关 2. 当你执行select的时候,如果这时候有ddl语句,那么ddl会被阻塞,因为select语句持有metadata lock,防止元数据被改掉 锁迁移 1. 锁迁移,又名锁继承 2. 什么是锁迁移呢? a) 满足的场景条件: b) 我锁住的记录是一条已经被标记为删除的记录,但是还没有被purge c) 然后这条被标记为删除的记录,被purge掉了 d) 那么上面的锁自然而然就继承给了下一条记录,我们称之为锁迁移 锁升级 锁升级指的是:一条全表更新的语句,数据库会对所有记录加锁,锁开销可能非常大,于是升级为页锁或者表锁。MySQL 没有锁升级 锁分裂 1. InnoDB加锁其实是在页上面做的,没有办法直接对单条记录加锁 2. 一个页被读取到内存后,会产生锁对象,锁对象里面用位图信息来表示哪些heap no被锁住,heap no表示的是堆的序列号,可以认为就是定位到某一条记录 3. 大家又知道,由于B+tree的存在,insert的时候会产生页的分裂动作 4. 如果页分裂了,那么原来页上的加锁位图信息也就变了,为了维护这种变化和锁信息,锁对象也会分裂,用于继续维护分裂后页的锁信息 锁合并 锁的合并和锁的分裂原理是一样的,参考上面即可。至于锁合并和锁分裂的算法,比较复杂,这里就不介绍了 latch vs lock * latch: 包括 mutex 和 rw-lock,临界资源用完即释放,不支持死锁检测,是应用程序中的锁,不是数据库事务的锁 * lock: 当事务结束后才释放,支持死锁检测,是数据库事务中的锁 锁的兼容矩阵 X vs S 兼容性:
      X  S
  X   N  N
  S   N  Y
IS,IX,S,X 兼容性:
      IS IX S  X
  IS  Y  Y  Y  N
  IX  Y  Y  N  N
  S   Y  N  Y  N
  X   N  N  N  N
AI,IS,IX,S,X 兼容性:
      AI IS IX S  X
  AI  N  Y  Y  N  N
  IS  Y  Y  Y  Y  N
  IX  Y  Y  Y  N  N
  S   N  Y  N  Y  N
  X   N  N  N  N  N
参考资料 1. https://dev.mysql.com/doc/refman/5.7/en/innodb-locking.html 2. MySQL技术内幕:InnoDB 存储引擎 3. MySQL内核:InnoDB 存储引擎
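上面的兼容矩阵,可以用一小段 Python 把它固化成查询函数来帮助记忆(仅为示意,并非 InnoDB 源码实现;锁名与矩阵取值均取自上文):

```python
# 上文 AI,IS,IX,S,X 兼容矩阵的字典表示(行 = 已持有的锁,列 = 新请求的锁)
COMPAT = {
    "AI": {"AI": "N", "IS": "Y", "IX": "Y", "S": "N", "X": "N"},
    "IS": {"AI": "Y", "IS": "Y", "IX": "Y", "S": "Y", "X": "N"},
    "IX": {"AI": "Y", "IS": "Y", "IX": "Y", "S": "N", "X": "N"},
    "S":  {"AI": "N", "IS": "Y", "IX": "N", "S": "Y", "X": "N"},
    "X":  {"AI": "N", "IS": "N", "IX": "N", "S": "N", "X": "N"},
}

def is_compatible(held: str, requested: str) -> bool:
    """已持有 held 锁时,新来的 requested 锁请求是否无需等待"""
    return COMPAT[held][requested] == "Y"
```

例如 is_compatible("IX", "IX") 为 True,正好对应"两个事务都先在表上拿 IX,真正的冲突发生在行级锁"这一常见场景。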
背景 什么是omega 简单说就是一个平台, 运维和运营为一体的智能DB管理平台 所有DB相关的事情都能通过此平台 完成->自助完成->智能完成 目前知道这个就够了,其他的以后慢慢介绍 为什么要介绍omega系统里面的connection 1. 因为我们这边业务使用PHP是主流,短连接非常多,经常会遇到connection和thread的问题,所以关注比较多 2. 另一方面,我们omega系统提供了一套完整的connection和thread监控,但是里面有一些专业术语 很多人并不知道【包括一些DBA自己】 3. 既然不明白里面的参数,那么肯定也就不知道这样的监控有何意义,又有何实战价值,所以稍微普及一下。 omega:connection视图 官方解释 name desc Connections The number of connection attempts (successful or not) to the MySQL server abort_clients The number of connections that were aborted because the client died without closing the connection properly abort_connects The number of failed attempts to connect to the MySQL server 官方的东东,比较拗口,我知道你看不懂,所以看下面的实战。 实战意义 以上三个参数都是累积值,omega 里面展示的是平均每秒的增量 [Connections] 重点一:表示一分钟内平均每秒尝试连接到mysql server的次数。重点二:这里面的连接数包括成功的连接,也包括失败的连接,大部分人这里不是很清楚。 [abort_clients] 1)客户端已经成功创建连接,但是后来断开了。 2)如果这个值逐渐增大,那么说明什么问题呢? a) wait_timeout 超时,mysql自动kill掉连接 b) 客户端由于某些原因被干掉 总之:就是已经创建好了连接,由于某种原因断开掉了。 [abort_connects] 1) 客户端没有创建连接,在尝试建立连接的时候失败了。 2) 如果这个值逐渐增大,有哪些可能的原因呢? a) too many connections 已经发生 b) 权限,端口,密码等等错误,导致不能创建连接的情况 c) 客户端设置了connect_timeout等造成的连接不上,网络问题。 总之,就是有很多种原因导致没有成功地创建连接 omega:thread视图 官方解释 name desc threads_connected The number of currently open connections. threads_running The number of threads that are not sleeping. threads_sleep 我自己yy的,意思是The number of threads that are sleeping. 实战意义 以上三个值是瞬间值 [threads_connected] show processlist里面看到的数量就是这个值 [threads_running] 非sleep的连接,如果这个值非常高,说明SQL卡住了或者SQL非常慢,高并发的SQL非常多,通常伴随着cpu,io非常高等特点 [threads_sleep] sleep的连接,就是该thread不干任何事,一旦这样的数值特别大,说明某些业务那里占着连接不释放,或者其他服务缓慢有问题,导致连接不释放,一般我们的做法就是让MySQL自动关闭这样的连接,保护数据库。 总结 至此,上面的参数和status解释完毕,上面状态的各种组合常常能够反映出各种问题,可以帮助DBA快速定位问题,各位可以尝试下,谁用谁知道。 好了,最后给大家出一个问题思考下:上面第一个截图代表啥意思呢?
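"以上三个参数都是累积值"这一点,可以用两次采样求差来示意 omega 里"平均每秒"的算法(纯示意代码,采样数值是假设的;键名对应 MySQL 的 Connections / Aborted_clients / Aborted_connects 状态量):

```python
def per_second(prev: dict, curr: dict, interval_sec: float) -> dict:
    """两次 show global status 采样的差值 / 采样间隔 = 每秒增量"""
    return {k: (curr[k] - prev[k]) / interval_sec for k in curr}

# 假设相隔 60 秒的两次采样
prev = {"Connections": 12000, "Aborted_clients": 30, "Aborted_connects": 5}
curr = {"Connections": 18000, "Aborted_clients": 33, "Aborted_connects": 5}
rate = per_second(prev, curr, 60)
# rate["Connections"] 为每秒 100 次连接尝试(含成功与失败)
```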
今天遇到一个非常神奇的sql,执行计划时好时坏,我们一起来领略一下吧 废话不多说,直接进入实战 环境 * version:MySQL5.6.27 社区版 * 表结构 CREATE TABLE `xx` ( `TagId` int(11) NOT NULL AUTO_INCREMENT COMMENT '', `TagType` int(11) DEFAULT NULL COMMENT '', `SubType` int(11) DEFAULT NULL COMMENT '', `CommId` int(11) NOT NULL DEFAULT '0' COMMENT '', `TagFlag` int(11) NOT NULL DEFAULT '0' COMMENT '', `TagName` varchar(255) DEFAULT NULL COMMENT '', `OrderId` int(11) DEFAULT '0' COMMENT '', `Unum` int(10) NOT NULL DEFAULT '0' COMMENT '', `IsBest` int(11) NOT NULL DEFAULT '0' COMMENT '', `BrokerId` int(11) NOT NULL DEFAULT '0' COMMENT '', `AddDate` int(11) DEFAULT NULL COMMENT '', `UpdateDate` int(11) DEFAULT NULL COMMENT '', `updatetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `tmpnum` int(10) DEFAULT '0' COMMENT '', `cityid` int(11) DEFAULT '0' COMMENT '', PRIMARY KEY (`TagId`), KEY `idx_4` (`IsBest`,`TagFlag`,`CommId`), KEY `idxnew` (`UpdateDate`), KEY `idx_lc_1` (`TagName`,`TagType`,`TagId`), KEY `idx_lc_2` (`CommId`,`TagName`,`TagType`), KEY `idx_tagName_brokerId_cityId` (`TagName`,`BrokerId`,`cityid`), KEY `idx_lc_3` (`SubType`,`TagType`,`cityid`) ) ENGINE=InnoDB AUTO_INCREMENT=20628140 DEFAULT CHARSET=utf8 DB症状 1. slow query 非常多 2. thread_running 非常多 3. cpu 90% 4. 
too many connection 多症齐发 定位问题 很明显就是去寻找slow query,毕竟slow是我衡量DB性能重要标准。 然后发现99%都是类似这样的语句: # Time: 170304 10:32:07 # User@Host[] @ [] Id: 26019853 # Query_time: 0.251174 Lock_time: 0.000078 Rows_sent: 1 Rows_examined: 470135 SET timestamp=1488594727; select `TagId`,`TagType`,`SubType`,`CommId`,`TagFlag`,`TagName`,`OrderId`,`Unum`,`IsBest`,`BrokerId`,`AddDate`,`UpdateDate`,`updatetime`,`tmpnum`,`cityid` from `xx` where `TagType` = '1' and `TagName` = '**高' order by `TagId` ASC limit 1 ; 分析问题 step1:查看执行计划 explain select `TagId`,`TagType`,`SubType`,`CommId`,`TagFlag`,`TagName`,`OrderId`,`Unum`,`IsBest`,`BrokerId`,`AddDate`,`UpdateDate`,`updatetime`,`tmpnum`,`cityid` from `xx` where `TagType` = '1' and `TagName` ='**高' order by `TagId` limit 1; +----+-------------+-----------------+-------+--------------------------------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+-------+--------------------------------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | xx | index | idx_lc_1,idx_tagName_brokerId_cityId | PRIMARY | 4 | NULL | 175 | Using where | +----+-------------+-----------------+-------+--------------------------------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec) 这条语句执行时间是: 0.99s 奇怪,从表结构上看,应该会使用idx_lc_1才对,为什么执行计划是错的呢? 
step2:第二反应 会不会是TagType是int类型,但是sql语句里却是字符串呢?隐式类型转换导致执行计划出错,之前也碰到过。 试试吧, explain select `TagId`,`TagType`,`SubType`,`CommId`,`TagFlag`,`TagName`,`OrderId`,`Unum`,`IsBest`,`BrokerId`,`AddDate`,`UpdateDate`,`updatetime`,`tmpnum`,`cityid` from `xx` where `TagType` = 1 and `TagName` ='**高' order by `TagId` limit 1; +----+-------------+-----------------+-------+--------------------------------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+-------+--------------------------------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | xx | index | idx_lc_1,idx_tagName_brokerId_cityId | PRIMARY | 4 | NULL | 175 | Using where | +----+-------------+-----------------+-------+--------------------------------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec) 这条语句执行时间是: 0.89s 还是非常缓慢,看来不是这个原因。 step3:会不会是数据的问题呢? 因为从slow的分布看,基本上都是`TagName` ='**高' 的slow,其他的值也没发现,所以开始怀疑value,调整下看看呢 explain select `TagId`,`TagType`,`SubType`,`CommId`,`TagFlag`,`TagName`,`OrderId`,`Unum`,`IsBest`,`BrokerId`,`AddDate`,`UpdateDate`,`updatetime`,`tmpnum`,`cityid` from `xx` where `TagType` = 1 and `TagName` ='%%高' order by `TagId` limit 1; +----+-------------+-----------------+------+--------------------------------------+----------+---------+-------------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+------+--------------------------------------+----------+---------+-------------+------+------------------------------------+ | 1 | SIMPLE | xx | ref | idx_lc_1,idx_tagName_brokerId_cityId | idx_lc_1 | 773 | const,const | 3 | Using index condition; Using where | 
+----+-------------+-----------------+------+--------------------------------------+----------+---------+-------------+------+------------------------------------+ 这条语句执行时间:0.00s 哇塞,0s就解决战斗,但是这又是为什么呢? 再试一下:将‘**’高,换成‘*高’ explain select `TagId`,`TagType`,`SubType`,`CommId`,`TagFlag`,`TagName`,`OrderId`,`Unum`,`IsBest`,`BrokerId`,`AddDate`,`UpdateDate`,`updatetime`,`tmpnum`,`cityid` from `xx` where `TagType` = 1 and `TagName` ='*高' order by `TagId` limit 1; +----+-------------+-----------------+------+--------------------------------------+----------+---------+-------------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+------+--------------------------------------+----------+---------+-------------+------+------------------------------------+ | 1 | SIMPLE | xx | ref | idx_lc_1,idx_tagName_brokerId_cityId | idx_lc_1 | 773 | const,const | 3 | Using index condition; Using where | +----+-------------+-----------------+------+--------------------------------------+----------+---------+-------------+------+------------------------------------+ 执行计划也正确,执行时间也非常快。 然后笃定的认为问题找到了,竟然是 ‘**’导致的。 当我自己给自己sleep 10s 之后,开始思考,这是为什么呢? 等值匹配跟*有关系吗? step4: 再次调整语句 * 去掉limit呢? 
因为limit是执行计划的杀手,这个我想大部分DBA知道的吧。。。 explain select `TagId`,`TagType`,`SubType`,`CommId`,`TagFlag`,`TagName`,`OrderId`,`Unum`,`IsBest`,`BrokerId`,`AddDate`,`UpdateDate`,`updatetime`,`tmpnum`,`cityid` from `xx` where `TagType` = 1 and `TagName` ='*高' order by `TagId` ; +----+-------------+-----------------+------+--------------------------------------+-----------------------------+---------+-------+-------+---------------------------------- ------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+------+--------------------------------------+-----------------------------+---------+-------+-------+---------------------------------- ------------------+ | 1 | SIMPLE | xx | ref | idx_lc_1,idx_tagName_brokerId_cityId | idx_tagName_brokerId_cityId | 768 | const | 13854 | Using index condition; Using wher e; Using filesort | +----+-------------+-----------------+------+--------------------------------------+-----------------------------+---------+-------+-------+---------------------------------- ------------------+ 惊奇的发现,执行计划再次发生了改变。。。。 idx_tagName_brokerId_cityId 为什么又冒出来了呢? 
那我们再回头看看表结构: PRIMARY KEY (`TagId`), KEY `idx_4` (`IsBest`,`TagFlag`,`CommId`), KEY `idxnew` (`UpdateDate`), KEY `idx_lc_1` (`TagName`,`TagType`,`TagId`), KEY `idx_lc_2` (`CommId`,`TagName`,`TagType`), KEY `idx_tagName_brokerId_cityId` (`TagName`,`BrokerId`,`cityid`), KEY `idx_lc_3` (`SubType`,`TagType`,`cityid`) 去掉干扰项后: PRIMARY KEY (`TagId`), `idx_lc_1` (`TagName`,`TagType`,`TagId`), `idx_tagName_brokerId_cityId` (`TagName`,`BrokerId`,`cityid`), 执行计划竟然没有选择idx_lc_1,而是idx_tagName_brokerId_cityId,那么这个肯定是干扰索引。 所以,就更加清晰地定位到idx_tagName_brokerId_cityId索引的问题,然后开始调整这个索引,主要是第一个字段TagName的干扰,选择性的问题。 将: KEY `idx_tagName_brokerId_cityId` (`TagName`,`BrokerId`,`cityid`) => KEY `idx_tagName_brokerId_cityId` (`BrokerId`,`TagName`,`cityid`) step 5: 再次观察执行计划 explain select `TagId`,`TagType`,`SubType`,`CommId`,`TagFlag`,`TagName`,`OrderId`,`Unum`,`IsBest`,`BrokerId`,`AddDate`,`UpdateDate`,`updatetime`,`tmpnum`,`cityid` from `xx` where `TagType` = 1 and `TagName` ='**高' order by `TagId` limit 1; +----+-------------+-----------------+------+--------------------------------------+----------+---------+-------------+------+------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------------+------+--------------------------------------+----------+---------+-------------+------+------------------------------------+ | 1 | SIMPLE | xx | ref | idx_lc_1,idx_tagName_brokerId_cityId | idx_lc_1 | 773 | const,const | 3 | Using index condition; Using where | +----+-------------+-----------------+------+--------------------------------------+----------+---------+-------------+------+------------------------------------+ sql执行时间:0.00s 总结 至此,问题已经解决,联合索引的第一个前缀字段是如此重要。 索引调优是门艺术 展望 以后如何调整和优化类似的索引执行计划呢? 原则: 高索引基数(选择性好)的field,必须放前面。 希望MySQL的优化器以后越来越强大
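上面"高索引基数的 field 必须放前面"的原则,可以用一小段 Python 量化一下:选择性 = 不同值个数 / 总行数,越接近 1 区分度越好(采样数据为假设,仅用于示意):

```python
def selectivity(values):
    """计算一列采样值的选择性:不同值个数 / 总行数"""
    return len(set(values)) / len(values)

# 假设采样 100 行:TagName 大量重复(类似 '**高' 的热点值),BrokerId 比较分散
tag_names = ["**高"] * 90 + ["其他"] * 10
broker_ids = list(range(100))
# BrokerId 的选择性远高于 TagName,放到联合索引前缀更能过滤数据
```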
背景 先描述下故障吧 step0: 环境介绍 1. MySQL5.6.27 2. InnoDB 3. Centos 基本介绍完毕,应该跟大部分公司的实例一样 CREATE TABLE `new_table` ( `id` int(11) NOT NULL AUTO_INCREMENT, `x` varchar(200) DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=5908151 DEFAULT CHARSET=utf8 CREATE TABLE `old_table` ( `id` int(11) NOT NULL AUTO_INCREMENT, `xx` varchar(200) DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=5908151 DEFAULT CHARSET=utf8 step1: 业务需要导入历史数据到新表,新表有写入 1. insert into new_table(x) select xx from old_table 2. 批量插入在new_table上 step2: 结果 show processlist; 看到好多语句都处于executing阶段,DB假死,任何语句都非常慢,too many connections step3: 查看innoDB状况 show engine innodb status\G 结果: ==lock== ---TRANSACTION 7509250, ACTIVE 0 sec setting auto-inc lock --一堆 TABLE LOCK table `xx`.`y'y` trx id 7498948 lock mode AUTO-INC waiting --一堆 模拟问题,场景复现 让问题再次发生才好定位解决问题 表结构 | t_inc | CREATE TABLE `t_inc` ( `id` int(11) NOT NULL AUTO_INCREMENT, `x` varchar(199) DEFAULT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=5908151 DEFAULT CHARSET=utf8 | CREATE TABLE `t_inc_template` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `cookie_unique` varchar(255) NOT NULL DEFAULT '' COMMENT '', PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=5857489 DEFAULT CHARSET=utf8 step1 session1:insert into t_inc(x) select cookie_unique from t_inc_template; session2:mysqlslap -hxx -ulc_rx -plc_rx -P3306 --concurrency=10 --iterations=1000 --create-schema='lc' --query="insert into t_inc(x) select 'lanchun';" --number-of-queries=10 产生并发,让其自动分配自增id。 step2:观察 | 260126 | lc_rx | x:22833 | NULL | Sleep | 8 | | NULL | | 260127 | lc_rx | x:22834 | lc | Query | 8 | executing | insert into t_inc(x) select 'lanchun' | | 260128 | lc_rx | x:22835 | lc | Query | 8 | executing | insert into t_inc(x) select 'lanchun' | | 260129 | lc_rx | x:22836 | lc | Query | 8 | executing | insert into t_inc(x) select 'lanchun' | | 260130 | lc_rx | x:22837 | lc | Query | 8 | executing | insert into t_inc(x) select 'lanchun' | | 260131 | lc_rx | x:22838 | lc 
| Query | 8 | executing | insert into t_inc(x) select 'lanchun' | | 260132 | lc_rx | x:22840 | lc | Query | 8 | executing | insert into t_inc(x) select 'lanchun' | | 260133 | lc_rx | x:22839 | lc | Query | 8 | executing | insert into t_inc(x) select 'lanchun' | | 260134 | lc_rx | x:22842 | lc | Query | 8 | executing | insert into t_inc(x) select 'lanchun' | | 260135 | lc_rx | x:22841 | lc | Query | 8 | executing | insert into t_inc(x) select 'lanchun' | | 260136 | lc_rx | x:22843 | lc | Query | 8 | executing | insert into t_inc(x) select 'lanchun' | step3 show engine innodb status TABLE LOCK table `lc`.`t_inc` trx id 113776506 lock mode AUTO-INC waiting 一堆这样的waiting 然后卡死 好了问题已经复现,大概也知道是什么原因造成了,那就是:AUTO-INC lock 自增锁 接下来聊聊自增锁 和auto_increment相关的insert种类 INSERT-like 解释:任何会产生新记录的语句,都叫上INSERT-like,比如: INSERT, INSERT ... SELECT, REPLACE, REPLACE ... SELECT, and LOAD DATA 总之包括:“simple-inserts”, “bulk-inserts”, and “mixed-mode” inserts. simple insert 插入的记录行数是确定的:比如:insert into values,replace 但是不包括: INSERT ... ON DUPLICATE KEY UPDATE. Bulk inserts 插入的记录行数不能马上确定的,比如: INSERT ... SELECT, REPLACE ... SELECT, and LOAD DATA Mixed-mode inserts 这些都是simple-insert,但是部分auto increment值给定或者不给定 1. INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d'); 2. INSERT ... ON DUPLICATE KEY UPDATE 以上都是Mixed-mode inserts 锁模式 innodb_autoinc_lock_mode = 0 (“traditional” lock mode) 优点:极其安全 缺点:对于这种模式,写入性能最差,因为任何一种insert-like语句,都会产生一个table-level AUTO-INC lock innodb_autoinc_lock_mode = 1 (“consecutive” lock mode) 原理:这是默认锁模式,当发生bulk inserts的时候,会产生一个特殊的AUTO-INC table-level lock直到语句结束,注意:(这里是语句结束就释放锁,并不是事务结束哦,因为一个事务可能包含很多语句) 对于Simple inserts,则使用的是一种轻量级锁,只要获取了相应的auto increment就释放锁,并不会等到语句结束。 PS:当发生AUTO-INC table-level lock的时候,这种轻量级的锁也不会加锁成功,会等待。。。。 优点:非常安全,性能与innodb_autoinc_lock_mode = 0相比要好很多。 缺点:还是会产生表级别的自增锁 深入思考: 为什么这个模式要产生表级别的锁呢? 
因为:它要保证bulk insert自增id的连续性,防止在bulk insert的时候,被其他的insert语句抢走auto increment值。 innodb_autoinc_lock_mode = 2 (“interleaved” lock mode) 原理:当进行bulk insert的时候,不会产生table级别的自增锁,允许其他insert并发插入。来一条记录,插入时分配一个auto值,不会预分配。 优点:性能非常好,提高并发 缺点: 一条bulk insert得到的自增id可能不连续;SBR模式下不安全,会导致复制出错,主从不一致 延伸 当innodb_autoinc_lock_mode = 2 ,SBR为什么不安全 master 插入逻辑和结果 表结构:a primary key auto_increment, b varchar(3)
time_logic_clock  session1: bulk insert  session2: insert-like
0                 1,A
1                                        2,AA
2                 3,B
3                 4,C
4                                        5,CC
5                 6,D
最终的结果是:
a  b
1  A
2  AA
3  B
4  C
5  CC
6  D
slave的最终结果 因为binlog中session2的语句先执行完,导致结果为
a  b
1  AA
2  CC
3  A
4  B
5  C
6  D
RBR为什么就安全呢? 因为RBR都是根据row image来的,跟语句没关系。 好了,通过以上对比分析,相信大家都知道该如何抉择了吧? innodb_autoinc_lock_mode = 2 的一个小问题 由于innodb_autoinc_lock_mode = 2 分配自增id不再锁到语句结束,那么就有可能造成后面的id先提交,前面的id后提交 举个例子: session A: begin; insert into xx values() ; --这时候的自增id 是100 session B: begin; insert into xx values() ; --这时候的自增id 是101 session B: commit; --意味着id=101的记录先插入到数据库 session A: commit; --意味着id=100的记录后插入到数据库 最后,对于数据库来说,没有大问题,因为数据都插入进来了,只是后面的id先插入进来而已。 但是有的业务就有问题:比如,某些业务根据自增id进行遍历 select * from xx where id>1 limit N select * from xx where id>1+N limit N select * from xx where id>1+N+N limit N 如果id是顺序插入的,就没问题。 如果后面的id先插入进来(比如id=101),那么还没提交的id=100就被程序忽略掉了,对业务来说就丢了id=100 这条记录 解决方法:where id>N and add_date< (NOW() - INTERVAL 5 second) 取前5s的数据,降低并发写入带来的困扰 总结 如果你的binlog-format是row模式,而且不关心一条bulk-insert的auto值连续(一般不用关心),那么设置innodb_autoinc_lock_mode = 2 可以获得更好的写入性能。
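文中"后面的 id 先提交,按自增 id 遍历会丢记录"的场景,可以用一小段 Python 模拟出来(快照内容是假设的,traverse 只是对业务"where id > last limit N"遍历方式的示意):

```python
def traverse(snapshots, batch=10):
    """模拟业务按 where id > last limit batch 的方式遍历已提交的 id"""
    seen, last = [], 0
    for visible in snapshots:          # 每轮遍历时库中"已提交可见"的 id 集合
        page = sorted(i for i in visible if i > last)[:batch]
        seen += page
        if page:
            last = page[-1]            # 游标推进到本轮最大 id
    return seen

# 第一轮遍历时只有后分配的 id=101 已提交;第二轮 id=100 才提交
# 但游标已经走到 101,之后按 id > 101 遍历会永远跳过 100
result = traverse([{101}, {100, 101}])
```

这正是原文建议加上 add_date < (NOW() - INTERVAL 5 second) 这类条件、只取稍早数据的原因。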
GTID 和 START SLAVE START SLAVE 语法 START SLAVE [thread_types] [until_option] [connection_options] thread_types: [thread_type [, thread_type] ... ] thread_type: IO_THREAD | SQL_THREAD until_option: UNTIL { {SQL_BEFORE_GTIDS | SQL_AFTER_GTIDS} = gtid_set | MASTER_LOG_FILE = 'log_name', MASTER_LOG_POS = log_pos | RELAY_LOG_FILE = 'log_name', RELAY_LOG_POS = log_pos | SQL_AFTER_MTS_GAPS } * SQL_BEFORE_GTIDS = $gtid_set : $gtid_set之前的gtid都会被执行 eg. START SLAVE SQL_THREAD UNTIL SQL_BEFORE_GTIDS = 3E11FA47-71CA-11E1-9E33-C80AA9429562:11-56 表示,当SQL_thread 执行到3E11FA47-71CA-11E1-9E33-C80AA9429562:10 的时候停止,下一个事务是11 * SQL_AFTER_GTIDS = $gtid_set : $gtid_set之前,以及$gtid_set包含的gtid都会被执行 eg. START SLAVE SQL_THREAD UNTIL SQL_AFTER_GTIDS = 3E11FA47-71CA-11E1-9E33-C80AA9429562:11-56 表示,当SQL_thread 执行到3E11FA47-71CA-11E1-9E33-C80AA9429562:56 的时候停止,56是最后一个提交的事务。 如何从multi-threaded slave 转化成 single-threaded mode START SLAVE UNTIL SQL_AFTER_MTS_GAPS; SET @@GLOBAL.slave_parallel_workers = 0; START SLAVE SQL_THREAD; GTID 和 upgrade 如果 --gtid-mode=ON ,那么在使用 mysql_upgrade 的时候,不推荐使用--write-binlog 选项。 因为,mysql_upgrade 会更新MyISAM引擎的系统表, 而同时更新transaction table 和 non-transaction table 是gtid所不允许的 GTID 和 mysql.gtid_executed gtid_mode = (ON|ON_PERMISSIVE), bin_log = off gtid 会实时地写入到mysql.gtid_executed表中,且根据executed_gtids_compression_period=N来压缩 gtid_mode = (ON|ON_PERMISSIVE), bin_log = on gtid 不会实时地写入到mysql.gtid_executed,executed_gtids_compression_period会失效。 只有当binlog rotate或者mysql shutdown的时候才会写入mysql.gtid_executed 如果master 异常shutdown,gtid还没有写入到mysql.gtid_executed怎么办呢? 这种场景,一般通过mysql recovery机制写入到mysql.gtid_executed中 GTID 和 gtid_next http://dev.mysql.com/doc/refman/5.7/en/replication-options-gtids.html#sysvar_gtid_next 三种取值 * AUTOMATIC: Use the next automatically-generated global transaction ID. * ANONYMOUS: Transactions do not have global identifiers, and are identified by file and position only. * A global transaction ID in UUID:NUMBER format. 
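UNTIL 选项里的 gtid_set(形如 uuid:11-56)可以自己动手解析一下,加深对 SQL_BEFORE_GTIDS / SQL_AFTER_GTIDS 边界的理解。下面是单个 uuid 情形的简化 Python 示意(并非 MySQL 服务器内部实现):

```python
def parse_gtid_set(gtid_set: str):
    """'uuid:1-5:7' -> (uuid, {1,2,3,4,5,7}),仅支持单个 uuid 的简化写法"""
    uuid, *ranges = gtid_set.split(":")
    txns = set()
    for r in ranges:
        lo, _, hi = r.partition("-")           # '11-56' 或单个 '7'
        txns |= set(range(int(lo), int(hi or lo) + 1))
    return uuid.lower(), txns

uuid, txns = parse_gtid_set("3E11FA47-71CA-11E1-9E33-C80AA9429562:11-56")
# SQL_BEFORE_GTIDS: 执行到 10 为止,集合里的 11 不会被执行
# SQL_AFTER_GTIDS : 执行完集合里最后的 56 才停止
```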
QA: GTID 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 对应的事务顺序,从小到大,一定是顺序执行的吗? 答案:错,一般情况下事务是从小到大,顺序执行的。 但是如果再MTS场景,或者是人工设置gtid_next的情况下,就可能不是顺序执行了 dba:(none)> show master status; +--------------------+----------+--------------+------------------+-------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +--------------------+----------+--------------+------------------+-------------------------------------------+ | xx.000009 | 1719 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-46 | +--------------------+----------+--------------+------------------+-------------------------------------------+ 1 row in set (0.00 sec) dba:(none)> set gtid_next='0923e916-3c36-11e6-82a5-ecf4bbf1f518:50'; Query OK, 0 rows affected (0.00 sec) dba:lc> insert into gtid_1 values(5); Query OK, 1 row affected (0.00 sec) dba:lc> set gtid_next=AUTOMATIC; Query OK, 0 rows affected (0.00 sec) dba:lc> flush logs; Query OK, 0 rows affected (0.01 sec) dba:lc> show master status; +--------------------+----------+--------------+------------------+----------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +--------------------+----------+--------------+------------------+----------------------------------------------+ | xx.000010 | 210 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-46:50 | +--------------------+----------+--------------+------------------+----------------------------------------------+ 1 row in set (0.00 sec) dba:lc> insert into gtid_1 values(6); Query OK, 1 row affected (0.00 sec) dba:lc> insert into gtid_1 values(6); Query OK, 1 row affected (0.00 sec) dba:lc> insert into gtid_1 values(6); Query OK, 1 row affected (0.00 sec) dba:lc> show master status; +--------------------+----------+--------------+------------------+-------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | 
+--------------------+----------+--------------+------------------+-------------------------------------------+ | xx.000010 | 1125 | | | 0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 | +--------------------+----------+--------------+------------------+-------------------------------------------+ 1 row in set (0.00 sec) 在这里面,很明显0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-50 事务执行顺序为: 1-46(最先执行) , 50(其次执行) , 47-49(最后执行) GTID 和 MHA 请参考MHA源码解析 GTID模式下,需要relay-log吗?purge_relay_log设置为on可以吗? * replication 架构 host_1(host_1:3306) (current master) +--host_2(host_2:3306 candidate master) +--host_3(host_3:3306 no candidate) * 模拟: 1. 大量并发的写入,一直持续的往host_1写数据,造成并发写入很大的样子 2. host_2: stop slave , 造成host_2 延迟master很多的样子 3. host_1: purge binary logs, 造成master删掉了日志,导致host_2 修复的时候拿不到master的最新binlog 4. host_3: 一直正常同步master,拥有最新的binlog 5. host_3: flush logs; purge_relay_log=on; flush logs;一直循环flush logs,造成host_3已经将最新的relay log删掉了,host_2 是肯定拿不到host_3的relay 来修复自己了 6. 好了,一切条件均已经准备完毕,这个时候让master 宕机,这样就能模拟出在relay log没有的情况下,是否可以正常完成mha 切换了 ............... 7. 结果完成了正常切换,那mha是怎么再gtid模式下,在没有relay log的情况下,正常切换的恩? 8. 原理:host_2发现自己不是最新的slave,所以就去change master到host_3,通过host_3的binlog来恢复 9. 最后,当host_2和host_3都一致的情况下,再让host_3 重新指向host_2,完毕... *结论: gtid模式下,mha恢复切换的原理是不需要relay log的,只需要binlog GTID 和 备份(物理备份+逻辑备份) 物理备份:xtrabackup,其他等逻辑备份:mysqldump,mydumper,mysqlpump等 物理备份 备份的时候,只要在备份的时候记录下Executed_Gtid_Set($gtid_dump)即可,这个可以用于重新change master; reset master; SET @@GLOBAL.GTID_PURGED='$gtid_dump'; change master to master_auto_position=1; 逻辑备份 * mysqldump 中 sql_log_bin 默认是关闭的。 SET @@SESSION.SQL_LOG_BIN= 0; 所以这里用途非常重要 * 如果dump文件,你要在master上执行,那么必须这样备份: mysqldump xx --set-gtid-purged=OFF , 这样dump文件不会有SET @@SESSION.SQL_LOG_BIN= 0存在 * 如果dump文件,你要在slave上执行,想重新搭建一套slave环境。那么必须这样备份: mysqldump xx --set-gtid-purged=ON GTID 和 crash safe slave slave relay log 不完整怎么办?(relay-log-recover=0)relay-log-recover=1 不考虑,因为它会舍弃掉relay log 为何要讨论这个 * 官方解释: 1) 非GTID模式下,如何保证slave crash safe 呢? 
relay_log_recovery=1,relay_log_info_repository=TABLE,master_info_repository=TABLE,innodb_flush_log_at_trx_commit=1,sync_binlog=1 2) GTID模式下,如何保证slave crash safe呢? relay_log_recovery=(1|0),relay_log_info_repository=TABLE,master_info_repository=TABLE,innodb_flush_log_at_trx_commit=1,sync_binlog=1 以上两种情况配置,可以保证crash safe 这里看到区别就是relay_log_recovery了,gtid可以是any,这就需要讨论下了。 当relay_log_recovery=1时,当mysql crash的时候,会丢弃掉之前获取的relay,所以这个不会产生一致性问题。 当relay_log_recovery=0时 如果是非GTID模式,因为没办法保证写master_info.log和relay log file之间的原子性,会导致slave有可能多拉取一个事务,这样就有一致性问题。 如果是GTID模式,因为binlog-dump协议变了,master_info.log已经不用,slave会将executed_gtid与retrieved_gtid的并集发送给master,以此来获取没有执行过的gtid,所以没问题。 这里面的retrieved_gtid就是IO_thread从master获取的gtid,会写入到relay log。 模拟relay log不完整的情况 从上面可以知道,relay log的记录非常重要,那么relay log 不完整,会怎么样呢? 1) master 创建一张10G的表,然后执行全表更新操作。 2) 这时候,slave就在狂写relay log了 3) 此时,去slave kill掉mysql进程 4) 这时候,relay log就不完整了 WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output. 总结: relay log不完整,mysql起来后,会重新获取不完整的这个events,sql_thread在回放的时候,如果发现events不完整,会跳过,不会影响到同步。 GTID 和 MTS MTS_GAPS 如果MTS遇到Gap transaction怎么办? 1. 先解决问题 START SLAVE UNTIL SQL_AFTER_MTS_GAPS 2. 考虑设置slave_preserve_commit_order=1 GTID 生产环境中必须考虑的问题 * Migration to GTID replication * Non transactionally safe statements will raise errors now * MySQL Performance in GTID * mysql_upgrade script * Errant transactions * Filtration on the slave * Injecting empty transactions 以上问题请参考 GTID原理与实战 GTID 和 online升级 online升级丢数据?online升级会报错吗?online升级步骤?请参考 GTID原理与实战 故障案例一 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate anonymous transaction when @@GLOBAL.GTID_MODE = ON ...' 
两种情况: 1)slave的gtid_mode=on时,却还接受着来自master的non-gtid transaction的时候,会报以上错误。 2)事实上,不管slave的gtid_mode是on,还是off,只要master的gtid_mode=on,那么整个replication slave,都必须是gtid的事务 解决方案:在master上从gtid_mode=ON_PERMISSIVE 设置到 gtid_mode=ON之前,如何保证现在所有non-gtid事务都已经在slave执行完毕了? 很简单,两种方法: 第一种方案: 1) 在master上,当设置gtid_mode=ON_PERMISSIVE的时候,其实就已经产生gtid事务了,这个时候show master status;记下这个位置 $pos 2)然后再每个slave上,执行 SELECT MASTER_POS_WAIT(file, position); 第二种更加直接方案: 0)默认情况下,slave的gtid_mode都是off,所以去slave上show master status 都应该是file,position 1) 先在master上,设置gtid_mode=ON_PERMISSIVE 2)然后再每台slave上再次执行show master status,如果发现结果由file,position 变成 GTID_EXECUTED,那么说明slave已经将non-gtid全部执行完毕了 故障案例二 Last_IO_Error: The replication receiver thread cannot start because the master has GTID_MODE = ON and this server has GTID_MODE = OFF. slave的gtid_mode=off时,却还接受着来自master的gtid transaction的时候,会报以上错误。 GTID 和 mysqlbinlog mysqlbinlog 参数: * --exclude-gtids : 排除这些gtid * --include-gtids : 只打印这些gtid * --skip-gtids : 所有gtid都不打印 可以用--skip-gtids 做传统模式的恢复。但是这个是官方不推荐的。 mysqlbinlog --skip-gtids binlog.000001 > /tmp/dump.sql GTID 和 重要函数 gtid_set 用引号扩起来 Name Description GTID_SUBSET(subset,set) returns true (1) if all GTIDs in subset are also in set GTID_SUBTRACT(set,subset) returns only those GTIDs from set that are not in subset WAIT_FOR_EXECUTED_GTID_SET(gtid_set[, timeout]) Wait until the given GTIDs have executed on slave. 
WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(gtid_set, timeout) Wait until the given GTIDs have executed on slave GTID_SUBSET(subset,set) subset 是否是 set 的子集,如果是返回1,不是返回0 dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57'); +-----------------------------------------------------------------------------------------------------+ | GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57') | +-----------------------------------------------------------------------------------------------------+ | 1 | +-----------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57'); +--------------------------------------------------------------------------------------------------------+ | GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57') | +--------------------------------------------------------------------------------------------------------+ | 1 | +--------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:23'); +--------------------------------------------------------------------------------------------------+ | GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:23','3E11FA47-71CA-11E1-9E33-C80AA9429562:23') | +--------------------------------------------------------------------------------------------------+ | 1 | +--------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) dba:(none)> SELECT GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57'); 
+--------------------------------------------------------------------------------------------------------+
| GTID_SUBSET('3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25','3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57') |
+--------------------------------------------------------------------------------------------------------+
| 0 |
+--------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

GTID_SUBTRACT(set,subset)

Returns the GTIDs that are only in set and not in subset.

dba:(none)> SELECT GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:21');
+-------------------------------------------------------------------------------------------------------+
| GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:21') |
+-------------------------------------------------------------------------------------------------------+
| 3e11fa47-71ca-11e1-9e33-c80aa9429562:22-57 |
+-------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:(none)> SELECT GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25');
+----------------------------------------------------------------------------------------------------------+
| GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:20-25') |
+----------------------------------------------------------------------------------------------------------+
| 3e11fa47-71ca-11e1-9e33-c80aa9429562:26-57 |
+----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:(none)> SELECT GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:23-24');
+----------------------------------------------------------------------------------------------------------+
|
GTID_SUBTRACT('3E11FA47-71CA-11E1-9E33-C80AA9429562:21-57','3E11FA47-71CA-11E1-9E33-C80AA9429562:23-24') |
+----------------------------------------------------------------------------------------------------------+
| 3e11fa47-71ca-11e1-9e33-c80aa9429562:21-22:25-57 |
+----------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

What are these two functions good for? With GTID_SUBSET, the master can check whether a slave's executed set is a subset of its own -- a convenient data-consistency check. With GTID_SUBTRACT, assuming the slave is a subset of the master, it is easy to compute exactly the GTIDs that the master has and the slave lacks and ship them over, reaching eventual consistency.

WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS(gtid_set, timeout)

timeout defaults to 0, meaning wait indefinitely until the slave has executed the whole gtid_set. If it has all been executed, the function returns the number of GTIDs executed; otherwise it waits up to timeout seconds. If the slave is not running, or GTID is not enabled, it returns NULL.

dba:lc> SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3',1);
+---------------------------------------------------------------------------------+
| WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3',1) |
+---------------------------------------------------------------------------------+
| 0 |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)

stop slave;

dba:lc> SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3',1);
+---------------------------------------------------------------------------------+
| WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3',1) |
+---------------------------------------------------------------------------------+
| NULL | ## NULL whenever the slave IO/SQL threads are not running, regardless of whether the gtid_set has been executed
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)

WAIT_FOR_EXECUTED_GTID_SET(gtid_set[, timeout])

Same meaning as WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS, with a single difference: it does not return NULL when the slave replication threads are not running.

stop slave;
dba:lc> SELECT WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3');
+------------------------------------------------------------------------+
| WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-3') |
+------------------------------------------------------------------------+
| 0 | ## 0 = everything already executed, independent of whether the slave IO/SQL threads are running
+------------------------------------------------------------------------+
1 row in set (0.00 sec)

dba:lc> SELECT WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-4',1);
+--------------------------------------------------------------------------+
| WAIT_FOR_EXECUTED_GTID_SET('0923e916-3c36-11e6-82a5-ecf4bbf1f518:1-4',1) |
+--------------------------------------------------------------------------+
| 1 | ## 1 = the 1-second timeout expired before :4 was executed
+--------------------------------------------------------------------------+
1 row in set (1.00 sec)

GTID limitations and drawbacks

* Updating a nontransactional and a transactional table in the same statement or transaction causes GTID problems.
* CREATE TABLE ... SELECT statements are unsafe with GTIDs.
* CREATE TEMPORARY TABLE and DROP TEMPORARY TABLE are also unsafe with GTIDs.
* enforce-gtid-consistency must be set to ON; it rejects the unsafe statements of the two previous points.
* sql_slave_skip_counter is not allowed; skip transactions by injecting empty transactions instead.
* GTID and mysqldump: with GTIDs enabled, mysqldump disables sql_log_bin inside the dump by default, so after the dump is loaded into a master no GTIDs are written to the binlog
( this can be avoided with --set-gtid-purged=OFF )
* GTID and mysql_upgrade: some of the system tables use the MyISAM engine, which causes problems ( can be avoided with --write-binlog=off )

References

Official documentation:
1.5 Server and Status Variables and Options Added, Deprecated, or Removed in MySQL 5.7
5.5.4 mysqldump — A Database Backup Program
5.6.7 mysqlbinlog — Utility for Processing Binary Log Files
13.17 Functions Used with Global Transaction IDs
14.4.2.1 CHANGE MASTER TO Syntax
14.4.2.6 START SLAVE Syntax
14.7.5.34 SHOW SLAVE STATUS Syntax
18.1.3 Replication with Global Transaction Identifiers
18.1.3.1 GTID Concepts
18.1.3.2 Setting Up Replication Using GTIDs
18.1.3.3 Using GTIDs for Failover and Scaleout
18.1.3.4 Restrictions on Replication with GTIDs
18.1.5.1 Replication Mode Concepts
18.1.5.2 Enabling GTID Transactions Online
18.1.5.3 Disabling GTID Transactions Online
18.1.6.1 Replication and Binary Logging Option and Variable Reference
18.1.6.5 Global Transaction ID Options and Variables
18.3.2 Handling an Unexpected Halt of a Replication Slave
18.4.1.34 Replication and Transaction Inconsistencies
18.4.3 Upgrading a Replication Setup
19.2.1.5 Adding Instances to the Group
24.10.7.1 The events_transactions_current Table
24.10.11.6 The replication_applier_status_by_worker Table

Third-party:
> http://www.fromdual.ch/things-you-should-consider-before-using-gtid
> http://www.fromdual.ch/gtid_in_action
> http://www.fromdual.ch/replication-troubleshooting-classic-vs-gtid
> http://www.fromdual.ch/replication-in-a-star
> http://www.fromdual.com/controlling-worldwide-manufacturing-plants-with-mysql
> https://www.percona.com/blog/2014/05/19/errant-transactions-major-hurdle-for-gtid-based-failover-in-mysql-5-6/
> https://www.percona.com/blog/2016/12/01/database-daily-ops-series-gtid-replication-binary-logs-purge/
> https://www.percona.com/blog/2016/11/10/database-daily-ops-series-gtid-replication/
> https://www.percona.com/blog/2015/12/02/gtid-failover-with-mysqlslavetrx-fix-errant-transactions/
>
https://www.percona.com/blog/2014/05/09/gtids-in-mysql-5-6-new-replication-protocol-new-ways-to-break-replication/
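Since sql_slave_skip_counter is unavailable with GTIDs, the usual way to skip a broken transaction on a slave is the empty-transaction injection mentioned above. A sketch of the procedure (the UUID:N value is a placeholder for the GTID reported in SHOW SLAVE STATUS):

```sql
STOP SLAVE;
SET GTID_NEXT = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:N';  -- placeholder: the GTID to skip
BEGIN; COMMIT;                 -- empty transaction claims that GTID
SET GTID_NEXT = 'AUTOMATIC';
START SLAVE;
```

Once the empty transaction is committed, the slave considers that GTID executed and the SQL thread moves past it.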
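The interval arithmetic behind GTID_SUBSET and GTID_SUBTRACT is easy to reason about outside the server. Below is a minimal Python sketch of the same semantics (not MySQL code; it handles single-UUID sets only and ignores normalization corner cases):

```python
def parse(gtid_set):
    """Parse 'uuid:a-b:c-d' into (lowercased uuid, set of transaction numbers)."""
    uuid, _, ranges = gtid_set.lower().partition(":")
    txns = set()
    for r in ranges.split(":"):
        lo, _, hi = r.partition("-")
        txns.update(range(int(lo), int(hi or lo) + 1))
    return uuid, txns

def gtid_subset(subset, whole):
    """Mimic GTID_SUBSET: 1 if every GTID in subset is also in whole, else 0."""
    u1, t1 = parse(subset)
    u2, t2 = parse(whole)
    return 1 if u1 == u2 and t1 <= t2 else 0

def gtid_subtract(whole, subset):
    """Mimic GTID_SUBTRACT: the GTIDs in whole that are not in subset."""
    u1, t1 = parse(whole)
    u2, t2 = parse(subset)
    left = sorted(t1 - t2) if u1 == u2 else sorted(t1)
    # Re-pack consecutive transaction numbers into a-b ranges.
    out, i = [], 0
    while i < len(left):
        j = i
        while j + 1 < len(left) and left[j + 1] == left[j] + 1:
            j += 1
        out.append(str(left[i]) if i == j else f"{left[i]}-{left[j]}")
        i = j + 1
    return u1 + ":" + ":".join(out)
```

This reproduces the transcripts above, e.g. subtracting `:23-24` from `:21-57` yields `3e11fa47-71ca-11e1-9e33-c80aa9429562:21-22:25-57`.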
device is busy -- notes on this umount problem

Environment
CentOS release 6.6 (Final)
Linux tjtx135-2-90.58os.org 2.6.32-504.23.4.el6.x86_64 #1 SMP Tue Jun 9 20:57:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

There is plenty of advice online for this problem, roughly as follows:

Problem:
umount: /data: device is busy.
(In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))

Option 1: lsof
lsof /data
If open files show up, kill the owning processes.

Option 2: fuser
fuser /data
Same idea: find the processes using the partition and kill them.

Indeed, most device-is-busy problems are solved by one of these two. What follows, of course, is a special case that neither of them can solve.

Problem and troubleshooting
umount: /data: device is busy.
(In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))

Approach:
lsof /data -- nothing
fuser /data -- nothing
Dig further:
shell> mount
/dev/sda1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sdb1 on /data type xfs (rw,noatime,nodiratime,osyncisdsync,inode64)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
shell> df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 99G 2.2G 92G 3% /
tmpfs tmpfs 63G 0 63G 0% /dev/shm
/dev/sdb1 xfs 28T 6.3T 22T 23% /data
shell> ll /data
drwxr-xr-x 6 root root 142 Feb 6 16:35 FULL_BACKUP
drwxr-xr-x 4 root root 67 Feb 6 16:46 tmp
shell> ll /
lrwxrwxrwx 1 root root 10 Feb 6 16:12 tmp -> /data/tmp/

Suddenly we notice a symlink pointing at /data/tmp. With this lead, test right away: can we umount once the symlink is removed? Tested -- still the same error.
Since /tmp -> /data/tmp, and /data/tmp shows no open files, what about /tmp itself?
shell> lsof | grep /tmp
atopacctd 2974 root cwd DIR 8,17 55 195 /data/tmp
sshd 4634 root 7u unix 0xffff88204f2cc200 0t0 25300 /tmp/ssh-BxMiDQ4634/agent.4634
Kill these two processes, and the problem is solved.

Summary
For this class of problem, first run lsof on whichever directory is busy.
If that does not solve it, check whether the directory is the target of a symbolic or hard link.
MySQL operations: the magic parameter sql_safe_updates
http://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_sql_safe_updates

Background (why)
The target is fat-finger operations on large tables.
If only a few rows were changed, the application side may well be able to recover them from its own logs; failing that, the binlog can be replayed in reverse.
If the damaged table is very small, it is also no big deal: full backup + binlog replay, or a flashback tool, recovers it cleanly.
But if the table to restore is huge -- say 100G, or 100T -- not even the gods can save you.
So with this one magic parameter we can avoid roughly 80% of fat-finger scenarios.
PS: not 100% -- the hands-on section below shows how it can still be bypassed.

A fat-finger case from production

update xx set url_desc='防不胜防' WHERE 4918=4918 AND SLEEP(5)-- xYpp' where id=7046

This table is 500G in production; one slip, and restoring 500G of data would interrupt service for a very long time.
With sql_safe_updates set, this class of accident is elegantly avoided.

How it works, hands-on

Table structure
dba:lc> show create table tb;
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tb | CREATE TABLE `tb` ( `id` int(11) NOT NULL, `id_2` int(11) DEFAULT NULL COMMENT 'lc22222233333', `id_3` text, PRIMARY KEY (`id`), KEY `idx_2` (`id_2`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

UPDATE tests
The official documentation says: UPDATE statements must have a WHERE clause that uses a key or a LIMIT clause, or both.
* no WHERE clause
dba:lc> update tb set id_2=2 ;
ERROR 1175 (HY000): You are using safe update mode and you tried to update a table without a WHERE that uses a KEY column
* WHERE on an indexed column, no LIMIT (allowed -- aborted here by hand)
dba:lc> update tb set id_3 = 'bb' where id > 0;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
* WHERE on a non-indexed column, no LIMIT
dba:lc> update tb set id_3 = 'bb' where id_3 = '0';
ERROR 1175 (HY000): You are using safe update mode and you tried to update a table without a WHERE that uses a KEY column
* WHERE on an indexed column, with LIMIT
dba:lc> update tb set id_3 = 'bb' where id > 0 limit 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
* WHERE on a non-indexed column, with LIMIT
dba:lc> update tb set id_3 = 'bb' where id_3 > 0 limit 1;
Query OK, 0 rows affected (0.26 sec)
Rows matched: 0 Changed: 0 Warnings: 0

Conclusion: only two UPDATE cases are blocked:
1. no index used and no LIMIT
2. no WHERE clause and no LIMIT

DELETE tests
The official documentation says: DELETE statements must have both.
* no WHERE clause
dba:lc> delete from tb ;
ERROR 1175 (HY000): You are using safe update mode and you tried to update a table without a WHERE that uses a KEY column
* WHERE on an indexed column, no LIMIT (allowed -- aborted by hand)
dba:lc> delete from tb where id = 0 ;
Query OK, 0 rows affected (0.00 sec)
dba:lc> delete from tb where id > 0 ;
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
dba:lc> explain select * from tb where id_2 > 0;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | tb | NULL | ALL | idx_2,idx_3 | NULL | NULL | NULL | 245204 | 50.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
dba:lc> delete from tb where id_2 > 0 ;
^C^C
-- query aborted
^C^C -- query aborted
ERROR 1317 (70100): Query execution was interrupted
* WHERE on a non-indexed column, no LIMIT
dba:lc> delete from tb where id_3 = 'a' ;
ERROR 1175 (HY000): You are using safe update mode and you tried to update a table without a WHERE that uses a KEY column
* WHERE on an indexed column, with LIMIT
dba:lc> delete from tb where id = 205 limit 1 ;
Query OK, 1 row affected (0.00 sec)
* WHERE on a non-indexed column, with LIMIT
dba:lc> delete from tb where id_3 = 'aaaaa' limit 1 ;
Query OK, 1 row affected (0.00 sec)

The tests show that the official description of DELETE is inaccurate.
Conclusion: only two DELETE cases are blocked:
1. no index used and no LIMIT
2. no WHERE clause and no LIMIT

In summary, for both UPDATE and DELETE a statement is blocked only when:
1. no index is used and there is no LIMIT
2. there is no WHERE clause and no LIMIT

With that understood, the remaining question is rollout.
For new applications and new DBs, just set the parameter, and set it in the test environment too, so that developers hit the problem there and dangerous statements never make it to a new service.

What about existing applications? Our approach: since we run MySQL 5.6, another magical feature is P_S (performance_schema), through which we can find queries that use no index.
This runs into yet another problem -- possibly a performance_schema bug: it cannot tell whether DML used an index. Extensive testing confirmed that SUM_NO_INDEX_USED and SUM_NO_GOOD_INDEX_USED in events_statements_summary_by_digest are ineffective for DML.

That being so, we simply analyzed the DML ourselves, rewriting each DML statement into the corresponding SELECT -- e.g. update tb set id = S where id = S; becomes select * from tb where id = '1' -- and then ran EXPLAIN on the SELECT: type=ALL would mean no index was used, i.e. a full-table DML.
Reality, however, bit quickly. Because we had to construct concrete SQL ourselves, and neither the data distribution nor the constructed statement could be realistic, the resulting plans were wildly off (type=None). Full-table DML could therefore slip through uncaught, so we looked for another way.

It is actually simple: sql_safe_updates blocks only two cases:
1. WHERE with no index and no LIMIT
2. no WHERE clause and no LIMIT
So we extract the columns and keywords following the DML keyword and classify:
1. Does the DML carry a LIMIT?
   If yes, allow it. -- (with LIMIT it is always allowed)
   If not, continue.
2. Does the DML have a WHERE clause?
   If not, reject. -- (no WHERE, no LIMIT: guaranteed full-table update, reject outright)
   If yes, continue.
3. Do the columns after WHERE match an index prefix?
   If yes, allow. -- (indexed WHERE column, no LIMIT: allowed)
   If not, reject. -- (non-indexed WHERE column, no LIMIT: reject outright)

Feels perfect now, right? Again, the ideal and the real diverge; here are some impressive fish that slip through the net:
1. Implicit type conversion
update tb set id=2 where id_change = 1; -- note: column id_change is varchar.
2.
Function calls
UPDATE pay_log_id SET id=LAST_INSERT_ID(id + 1)

At least these two kinds cannot be caught, so the problem remained and we kept looking.
Back to first principles: what do we actually want? Right -- to find the DML that uses no index. Then a thought drifted by: can MySQL itself log statements that use no index?
Sure enough, a search for "index" in the official docs turned up log_queries_not_using_indexes, exactly what we urgently needed. It logs SELECTs as well, but that is fine; we just filter the SELECTs out.
So the final solution: in the test environment set log_queries_not_using_indexes=1 (and long_query_time=1000, so ordinary slow queries do not get mixed in), run for a month, catch and fix every DML that used no index, and then turn on sql_safe_updates=1 in production with confidence.

Note the interaction when log_queries_not_using_indexes=1 and sql_safe_updates=1 are both set:
1) delete from tb_1 ; -- rejected by sql_safe_updates, and not recorded in the slow log
2) update tb_1 set id = 1; -- rejected by sql_safe_updates, and also recorded in the slow log
That is the difference between the two; use them wisely.

Summary
If, after setting sql_safe_updates = 1 in production, the odd DML still gets rejected, the application side can consider:
1) If you are certain the SQL is fine: set sql_safe_updates=0; but developers must accept the consequences of doing so.
2) Rewrite the SQL so that it uses an indexed column.
3) Why not recommend adding LIMIT? Because in most scenarios DML + LIMIT = a nondeterministic SQL, which can easily make master and slave inconsistent. (DML + LIMIT is banned in our production.)

Dear readers, enjoy this little weapon at your leisure.
As for P_S and sys, if you have fresh ideas, let's discuss and explore together.
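The three-step screening rule above (LIMIT? WHERE? indexed prefix?) can be sketched as a tiny filter. This is an illustration, not a real SQL parser: it uses naive regexes and a hypothetical per-table index catalog, handles only simple single-table statements, and -- exactly as the article warns -- is still fooled by implicit type conversion and function calls:

```python
import re

def is_safe_dml(sql, indexed_columns):
    """Mimic the article's screening of UPDATE/DELETE statements.
    `indexed_columns` stands in for the table's index-prefix columns
    (a hypothetical catalog lookup)."""
    s = sql.strip().rstrip(";").lower()
    # Step 1: a LIMIT clause always passes.
    if re.search(r"\blimit\s+\d+", s):
        return True
    # Step 2: no WHERE and no LIMIT -> guaranteed full-table DML, reject.
    m = re.search(r"\bwhere\b(.*)", s)
    if not m:
        return False
    # Step 3: allow only if some WHERE column matches an indexed column.
    where_tokens = set(re.findall(r"[a-z_][a-z0-9_]*", m.group(1)))
    return bool(where_tokens & {c.lower() for c in indexed_columns})
```

For example, with `{"id", "id_2"}` as the indexed columns of `tb`, `delete from tb` and `update tb set id_3='bb' where id_3='0'` are rejected, while the LIMIT and indexed-WHERE variants pass -- matching the transcripts above.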
Why benchmark, and what to test?
There are many kinds of tests with many goals; mine here are just two:
* measure MySQL's peak IO
* compare the impact of different MySQL versions, parameters, hardware, and operating systems on MySQL performance

Why sysbench?
Because MySQL's own official benchmarks use sysbench. Prefer the latest version: sysbench above 0.4 can report statistics in real time.

Where to download: http://github.com/akopytov/sysbench
Where are the docs: http://github.com/akopytov/sysbench

Installation
* basic steps
cd sysbench-1.0; ./autogen.sh; ./configure --with-mysql-includes=/usr/local/mysql/include --with-mysql-libs=/usr/local/mysql/lib/; make; make install;
* a failure you may hit along the way
sysbench: error while loading shared libraries: libmysqlclient.so.20: cannot open shared object file: No such file or directory
* fix
export LD_LIBRARY_PATH=/usr/local/mysql/lib/;
* verify the install
shell> sysbench --version
sysbench 1.0

Core sysbench usage
It can test many things: io, cpu, memory, mysql, oracle, pg, and so on. Here I cover the two I care about: IO & MySQL.
The first half below is the 0.4 syntax; versions above 0.4 differ, and the differences are noted.

1. General syntax
sysbench [common-options] --test=name [test-options] command

command
* prepare : the load phase. fileio test: create files of the given size. oltp test: create tables of the given size.
* run : the actual measurement phase.
* cleanup : tear-down; removes the test data.

common-options (only the frequently used ones)
* --num-threads : number of threads (default 1)
* --max-requests : number of requests, 0 = unlimited (default 1000)
* --max-time : how long to run, 0 = unlimited (default 0)
* --test : which module to test (required)
* --report-interval : print interim statistics periodically (new in versions above 0.4)

--test=fileio options
First, --file-test-mode:
* seqwr sequential write
* seqrewr sequential rewrite
* seqrd sequential read
* rndrd random read
* rndwr random write
* rndrw combined random read/write

fileio test options
* --file-num : number of files to create (default 128)
* --file-block-size : IO operation size (default 16k)
* --file-total-size : total size of all files (default 2G)
* --file-test-mode : seqwr, seqrewr, seqrd, rndrd, rndwr, rndrw (see above; required)
* --file-io-mode : I/O mode: sync, async, fastmmap, slowmmap (default sync)
* --file-extra-flags : open files with extra flags (O_SYNC, O_DSYNC, O_DIRECT)
* --file-fsync-freq : fsync after this many requests (default 100)
* --file-fsync-all : fsync after every write (default no)
* --file-fsync-mode : which call to use to sync files: fsync, fdatasync (default fsync)
* --file-rw-ratio : read/write ratio for random requests (default 1.5)

Examples:
$ sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw prepare
$ sysbench --num-threads=16 --test=fileio --file-total-size=3G
--file-test-mode=rndrw run
$ sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw cleanup

OLTP-MySQL
This mode tests real database performance. The prepare phase creates the table, sbtest by default:
CREATE TABLE `sbtest` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `k` int(10) unsigned NOT NULL default '0',
  `c` char(120) NOT NULL default '',
  `pad` char(60) NOT NULL default '',
  PRIMARY KEY (`id`),
  KEY `k` (`k`));

In the run phase:
simple mode: SELECT c FROM sbtest WHERE id=N
Point queries: SELECT c FROM sbtest WHERE id=N
Range queries: SELECT c FROM sbtest WHERE id BETWEEN N AND M
Range SUM() queries: SELECT SUM(K) FROM sbtest WHERE id BETWEEN N and M
Range ORDER BY queries: SELECT c FROM sbtest WHERE id between N and M ORDER BY c
Range DISTINCT queries: SELECT DISTINCT c FROM sbtest WHERE id BETWEEN N and M ORDER BY c
UPDATEs on index column: UPDATE sbtest SET k=k+1 WHERE id=N
UPDATEs on non-index column: UPDATE sbtest SET c=N WHERE id=M
DELETE queries: DELETE FROM sbtest WHERE id=N
INSERT queries: INSERT INTO sbtest VALUES (...)

Common oltp test-mode options
* --oltp-table-name : table name (default sbtest)
* --oltp-table-size : rows per table (default 10000)
* --oltp-tables-count : number of tables (default 1)
* --oltp-dist-type : hot-data distribution {uniform, gaussian, special} (default special)
* --oltp-dist-pct : special: fraction of rows treated as hot (default 1)
* --oltp-dist-res : special: access frequency of the hot rows (default 75)
* --oltp-test-mode : simple, complex (see above; default complex)
* --oltp-read-only : SELECT requests only (default off)
* --oltp-skip-trx : do not wrap in transactions (default off)
* --oltp-point-selects : simple point SELECTs per transaction (default 10)
* --oltp-simple-ranges : simple range queries per transaction (default 1)
* --oltp-sum-ranges : number of SUM ranges (default 1)
* --oltp-order-ranges : number of ORDER ranges (default 1)

mysql test options
--mysql-host=[LIST,...] MySQL server host [localhost]
--mysql-port=[LIST,...] MySQL server port [3306]
--mysql-socket=[LIST,...]
MySQL socket
--mysql-user=STRING MySQL user [sbtest]
--mysql-password=STRING MySQL password []
--mysql-db=STRING MySQL database name [sbtest]
--mysql-table-engine=STRING storage engine to use for the test table {myisam,innodb,bdb,heap,ndbcluster,federated} [innodb]
--mysql-engine-trx=STRING whether storage engine used is transactional or not {yes,no,auto} [auto]
--mysql-ssl=[on|off] use SSL connections, if available in the client library [off]
--mysql-ssl-cipher=STRING use specific cipher for SSL connections []
--mysql-compression=[on|off] use compression, if available in the client library [off]
--myisam-max-rows=N max-rows parameter for MyISAM tables [1000000]
--mysql-debug=[on|off] dump all client library calls [off]
--mysql-ignore-errors=[LIST,...] list of errors to ignore, or "all" [1213,1020,1205]
--mysql-dry-run=[on|off] Dry run, pretend that all MySQL client API calls are successful without executing them [off]

That concludes the 0.4 syntax. In versions above 0.4 the syntax changes, especially for the oltp module: instead of --test=oltp, pass a Lua script by its full path, --test=xx.lua.

FileIO in practice
Disk: S3610 * 6 RAID10, 128G RAM; measure the peak IOPS in each scenario.

Random read/write (3:2, OLTP-like):
* sysbench --num-threads=16 --report-interval=3 --max-requests=0 --max-time=300 --test=fileio --file-num=200 --file-total-size=200G --file-test-mode=rndrw --file-block-size=16384 --file-extra-flags=direct run

Random read/write (5:1, OLTP-like):
* sysbench --num-threads=16 --report-interval=3 --max-requests=0 --max-time=300 --test=fileio --file-num=200 --file-total-size=200G --file-test-mode=rndrw --file-block-size=16384 --file-extra-flags=direct --file-rw-ratio=5 run

Random write:
* sysbench --num-threads=16 --report-interval=3 --max-requests=0 --max-time=300 --test=fileio --file-num=200 --file-total-size=200G --file-test-mode=rndwr --file-block-size=16384 --file-extra-flags=direct run

Random read:
* sysbench --num-threads=16 --report-interval=3 --max-requests=0 --max-time=300 --test=fileio --file-num=200 --file-total-size=200G --file-test-mode=rndrd --file-block-size=16384 --file-extra-flags=direct run

MySQL 5.6 vs MySQL 5.7 tests
Disk: S3610 * 6
RAID10, 128G RAM

Point select
* generate data
sysbench --num-threads=128 --report-interval=3 --max-requests=0 --max-time=300 --test=/root/sysbench-1.0/sysbench/tests/db/select.lua --mysql-table-engine=innodb --oltp-table-size=50000000 --mysql-user=sysbench --mysql-password=sysbench --oltp-tables-count=2 --mysql-host=xx --mysql-port=3306 prepare
* run
sysbench --num-threads=128 --report-interval=3 --max-requests=0 --max-time=300 --test=/root/sysbench-1.0/sysbench/tests/db/select.lua --mysql-table-engine=innodb --oltp-table-size=50000000 --mysql-user=sysbench --mysql-password=sysbench --oltp-tables-count=2 --mysql-host=xx --mysql-port=3306 run

Point oltp
* generate data
sysbench --num-threads=128 --report-interval=3 --max-requests=0 --max-time=300 --test=/root/sysbench-1.0/sysbench/tests/db/oltp.lua --mysql-table-engine=innodb --oltp-table-size=50000000 --mysql-user=sysbench --mysql-password=sysbench --oltp-tables-count=2 --mysql-host=xx --mysql-port=3306 prepare
* run
sysbench --num-threads=128 --report-interval=3 --max-requests=0 --max-time=300 --test=/root/sysbench-1.0/sysbench/tests/db/oltp.lua --mysql-table-engine=innodb --oltp-table-size=50000000 --mysql-user=sysbench --mysql-password=sysbench --oltp-tables-count=2 --mysql-host=xx --mysql-port=3306 run

Conclusions
* On performance: although 5.7 is officially claimed to be up to 3x faster than 5.6, in our own tests 5.7 came out slightly behind 5.6.
* Will we still choose 5.7 for production? Of course -- 5.7's new features are too attractive.

References:
https://www.percona.com/blog/2016/04/07/mysql-5-7-sysbench-oltp-read-results-really-faster/
http://dimitrik.free.fr/blog/archives/2013/09/mysql-performance-reaching-500k-qps-with-mysql-57.html
https://github.com/akopytov/sysbench
http://www.mysql.com/why-mysql/benchmarks/
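The per-interval lines printed by --report-interval are handy for live graphing or averaging across a run. A small Python sketch that averages the tps values -- the line format here is assumed from sysbench 1.0 output (e.g. `[ 3s ] threads: 16 tps: 1234.56 qps: ...`) and may differ between versions:

```python
import re

# Matches the per-interval "tps: <float>" field (format assumed from sysbench 1.0).
TPS_RE = re.compile(r"\btps:\s*([\d.]+)")

def average_tps(report_lines):
    """Average the per-interval tps values printed by --report-interval."""
    vals = [float(m.group(1)) for line in report_lines
            if (m := TPS_RE.search(line))]
    return sum(vals) / len(vals) if vals else 0.0
```

Feeding it the stderr of a run (`sysbench ... run 2>&1 | ...`) gives a quick sanity check that the interim throughput matches the final summary.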