
MariaDB 10.11.6 Galera: a single failed node hangs on startup and never comes up

Background

A MariaDB Galera cluster (mariadb Ver 15.1 Distrib 10.11.6-MariaDB) with three nodes in total:
node1:192.168.18.78
node2:192.168.18.79
node3:192.168.18.80

node1 lost power for about an hour. After power returned, running systemctl start mariadb on it hung and still had not completed after 6 hours.

Galera configuration:

[mysqld]
event_scheduler=ON
bind-address=0.0.0.0

# Galera provider configuration
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so

# Galera cluster configuration
wsrep_cluster_name="hy_galera_cluster"
wsrep_cluster_address="gcomm://192.168.18.78,192.168.18.79,192.168.18.80"

# Galera node configuration
wsrep_node_address="192.168.18.78"
wsrep_node_name="data-server"

# SST method
wsrep_sst_method=rsync

# InnoDB Configuration
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
binlog_format=ROW
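The configuration above sets the options Galera nodes commonly require (wsrep_on, wsrep_provider, wsrep_cluster_address, binlog_format=ROW, default_storage_engine=InnoDB, innodb_autoinc_lock_mode=2). As a quick sanity check, here is a minimal Python sketch, assuming the [mysqld] section can be read with configparser (real option files with !include or !includedir directives would need preprocessing first). It verifies those values and that wsrep_node_address appears in the gcomm:// peer list. The helper names load_mysqld and check_galera are hypothetical.

```python
import configparser

# The [mysqld] options from node1's configuration in the question (comments omitted).
CONFIG = """
[mysqld]
event_scheduler=ON
bind-address=0.0.0.0
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name="hy_galera_cluster"
wsrep_cluster_address="gcomm://192.168.18.78,192.168.18.79,192.168.18.80"
wsrep_node_address="192.168.18.78"
wsrep_node_name="data-server"
wsrep_sst_method=rsync
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
binlog_format=ROW
"""

# Values this sketch expects on every Galera node (compared case-insensitively).
EXPECTED = {
    "wsrep_on": "ON",
    "binlog_format": "ROW",
    "default_storage_engine": "InnoDB",
    "innodb_autoinc_lock_mode": "2",
}


def load_mysqld(text: str) -> dict[str, str]:
    """Parse the [mysqld] section and strip optional quotes around values."""
    parser = configparser.ConfigParser(interpolation=None, allow_no_value=True)
    parser.read_string(text)
    return {k: (v or "").strip().strip("\"'") for k, v in parser["mysqld"].items()}


def check_galera(opts: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for key, want in EXPECTED.items():
        got = opts.get(key)
        if got is None or got.upper() != want.upper():
            problems.append(f"{key}={got!r}, expected {want}")
    if not opts.get("wsrep_provider"):
        problems.append("wsrep_provider is not set")
    # The node's own address should be one of the gcomm:// peers (host part only).
    peers = opts.get("wsrep_cluster_address", "").removeprefix("gcomm://").split(",")
    hosts = [p.strip().split(":")[0] for p in peers if p.strip()]
    node = opts.get("wsrep_node_address", "")
    if node and node not in hosts:
        problems.append(f"wsrep_node_address {node} is not in wsrep_cluster_address")
    return problems


if __name__ == "__main__":
    print(check_galera(load_mysqld(CONFIG)) or "all checks passed")
```

For node1's configuration every check passes, which suggests the startup hang is not caused by a missing mandatory option.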

Log output:

240403 05:05:09 mysqld_safe Starting mariadbd daemon with databases from /var/lib/mysql
240403 05:05:09 mysqld_safe WSREP: Running position recovery with --disable-log-error  --pid-file='/var/lib/mysql/data-server-recover.pid'
240403 05:05:09 mysqld_safe WSREP: Recovered position 20c1183c-e5c5-11ee-9129-97e9406cb3f8:7183126
2024-04-03  5:05:10 0 [Note] Starting MariaDB 10.11.6-MariaDB source revision fecd78b83785d5ae96f2c6ff340375be803cd299 as process 233407
2024-04-03  5:05:10 0 [Note] WSREP: Loading provider /usr/lib64/galera/libgalera_smm.so initial position: 20c1183c-e5c5-11ee-9129-97e9406cb3f8:7183126
2024-04-03  5:05:10 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2024-04-03  5:05:10 0 [Note] WSREP: wsrep_load(): Galera 26.4.16(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
2024-04-03  5:05:10 0 [Note] WSREP: Initializing allowlist service v1
2024-04-03  5:05:10 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2024-04-03  5:05:10 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0
2024-04-03  5:05:10 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 20c1183c-e5c5-11ee-9129-97e9406cb3f8
Seqno: -1 - -1
Offset: -1
Synced: 0
2024-04-03  5:05:10 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 20c1183c-e5c5-11ee-9129-97e9406cb3f8, offset: -1
2024-04-03  5:05:10 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
2024-04-03  5:05:10 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2024-04-03  5:05:10 0 [Note] WSREP: Recovering GCache ring buffer: Recovery failed, need to do full reset.
2024-04-03  5:05:10 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.18.78; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.keep_plaintext_size = 128M; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0
2024-04-03  5:05:10 0 [Note] WSREP: Start replication
2024-04-03  5:05:10 0 [Note] WSREP: Connecting with bootstrap option: 0
2024-04-03  5:05:10 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2024-04-03  5:05:10 0 [Note] WSREP: protonet asio version 0
2024-04-03  5:05:10 0 [Note] WSREP: Using CRC-32C for message checksums.
2024-04-03  5:05:10 0 [Note] WSREP: backend: asio
2024-04-03  5:05:10 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2024-04-03  5:05:10 0 [Note] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2024-04-03  5:05:10 0 [Note] WSREP: restore pc from disk failed
2024-04-03  5:05:10 0 [Note] WSREP: GMCast version 0
2024-04-03  5:05:10 0 [Note] WSREP: (b0bc65f1-8af3, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2024-04-03  5:05:10 0 [Note] WSREP: (b0bc65f1-8af3, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2024-04-03  5:05:10 0 [Note] WSREP: EVS version 1
2024-04-03  5:05:10 0 [Note] WSREP: gcomm: connecting to group 'hy_galera_cluster', peer '192.168.18.78:,192.168.18.79:,192.168.18.80:'
2024-04-03  5:05:10 0 [Note] WSREP: (b0bc65f1-8af3, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://192.168.18.78:4567
2024-04-03  5:05:10 0 [Note] WSREP: (b0bc65f1-8af3, 'tcp://0.0.0.0:4567') connection established to e1facb37-96cc tcp://192.168.18.80:4567
2024-04-03  5:05:10 0 [Note] WSREP: (b0bc65f1-8af3, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2024-04-03  5:05:10 0 [Note] WSREP: (b0bc65f1-8af3, 'tcp://0.0.0.0:4567') connection established to e8ab0109-98a4 tcp://192.168.18.79:4567
2024-04-03  5:05:10 0 [Note] WSREP: EVS version upgrade 0 -> 1
2024-04-03  5:05:10 0 [Note] WSREP: declaring e1facb37-96cc at tcp://192.168.18.80:4567 stable
2024-04-03  5:05:10 0 [Note] WSREP: declaring e8ab0109-98a4 at tcp://192.168.18.79:4567 stable
2024-04-03  5:05:10 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2024-04-03  5:05:10 0 [Note] WSREP: Node e1facb37-96cc state prim
2024-04-03  5:05:10 0 [Note] WSREP: view(view_id(PRIM,b0bc65f1-8af3,46) memb {
    b0bc65f1-8af3,0
    e1facb37-96cc,0
    e8ab0109-98a4,0
} joined {
} left {
} partitioned {
})
2024-04-03  5:05:10 0 [Note] WSREP: save pc into disk
2024-04-03  5:05:10 0 [Note] WSREP: gcomm: connected
2024-04-03  5:05:10 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2024-04-03  5:05:10 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2024-04-03  5:05:10 0 [Note] WSREP: Opened channel 'hy_galera_cluster'
2024-04-03  5:05:10 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2024-04-03  5:05:10 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b108e94c-f134-11ee-ac13-321fb976ab0c
2024-04-03  5:05:10 1 [Note] WSREP: Starting rollbacker thread 1
2024-04-03  5:05:10 2 [Note] WSREP: Starting applier thread 2
2024-04-03  5:05:10 0 [Note] WSREP: STATE EXCHANGE: sent state msg: b108e94c-f134-11ee-ac13-321fb976ab0c
2024-04-03  5:05:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: b108e94c-f134-11ee-ac13-321fb976ab0c from 0 (data-server)
2024-04-03  5:05:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: b108e94c-f134-11ee-ac13-321fb976ab0c from 1 (web02-server)
2024-04-03  5:05:10 0 [Note] WSREP: STATE EXCHANGE: got state msg: b108e94c-f134-11ee-ac13-321fb976ab0c from 2 (web01-server)
2024-04-03  5:05:10 0 [Note] WSREP: Quorum results:
    version    = 6,
    component  = PRIMARY,
    conf_id    = 44,
    members    = 2/3 (joined/total),
    act_id     = 7339907,
    last_appl. = 7339849,
    protocols  = 2/10/4 (gcs/repl/appl),
    vote policy= 0,
    group UUID = 20c1183c-e5c5-11ee-9129-97e9406cb3f8
2024-04-03  5:05:10 0 [Note] WSREP: Flow-control interval: [28, 28]
2024-04-03  5:05:10 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 7339908)
2024-04-03  5:05:10 2 [Note] WSREP: ####### processing CC 7339908, local, ordered
2024-04-03  5:05:10 2 [Note] WSREP: Process first view: 20c1183c-e5c5-11ee-9129-97e9406cb3f8 my uuid: b0bc65f1-f134-11ee-8af3-66b2cec80bb4
2024-04-03  5:05:10 2 [Note] WSREP: Server data-server connected to cluster at position 20c1183c-e5c5-11ee-9129-97e9406cb3f8:7339908 with ID b0bc65f1-f134-11ee-8af3-66b2cec80bb4
2024-04-03  5:05:10 2 [Note] WSREP: Server status change disconnected -> connected
2024-04-03  5:05:10 2 [Note] WSREP: ####### My UUID: b0bc65f1-f134-11ee-8af3-66b2cec80bb4
2024-04-03  5:05:10 2 [Note] WSREP: Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
2024-04-03  5:05:10 0 [Note] WSREP: Service thread queue flushed.
2024-04-03  5:05:10 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1
2024-04-03  5:05:10 2 [Note] WSREP: State transfer required: 
    Group state: 20c1183c-e5c5-11ee-9129-97e9406cb3f8:7339908
    Local state: 00000000-0000-0000-0000-000000000000:-1
2024-04-03  5:05:10 2 [Note] WSREP: Server status change connected -> joiner
2024-04-03  5:05:10 0 [Note] WSREP: Joiner monitor thread started to monitor
2024-04-03  5:05:10 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.18.78' --datadir '/var/lib/mysql/' --parent 233407 --progress 0 --mysqld-args --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mariadb/plugin --user=mysql --wsrep_on=ON --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --log-error=/data/log/mariadb/mariadb.log --pid-file=/run/mariadb/mariadb.pid --socket=/var/lib/mysql/mysql.sock --wsrep_start_position=20c1183c-e5c5-11ee-9129-97e9406cb3f8:7183126'
WSREP_SST: [INFO] rsync SST started on joiner (20240403 05:05:10.645)
2024-04-03  5:05:11 2 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 7339908, STRv: 3
2024-04-03  5:05:11 2 [Note] WSREP: IST receiver addr using tcp://192.168.18.78:4568
2024-04-03  5:05:11 2 [Note] WSREP: Prepared IST receiver for 0-7339908, listening at: tcp://192.168.18.78:4568
2024-04-03  5:05:11 0 [Note] WSREP: Member 0.0 (data-server) requested state transfer from '*any*'. Selected 1.0 (web02-server)(SYNCED) as donor.
2024-04-03  5:05:11 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 7339908)
2024-04-03  5:05:11 2 [Note] WSREP: Requesting state transfer: success, donor: 1
2024-04-03  5:05:11 2 [Note] WSREP: Resetting GCache seqno map due to different histories.
2024-04-03  5:05:11 2 [Note] WSREP: GCache history reset: 20c1183c-e5c5-11ee-9129-97e9406cb3f8:0 -> 20c1183c-e5c5-11ee-9129-97e9406cb3f8:7339908
2024-04-03  5:05:13 0 [Note] WSREP: (b0bc65f1-8af3, 'tcp://0.0.0.0:4567') turning message relay requesting off
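Read in order, the log suggests node1 is waiting on a full state transfer rather than being frozen. Galera found saved state 00000000-0000-0000-0000-000000000000:-1, GCache recovery failed, the IST receiver was prepared from seqno 0, and an rsync SST from donor web02-server started at 05:05:10. InnoDB's recovered position (7183126) is 7339908 - 7183126 = 156782 write-sets behind the group state. The rough Python sketch below pulls these markers out of an error log so they are easy to compare across restarts. It assumes the message wording matches MariaDB 10.11.6 / Galera 26.4.16 as shown above, which may differ in other versions. It deliberately does not detect SST completion, because the question's log does not show what that message looks like.

```python
import re
from pathlib import Path

# Excerpt of node1's error log from the question.
SAMPLE_LOG = """\
240403 05:05:09 mysqld_safe WSREP: Recovered position 20c1183c-e5c5-11ee-9129-97e9406cb3f8:7183126
2024-04-03  5:05:10 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0
2024-04-03  5:05:10 0 [Note] WSREP: Recovering GCache ring buffer: Recovery failed, need to do full reset.
2024-04-03  5:05:10 2 [Note] WSREP: State transfer required:
    Group state: 20c1183c-e5c5-11ee-9129-97e9406cb3f8:7339908
    Local state: 00000000-0000-0000-0000-000000000000:-1
2024-04-03  5:05:11 0 [Note] WSREP: Member 0.0 (data-server) requested state transfer from '*any*'. Selected 1.0 (web02-server)(SYNCED) as donor.
WSREP_SST: [INFO] rsync SST started on joiner (20240403 05:05:10.645)
"""

POSITION = r"([0-9a-f-]{36}):(-?\d+)"  # Galera "uuid:seqno"
PATTERNS = {
    "recovered": re.compile(r"Recovered position " + POSITION),
    "saved_state": re.compile(r"Found saved state: " + POSITION),
    "group_state": re.compile(r"Group state: " + POSITION),
    "local_state": re.compile(r"Local state: " + POSITION),
    "donor": re.compile(r"Selected \S+ \(([^)]+)\)\((\w+)\) as donor"),
}


def summarize(log_text: str) -> dict:
    summary: dict = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(log_text)
        if matches:
            summary[name] = matches[-1]  # keep the most recent occurrence
    # Substring flags: on a log with several restarts these may come from an older attempt.
    summary["gcache_reset"] = "Recovery failed, need to do full reset" in log_text
    summary["sst_started"] = "SST started on joiner" in log_text
    # Local seqno -1 means the joiner has no usable position, so a full SST is expected.
    local = summary.get("local_state")
    summary["full_sst_expected"] = local is not None and int(local[1]) == -1
    rec, grp = summary.get("recovered"), summary.get("group_state")
    if rec and grp and rec[0] == grp[0]:
        summary["writesets_behind"] = int(grp[1]) - int(rec[1])
    return summary


if __name__ == "__main__":
    try:
        # --log-error path taken from the wsrep_sst_rsync command line in the question.
        text = Path("/data/log/mariadb/mariadb.log").read_text(errors="replace")
    except OSError:
        text = SAMPLE_LOG  # fall back to the excerpt when the real log is unavailable
    for key, value in summarize(text).items():
        print(f"{key}: {value}")
```

If the newest entries in the real log still stop right after "rsync SST started on joiner", the rsync transfer from web02-server (its progress and the donor's own log) is probably the next thing to look at.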

Asked by 华科信, 2024-04-03 07:46:32
1 answer
  • Answer by 北京阿里云ACE会长 (Alibaba Cloud ACE Beijing chapter president)

    Part of node1's data may not have been written correctly, leaving its data inconsistent with the rest of the cluster.

    Check the disk and the log files:

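    Confirm that the data files under /var/lib/mysql/ are intact.
    Check the MariaDB error log for more detailed error messages. On this node the log is /data/log/mariadb/mariadb.log (the --log-error value in the wsrep_sst_rsync command line in the question's log), not /var/log/mysql/mariadb.log.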
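For a Galera node, one concrete file to inspect alongside the data files is /var/lib/mysql/grastate.dat. It holds the node's saved cluster UUID, seqno and safe_to_bootstrap flag, and is where the "Found saved state" line in the log comes from. Below is a minimal Python sketch, assuming the usual "key: value" layout of that file. The sample contents are hypothetical, reconstructed from that single log line, and read_grastate and interpret are illustrative names. It only reads Galera's bookkeeping and does not verify the InnoDB data files themselves.

```python
from pathlib import Path

# Hypothetical grastate.dat contents, reconstructed from the log line
# "Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0".
SAMPLE_GRASTATE = """\
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
"""

ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def read_grastate(text: str) -> dict[str, str]:
    """Parse the 'key: value' lines of grastate.dat, skipping comments and blanks."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def interpret(state: dict[str, str]) -> list[str]:
    """Turn the saved state into short, hedged notes."""
    notes = []
    if state.get("uuid", ZERO_UUID) == ZERO_UUID:
        notes.append("no cluster UUID saved: the node has no known history to resume from")
    if int(state.get("seqno", "-1")) == -1:
        notes.append("seqno -1: no clean shutdown recorded, position must be recovered from InnoDB")
    else:
        notes.append(f"clean shutdown at seqno {state['seqno']}")
    return notes


if __name__ == "__main__":
    try:
        text = Path("/var/lib/mysql/grastate.dat").read_text()
    except OSError:
        text = SAMPLE_GRASTATE  # fall back to the example when the file is unreadable
    for note in interpret(read_grastate(text)):
        print(note)
```

A seqno of -1 after a power loss is expected, because Galera only writes the real seqno on a clean shutdown. That is also why mysqld_safe ran "position recovery" at the top of the log.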

    2024-04-03 09:00:19
