Prometheus 监控MySQL

本文涉及的产品
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
云数据库 RDS PostgreSQL,集群系列 2核4GB
简介:

Prometheus 监控MySQL

目录
0、简介
1、mysql_exporter部署
2、mysql报警规则
0、简介
文中主要监控MySQL/MySQL主从信息
版本:mysql-5.7,mysql_exporter-0.12.1

mysql_exporter下载地址

1、mysql_exporter部署
1.下载mysql_exporter并解压

$ tar xf /opt/src/mysqld_exporter-0.12.1.linux-amd64.tar.gz

// 将mysql_exporter二进制文件拷贝至/usr/local/bin
$ cp /opt/src/mysqld_exporter-0.12.1.linux-amd64/mysqld_exporter /usr/local/bin/
2.需要授权用户给exporter使用

CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'abc12345' WITH MAX_USER_CONNECTIONS 5;

// 可查看主从运行情况查看线程,及所有数据库。

GRANT PROCESS, REPLICATION CLIENT, SELECT ON . TO 'exporter'@'localhost';

为该用户设置最大连接数为了避免监控数据过大导致服务器超载

3.修改mysql配置文件,添加刚才创建的exporter用户和密码

$ vim /etc/my.cnf
[client]
user=exporter
password=abc12345
4.启动exporter客户端,需指定mysql配置文件,读取exporter用户和密码

$ mysqld_exporter --config.my-cnf=/etc/my.cnf

常用参数:
// 选择采集innodb
--collect.info_schema.innodb_cmp
// innodb存储引擎状态
--collect.engine_innodb_status
// 指定配置文件
--config.my-cnf="/etc/my.cnf"
5.添加system系统服务

$ vim /usr/lib/systemd/system/mysql_exporter.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/bin/mysqld_exporter \
--config.my-cnf=/etc/my.cnf

[Install]
WantedBy=multi-user.target
6.启动添加的system服务

$ systemctl daemon-reload
$ systemctl start mysql_exporter.service
$ systemctl enable mysql_exporter.service

// mysql_export默认端口 - 9104
$ netstat -lntup | grep "9104"
tcp6 0 0 :::9104 :::* LISTEN 34137/mysqld_export
7.curl查看捕获mysql数据

curl http://localhost:9104/metrics
8.prometheus配置加入mysql节点

$ vim /usr/local/prometheus/prometheus.yml

  • job_name: 'mysql'

    scrape_interval: 5s
    # 静态添加node
    static_configs:
    • targets: ['10.4.7.12:9104']
      9.查看监控端是否接入

10.Granfana导入MySQL监控图表
去grafana dashboard下载对应的图表或者直接在grafana导入图表输入ID下载
图表下载地址:https://grafana.com/grafana/dashboards/7362
图表ID:11796

11.查看mysql dashboard

2、mysql报警规则
1.配置alertmanager报警,添加prometheus配置:

rule_files:
...

  • "/data/etc/mysql*.rules"
    2.配置mysql报警规则

groups:

  • name: MySQLStatsAlert
    rules:

    • alert: MySQL is down
      expr: mysql_up == 0

    for: 1m
    labels:

    severity: critical

    annotations:

    summary: "Instance {{ $labels.instance }} MySQL is down"
    description: "MySQL database is down. This requires immediate action!"
    • alert: open files high
      expr: mysql_global_status_innodb_num_open_files > (mysql_global_variables_open_files_limit) * 0.75

    for: 1m
    labels:

    severity: warning

    annotations:

    summary: "Instance {{ $labels.instance }} open files high"
    description: "Open files is high. Please consider increasing open_files_limit."
    • alert: Read buffer size is bigger than max. allowed packet size
      expr: mysql_global_variables_read_buffer_size > mysql_global_variables_slave_max_allowed_packet

    for: 1m
    labels:

    severity: warning

    annotations:

    summary: "Instance {{ $labels.instance }} Read buffer size is bigger than max. allowed packet size"
    description: "Read buffer size (read_buffer_size) is bigger than max. allowed packet size (max_allowed_packet).This can break your replication."
    • alert: Sort buffer possibly missconfigured
      expr: mysql_global_variables_innodb_sort_buffer_size <2561024 or mysql_global_variables_read_buffer_size > 41024*1024

    for: 1m
    labels:

    severity: warning

    annotations:

    summary: "Instance {{ $labels.instance }} Sort buffer possibly missconfigured"
    description: "Sort buffer size is either too big or too small. A good value for sort_buffer_size is between 256k and 4M."
    • alert: Thread stack size is too small
      expr: mysql_global_variables_thread_stack <196608

    for: 1m
    labels:

    severity: warning

    annotations:

    summary: "Instance {{ $labels.instance }} Thread stack size is too small"
    description: "Thread stack size is too small. This can cause problems when you use Stored Language constructs for example. A typical is 256k for thread_stack_size."
    • alert: Used more than 80% of max connections limited
      expr: mysql_global_status_max_used_connections > mysql_global_variables_max_connections * 0.8

    for: 1m
    labels:

    severity: warning

    annotations:

    summary: "Instance {{ $labels.instance }} Used more than 80% of max connections limited"
    description: "Used more than 80% of max connections limited"
    • alert: InnoDB Force Recovery is enabled
      expr: mysql_global_variables_innodb_force_recovery != 0

    for: 1m
    labels:

    severity: warning

    annotations:

    summary: "Instance {{ $labels.instance }} InnoDB Force Recovery is enabled"
    description: "InnoDB Force Recovery is enabled. This mode should be used for data recovery purposes only. It prohibits writing to the data."
    • alert: InnoDB Log File size is too small
      expr: mysql_global_variables_innodb_log_file_size < 16777216

    for: 1m
    labels:

    severity: warning

    annotations:

    summary: "Instance {{ $labels.instance }} InnoDB Log File size is too small"
    description: "The InnoDB Log File size is possibly too small. Choosing a small InnoDB Log File size can have significant performance impacts."
    • alert: InnoDB Flush Log at Transaction Commit
      expr: mysql_global_variables_innodb_flush_log_at_trx_commit != 1

    for: 1m
    labels:

    severity: warning

    annotations:

    summary: "Instance {{ $labels.instance }} InnoDB Flush Log at Transaction Commit"
    description: "InnoDB Flush Log at Transaction Commit is set to a values != 1. This can lead to a loss of commited transactions in case of a power failure."
    • alert: Table definition cache too small
      expr: mysql_global_status_open_table_definitions > mysql_global_variables_table_definition_cache

    for: 1m
    labels:

    severity: page

    annotations:

    summary: "Instance {{ $labels.instance }} Table definition cache too small"
    description: "Your Table Definition Cache is possibly too small. If it is much too small this can have significant performance impacts!"
    • alert: Table open cache too small
      expr: mysql_global_status_open_tables >mysql_global_variables_table_open_cache * 99/100

    for: 1m
    labels:

    severity: page

    annotations:

    summary: "Instance {{ $labels.instance }} Table open cache too small"
    description: "Your Table Open Cache is possibly too small (old name Table Cache). If it is much too small this can have significant performance impacts!"
    • alert: Thread stack size is possibly too small
      expr: mysql_global_variables_thread_stack < 262144

    for: 1m
    labels:

    severity: page

    annotations:

    summary: "Instance {{ $labels.instance }} Thread stack size is possibly too small"
    description: "Thread stack size is possibly too small. This can cause problems when you use Stored Language constructs for example. A typical is 256k for thread_stack_size."
    • alert: InnoDB Buffer Pool Instances is too small
      expr: mysql_global_variables_innodb_buffer_pool_instances == 1

    for: 1m
    labels:

    severity: page

    annotations:

    summary: "Instance {{ $labels.instance }} InnoDB Buffer Pool Instances is too small"
    description: "If you are using MySQL 5.5 and higher you should use several InnoDB Buffer Pool Instances for performance reasons. Some rules are: InnoDB Buffer Pool Instance should be at least 1 Gbyte in size. InnoDB Buffer Pool Instances you can set equal to the number of cores of your machine."
    • alert: InnoDB Plugin is enabled
      expr: mysql_global_variables_ignore_builtin_innodb == 1

    for: 1m
    labels:

    severity: page

    annotations:

    summary: "Instance {{ $labels.instance }} InnoDB Plugin is enabled"
    description: "InnoDB Plugin is enabled"
    • alert: Binary Log is disabled
      expr: mysql_global_variables_log_bin != 1

    for: 1m
    labels:

    severity: warning

    annotations:

    summary: "Instance {{ $labels.instance }} Binary Log is disabled"
    description: "Binary Log is disabled. This prohibits you to do Point in Time Recovery (PiTR)."
    • alert: Binlog Cache size too small
      expr: mysql_global_variables_binlog_cache_size < 1048576

    for: 1m
    labels:

    severity: page

    annotations:

    summary: "Instance {{ $labels.instance }} Binlog Cache size too small"
    description: "Binlog Cache size is possibly to small. A value of 1 Mbyte or higher is OK."
    • alert: Binlog Statement Cache size too small
      expr: mysql_global_variables_binlog_stmt_cache_size <1048576 and mysql_global_variables_binlog_stmt_cache_size > 0

    for: 1m
    labels:

    severity: page

    annotations:

    summary: "Instance {{ $labels.instance }} Binlog Statement Cache size too small"
    description: "Binlog Statement Cache size is possibly to small. A value of 1 Mbyte or higher is typically OK."
    • alert: Binlog Transaction Cache size too small
      expr: mysql_global_variables_binlog_cache_size <1048576

    for: 1m
    labels:

    severity: page

    annotations:

    summary: "Instance {{ $labels.instance }} Binlog Transaction Cache size too small"
    description: "Binlog Transaction Cache size is possibly to small. A value of 1 Mbyte or higher is typically OK."
    • alert: Sync Binlog is enabled
      expr: mysql_global_variables_sync_binlog == 1

    for: 1m
    labels:

    severity: page

    annotations:

    summary: "Instance {{ $labels.instance }} Sync Binlog is enabled"
    description: "Sync Binlog is enabled. This leads to higher data security but on the cost of write performance."
    • alert: IO thread stopped
      expr: mysql_slave_status_slave_io_running != 1

    for: 1m
    labels:

    severity: critical

    annotations:

    summary: "Instance {{ $labels.instance }} IO thread stopped"
    description: "IO thread has stopped. This is usually because it cannot connect to the Master any more."
    • alert: SQL thread stopped
      expr: mysql_slave_status_slave_sql_running == 0

    for: 1m
    labels:

    severity: critical

    annotations:

    summary: "Instance {{ $labels.instance }} SQL thread stopped"
    description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."
    • alert: SQL thread stopped
      expr: mysql_slave_status_slave_sql_running != 1

    for: 1m
    labels:

    severity: critical

    annotations:

    summary: "Instance {{ $labels.instance }} Sync Binlog is enabled"
    description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."
    • alert: Slave lagging behind Master
      expr: rate(mysql_slave_status_seconds_behind_master[1m]) >30

    for: 1m
    labels:

    severity: warning 

    annotations:

    summary: "Instance {{ $labels.instance }} Slave lagging behind Master"
    description: "Slave is lagging behind Master. Please check if Slave threads are running and if there are some performance issues!"
    • alert: Slave is NOT read only(Please ignore this warning indicator.)
      expr: mysql_global_variables_read_only != 0

    for: 1m
    labels:

    severity: page

    annotations:

    summary: "Instance {{ $labels.instance }} Slave is NOT read only"
    description: "Slave is NOT set to read only. You can accidentally manipulate data on the slave and get inconsistencies..."

    3.最后需要重启prometheus即可

原文地址https://www.cnblogs.com/jasonminghao/p/12715018.html

相关实践学习
通过可观测可视化Grafana版进行数据可视化展示与分析
使用可观测可视化Grafana版进行数据可视化展示与分析。
相关文章
|
3月前
|
Prometheus 运维 监控
智能运维实战:Prometheus与Grafana的监控与告警体系
【10月更文挑战第26天】Prometheus与Grafana是智能运维中的强大组合,前者是开源的系统监控和警报工具,后者是数据可视化平台。Prometheus具备时间序列数据库、多维数据模型、PromQL查询语言等特性,而Grafana支持多数据源、丰富的可视化选项和告警功能。两者结合可实现实时监控、灵活告警和高度定制化的仪表板,广泛应用于服务器、应用和数据库的监控。
481 3
|
2月前
|
存储 数据采集 Prometheus
Grafana Prometheus Altermanager 监控系统
Grafana、Prometheus 和 Alertmanager 是一套强大的开源监控系统组合。Prometheus 负责数据采集与存储,Alertmanager 处理告警通知,Grafana 提供可视化界面。本文简要介绍了这套系统的安装配置流程,包括各组件的下载、安装、服务配置及开机自启设置,并提供了访问地址和重启命令。适用于希望快速搭建高效监控平台的用户。
176 20
|
6月前
|
Prometheus 监控 Cloud Native
【监控】prometheus传统环境监控告警常用配置
【监控】prometheus传统环境监控告警常用配置
【监控】prometheus传统环境监控告警常用配置
|
2月前
|
Prometheus 监控 Cloud Native
Prometheus+Grafana监控Linux主机
通过本文的步骤,我们成功地在 Linux 主机上使用 Prometheus 和 Grafana 进行了监控配置。具体包括安装 Prometheus 和 Node Exporter,配置 Grafana 数据源,并导入预设的仪表盘来展示监控数据。通过这种方式,可以轻松实现对 Linux 主机的系统指标监控,帮助及时发现和处理潜在问题。
233 7
|
2月前
|
Prometheus 运维 监控
Prometheus+Grafana+NodeExporter:构建出色的Linux监控解决方案,让你的运维更轻松
本文介绍如何使用 Prometheus + Grafana + Node Exporter 搭建 Linux 主机监控系统。Prometheus 负责收集和存储指标数据,Grafana 用于可视化展示,Node Exporter 则采集主机的性能数据。通过 Docker 容器化部署,简化安装配置过程。完成安装后,配置 Prometheus 抓取节点数据,并在 Grafana 中添加数据源及导入仪表盘模板,实现对 Linux 主机的全面监控。整个过程简单易行,帮助运维人员轻松掌握系统状态。
330 3
|
2月前
|
Prometheus 监控 Cloud Native
无痛入门Prometheus:一个强大的开源监控和告警系统,如何快速安装和使用?
Prometheus 是一个完全开源的系统监控和告警工具包,受 Google 内部 BorgMon 系统启发,自2012年由前 Google 工程师在 SoundCloud 开发以来,已被众多公司采用。它拥有活跃的开发者和用户社区,现为独立开源项目,并于2016年加入云原生计算基金会(CNCF)。Prometheus 的主要特点包括多维数据模型、灵活的查询语言 PromQL、不依赖分布式存储、通过 HTTP 拉取时间序列数据等。其架构简单且功能强大,支持多种图形和仪表盘展示模式。安装和使用 Prometheus 非常简便,可以通过 Docker 快速部署,并与 Grafana 等可
655 2
|
3月前
|
存储 Prometheus 监控
监控堆外第三方监控工具Prometheus
监控堆外第三方监控工具Prometheus
88 3
|
3月前
|
存储 Prometheus 运维
在云原生环境中,阿里云ARMS与Prometheus的集成提供了强大的应用实时监控解决方案
在云原生环境中,阿里云ARMS与Prometheus的集成提供了强大的应用实时监控解决方案。该集成结合了ARMS的基础设施监控能力和Prometheus的灵活配置及社区支持,实现了全面、精准的系统状态、性能和错误监控,提升了应用的稳定性和管理效率。通过统一的数据视图和高级查询功能,帮助企业有效应对云原生挑战,促进业务的持续发展。
110 3
|
3月前
|
Prometheus 监控 Cloud Native
在 HBase 集群中,Prometheus 通常监控哪些类型的性能指标?
在 HBase 集群中,Prometheus 监控关注的核心指标包括 Master 和 RegionServer 的进程存在性、RPC 请求数、JVM 内存使用率、磁盘和网络错误、延迟和吞吐量、资源利用率及 JVM 使用信息。通过 Grafana 可视化和告警规则,帮助管理员实时监控集群性能和健康状况。
|
3月前
|
缓存 监控 关系型数据库
如何根据监控结果调整 MySQL 数据库的参数以提高性能?
【10月更文挑战第28天】根据MySQL数据库的监控结果来调整参数以提高性能,需要综合考虑多个方面的因素
156 1