ganglia customized module : postgresql module

本文涉及的产品
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
云原生数据库 PolarDB MySQL 版,通用型 2核8GB 50GB
简介:
前面我们简单的聊了一下ganglia的架构, 以及它和其他监控软件如nagios, zabbix对比的优缺点.
gmond只整合了一些常用的监控metric, 如果需要监控不在gmond范围内的metric, 那么ganglia提供了一个接口, 可以用于扩展.
例如你可以用C/C++写扩展模块, 也可以用python写扩展模块, 如果你不想写扩展模块, 也可以使用gmetric直接向send channel发送自定义的metric数据.
本文将拿一个开源的ganglia module for postgresql为例, 结合PostgreSQL常用的监控指标来实现ganglia采样postgresql监控指标数据.
这个模块的代码请参见 : 

postgresql python module的详解, 参考

这个模块提到的监控项有 : 
# Create your queries here. Keys whose names match those defined in the default
# set are overridden. Any additional key-value pairs (i.e. query) will not be
# added to the Ganglia metric definition but can be useful for data purposes.
metric_defs = {
    "pg_backends_waiting": {
        "description": "Number of postgres backends that are waiting",
        "units": "backends",
        "query": "SELECT count(*) AS backends_waiting FROM " + \
            "pg_stat_activity WHERE waiting = 't';"
    },
    "pg_database_size": {
        "description": "Total size of all databases in bytes",
        "value_type": "double",
        "format": "%.0f",
        "units": "bytes",
        "query": "SELECT sum(pg_database_size(d.oid)) AS " + \
            "size_database FROM pg_database d ORDER BY 1 DESC;"
    },
    "pg_idx_blks_read": {
        "description": "Total index blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(idx_blks_read) AS idx_blks_read " + \
            "FROM pg_statio_all_indexes;"
    },
    "pg_idx_blks_hit": {
        "description": "Total index blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(idx_blks_hit) AS idx_blks_hit " + \
            "FROM pg_statio_all_indexes;"
    },
    "pg_locks": {
        "description": "Number of locks held",
        "units": "locks",
        "query": "SELECT count(*) FROM pg_locks;"
    },
    "pg_query_time_idle_in_txn": {
        "description": 'Age of longest _idle in transaction_ transaction',
        "units": "seconds",
        "query": "SELECT COALESCE(max(COALESCE(ROUND(EXTRACT(epoch " + \
            "FROM now()-query_start)),0)),0) AS " + \
            "query_time_idle_in_txn FROM pg_stat_activity " + \
            "WHERE current_query = '% in transaction';"
    },
    "pg_max_idle_txn_time": {
        "description": "Age of longest idle transaction",
        "units": "seconds",
        "query": "SELECT COALESCE(max(COALESCE(ROUND(EXTRACT(epoch " + \
            "FROM now()-query_start)),0)),0) as query_time_max FROM " + \
             "pg_stat_activity WHERE current_query <> '<IDLE>';"
    },
    "pg_txn_time_max": {
        "description": "Age of longest transaction",
        "units": "seconds",
        "query": "SELECT max(COALESCE(ROUND(EXTRACT(epoch " + \
            "FROM now()-xact_start)),0)) as txn_time_max " + \
            "FROM pg_stat_activity WHERE xact_start IS NOT NULL;"
    },
    "pg_connections": {
        "description": "Number of connections",
        "units": "connctions",
        "query": "SELECT sum(numbackends) FROM pg_stat_database;"
    },
    "pg_wal_files": {
        "description": "number of wal files in pg_xlog directory",
        "units": "# wal files",
        "query": "SELECT count(*) AS wal_files FROM " + \
            "pg_ls_dir('pg_xlog') WHERE pg_ls_dir ~ E'^[0-9A-F]{24}$';"
    },
    "pg_xact_commit": {
        "description": "Transactions committed",
        "slope": "positive",
        "units": "transactions",
        "query": "SELECT sum(xact_commit) as xact_commit FROM " + \
            "pg_stat_database;",
    },
    "pg_xact_rollback": {
        "description": "Transactions rolled back",
        "slope": "positive",
        "units": "transactions",
        "query": "SELECT sum(xact_rollback) as xact_rollback FROM " + \
            "pg_stat_database;",
    },
    "pg_blks_read": {
        "description": "Blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(blks_read) as blks_read FROM " + \
            "pg_stat_database;",
    },
    "pg_blks_hit": {
        "description": "Blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(blks_hit) as blks_hit FROM " + \
            "pg_stat_database;",
    },
    "pg_tup_returned": {
        "description": "Tuples returned",
        "slope": "positive",
        "units": "tuples",
        "query": "SELECT sum(tup_returned) as tup_returned FROM " + \
            "pg_stat_database;",
    },
    "pg_tup_fetched": {
        "description": "Tuples fetched",
        "slope"       : "positive",
        "units": "tuples",
        "query": "SELECT sum(tup_fetched) as tup_fetched FROM " + \
            "pg_stat_database;",
    },
    "pg_tup_inserted": {
        "description": "Tuples inserted",
        "slope": "positive",
        "units": "tuples",
        "query": "SELECT sum(tup_inserted) as tup_inserted FROM " + \
            "pg_stat_database;",
    },
    "pg_tup_updated": {
        "description": "Tuples updated",
        "slope": "positive",
        "units": "tuples",
        "query": "SELECT sum(tup_updated) as tup_updated FROM " + \
            "pg_stat_database;",
    },
    "pg_tup_deleted": {
        "description": "Tuples deleted",
        "slope": "positive",
        "units": "tuples",
        "query": "SELECT sum(tup_deleted) as tup_deleted FROM " + \
            "pg_stat_database;",
    },
    "pg_heap_blks_read": {
        "description": "Heap blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(heap_blks_read) as heap_blks_read FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_heap_blks_hit": {
        "description": "Heap blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(heap_blks_hit) as heap_blks_hit FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_idx_blks_read_tbl": {
        "description": "Index blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(idx_blks_read) as idx_blks_read_tbl FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_idx_blks_hit_tbl": {
        "description": "Index blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(idx_blks_hit) as idx_blks_hit_tbl FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_toast_blks_read": {
        "description": "Toast blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(toast_blks_read) as toast_blks_read FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_toast_blks_hit": {
        "description": "Toast blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(toast_blks_hit) as toast_blks_hit FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_tidx_blks_read": {
        "description": "Toast index blocks read",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(tidx_blks_read) as tidx_blks_read FROM " + \
            "pg_statio_all_tables;",
    },
    "pg_tidx_blks_hit": {
        "description": "Toast index blocks hit",
        "slope": "positive",
        "units": "blocks",
        "query": "SELECT sum(tidx_blks_hit) as tidx_blks_hit FROM " + \
            "pg_statio_all_tables;",
    },
}

除此以外, 还可以丰富一下监控项. 如下 : 
postgresql常见监控项 : 
1. 是否打开归档, 布尔逻辑值.
2. 是否打开autovacuum , 布尔逻辑值.
3. 数据库年龄, 数字, 大于2, 低于20亿(大约).
4. 连接数, 低于最大连接数
5. 提交和回滚的比例(问题修复后,手工清除统计信息pg_stat_reset(), 连接到对应的库执行).
6. standby延迟, 字节数, 延迟高说明网络或STANDBY的IO有问题. 
7. 锁等待, 锁等待时间, 长则说明业务或在SQL的处理效率上有问题. 
8. 长事务/空闲事务, 长事务一般可能是业务设计的问题. 
9. prepared事务, 长时间的PREPARED事务可能是业务层出了问题, 例如未及时提交, 或业务以及挂了, prepared事务变成了僵尸事务. 
10. 序列剩余量(每个库查询), 序列如果没有开启cycle, 那么需要注意是否耗尽的问题. 
11. 未使用的索引(每个库查询), 如果索引不常使用, 那么说明索引不如不建. 
以上监控指标都可以量化, 所以使用ganglia来监控是可以的.

常见巡检项 : 

[参考 ]

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍如何基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
人工智能 自然语言处理 安全
国内首发,阿里云魔搭社区上架百川智能Baichuan 2系列模型
国内首发,阿里云魔搭社区上架百川智能Baichuan 2系列模型
1260 0
|
IDE Go 开发工具
Go开发IDE全览:GoLand vs VSCode全面解析
Go开发IDE全览:GoLand vs VSCode全面解析
860 0
|
存储 监控 NoSQL
一篇搞定Redis中的BigKey问题
BigKey的具体表现是redis中的key对应的value很大,占用的redis空间比较大,本质上是大value问题。
1621 0
|
9月前
|
人工智能 搜索推荐 vr&ar
《解锁无限可能:在鸿蒙开源系统中打造AI驱动的元宇宙应用》
鸿蒙系统的开源特性与人工智能技术的融合为开发者开启了创新大门,特别是在构建AI驱动的元宇宙应用方面展现出巨大潜力。通过深入理解鸿蒙的开源机制,开发者可以自由获取源代码,定制优化系统,并利用丰富的开源工具如DevEcoStudio快速搭建AI开发环境。鸿蒙的分布式架构支持多设备无缝协同,提升用户体验。同时,开源数据有助于提高AI训练效果,而积极参与社区贡献则加速了生态发展,共同推动元宇宙应用的创新与繁荣。
213 13
|
安全 Java 测试技术
Spring Boot集成支付宝支付:概念与实战
【4月更文挑战第29天】在电子商务和在线业务应用中,集成有效且安全的支付解决方案是至关重要的。支付宝作为中国领先的支付服务提供商,其支付功能的集成可以显著提升用户体验。本篇博客将详细介绍如何在Spring Boot应用中集成支付宝支付功能,并提供一个实战示例。
649 2
|
12月前
|
人工智能 弹性计算 自然语言处理
触手可及,函数计算玩转 AI 大模型
在AI技术迅速发展的背景下,大模型正推动各行业的智能化转型。企业为抓住机遇,纷纷部署AI大模型。阿里云函数计算凭借按量付费、高弹性和快速交付的特点,成为企业部署AI大模型的理想选择。本文介绍阿里云函数计算的技术解决方案,分析其优势,并通过具体应用场景评测其在AI大模型部署中的表现。
|
11月前
|
机器学习/深度学习 人工智能 自然语言处理
深入理解人工智能中的深度学习技术及其最新进展
深入理解人工智能中的深度学习技术及其最新进展
1161 33
|
12月前
|
5G
全面解读:5G NR频段划分及其重要性
【10月更文挑战第4天】
1337 4
|
JavaScript Java 测试技术
基于springboot+vue.js的网络海鲜市场系统附带文章和源代码设计说明文档ppt
基于springboot+vue.js的网络海鲜市场系统附带文章和源代码设计说明文档ppt
197 2
|
物联网 5G 智能硬件
【专栏】无线通信的基础术语,如频段、带宽、调制与解调,以及Wi-Fi、蓝牙、ZigBee等无线技术
【4月更文挑战第28天】本文介绍了无线通信的基础术语,如频段、带宽、调制与解调,以及Wi-Fi、蓝牙、ZigBee等无线技术。还涉及无线信号传播、信道容量、信噪比等概念。理解这些术语有助于深入理解无线通信原理,便于设计和优化无线系统。随着无线技术的不断发展,持续学习是关键。开始你的无线通信探索之旅吧!
779 0