prometheus|云原生|prometheus项目安装postgres-exporter监视组件的部署简介

本文涉及的产品
可观测监控 Prometheus 版,每月50GB免费额度
云原生数据库 PolarDB MySQL 版,通用型 2核4GB 50GB
EMR Serverless StarRocks,5000CU*H 48000GB*H
简介: prometheus|云原生|prometheus项目安装postgres-exporter监视组件的部署简介

前言:

最近刚好得空再琢磨prometheus,那肯定要玩玩postgrs-expoter这个数据库监控了,要不没办法体现prometheus的强大。

OK,postgres-exporter和其它的监视组件例如node-exporter基本是一样的,都是由go语言编写的,但是有一点雪微不同,它是监控数据库的,因此要和数据库打一点交道。所以呢,数据库需要有安装pg_stat插件,否则postgres-exporter可能会工作的不太好,很多数据收集不到吧。

其次呢,现在的网络文化就是一大抄,这个抄一下,那个抄一下,因此在学习部署的时候呢,满世界也就基本那两三篇文章,有部署前编写脚本的什么的,就看的很无语,总的给人的感觉就是postgres-exporter部署应该是一个简单的事情,但有得文章搞的非常复杂,可能有炫技的成分吧。

在这简单的吐槽一下,毕竟垃圾文章在学习的时候呢,你还不一定敢跳过去,万一有点东西呢?但是,真的看完后,我整个人是无语状态:这TM到底要搞哪样?纯粹浪费时间(虽然是条条大路通罗马,但不要搞一个羊肠小道让人踩坑我想这应该是每一个写文章分享知识的人的基本素养吧!!!!~~~~~)

好了,吐槽完毕,下面就postgres-exporter这个小组件的安装做一个尽量详细的说明。

一,

哪里下载postgres-exporter ?该下载哪个版本?下载哪种格式的文件?

毫无疑问的事情,必须从github下载,懒得折腾就到gitee找找看,碰碰运气喽,github的下载地址是:

https://github.com/prometheus-community/postgres_exporter/releases

OK,打开github,会发现有茫茫多的版本,what fuck? 这会难住N多的有选择症的同学,我的用例里使用的是0.11版本,根据github 的介绍呢,0.11以及以上的postgres-exporter 都还不错,功能够多,可以多机收集,当然,postgresql的版本我用的是12版本,建议使用12以上的postgresql版本,exporter就0.11以及以上版本即可。

版本问题说完了,就说说什么格式的文件,当然是二进制的文件了,source 源码太麻烦,麻烦就会出错,出错了就需要时间去更改,道理就是这个道理啦。因此,在本次用例里,我使用的是postgres_exporter-0.11.1.linux-amd64.tar

二,

postgres-exporter安装在哪里?

这个好像是一个废话文学,不过我还是打算在这里废话一下,当然是和数据库安装在一起啦,数据库在哪,exporter就安装在哪,和node-exporter一样的,不要犹豫,也无需思考。

三,

postgres-exporter的安装部署

总共需要两个文件,一个可执行程序    就是postgres_exporter-0.11.1.linux-amd64.tar这个文件解压出来的可执行文件,还一个就是启动脚本

启动脚本内需要设置一个变量,该变量名称是固定的

###说明:由于postgres-exporter 是和postgresql安装在一起的,因此,没有必要搞什么明文账号密码连接,直接localhost连接数据库就可以了,这样也就没有什么太多的安全隐患了。

一般数据库的pg_hba.conf 文件内都配置的是本地登录无需验证密码,那么,为什么本地登录无需密码呢?

话说,如果某黑客都已经登录到本地了,那设什么密码也没什么吊谓了(在安全方面来说,已经是完全无意义了),直接改配置文件重置密码也就几十秒的事情吧。

DATA_SOURCE_NAME=postgresql://localhost:数据库端口/postgres?sslmode=disable

环境变量和端口都在启动脚本里设置好,环境变量按实际的情况填写数据库的IP+端口postgres-exporter的默认端口是9187

cat > /usr/lib/systemd/system/postgres_exporter.service<<EOF
[Unit]
Description=postgres_exporter
After=network.target
[Service]
Type=simple
User=postgres
Environment="DATA_SOURCE_NAME=postgresql//:localhost@数据库ip:数据库端口/postgres?sslmode=disable"
ExecStart=/usr/local/bin/postgres_exporter --log.level=error --web.listen-address=:9187
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF

那么,这样配置有没有什么问题?

可以明确的说,稍有瑕疵,可能数据库的运行时间这个数据没有收集,但基本99%的数据库实例数据都收集到了

那么,想要完美的收集数据库实例数据,需要在上面的启动命令行添加一个额外参数,--extend.query-path=queries.yaml,

queries.yaml这个文件的内容如下(此节选做)

该文件主要是对postgres-exporter做一些扩展查询,例如,上面提到的服务器运行时间,表更新的行的数量,表删除的行的数量,手动vacuum的时间等等一些更细致的数据

pg_replication:
  query: "SELECT CASE WHEN NOT pg_is_in_recovery() THEN 0 ELSE GREATEST (0, EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))) END AS lag"
  master: true
  metrics:
    - lag:
        usage: "GAUGE"
        description: "Replication lag behind master in seconds"
pg_postmaster:
  query: "SELECT pg_postmaster_start_time as start_time_seconds from pg_postmaster_start_time()"
  master: true
  metrics:
    - start_time_seconds:
        usage: "GAUGE"
        description: "Time at which postmaster started"
pg_stat_user_tables:
  query: |
   SELECT
     current_database() datname,
     schemaname,
     relname,
     seq_scan,
     seq_tup_read,
     idx_scan,
     idx_tup_fetch,
     n_tup_ins,
     n_tup_upd,
     n_tup_del,
     n_tup_hot_upd,
     n_live_tup,
     n_dead_tup,
     n_mod_since_analyze,
     COALESCE(last_vacuum, '1970-01-01Z') as last_vacuum,
     COALESCE(last_autovacuum, '1970-01-01Z') as last_autovacuum,
     COALESCE(last_analyze, '1970-01-01Z') as last_analyze,
     COALESCE(last_autoanalyze, '1970-01-01Z') as last_autoanalyze,
     vacuum_count,
     autovacuum_count,
     analyze_count,
     autoanalyze_count
   FROM
     pg_stat_user_tables
  metrics:
    - datname:
        usage: "LABEL"
        description: "Name of current database"
    - schemaname:
        usage: "LABEL"
        description: "Name of the schema that this table is in"
    - relname:
        usage: "LABEL"
        description: "Name of this table"
    - seq_scan:
        usage: "COUNTER"
        description: "Number of sequential scans initiated on this table"
    - seq_tup_read:
        usage: "COUNTER"
        description: "Number of live rows fetched by sequential scans"
    - idx_scan:
        usage: "COUNTER"
        description: "Number of index scans initiated on this table"
    - idx_tup_fetch:
        usage: "COUNTER"
        description: "Number of live rows fetched by index scans"
    - n_tup_ins:
        usage: "COUNTER"
        description: "Number of rows inserted"
    - n_tup_upd:
        usage: "COUNTER"
        description: "Number of rows updated"
    - n_tup_del:
        usage: "COUNTER"
        description: "Number of rows deleted"
    - n_tup_hot_upd:
        usage: "COUNTER"
        description: "Number of rows HOT updated (i.e., with no separate index update required)"
    - n_live_tup:
        usage: "GAUGE"
        description: "Estimated number of live rows"
    - n_dead_tup:
        usage: "GAUGE"
        description: "Estimated number of dead rows"
    - n_mod_since_analyze:
        usage: "GAUGE"
        description: "Estimated number of rows changed since last analyze"
    - last_vacuum:
        usage: "GAUGE"
        description: "Last time at which this table was manually vacuumed (not counting VACUUM FULL)"
    - last_autovacuum:
        usage: "GAUGE"
        description: "Last time at which this table was vacuumed by the autovacuum daemon"
    - last_analyze:
        usage: "GAUGE"
        description: "Last time at which this table was manually analyzed"
    - last_autoanalyze:
        usage: "GAUGE"
        description: "Last time at which this table was analyzed by the autovacuum daemon"
    - vacuum_count:
        usage: "COUNTER"
        description: "Number of times this table has been manually vacuumed (not counting VACUUM FULL)"
    - autovacuum_count:
        usage: "COUNTER"
        description: "Number of times this table has been vacuumed by the autovacuum daemon"
    - analyze_count:
        usage: "COUNTER"
        description: "Number of times this table has been manually analyzed"
    - autoanalyze_count:
        usage: "COUNTER"
        description: "Number of times this table has been analyzed by the autovacuum daemon"
pg_statio_user_tables:
  query: "SELECT current_database() datname, schemaname, relname, heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit, toast_blks_read, toast_blks_hit, tidx_blks_read, tidx_blks_hit FROM pg_statio_user_tables"
  metrics:
    - datname:
        usage: "LABEL"
        description: "Name of current database"
    - schemaname:
        usage: "LABEL"
        description: "Name of the schema that this table is in"
    - relname:
        usage: "LABEL"
        description: "Name of this table"
    - heap_blks_read:
        usage: "COUNTER"
        description: "Number of disk blocks read from this table"
    - heap_blks_hit:
        usage: "COUNTER"
        description: "Number of buffer hits in this table"
    - idx_blks_read:
        usage: "COUNTER"
        description: "Number of disk blocks read from all indexes on this table"
    - idx_blks_hit:
        usage: "COUNTER"
        description: "Number of buffer hits in all indexes on this table"
    - toast_blks_read:
        usage: "COUNTER"
        description: "Number of disk blocks read from this table's TOAST table (if any)"
    - toast_blks_hit:
        usage: "COUNTER"
        description: "Number of buffer hits in this table's TOAST table (if any)"
    - tidx_blks_read:
        usage: "COUNTER"
        description: "Number of disk blocks read from this table's TOAST table indexes (if any)"
    - tidx_blks_hit:
        usage: "COUNTER"
        description: "Number of buffer hits in this table's TOAST table indexes (if any)"
# WARNING: This set of metrics can be very expensive on a busy server as every unique query executed will create an additional time series
pg_stat_statements:
  query: "SELECT t2.rolname, t3.datname, queryid, calls, total_time / 1000 as total_time_seconds, min_time / 1000 as min_time_seconds, max_time / 1000 as max_time_seconds, mean_time / 1000 as mean_time_seconds, stddev_time / 1000 as stddev_time_seconds, rows, shared_blks_hit, shared_blks_read, shared_blks_dirtied, shared_blks_written, local_blks_hit, local_blks_read, local_blks_dirtied, local_blks_written, temp_blks_read, temp_blks_written, blk_read_time / 1000 as blk_read_time_seconds, blk_write_time / 1000 as blk_write_time_seconds FROM pg_stat_statements t1 JOIN pg_roles t2 ON (t1.userid=t2.oid) JOIN pg_database t3 ON (t1.dbid=t3.oid) WHERE t2.rolname != 'rdsadmin'"
  master: true
  metrics:
    - rolname:
        usage: "LABEL"
        description: "Name of user"
    - datname:
        usage: "LABEL"
        description: "Name of database"
    - queryid:
        usage: "LABEL"
        description: "Query ID"
    - calls:
        usage: "COUNTER"
        description: "Number of times executed"
    - total_time_seconds:
        usage: "COUNTER"
        description: "Total time spent in the statement, in milliseconds"
    - min_time_seconds:
        usage: "GAUGE"
        description: "Minimum time spent in the statement, in milliseconds"
    - max_time_seconds:
        usage: "GAUGE"
        description: "Maximum time spent in the statement, in milliseconds"
    - mean_time_seconds:
        usage: "GAUGE"
        description: "Mean time spent in the statement, in milliseconds"
    - stddev_time_seconds:
        usage: "GAUGE"
        description: "Population standard deviation of time spent in the statement, in milliseconds"
    - rows:
        usage: "COUNTER"
        description: "Total number of rows retrieved or affected by the statement"
    - shared_blks_hit:
        usage: "COUNTER"
        description: "Total number of shared block cache hits by the statement"
    - shared_blks_read:
        usage: "COUNTER"
        description: "Total number of shared blocks read by the statement"
    - shared_blks_dirtied:
        usage: "COUNTER"
        description: "Total number of shared blocks dirtied by the statement"
    - shared_blks_written:
        usage: "COUNTER"
        description: "Total number of shared blocks written by the statement"
    - local_blks_hit:
        usage: "COUNTER"
        description: "Total number of local block cache hits by the statement"
    - local_blks_read:
        usage: "COUNTER"
        description: "Total number of local blocks read by the statement"
    - local_blks_dirtied:
        usage: "COUNTER"
        description: "Total number of local blocks dirtied by the statement"
    - local_blks_written:
        usage: "COUNTER"
        description: "Total number of local blocks written by the statement"
    - temp_blks_read:
        usage: "COUNTER"
        description: "Total number of temp blocks read by the statement"
    - temp_blks_written:
        usage: "COUNTER"
        description: "Total number of temp blocks written by the statement"
    - blk_read_time_seconds:
        usage: "COUNTER"
        description: "Total time the statement spent reading blocks, in milliseconds (if track_io_timing is enabled, otherwise zero)"
    - blk_write_time_seconds:
        usage: "COUNTER"
        description: "Total time the statement spent writing blocks, in milliseconds (if track_io_timing is enabled, otherwise zero)"
pg_process_idle:
  query: |
    WITH
      metrics AS (
        SELECT
          application_name,
          SUM(EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - state_change))::bigint)::float AS process_idle_seconds_sum,
          COUNT(*) AS process_idle_seconds_count
        FROM pg_stat_activity
        WHERE state = 'idle'
        GROUP BY application_name
      ),
      buckets AS (
        SELECT
          application_name,
          le,
          SUM(
            CASE WHEN EXTRACT(EPOCH FROM (CURRENT_TIMESTAMP - state_change)) <= le
              THEN 1
              ELSE 0
            END
          )::bigint AS bucket
        FROM
          pg_stat_activity,
          UNNEST(ARRAY[1, 2, 5, 15, 30, 60, 90, 120, 300]) AS le
        GROUP BY application_name, le
        ORDER BY application_name, le
      )
    SELECT
      application_name,
      process_idle_seconds_sum as seconds_sum,
      process_idle_seconds_count as seconds_count,
      ARRAY_AGG(le) AS seconds,
      ARRAY_AGG(bucket) AS seconds_bucket
    FROM metrics JOIN buckets USING (application_name)
    GROUP BY 1, 2, 3
  metrics:
    - application_name:
        usage: "LABEL"
        description: "Application Name"
    - seconds:
        usage: "HISTOGRAM"
        description: "Idle time of server processes"

四,

postgresql数据库的设置

postgresql数据库需要开启pg_stat_statements  开启方法见下面的博客,在这里就不重复了

postgresql|数据库|插件学习(一)---postgresql-12的内置插件pg_stat_statements的启用和使用_晚风_END的博客-CSDN博客

只是需要注意一点,尽量使用主流的postgresql数据库版本,现在应该是12版本

剩下的和node-exporter基本一样了,在prometheus的配置文件里加入postgres-exporter收集器任务就可以了,例如:

配置文件自动发现和分配置文件

添加配置Prometheus文件
- job_name: 'postgres_exporter'
    scrape_interval: 1s
    file_sd_configs:
      - files:
        - targets/postgresql/*.yml
        refresh_interval: 10s
    relabel_configs:
    - action: replace
      source_labels: ['__address__']
      regex: (.*):(.*)
      replacement: $1
      target_label: hostname
    - action: labeldrop
      regex: __meta_filepath
添加监控对象
vi targets/postgresql/pg1-instance.yml
[
  {
    "targets": ["localhost:9187"],
    "labels": { "instance": 'postgres:5432' }
  }
]

五,

新建普通用户使用postgres-exporter的情况(此步骤可以省略掉)

>=10版本主要是使用函数自动检查是否有postgres_exporter这个普通用户,并给予连接数据库的权限,设置postgres_exporter这个用户的密码是password,按需更改即可

<10版本主要是创建scheme,检查pg_stat插件,创建postgres_exporter使用的视图,这两个视图也主要是pg_stat_statements这个插件提供的

版本>=10的pg,以下三段SQL都要执行
CREATE OR REPLACE FUNCTION __tmp_create_user() returns void as $$
BEGIN
  IF NOT EXISTS (
          SELECT                       -- SELECT list can stay empty for this
          FROM   pg_catalog.pg_user
          WHERE  usename = 'postgres_exporter') THEN
    CREATE USER postgres_exporter;
  END IF;
END;
$$ language plpgsql;
SELECT __tmp_create_user();
DROP FUNCTION __tmp_create_user();
ALTER USER postgres_exporter WITH PASSWORD 'password';
ALTER USER postgres_exporter SET SEARCH_PATH TO postgres_exporter,pg_catalog;
-- If deploying as non-superuser (for example in AWS RDS), uncomment the GRANT
-- line below and replace <MASTER_USER> with your root user.
-- GRANT postgres_exporter TO <MASTER_USER>;
GRANT CONNECT ON DATABASE postgres TO postgres_exporter;
GRANT pg_monitor to postgres_exporter;
版本<10的pg,只执行下面的SQL
CREATE SCHEMA IF NOT EXISTS postgres_exporter;
GRANT USAGE ON SCHEMA postgres_exporter TO postgres_exporter;
CREATE OR REPLACE FUNCTION get_pg_stat_activity() RETURNS SETOF pg_stat_activity AS
$$ SELECT * FROM pg_catalog.pg_stat_activity; $$
LANGUAGE sql
VOLATILE
SECURITY DEFINER;
CREATE OR REPLACE VIEW postgres_exporter.pg_stat_activity
AS
  SELECT * from get_pg_stat_activity();
GRANT SELECT ON postgres_exporter.pg_stat_activity TO postgres_exporter;
CREATE OR REPLACE FUNCTION get_pg_stat_replication() RETURNS SETOF pg_stat_replication AS
$$ SELECT * FROM pg_catalog.pg_stat_replication; $$
LANGUAGE sql
VOLATILE
SECURITY DEFINER;
CREATE OR REPLACE VIEW postgres_exporter.pg_stat_replication
AS
  SELECT * FROM get_pg_stat_replication();
GRANT SELECT ON postgres_exporter.pg_stat_replication TO postgres_exporter;
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE OR REPLACE FUNCTION get_pg_stat_statements() RETURNS SETOF pg_stat_statements AS
$$ SELECT * FROM public.pg_stat_statements; $$
LANGUAGE sql
VOLATILE
SECURITY DEFINER;
CREATE OR REPLACE VIEW postgres_exporter.pg_stat_statements
AS
  SELECT * FROM get_pg_stat_statements();

六,

postgres-exporter使用哪个dashboard 的模版?

下载地址:

Dashboards | Grafana Labs

第一个人气最高的就比较好用   9628_rev7.json  ID是9628

这里有一个问题需要注意,该JSON文件里写的数据源名称是DS_PROMETHEUS,因此,要么创建数据源的时候,名字必须是这个,要么就把JSON文件里的所有DS_PROMETHEUS修改成自己实际再用的数据源名称~~~~~!!!!!!!!!!

相关实践学习
容器服务Serverless版ACK Serverless 快速入门:在线魔方应用部署和监控
通过本实验,您将了解到容器服务Serverless版ACK Serverless 的基本产品能力,即可以实现快速部署一个在线魔方应用,并借助阿里云容器服务成熟的产品生态,实现在线应用的企业级监控,提升应用稳定性。
目录
相关文章
|
26天前
|
运维 Kubernetes Cloud Native
云原生时代下,如何高效构建与部署微服务
【9月更文挑战第8天】随着云计算技术的飞速发展,云原生已成为现代软件架构的重要趋势。本文将深入浅出地介绍云原生概念、微服务架构的优势以及如何在云平台上高效构建和部署微服务。我们将通过实际的代码示例,展示在Kubernetes集群上部署一个简单的微服务应用的过程,帮助读者理解云原生环境下的微服务开发和运维实践。
|
17天前
|
Kubernetes Cloud Native Linux
云原生入门:Kubernetes的简易部署与应用
【8月更文挑战第49天】在云原生的世界里,Kubernetes(K8s)是一颗璀璨的星。本文将带你走进K8s的世界,从安装到简单应用,轻松驾驭这个强大的容器编排工具。让我们一起探索云原生的奥秘,解锁新技能!
|
19天前
|
Prometheus 监控 Cloud Native
docker安装prometheus+Granfan并监控容器
【9月更文挑战第14天】本文介绍了在Docker中安装Prometheus与Grafana并监控容器的步骤,包括创建配置文件、运行Prometheus与Grafana容器,以及在Grafana中配置数据源和创建监控仪表盘,展示了如何通过Prometheus抓取数据并利用Grafana展示容器的CPU使用率等关键指标。
|
19天前
|
存储 Prometheus 监控
在Ubuntu系统上安装与配置Prometheus的步骤
通过以上步骤,您应该已经成功在Ubuntu系统上安装并配置了Prometheus。您现在可以开始使用Prometheus收集和分析您的系统和应用程序的指标数据了。
38 1
|
22天前
|
运维 Cloud Native Devops
云原生架构的崛起与实践云原生架构是一种通过容器化、微服务和DevOps等技术手段,帮助应用系统实现敏捷部署、弹性扩展和高效运维的技术理念。本文将探讨云原生的概念、核心技术以及其在企业中的应用实践,揭示云原生如何成为现代软件开发和运营的主流方式。##
云原生架构是现代IT领域的一场革命,它依托于容器化、微服务和DevOps等核心技术,旨在解决传统架构在应对复杂业务需求时的不足。通过采用云原生方法,企业可以实现敏捷部署、弹性扩展和高效运维,从而大幅提升开发效率和系统可靠性。本文详细阐述了云原生的核心概念、主要技术和实际应用案例,并探讨了企业在实施云原生过程中的挑战与解决方案。无论是正在转型的传统企业,还是寻求创新的互联网企业,云原生都提供了一条实现高效能、高灵活性和高可靠性的技术路径。 ##
29 3
|
8天前
|
Cloud Native 持续交付 开发者
云端之旅:探索云原生应用的构建与部署
【9月更文挑战第26天】在这篇文章中,我们将一起踏上一段激动人心的旅程,深入探讨云原生应用的构建和部署。通过实际的代码示例和详细的步骤说明,我们将揭开云原生技术的神秘面纱,展示如何利用这些技术来创建灵活、可扩展的应用。无论你是云原生领域的新手还是希望深化理解的开发者,这篇文章都将为你提供宝贵的知识和技能。
18 0
|
26天前
|
Cloud Native API 持续交付
云原生技术:开启现代软件部署的新篇章
在数字化浪潮中,云计算已从简单的资源共享进化到支持复杂应用的平台。云原生技术作为这一演变的核心,不仅重塑了软件开发、部署的方式,还为业务敏捷性、可伸缩性和可靠性设定了新的标准。本文将探讨云原生的基本概念、核心技术及实践方法,揭示它如何引领企业走在数字化转型的前列。
|
2月前
|
Kubernetes Cloud Native Linux
云原生入门:Kubernetes的简易部署与应用
【8月更文挑战第31天】在云原生的世界里,Kubernetes(K8s)是一颗璀璨的星。本文将带你走进K8s的世界,从安装到简单应用,轻松驾驭这个强大的容器编排工具。让我们一起探索云原生的奥秘,解锁新技能!
|
2月前
|
开发者 容器 Java
Azure云之旅:JSF应用的神秘部署指南,揭开云原生的新篇章!
【8月更文挑战第31天】本文探讨了如何在Azure上部署JavaServer Faces (JSF) 应用,充分发挥其界面构建能力和云平台优势,实现高效安全的Web应用。Azure提供的多种服务如App Service、Kubernetes Service (AKS) 和DevOps简化了部署流程,并支持应用全生命周期管理。文章详细介绍了使用Azure Spring Cloud和App Service部署JSF应用的具体步骤,帮助开发者更好地利用Azure的强大功能。无论是在微服务架构下还是传统环境中,Azure都能为JSF应用提供全面支持,助力开发者拓展技术视野与实践机会。
13 0
|
2月前
|
Kubernetes Cloud Native 应用服务中间件
云原生技术入门与实践:Kubernetes的简易部署
【8月更文挑战第31天】云原生技术已成为现代软件部署的黄金标准,而Kubernetes作为其核心组件之一,在容器编排领域独领风骚。本文将引导您通过简单的步骤,快速部署一个Kubernetes集群,并运行一个简单的应用,让您轻松迈入云原生的世界。
下一篇
无影云桌面