Preface
As the business grows, applications keep scaling out and the number of virtual machines keeps increasing. This makes it hard for operations staff to quickly grasp VM status (CPU, disk, memory) and application JVM health, or to collect and analyze error logs. We therefore built a monitoring dashboard tailored to our environment, so that VM status and application error logs can be reviewed at a glance.
Overview
This article describes how to build a monitoring dashboard with Grafana and Prometheus. The monitored targets include host metrics, PostgreSQL database metrics, JVM metrics, and log collection.
Software list

| Application | Purpose |
| --- | --- |
| Grafana | Monitoring dashboard; unified display of all monitoring data |
| Prometheus | Metrics collection, storage, and querying |
| node_exporter | Host (server) metrics collection |
| postgres_exporter | PostgreSQL database metrics collection |
| jmx_prometheus_javaagent-0.12.0.jar | JVM monitoring for Java applications |
| Loki | Unified log storage |
| Promtail | Log file collection |
| cadvisor | Docker container monitoring: CPU, memory, etc. |
Installation
1. Prometheus installation
1.1 Package preparation
Prometheus Docker image
1.2 prometheus.yml configuration file
```yaml
# my global config
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  # Hosts to scrape via node_exporter; its installation is covered below.
  - job_name: "Linux"
    static_configs:
      - targets: ['172.16.0.2:9100','172.16.0.3:9100','172.16.0.4:9100']

  # Database metrics collection
  - job_name: "postgres"
    static_configs:
      - targets: ['172.16.0.5:9187']
        labels:
          instance: 'database1'
      - targets: ['172.16.0.6:9187']
        labels:
          instance: 'database2'

  # JVM monitoring
  - job_name: "java"
    scrape_interval: 10s
    static_configs:
      - targets: ['172.16.0.2:30013','172.16.0.3:30013']

  # Container status monitoring
  - job_name: "docker"
    static_configs:
      - targets: ['172.16.0.2:8081','172.16.0.3:8081','172.16.0.4:8081']
```
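Before deploying, the file can be checked for syntax errors with promtool, the validation tool shipped with the Prometheus distribution (the config file path below matches the bind mount used later):

```shell
# Validate the Prometheus configuration before starting the server;
# a valid file prints "SUCCESS" for each checked section.
promtool check config /etc/prometheus/prometheus.yml
```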
1.3 Deployment command

```shell
docker run -d \
  --name=prometheus \
  -p 9090:9090 \
  -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  b205ccdd28d3
```
1.4 Service verification
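A minimal check, assuming Prometheus is reachable on port 9090 of the local host; Prometheus 2.x exposes a health endpoint:

```shell
# Returns "Prometheus is Healthy." when the server is up:
curl -s http://localhost:9090/-/healthy
# The scrape targets can also be inspected in the web UI at
# http://<host_ip>:9090/targets (Status > Targets).
```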
2. Grafana installation
2.1 Package preparation
Grafana Docker image
2.2 Deployment command

```shell
docker run -d \
  --name=grafana \
  -p 3000:3000 \
  78409b134146
```

2.3 Verification
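As a quick check, assuming Grafana is reachable on port 3000, its health API can be queried; otherwise open http://<host_ip>:3000 in a browser and log in (the default credentials on first start are admin/admin):

```shell
# Returns a small JSON document including "database": "ok" when healthy:
curl -s http://localhost:3000/api/health
```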
3. node_exporter host monitoring
3.1 Package and script preparation
This section uses the binary package for installation; a Docker image could also be used, but that is not covered here.
node_exporter-0.17.0.linux-amd64.tar.gz
node_exporter.sh
```shell
#!/bin/bash
tar zxf node_exporter-0.17.0.linux-amd64.tar.gz
mv node_exporter-0.17.0.linux-amd64 /usr/local/node_exporter

cat <<EOF >/usr/lib/systemd/system/node_exporter.service
[Unit]
Description=https://prometheus.io

[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable node_exporter
systemctl restart node_exporter
```
3.2 Service creation
Copy the node_exporter binary package and the install script to /opt/node_exporter on each target host; after running node_exporter.sh, the node_exporter service is created.
3.3 Verify the deployment
Open http://<host_ip>:9100/metrics in a browser; the metrics collected from the host are displayed, as in the figure below:
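Each line of the /metrics output is one sample in the Prometheus text exposition format: a metric name, optional labels in braces, and a value. A minimal parsing sketch in plain shell (the sample line and its values are made up for illustration):

```shell
# A hypothetical sample line as it might appear in node_exporter output:
line='node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67'

# The metric name is everything before the first "{":
name=${line%%\{*}
# The value is the last whitespace-separated field:
value=${line##* }

echo "$name = $value"   # prints: node_cpu_seconds_total = 12345.67
```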
4. PostgreSQL database monitoring
4.1 Package preparation
postgres_exporter_v0.8.0_linux-amd64.tar.gz
Place the tarball under /opt and extract it as postgres_exporter.
4.2 Set environment variables

```shell
vi ~/.bash_profile

export DATA_SOURCE_NAME="postgresql://<db_user>:<db_password>@<db_ip>:<db_port>/postgres?sslmode=disable"
export PG_EXPORTER_EXTEND_QUERY_PATH="/opt/postgres_exporter/custom.yaml"

source ~/.bash_profile
```
4.3 Service creation
Run the program in the background:

```shell
/opt/postgres_exporter/postgres_exporter >/dev/null 2>&1 &
```
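To confirm the exporter is up, its metrics endpoint (port 9187 by default) can be queried from the same host; pg_up is 1 when the database connection succeeds:

```shell
# Expect a line such as "pg_up 1" when the exporter can reach the database:
curl -s http://localhost:9187/metrics | grep '^pg_up'
```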
5. JVM monitoring
5.1 Package
jmx_prometheus_javaagent-0.12.0.jar
5.2 Edit the configuration files

```shell
# vi config.yaml
---
rules:
  - pattern: '.*'

# vi catalina.sh
JAVA_OPTS="-Duser.timezone=GMT+08 -javaagent:/home/jenkins/tomcat/bin/jmx_prometheus_javaagent-0.12.0.jar=30013:/home/jenkins/tomcat/bin/config.yaml"
```

Notes:
- /home/jenkins/tomcat/bin/jmx_prometheus_javaagent-0.12.0.jar: path to the agent jar
- 30013: port on which JVM metrics are exposed
- /home/jenkins/tomcat/bin/config.yaml: path to the agent configuration file
5.3 Start the application

```shell
tomcat/bin/startup.sh
```
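Once Tomcat is running, the agent serves metrics on the port configured above (30013); a quick check, assuming the application runs on the local host:

```shell
# JVM metrics such as jvm_memory_bytes_used should appear in the output:
curl -s http://localhost:30013/metrics | head -n 20
```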
6. Loki log management & Promtail log collection
Log collection runs from Docker images, so a host with a working Docker installation is required.
6.1 Image preparation
Loki Docker image
Promtail Docker image
6.2 loki-local-config.yaml configuration file
```yaml
# vi loki-local-config.yaml
auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s

schema_config:
  configs:
    - from: 2018-04-15
      store: boltdb
      object_store: filesystem
      schema: v9
      index:
        prefix: index_
        period: 168h

storage_config:
  boltdb:
    directory: /tmp/loki/index
  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0

table_manager:
  chunk_tables_provisioning:
    inactive_read_throughput: 0
    inactive_write_throughput: 0
    provisioned_read_throughput: 0
    provisioned_write_throughput: 0
  index_tables_provisioning:
    inactive_read_throughput: 0
    inactive_write_throughput: 0
    provisioned_read_throughput: 0
    provisioned_write_throughput: 0
  retention_deletes_enabled: false
  retention_period: 0
```
6.3 promtail-docker-config.yaml configuration file

```yaml
# cat promtail-docker-config.yaml
server:
  http_listen_port: 0
  grpc_listen_port: 0

positions:
  filename: /etc/promtail/positions.yaml  # cursor file recording the last read position
  sync_period: 10s  # sync the position file every 10 seconds

clients:
  - url: http://<loki_ip>:3100/loki/api/v1/push  # address of the Loki push API

scrape_configs:
  - job_name: test-java-log
    static_configs:
      - targets:
          - localhost
        labels:
          job: content-cloud-test
          app: content-cloud-test
          # The docker run config already maps the host directory into the
          # promtail container, so the log files are directly accessible here.
          __path__: /opt/test81/*/logs/*.out
```
6.4 docker-compose.yml configuration file

```yaml
# vi docker-compose.yml
version: "3"
services:
  loki:
    image: 172.16.0.2/grafana/loki:20200923
    container_name: loki
    restart: always
    ports:
      - "3100:3100"
    volumes:
      - $PWD:/etc/loki
    command: -config.file=/etc/loki/loki-local-config.yaml
  promtail:
    image: 172.16.0.2/grafana/promtail:20200923
    container_name: promtail
    restart: always
    volumes:
      - $PWD:/etc/promtail
      - /home/docker/docker-mount:/opt/test81/
    command: -config.file=/etc/promtail/promtail-docker-config.yaml
```
6.5 Deployment command

```shell
docker-compose up -d
```
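Both containers can then be checked from the same directory, assuming they run on the local host; Loki exposes a readiness endpoint:

```shell
# Both services should show state "Up":
docker-compose ps
# Returns "ready" once Loki has finished starting:
curl -s http://localhost:3100/ready
```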
6.6 Example
Open the Grafana UI, configure the Loki data source, and use the following query:

```
{job="content-cloud-test",filename="/opt/test81/cacheapi/logs/catalina.out"}
```
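A stream selector like the one above fetches the raw log stream; LogQL line filters can narrow it further, for example keeping only lines that contain a given string (the search term here is just an example):

```
{job="content-cloud-test"} |= "Exception"
```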
Usage
1. Configure the Prometheus or Loki data source
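As an alternative to clicking through the UI, a data source can also be created through Grafana's HTTP API. A sketch under stated assumptions (the credentials, Grafana URL, and Prometheus address below are examples, not values from this deployment):

```shell
# Create a Prometheus data source via the Grafana HTTP API
# (admin:admin and the URLs are assumptions for illustration):
curl -s -X POST http://admin:admin@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://172.16.0.2:9090","access":"proxy"}'
```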
2. View the data source list
After a data source is added, it appears under Configuration > Data Sources.
3. Import dashboards
Import a dashboard JSON file, or enter the ID of an official dashboard (fetched online).
4. View dashboards