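[Cloud-Native Monitoring Series, Part 1] A Detailed Guide to the Prometheus Monitoring System (There Is Scenery on Both Sides of the Mountain; With or Without Wind, One Is Free) (Part 2)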

2022-11-14


2. Deploying Prometheus from the Binary Package

2.1 Environment Preparation

Server role	IP address	Components
Prometheus server	192.168.109.138	Prometheus, node_exporter
Grafana server	192.168.109.138	Grafana
Monitored servers	192.168.109.0/24	node_exporter

2.2 Deploying Prometheus

(1) Upload prometheus-2.35.0.linux-amd64.tar.gz to /opt and extract it

#Extract the uploaded package
[root@localhost opt]# tar xf prometheus-2.35.0.linux-amd64.tar.gz
#Move and rename
[root@localhost opt]# mv prometheus-2.35.0.linux-amd64 /usr/local/prometheus
[root@localhost opt]# cd /usr/local/prometheus
[root@localhost prometheus]# ls
console_libraries  consoles  LICENSE  NOTICE  prometheus  prometheus.yml  promtool

Configuration file

cat /usr/local/prometheus/prometheus.yml | grep -v "^#"
global:     #Global settings for Prometheus, such as the scrape interval and scrape timeout
  scrape_interval: 15s    #How often to scrape metrics from targets; default 1m
  evaluation_interval: 15s    #How often recording and alerting rules are evaluated; default 1m
  # scrape_timeout is set to the global default (10s).
  scrape_timeout: 10s   #Timeout for a single scrape; default 10s
alerting:    #Alertmanager instances to send alerts to; supports static configuration and dynamic service discovery
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
rule_files:    #Paths of rule files to load; glob patterns in file names are supported
  # - "first_rules.yml"
  # - "second_rules.yml"
scrape_configs:   #Defines the targets to scrape time series from
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"  #A job groups a set of monitored instances; targets come from static configuration (static_configs) or dynamic service discovery (*_sd_configs)
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:    #Static targets: always scrape from the listed targets
      - targets: ["localhost:9090"]
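Before starting the server it can help to validate the configuration file. The sketch below wraps promtool's `check config` subcommand, which ships in the same tarball. The helper name `check_config` and the `PROM_HOME` variable are hypothetical and only used for illustration; the paths assume the layout used in this article.

```shell
# Install prefix used in this article; override PROM_HOME if yours differs.
PROM_HOME="${PROM_HOME:-/usr/local/prometheus}"

# check_config (hypothetical helper): validate a config file with promtool.
# Returns promtool's exit status, or 127 when promtool is not executable.
check_config() {
  cfg="${1:-$PROM_HOME/prometheus.yml}"
  tool="$PROM_HOME/promtool"
  if [ ! -x "$tool" ]; then
    echo "promtool not found at $tool" >&2
    return 127
  fi
  "$tool" check config "$cfg"
}

# Typical use on the Prometheus server:
#   check_config && echo "configuration is valid"
```

Running the check after every edit to prometheus.yml catches indentation and syntax mistakes before they reach a running server.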

(2) Configure the systemd unit file used to start Prometheus

cat > /usr/lib/systemd/system/prometheus.service <<'EOF'
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/prometheus/prometheus \
--config.file=/usr/local/prometheus/prometheus.yml \
--storage.tsdb.path=/usr/local/prometheus/data/ \
--storage.tsdb.retention.time=15d \
--web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
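Annotated version of the same unit file. The trailing comments are explanations only; do not copy them into the real unit file, because systemd does not support comments at the end of a line:
[Unit]  #Unit section
Description=Prometheus Server  #Description
Documentation=https://prometheus.io
After=network.target   #Ordering: start after network.target
[Service]
Type=simple
ExecStart=/usr/local/prometheus/prometheus \
--config.file=/usr/local/prometheus/prometheus.yml \  #Configuration file
--storage.tsdb.path=/usr/local/prometheus/data/ \  #Data directory
--storage.tsdb.retention.time=15d \  #How long to keep data
--web.enable-lifecycle  #Enable reload and shutdown over HTTP
ExecReload=/bin/kill -HUP $MAINPID  #Reload the configuration
Restart=on-failure
[Install]
WantedBy=multi-user.target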

(3) Start the service

systemctl start prometheus
systemctl enable prometheus
netstat -natp | grep :9090
Open http://192.168.109.138:9090 in a browser to reach the Prometheus web UI
Click Status -> Targets; if every target shows UP, Prometheus is scraping data successfully
http://192.168.109.138:9090/metrics shows the metrics Prometheus exposes about itself
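The same check can be made from the command line. Prometheus serves the management endpoints `/-/healthy` and `/-/ready`; the sketch below probes them with curl. The helper names `prom_endpoint` and `prom_probe` and the `PROM_ADDR` variable are hypothetical, and the address is the one assumed throughout this article.

```shell
# Address of the Prometheus server from this article; override as needed.
PROM_ADDR="${PROM_ADDR:-192.168.109.138:9090}"

# prom_endpoint (hypothetical helper): print the full URL for a server path.
prom_endpoint() {
  printf 'http://%s%s\n' "$PROM_ADDR" "$1"
}

# prom_probe (hypothetical helper): succeed only if the path answers without
# an HTTP error status; curl -f turns 4xx/5xx replies into a non-zero exit.
prom_probe() {
  curl -fsS --max-time 5 "$(prom_endpoint "$1")" >/dev/null
}

# Example, run from a host that can reach the server:
#   prom_probe /-/healthy && prom_probe /-/ready && echo "Prometheus is up"
```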

3. Deploying Exporters

Deploying Node Exporter to monitor system-level metrics

(1) Upload node_exporter-1.3.1.linux-amd64.tar.gz to /opt and extract it

cd /opt/
tar xf node_exporter-1.3.1.linux-amd64.tar.gz
mv node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin

(2) Create the systemd unit file

cat > /usr/lib/systemd/system/node_exporter.service <<'EOF'
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--collector.ntp \
--collector.mountstats \
--collector.systemd \
--collector.tcpstat
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF

(3) Start the service

systemctl start node_exporter
systemctl enable node_exporter
netstat -natp | grep :9100
Open http://192.168.109.138:9100/metrics in a browser to see the metrics collected by Node Exporter

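Commonly used metrics:

node_cpu_seconds_total: cumulative CPU time in seconds, broken down by cpu and mode

node_memory_MemTotal_bytes: total physical memory in bytes

node_filesystem_size_bytes{mountpoint=PATH}: size in bytes of the filesystem mounted at PATH

node_systemd_unit_state{name=}: state of a systemd unit (requires --collector.systemd)

node_vmstat_pswpin: cumulative number of pages swapped in from disk; apply rate() to get pages per second

node_vmstat_pswpout: cumulative number of pages swapped out to disk; apply rate() to get pages per second

More on the available metrics: https://github.com/prometheus/node_exporter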
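Most of these metrics are cumulative counters, so they are usually read through `rate()` rather than as raw values. The sketch below sends two such PromQL expressions to the Prometheus HTTP API endpoint `/api/v1/query`. The variable names and the `prom_query` helper are hypothetical; the 5-minute window is an arbitrary choice, and the queries only return data once Prometheus is scraping the node_exporter targets.

```shell
# Address of the Prometheus server from this article; override as needed.
PROM_ADDR="${PROM_ADDR:-192.168.109.138:9090}"

# Percentage of CPU time not spent idle, per instance, over the last 5 minutes.
CPU_BUSY_QUERY='100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))'

# Pages swapped in per second, per instance, over the last 5 minutes.
SWAP_IN_QUERY='rate(node_vmstat_pswpin[5m])'

# prom_query (hypothetical helper): run an instant query and print the JSON reply.
# -G sends the URL-encoded data as a query string on a GET request.
prom_query() {
  curl -fsS --max-time 10 -G "http://$PROM_ADDR/api/v1/query" \
    --data-urlencode "query=$1"
}

# Example, run from a host that can reach the server:
#   prom_query "$CPU_BUSY_QUERY"
#   prom_query "$SWAP_IN_QUERY"
```

The same expressions can be pasted into the Graph tab of the Prometheus web UI.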

(4) Edit the Prometheus configuration file to add the nodes to monitoring

vim /usr/local/prometheus/prometheus.yml
#Append the following at the end of the file
  - job_name: nodes
    metrics_path: "/metrics"
    static_configs:
    - targets:
      - 192.168.109.138:9100
      - 192.168.109.137:9100
      - 192.168.109.136:9100
      labels:
        service: kubernetes
(5) Reload the configuration

curl -X POST http://192.168.109.138:9090/-/reload     #Hot reload (requires --web.enable-lifecycle)
or: systemctl reload prometheus
In the browser, check Status -> Targets on the Prometheus page
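The target list shown in the web UI is also available from the HTTP API at `/api/v1/targets`. As a rough sketch, the helpers below fetch that JSON and count how many targets report `"health":"up"`. The names `prom_targets` and `count_up` are hypothetical, and the text match assumes the compact JSON that Prometheus returns; a JSON parser would be more robust.

```shell
# Address of the Prometheus server from this article; override as needed.
PROM_ADDR="${PROM_ADDR:-192.168.109.138:9090}"

# prom_targets (hypothetical helper): print the raw JSON from the targets API.
prom_targets() {
  curl -fsS --max-time 10 "http://$PROM_ADDR/api/v1/targets"
}

# count_up (hypothetical helper): read targets JSON on stdin and print the
# number of occurrences of "health":"up".
count_up() {
  grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Example, run from a host that can reach the server:
#   prom_targets | count_up
```

After the reload in this step, the count should match the number of targets configured across all jobs once every host has been scraped successfully.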

4. Deploying Grafana for Visualization

(1) Download and install

Download locations:

https://grafana.com/grafana/download

https://mirrors.bfsu.edu.cn/grafana/yum/rpm/

#Install with yum so that dependencies are resolved; here the package was uploaded to /opt beforehand
yum install -y grafana-7.4.0-1.x86_64.rpm
systemctl start grafana-server
systemctl enable grafana-server
netstat -natp | grep :3000
Open http://192.168.109.138:3000 in a browser; the default username and password are admin/admin

(2) Configure the data source

Configuration -> Data Sources -> Add data source -> select Prometheus
HTTP -> URL: enter http://192.168.109.138:9090
Click Save & Test
Click Dashboards in the top menu and import all the default dashboards
Dashboards -> Manage: select Prometheus 2.0 Stats or Prometheus Stats to see graphs for the Prometheus job instances
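The data source can also be created without the UI, through Grafana's HTTP API (`POST /api/datasources`). The following is a minimal sketch under stated assumptions: the helper names `datasource_json` and `add_datasource` and the `GRAFANA_ADDR` variable are hypothetical, the data source name "Prometheus" is an arbitrary choice, and the login is the default admin/admin mentioned above.

```shell
# Address of the Grafana server from this article; override as needed.
GRAFANA_ADDR="${GRAFANA_ADDR:-192.168.109.138:3000}"

# datasource_json (hypothetical helper): print the JSON body describing a
# Prometheus data source that Grafana reaches in server-side (proxy) mode.
datasource_json() {
  printf '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$1"
}

# add_datasource (hypothetical helper): create the data source via the API.
# Uses the default admin/admin login; change it if the password was reset.
add_datasource() {
  datasource_json "$1" |
    curl -fsS --max-time 10 -u admin:admin \
      -H 'Content-Type: application/json' \
      -X POST "http://$GRAFANA_ADDR/api/datasources" \
      --data-binary @-
}

# Example, run from a host that can reach Grafana:
#   add_datasource http://192.168.109.138:9090
```

Scripting this step is mainly useful when the same setup has to be repeated on several Grafana instances.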

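(3) Import a Grafana dashboard

Open https://grafana.com/grafana/dashboards in a browser, search for node exporter, choose a suitable dashboard, and click Copy ID or Download JSON
In Grafana, go to + Create -> Import, enter the dashboard ID or upload the JSON file, then click Load to import the dashboard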