1. Download
- Prometheus (China mirror): https://mirrors.tuna.tsinghua.edu.cn/github-release/prometheus/prometheus/2.34.0%20_%202022-03-15/prometheus-2.34.0.linux-amd64.tar.gz
- Pushgateway (GitHub, slower from within China): https://github.com/prometheus/pushgateway/releases/download/v1.4.2/pushgateway-1.4.2.linux-amd64.tar.gz
- node_exporter (GitHub, slower from within China): https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
- Grafana (China mirror): https://repo.huaweicloud.com/grafana/8.4.7/grafana-enterprise-8.4.7.linux-amd64.tar.gz
- Alertmanager (GitHub, slower from within China): https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
Log in to the server and run the following commands to download the packages:

    cd /opt
    mkdir -p prometheus_env
    cd prometheus_env
    wget https://mirrors.tuna.tsinghua.edu.cn/github-release/prometheus/prometheus/2.34.0%20_%202022-03-15/prometheus-2.34.0.linux-amd64.tar.gz
    wget https://github.com/prometheus/pushgateway/releases/download/v1.4.2/pushgateway-1.4.2.linux-amd64.tar.gz
    wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
    wget https://repo.huaweicloud.com/grafana/8.4.7/grafana-enterprise-8.4.7.linux-amd64.tar.gz
    wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
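
Because some of the downloads come from slow hosts, it can help to confirm that all five archives actually arrived before moving on. The following is a minimal sketch, not part of the original procedure; the function name `missing_archives` is hypothetical, and the file names are taken from the download links listed above.

```shell
# Print the name of every expected archive that is absent (or empty)
# in the given directory. Prints nothing when all five are present.
missing_archives() {
  local dir="$1" f
  for f in \
    prometheus-2.34.0.linux-amd64.tar.gz \
    pushgateway-1.4.2.linux-amd64.tar.gz \
    node_exporter-1.3.1.linux-amd64.tar.gz \
    grafana-enterprise-8.4.7.linux-amd64.tar.gz \
    alertmanager-0.24.0.linux-amd64.tar.gz
  do
    # -s is true only for a file that exists and is non-empty
    [ -s "$dir/$f" ] || echo "$f"
  done
}

# Report anything still missing from the download directory used in this guide.
missing_archives /opt/prometheus_env
```

Any name printed here is a file that still needs to be fetched with `wget`.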
2. Extract

    tar -zxvf prometheus-2.34.0.linux-amd64.tar.gz
    tar -zxvf pushgateway-1.4.2.linux-amd64.tar.gz
    tar -zxvf node_exporter-1.3.1.linux-amd64.tar.gz
    tar -zxvf alertmanager-0.24.0.linux-amd64.tar.gz
    tar -zxvf grafana-enterprise-8.4.7.linux-amd64.tar.gz
3. Configuration
3.1 Edit the prometheus.yml configuration file

    cd /opt/prometheus_env/prometheus-2.34.0.linux-amd64
    vi prometheus.yml

Set the contents to:

    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - 127.0.0.1:9093

    rule_files:
      - "/opt/prometheus_env/prometheus-2.34.0.linux-amd64/alarm_rules.yml"

    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['127.0.0.1:9090']
            labels:
              instance: 'prometheus'

      - job_name: 'linux'
        static_configs:
          - targets: ['127.0.0.1:9100']
            labels:
              instance: 'localhost'

      - job_name: 'pushgateway'
        static_configs:
          - targets: ['127.0.0.1:9091']
            labels:
              instance: 'pushgateway'
3.2 Add the alarm_rules.yml file

    cd /opt/prometheus_env/prometheus-2.34.0.linux-amd64
    vi alarm_rules.yml

Set the contents to:

    groups:
      - name: node
        rules:
          - alert: server_status
            expr: up{} == 0
            for: 15s
            annotations:
              summary: "Host {{ $labels.instance }} is down"
              description: "Please investigate immediately!"

          - alert: server_status
            expr: 100 - ((node_memory_MemAvailable_bytes * 100) / node_memory_MemTotal_bytes) > 40
            for: 1s
            annotations:
              summary: "Host {{ $labels.instance }} memory usage is above 40%"
              description: "Please investigate immediately!"

          - alert: server_status
            expr: (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[2m])) by (instance)) * 100 > 70
            for: 1s
            annotations:
              summary: "Host {{ $labels.instance }} CPU usage is above 70%"
              description: "Please investigate immediately!"

          - alert: server_status
            expr: max((node_filesystem_size_bytes{fstype=~"ext.?|xfs"} - node_filesystem_free_bytes{fstype=~"ext.?|xfs"}) * 100 / (node_filesystem_avail_bytes{fstype=~"ext.?|xfs"} + (node_filesystem_size_bytes{fstype=~"ext.?|xfs"} - node_filesystem_free_bytes{fstype=~"ext.?|xfs"}))) by (instance) > 80
            for: 15s
            annotations:
              summary: "Host {{ $labels.instance }} partition usage is above 80%"
              description: "Please investigate immediately!"
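
The memory rule's expression, 100 - (MemAvailable * 100 / MemTotal), is the percentage of memory currently in use. As a rough illustration only, the same arithmetic can be reproduced in shell with hypothetical byte counts. The helper name `mem_used_pct` is made up for this sketch and is not part of Prometheus.

```shell
# Percentage of memory in use, mirroring the alert expression:
#   100 - (node_memory_MemAvailable_bytes * 100 / node_memory_MemTotal_bytes)
# Arguments: available bytes, total bytes. (Hypothetical helper.)
mem_used_pct() {
  awk -v avail="$1" -v total="$2" \
    'BEGIN { printf "%.1f\n", 100 - (avail * 100) / total }'
}

# Example values: 3 GiB available out of 8 GiB total.
mem_used_pct 3221225472 8589934592   # prints 62.5
```

A value like this is above the rule's threshold, so the alert would fire once the condition has held for the rule's `for` duration.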
3.3 Edit the alertmanager.yml file

    cd /opt/prometheus_env/alertmanager-0.24.0.linux-amd64
    vi alertmanager.yml

Change the contents to:

    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.exmail.qq.com:465'        # SMTP server (Tencent Exmail in this example)
      smtp_from: 'your-address@qq.com'                # sender address
      smtp_auth_username: 'yanglinwei@digibms.com'    # SMTP login account
      smtp_auth_password: 'your-authorization-code'   # mailbox authorization code, not the login password
      smtp_require_tls: false                         # whether to require STARTTLS

    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 3m   # wait this long before re-sending an alert that is still firing; raise it to reduce email volume
      receiver: 'mail'      # receiver that handles the alerts

    receivers:
      - name: 'mail'        # must match the receiver named in route
        email_configs:
          - to: 'recipient@qq.com'   # recipient address; list multiple recipients as separate entries

    #inhibit_rules:
    #  - source_match:
    #      severity: 'critical'
    #    target_match:
    #      severity: 'warning'
    #    equal: ['alertname', 'dev', 'instance']
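
Before starting anything, the edited files can be checked with the validators that ship inside the release tarballs: `promtool check config` (which also loads the rule files referenced from prometheus.yml) and `amtool check-config`. The wrapper below is a sketch under the directory layout assumed in this guide; `run_check` is a hypothetical helper name, and it simply skips a check when the binary or file is not where expected.

```shell
BASE=/opt/prometheus_env
PROM_DIR="$BASE/prometheus-2.34.0.linux-amd64"
AM_DIR="$BASE/alertmanager-0.24.0.linux-amd64"

# run_check <tool> <file> <subcommand...>
# Runs "<tool> <subcommand...> <file>".
# Returns 2 without running anything if the tool or the file is missing.
run_check() {
  local tool="$1" file="$2"
  shift 2
  if [ ! -x "$tool" ] || [ ! -f "$file" ]; then
    echo "skip: $tool or $file not found" >&2
    return 2
  fi
  "$tool" "$@" "$file"
}

run_check "$PROM_DIR/promtool" "$PROM_DIR/prometheus.yml" check config \
  || echo "prometheus.yml: check did not pass"
run_check "$AM_DIR/amtool" "$AM_DIR/alertmanager.yml" check-config \
  || echo "alertmanager.yml: check did not pass"
```

Catching a YAML indentation mistake here is usually quicker than reading it out of the service logs after a failed start.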
3.4 Edit the defaults.ini file
Edit defaults.ini so that the Grafana UI can be accessed anonymously:

    vi /opt/prometheus_env/grafana-8.4.7/conf/defaults.ini

Set the following section to:

    #################################### Anonymous Auth ######################
    [auth.anonymous]
    # enable anonymous access
    enabled = true
3.5 Add the systemd service files

    cd /usr/lib/systemd/system

① pushgateway.service, with the following contents:

    [Unit]
    Description=Prometheus Push Gateway
    After=network.target

    [Service]
    ExecStart=/opt/prometheus_env/pushgateway-1.4.2.linux-amd64/pushgateway
    User=root

    [Install]
    WantedBy=multi-user.target

② node_exporter.service, with the following contents:

    [Unit]
    Description=Prometheus Node Exporter
    After=network.target

    [Service]
    ExecStart=/opt/prometheus_env/node_exporter-1.3.1.linux-amd64/node_exporter
    User=root

    [Install]
    WantedBy=multi-user.target

③ prometheus.service, with the following contents:

    [Unit]
    Description=Prometheus Service
    After=network.target

    [Service]
    ExecStart=/opt/prometheus_env/prometheus-2.34.0.linux-amd64/prometheus \
      --config.file=/opt/prometheus_env/prometheus-2.34.0.linux-amd64/prometheus.yml \
      --web.read-timeout=5m \
      --web.max-connections=10 \
      --storage.tsdb.retention.time=15d \
      --storage.tsdb.path=/prometheus/data \
      --query.max-concurrency=20 \
      --query.timeout=2m
    User=root

    [Install]
    WantedBy=multi-user.target

④ grafana.service, with the following contents:

    [Unit]
    Description=Grafana
    After=network.target

    [Service]
    ExecStart=/opt/prometheus_env/grafana-8.4.7/bin/grafana-server \
      --config=/opt/prometheus_env/grafana-8.4.7/conf/defaults.ini \
      --homepath=/opt/prometheus_env/grafana-8.4.7

    [Install]
    WantedBy=multi-user.target

⑤ alertmanager.service, with the following contents:

    [Unit]
    Description=Prometheus alertmanager
    After=network.target

    [Service]
    ExecStart=/opt/prometheus_env/alertmanager-0.24.0.linux-amd64/alertmanager \
      --storage.path=/opt/prometheus_env/alertmanager-0.24.0.linux-amd64/data \
      --config.file=/opt/prometheus_env/alertmanager-0.24.0.linux-amd64/alertmanager.yml
    User=root

    [Install]
    WantedBy=multi-user.target
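
The five unit files share the same skeleton and differ only in the description and the `ExecStart` command line. As an optional shortcut, a small template function can print a unit file to standard output. This is a sketch, and `render_unit` is a hypothetical helper name. Note that it always emits `User=root`, which the Grafana unit above does not set.

```shell
# render_unit <description> <command> [args...]
# Prints a minimal systemd unit in the same shape as the files above.
render_unit() {
  local desc="$1"
  shift
  cat <<EOF
[Unit]
Description=$desc
After=network.target

[Service]
ExecStart=$*
User=root

[Install]
WantedBy=multi-user.target
EOF
}

# Preview the Pushgateway unit. To install it, redirect the output to
# /usr/lib/systemd/system/pushgateway.service instead.
render_unit "Prometheus Push Gateway" \
  /opt/prometheus_env/pushgateway-1.4.2.linux-amd64/pushgateway
```

Writing the files by hand, as described above, works just as well. The template mainly helps keep the five files consistent.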
4. Start
Reload the systemd configuration:

    systemctl daemon-reload

Start the services:

    systemctl start pushgateway
    systemctl start node_exporter
    systemctl start prometheus
    systemctl start grafana
    systemctl start alertmanager

Enable the services at boot:

    systemctl enable pushgateway
    systemctl enable node_exporter
    systemctl enable prometheus
    systemctl enable grafana
    systemctl enable alertmanager

Check the status of a service:

    systemctl status pushgateway
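
To see the state of all five services at a glance rather than one at a time, `systemctl is-active` can be called in a loop. A minimal sketch, assuming the unit names used in this guide; `report_services` is a hypothetical helper name.

```shell
SERVICES="pushgateway node_exporter prometheus grafana alertmanager"

# Print one line per service with the state reported by systemctl.
report_services() {
  local svc state
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "systemctl not found on this machine" >&2
    return 0
  fi
  for svc in $SERVICES; do
    # is-active exits non-zero for anything other than "active",
    # so the exit status is ignored and only the printed state is kept.
    state=$(systemctl is-active "$svc" 2>/dev/null) || true
    printf '%-15s %s\n' "$svc" "${state:-unknown}"
  done
}

report_services
```

Any service that is not reported as `active` can then be inspected with `systemctl status <name>`.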
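5. Other commands
Open a port so that it can be reached from a browser (for example, port 3000):

    firewall-cmd --zone=public --add-port=3000/tcp --permanent

Reload the firewall so that the rule takes effect:

    firewall-cmd --reload

Check whether a port is being listened on:

    netstat -tunlp | grep 9090

Check the process:

    ps -elf | grep prometheus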
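6. Verify in the browser
Open the following address in a browser:

    http://127.0.0.1:9090/targets

Prometheus does not prompt for a login by default. The username and password admin/admin are the defaults for the Grafana UI on port 3000.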
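
The same verification can be done from the command line with `curl`. The sketch below assumes the default ports used throughout this guide and probes each component's health endpoint (`/-/healthy` for Prometheus, Pushgateway, and Alertmanager, `/api/health` for Grafana, and `/metrics` for node_exporter). `ENDPOINTS`, `probe`, and `check_all` are hypothetical names chosen for this example.

```shell
# name|url pairs, one per line; ports are the defaults used in this guide.
ENDPOINTS="prometheus|http://127.0.0.1:9090/-/healthy
pushgateway|http://127.0.0.1:9091/-/healthy
alertmanager|http://127.0.0.1:9093/-/healthy
node_exporter|http://127.0.0.1:9100/metrics
grafana|http://127.0.0.1:3000/api/health"

# Succeeds only if the URL answers with a non-error HTTP status within 3 seconds.
probe() {
  curl -fs -o /dev/null --max-time 3 "$1" 2>/dev/null
}

# Print "<name> ok" or "<name> FAILED" for every endpoint.
check_all() {
  echo "$ENDPOINTS" | while IFS='|' read -r name url; do
    if probe "$url"; then
      echo "$name ok"
    else
      echo "$name FAILED"
    fi
  done
}

check_all
```

A `FAILED` line usually means the service is not running or the port differs from the default. The Targets page in the Prometheus UI additionally shows whether each scrape job is `UP`.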