Using Prometheus relabel in Ceph monitoring

2019-01-14


1. Problem description

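My work environment has three independent Ceph clusters, responsible for object storage, block storage, and file storage respectively. When I built them I underestimated how hard it is to rename a Ceph cluster, so all three use the default cluster name, ceph. Unfortunately, Prometheus's ceph_exporter distinguishes clusters by cluster name. As a result, Grafana could not tell the clusters apart: data from all of them was drawn in a single chart, which was not only a mess but also left some data unable to display correctly.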

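You might say: just change the Ceph cluster name. The problem is that renaming a Ceph cluster is not that simple. Ceph's on-disk directories are tied to the cluster name, so many configuration files and data directories would have to be changed before the new name takes effect. For Ceph clusters already in production use, that is rather risky. Setting up a separate Prometheus and Grafana environment for each Ceph cluster would also solve the problem, but that approach is inelegant, and I did not want to resort to it unless there was no alternative.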

My first idea was to modify ceph_exporter: if the cluster name cannot tell the clusters apart, adding Ceph's fsid as a label surely can.

However, it is hard to tell at a glance which Ceph cluster a given fsid refers to, so this is not a good solution either.

In the end, thanks to neurodrone, I learned about Prometheus's relabel feature, which solves this problem neatly.

2. relabel configuration

Relabeling is meant for modifying the labels of exported metrics. It can filter metrics, drop unnecessary ones, rename labels, and rewrite label values.
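As a rough illustration of those operations, the sketch below uses metric_relabel_configs, which Prometheus applies to samples after they have been scraped. The job name, the target address, and the go_ metric prefix are assumptions made for illustration only; they are not taken from the setup described in this article.

```yaml
scrape_configs:
  - job_name: 'example'              # hypothetical job name
    static_configs:
    - targets: ['localhost:9128']    # hypothetical exporter address
    metric_relabel_configs:
    # Filtering: drop every series whose metric name starts with go_.
    - source_labels: ["__name__"]
      regex: "go_.*"
      action: drop
    # Rewriting a label value: turn cluster="ceph" into cluster="ceph1".
    - source_labels: ["cluster"]
      regex: "ceph"
      target_label: "cluster"
      replacement: "ceph1"
      action: replace
```

Rules are evaluated in order, so a series removed by the drop rule never reaches the replace rule.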

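Take a concrete case: for all three clusters, the cluster label of ceph_pool_write_total has the value ceph. In the Prometheus configuration, however, each cluster belongs to a different job, so we can add a relabel rule to each job that sets a distinct label value and thereby tells the clusters apart.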

# cluster1's metric
ceph_pool_write_total{cluster="ceph",pool=".rgw.root"} 4

# cluster2's metric
ceph_pool_write_total{cluster="ceph",pool=".rgw.root"} 10

# cluster3's metric
ceph_pool_write_total{cluster="ceph",pool=".rgw.root"} 7

The configuration is as follows. Each job writes its own value (ceph1, ceph2, or ceph3) into a new label named clusters.

scrape_configs:
  - job_name: 'ceph1'
    relabel_configs:
    - source_labels: ["cluster"]
      replacement: "ceph1"
      action: replace
      target_label: "clusters"
    static_configs:
    - targets: ['ceph1:9128']
      labels:
        alias: ceph1

  - job_name: 'ceph2'
    relabel_configs:
    - source_labels: ["cluster"]
      replacement: "ceph2"
      action: replace
      target_label: "clusters"
    static_configs:
    - targets: ['ceph2:9128']
      labels:
        alias: ceph2

  - job_name: 'ceph3'
    relabel_configs:
    - source_labels: ["cluster"]
      replacement: "ceph3"
      action: replace
      target_label: "clusters"
    static_configs:
    - targets: ['ceph3:9128']
      labels:
        alias: ceph3

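After the change the metrics look like this, so data from the different Ceph clusters can be told apart. Because relabel_configs is applied to the target before the scrape, the exporter's own cluster="ceph" label is not rewritten and still appears on each series; it is omitted here, along with target labels such as job, instance, and alias.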

# cluster1's metric
ceph_pool_write_total{clusters="ceph1",pool=".rgw.root"} 4

# cluster2's metric
ceph_pool_write_total{clusters="ceph2",pool=".rgw.root"} 10

# cluster3's metric
ceph_pool_write_total{clusters="ceph3",pool=".rgw.root"} 7
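Because each rule above writes a fixed replacement value regardless of what the source label contains, a similar result could likely be achieved without a relabel rule at all, by attaching the label directly to the target. A minimal sketch for the first job, assuming the same static targets as above (the other two jobs would follow the same pattern):

```yaml
scrape_configs:
  - job_name: 'ceph1'
    static_configs:
    - targets: ['ceph1:9128']
      labels:
        alias: ceph1
        clusters: ceph1   # attached to every series scraped from this target
```

The relabel rule is still the better fit when the new label has to be derived from another label rather than set to a constant.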

3. Grafana dashboard adjustments

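Changing the Prometheus configuration alone is not enough, since the result also has to show up in the UI. The Grafana dashboard needs matching changes. The dashboard used in this article is Ceph - Cluster.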

First, add a clusters variable to the dashboard, which can be done in the UI.
Start by clicking the dashboard's "settings" button (the one with the gear icon).

Add the clusters variable, then save.

The newly added variable now appears on the dashboard.

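Next, the query in each panel has to be changed accordingly so that it filters on the new variable, for example by adding a label matcher such as clusters="$clusters".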

The finished dashboard JSON file is available for download as:
ceph-cluster.json
