Prometheus Syntax - Alibaba Cloud Developer Community

Prometheus Syntax

2022-01-22


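Time series

Prometheus keeps all collected samples as time series in an in-memory database and periodically persists them to disk. A time series is a sequence of values stored in timestamp order, which we call a vector. Each time series is identified by a metric name and a set of labels (labelset). As shown below, a time series can be pictured as a numeric matrix with time on the horizontal axis: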

  ^
  │   . . . . . . . . . . . . . . . . .   . .   node_cpu{cpu="cpu0",mode="idle"}
  │     . . . . . . . . . . . . . . . . . . .   node_cpu{cpu="cpu0",mode="system"}
  │     . . . . . . . . . .   . . . . . . . .   node_load1{}
  │     . . . . . . . . . . . . . . . .   . .  
  v
    <------------------ time ---------------->

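Each point in a time series is called a sample. A sample consists of three parts:

Metric: the metric name plus the labelset that describes the characteristics of the sample;
Timestamp: a timestamp with millisecond precision;
Value: a float64 number representing the value of the sample.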

<--------------- metric ---------------------><-timestamp -><-value->
http_request_total{status="200", method="GET"}@1434417560938 => 94355
http_request_total{status="200", method="GET"}@1434417561287 => 94334

http_request_total{status="404", method="GET"}@1434417560938 => 38473
http_request_total{status="404", method="GET"}@1434417561287 => 38544

http_request_total{status="200", method="POST"}@1434417560938 => 4748
http_request_total{status="200", method="POST"}@1434417561287 => 4785
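The three-part structure of a sample can be modeled directly. The following Python sketch is illustrative only: the `Sample` class and the `parse_sample` helper are hypothetical names, and the parser only understands the simplified `metric{labels}@timestamp => value` notation used in the listing above, not any real Prometheus wire format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    name: str          # metric name
    labels: tuple      # sorted (label, value) pairs, i.e. the labelset
    timestamp_ms: int  # millisecond-precision timestamp
    value: float       # float64 sample value


# Patterns for the illustrative notation only (an assumption of this sketch).
_LINE = re.compile(r'^(\w+)\{(.*)\}@(\d+) => (\S+)$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str) -> Sample:
    """Split one line of the listing into metric, timestamp, and value."""
    m = _LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized sample line: {line!r}")
    name, raw_labels, ts, value = m.groups()
    labels = tuple(sorted(_LABEL.findall(raw_labels)))
    return Sample(name, labels, int(ts), float(value))
```

Sorting the label pairs makes two samples with the same labels compare equal regardless of the order in which the labels were written.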

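Metric types

Prometheus defines four metric types: Counter, Gauge, Histogram, and Summary.

A Counter works like a physical counter: it only increases and never decreases (unless the system is reset). Common metrics such as http_requests_total and node_cpu are Counters.

A Gauge reflects the current state of the system and can go up or down, for example node_memory_MemFree.

Histogram and Summary are mainly used to collect statistics on, and analyze, the distribution of samples.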
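Because a Counter only drops when it is reset, a consumer can treat any decrease as a restart from zero. The sketch below shows that idea in Python; `counter_increase` is a hypothetical helper, and it deliberately ignores the extrapolation that Prometheus applies at window boundaries.

```python
def counter_increase(values):
    """Total increase of a counter, treating every drop as a reset to zero."""
    total = 0.0
    prev = None
    for v in values:
        if prev is not None:
            if v >= prev:
                total += v - prev  # normal monotonic growth
            else:
                total += v         # reset: the counter restarted from 0
        prev = v
    return total
```

For the readings 10, 15, 3, 8 the function counts 5 before the reset, 3 right after it, and 5 more afterwards.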

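PromQL

PromQL is the query language built into Prometheus. It provides rich querying, aggregation, and logical operations over time series data, and it is used throughout day-to-day Prometheus work: data queries, visualization, and alerting. In short, PromQL underpins every Prometheus use case.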

Querying time series

Querying by metric name alone returns every time series of that metric. For example:

http_requests_total is equivalent to http_requests_total{}

This expression returns all time series whose metric name is http_requests_total:

http_requests_total{code="200",handler="alerts",instance="localhost:9090",job="prometheus",method="get"}=(20889@1518096812.326)
http_requests_total{code="200",handler="graph",instance="localhost:9090",job="prometheus",method="get"}=(21287@1518096812.326)

To select only the series with specific label values, add label matchers: http_requests_total{job="apiserver", handler="/api/comments"}

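Matchers

Labels can be matched positively or negatively, by exact value or by regular expression:

= selects labels whose value is exactly equal to the given string
!= selects labels whose value is not equal to the given string
=~ selects labels whose value matches the given regular expression
!~ selects labels whose value does not match the given regular expression (negative match)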
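The four matchers can be sketched as a small Python predicate. `matches` is a hypothetical helper, not part of any Prometheus client. It reflects two PromQL behaviors: regular expressions are fully anchored, and a label that is absent behaves like an empty string. Python's `re` module stands in for the RE2 syntax that Prometheus uses, so unusual patterns may behave differently.

```python
import re


def matches(labels: dict, name: str, op: str, pattern: str) -> bool:
    """Evaluate one label matcher against a series' labels."""
    value = labels.get(name, "")  # a missing label is treated as empty
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        # fullmatch mirrors the anchoring: "api" does not match "apiserver"
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher operator: {op!r}")
```

Because of the anchoring, a prefix match has to be written as `api.*` rather than `api`.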

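Range queries

http_requests_total{}[5m]

This expression returns all samples from the last 5 minutes for each selected time series.

Besides m for minutes, the PromQL range selector supports other time units:

s - seconds
m - minutes
h - hours
d - days
w - weeks
y - years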
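To make the units concrete, the following Python sketch converts a single-unit range such as `5m` into seconds. `duration_seconds` is a hypothetical helper that only handles the units listed above. It assumes the fixed lengths PromQL uses: a day is always 24 hours, a week 7 days, and a year 365 days.

```python
import re

# Fixed unit lengths in seconds (no calendar or leap-year awareness).
_UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}


def duration_seconds(text: str) -> int:
    """Convert a range like '5m' or '1w' into a number of seconds."""
    m = re.fullmatch(r"(\d+)([smhdwy])", text)
    if m is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(m.group(1)) * _UNIT_SECONDS[m.group(2)]
```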

Arithmetic operators:

+ (addition)
- (subtraction)
* (multiplication)
/ (division)
% (modulo)
^ (exponentiation)

Unused memory (MiB): (instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024
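When both operands are vectors, the arithmetic is applied series by series: each element on the left is paired with the element on the right that has the same labels, and elements without a partner are dropped. The Python sketch below imitates this for the unused-memory expression. `unused_mib` and the tuple-keyed dictionaries are assumptions of the sketch, not Prometheus data structures.

```python
def unused_mib(limit_bytes: dict, usage_bytes: dict) -> dict:
    """Element-wise (limit - usage) / 1024 / 1024 for series present on both sides."""
    return {
        labels: (limit_bytes[labels] - usage_bytes[labels]) / 1024 / 1024
        for labels in limit_bytes
        if labels in usage_bytes  # unmatched series produce no result
    }


# Two instances report a limit, but only instance "a" reports usage.
limit = {("instance", "a"): 4 * 1024 * 1024, ("instance", "b"): 8 * 1024 * 1024}
usage = {("instance", "a"): 1 * 1024 * 1024}
print(unused_mib(limit, usage))
```

Only instance "a" appears in the result, because instance "b" has no usage series to pair with.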

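Comparison operators:

== (equal)
!= (not equal)
> (greater than)
< (less than)
>= (greater than or equal to)
<= (less than or equal to)

Keep only the samples with job="apiserver" whose value is greater than 10: http_requests_total{job="apiserver", handler="/api/comments"} > 10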

Logical operators:

and (intersection)
or (union)
unless (complement)

For example: http_requests_total{job="apiserver", handler="/api/comments"}>10 or http_requests_total{job="apiserver", handler="/api/comments"}>16
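The comparison filter and the three logical operators behave like set operations on label sets. The Python sketch below models an instant vector as a dictionary from a label-set key to a value. All function names are hypothetical, and the matching is simplified to exact key equality, without the `on` and `ignoring` modifiers that PromQL offers.

```python
def filter_gt(vector: dict, threshold: float) -> dict:
    """vector > threshold: keep only elements above the threshold."""
    return {k: v for k, v in vector.items() if v > threshold}


def vec_and(left: dict, right: dict) -> dict:
    """left and right: left elements that also exist on the right."""
    return {k: v for k, v in left.items() if k in right}


def vec_or(left: dict, right: dict) -> dict:
    """left or right: all left elements, plus right elements not on the left."""
    result = dict(left)
    for k, v in right.items():
        result.setdefault(k, v)
    return result


def vec_unless(left: dict, right: dict) -> dict:
    """left unless right: left elements that have no match on the right."""
    return {k: v for k, v in left.items() if k not in right}
```

In all three logical operators the values come from the left-hand side whenever both sides contain the same label set.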

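Operator precedence:

100 * (1 - avg (irate(node_cpu{mode='idle'}[5m])) by(job) )

In an expression such as the one above, binary operators are evaluated in the following order, from highest to lowest precedence:

^
*, /, %
+, -
==, !=, <=, <, >=, >
and, unless
or

Operators on the same level are left-associative, except ^, which is right-associative.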

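Aggregation

Generally, when the labels that describe a sample are not unique, a PromQL query returns multiple time series matching those label dimensions. The aggregation operators provided by PromQL process these series and combine them into a new result:

sum (sum)

min (minimum)

max (maximum)

avg (average)

stddev (population standard deviation)

stdvar (population variance)

count (number of series)

count_values (number of series with each distinct value)

bottomk (the k series with the smallest sample values)

topk (the k series with the largest sample values)

quantile (quantile)

# Total number of HTTP requests across the system
sum(http_request_total)

without removes the listed labels from the result and keeps all others. by does the opposite: the result vector keeps only the listed labels and drops the rest.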
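The difference between by and without comes down to how the grouping key is built. The Python sketch below implements both for sum. `sum_by` and `sum_without` are hypothetical helpers, and each series is represented as a `(labels, value)` pair purely for illustration.

```python
from collections import defaultdict


def sum_by(series, keep):
    """sum by (keep...): group on the listed labels only."""
    out = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k in keep))
        out[key] += value
    return dict(out)


def sum_without(series, drop):
    """sum without (drop...): group on every label except the listed ones."""
    out = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        out[key] += value
    return dict(out)
```

Calling `sum_by` with an empty set of labels collapses everything into a single group, which corresponds to a plain `sum(...)` with no clause.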

Top and bottom N

topk and bottomk rank series by sample value and return the n series with the largest or the smallest current values, respectively.

topk(5, http_requests_total)
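As a rough illustration of the selection, the following Python sketch picks the k largest or smallest elements of an instant vector with `heapq`. The functions and the dictionary representation are assumptions of the sketch.

```python
import heapq


def topk(k: int, vector: dict) -> list:
    """The k (series, value) pairs with the largest values, largest first."""
    return heapq.nlargest(k, vector.items(), key=lambda item: item[1])


def bottomk(k: int, vector: dict) -> list:
    """The k (series, value) pairs with the smallest values, smallest first."""
    return heapq.nsmallest(k, vector.items(), key=lambda item: item[1])
```

Unlike most aggregations, the result keeps the original series and their labels; it only reduces how many of them are returned.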

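PromQL built-in functions

increase(v range-vector): the increase of a counter over the range, roughly the last value minus the first value, adjusted for counter resets
rate(v range-vector): the per-second average rate of increase over the given time range
irate(v range-vector): the per-second rate computed from the last two samples in the given time range
abs(): the absolute value of each sample
sqrt(): square root
exp(v instant-vector): the exponential of each sample value N, that is e to the power N; returns +Inf when N is large enough
ln(): natural logarithm
ceil(): rounds up to the nearest integer
floor(): rounds down to the nearest integer
round(v instant-vector, to_nearest=1 scalar): rounds to the nearest integer, or to the nearest multiple of to_nearest
sort(): sorts by sample value in ascending order
sort_desc(): sorts by sample value in descending order
absent(v instant-vector): returns an empty vector if v has any samples; otherwise returns a labeled time series with value 1 (useful for alerting on missing data)
changes(v range-vector): for each series, the number of times its value changed within the range, returned as an instant vector
clamp_max(v instant-vector, max scalar): sample values greater than max are set to max; others are left unchanged
clamp_min(v instant-vector, min scalar): sample values less than min are set to min; others are left unchanged
time(): the number of seconds since 1970-01-01 UTC (a Unix timestamp)
timestamp(v instant-vector): the timestamp of each sample in v, as the number of seconds since 1970-01-01 UTC
minute(v=vector(time()) instant-vector): the minute of the hour for each given UTC time; range 0~59
hour(v=vector(time()) instant-vector): the hour of the day for each given UTC time; range 0~23
month(v=vector(time()) instant-vector): the month of the year for each given UTC time; range 1~12
year(v=vector(time()) instant-vector): the year for each given UTC time
day_of_month(v=vector(time()) instant-vector): the day of the month for each given UTC time; range 1~31
day_of_week(v=vector(time()) instant-vector): the day of the week for each given UTC time; range 0~6, where 0 means Sunday
days_in_month(v=vector(time()) instant-vector): the number of days in the month of each given UTC time; range 28~31
delta(v range-vector): the difference between the first and the last value of each series in the range vector v (generally used with Gauges)
idelta(v range-vector): the difference between the last two samples in the range vector (generally used with Gauges)
deriv(v range-vector): the per-second derivative of each series in the range vector v, computed with simple linear regression (generally used with Gauges)
label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, ...)

For each time series in v, joins the values of the src_label labels using separator and writes the result to a new label dst_label. Any number of src_label labels may be given.

avg_over_time(range-vector): the average of all samples in the range, per series.
min_over_time(range-vector): the minimum of all samples in the range, per series.
max_over_time(range-vector): the maximum of all samples in the range, per series.
sum_over_time(range-vector): the sum of all samples in the range, per series.
count_over_time(range-vector): the number of samples in the range, per series.
quantile_over_time(scalar, range-vector): the φ-quantile (0 ≤ φ ≤ 1) of the sample values in the range, per series.
stddev_over_time(range-vector): the population standard deviation of the sample values in the range, per series.
stdvar_over_time(range-vector): the population variance of the sample values in the range, per series.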
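The `_over_time` family aggregates along the time axis of one series rather than across series. The Python sketch below computes several of them for a single window of `(timestamp, value)` pairs. `over_time` is a hypothetical helper, and it uses the population forms of standard deviation and variance, in line with the descriptions above.

```python
import statistics


def over_time(window):
    """Aggregate the values of one series' window of (timestamp, value) pairs."""
    values = [v for _, v in window]
    return {
        "avg_over_time": statistics.fmean(values),
        "min_over_time": min(values),
        "max_over_time": max(values),
        "sum_over_time": sum(values),
        "count_over_time": len(values),
        "stddev_over_time": statistics.pstdev(values),    # population form
        "stdvar_over_time": statistics.pvariance(values),  # population form
    }
```

A range vector with several series would simply apply the same computation to each series independently, yielding one output sample per series.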

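Difference between rate and irate

rate computes the average rate of increase over the whole range, so it is insensitive to short-lived spikes.

irate computes the rate from the last two samples only, so it reflects the instantaneous rate of increase and is sensitive to spikes.

irate is more responsive than rate, but when analyzing long-term trends or writing alerting rules that responsiveness tends to become noise. For long-term trend analysis and alerting, rate is therefore the recommended function.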
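The contrast is easy to see on a small window. The Python sketch below uses deliberately simplified versions of both functions: `simple_rate` and `simple_irate` are hypothetical names, they assume at least two samples and no counter resets, and they skip the boundary extrapolation that the real rate performs.

```python
def simple_rate(window):
    """Average per-second increase between the first and last sample."""
    (t0, v0), (t1, v1) = window[0], window[-1]
    return (v1 - v0) / (t1 - t0)


def simple_irate(window):
    """Per-second increase between the last two samples only."""
    (t0, v0), (t1, v1) = window[-2], window[-1]
    return (v1 - v0) / (t1 - t0)


# (timestamp in seconds, counter value): steady growth, then a burst at the end.
window = [(0, 0.0), (15, 15.0), (30, 30.0), (45, 45.0), (60, 360.0)]
print(simple_rate(window), simple_irate(window))
```

With this window the averaged rate is 6 per second while the instantaneous rate is 21 per second, which illustrates why irate reacts so strongly to a single burst.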
