Prometheus 1.7: Commonly Used Startup Flags

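These flags were obtained by running ./prometheus -h; some of the explanations draw on material already available online. Some flags have been deprecated, so they are left out of the summary below.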



Commonly used startup flags in Prometheus 1.7:

Logging:

-log.level  Valid values: [debug, info, warn, error, fatal].  Example: -log.level "info"

-log.format  Sends log output to syslog or to the console.  Example: -log.format "logger:syslog?appname=prom&local=7"



Query:

-query.max-concurrency 20   Maximum number of queries executed concurrently.

-query.staleness-delta 5m0s   Staleness delta allowance during expression evaluations.

-query.timeout 2m0s   Query timeout (2 minutes). A query that runs longer than this is aborted.

  


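Storage:

-storage.local.checkpoint-dirty-series-limit 5000   When roughly this many time series are in a state that would need recovery after a crash, a checkpoint is triggered even if the checkpoint interval has not yet passed. The default is meant to keep crash recovery below about 1 minute on spinning disks; on SSDs the value can be raised to avoid overly frequent checkpoints.

-storage.local.checkpoint-interval 5m0s   A checkpoint runs every 5 minutes, persisting in-memory metrics and chunks that have not yet been written to series files.

-storage.local.chunk-encoding-version 1     Encoding used for newly created chunks; the default is 1 (double-delta encoding).

-storage.local.engine "persisted"    Enables full local storage with on-disk persistence.

-storage.local.index-cache-size.label-name-to-label-values 10485760     Size in bytes of the index cache that maps label names to label values; the default is 10 MiB.

-storage.local.path "/bdata/data/nowdb2"

-storage.local.retention 8760h0m0s    Keeps one year of data.

-storage.local.series-file-shrink-ratio 0.3    A series file is rewritten only once at least 30% of its chunks have been removed.

-storage.local.num-fingerprint-mutexes 4096   While the Prometheus server is checkpointing or handling an expensive query, scraping can stall briefly because the mutexes allocated for time series may be insufficient. This flag raises the number of pre-allocated mutexes; values in the tens of thousands are sometimes used.

-storage.local.series-sync-strategy "adaptive"

-storage.local.target-heap-size 2147483648     # Heap size that the storage layer tries to stay close to. The default is 2 GiB, chosen so that Prometheus normally stays within about 3 GiB of physical memory. It is not a hard limit.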
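The raw numbers in the storage flags above are easier to review when they are derived from human-friendly units. The following is a minimal sketch; the helper names `days_to_hours` and `mib_to_bytes` are hypothetical and not part of Prometheus.

```shell
#!/bin/sh
# Hypothetical helpers that turn human-friendly units into the raw values
# used by the 1.x storage flags.

# days_to_hours N -> a Go-style duration string expressed in hours
days_to_hours() {
  echo "$(( $1 * 24 ))h"
}

# mib_to_bytes N -> N MiB expressed in bytes
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

retention=$(days_to_hours 365)   # 8760h, i.e. one year
cache_bytes=$(mib_to_bytes 10)   # 10485760, i.e. 10 MiB

echo "-storage.local.retention $retention"
echo "-storage.local.index-cache-size.label-name-to-label-values $cache_bytes"
```

The duration `8760h` denotes the same length of time as `8760h0m0s`; the longer form is simply how the help output prints it.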



Web configuration:

-web.listen-address ":9090"

-web.max-connections 512

-web.read-timeout 30s



Startup command currently in use:

 nohup ./prometheus -log.level "info" -log.format "logger:syslog?appname=prom&local=7" -storage.local.checkpoint-dirty-series-limit 5000 -storage.local.checkpoint-interval 5m0s -storage.local.chunk-encoding-version 1 -storage.local.engine "persisted" -storage.local.index-cache-size.label-name-to-label-values 10485760 -storage.local.path "/bdata/data/nowdb2" -storage.local.retention 8760h0m0s -storage.local.series-file-shrink-ratio 0.3 -storage.local.series-sync-strategy "adaptive" -storage.local.target-heap-size 2147483648 & 
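A long one-line command is hard to review, and a stray token between flags is easy to miss. One possible way to keep it readable is to build the argument list once and print it before starting the server. This is only a sketch: the variables `PROM_BIN` and `DRY_RUN` are hypothetical names introduced here, and the flag values are the ones from the command above.

```shell
#!/bin/sh
# Sketch: build the flag list once, then either print it (dry run) or start
# the server. PROM_BIN and DRY_RUN are hypothetical names used only here.
PROM_BIN="${PROM_BIN:-./prometheus}"
DRY_RUN="${DRY_RUN:-1}"

# One flag and its value per line.
set -- \
  -log.level info \
  -log.format "logger:syslog?appname=prom&local=7" \
  -storage.local.checkpoint-dirty-series-limit 5000 \
  -storage.local.checkpoint-interval 5m0s \
  -storage.local.chunk-encoding-version 1 \
  -storage.local.engine persisted \
  -storage.local.index-cache-size.label-name-to-label-values 10485760 \
  -storage.local.path /bdata/data/nowdb2 \
  -storage.local.retention 8760h0m0s \
  -storage.local.series-file-shrink-ratio 0.3 \
  -storage.local.series-sync-strategy adaptive \
  -storage.local.target-heap-size 2147483648

if [ "$DRY_RUN" = "1" ]; then
  # Print the binary and every argument on its own line for review.
  printf '%s\n' "$PROM_BIN" "$@"
else
  nohup "$PROM_BIN" "$@" >/dev/null 2>&1 &
fi
```

With the default `DRY_RUN=1` nothing is started; setting `DRY_RUN=0` launches the server in the background with exactly the printed arguments.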



Reload the configuration file:

  kill -SIGHUP $(pidof prometheus)



Shut down the process:

  kill -SIGTERM $(pidof prometheus)
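Both commands above expand `$(pidof prometheus)` unguarded, so when no process is running `kill` is invoked without a PID and fails with a usage error. A guarded variant might look like the sketch below; the function name `signal_prometheus` is hypothetical, and the sketch assumes `pidof` is installed.

```shell
#!/bin/sh
# signal_prometheus SIGNAL [NAME]
# Sends SIGNAL (for example HUP or TERM) to every process called NAME, but
# only if pidof actually finds one. The function name is hypothetical.
signal_prometheus() {
  sig="$1"
  name="${2:-prometheus}"
  pids=$(pidof "$name" 2>/dev/null) || pids=""
  if [ -z "$pids" ]; then
    echo "no running process named '$name'; nothing to signal" >&2
    return 1
  fi
  # $pids is intentionally unquoted: pidof may return several PIDs.
  kill -s "$sig" $pids
}

# Usage:
#   signal_prometheus HUP    # reload the configuration file
#   signal_prometheus TERM   # shut down the process
```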





######################################################################################################

Appendix: output of ./prometheus -h:


usage: prometheus [<args>]


   -version false

      Print version information.

  

   -config.file "prometheus.yml"

      Prometheus configuration file name.

  

 == ALERTMANAGER ==

  

   -alertmanager.notification-queue-capacity 10000

      The capacity of the queue for pending alert manager notifications.

  

   -alertmanager.timeout 10s

      Alert manager HTTP API timeout.

  

   -alertmanager.url 

      Comma-separated list of Alertmanager URLs to send notifications to.

  

 == LOG ==

  

   -log.format "\"logger:stderr\""

      Set the log target and format. Example: 

      "logger:syslog?appname=bob&local=7" or "logger:stdout?json=true"

  

   -log.level "\"info\""

      Only log messages with the given severity or above. Valid levels: 

      [debug, info, warn, error, fatal]

  

 == QUERY ==

  

   -query.max-concurrency 20    Maximum number of concurrent queries.

      Maximum number of queries executed concurrently.

  

   -query.staleness-delta 5m0s

      Staleness delta allowance during expression evaluations.

  

   -query.timeout 2m0s   Query timeout (2 minutes); a query that runs longer is aborted.

      Maximum time a query may take before being aborted.

  

 == STORAGE ==

  

   -storage.local.checkpoint-dirty-series-limit 5000   Dirty-series threshold that forces an early checkpoint and so bounds crash-recovery time; on SSDs it can be raised.

      If approx. that many time series are in a state that would require 

      a recovery operation after a crash, a checkpoint is triggered, even if 

      the checkpoint interval hasn't passed yet. A recovery operation requires 

      a disk seek. The default limit intends to keep the recovery time below 

      1min even on spinning disks. With SSD, recovery is much faster, so you 

      might want to increase this value in that case to avoid overly frequent 

      checkpoints. Also note that a checkpoint is never triggered before at 

      least as much time has passed as the last checkpoint took.

  

   -storage.local.checkpoint-interval 5m0s   A checkpoint runs every 5 minutes, persisting in-memory metrics and chunks to disk.

      The time to wait between checkpoints of in-memory metrics and 

      chunks not yet persisted to series files. Note that a checkpoint is never 

      triggered before at least as much time has passed as the last checkpoint 

      took.

  

   -storage.local.chunk-encoding-version 1  Chunk encoding format; the default is 1.

      Which chunk encoding version to use for newly created chunks. 

      Currently supported is 0 (delta encoding), 1 (double-delta encoding), and 

      2 (double-delta encoding with variable bit-width).

  

   -storage.local.dirty=false   Whether to force crash recovery at startup. The default is -storage.local.dirty=false.

      If you suspect that a problem is caused by corruption in the database, start with -storage.local.dirty=true to force crash recovery.

      If set, the local storage layer will perform crash recovery even if 

      the last shutdown appears to be clean.


   -storage.local.engine "persisted"

      Local storage engine. Supported values are: 'persisted' (full local 

      storage with on-disk persistence) and 'none' (no local storage).


  

   -storage.local.index-cache-size.fingerprint-to-metric 10485760

      The size in bytes for the fingerprint to metric index cache.

  

   -storage.local.index-cache-size.fingerprint-to-timerange 5242880

      The size in bytes for the metric time range index cache.


What the two flags above are for: increase their size if you have a large number of archived time series, i.e. series that have not received samples in a while but are still not old enough to be purged completely.



   -storage.local.index-cache-size.label-name-to-label-values 10485760     Size of the index cache that maps label names to label values; the default is 10 MiB.

      The size in bytes for the label name to label values index cache.

  

   -storage.local.index-cache-size.label-pair-to-fingerprints 20971520

      The size in bytes for the label pair to fingerprints index cache. Increase the size if a large number of time series share the same label pair or name.

  

   -storage.local.max-chunks-to-persist 0   Deprecated flag.

      Deprecated. This flag has no effect anymore.

  

   -storage.local.memory-chunks 0  Deprecated flag. Formerly set the maximum number of chunks Prometheus kept in memory.

      Deprecated. If set, -storage.local.target-heap-size will be set to 

      this value times 3072.
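To make the conversion stated in the help text concrete, the arithmetic can be checked in the shell. The chunk count below is a hypothetical old setting chosen only for illustration.

```shell
#!/bin/sh
# Equivalent -storage.local.target-heap-size for a deprecated
# -storage.local.memory-chunks value, using the factor 3072 from the help text.
memory_chunks=1048576                          # hypothetical old setting
target_heap_size=$(( memory_chunks * 3072 ))   # 3221225472 bytes = 3 GiB

echo "-storage.local.target-heap-size $target_heap_size"
```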

  

   -storage.local.num-fingerprint-mutexes 4096

      The number of mutexes used for fingerprint locking.

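While the Prometheus server is checkpointing or handling an expensive query, scraping can stall briefly because the mutexes allocated for time series may be insufficient. This flag raises the number of pre-allocated mutexes; values in the tens of thousands are sometimes used.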

   -storage.local.path "data"

      Base path for metrics storage.

  

   -storage.local.pedantic-checks false   Default is false. If set to true, crash recovery checks every series file.

      If set, a crash recovery will perform checks on each series file. 

      This might take a very long time.

  

   -storage.local.retention 360h0m0s   How long historical data is kept; the default is 15 days.

      How long to retain samples in the local storage.


  

   -storage.local.series-file-shrink-ratio 0.1

      A series file is only truncated (to delete samples that have 

      exceeded the retention period) if it shrinks by at least the provided 

      ratio. This saves I/O operations while causing only a limited storage 

      space overhead. If 0 or smaller, truncation will be performed even for a 

      single dropped chunk, while 1 or larger will effectively prevent any 

      truncation.

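Controls when series files are rewritten. By default a rewrite happens once 10% of the chunks have been removed. If disk space is plentiful and you want fewer rewrites, raise the value, for example to 0.3, so that a rewrite is triggered only once 30% of the chunks have been removed.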
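The threshold rule can be illustrated with a small calculation. This is a simplified model of the behaviour described in the help text, not the actual Prometheus implementation; the function name `should_truncate` and the chunk counts are hypothetical.

```shell
#!/bin/sh
# should_truncate TOTAL_CHUNKS DROPPED_CHUNKS RATIO
# Succeeds (exit status 0) when the dropped fraction reaches RATIO.
# Simplified model only; the name and arguments are hypothetical.
should_truncate() {
  awk -v total="$1" -v dropped="$2" -v ratio="$3" \
    'BEGIN { exit !(total > 0 && dropped > 0 && dropped / total >= ratio) }'
}

# A series file with 100 chunks, 20 of which are past the retention period.
for ratio in 0.1 0.3; do
  if should_truncate 100 20 "$ratio"; then
    echo "ratio $ratio: rewrite the file"
  else
    echo "ratio $ratio: leave the file as it is"
  fi
done
```

With 20% of the chunks dropped, the default ratio of 0.1 leads to a rewrite, while 0.3 defers it until more chunks have expired.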


   -storage.local.series-sync-strategy "adaptive"

      When to sync series files after modification. Possible values: 

      'never', 'always', 'adaptive'. Sync'ing slows down storage performance 

      but reduces the risk of data loss in case of an OS crash. With the 

      'adaptive' strategy, series files are sync'd for as long as the storage 

      is not too much behind on chunk persistence.

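Controls when data is synced to disk after it has been written. Possible values are 'never', 'always', and 'adaptive'. Syncing reduces the risk of data loss if the operating system crashes, but it slows down writes.

The default is 'adaptive': series files are synced only as long as the storage is not too far behind on chunk persistence. When it falls behind, syncing is skipped and the data reaches disk through the operating system's page cache.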



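   -storage.local.target-heap-size 2147483648     # Heap size that the storage layer tries to stay close to. The default is 2 GiB, chosen so that Prometheus normally stays within about 3 GiB of physical memory. It is not a hard limit.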

      The metrics storage attempts to limit its own memory usage such 

      that the total heap size approaches this value. Note that this is not a 

      hard limit. Actual heap size might be temporarily or permanently higher 

      for a variety of reasons. The default value is a relatively safe setting 

      to not use more than 3 GiB physical memory.
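The help text pairs a 2 GiB heap target with roughly 3 GiB of physical memory, a ratio of two thirds. Applying the same ratio to other memory budgets is an assumption made here for illustration, not something the help output states. A minimal sketch, with the hypothetical helper `heap_for_budget`:

```shell
#!/bin/sh
# heap_for_budget BYTES -> two thirds of BYTES, rounded down.
# Hypothetical helper; the 2:3 ratio is extrapolated from the default.
heap_for_budget() {
  echo $(( $1 * 2 / 3 ))
}

gib=$(( 1024 * 1024 * 1024 ))
default_heap=$(heap_for_budget $(( 3 * gib )))   # reproduces the 2 GiB default
bigger_heap=$(heap_for_budget $(( 8 * gib )))    # hypothetical 8 GiB budget

echo "-storage.local.target-heap-size $default_heap"
echo "-storage.local.target-heap-size $bigger_heap"
```

Because the flag is not a hard limit, actual memory use should still be monitored after changing it.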

  

   -storage.remote.graphite-address 

      WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB, 

      Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote 

      write feature for building remote storage integrations. See 

      https://prometheus.io/docs/operating/configuration/#<remote_write>

  

   -storage.remote.graphite-prefix 

      WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB, 

      Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote 

      write feature for building remote storage integrations. See 

      https://prometheus.io/docs/operating/configuration/#<remote_write>

  

   -storage.remote.graphite-transport 

      WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB, 

      Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote 

      write feature for building remote storage integrations. See 

      https://prometheus.io/docs/operating/configuration/#<remote_write>

  

   -storage.remote.influxdb-url 

      WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB, 

      Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote 

      write feature for building remote storage integrations. See 

      https://prometheus.io/docs/operating/configuration/#<remote_write>

  

   -storage.remote.influxdb.database 

      WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB, 

      Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote 

      write feature for building remote storage integrations. See 

      https://prometheus.io/docs/operating/configuration/#<remote_write>

  

   -storage.remote.influxdb.retention-policy 

      WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB, 

      Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote 

      write feature for building remote storage integrations. See 

      https://prometheus.io/docs/operating/configuration/#<remote_write>

  

   -storage.remote.influxdb.username 

      WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB, 

      Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote 

      write feature for building remote storage integrations. See 

      https://prometheus.io/docs/operating/configuration/#<remote_write>

  

   -storage.remote.opentsdb-url 

      WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB, 

      Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote 

      write feature for building remote storage integrations. See 

      https://prometheus.io/docs/operating/configuration/#<remote_write>

  

   -storage.remote.timeout 

      WARNING: THIS FLAG IS UNUSED! Built-in support for InfluxDB, 

      Graphite, and OpenTSDB has been removed. Use Prometheus's generic remote 

      write feature for building remote storage integrations. See 

      https://prometheus.io/docs/operating/configuration/#<remote_write>

  

 == WEB ==

  

   -web.console.libraries "console_libraries"

      Path to the console library directory.

  

   -web.console.templates "consoles"

      Path to the console template directory, available at /consoles.

  

   -web.enable-remote-shutdown false

      Enable remote service shutdown.

  

   -web.external-url 

      The URL under which Prometheus is externally reachable (for 

      example, if Prometheus is served via a reverse proxy). Used for 

      generating relative and absolute links back to Prometheus itself. If the 

      URL has a path portion, it will be used to prefix all HTTP endpoints 

      served by Prometheus. If omitted, relevant URL components will be derived 

      automatically.

  

   -web.listen-address ":9090"

      Address to listen on for the web interface, API, and telemetry.

  

   -web.max-connections 512

      Maximum number of simultaneous connections.

  

   -web.read-timeout 30s

      Maximum duration before timing out read of the request, and closing 

      idle connections.

  

   -web.route-prefix 

      Prefix for the internal routes of web endpoints. Defaults to path 

      of -web.external-url.

  

   -web.telemetry-path "/metrics"

      Path under which to expose metrics.

  

   -web.user-assets 

      Path to static asset directory, available at /user.











This article is reposted from lirulei90's 51CTO blog. Original link: http://blog.51cto.com/lee90/1953896. To republish it, please contact the original author.