Grafana 系列文章(十四):Helm 安装 Loki

本文涉及的产品
可观测监控 Prometheus 版,每月50GB免费额度
可观测可视化 Grafana 版,10个用户账号 1个月
简介: Grafana 系列文章(十四):Helm 安装 Loki

前言

写或者翻译这么多篇 Loki 相关的文章了, 发现还没写怎么安装 😓

现在开始介绍如何使用 Helm 安装 Loki.

前提

有 Helm, 并且添加 Grafana 的官方源:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
BASH

🐾Warning:

网络受限, 需要保证网络通畅.

部署

架构

Promtail(收集) + Loki(存储及处理) + Grafana(展示)

Loki 架构图

Promtail

  1. 启用 Prometheus Operator Service Monitor 做监控
  2. 增加 external_labels - cluster, 以识别是哪个 K8S 集群;
  3. pipeline_stages 改为 cri, 以对 cri 日志做处理 (因为我的集群用的 Container Runtime 是 CRI, 而 Loki Helm 默认配置是 docker)
  4. 增加对 systemd-journal 的日志收集:
promtail:
  config:
    snippets:
      pipelineStages:
        - cri: {}
  extraArgs: 
    - -client.external-labels=cluster=ctyun
  # systemd-journal 额外配置:
  # Add additional scrape config
  extraScrapeConfigs:
    - job_name: journal
      journal:
        path: /var/log/journal
        max_age: 12h
        labels:
          job: systemd-journal
      relabel_configs:
        - source_labels: ['__journal__systemd_unit']
          target_label: 'unit'
        - source_labels: ['__journal__hostname']
          target_label: 'hostname'
  # Mount journal directory into Promtail pods
  extraVolumes:
    - name: journal
      hostPath:
        path: /var/log/journal
  extraVolumeMounts:
    - name: journal
      mountPath: /var/log/journal
      readOnly: true
YAML

Loki

  1. 启用持久化存储
  2. 启用 Prometheus Operator Service Monitor 做监控
  1. 并配置 Loki 相关 Prometheus Rule 做告警
  1. 因为个人集群日志量较小, 适当调大 ingester 相关配置

Grafana

  1. 启用持久化存储
  2. 启用 Prometheus Operator Service Monitor 做监控
  3. sidecar 都配置上, 方便动态更新 dashboards/datasources/plugins/notifiers;

Helm 安装

通过如下命令安装:

helm upgrade --install loki --namespace=loki --create-namespace grafana/loki-stack -f values.yaml
BASH

自定义 values.yaml 如下:

loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: local-path
    size: 20Gi
  serviceScheme: https
  user: admin
  password: changit!
  config:
    ingester:
      chunk_idle_period: 1h
      max_chunk_age: 4h
    compactor:
      retention_enabled: true
  serviceMonitor:
    enabled: true
    prometheusRule:
      enabled: true
      rules:
        #  Some examples from https://awesome-prometheus-alerts.grep.to/rules.html#loki
        - alert: LokiProcessTooManyRestarts
          expr: changes(process_start_time_seconds{job=~"loki"}[15m]) > 2
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: Loki process too many restarts (instance {{ $labels.instance }})
            description: "A loki process had too many restarts (target {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestErrors
          expr: 100 * sum(rate(loki_request_duration_seconds_count{status_code=~"5.."}[1m])) by (namespace, job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route) > 10
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: Loki request errors (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} and {{ $labels.route }} are experiencing errors\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestPanic
          expr: sum(increase(loki_panic_total[10m])) by (namespace, job) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Loki request panic (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} is experiencing {{ printf \"%.2f\" $value }}% increase of panics\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestLatency
          expr: (histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route!~"(?i).*tail.*"}[5m])) by (le)))  > 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Loki request latency (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}s 99th percentile latency\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
promtail:
  enabled: true
  config:
    snippets:
      pipelineStages:
        - cri: {}  
  extraArgs:
    - -client.external-labels=cluster=ctyun        
  serviceMonitor:
    # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
    enabled: true
  # systemd-journal 额外配置:
  # Add additional scrape config
  extraScrapeConfigs:
    - job_name: journal
      journal:
        path: /var/log/journal
        max_age: 12h
        labels:
          job: systemd-journal
      relabel_configs:
        - source_labels: ['__journal__systemd_unit']
          target_label: 'unit'
        - source_labels: ['__journal__hostname']
          target_label: 'hostname'
  # Mount journal directory into Promtail pods
  extraVolumes:
    - name: journal
      hostPath:
        path: /var/log/journal
  extraVolumeMounts:
    - name: journal
      mountPath: /var/log/journal
      readOnly: true
fluent-bit:
  enabled: false
grafana:
  enabled: true
  adminUser: caseycui
  adminPassword: changit!
  ## Sidecars that collect the configmaps with specified label and stores the included files them into the respective folders
  ## Requires at least Grafana 5 to work and can't be used together with parameters dashboardProviders, datasources and dashboards
  sidecar:
    image:
      repository: quay.io/kiwigrid/k8s-sidecar
      tag: 1.15.6
      sha: ''
    dashboards:
      enabled: true
      SCProvider: true
      label: grafana_dashboard
    datasources:
      enabled: true
      # label that the configmaps with datasources are marked with
      label: grafana_datasource
    plugins:
      enabled: true
      # label that the configmaps with plugins are marked with
      label: grafana_plugin
    notifiers:
      enabled: true
      # label that the configmaps with notifiers are marked with
      label: grafana_notifier
  image:
    tag: 8.3.5
  persistence:
    enabled: true
    size: 2Gi
    storageClassName: local-path
  serviceMonitor:
    enabled: true
  imageRenderer:
    enabled: disable
filebeat:
  enabled: false
logstash:
  enabled: false
YAML

安装后的资源拓扑如下:

Loki K8S 资源拓扑

Day 2 配置 (按需)

Grafana 增加 Dashboards

在同一个 NS 下, 创建如下 ConfigMap: (只要打上 grafana_dashboard 这个 label 就会被 Grafana 的 sidecar 自动导入 )

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-grafana-dashboard
  labels:
     grafana_dashboard: "1"
data:
  k8s-dashboard.json: |-
  [...]
YAML

Grafana 增加 DataSource

在同一个 NS 下, 创建如下 ConfigMap: (只要打上 grafana_datasource 这个 label 就会被 Grafana 的 sidecar 自动导入 )

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-loki-stack
  labels:
    grafana_datasource: '1'
data:
  loki-stack-datasource.yaml: |-
    apiVersion: 1
    datasources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki:3100
      version: 1
YAML

Traefik 配置 Grafana IngressRoute

因为我是用的 Traefik 2, 通过 CRD IngressRoute 配置 Ingress, 配置如下:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - kind: Rule
      match: Host(`grafana.ewhisper.cn`)
      middlewares:
        - name: hsts-header
          namespace: kube-system
        - name: redirectshttps
          namespace: kube-system
      services:
        - name: loki-grafana
          namespace: monitoring
          port: 80
  tls: {}
YAML

最终效果

如下:

Grafana Explore Logs

🎉🎉🎉

📚️参考文档

相关实践学习
通过可观测可视化Grafana版进行数据可视化展示与分析
使用可观测可视化Grafana版进行数据可视化展示与分析。
相关文章
|
7月前
|
存储 Prometheus Cloud Native
「译文」Grafana Loki 简要指南:关于标签您需要了解的一切
「译文」Grafana Loki 简要指南:关于标签您需要了解的一切
|
4月前
|
前端开发 测试技术 对象存储
Grafana Loki查询加速:如何在不添加资源的前提下提升查询速度
Grafana Loki查询加速:如何在不添加资源的前提下提升查询速度
160 2
|
4月前
|
存储 监控 Serverless
阿里泛日志设计与实践问题之Grafana Loki在日志查询方案中存在哪些设计限制,如何解决
阿里泛日志设计与实践问题之Grafana Loki在日志查询方案中存在哪些设计限制,如何解决
|
1月前
|
Prometheus 监控 Cloud Native
基于Docker安装Grafana和Prometheus
Grafana 是一款用 Go 语言开发的开源数据可视化工具,支持数据监控和统计,并具备告警功能。通过 Docker 部署 Grafana 和 Prometheus,可实现系统数据的采集、展示和告警。默认登录用户名和密码均为 admin。配置 Prometheus 数据源后,可导入主机监控模板(ID 8919)进行数据展示。
105 2
|
4月前
|
Prometheus 监控 Cloud Native
prometheus学习笔记之Grafana安装与配置
prometheus学习笔记之Grafana安装与配置
|
3月前
|
运维 Kubernetes 监控
Loki+Promtail+Grafana监控K8s日志
综上,Loki+Promtail+Grafana 监控组合对于在 K8s 环境中优化日志管理至关重要,它不仅提供了强大且易于扩展的日志收集与汇总工具,还有可视化这些日志的能力。通过有效地使用这套工具,可以显著地提高对应用的运维监控能力和故障诊断效率。
408 0
|
7月前
|
存储 监控 数据可视化
Nginx+Promtail+Loki+Grafana Nginx日志展示
通过这些步骤,你可以将Nginx的日志收集、存储、查询和可视化整合在一起。这样,你就可以在Grafana中轻松地创建和展示Nginx日志的图表和面板。
311 3
|
7月前
阿里云Grafana服务支持一键安装Grafana插件
【2月更文挑战第12天】阿里云Grafana服务支持一键安装Grafana插件
108 4
|
7月前
|
存储 Prometheus Cloud Native
「译文」开源日志监控:Grafana Loki 简要指南
「译文」开源日志监控:Grafana Loki 简要指南
|
7月前
|
JSON Prometheus Cloud Native
Grafana 系列 -Loki- 基于日志实现告警
Grafana 系列 -Loki- 基于日志实现告警