"Apache Dubbo Microservice Development: From Beginner to Mastery", Visually Monitoring Service Status, II. Microservice Cluster Monitoring (2) https://developer.aliyun.com/article/1224286
2) Local aggregation
Local aggregation is the process of computing percentile metrics from simple base metrics.
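As a rough illustration of what deriving a percentile metric from simple samples involves, the following Java sketch computes TP90/TP99-style values from a batch of response times using the nearest-rank method. The class `LocalPercentiles` and its method are hypothetical names introduced for this example and are not Dubbo APIs; Dubbo's own aggregation may use a different algorithm.

```java
import java.util.Arrays;

/** Hypothetical helper (not a Dubbo API): derives percentile values from raw response times. */
final class LocalPercentiles {
    private LocalPercentiles() {}

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // keep the caller's array untouched
        Arrays.sort(sorted);
        // 1-based rank; multiply before dividing to limit floating-point error
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

For example, `LocalPercentiles.percentile(rts, 99)` returns the response time that 99% of the recorded calls did not exceed.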
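a) Parameter design
When metrics are collected, only the basic metrics are collected by default. Single-node aggregated metrics are computed in a separate thread, and only after service flexibility or local aggregation has been enabled. If service flexibility is enabled, local aggregation is enabled by default.
How to enable local aggregation
Metric aggregation parameters
b) Specific metrics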
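Dubbo's metrics module helps users observe, from the outside, the internal state of services in a running system. Drawing on theories such as the "Four Golden Signals", the RED method, and the USE method, and combining them with real enterprise scenarios, Dubbo collects a rich set of key metrics across several dimensions. Watching these core metrics is essential for keeping a service available.
Dubbo's key metrics cover latency, traffic, errors, and saturation. To better monitor how services are running, Dubbo also monitors the state of its core components, such as Dubbo application information, thread pool information, and metrics on interactions with the three centers (registry, configuration center, and metadata center).
Dubbo mainly includes the following monitoring metrics: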
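| | Infrastructure | Business monitoring |
|---|---|---|
| Latency | I/O wait; network latency | Average latency of interfaces and services, TP90, TP99, TP999, etc. |
| Traffic | Network and disk I/O | Service-level QPS |
| Errors | Host crashes; disk failures (bad disks or file system errors); processes or ports going down; network packet loss | Error logs; trends of business status codes and error codes |
| Saturation | System resource utilization: CPU, memory, disk, network, etc.; saturation: number of waiting threads, queue backlog length | Mainly JVM, thread pools, etc. |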
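• QPS: dynamic QPS obtained from a sliding window
• RT: dynamic response time obtained from a sliding window
• Failed requests: number of failed requests in the most recent period, obtained from a sliding window
• Successful requests: number of successful requests in the most recent period, obtained from a sliding window
• Requests in progress: counted simply by adding a Filter before and after the invocation
• These metrics depend on a sliding window and are collected separately by AggregateMetricsCollector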
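The sliding-window and Filter-style counting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of AggregateMetricsCollector: `SlidingWindowCounter` and `InFlightCounter` are hypothetical names, the window is split into fixed-size buckets, and the caller passes the current time explicitly (as non-negative milliseconds) so that the behavior is easy to verify.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical sketch: counts events over the most recent window using a ring of buckets. */
final class SlidingWindowCounter {
    private final long[] counts;      // events per bucket
    private final long[] epochs;      // which bucket interval each slot currently holds
    private final long bucketMillis;  // length of one bucket

    SlidingWindowCounter(int bucketNum, long windowMillis) {
        if (bucketNum <= 0 || windowMillis < bucketNum) {
            throw new IllegalArgumentException("need bucketNum > 0 and windowMillis >= bucketNum");
        }
        this.counts = new long[bucketNum];
        this.epochs = new long[bucketNum];
        Arrays.fill(epochs, -1L);     // -1 marks a slot that has never been used
        this.bucketMillis = windowMillis / bucketNum;
    }

    /** Records one event at the given time. */
    synchronized void record(long nowMillis) {
        long epoch = nowMillis / bucketMillis;
        int slot = (int) (epoch % counts.length);
        if (epochs[slot] != epoch) {  // slot still holds an expired interval: reuse it
            epochs[slot] = epoch;
            counts[slot] = 0;
        }
        counts[slot]++;
    }

    /** Sums the events in buckets that still fall inside the window ending at nowMillis. */
    synchronized long sum(long nowMillis) {
        long newest = nowMillis / bucketMillis;
        long oldest = newest - counts.length + 1;
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            if (epochs[i] >= oldest && epochs[i] <= newest) {
                total += counts[i];
            }
        }
        return total;
    }

    /** Events per second averaged over the whole window. */
    double qps(long nowMillis) {
        return sum(nowMillis) * 1000.0 / (bucketMillis * counts.length);
    }
}

/** Hypothetical sketch of "count before and after the call", as a Filter would do. */
final class InFlightCounter {
    private final AtomicLong processing = new AtomicLong();

    <T> T around(Supplier<T> invocation) {
        processing.incrementAndGet();      // before the call
        try {
            return invocation.get();
        } finally {
            processing.decrementAndGet();  // after the call, even if it throws
        }
    }

    long current() {
        return processing.get();
    }
}
```

Keeping one counter per outcome (total, succeeded, failed) gives the windowed request counts, and a second window that sums response times instead of counting events gives a windowed average RT.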
The metrics exported to Prometheus look like the following sample:

    # HELP jvm_gc_live_data_size_bytes Size of long-lived heap memory pool after reclamation
    # TYPE jvm_gc_live_data_size_bytes gauge
    jvm_gc_live_data_size_bytes 1.6086528E7
    # HELP requests_succeed_aggregate Aggregated Succeed Requests
    # TYPE requests_succeed_aggregate gauge
    requests_succeed_aggregate{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 39.0
    # HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
    # TYPE jvm_buffer_memory_used_bytes gauge
    jvm_buffer_memory_used_bytes{id="direct",} 1.679975E7
    jvm_buffer_memory_used_bytes{id="mapped",} 0.0
    # HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the (young) heap memory pool after one GC to before the next
    # TYPE jvm_gc_memory_allocated_bytes_total counter
    jvm_gc_memory_allocated_bytes_total 2.9884416E9
    # HELP requests_total_aggregate Aggregated Total Requests
    # TYPE requests_total_aggregate gauge
    requests_total_aggregate{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 39.0
    # HELP system_load_average_1m The sum of the number of runnable entities queued to available processors and the number of runnable entities running on the available processors averaged over a period of time
    # TYPE system_load_average_1m gauge
    system_load_average_1m 0.0
    # HELP system_cpu_usage The "recent cpu usage" for the whole system
    # TYPE system_cpu_usage gauge
    system_cpu_usage 0.015802269043760128
    # HELP jvm_threads_peak_threads The peak live thread count since the Java virtual machine started or peak was reset
    # TYPE jvm_threads_peak_threads gauge
    jvm_threads_peak_threads 40.0
    # HELP requests_processing Processing Requests
    # TYPE requests_processing gauge
    requests_processing{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.0
    # HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
    # TYPE jvm_memory_max_bytes gauge
    jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.22912768E8
    jvm_memory_max_bytes{area="heap",id="G1 Survivor Space",} -1.0
    jvm_memory_max_bytes{area="heap",id="G1 Old Gen",} 9.52107008E8
    jvm_memory_max_bytes{area="nonheap",id="Metaspace",} -1.0
    jvm_memory_max_bytes{area="heap",id="G1 Eden Space",} -1.0
    jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 5828608.0
    jvm_memory_max_bytes{area="nonheap",id="Compressed Class Space",} 1.073741824E9
    jvm_memory_max_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 1.22916864E8
    # HELP jvm_threads_states_threads The current number of threads having BLOCKED state
    # TYPE jvm_threads_states_threads gauge
    jvm_threads_states_threads{state="blocked",} 0.0
    jvm_threads_states_threads{state="runnable",} 10.0
    jvm_threads_states_threads{state="waiting",} 16.0
    jvm_threads_states_threads{state="timed-waiting",} 13.0
    jvm_threads_states_threads{state="new",} 0.0
    jvm_threads_states_threads{state="terminated",} 0.0
    # HELP jvm_buffer_total_capacity_bytes An estimate of the total capacity of the buffers in this pool
    # TYPE jvm_buffer_total_capacity_bytes gauge
    jvm_buffer_total_capacity_bytes{id="direct",} 1.6799749E7
    jvm_buffer_total_capacity_bytes{id="mapped",} 0.0
    # HELP rt_p99 Response Time P99
    # TYPE rt_p99 gauge
    rt_p99{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 1.0
    # HELP jvm_memory_used_bytes The amount of used memory
    # TYPE jvm_memory_used_bytes gauge
    jvm_memory_used_bytes{area="heap",id="G1 Survivor Space",} 1048576.0
    jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.462464E7
    jvm_memory_used_bytes{area="heap",id="G1 Old Gen",} 1.6098728E7
    jvm_memory_used_bytes{area="nonheap",id="Metaspace",} 4.0126952E7
    jvm_memory_used_bytes{area="heap",id="G1 Eden Space",} 8.2837504E7
    jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 1372032.0
    jvm_memory_used_bytes{area="nonheap",id="Compressed Class Space",} 4519248.0
    jvm_memory_used_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 5697408.0
    # HELP qps Query Per Seconds
    # TYPE qps gauge
    qps{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.3333333333333333
    # HELP rt_min Min Response Time
    # TYPE rt_min gauge
    rt_min{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.0
    # HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
    # TYPE jvm_buffer_count_buffers gauge
    jvm_buffer_count_buffers{id="mapped",} 0.0
    jvm_buffer_count_buffers{id="direct",} 10.0
    # HELP system_cpu_count The number of processors available to the Java virtual machine
    # TYPE system_cpu_count gauge
    system_cpu_count 2.0
    # HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
    # TYPE jvm_classes_loaded_classes gauge
    jvm_classes_loaded_classes 7325.0
    # HELP rt_total Total Response Time
    # TYPE rt_total gauge
    rt_total{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 2783.0
    # HELP rt_last Last Response Time
    # TYPE rt_last gauge
    rt_last{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.0
    # HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
    # TYPE jvm_gc_memory_promoted_bytes_total counter
    jvm_gc_memory_promoted_bytes_total 1.4450952E7
    # HELP jvm_gc_pause_seconds Time spent in GC pause
    # TYPE jvm_gc_pause_seconds summary
    jvm_gc_pause_seconds_count{action="end of minor GC",cause="Metadata GC Threshold",} 2.0
    jvm_gc_pause_seconds_sum{action="end of minor GC",cause="Metadata GC Threshold",} 0.026
    jvm_gc_pause_seconds_count{action="end of minor GC",cause="G1 Evacuation Pause",} 37.0
    jvm_gc_pause_seconds_sum{action="end of minor GC",cause="G1 Evacuation Pause",} 0.156
    # HELP jvm_gc_pause_seconds_max Time spent in GC pause
    # TYPE jvm_gc_pause_seconds_max gauge
    jvm_gc_pause_seconds_max{action="end of minor GC",cause="Metadata GC Threshold",} 0.0
    jvm_gc_pause_seconds_max{action="end of minor GC",cause="G1 Evacuation Pause",} 0.0
    # HELP rt_p95 Response Time P95
    # TYPE rt_p95 gauge
    rt_p95{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.0
    # HELP requests_total Total Requests
    # TYPE requests_total gauge
    requests_total{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 27738.0
    # HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
    # TYPE process_cpu_usage gauge
    process_cpu_usage 8.103727714748784E-4
    # HELP rt_max Max Response Time
    # TYPE rt_max gauge
    rt_max{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 4.0
    # HELP jvm_gc_max_data_size_bytes Max size of long-lived heap memory pool
    # TYPE jvm_gc_max_data_size_bytes gauge
    jvm_gc_max_data_size_bytes 9.52107008E8
    # HELP jvm_threads_live_threads The current number of live threads including both daemon and non-daemon threads
    # TYPE jvm_threads_live_threads gauge
    jvm_threads_live_threads 39.0
    # HELP jvm_threads_daemon_threads The current number of live daemon threads
    # TYPE jvm_threads_daemon_threads gauge
    jvm_threads_daemon_threads 36.0
    # HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
    # TYPE jvm_classes_unloaded_classes_total counter
    jvm_classes_unloaded_classes_total 0.0
    # HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
    # TYPE jvm_memory_committed_bytes gauge
    jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'profiled nmethods'",} 1.4680064E7
    jvm_memory_committed_bytes{area="heap",id="G1 Survivor Space",} 1048576.0
    jvm_memory_committed_bytes{area="heap",id="G1 Old Gen",} 5.24288E7
    jvm_memory_committed_bytes{area="nonheap",id="Metaspace",} 4.1623552E7
    jvm_memory_committed_bytes{area="heap",id="G1 Eden Space",} 9.0177536E7
    jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-nmethods'",} 2555904.0
    jvm_memory_committed_bytes{area="nonheap",id="Compressed Class Space",} 5111808.0
    jvm_memory_committed_bytes{area="nonheap",id="CodeHeap 'non-profiled nmethods'",} 5701632.0
    # HELP requests_succeed Succeed Requests
    # TYPE requests_succeed gauge
    requests_succeed{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 27738.0
    # HELP rt_avg Average Response Time
    # TYPE rt_avg gauge
    rt_avg{application_name="metrics-provider",group="",hostname="iZ8lgm9icspkthZ",interface="org.apache.dubbo.samples.metrics.prometheus.api.DemoService",ip="172.28.236.104",method="sayHello",version="",} 0.0

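To read output of this kind programmatically, a single sample line can be split into its metric name, labels, and value. The Java sketch below is a simplified parser for lines shaped like the ones in this sample. `PrometheusSample` is a hypothetical helper, not a Dubbo or Prometheus API, and it makes several simplifying assumptions: it does not unescape label values, does not accept a trailing timestamp, and does not handle the `+Inf`/`-Inf` spellings, all of which the full Prometheus text format allows.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: one parsed sample line from Prometheus text output. */
final class PrometheusSample {
    // metric name, optional {labels}, whitespace, value
    private static final Pattern SAMPLE =
            Pattern.compile("^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\\{(.*)\\})?\\s+(\\S+)$");
    // one name="value" pair; the value may contain backslash-escaped characters
    private static final Pattern LABEL =
            Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    final String name;
    final Map<String, String> labels;
    final double value;

    private PrometheusSample(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }

    /** Parses a sample line; comment lines such as "# HELP ..." are rejected. */
    static PrometheusSample parse(String line) {
        Matcher m = SAMPLE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a sample line: " + line);
        }
        Map<String, String> labels = new LinkedHashMap<>(); // keeps label order
        if (m.group(2) != null) {
            Matcher l = LABEL.matcher(m.group(2));
            while (l.find()) {
                labels.put(l.group(1), l.group(2));
            }
        }
        return new PrometheusSample(m.group(1), labels, Double.parseDouble(m.group(3)));
    }
}
```

With this, the `qps` line above parses into the name `qps`, a label map containing `method="sayHello"` among others, and the value `0.3333333333333333`.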
"Apache Dubbo Microservice Development: From Beginner to Mastery", Visually Monitoring Service Status, II. Microservice Cluster Monitoring (4) https://developer.aliyun.com/article/1224283