YARN JMX Monitoring


Finding the JMX endpoints

ResourceManager JMX page: replace the path of the ResourceManager web UI address with /jmx, for example: http://192.168.1.2:8088/jmx
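The same page can also be read from a script. Below is a minimal sketch, assuming the ResourceManager address from the example above and the `qry` query parameter of Hadoop's JMX JSON servlet for filtering by MBean name. The helper names `build_jmx_url` and `fetch_beans` are illustrative and not part of Hadoop.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_jmx_url(base_url, mbean_name=None):
    """Return the /jmx URL for a daemon's web UI address.

    base_url   -- e.g. "http://192.168.1.2:8088" (ResourceManager web UI)
    mbean_name -- optional MBean name, sent through the servlet's `qry` parameter
    """
    url = base_url.rstrip("/") + "/jmx"
    if mbean_name:
        url += "?" + urlencode({"qry": mbean_name})
    return url


def fetch_beans(base_url, mbean_name=None, timeout=5):
    """Fetch the JMX page and return the list stored under the "beans" key."""
    with urlopen(build_jmx_url(base_url, mbean_name), timeout=timeout) as resp:
        return json.load(resp).get("beans", [])
```

For example, `fetch_beans("http://192.168.1.2:8088", "Hadoop:service=ResourceManager,name=ClusterMetrics")` should return only the ClusterMetrics bean, provided the address is reachable and no authentication is required.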

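NodeManager JMX page: in the ResourceManager web UI, select Nodes to display the node list.

As shown in the figure below, the NodeManager JMX address is http://bigdata-24-194:8042/jmx

[Figure: node list on the Nodes page of the ResourceManager web UI]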
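The node list can also be obtained without the web page. The sketch below assumes the ResourceManager REST endpoint `/ws/v1/cluster/nodes`, whose JSON response lists each node's `nodeHTTPAddress`. The function name `nodemanager_jmx_urls` and the sample payload are illustrative only.

```python
def nodemanager_jmx_urls(nodes_payload, scheme="http"):
    """Build NodeManager JMX URLs from a /ws/v1/cluster/nodes JSON response.

    Nodes without an HTTP address (for example, nodes that are no longer
    reachable) are skipped.
    """
    nodes = (nodes_payload.get("nodes") or {}).get("node") or []
    return [
        f"{scheme}://{node['nodeHTTPAddress']}/jmx"
        for node in nodes
        if node.get("nodeHTTPAddress")
    ]


# Hand-written sample shaped like the REST response; not real cluster output.
sample_nodes = {
    "nodes": {
        "node": [
            {"state": "RUNNING", "nodeHTTPAddress": "bigdata-24-194:8042"},
            {"state": "LOST", "nodeHTTPAddress": ""},
        ]
    }
}

urls = nodemanager_jmx_urls(sample_nodes)
```

With the sample above, `urls` contains the single address from the figure, http://bigdata-24-194:8042/jmx.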

Metric reference

| Node type | MBean name | Attribute | Meaning | Category |
| --- | --- | --- | --- | --- |
| ResourceManager | `Hadoop:service=ResourceManager,name=ClusterMetrics` | `NumActiveNMs` | Number of currently active NodeManagers | Basic |
| ResourceManager | `Hadoop:service=ResourceManager,name=ClusterMetrics` | `NumDecommissionedNMs` | Number of decommissioned NodeManagers | Basic |
| ResourceManager | `Hadoop:service=ResourceManager,name=ClusterMetrics` | `NumDecommissioningNMs` | Number of NodeManagers currently being decommissioned | Basic |
| ResourceManager | `Hadoop:service=ResourceManager,name=ClusterMetrics` | `NumLostNMs` | Number of lost NodeManagers | Basic |
| ResourceManager | `Hadoop:service=ResourceManager,name=ClusterMetrics` | `NumUnhealthyNMs` | Number of unhealthy NodeManagers | Basic |
| ResourceManager | `Hadoop:service=ResourceManager,name=RpcActivityForPort*` | `RpcProcessingTimeAvgTime` | Average ResourceManager RPC processing time, in ms | RPC |
| ResourceManager | `Hadoop:service=ResourceManager,name=RpcActivityForPort*` | `CallQueueLength` | Length of the ResourceManager RPC call queue backlog | RPC |
| ResourceManager | `Hadoop:service=ResourceManager,name=JvmMetrics` | `MemNonHeapCommittedM` | Non-heap memory currently committed by the ResourceManager JVM, in MB | Basic |
| ResourceManager | `Hadoop:service=ResourceManager,name=JvmMetrics` | `MemNonHeapMaxM` | Maximum non-heap memory available to the ResourceManager JVM, in MB | Basic |
| ResourceManager | `Hadoop:service=ResourceManager,name=JvmMetrics` | `MemNonHeapUsedM` | Non-heap memory currently used by the ResourceManager JVM, in MB | Basic |
| ResourceManager | `Hadoop:service=ResourceManager,name=JvmMetrics` | `MemHeapCommittedM` | Heap memory currently committed by the ResourceManager JVM, in MB | Basic |
| ResourceManager | `Hadoop:service=ResourceManager,name=JvmMetrics` | `MemHeapMaxM` | Maximum heap memory available to the ResourceManager JVM, in MB | Basic |
| ResourceManager | `Hadoop:service=ResourceManager,name=JvmMetrics` | `MemHeapUsedM` | Heap memory currently used by the ResourceManager JVM, in MB | Basic |
| ResourceManager | `Hadoop:service=ResourceManager,name=JvmMetrics` | `GcTimeMillis` | Total ResourceManager JVM GC time, in ms | GC |
| ResourceManager | `Hadoop:service=ResourceManager,name=JvmMetrics` | `GcCount` | Total ResourceManager JVM GC count | GC |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AllocatedVCores` | Virtual cores allocated in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `ReservedVCores` | Virtual cores reserved in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AvailableVCores` | Virtual cores available in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `PendingVCores` | Virtual cores requested but still waiting to be scheduled in a given queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AllocatedMB` | Memory allocated (in use) in a given scheduler queue, in MB | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AvailableMB` | Memory available in a given scheduler queue, in MB | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `PendingMB` | Memory requested but still waiting to be scheduled in a given queue, in MB | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `ReservedMB` | Memory reserved in a given scheduler queue, in MB | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AllocatedContainers` | Containers allocated (in use) in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `PendingContainers` | Containers requested but still waiting to be scheduled in a given queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `ReservedContainers` | Containers reserved in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AggregateContainersAllocated` | Cumulative number of containers allocated in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AggregateContainersReleased` | Cumulative number of containers released in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AppsCompleted` | Applications completed in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AppsKilled` | Applications killed in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AppsFailed` | Applications failed in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AppsPending` | Applications pending in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AppsRunning` | Applications currently running in a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `AppsSubmitted` | Applications submitted to a given scheduler queue | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `running_0` | Running applications in the queue whose elapsed time is under 60 minutes | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `running_60` | Running applications in the queue whose elapsed time is between 60 and 300 minutes | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `running_300` | Running applications in the queue whose elapsed time is between 300 and 1440 minutes | YARN queue |
| ResourceManager | `Hadoop:service=ResourceManager,name=QueueMetrics*` | `running_1440` | Running applications in the queue whose elapsed time is over 1440 minutes | YARN queue |
| ResourceManager | `java.lang:type=GarbageCollector,name=G1 Old Generation` | `CollectionCount` | Old-generation GC count (Full GC count) | GC |
| ResourceManager | `java.lang:type=GarbageCollector,name=G1 Old Generation` | `CollectionTime` | Time spent in old-generation GC | GC |
| ResourceManager | `java.lang:type=GarbageCollector,name=G1 Young Generation` | `CollectionCount` | Young-generation GC count (Young GC count) | GC |
| ResourceManager | `java.lang:type=GarbageCollector,name=G1 Young Generation` | `CollectionTime` | Time spent in young-generation GC | GC |
| ResourceManager | `java.lang:type=GarbageCollector,name=ParNew` | `CollectionCount` | Young-generation GC count (Young GC count) | GC |
| ResourceManager | `java.lang:type=GarbageCollector,name=ParNew` | `CollectionTime` | Time spent in young-generation GC | GC |
| ResourceManager | `java.lang:type=GarbageCollector,name=ConcurrentMarkSweep` | `CollectionCount` | Old-generation GC count (Full GC count) | GC |
| ResourceManager | `java.lang:type=GarbageCollector,name=ConcurrentMarkSweep` | `CollectionTime` | Time spent in old-generation GC | GC |
| ResourceManager | `java.lang:type=GarbageCollector,name=PS MarkSweep` | `CollectionCount` | Old-generation GC count (Full GC count) | GC |
| ResourceManager | `java.lang:type=GarbageCollector,name=PS MarkSweep` | `CollectionTime` | Time spent in old-generation GC | GC |
| ResourceManager | `java.lang:type=GarbageCollector,name=PS Scavenge` | `CollectionCount` | Young-generation GC count (Young GC count) | GC |
| ResourceManager | `java.lang:type=GarbageCollector,name=PS Scavenge` | `CollectionTime` | Time spent in young-generation GC | GC |
| ResourceManager | `java.lang:type=Runtime` | `StartTime` | JVM start timestamp | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `AvailableGB` | Memory available on the NodeManager, in GB | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `AllocatedGB` | Memory allocated on the NodeManager, in GB | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `AllocatedVCores` | Virtual cores allocated on the NodeManager | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `AvailableVCores` | Virtual cores available on the NodeManager | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `ContainersLaunched` | Containers launched by the NodeManager | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `ContainersRunning` | Containers currently running on the NodeManager | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `ContainersFailed` | Containers that failed on the NodeManager | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `ContainersCompleted` | Containers that completed on the NodeManager | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `ContainersIniting` | Containers currently initializing on the NodeManager | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `ContainersKilled` | Containers killed on the NodeManager | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `BadLocalDirs` | Number of bad (failed) local directories on the NodeManager | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=NodeManagerMetrics` | `GoodLocalDirsDiskUtilizationPerc` | Disk utilization percentage across the NodeManager's good local directories | Basic |
| NodeManager | `java.lang:type=GarbageCollector,name=G1 Old Generation` | `CollectionCount` | Old-generation GC count (Full GC count) | GC |
| NodeManager | `java.lang:type=GarbageCollector,name=G1 Old Generation` | `CollectionTime` | Time spent in old-generation GC | GC |
| NodeManager | `java.lang:type=GarbageCollector,name=G1 Young Generation` | `CollectionCount` | Young-generation GC count (Young GC count) | GC |
| NodeManager | `java.lang:type=GarbageCollector,name=G1 Young Generation` | `CollectionTime` | Time spent in young-generation GC | GC |
| NodeManager | `java.lang:type=GarbageCollector,name=ParNew` | `CollectionCount` | Young-generation GC count (Young GC count) | GC |
| NodeManager | `java.lang:type=GarbageCollector,name=ParNew` | `CollectionTime` | Time spent in young-generation GC | GC |
| NodeManager | `java.lang:type=GarbageCollector,name=ConcurrentMarkSweep` | `CollectionCount` | Old-generation GC count (Full GC count) | GC |
| NodeManager | `java.lang:type=GarbageCollector,name=ConcurrentMarkSweep` | `CollectionTime` | Time spent in old-generation GC | GC |
| NodeManager | `java.lang:type=GarbageCollector,name=PS MarkSweep` | `CollectionCount` | Old-generation GC count (Full GC count) | GC |
| NodeManager | `java.lang:type=GarbageCollector,name=PS MarkSweep` | `CollectionTime` | Time spent in old-generation GC | GC |
| NodeManager | `java.lang:type=GarbageCollector,name=PS Scavenge` | `CollectionCount` | Young-generation GC count (Young GC count) | GC |
| NodeManager | `java.lang:type=GarbageCollector,name=PS Scavenge` | `CollectionTime` | Time spent in young-generation GC | GC |
| NodeManager | `Hadoop:service=NodeManager,name=JvmMetrics` | `MemNonHeapCommittedM` | Non-heap memory currently committed by the NodeManager JVM, in MB | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=JvmMetrics` | `MemNonHeapMaxM` | Maximum non-heap memory available to the NodeManager JVM, in MB | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=JvmMetrics` | `MemNonHeapUsedM` | Non-heap memory currently used by the NodeManager JVM, in MB | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=JvmMetrics` | `MemHeapCommittedM` | Heap memory currently committed by the NodeManager JVM, in MB | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=JvmMetrics` | `MemHeapMaxM` | Maximum heap memory available to the NodeManager JVM, in MB | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=JvmMetrics` | `MemHeapUsedM` | Heap memory currently used by the NodeManager JVM, in MB | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=JvmMetrics` | `GcTimeMillis` | Total NodeManager JVM GC time, in ms | Basic |
| NodeManager | `Hadoop:service=NodeManager,name=JvmMetrics` | `GcCount` | Total NodeManager JVM GC count | Basic |
| NodeManager | `java.lang:type=Runtime` | `StartTime` | JVM start timestamp | Basic |
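To show how the names in the table map onto the JMX JSON, here is a minimal sketch that selects beans by MBean name (with `*` as a wildcard, as in `QueueMetrics*`) and reads a few attributes. It assumes the page returns a top-level `beans` list whose entries carry a `name` field plus the attributes. The helper names and every value in the sample payload are invented for illustration.

```python
from fnmatch import fnmatchcase


def select_beans(beans, name_pattern):
    """Return the beans whose "name" matches the pattern ("*" is a wildcard)."""
    return [b for b in beans if fnmatchcase(b.get("name", ""), name_pattern)]


def pick(bean, attributes):
    """Return only the requested attributes that are present on the bean."""
    return {attr: bean[attr] for attr in attributes if attr in bean}


def heap_used_ratio(jvm_bean):
    """MemHeapUsedM / MemHeapMaxM, or None when the maximum is missing or not positive."""
    max_mb = jvm_bean.get("MemHeapMaxM", 0)
    if max_mb <= 0:
        return None
    return jvm_bean["MemHeapUsedM"] / max_mb


# Hand-written sample shaped like a /jmx response; the numbers are made up.
sample = {
    "beans": [
        {"name": "Hadoop:service=ResourceManager,name=ClusterMetrics",
         "NumActiveNMs": 3, "NumLostNMs": 0, "NumUnhealthyNMs": 1},
        {"name": "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root",
         "AppsRunning": 2, "AppsPending": 0, "AllocatedMB": 4096},
        {"name": "Hadoop:service=ResourceManager,name=JvmMetrics",
         "MemHeapUsedM": 512.0, "MemHeapMaxM": 2048.0, "GcCount": 17},
    ]
}

beans = sample["beans"]
cluster = select_beans(beans, "Hadoop:service=ResourceManager,name=ClusterMetrics")[0]
queues = select_beans(beans, "Hadoop:service=ResourceManager,name=QueueMetrics*")
jvm = select_beans(beans, "Hadoop:service=ResourceManager,name=JvmMetrics")[0]

node_health = pick(cluster, ["NumActiveNMs", "NumLostNMs", "NumUnhealthyNMs"])
heap_ratio = heap_used_ratio(jvm)
```

Counters such as `GcCount`, `GcTimeMillis`, and `AggregateContainersAllocated` only grow, so a monitoring job would normally compare two successive samples rather than alert on the raw value.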