本文基于OpenJDK 11
最近使用Spring Cloud Gateway
的时候,遇到了一个奇怪的问题:
线上有3个 API 网关实例,压力均衡,平稳运行3天后,突然有一个实例,CPU飚高,并且响应时间增加很多,从几十毫秒涨到了几分钟。
线上是 k8s 管理容器,立刻停掉了这个 pod,重建,恢复正常。
线上我们开启了 JFR
记录(可以参考我的另外系列文章:Java 监控 JFR),通过 JMC 查看下出问题的 JFR 记录。
首先我们来看 GC,我们的 GC 算法是 G1,主要通过 G1 Garbage Collection这个事件查看:
发现 GC 全部为 Young GC,且耗时比较正常,频率上也没有什么明显异常。
接下来来看,CPU 占用相关。直接看 Thread CPU Load 这个事件,看每个线程的 CPU 占用情况。发现reactor-http-epoll
线程池的线程,CPU 占用很高,加在一起,接近了 100%。
这些线程是 reactor-netty 处理业务的线程,观察其他实例,发现正常情况下,并不会有这么高的 CPU 负载。那么为啥会有这么高的负载呢?通过 Thread Dump 来看一下线程堆栈有何发现.
通过查看多个线程堆栈 dump,发现这些线程基本都处于 Runnable,并且执行的方法是原生方法,和StackWalker
相关,例如:
"reactor-http-epoll-2" #75 daemon prio=5 os_prio=0 cpu=25100145.64ms elapsed=306507.26s tid=0x0000556eddcbd000 nid=0x61 runnable [0x00007f8605443000] java.lang.Thread.State: RUNNABLE at java.lang.StackStreamFactory$AbstractStackWalker.callStackWalk(java.base@11.0.6/Native Method) at java.lang.StackStreamFactory$AbstractStackWalker.beginStackWalk(java.base@11.0.6/StackStreamFactory.java:370) at java.lang.StackStreamFactory$AbstractStackWalker.walk(java.base@11.0.6/StackStreamFactory.java:243) at java.lang.StackWalker.walk(java.base@11.0.6/StackWalker.java:498) at org.apache.logging.log4j.util.StackLocator.calcLocation(StackLocator.java:81) at org.apache.logging.log4j.util.StackLocatorUtil.calcLocation(StackLocatorUtil.java:76) at org.apache.logging.log4j.spi.AbstractLogger.getLocation(AbstractLogger.java:2201) at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2144) at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2127) at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2020) at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1891) at org.apache.logging.log4j.spi.AbstractLogger.info(AbstractLogger.java:1436) at com.xxx.apigateway.filter.AccessCheckFilter.filter(AccessCheckFilter.java:144) at org.springframework.cloud.gateway.handler.FilteringWebHandler$GatewayFilterAdapter.filter(FilteringWebHandler.java:138) at org.springframework.cloud.gateway.filter.OrderedGatewayFilter.filter(OrderedGatewayFilter.java:44) at org.springframework.cloud.gateway.handler.FilteringWebHandler$DefaultGatewayFilterChain.lambda$filter$0(FilteringWebHandler.java:118) at org.springframework.cloud.gateway.handler.FilteringWebHandler$DefaultGatewayFilterChain$$Lambda$1265/0x0000000800b83440.get(Unknown Source) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:44) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.Mono.subscribe(Mono.java:4105) at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:172) at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:150) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:76) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerNext(FluxConcatMap.java:274) at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onNext(FluxConcatMap.java:851) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1637) at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241) at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2199) at reactor.core.publisher.MonoFlatMap$FlatMapInner.onSubscribe(MonoFlatMap.java:230) at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:54) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:150) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:76) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerNext(FluxConcatMap.java:274) at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onNext(FluxConcatMap.java:851) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:73) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onNext(MonoPeekTerminal.java:173) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1637) at reactor.core.publisher.MonoFilterWhen$MonoFilterWhenMain.innerResult(MonoFilterWhen.java:193) at reactor.core.publisher.MonoFilterWhen$FilterWhenInner.onNext(MonoFilterWhen.java:260) at reactor.core.publisher.MonoFilterWhen$FilterWhenInner.onNext(MonoFilterWhen.java:228) at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2199) at reactor.core.publisher.MonoFilterWhen$FilterWhenInner.onSubscribe(MonoFilterWhen.java:249) at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:54) at reactor.core.publisher.Mono.subscribe(Mono.java:4105) at reactor.core.publisher.MonoFilterWhen$MonoFilterWhenMain.onNext(MonoFilterWhen.java:150) at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2199) at reactor.core.publisher.MonoFilterWhen$MonoFilterWhenMain.onSubscribe(MonoFilterWhen.java:103) at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:54) at reactor.core.publisher.Mono.subscribe(Mono.java:4105) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:441) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onNext(FluxConcatMap.java:243) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onNext(FluxDematerialize.java:91) at reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onNext(FluxDematerialize.java:38) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxIterable$IterableSubscription.slowPath(FluxIterable.java:243) at reactor.core.publisher.FluxIterable$IterableSubscription.request(FluxIterable.java:201) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.request(ScopePassingSpanSubscriber.java:76) at reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.request(FluxDematerialize.java:120) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.request(ScopePassingSpanSubscriber.java:76) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.request(ScopePassingSpanSubscriber.java:76) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onSubscribe(FluxConcatMap.java:228) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onSubscribe(ScopePassingSpanSubscriber.java:69) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onSubscribe(ScopePassingSpanSubscriber.java:69) at reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onSubscribe(FluxDematerialize.java:70) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onSubscribe(ScopePassingSpanSubscriber.java:69) at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:139) at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:63) at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:53) at reactor.core.publisher.FluxDefer.subscribe(FluxDefer.java:54) at reactor.core.publisher.Mono.subscribe(Mono.java:4105) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:441) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onNext(FluxConcatMap.java:243) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxIterable$IterableSubscription.slowPath(FluxIterable.java:243) at reactor.core.publisher.FluxIterable$IterableSubscription.request(FluxIterable.java:201) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.request(ScopePassingSpanSubscriber.java:76) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onSubscribe(FluxConcatMap.java:228) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onSubscribe(ScopePassingSpanSubscriber.java:69) at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:139) at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:63) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at org.springframework.cloud.sleuth.instrument.web.TraceWebFilter$MonoWebFilterTrace.subscribe(TraceWebFilter.java:162) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.Mono.subscribe(Mono.java:4105) at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:172) at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.netty.http.server.HttpServerHandle.onStateChange(HttpServerHandle.java:64) at reactor.netty.tcp.TcpServerBind$ChildObserver.onStateChange(TcpServerBind.java:228) at reactor.netty.http.server.HttpServerOperations.onInboundNext(HttpServerOperations.java:465) at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:90) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) at reactor.netty.http.server.HttpTrafficHandler.channelRead(HttpTrafficHandler.java:167) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:308) at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:422) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:387) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(java.base@11.0.6/Thread.java:834) "reactor-http-epoll-4" #79 daemon prio=5 os_prio=0 cpu=25266372.53ms elapsed=306415.58s tid=0x0000556eddcc4800 nid=0x65 runnable [0x00007f85e2ff9000] java.lang.Thread.State: RUNNABLE at java.lang.StackStreamFactory$AbstractStackWalker.fetchStackFrames(java.base@11.0.6/Native Method) at java.lang.StackStreamFactory$AbstractStackWalker.fetchStackFrames(java.base@11.0.6/StackStreamFactory.java:386) at java.lang.StackStreamFactory$AbstractStackWalker.getNextBatch(java.base@11.0.6/StackStreamFactory.java:322) at java.lang.StackStreamFactory$AbstractStackWalker.peekFrame(java.base@11.0.6/StackStreamFactory.java:263) at java.lang.StackStreamFactory$AbstractStackWalker.hasNext(java.base@11.0.6/StackStreamFactory.java:351) at java.lang.StackStreamFactory$StackFrameTraverser.nextStackFrame(java.base@11.0.6/StackStreamFactory.java:520) at java.lang.StackStreamFactory$StackFrameTraverser.forEachRemaining(java.base@11.0.6/StackStreamFactory.java:581) at java.util.stream.AbstractPipeline.copyInto(java.base@11.0.6/AbstractPipeline.java:484) at java.util.stream.AbstractPipeline.wrapAndCopyInto(java.base@11.0.6/AbstractPipeline.java:474) at java.util.stream.AbstractPipeline.evaluate(java.base@11.0.6/AbstractPipeline.java:550) at java.util.stream.AbstractPipeline.evaluateToArrayNode(java.base@11.0.6/AbstractPipeline.java:260) at java.util.stream.ReferencePipeline.toArray(java.base@11.0.6/ReferencePipeline.java:517) at java.util.stream.ReferencePipeline.toArray(java.base@11.0.6/ReferencePipeline.java:523) at org.apache.logging.log4j.util.StackLocator$FqcnCallerLocator.apply(StackLocator.java:96) at org.apache.logging.log4j.util.StackLocator$FqcnCallerLocator.apply(StackLocator.java:90) at java.lang.StackStreamFactory$StackFrameTraverser.consumeFrames(java.base@11.0.6/StackStreamFactory.java:534) at java.lang.StackStreamFactory$AbstractStackWalker.doStackWalk(java.base@11.0.6/StackStreamFactory.java:306) at java.lang.StackStreamFactory$AbstractStackWalker.callStackWalk(java.base@11.0.6/Native Method) at java.lang.StackStreamFactory$AbstractStackWalker.beginStackWalk(java.base@11.0.6/StackStreamFactory.java:370) at java.lang.StackStreamFactory$AbstractStackWalker.walk(java.base@11.0.6/StackStreamFactory.java:243) at java.lang.StackWalker.walk(java.base@11.0.6/StackWalker.java:498) at org.apache.logging.log4j.util.StackLocator.calcLocation(StackLocator.java:81) at org.apache.logging.log4j.util.StackLocatorUtil.calcLocation(StackLocatorUtil.java:76) at org.apache.logging.log4j.spi.AbstractLogger.getLocation(AbstractLogger.java:2201) at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2144) at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2127) at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2038) at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1915) at org.apache.logging.log4j.spi.AbstractLogger.info(AbstractLogger.java:1451) at com.xxx.apigateway.filter.CommonLogFilter.filter(CommonLogFilter.java:42) at org.springframework.cloud.gateway.handler.FilteringWebHandler$GatewayFilterAdapter.filter(FilteringWebHandler.java:138) at org.springframework.cloud.gateway.filter.OrderedGatewayFilter.filter(OrderedGatewayFilter.java:44) at org.springframework.cloud.gateway.handler.FilteringWebHandler$DefaultGatewayFilterChain.lambda$filter$0(FilteringWebHandler.java:118) at org.springframework.cloud.gateway.handler.FilteringWebHandler$DefaultGatewayFilterChain$$Lambda$1265/0x0000000800b83440.get(Unknown Source) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:44) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.Mono.subscribe(Mono.java:4105) at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:172) at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:150) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:76) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerNext(FluxConcatMap.java:274) at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onNext(FluxConcatMap.java:851) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1637) at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241) at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2199) at reactor.core.publisher.MonoFlatMap$FlatMapInner.onSubscribe(MonoFlatMap.java:230) at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:54) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:150) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:76) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerNext(FluxConcatMap.java:274) at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onNext(FluxConcatMap.java:851) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:73) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onNext(MonoPeekTerminal.java:173) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1637) at reactor.core.publisher.MonoFilterWhen$MonoFilterWhenMain.innerResult(MonoFilterWhen.java:193) at reactor.core.publisher.MonoFilterWhen$FilterWhenInner.onNext(MonoFilterWhen.java:260) at reactor.core.publisher.MonoFilterWhen$FilterWhenInner.onNext(MonoFilterWhen.java:228) at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2199) at reactor.core.publisher.MonoFilterWhen$FilterWhenInner.onSubscribe(MonoFilterWhen.java:249) at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:54) at reactor.core.publisher.Mono.subscribe(Mono.java:4105) at reactor.core.publisher.MonoFilterWhen$MonoFilterWhenMain.onNext(MonoFilterWhen.java:150) at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2199) at reactor.core.publisher.MonoFilterWhen$MonoFilterWhenMain.onSubscribe(MonoFilterWhen.java:103) at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:54) at reactor.core.publisher.Mono.subscribe(Mono.java:4105) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:441) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onNext(FluxConcatMap.java:243) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onNext(FluxDematerialize.java:91) at reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onNext(FluxDematerialize.java:38) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxIterable$IterableSubscription.slowPath(FluxIterable.java:243) at reactor.core.publisher.FluxIterable$IterableSubscription.request(FluxIterable.java:201) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.request(ScopePassingSpanSubscriber.java:76) at reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.request(FluxDematerialize.java:120) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.request(ScopePassingSpanSubscriber.java:76) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.request(ScopePassingSpanSubscriber.java:76) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onSubscribe(FluxConcatMap.java:228) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onSubscribe(ScopePassingSpanSubscriber.java:69) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onSubscribe(ScopePassingSpanSubscriber.java:69) at reactor.core.publisher.FluxDematerialize$DematerializeSubscriber.onSubscribe(FluxDematerialize.java:70) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onSubscribe(ScopePassingSpanSubscriber.java:69) at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:139) at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:63) at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:53) at reactor.core.publisher.FluxDefer.subscribe(FluxDefer.java:54) at reactor.core.publisher.Mono.subscribe(Mono.java:4105) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:441) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onNext(FluxConcatMap.java:243) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onNext(ScopePassingSpanSubscriber.java:90) at reactor.core.publisher.FluxIterable$IterableSubscription.slowPath(FluxIterable.java:243) at reactor.core.publisher.FluxIterable$IterableSubscription.request(FluxIterable.java:201) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.request(ScopePassingSpanSubscriber.java:76) at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onSubscribe(FluxConcatMap.java:228) at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onSubscribe(ScopePassingSpanSubscriber.java:69) at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:139) at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:63) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at org.springframework.cloud.sleuth.instrument.web.TraceWebFilter$MonoWebFilterTrace.subscribe(TraceWebFilter.java:162) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52) at reactor.core.publisher.Mono.subscribe(Mono.java:4105) at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:172) at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56) at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:55) at reactor.netty.http.server.HttpServerHandle.onStateChange(HttpServerHandle.java:64) at reactor.netty.tcp.TcpServerBind$ChildObserver.onStateChange(TcpServerBind.java:228) at reactor.netty.http.server.HttpServerOperations.onInboundNext(HttpServerOperations.java:465) at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:90) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) at reactor.netty.http.server.HttpTrafficHandler.channelRead(HttpTrafficHandler.java:167) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:321) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:295) at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792) at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
主要和这两个原生方法有关:
java.lang.StackStreamFactory$AbstractStackWalker.callStackWalk
java.lang.StackStreamFactory$AbstractStackWalker.fetchStackFrames
虽然一直有OpenJDK 11 之后, StackWalker
性能有问题,不如new Throwable()
获取堆栈快的问题。但是考虑到压力是均衡的,其他两个实例并没有这个问题,应该不是本身对于这个类的应用,导致的CPU消耗突然变大。查看 JDK 相关的 JIRA,发现一个有意思的 BUG:Application on JDK11 consume 100% CPU after a few hours of uptime
这里面说,java.lang.StackStreamFactory$AbstractStackWalker.callStackWalk
这个原生方法,在 JVM 运行几小时后,突然 CPU 就会飚高到 100%。看来可能确实有些问题。针对这个问题,我也提了个 Issue 到 Log4j 的 JIRA:[]
那么为什么会调用这个类呢?Log4j2 打异步日志的时候,如果需要保留日志产生的类还有行号,需要缓存堆栈,那么需要配置includeLocation = true
。这个堆栈,如果环境是 java 9 之前,那么通过 new Throwable()
实现, 参考:StackLocator.java 如果环境是 java 9 之后,那么通过 StackWalker
实现:StackLocator.java 这个如果你配置了includeLocation = true
,就会缓存堆栈。
对于网关,我们可以不用打印类和行号,配置includeLocation = false
可以避免再出现类似的问题。
补充
看到这篇大神的分析文章: JDK11现存性能bug(JDK-8221393)深度解析 感觉受益匪浅,大家也可以看看。Arthas 我们已经在生产的每个Pod上部署,一直没注意到火焰图这个功能(不过, JFR也可以定位到,嘻嘻),我们又多了一种手段。 JDK 13,可以考虑升级一波,感觉问题应该就是博主分析的那个 Table 过大检索效率低下类似的问题,看是不是也能解决这个CPU 100%的问题。
再次补充
提了个 Issue 到 Log4j 的 JIRA:High CPU consumption using StackWalkerLog4j2 的开发者已经做了优化,给他们点个大赞~~
优化相当于,原来取堆栈的所有,现在只取最上层的一个堆栈(本来就是只需要这个)。这样的话,对于我们的应用,由于堆栈是很深的,从上面的线程堆栈快照就能看出来,只取最上层的一个堆栈是很高的性能提升