4. 自定义异常格式化插件,减少在异常集中发生的时候,因为打印异常栈日志量过大导致进一步的性能问题
其实 JVM 有参数 -XX:+OmitStackTraceInFastThrow
可以在某一个异常抛出过多次数时,自动省略异常堆栈。但是这个坏处在于:
- 对于某个新异常,赶上流量高峰突然输出很多异常,还是会有很多日志打印出来。
- 对于一个已经启动很久的进程,比如在 1 月 1 日抛出了很多 A 异常,在 3 月 1 日又抛出 A 异常却没有异常栈,这样很难寻找这个异常栈在哪里。
由于我们项目中使用了各种框架,有的使用了异步框架,导致异常栈会非常非常长(有的甚至有 1000 行),所以其实最符合我们需求的是:
- 每次异常都输出异常栈
- 但是异常栈只包括我们关心的包,其他的包都被省略
但是 Log4j2 官方只是提供了黑名单包的配置,也就是哪些包的异常栈被省略掉:
其实我们需要就是将这个黑名单变成白名单。于是,针对这种情况,我们根据官方源码,将其中的黑名单的判断取反,从黑名单变成了白名单。并创建自定义异常格式化插件:
import org.apache.logging.log4j.core.LogEvent; import org.apache.logging.log4j.core.config.Configuration; import org.apache.logging.log4j.core.config.plugins.Plugin; import org.apache.logging.log4j.core.impl.CustomizedThrowableProxyRenderer; import org.apache.logging.log4j.core.impl.ThrowableProxy; import org.apache.logging.log4j.core.pattern.ConverterKeys; import org.apache.logging.log4j.core.pattern.PatternConverter; import org.apache.logging.log4j.core.pattern.ThrowablePatternConverter; @Plugin(name = "CustomizedThrowablePatternConverter", category = PatternConverter.CATEGORY) //配置 key 是 %cusEx 或者 %cusThrowable 或者 %cusException @ConverterKeys({ "cusEx", "cusThrowable", "cusException" }) public class CustomizedThrowablePatternConverter extends ThrowablePatternConverter { public static CustomizedThrowablePatternConverter newInstance(final Configuration config, final String[] options) { return new CustomizedThrowablePatternConverter(config, options); } private CustomizedThrowablePatternConverter(final Configuration config, final String[] options) { super("CustomizedThrowable", "throwable", options, config); } @Override public void format(final LogEvent event, final StringBuilder toAppendTo) { final ThrowableProxy proxy = event.getThrownProxy(); final Throwable throwable = event.getThrown(); if ((throwable != null || proxy != null) && options.anyLines()) { if (proxy == null) { super.format(event, toAppendTo); return; } final int len = toAppendTo.length(); if (len > 0 && !Character.isWhitespace(toAppendTo.charAt(len - 1))) { toAppendTo.append(' '); } //这里的 CustomizedThrowableProxyRenderer 其实就是将 log4j2 的 ThrowableProxyRenderer 代码复制一份,之后将其中的黑名单判断取反,变成白名单 CustomizedThrowableProxyRenderer.formatExtendedStackTraceTo(proxy, toAppendTo, options.getIgnorePackages(), options.getTextRenderer(), getSuffix(event), options.getSeparator()); } } }
之后通过 layout %cusEx{filters(java, com.mycompany)}
,这样异常栈就只会输出这些包开头的异常堆栈,这里是 java 和 com.mycompany 开头的,输出的堆栈类似于:
... suppressed 27 lines at com.mycompany.spring.cloud.parent.web.common.undertow.jfr.JFRTracingFilter.doFilter(JFRTracingFilter.java:40) ~[spring-cloud-parent-web-common-2020.0.3-SNAPSHOT.jar!/:2020.0.3-SNAPSHOT] at com.mycompany.spring.cloud.parent.web.common.undertow.jfr.LazyJFRTracingFilter.doFilter(LazyJFRTracingFilter.java:23) ~[spring-cloud-parent-web-common-2020.0.3-SNAPSHOT.jar!/:2020.0.3-SNAPSHOT] ... suppressed 46 lines Caused by: com.alibaba.fastjson.JSONException: write javaBean error, fastjson version 1.2.75, class com.hopegaming.factsCenter.query.revo.controller.frontend.share.matches.MatchController$EventVO, fieldName : data ... suppressed 8 lines ... 74 more Caused by: java.lang.NullPointerException at com.mycompany.OrderController.postOrder(OrderController.java:103) ~[classes!/:2020.0.3-SNAPSHOT] ... suppressed 13 lines ... 74 more
5. 监控 RingBuffer 的使用率大小,如果使用率超过一定比例并且持续一段时间,证明应用写日志过忙,需要进行动态扩容,并且暂时将流量从这个实例切走一部分
log4j2 disruptor 的 RingBuffer 既然是一个环形 Buffer,它的容量是有限的,我们这里没有修改它的大小,走的是默认配置,查看其源码:
AsyncLoggerConfigDisruptor.java
public synchronized void start() { //省略其他代码 ringBufferSize = DisruptorUtil.calculateRingBufferSize("AsyncLoggerConfig.RingBufferSize"); disruptor = new Disruptor<>(factory, ringBufferSize, threadFactory, ProducerType.MULTI, waitStrategy); }
private static final int RINGBUFFER_MIN_SIZE = 128; private static final int RINGBUFFER_DEFAULT_SIZE = 256 * 1024; private static final int RINGBUFFER_NO_GC_DEFAULT_SIZE = 4 * 1024; static int calculateRingBufferSize(final String propertyName) { //是否启用了 ThreadLocal,如果是则为 4 kB,不是则为 256 kB int ringBufferSize = Constants.ENABLE_THREADLOCALS ? RINGBUFFER_NO_GC_DEFAULT_SIZE : RINGBUFFER_DEFAULT_SIZE; //读取系统变量,以及 log4j2.component.properties 文件获取 propertyName(这里是 AsyncLoggerConfig.RingBufferSize) 这个配置 final String userPreferredRBSize = PropertiesUtil.getProperties().getStringProperty(propertyName, String.valueOf(ringBufferSize)); try { int size = Integer.parseInt(userPreferredRBSize); //如果小于 128 字节则按照 128 字节设置 if (size < RINGBUFFER_MIN_SIZE) { size = RINGBUFFER_MIN_SIZE; LOGGER.warn("Invalid RingBufferSize {}, using minimum size {}.", userPreferredRBSize, RINGBUFFER_MIN_SIZE); } ringBufferSize = size; } catch (final Exception ex) { LOGGER.warn("Invalid RingBufferSize {}, using default size {}.", userPreferredRBSize, ringBufferSize); } //取最近的 2 的 n 次方,因为对于 2 的 n 次方取余等于对于 2^n-1 取与运算,这样更快 return Integers.ceilingNextPowerOfTwo(ringBufferSize); }
如果启用了 ThreadLocal 这种方式生成 LogEvent,每次不新生成的 LogEvent 用之前的,用 ThreadLocal 存储的,这样避免了创建新的 LogEvent。但是考虑下面这种情况:
logger.info("{}", someObj);
这样会造成强引用,导致如果线程没有新的日志,这个 someObj
一直不能回收。所以针对 Web 应用,log4j2 默认是不启用 ThreadLocal 的 方式生成 LogEvent:
public static final boolean IS_WEB_APP = PropertiesUtil.getProperties().getBooleanProperty( "log4j2.is.webapp", isClassAvailable("javax.servlet.Servlet")); public static final boolean ENABLE_THREADLOCALS = !IS_WEB_APP && PropertiesUtil.getProperties().getBooleanProperty( "log4j2.enable.threadlocals", true);
由此,可以看出,我们的 RingBuffer 的大小为 256 kB。
RingBuffer 满了 log4j2 会发生什么?当 RingBuffer 满了,如果在 log4j2.component.properties
配置了
AsyncLoggerConfig.SynchronizeEnqueueWhenQueueFull=false
,则会 Wait(其实是 park) 在 Disruptor 的 produce 方法上,等待消费出下一个可以生产的环形 buffer 槽;默认这个配置为 true,即所有生产日志的线程尝试获取全局中的同一个锁(private final Object queueFullEnqueueLock = new Object();
):
static final boolean ASYNC_CONFIG_SYNCHRONIZE_ENQUEUE_WHEN_QUEUE_FULL = PropertiesUtil.getProperties() .getBooleanProperty("AsyncLoggerConfig.SynchronizeEnqueueWhenQueueFull", true); private boolean synchronizeEnqueueWhenQueueFull() { return DisruptorUtil.ASYNC_CONFIG_SYNCHRONIZE_ENQUEUE_WHEN_QUEUE_FULL // Background thread must never block && backgroundThreadId != Thread.currentThread().getId(); } private final Object queueFullEnqueueLock = new Object(); private void enqueue(final LogEvent logEvent, final AsyncLoggerConfig asyncLoggerConfig) { //如果 AsyncLoggerConfig.SynchronizeEnqueueWhenQueueFull=true,默认就是 true if (synchronizeEnqueueWhenQueueFull()) { synchronized (queueFullEnqueueLock) { disruptor.getRingBuffer().publishEvent(translator, logEvent, asyncLoggerConfig); } } else { //如果 AsyncLoggerConfig.SynchronizeEnqueueWhenQueueFull=false disruptor.getRingBuffer().publishEvent(translator, logEvent, asyncLoggerConfig); } }
默认配置的时候,异常堆栈和我们 JFR 中看到的一样,举个例子:
"Thread-0" #27 [13136] prio=5 os_prio=0 cpu=0.00ms elapsed=141.08s tid=0x0000022d6f2fbcc0 nid=0x3350 waiting for monitor entry [0x000000399bcfe000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor.enqueue(AsyncLoggerConfigDisruptor.java:375) - waiting to lock <merged>(a java.lang.Object) at org.apache.logging.log4j.core.async.AsyncLoggerConfigDisruptor.enqueueEvent(AsyncLoggerConfigDisruptor.java:330) at org.apache.logging.log4j.core.async.AsyncLoggerConfig.logInBackgroundThread(AsyncLoggerConfig.java:159) at org.apache.logging.log4j.core.async.EventRoute$1.logMessage(EventRoute.java:46) at org.apache.logging.log4j.core.async.AsyncLoggerConfig.handleQueueFull(AsyncLoggerConfig.java:149) at org.apache.logging.log4j.core.async.AsyncLoggerConfig.logToAsyncDelegate(AsyncLoggerConfig.java:136) at org.apache.logging.log4j.core.async.AsyncLoggerConfig.log(AsyncLoggerConfig.java:116) at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:460) at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82) at org.apache.logging.log4j.core.Logger.log(Logger.java:162) at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2190) at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2144) at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2127) at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2003) at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1975) at org.apache.logging.log4j.spi.AbstractLogger.info(AbstractLogger.java:1312) 省略业务方法堆栈