最近生产环境出现了一个问题,就是Job服务日志好端端的不打印日志了,服务也没有挂, 现在将此次问题解决过程记录下来~
问题描述
生产环境有一台Job服务器,是专门用来跑所有定时任务的,然后有一天发现定时任务好像没有执行,所以上Job服务器查看日志,结果发现的情况是:
最后打印的是昨天晚上九点半的,到我看的时候就一直没有日志,没有日志就没有执行Job;当时为了快速解决问题就重启了服务器,Job就正常执行了;后来第二天上去看的时候居然又停止了, 但是Job的JVM是还在的,并没有挂掉,就是没有日志,看起来就是被阻塞了一样;
这个时候又把生成环境重启让他先正常执行,然后立马去测试环境看看能不能重现, 结果测试环境的日志也是一样
问题代码
//允许异步执行 Schedule @EnableAsync @Component public class TestSchedule { private static final Logger LOGGER = LoggerFactory.getLogger(LotterySchedule.class); // 使用线程池myAsync来执行这个Job @Async("myAsync") @Scheduled(cron = "0/1 * * * * ?") public void testDoGet(){ LOGGER.info("\ntestDoGet:"+Thread.currentThread()); //业务代码:里面调用了 String json = HttpUtil.doGet(url);来调用第三方接口 HttpUtil.doGet("www.baidu.com") } //这里没有用异步执行,单线程执行 @Scheduled(cron = "0/1 * * * * ?") public void testPrint(){ LOGGER.info("\ntestPrint:"+Thread.currentThread()); } }
然后看看线程池的代码
@Configuration @EnableAsync public class ExecutorConfig { /** Set the ThreadPoolExecutor's core pool size. */ private int corePoolSize = 10; /** Set the ThreadPoolExecutor's maximum pool size. */ private int maxPoolSize = 25; /** Set the capacity for the ThreadPoolExecutor's BlockingQueue. */ private int queueCapacity = 10; @Bean public Executor myAsync() { ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor(); executor.setCorePoolSize(corePoolSize); executor.setMaxPoolSize(maxPoolSize); executor.setQueueCapacity(queueCapacity); executor.setThreadNamePrefix("MyJobExecutor-"); // rejection-policy:当pool已经达到max size的时候,如何处理新任务 // CALLER_RUNS:不在新线程中执行任务,而是有调用者所在的线程来执行 executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy()); executor.initialize(); return executor; } }
1.查看JVM情况
Linux使用jstat命令查看jvm的GC情况
jps 查询Jvm进程号
查询Jvm jstat -gc 21738 5000
发现Jvm好像没有出现频繁GC,GC处理异常的情况,而且Jvm启动也配置了:+HeapDumpOnOutOfMemoryError;但是没有看到内存溢出的Dump文件;排除 Jvm异常的情况
2.查看线程栈分析
jps 查询Jvm进程号
jstack -l 22741 查询线程栈信息
"MyJobExecutor-2" #25 prio=5 os_prio=31 tid=0x00007fc7f7374800 nid=0xa203 waiting on condition [0x0000700004def000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000742196c58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:380) at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69) at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:246) - locked <0x00000007969b6910> (a org.apache.http.pool.AbstractConnPool$2) at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193) at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:303) at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) at com.xxx.util.HttpUtil.doGet(HttpUtil.java:97) at com.xxx.util.HttpUtil.doGet(HttpUtil.java:70) 省略 at at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:746) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) at org.springframework.transaction.interceptor.TransactionInterceptor$$Lambda$392/226041624.proceedWithInvocation(Unknown Source) at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:294) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185) at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:688) at com.bfc.service.impl.LotteryHbkuai3MainServiceImpl$$EnhancerBySpringCGLIB$$4d3de61.publishPreByJob(<generated>) at com.bfc.job.LotterySchedule.publishByJob(LotterySchedule.java:45) at com.bfc.job.LotterySchedule$$FastClassBySpringCGLIB$$75ac0373.invoke(<generated>) at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:746) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:115) at org.springframework.aop.interceptor.AsyncExecutionInterceptor$$Lambda$391/524684837.call(Unknown Source) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - <0x0000000741db91a0> (a java.util.concurrent.ThreadPoolExecutor$Worker) "MyJobExecutor-1" #24 prio=5 os_prio=31 tid=0x00007fc7f5ff7800 nid=0xa403 waiting on condition [0x0000700004cec000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000742196c58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:380) at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69) at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:246) - locked <0x00000007968318d8> (a org.apache.http.pool.AbstractConnPool$2) at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193) at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:303) at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) at com.xxx.util.HttpUtil.doGet(HttpUtil.java:97) at com.xxx.util.HttpUtil.doGet(HttpUtil.java:70) at 省略 at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:746) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) at org.springframework.transaction.interceptor.TransactionInterceptor$$Lambda$392/226041624.proceedWithInvocation(Unknown Source) at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:294) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:98) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185) at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:688) at com.bfc.service.impl.LotteryHbkuai3MainServiceImpl$$EnhancerBySpringCGLIB$$4d3de61.publishPreByJob(<generated>) at com.bfc.job.LotterySchedule.publishByJob(LotterySchedule.java:45) at com.bfc.job.LotterySchedule$$FastClassBySpringCGLIB$$75ac0373.invoke(<generated>) at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:746) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:115) at org.springframework.aop.interceptor.AsyncExecutionInterceptor$$Lambda$391/524684837.call(Unknown Source) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - <0x0000000741da4a28> (a java.util.concurrent.ThreadPoolExecutor$Worker)
分析问题
通过栈信息可以发现,基本上所有的线程都被阻塞了,都在wait;
重点看 MyJobExecutor-" 开头的线程都在wait同一个lock,并且代码发生的地方是 HttpUtil.doGet 方法;
这么看来好像是HttpUtil.doGet 发生了阻塞;然后分析问题代码,结果发现这个代码是这样的
private static PoolingHttpClientConnectionManager connMgr; private static RequestConfig requestConfig; private static final int MAX_TIMEOUT = 7000; private static CloseableHttpClient httpClient ; static { // 设置连接池 connMgr = new PoolingHttpClientConnectionManager(); // 设置连接池大小 connMgr.setMaxTotal(8); connMgr.setDefaultMaxPerRoute(connMgr.getMaxTotal()); RequestConfig.Builder configBuilder = RequestConfig.custom(); // 设置连接超时 configBuilder.setConnectTimeout(MAX_TIMEOUT); // 设置读取超时 configBuilder.setSocketTimeout(MAX_TIMEOUT); // 设置从连接池获取连接实例的超时 configBuilder.setConnectionRequestTimeout(MAX_TIMEOUT); // 在提交请求之前 测试连接是否可用 configBuilder.setStaleConnectionCheckEnabled(true); requestConfig = configBuilder.build(); httpClient = HttpClients.custom().setConnectionManager(connMgr).build(); } /** * 发送 GET 请求(HTTP),不带输入数据 * @param url * @return */ public static String doGet(String url) { return doGet(url, new HashMap<String, Object>()); } /** * 发送 GET 请求(HTTP),K-V形式 * @param url * @param params * @return */ public static String doGet(String url, Map<String, Object> params) { String apiUrl = url; StringBuffer param = new StringBuffer(); int i = 0; for (String key : params.keySet()) { if (i == 0) { param.append("?"); }else { param.append("&"); } param.append(key).append("=").append(params.get(key)); i++; } apiUrl += param; String result = null; // HttpClient httpclient = new DefaultHttpClient(); try { HttpGet httpPost = new HttpGet(apiUrl); HttpResponse response = httpClient.execute(httpPost); HttpEntity entity = response.getEntity(); if (entity != null) { InputStream instream = entity.getContent(); result = IOUtils.toString(instream, "UTF-8"); } } catch (IOException e) { e.printStackTrace(); } return result; } }
注意看这个doGet(); 流没有关闭…
因为流没有关闭,这个HttpClient连接池的连接一直没有回收回去,后面的线程又一直在调用这个doGet方法;
但是又获取不到连接,所以就一直阻塞在哪里,直到连接超时HttpClient内部三个超时时间的区别
然后myAsync 这个线程池的线程也是有限的, Schedule每秒都在执行,很快线程不够用了,然后就阻塞了testDoGet这个定时任务了;
为了确认是 流未关闭的问题 我们可以看看服务器的TCP连接
netstat -anp | grep 进程号
可以看到有很多的80连接端口处于CLOSE_WAIT状态的; CLOSE_WAIT状态的原因与解决方法
问题的原因找到了,那么解决的方法就很简单了,把HttpClient的连接的流关闭掉就行了
HttpEntity entity = response.getEntity(); httpStr = EntityUtils.toString(entity, "UTF-8");
EntityUtils.toString方法里面有关闭流的;这样改了就没有问题了;
好像问题是解决了 但是怎么觉得哪里不对呢??
定时任务 public void testDoGet()是用的线程池异步执行
定时任务public void testPrint()是单线程执行的;
讲道理就算testDoGet阻塞了,也不应该把testPrint阻塞了吧?
再看jstack -l 信息
"pool-3-thread-1" #23 prio=5 os_prio=31 tid=0x00007ff371c86800 nid=0x5803 waiting on condition [0x0000700004776000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000007421cb160> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:380) at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69) at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:246) - locked <0x0000000797dbc900> (a org.apache.http.pool.AbstractConnPool$2) at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193) at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:303) at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279) at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191) at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) at com.xxx.util.HttpUtil.doGet(HttpUtil.java:97) at com.xxx.util.HttpUtil.doGet(HttpUtil.java:70) at com.bfc.job.LotterySchedule$$FastClassBySpringCGLIB$$75ac0373.invoke(<generated>) at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:746) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:115) at org.springframework.aop.interceptor.AsyncExecutionInterceptor$$Lambda$401/476029857.call(Unknown Source) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:2022) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134) at org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor.submit(ThreadPoolTaskExecutor.java:341) at org.springframework.aop.interceptor.AsyncExecutionAspectSupport.doSubmit(AsyncExecutionAspectSupport.java:284) at org.springframework.aop.interceptor.AsyncExecutionInterceptor.invoke(AsyncExecutionInterceptor.java:129) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185) at org.springframework.aop.framework.CglibAopProxysun.reflect.GeneratedMethodAccessor96.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
初步分析这个信息,说明 pool-3-thread-1(就是Schedule的主线程) 也去执行了我们的testDoGet;
myAsync线程池不够用了,用了主线程来执行这个定时任务,然后又阻塞了,所以就阻塞了所有的其他定时任务
什么鬼?为什么会这样,然后翻过去看 myAsync的线程池代码
@Bean public Executor myAsync() { ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor(); executor.setCorePoolSize(corePoolSize); executor.setMaxPoolSize(maxPoolSize); executor.setQueueCapacity(queueCapacity); executor.setThreadNamePrefix("MyJobExecutor-"); // rejection-policy:当pool已经达到max size的时候,如何处理新任务 // CALLER_RUNS:不在新线程中执行任务,而是有调用者所在的线程来执行 executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy()); executor.initialize(); return executor; }
用的丢弃策略是 CallerRunsPolicy
:它直接在 execute 方法的调用线程中运行被拒绝的任务;如果执行程序已关闭,则会丢弃该任务
就是说线程池的队列也满了,就会触发丢弃策略,CallerRunsPolicy 是用调用线程池的那个线程来执行;