问题现象:
线上报HSF线程池满,CPU100%。
排查过程:
1 线程dump下来,分析等待线程
Hsf等待线程1:大约400个线程等待获取连接池
at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@22630f9b at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:233) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at com.alibaba.druid.pool.DruidDataSource.pollLast(DruidDataSource.java:1609) at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1172) at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1045) at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4544) at com.alibaba.intl.commons.monitor.druid.ConnectionMonitorDruidFilter.dataSource_getConnection(ConnectionMonitorDruidFilter.java:60) at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4540) at com.alibaba.druid.filter.stat.StatFilter.dataSource_getConnection(StatFilter.java:662) at com.alibaba.druid.filter.FilterChainImpl.dataSource_connect(FilterChainImpl.java:4540) at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1023) at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1015) at com.taobao.tddl.atom.jdbc.TDataSourceWrapper.getConnectionByTargetDataSource(TDataSourceWrapper.java:349) at com.taobao.tddl.atom.jdbc.TDataSourceWrapper.getConnectionWrapper(TDataSourceWrapper.java:319) at com.taobao.tddl.atom.jdbc.TDataSourceWrapper.getConnection0(TDataSourceWrapper.java:290) at com.taobao.tddl.atom.jdbc.TDataSourceWrapper.getConnection(TDataSourceWrapper.java:264) at com.taobao.tddl.atom.jdbc.TDataSourceWrapper.getConnection(TDataSourceWrapper.java:231) at com.taobao.tddl.atom.AbstractTAtomDataSource.getConnection(AbstractTAtomDataSource.java:32) at com.taobao.tddl.group.jdbc.DataSourceWrapper.getConnection(DataSourceWrapper.java:133) at com.taobao.tddl.group.jdbc.TGroupConnection.createNewConnection(TGroupConnection.java:210) at com.taobao.tddl.group.jdbc.TGroupStatement$4.tryOnDataSource(TGroupStatement.java:640) at com.taobao.tddl.group.jdbc.TGroupStatement$4.tryOnDataSource(TGroupStatement.java:632) at com.taobao.tddl.group.dbselector.AbstractDBSelector.tryOnDataSourceHolderWithIndex(AbstractDBSelector.java:279) at com.taobao.tddl.group.dbselector.AbstractDBSelector.tryExecute(AbstractDBSelector.java:434) at com.taobao.tddl.group.dbselector.AbstractDBSelector.tryExecute(AbstractDBSelector.java:449) at com.taobao.tddl.group.jdbc.TGroupStatement.executeQuery(TGroupStatement.java:505) at com.taobao.tddl.group.jdbc.TGroupStatement.executeInternal(TGroupStatement.java:144) at com.taobao.tddl.group.jdbc.TGroupStatement.execute(TGroupStatement.java:103) at com.taobao.tddl.group.jdbc.TGroupPreparedStatement.execute(TGroupPreparedStatement.java:92) at com.taobao.tddl.repo.mysql.spi.My_JdbcHandler.executeQuery(My_JdbcHandler.java:695) at com.taobao.tddl.repo.mysql.spi.My_Cursor.init(My_Cursor.java:106) at com.taobao.tddl.repo.mysql.handler.QueryMyHandler.handle(QueryMyHandler.java:89) at com.taobao.tddl.repo.mysql.handler.QueryMyHandlerWithMemcached.handleBySql(QueryMyHandlerWithMemcached.java:169) at com.taobao.tddl.repo.mysql.handler.QueryMyHandlerWithMemcached.handle(QueryMyHandlerWithMemcached.java:54) at com.taobao.tddl.executor.AbstractGroupExecutor.executeInner(AbstractGroupExecutor.java:47) at com.taobao.tddl.executor.AbstractGroupExecutor.execByExecPlanNode(AbstractGroupExecutor.java:36) at com.taobao.tddl.executor.TopologyExecutor.execByExecPlanNode(TopologyExecutor.java:74) at com.taobao.tddl.executor.MatrixExecutor.execByExecPlanNodeByOne(MatrixExecutor.java:800) at com.taobao.tddl.executor.MatrixExecutor.execByExecPlanNode(MatrixExecutor.java:789) at com.taobao.tddl.executor.MatrixExecutor.execute(MatrixExecutor.java:145) at com.taobao.tddl.matrix.jdbc.TConnection.executeSQL(TConnection.java:275)
Hsf等待线程2: 大约80个线程在等待获取createSqlStat的并发锁。
"HSFBizProcessor-DEFAULT-8-thread-719" Id=1604 WAITING on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@7cc0ca97 at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@7cc0ca97 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943) at com.alibaba.druid.stat.JdbcDataSourceStat.createSqlStat(JdbcDataSourceStat.java:410) at com.alibaba.druid.filter.stat.StatFilter.createSqlStat(StatFilter.java:631) at com.alibaba.druid.filter.stat.StatFilter.statementPrepareAfter(StatFilter.java:305) at com.alibaba.druid.filter.FilterEventAdapter.connection_prepareStatement(FilterEventAdapter.java:146) at com.alibaba.druid.filter.FilterChainImpl.connection_prepareStatement(FilterChainImpl.java:471) at com.alibaba.druid.proxy.jdbc.ConnectionProxyImpl.prepareStatement(ConnectionProxyImpl.java:375) at com.alibaba.druid.pool.DruidPooledConnection.prepareStatement(DruidPooledConnection.java:369) at com.taobao.tddl.atom.jdbc.TConnectionWrapper.prepareStatement(TConnectionWrapper.java:304) at com.taobao.tddl.group.jdbc.TGroupPreparedStatement.createPreparedStatementInternal(TGroupPreparedStatement.java:68) at com.taobao.tddl.group.jdbc.TGroupPreparedStatement.executeQueryOnConnection(TGroupPreparedStatement.java:108) at com.taobao.tddl.group.jdbc.TGroupStatement$4.tryOnDataSource(TGroupStatement.java:650) at com.taobao.tddl.group.jdbc.TGroupStatement$4.tryOnDataSource(TGroupStatement.java:632) at com.taobao.tddl.group.dbselector.AbstractDBSelector.tryOnDataSourceHolderWithIndex(AbstractDBSelector.java:279) at com.taobao.tddl.group.dbselector.AbstractDBSelector.tryExecute(AbstractDBSelector.java:434) at com.taobao.tddl.group.dbselector.AbstractDBSelector.tryExecute(AbstractDBSelector.java:449) at com.taobao.tddl.group.jdbc.TGroupStatement.executeQuery(TGroupStatement.java:505) at com.taobao.tddl.group.jdbc.TGroupStatement.executeInternal(TGroupStatement.java:144) at com.taobao.tddl.group.jdbc.TGroupStatement.execute(TGroupStatement.java:103) at com.taobao.tddl.group.jdbc.TGroupPreparedStatement.execute(TGroupPreparedStatement.java:92) at com.taobao.tddl.repo.mysql.spi.My_JdbcHandler.executeQuery(My_JdbcHandler.java:695)
代码分析
根据以上线程堆栈信息找到代码位置:
tddl使用druid连接池,下面这段是核心代码,总体流程分为四步
获取DB连接池->创建缓存sql->执行sql->释放DB连接
protected DataSourceTryer<ResultSet> executeQueryTryer = new AbstractDataSourceTryer<ResultSet>() { public ResultSet tryOnDataSource(TGroupStatement stmt, DataSourceWrapper dsw, Object... args) throws SQLException { String sql = (String)args[0]; Connection conn = TGroupStatement.this.tGroupConnection.createNewConnection(dsw, true); int lastSocketTimeout = -1; if (TGroupStatement.this.tGroupConnection.getNetworkTimeout() >= 0) { lastSocketTimeout = NetworkTimeoutUtils.getNetworkTimeout(conn); NetworkTimeoutUtils.setNetworkTimeout(conn, TGroupConnection.executor, TGroupStatement.this.tGroupConnection.getNetworkTimeout()); } ResultSet var7; try { var7 = TGroupStatement.this.executeQueryOnConnection(conn, sql); } finally { if (lastSocketTimeout >= 0 && TGroupStatement.this.tGroupConnection.getNetworkTimeout() >= 0) { try { NetworkTimeoutUtils.setNetworkTimeout(conn, TGroupConnection.executor, lastSocketTimeout); } catch (Throwable var14) { ; } } } return var7;
观察DB没有发现慢sql,sql执行时间在1ms以下,但是tddl日志里报了很多慢sql,并且连接池用完。
继续监控线上,发现很多sql执行时间耗在创建sql这一步
耗时位置和上午hsf线程卡住位置一致,这个地方druid做了一个并发缓存,并发请求会在此排队等待,并发高的情况下会导致耗时增加,从而导致释放连接的速度变慢
建议优化
tddl把sql缓存的逻辑放在获取连接之后会拖慢DB连接池的释放速度,可以考虑使用以下流程, sql创建缓存流程不占用DB连接池的资源。
创建缓存sql->获取DB连接池->执行sql->释放DB连接