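OK, back to the topic. First, the diagram:
A Java thread is always in one of six states: NEW, RUNNABLE, BLOCKED, WAITING, TIMED_WAITING, or TERMINATED. This post focuses on BLOCKED and TIMED_WAITING.
BLOCKED: a thread is in this state in one of two situations: it is waiting for monitor (intrinsic or external) entry, or it is waiting to re-enter a synchronized block after returning from Object.wait(). Before going further, it helps to understand the two queues in the Java thread model, as shown in the figure: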
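Each monitor can be owned by only one thread at any given moment. That thread is the "Active Thread"; all the others are "Waiting Threads", queued in either the "Entry Set" or the "Wait Set". A thread waiting in the Entry Set is reported in a thread dump as "waiting for monitor entry", and a thread waiting in the Wait Set is reported as "in Object.wait()". Threads piling up behind a lock is something we run into often during tuning. Note that the BLOCKED state itself is specific to synchronized: a thread contending for a ReentrantLock or ReentrantReadWriteLock parks instead and is reported as WAITING or TIMED_WAITING (parking), although careless use of those classes causes the same kind of contention. Either way the approach is simple: find the address of the lock being waited on and analyze whether thread starvation has occurred.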
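The two queues can be observed directly from code. The following is a minimal sketch, not taken from the original post: the class name MonitorStatesDemo and its helper methods are made up for illustration. One thread calls wait() on a monitor (Wait Set) while another tries to enter a synchronized block on the same monitor that the main thread is holding (Entry Set).

```java
class MonitorStatesDemo {

    /** Polls until the thread reaches the wanted state or roughly 5 seconds pass. */
    static Thread.State awaitState(Thread t, Thread.State wanted) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (t.getState() != wanted && System.nanoTime() < deadline) {
            Thread.yield();
        }
        return t.getState();
    }

    /** Returns { state seen in the Entry Set, state seen in the Wait Set }. */
    static Thread.State[] observe() {
        final Object monitor = new Object();

        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                try {
                    monitor.wait();      // releases the monitor and joins the Wait Set
                } catch (InterruptedException ignored) {
                    // interrupted by the cleanup step below
                }
            }
        }, "waiter");
        Thread contender = new Thread(() -> {
            synchronized (monitor) {
                // entered only after the main thread releases the monitor
            }
        }, "contender");
        waiter.setDaemon(true);
        contender.setDaemon(true);

        waiter.start();
        Thread.State inWaitSet = awaitState(waiter, Thread.State.WAITING);

        Thread.State inEntrySet;
        synchronized (monitor) {         // the main thread is now the "Active Thread"
            contender.start();
            inEntrySet = awaitState(contender, Thread.State.BLOCKED);
        }
        waiter.interrupt();              // let the waiter finish
        return new Thread.State[] { inEntrySet, inWaitSet };
    }

    public static void main(String[] args) {
        Thread.State[] s = observe();
        System.out.println("Entry Set: " + s[0] + ", Wait Set: " + s[1]);
    }
}
```

The thread in the Entry Set should be reported as BLOCKED and the one in the Wait Set as WAITING, which matches the "waiting for monitor entry" and "in Object.wait()" labels in a thread dump.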
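As for TIMED_WAITING, the official documentation explains it well: a thread enters this state when it calls one of the following methods with a timeout:
    Thread.sleep
    Object.wait with timeout
    Thread.join with timeout
    LockSupport.parkNanos
    LockSupport.parkUntil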
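To make the list concrete, here is a small sketch (the class TimedWaitingDemo and its methods are hypothetical names, not from the original post) that starts one thread per call and records the state reported by Thread.getState() while the call is in progress. It covers three of the five methods; the other two should behave the same way.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

class TimedWaitingDemo {

    /** Runs the task on a daemon thread, samples its state (up to about 5 s), then interrupts it. */
    static Thread.State stateWhileRunning(Runnable task) {
        Thread t = new Thread(task);
        t.setDaemon(true);
        t.start();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        Thread.State seen = t.getState();
        while (seen != Thread.State.TIMED_WAITING && System.nanoTime() < deadline) {
            Thread.yield();
            seen = t.getState();
        }
        t.interrupt();   // all three calls below return early on interrupt
        return seen;
    }

    static Map<String, Thread.State> observe() {
        Map<String, Thread.State> result = new LinkedHashMap<>();
        result.put("Thread.sleep", stateWhileRunning(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // expected: the sampler interrupts us
            }
        }));
        final Object lock = new Object();
        result.put("Object.wait(timeout)", stateWhileRunning(() -> {
            synchronized (lock) {
                try {
                    lock.wait(60_000);
                } catch (InterruptedException ignored) {
                    // expected: the sampler interrupts us
                }
            }
        }));
        result.put("LockSupport.parkNanos", stateWhileRunning(
                () -> LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(60))));
        return result;
    }

    public static void main(String[] args) {
        observe().forEach((call, state) -> System.out.println(call + " -> " + state));
    }
}
```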
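LockSupport deserves special attention. It provides the basic thread-blocking primitives for building locks and other synchronization classes. It improves on the deprecated Thread.suspend() and Thread.resume(), and it is also an alternative to busy-waiting that avoids excessive spinning (interested readers can consult reference 5).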
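A minimal sketch of the park/unpark pattern follows; ParkUnparkDemo and handOff are illustrative names only. Unlike suspend/resume, unpark grants a permit, so an unpark that arrives before the matching park is not lost. Because park() may also return for no reason, the condition is re-checked in a loop.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class ParkUnparkDemo {

    /** Parks a worker until a flag is set; returns true if the worker saw the flag and finished. */
    static boolean handOff() {
        AtomicBoolean ready = new AtomicBoolean(false);
        AtomicBoolean done = new AtomicBoolean(false);

        Thread worker = new Thread(() -> {
            // park() may return spuriously, so always re-check the condition
            while (!ready.get()) {
                LockSupport.park();
            }
            done.set(true);
        }, "worker");
        worker.setDaemon(true);
        worker.start();

        ready.set(true);               // publish the condition first
        LockSupport.unpark(worker);    // then hand the worker its permit

        try {
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("worker finished: " + handOff());
    }
}
```

Whichever thread runs first, the worker terminates: it either sees the flag before parking, or it parks and is released by the permit.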
OK, with the key thread states introduced, let's look at thread stacks through a few concrete cases:
Case 1: the Acceptor in NIO
    "qtp589745448-36 Acceptor0 SelectChannelConnector@0.0.0.0:8161" prio=10 tid=0x00007f02f8eea800 nid=0x18ee runnable [0x00007f02e70b3000]
       java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241)
        - locked <0x00000000ec8ffde8> (a java.lang.Object)
        at org.eclipse.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:109)
        at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:938)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:724)

       Locked ownable synchronizers:
        - None
Here is how the source code implements it:
    public void accept(int acceptorID) throws IOException
    {
        ServerSocketChannel server;
        synchronized(this)
        {
            server = _acceptChannel;
        }

        if (server!=null && server.isOpen() && _manager.isStarted())
        {
            SocketChannel channel = server.accept();   // SelectChannelConnector.java:109 in the stack trace above
            channel.configureBlocking(false);
            Socket socket = channel.socket();
            configure(socket);
            _manager.register(channel);
        }
    }
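One point about thread stacks worth emphasizing: nid is the native LWP ID, that is, the ID of the native lightweight process (in other words, the thread). It is printed in hexadecimal.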
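Since nid is hexadecimal while OS tools usually list thread IDs in decimal, a small conversion helper is handy when matching a dump entry against a busy native thread. The sketch below is an illustration only; NidConverter is a made-up name, and the header line is copied from the Acceptor dump in Case 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NidConverter {

    private static final Pattern NID = Pattern.compile("nid=0x([0-9a-fA-F]+)");

    /** Extracts nid from a thread-dump header line and returns it in decimal, or -1 if absent. */
    static long nidToDecimal(String headerLine) {
        Matcher m = NID.matcher(headerLine);
        return m.find() ? Long.parseLong(m.group(1), 16) : -1L;
    }

    public static void main(String[] args) {
        String header = "\"qtp589745448-36 Acceptor0 SelectChannelConnector@0.0.0.0:8161\" "
                + "prio=10 tid=0x00007f02f8eea800 nid=0x18ee runnable [0x00007f02e70b3000]";
        System.out.println(nidToDecimal(header));   // 0x18ee is 6382 in decimal
    }
}
```

On Linux, the decimal value can then be compared with the per-thread IDs reported by tools such as top -H.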
Case 2: the Selector in NIO
    "qtp589745448-35 Selector0" prio=10 tid=0x00007f02f8ee9800 nid=0x18ed runnable [0x00007f02e71b4000]
       java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x00000000ec9006f0> (a sun.nio.ch.Util$2)
        - locked <0x00000000ec9006e0> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000ec9004c0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at org.eclipse.jetty.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:569)
        at org.eclipse.jetty.io.nio.SelectorManager$1.run(SelectorManager.java:290)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:724)

       Locked ownable synchronizers:
        - None
Code snippet:
    // If we should wait with a select
    if (wait>0)
    {
        long before=now;
        selector.select(wait);   // SelectorManager.java:569 in the stack trace above
        now = System.currentTimeMillis();
        _timeout.setNow(now);

        // If we are monitoring for busy selector
        // and this select did not wait more than 1ms
        if (__MONITOR_PERIOD>0 && now-before <=1)
        {
            // count this as a busy select and if there have been too many this monitor cycle
            if (++_busySelects>__MAX_SELECTS)
            {
                // Start injecting pauses
                _pausing=true;

                // if this is the first pause
                if (!_paused)
                {
                    // Log and dump some status
                    _paused=true;
                    LOG.warn("Selector {} is too busy, pausing!",this);
                }
            }
        }
    }
Case 3: the Handler for the MQTT protocol in ActiveMQ
    "ActiveMQ Transport Server Thread Handler: mqtt://0.0.0.0:1883?maximumConnections=1000&wireFormat.maxFrameSize=104857600" daemon prio=10 tid=0x00007f02f8ba6000 nid=0x18dc waiting on condition [0x00007f02ec824000]
       java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000000faad0458> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
        at org.apache.activemq.transport.tcp.TcpTransportServer$1.run(TcpTransportServer.java:373)
        at java.lang.Thread.run(Thread.java:724)

       Locked ownable synchronizers:
        - None
Code snippet:
    @Override
    protected void doStart() throws Exception {
        if (useQueueForAccept) {
            Runnable run = new Runnable() {
                @Override
                public void run() {
                    try {
                        while (!isStopped() && !isStopping()) {
                            // timed poll: this is the call that shows up as TIMED_WAITING (parking)
                            Socket sock = socketQueue.poll(1, TimeUnit.SECONDS);
                            if (sock != null) {
                                handleSocket(sock);
                            }
                        }
                    } catch (InterruptedException e) {
                        LOG.info("socketQueue interuppted - stopping");
                        if (!isStopping()) {
                            onAcceptError(e);
                        }
                    }
                }
            };
            socketHandlerThread = new Thread(null, run, "ActiveMQ Transport Server Thread Handler: " + toString(), getStackSize());
            socketHandlerThread.setDaemon(true);
            socketHandlerThread.setPriority(ThreadPriorities.BROKER_MANAGEMENT - 1);
            socketHandlerThread.start();
        }
        super.doStart();
    }
Case 5: simulating bank transfers and deposits
    "withdraw" prio=10 tid=0x00007f3428110800 nid=0x2b6b waiting for monitor entry [0x00007f34155bb000]
       java.lang.Thread.State: BLOCKED (on object monitor)
        at com.von.thread.research.DeadThread.depositMoney(DeadThread.java:13)
        - waiting to lock <0x00000000d7fae540> (a java.lang.Object)
        - locked <0x00000000d7fae530> (a java.lang.Object)
        at com.von.thread.research.DeadThread.run(DeadThread.java:28)
        at java.lang.Thread.run(Thread.java:724)

       Locked ownable synchronizers:
        - None

    "deposit" prio=10 tid=0x00007f342810f000 nid=0x2b6a waiting for monitor entry [0x00007f34156bc000]
       java.lang.Thread.State: BLOCKED (on object monitor)
        at com.von.thread.research.DeadThread.withdrawMoney(DeadThread.java:21)
        - waiting to lock <0x00000000d7fae530> (a java.lang.Object)
        - locked <0x00000000d7fae540> (a java.lang.Object)
        at com.von.thread.research.DeadThread.run(DeadThread.java:29)
        at java.lang.Thread.run(Thread.java:724)

       Locked ownable synchronizers:
        - None

    Found one Java-level deadlock:
    =============================
    "withdraw":
      waiting to lock monitor 0x00007f3400003620 (object 0x00000000d7fae540, a java.lang.Object),
      which is held by "deposit"
    "deposit":
      waiting to lock monitor 0x00007f3400004b20 (object 0x00000000d7fae530, a java.lang.Object),
      which is held by "withdraw"

    Java stack information for the threads listed above:
    ===================================================
    "withdraw":
        at com.von.thread.research.DeadThread.depositMoney(DeadThread.java:13)
        - waiting to lock <0x00000000d7fae540> (a java.lang.Object)
        - locked <0x00000000d7fae530> (a java.lang.Object)
        at com.von.thread.research.DeadThread.run(DeadThread.java:28)
        at java.lang.Thread.run(Thread.java:724)
    "deposit":
        at com.von.thread.research.DeadThread.withdrawMoney(DeadThread.java:21)
        - waiting to lock <0x00000000d7fae530> (a java.lang.Object)
        - locked <0x00000000d7fae540> (a java.lang.Object)
        at com.von.thread.research.DeadThread.run(DeadThread.java:29)
        at java.lang.Thread.run(Thread.java:724)

    Found 1 deadlock.
This is a deadlock caused by acquiring locks in an inconsistent order: each thread holds one of the two monitors and waits for the other.
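The source of DeadThread is not shown in this post, so the following is not a patch for it. It is a minimal sketch of the usual remedy, a fixed global lock order, using a hypothetical Account class whose id decides which monitor is taken first. With the order fixed, two opposite transfers cannot each hold one lock while waiting for the other.

```java
class OrderedTransfer {

    /** Hypothetical account type; the id is used only to define the lock order. */
    static final class Account {
        final int id;
        long balance;

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    /** Always locks the account with the smaller id first, whatever the transfer direction. */
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    /** Runs opposite transfers concurrently and returns the final balances. */
    static long[] run() {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread withdraw = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) transfer(a, b, 1);
        }, "withdraw");
        Thread deposit = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) transfer(b, a, 1);
        }, "deposit");
        withdraw.start();
        deposit.start();
        try {
            withdraw.join();
            deposit.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] { a.balance, b.balance };
    }

    public static void main(String[] args) {
        long[] balances = run();
        System.out.println(balances[0] + " " + balances[1]);
    }
}
```

Ordering by id assumes ids are unique. When no natural order exists, another tie-breaking scheme is needed.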
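That's about it. To summarize, there are three kinds of situations to focus on during tuning (grep java.lang.Thread.State dump.bin | awk '{print $2$3$4$5}' | sort | uniq -c):
1. waiting for monitor entry: thread state BLOCKED. Possible problem: deadlock (lock-ordering deadlock, thread-starvation deadlock, ...)
2. waiting on condition: sleeping or TIMED_WAITING. Possible problem: IO bottleneck
3. Object.wait: WAITING or TIMED_WAITING. Be clear about the performance and limitations of wait and notifyAll; JCIP also recommends preferring the higher-level concurrency utilities in JUC (built on AQS) wherever possible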
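The same overview can be obtained from inside a running JVM. The sketch below (ThreadStateSummary is a made-up name) counts live threads by state, roughly what the grep pipeline does for a dump file, and asks ThreadMXBean whether any threads are deadlocked. It assumes a JVM on which findDeadlockedThreads() is supported.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

class ThreadStateSummary {

    /** Counts the live threads of this JVM by Thread.State. */
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Number of threads currently involved in a deadlock (monitors or ownable synchronizers). */
    static int deadlockedThreadCount() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();   // null when no deadlock is found
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(n + " " + state));
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```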
References:
1. http://architects.dzone.com/articles/how-analyze-java-thread-dumps
2. http://stackoverflow.com/questions/7698861/simple-java-example-runs-with-14-threads-why
3. http://www.longene.org/forum/viewtopic.php?f=5&t=94&p=399#p399
4. http://www.slideshare.net/Byungwook/analysis-bottleneck-in-j2ee-application
5. http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html
6. Java Concurrency in Practice
7. http://stackoverflow.com/questions/37026/java-notify-vs-notifyall-all-over-again