引言
程序运行卡的时候,我们经常需要定位到底程序在执行什么代码最消耗CPU,这里有个我经常用的小技巧,分享一下。
过程如下:
首先,Linux下面top命令可以直接找出最消耗CPU的进程
top
可以看到如下显示,我们看到前面有三个都是100% CPU占用的
top - 19:53:23 up 118 days, 6:09, 4 users, load average: 3.00, 2.68, 1.68 Tasks: 446 total, 1 running, 438 sleeping, 7 stopped, 0 zombie Cpu(s): 8.6%us, 4.2%sy, 0.0%ni, 87.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 132103700k total, 76585104k used, 55518596k free, 424052k buffers Swap: 16383996k total, 0k used, 16383996k free, 36290728k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 156888 hdfs 20 0 34.9g 544m 11m S 100.8 0.4 9:44.08 java 156928 hdfs 20 0 34.9g 545m 11m S 100.8 0.4 9:42.84 java 156769 hdfs 20 0 34.9g 544m 11m S 100.2 0.4 9:53.26 java 153980 hdfs 20 0 64.0g 24g 55m S 1.7 19.6 5879:56 presto-server 105 root 20 0 0 0 0 S 0.3 0.0 45:26.08 events/6 106 root 20 0 0 0 0 S 0.3 0.0 4:41.29 events/7 代码片
接下来,我们一个个分析,先找第一个最消耗CPU的,我们使用命令
top -Hp 156888
这个命令可以展示进程内最消耗CPU的线程,可以得到如下结果
top - 20:00:25 up 118 days, 6:16, 2 users, load average: 3.00, 2.91, 2.14 Tasks: 39 total, 1 running, 38 sleeping, 0 stopped, 0 zombie Cpu(s): 9.3%us, 3.5%sy, 0.0%ni, 87.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 132103700k total, 76578608k used, 55525092k free, 424052k buffers Swap: 16383996k total, 0k used, 16383996k free, 36290948k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 156926 hdfs 20 0 34.9g 544m 11m R 99.8 0.4 16:43.39 java 156901 hdfs 20 0 34.9g 544m 11m S 0.3 0.4 0:00.16 java 156888 hdfs 20 0 34.9g 544m 11m S 0.0 0.4 0:00.00 java 156889 hdfs 20 0 34.9g 544m 11m S 0.0 0.4 0:00.12 java 156890 hdfs 20 0 34.9g 544m 11m S 0.0 0.4 0:00.16 java 156891 hdfs 20 0 34.9g 544m 11m S 0.0 0.4 0:00.17 java 156892 hdfs 20 0 34.9g 544m 11m S 0.0 0.4 0:00.18 java 156893 hdfs 20 0 34.9g 544m 11m S 0.0 0.4 0:00.17 java
排名第一的便是我们这个进程下面的线程号了。
接下来,我们使用一个java中的jstack命令去定位哪一行:
jstack 156888
这样子直接获取信息内容是这样子的:
"Service Thread" #17 daemon prio=9 os_prio=0 tid=0x00002b47080f6800 nid=0x264fc runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE "C1 CompilerThread11" #16 daemon prio=9 os_prio=0 tid=0x00002b47080f3800 nid=0x264fb waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "C1 CompilerThread10" #15 daemon prio=9 os_prio=0 tid=0x00002b47080f1000 nid=0x264fa waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "C1 CompilerThread9" #14 daemon prio=9 os_prio=0 tid=0x00002b47080ef000 nid=0x264f9 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "C1 CompilerThread8" #13 daemon prio=9 os_prio=0 tid=0x00002b47080ed000 nid=0x264f8 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE
我们大致读得出这个线程号,但是我们需要定位最消耗cpu的那个线程,我们看到nid项其实就是线程号,不过是16进制输出的,所以我们也要把我们的线程号转换成16进制,可以直接用计算器,没问题,我这里用linux下面打印命令:
printf "%x\n" 156926
结果为:
264fe
这个时候我们可以用这个数字结合我们的jstack命令去查询了:
jstack 156888 |grep 264fe
我们这样子可以搜索到这个线程,但是只有一行结果
"thread1" #18 prio=5 os_prio=0 tid=0x00002b4708167000 nid=0x264fe runnable [0x00002b4723545000]
我们可以在grep前后增加要显示的行数,可以看到上下文的内容,完整命令如下:
jstack 156888 |grep -A 20 -B 10 264fe
我们可以输出我们要的结果
"thread1" #18 prio=5 os_prio=0 tid=0x00002b4708167000 nid=0x264fe runnable [0x00002b4723545000] java.lang.Thread.State: RUNNABLE at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) - locked <0x0000000080208ea8> (a java.io.BufferedOutputStream) at java.io.PrintStream.write(PrintStream.java:482) - locked <0x00000000802048c0> (a java.io.PrintStream) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) - locked <0x00000000802048a8> (a java.io.OutputStreamWriter) at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) at java.io.PrintStream.write(PrintStream.java:527) - eliminated <0x00000000802048c0> (a java.io.PrintStream) at java.io.PrintStream.print(PrintStream.java:669) at java.io.PrintStream.println(PrintStream.java:806) - locked <0x00000000802048c0> (a java.io.PrintStream) at Demo1_16.lambda$main$0(Demo1_16.java:6) at Demo1_16$$Lambda$1/834600351.run(Unknown Source)
从内容上面看,我们尅看到是在第6行代码上面有个打印,我把完整代码贴出来:
public class Demo1_16 { public static void main(String[] args) { new Thread(()->{ while (true){ System.out.println("hi!"); } },"thread1").start(); } }
可以看到是有个死循环导致的,至于为什么有三个占用最高的,那是因为我同时启动了三个这个程序。
我们把命令可以直接连起来:
jstack 156888 |grep `printf "%x" 264fe` -C6 --color
总结:
要定位最消耗的代码,我们的思路是缩小范围,顺序是进程->线程->代码,最后再看实际代码去解决问题