How to use perf

Summary: I did not see anything here about perf, a relatively new tool for profiling the kernel and user applications on Linux, so I decided to add this information.

First of all, this is a tutorial about Linux profiling with perf.

You can use perf if your Linux kernel is newer than 2.6.32, or oprofile if it is older. Neither tool requires you to instrument your program (as gprof does). However, to get correct call graphs in perf you need to build your program with -fno-omit-frame-pointer. For example: g++ -fno-omit-frame-pointer -O2 main.cpp.
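The kernel version rule above can be checked from a script. This is only a minimal sketch: the function names version_gt and pick_profiler are made up for illustration, the 2.6.32 cutoff is taken literally from the text, and it assumes a sort that supports -V (GNU coreutils).

```shell
#!/bin/sh
# Succeed when version $1 is strictly newer than version $2.
# Assumption: `sort -V` (version sort) is available.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Print the profiler suggested by the text for a kernel release string.
pick_profiler() {
    release=${1%%-*}    # drop suffixes such as "-91-generic"
    if version_gt "$release" "2.6.32"; then
        echo perf
    else
        echo oprofile
    fi
}

# Check the running kernel.
pick_profiler "$(uname -r)"
```

Adjust the cutoff in pick_profiler if your distribution ships perf for an older kernel.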

You can watch a live analysis of your application with perf top:

sudo perf top -p `pidof a.out` -K
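One detail worth keeping in mind: pidof prints several PIDs separated by spaces when more than one process matches, while the perf man pages describe -p as taking a comma-separated list. A small sketch of the conversion follows; join_pids is a hypothetical helper name, and older perf versions may accept only a single PID.

```shell
#!/bin/sh
# Join whitespace-separated PIDs (as printed by pidof) with commas.
join_pids() {
    echo "$*" | tr -s ' ' ','
}

# Usage sketch (not executed here): profile every process named a.out.
#   sudo perf top -p "$(join_pids $(pidof a.out))" -K

join_pids 1234 5678    # prints 1234,5678
```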

Or you can record performance data from a running application and analyze it afterwards:

1) To record performance data:

perf record -p `pidof a.out`

or to record for 10 seconds:

perf record -p `pidof a.out` sleep 10

or to record with call graphs:

perf record -g -p `pidof a.out`

2) To analyze the recorded data:

perf report --stdio

perf report --stdio --sort=dso -g none

perf report --stdio -g none

perf report --stdio -g

Or you can record performance data for an application by launching it under perf and waiting for it to exit, then analyze the data afterwards:

perf record ./a.out
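The record and report steps for a running process can be combined in a small wrapper. The sketch below uses only the perf options already shown in this post. The names run, profile_pid and DRY_RUN are hypothetical; with DRY_RUN=1 the commands are printed instead of executed, so you can inspect them first.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Record call graphs for PID $1 for $2 seconds (default 10) into file $3
# (default perf.data), then print a per-module summary.
profile_pid() {
    pid=$1
    seconds=${2:-10}
    data=${3:-perf.data}
    run perf record -g -p "$pid" -o "$data" sleep "$seconds" &&
        run perf report --stdio --sort=dso -g none -i "$data"
}

# Show the commands without running perf.
DRY_RUN=1
profile_pid 1234 10
```

Set DRY_RUN=0 (or leave it unset) to actually run the two commands.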

This is an example of profiling a test program

The test program is in the file main.cpp (listed at the bottom of this post).

I compile it in this way:

g++ -m64 -fno-omit-frame-pointer -g main.cpp -L.  -ltcmalloc_minimal -o my_test

I use libtcmalloc_minimal.so since it is compiled with -fno-omit-frame-pointer, while the libc malloc seems to be compiled without this option. Then I run my test program:

./my_test 100000000

Then I record performance data of a running process:

perf record -g  -p `pidof my_test` -o ./my_test.perf.data sleep 30

Then I analyze load per module:

perf report --stdio -g none --sort comm,dso -i ./my_test.perf.data

# Overhead  Command  Shared Object
# ........  .......  ............................
#
    70.06%  my_test  my_test

and so on ...
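When the report is long, it can help to keep only the rows above some overhead threshold. The following is a rough sketch, assuming the report was produced with --stdio and -g none so that each data row starts with a percentage. filter_overhead is a hypothetical helper, and the sample input consists of the two rows quoted in this post.

```shell
#!/bin/sh
# Print report rows whose overhead (first column) is at least $1 percent.
# Header lines starting with "#" and blank lines are skipped.
filter_overhead() {
    min_pct=$1
    awk -v min="$min_pct" '
        /^[ \t]*#/ || NF == 0 { next }
        {
            pct = $1
            sub(/%$/, "", pct)    # "70.06%" -> "70.06"
            if (pct + 0 >= min + 0) print
        }'
}

# Sample input: rows quoted in this post.
filter_overhead 1 <<'EOF'
# Overhead  Command  Shared Object
#
70.06%  my_test  my_test
0.16%  my_test  [kernel.kallsyms]             [k] _spin_lock
EOF
```

In practice you would pipe the report into it, for example: perf report --stdio -g none -i ./my_test.perf.data | filter_overhead 1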

Then call chains are analyzed:

perf report --stdio -g graph -i ./my_test.perf.data | c++filt

0.16%  my_test  [kernel.kallsyms]             [k] _spin_lock

and so on ...

So at this point you know where your program spends time.

And this is main.cpp for the test:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Ten increments plus two heap allocations per call (at j = 0 and j = 5).
time_t f1(time_t time_value)
{
    for (int j = 0; j < 10; ++j) {
        ++time_value;
        if (j % 5 == 0) {
            double *p = new double;
            delete p;
        }
    }
    return time_value;
}

// Forty increments, then one call to f1.
time_t f2(time_t time_value)
{
    for (int j = 0; j < 40; ++j) {
        ++time_value;
    }
    time_value = f1(time_value);
    return time_value;
}

// One simulated request: allocations and counting, then f1 and f2 ten times each.
time_t process_request(time_t time_value)
{
    for (int j = 0; j < 10; ++j) {
        int *p = new int;
        delete p;
        for (int m = 0; m < 10; ++m) {
            ++time_value;
        }
    }
    for (int i = 0; i < 10; ++i) {
        time_value = f1(time_value);
        time_value = f2(time_value);
    }
    return time_value;
}

int main(int argc, char* argv[])
{
    // The number of requests comes from the first argument (default 1).
    int number_loops = argc > 1 ? atoi(argv[1]) : 1;
    time_t time_value = time(0);
    printf("number loops %d\n", number_loops);
    // time_t is not guaranteed to be int, so cast and print with %ld.
    printf("time_value: %ld\n", (long)time_value);
    for (int i = 0; i < number_loops; ++i) {
        time_value = process_request(time_value);
    }
    printf("time_value: %ld\n", (long)time_value);
    return 0;
}

Original source

http://stackoverflow.com/questions/1777556/alternatives-to-gprof#comment3480484_1779343
