how to use perf

简介: Since I did't see here anything about perf which is a relatively new tool for profiling the kernel and user applications on Linux I decided to add this information.

Since I did't see here anything about perf which is a relatively new tool for profiling the kernel and user applications on Linux I decided to add this information.

First of all - this is a tutorial about Linux profiling with perf

You can use perf if your Linux Kernel is greater than 2.6.32 or oprofile if it is older. Both programs don't require from you to instrument your program (like gprof requires). However in order to get call graph correctly in perf you need to build you program with -fno-omit-frame-pointer. For example: g++ -fno-omit-frame-pointer -O2 main.cpp.

You can see "live" analysis of your application with perf top:

sudo perf top -p `pidof a.out` -K

Or you can record performance data of a running application and analyze them after that:

1) To record performance data:

perf record -p `pidof a.out`

or to record for 10 secs:

perf record -p `pidof a.out` sleep 10

or to record with call graph ()

perf record -g -p `pidof a.out`

2) To analyze the recorded data

perf report --stdio

perf report --stdio --sort=dso -g none

perf report --stdio -g none

perf report --stdio -g

Or you can record performace data of a application and analyze them after that just by launching the application in this way and waiting for it to exit:

perf record ./a.out

This is an example of profiling a test program

The test program is in file main.cpp (I will put main.cpp at the bottom of the message):

I compile it in this way:

g++ -m64 -fno-omit-frame-pointer -g main.cpp -L.  -ltcmalloc_minimal -o my_test

I use libmalloc_minimial.so since it is compiled with -fno-omit-frame-pointer while libc malloc seems to be compiled without this option. Then I run my test program

./my_test 100000000

Then I record performance data of a running process:

perf record -g  -p `pidof my_test` -o ./my_test.perf.data sleep 30

Then I analyze load per module:

perf report --stdio -g none --sort comm,dso -i ./my_test.perf.data

# Overhead  Command                 Shared Object

# ........  .......  ............................

#

70.06%  my_test  my_test

and so on ...

Then call chains are analyzed:

perf report --stdio -g graph -i ./my_test.perf.data | c++filt

0.16%  my_test  [kernel.kallsyms]             [k] _spin_lock

and so on ...

So at this point you know where your program spends time.

And this is main.cpp for the test:

#include <stdio.h>

#include <stdlib.h>

#include <time.h>

time_t f1(time_t time_value)

{

for (int j =0; j < 10; ++j) {

++time_value;

if (j%5 == 0) {

double *p = new double;

delete p;

}

}

return time_value;

}

time_t f2(time_t time_value)

{

for (int j =0; j < 40; ++j) {

++time_value;

}

time_value=f1(time_value);

return time_value;

}

time_t process_request(time_t time_value)

{

for (int j =0; j < 10; ++j) {

int *p = new int;

delete p;

for (int m =0; m < 10; ++m) {

++time_value;

}

}

for (int i =0; i < 10; ++i) {

time_value=f1(time_value);

time_value=f2(time_value);

}

return time_value;

}

int main(int argc, char* argv2[])

{

int number_loops = argc > 1 ? atoi(argv2[1]) : 1;

time_t time_value = time(0);

printf("number loops %d\n", number_loops);

printf("time_value: %d\n", time_value );

for (int i =0; i < number_loops; ++i) {

time_value = process_request(time_value);

}

printf("time_value: %ld\n", time_value );

return 0;

}

原文

http://stackoverflow.com/questions/1777556/alternatives-to-gprof#comment3480484_1779343

目录
相关文章
|
JavaScript 开发者 UED
虚拟DOM的原理
虚拟DOM的原理
199 1
|
11月前
|
移动开发 前端开发 JavaScript
Vue与React两大前端框架的主要差异点
以上就是Vue和React的主要差异点,希望对你有所帮助。在选择使用哪一个框架时,需要根据项目的具体需求和团队的技术栈来决定。
613 83
|
机器学习/深度学习 PyTorch 算法框架/工具
【从零开始学习深度学习】30. 神经网络中批量归一化层(batch normalization)的作用及其Pytorch实现
【从零开始学习深度学习】30. 神经网络中批量归一化层(batch normalization)的作用及其Pytorch实现
|
测试技术 API 持续交付
微服务的版本控制
微服务的版本控制
220 6
|
程序员 API C语言
C语言函数大全--c开头的函数
【6月更文挑战第4天】本篇介绍 C语言中 c开头的函数【C语言函数大全】
398 2
C语言函数大全--c开头的函数
|
消息中间件 运维 监控
函数计算产品使用问题之HTTP触发器如何通过异步调用的方式执行
函数计算产品作为一种事件驱动的全托管计算服务,让用户能够专注于业务逻辑的编写,而无需关心底层服务器的管理与运维。你可以有效地利用函数计算产品来支撑各类应用场景,从简单的数据处理到复杂的业务逻辑,实现快速、高效、低成本的云上部署与运维。以下是一些关于使用函数计算产品的合集和要点,帮助你更好地理解和应用这一服务。
161 1
|
开发框架 前端开发 JavaScript
如何提高前端代码的可维护性
【2月更文挑战第3天】在现代软件开发中,前端已经变得越来越重要。一个好的前端代码可以帮助团队更快速、高效地开发出高质量的产品。但是,随着项目规模的扩大,前端代码也变得越来越复杂和难以维护。本文将介绍一些提高前端代码可维护性的方法。
|
安全 网络安全 前端开发
2023年山东省职业院校技能大赛高职组信息安全管理与评估 模块二(正式赛)
2023年山东省职业院校技能大赛高职组信息安全管理与评估 模块二(正式赛)
|
分布式数据库
《PolarDB-X开源分布式数据库实战进阶》——PolarDB-X分区管理(4)
《PolarDB-X开源分布式数据库实战进阶》——PolarDB-X分区管理(4)
282 0
|
安全 Android开发 网络虚拟化
APP安全——抓包代理工具的设置
APP安全——抓包代理工具的设置
862 0

热门文章

最新文章