OProfile & SystemTap

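Introduction:
OProfile has low overhead, provided the CPU supports hardware performance monitoring (most current CPUs do). Unlike stap, however, OProfile cannot use a timer to print statistics at intervals or cumulatively, while stap has higher overhead. OProfile is suited to performance diagnosis: for example, finding the process that uses the most CPU in the system, the functions in a process that use the most CPU, and the code within a function that uses the most CPU. operf starts profiling; opreport and opannotate then produce call reports, or statistics per function, per assembly instruction, and so on. stap is suited to tracing. Example: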

[root@digoal ~]# cd /data06
[root@digoal data06]#  operf --system-wide --lazy-conversion
operf: Press Ctl-c or 'kill -SIGINT 45366' to stop profiling
operf: Profiler started
^C
Profiling done.
Converting profile data to OProfile format
................

Generate the report:

[root@digoal data06]# opreport -l -f -w -x -t 1 
Using /data06/oprofile_data/samples/ for samples directory.
CPU: Intel Core/i7, speed 1995.14 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
vma      samples  %        app name                 symbol name
007827a0 2091381  26.6819  /opt/pgsql9.4.1/bin/postgres HeapTupleSatisfiesVacuum
00490300 988600   12.6126  /opt/pgsql9.4.1/bin/postgres heap_page_prune
0078a8c0 698665    8.9136  /opt/pgsql9.4.1/bin/postgres pg_qsort
0058afb0 676022    8.6247  /opt/pgsql9.4.1/bin/postgres vac_cmp_itemptr
0058baf0 385039    4.9123  /opt/pgsql9.4.1/bin/postgres lazy_vacuum_rel
004c4d00 365497    4.6630  /opt/pgsql9.4.1/bin/postgres XLogInsert
00675420 229805    2.9319  /opt/pgsql9.4.1/bin/postgres itemoffcompare
00675d20 184668    2.3560  /opt/pgsql9.4.1/bin/postgres PageRepairFragmentation
0078a7e0 169808    2.1664  /opt/pgsql9.4.1/bin/postgres swapfunc
00655590 147647    1.8837  /opt/pgsql9.4.1/bin/postgres BufferGetBlockNumber
00488940 139389    1.7783  /opt/pgsql9.4.1/bin/postgres heap_prepare_freeze_tuple
007624d0 86239     1.1002  /opt/pgsql9.4.1/bin/postgres hash_search_with_hash_value

[root@digoal data06]# opreport -l -f -g -w -x -t 1 /opt/pgsql/bin/postgres
Using /data06/oprofile_data/samples/ for samples directory.
CPU: Intel Core/i7, speed 1995.14 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
vma      samples  %        linenr info                 symbol name
007827a0 2091381  26.7572  /opt/soft_bak/postgresql-9.4.1/src/backend/utils/time/tqual.c:1116 HeapTupleSatisfiesVacuum
00490300 988600   12.6482  /opt/soft_bak/postgresql-9.4.1/src/backend/access/heap/pruneheap.c:174 heap_page_prune
0078a8c0 698665    8.9387  /opt/soft_bak/postgresql-9.4.1/src/port/qsort.c:104 pg_qsort
0058afb0 676022    8.6491  /opt/soft_bak/postgresql-9.4.1/src/backend/commands/vacuumlazy.c:1728 vac_cmp_itemptr
0058baf0 385039    4.9262  /opt/soft_bak/postgresql-9.4.1/src/backend/commands/vacuumlazy.c:172 lazy_vacuum_rel
004c4d00 365497    4.6762  /opt/soft_bak/postgresql-9.4.1/src/backend/access/transam/xlog.c:844 XLogInsert
00675420 229805    2.9401  /opt/soft_bak/postgresql-9.4.1/src/backend/storage/page/bufpage.c:415 itemoffcompare
00675d20 184668    2.3626  /opt/soft_bak/postgresql-9.4.1/src/backend/storage/page/bufpage.c:433 PageRepairFragmentation
0078a7e0 169808    2.1725  /opt/soft_bak/postgresql-9.4.1/src/port/qsort.c:78 swapfunc
00655590 147647    1.8890  /opt/soft_bak/postgresql-9.4.1/src/backend/storage/buffer/bufmgr.c:1898 BufferGetBlockNumber
00488940 139389    1.7833  /opt/soft_bak/postgresql-9.4.1/src/backend/access/heap/heapam.c:5756 heap_prepare_freeze_tuple
007624d0 86239     1.1033  /opt/soft_bak/postgresql-9.4.1/src/backend/utils/hash/dynahash.c:824 hash_search_with_hash_value

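This shows which calls consume the most CPU. The second report, run with -g against the postgres binary, also gives the source file and line number of each symbol.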
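As a minimal sketch, the opreport symbol table can also be ranked and summed programmatically, for example to see how much of the total the top few functions account for. This assumes the whitespace-separated column layout shown in the two reports above and symbol names without spaces; parse_opreport and cumulative_top are hypothetical helpers, not part of OProfile.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    samples: int
    percent: float
    symbol: str


def parse_opreport(text: str) -> List[Row]:
    """Extract (samples, %, symbol) rows from `opreport -l` text output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows: hex vma, integer sample count, percentage, ..., symbol name.
        # Header lines such as "CPU: ..." or "vma samples ..." fail these checks.
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        try:
            int(fields[0], 16)
        except ValueError:
            continue
        rows.append(Row(int(fields[1]), float(fields[2]), fields[-1]))
    return sorted(rows, key=lambda r: r.samples, reverse=True)


def cumulative_top(rows: List[Row], n: int) -> float:
    """Combined percentage of the n hottest symbols."""
    return round(sum(r.percent for r in rows[:n]), 4)


SAMPLE = """\
CPU: Intel Core/i7, speed 1995.14 MHz (estimated)
vma      samples  %        app name                 symbol name
0078a8c0 698665    8.9136  /opt/pgsql9.4.1/bin/postgres pg_qsort
007827a0 2091381  26.6819  /opt/pgsql9.4.1/bin/postgres HeapTupleSatisfiesVacuum
00490300 988600   12.6126  /opt/pgsql9.4.1/bin/postgres heap_page_prune
"""

rows = parse_opreport(SAMPLE)
print(rows[0].symbol)           # HeapTupleSatisfiesVacuum
print(cumulative_top(rows, 3))  # 48.2081
```

Because the symbol name is taken from the last column, the same parser also accepts the -g variant, where the linenr info column replaces the app name.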
 
       

[root@digoal data06]# opannotate -x -s -t 1 /opt/pgsql/bin/postgres -i HeapTupleSatisfiesVacuum|less
Using /data06/oprofile_data/samples/ for session-dir
/* 
 * Command line: opannotate -x -s -t 1 /opt/pgsql/bin/postgres -i HeapTupleSatisfiesVacuum 
 * 
 * Interpretation of command line:
 * Output annotated source file with samples
 * Output files where samples count reach 1% of the samples
 * 
 * CPU: Intel Core/i7, speed 1995.14 MHz (estimated)
 * Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
 */
/* 
 * Total samples for file : "/opt/soft_bak/postgresql-9.4.1/src/backend/utils/time/tqual.c"
 * 
 * 2091381 100.000
 */


               :/*-------------------------------------------------------------------------
               : *
               : * tqual.c
               : *        POSTGRES "time qualification" code, ie, tuple visibility rules.
               : *
               : * NOTE: all the HeapTupleSatisfies routines will update the tuple's
               : * "hint" status bits if we see that the inserting or deleting transaction
               : * has now committed or aborted (and it is safe to set the hint bits).
               : * If the hint bits are changed, MarkBufferDirtyHint is called on
               : * the passed-in buffer.  The caller must hold not only a pin, but at least
               : * shared buffer content lock on the buffer containing the tuple.
               : *
               : * NOTE: must check TransactionIdIsInProgress (which looks in PGXACT array)
......
1879024 89.8461 :       if (!HeapTupleHeaderXminCommitted(tuple))
               :        {
    63  0.0030 :                if (HeapTupleHeaderXminInvalid(tuple))
               :                        return HEAPTUPLE_DEAD;
               :                /* Used by pre-9.0 binary upgrades */
    18 8.6e-04 :                else if (tuple->t_infomask & HEAP_MOVED_OFF)
               :                {
               :                        TransactionId xvac = HeapTupleHeaderGetXvac(tuple);
               :
......

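The most expensive spot in the code is this check, which accounts for 89.8% of the samples in HeapTupleSatisfiesVacuum:
if (!HeapTupleHeaderXminCommitted(tuple))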
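The same kind of filtering works on opannotate -s output. The sketch below assumes that each sampled line has the shape "<samples> <percent> :<source>" seen in the excerpt above (the percentage may be in exponent form, as in 8.6e-04) and picks out the hottest source lines; hottest_lines is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Assumed shape of a sampled line, based on the excerpt above:
#   "<samples> <percent> :<source text>"
# Lines without samples start with blanks and ":" and are ignored.
SAMPLED_LINE = re.compile(r"^\s*(\d+)\s+(\d+(?:\.\d+)?(?:e[-+]?\d+)?)\s+:(.*)$")


def hottest_lines(text: str, n: int = 1) -> List[Tuple[int, float, str]]:
    """Return the n source lines with the most samples as (samples, %, source)."""
    hits = []
    for line in text.splitlines():
        m = SAMPLED_LINE.match(line)
        if m:
            samples, pct, src = m.groups()
            hits.append((int(samples), float(pct), src.strip()))
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:n]


EXCERPT = """\
1879024 89.8461 :       if (!HeapTupleHeaderXminCommitted(tuple))
               :        {
    63  0.0030 :                if (HeapTupleHeaderXminInvalid(tuple))
    18 8.6e-04 :                else if (tuple->t_infomask & HEAP_MOVED_OFF)
"""

for samples, pct, src in hottest_lines(EXCERPT, n=2):
    print(samples, pct, src)
```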
To see the events OProfile supports, use opcontrol --list-events (ophelp prints the same list):

[root@digoal data06]# opcontrol --list-events
oprofile: available events for CPU type "Intel Core/i7"

See Intel Architecture Developer's Manual Volume 3B, Appendix A and
Intel Architecture Optimization Reference Manual

For architectures using unit masks, you may be able to specify
unit masks by name.  See 'opcontrol' or 'operf' man page for more details.

CPU_CLK_UNHALTED: (counter: all)
        Clock cycles when not halted (min count: 6000)
UNHALTED_REFERENCE_CYCLES: (counter: all)
        Unhalted reference cycles (min count: 6000)
        Unit masks (default 0x1)
        ----------
        0x01: No unit mask
......

Event configuration (from the operf man page):

       --events / -e event1[,event2[,...]]
              This option is for passing a comma-separated list of event specifications for profiling. Each event spec
              is of the form:
                 name:count[:unitmask[:kernel[:user]]]
              You can specify unit mask values using either a numerical value (hex values must begin with "0x") or a
              symbolic name (if the name=<um_name> field is shown in the ophelp output). For some named unit masks,
              the hex value is not unique; thus, OProfile tools enforce specifying such unit masks value by name.

              Event names for some IBM PowerPC systems include a _GRP<n> (group number) suffix. You can pass either
              the full event name or the base event name (i.e., without the suffix) to operf. If the base event name
              is passed, operf will automatically choose an appropriate group number suffix for the event; thus,
              OProfile post-processing tools will always show real event names that include the group number suffix.

              When no event specification is given, the default event for the running processor type will be used for
              profiling. Use ophelp to list the available events for your processor type.
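To make the grammar name:count[:unitmask[:kernel[:user]]] concrete, here is a small Python parser for the value passed to --events / -e. It is an illustrative sketch, not OProfile's own parsing code: it assumes the kernel and user fields are given as 0 or 1, and it does not check counts against each event's minimum (for example the min count of 6000 for CPU_CLK_UNHALTED in the listing above). EventSpec, parse_event_spec and parse_events are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class EventSpec:
    name: str
    count: int
    unitmask: Optional[Union[int, str]] = None  # numeric value or symbolic name
    kernel: Optional[bool] = None
    user: Optional[bool] = None


def parse_event_spec(spec: str) -> EventSpec:
    """Split one name:count[:unitmask[:kernel[:user]]] spec into its fields."""
    parts = spec.split(":")
    if not 2 <= len(parts) <= 5:
        raise ValueError(f"expected 2 to 5 colon-separated fields: {spec!r}")
    ev = EventSpec(name=parts[0], count=int(parts[1]))
    if len(parts) > 2 and parts[2]:
        um = parts[2]
        if um.lower().startswith("0x"):
            ev.unitmask = int(um, 16)   # hex values must begin with "0x"
        elif um.isdigit():
            ev.unitmask = int(um)       # plain decimal value
        else:
            ev.unitmask = um            # symbolic unit mask name
    if len(parts) > 3:
        ev.kernel = parts[3] == "1"     # assumed 0/1 flag
    if len(parts) > 4:
        ev.user = parts[4] == "1"       # assumed 0/1 flag
    return ev


def parse_events(arg: str) -> List[EventSpec]:
    """Parse the comma-separated value given to --events / -e."""
    return [parse_event_spec(s) for s in arg.split(",")]


print(parse_events("CPU_CLK_UNHALTED:100000:0x00:1:1"))
# [EventSpec(name='CPU_CLK_UNHALTED', count=100000, unitmask=0, kernel=True, user=True)]
```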


The following is excerpted from the Red Hat administration documentation:
OProfile is a low-overhead, system-wide performance monitoring tool. It uses the performance monitoring hardware on the processor to retrieve information about the kernel and executables on the system, such as when memory is referenced, the number of L2 cache requests, and the number of hardware interrupts received. On a Red Hat Enterprise Linux system, the oprofile package must be installed to use this tool.
Many processors include dedicated performance monitoring hardware. This hardware makes it possible to detect when certain events happen (such as the requested data not being in cache). The hardware normally takes the form of one or more counters that are incremented each time an event takes place. When a counter reaches its configured count, an interrupt is generated, making it possible to control the amount of detail (and therefore, overhead) produced by performance monitoring.
OProfile uses this hardware (or a timer-based substitute in cases where performance monitoring hardware is not present) to collect samples of performance-related data each time a counter generates an interrupt. These samples are periodically written out to disk; later, the data in these samples can be used to generate reports on system-level and application-level performance.
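This overflow mechanism also explains the "count 100000" in the reports above: roughly one sample is recorded per 100000 occurrences of the event. The Python sketch below simulates that with a hypothetical event stream and converts a sample count back into an approximate amount of CPU time. Treat the conversion as a rough estimate only: it assumes every counted cycle ran at the 1995.14 MHz that opreport itself marks as estimated, and the result is summed over all CPUs.

```python
from collections import Counter
from typing import Iterable


def simulate_sampling(events: Iterable[str], count: int) -> Counter:
    """Take one sample every `count` events.

    Each element of `events` names the function that was running when the
    event occurred (a hypothetical event stream for illustration).
    """
    samples: Counter = Counter()
    counter = 0
    for func in events:
        counter += 1
        if counter == count:  # counter overflow -> interrupt -> record a sample
            samples[func] += 1
            counter = 0       # reload the counter for the next period
    return samples


def estimated_cpu_seconds(samples: int, count: int, cpu_mhz: float) -> float:
    """Approximate CPU time represented by `samples` of a cycle-counting event."""
    return samples * count / (cpu_mhz * 1e6)


print(simulate_sampling(["f"] * 300 + ["g"] * 100, count=100))
# Counter({'f': 3, 'g': 1})
print(round(estimated_cpu_seconds(2091381, 100000, 1995.14), 1))
# 104.8
```

Under these assumptions, the 2091381 samples of HeapTupleSatisfiesVacuum correspond to about 2.09 × 10^11 unhalted cycles, on the order of 105 CPU-seconds across all CPUs during the run.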
Be aware of the following limitations when using OProfile:
  • Use of shared libraries: Samples for code in shared libraries are not attributed to the particular application unless the --separate=library option is used.
  • Performance monitoring samples are inexact: When a performance monitoring register triggers a sample, the interrupt handling is not precise like a divide by zero exception. Due to the out-of-order execution of instructions by the processor, the sample may be recorded on a nearby instruction.
  • opreport does not associate samples for inline functions properly: opreport uses a simple address range mechanism to determine which function an address is in. Inline function samples are not attributed to the inline function but rather to the function the inline function was inserted into.
  • OProfile accumulates data from multiple runs: OProfile is a system-wide profiler and expects processes to start up and shut down multiple times. Thus, samples from multiple runs accumulate. Use the command opcontrol --reset to clear out the samples from previous runs.
  • Hardware performance counters do not work on guest virtual machines: Because the hardware performance counters are not available on virtual systems, you need to use the timer mode. Enter the command opcontrol --deinit, and then execute modprobe oprofile timer=1 to enable the timer mode.
  • Non-CPU-limited performance problems: OProfile is oriented to finding problems with CPU-limited processes. OProfile does not identify processes that are asleep because they are waiting on locks or for some other event to occur (for example an I/O device to finish an operation).
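The inline-function limitation follows from how samples are mapped to symbols. The Python sketch below is a simplified model of such an address range lookup (a sorted list of symbol start addresses searched with bisect), not opreport's actual code; the symbol names and addresses are hypothetical. Code that the compiler inlined into outer_func lies inside outer_func's address range and has no symbol of its own, so a sample there is reported against outer_func.

```python
import bisect
from typing import Iterable, Optional, Tuple


class SymbolTable:
    """Map an address to the symbol whose [start, start + size) range contains it."""

    def __init__(self, symbols: Iterable[Tuple[int, int, str]]):
        # symbols: (start_address, size, name); kept sorted by start address
        self._syms = sorted(symbols)
        self._starts = [start for start, _, _ in self._syms]

    def lookup(self, addr: int) -> Optional[str]:
        # Find the last symbol that starts at or before addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start, size, name = self._syms[i]
        return name if addr < start + size else None


# Hypothetical layout: an inlined helper's instructions sit at 0x1080-0x10a0,
# inside outer_func, but the helper has no entry in the symbol table.
table = SymbolTable([(0x1000, 0x200, "outer_func"), (0x1200, 0x100, "other_func")])
print(table.lookup(0x1090))  # outer_func
print(table.lookup(0x1250))  # other_func
```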

SystemTap is a tracing and probing tool that allows users to study and monitor the activities of the operating system in fine detail. It provides information similar to the output of tools like netstat, ps, top, and iostat; however, SystemTap is designed to provide more filtering and analysis options for the collected information.
While using OProfile is suggested in cases of collecting data on where and why the processor spends time in a particular area of code, it is less usable when finding out why the processor stays idle.
You might want to use SystemTap when instrumenting specific places in code. Because SystemTap allows you to run the code instrumentation without having to stop and restart the instrumented code, it is particularly useful for instrumenting the kernel and daemons.
