taskset - retrieve or set a process's CPU affinity (affect SYSTEMTAP TIME)

简介:
在使用systemtap监控进程或者内核的运行状况时, 我们会发现使用systemtap和不使用systemtap时, 某些操作的运行时间差别会比较大. 这是因为systemtap本身带来的开销导致的, 那么如何减少这部分开销呢?
可选的方法较多, 例如精简systemtap, 减少systemtap的触发事件的范围, 简化handler的逻辑等等.
除此之外, 还有其他的方法, 例如设置CPU亲和, Linux进程使用哪个CPU资源是由内核进行调度的.
被跟踪进程的亲和与stap运行的进程亲和分开, 可以有效的减少stap的影响吗. 实际上影响甚微, 但是有效果.  
其次我发现CentOS 6.4 x64系统下面, 使用CPU 0的话, 性能会远远不如使用其他任何CPU.  (这才是本文主题)
所以改亲和的话, 只要不使用0号CPU就可以了.
例如 : 
服务器上有8个CPU
pg93@db-172-16-3-150-> numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 
cpubind: 0 
nodebind: 0 
membind: 0 
pg93@db-172-16-3-150-> cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 16
initial apicid  : 16
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.98
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.50
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 1
cpu cores       : 4
apicid          : 18
initial apicid  : 18
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.98
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 4
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.50
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 4
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 2
cpu cores       : 4
apicid          : 20
initial apicid  : 20
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.98
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 4
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.50
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 6
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 22
initial apicid  : 22
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.98
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
stepping        : 5
cpu MHz         : 1595.992
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 3
cpu cores       : 4
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm dts tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3191.50
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
从以上CPUINFO可以看出, 这里有2个物理CPU, 0和1.
使用dmidecode也可以看到这部分信息
[root@db-172-16-3-150 ~]# dmidecode -t processor
# dmidecode 2.11
SMBIOS 2.6 present.

Handle 0x0400, DMI type 4, 40 bytes
Processor Information
        Socket Designation: CPU1
        Type: Central Processor
        Family: Xeon
        Manufacturer: Intel
        ID: A5 06 01 00 FF FB EB BF
        Signature: Type 0, Family 6, Model 26, Stepping 5
        Flags:
                FPU (Floating-point unit on-chip)
                VME (Virtual mode extension)
                DE (Debugging extension)
                PSE (Page size extension)
                TSC (Time stamp counter)
                MSR (Model specific registers)
                PAE (Physical address extension)
                MCE (Machine check exception)
                CX8 (CMPXCHG8 instruction supported)
                APIC (On-chip APIC hardware supported)
                SEP (Fast system call)
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                CLFSH (CLFLUSH instruction supported)
                DS (Debug store)
                ACPI (ACPI supported)
                MMX (MMX technology supported)
                FXSR (FXSAVE and FXSTOR instructions supported)
                SSE (Streaming SIMD extensions)
                SSE2 (Streaming SIMD extensions 2)
                SS (Self-snoop)
                HTT (Multi-threading)
                TM (Thermal monitor supported)
                PBE (Pending break enabled)
        Version: Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
        Voltage: 1.2 V
        External Clock: 4800 MHz
        Max Speed: 3600 MHz
        Current Speed: 1600 MHz
        Status: Populated, Enabled
        Upgrade: Socket LGA1366
        L1 Cache Handle: 0x0700
        L2 Cache Handle: 0x0701
        L3 Cache Handle: 0x0702
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified
        Core Count: 4
        Core Enabled: 4
        Thread Count: 4
        Characteristics:
                64-bit capable

Handle 0x0401, DMI type 4, 40 bytes
Processor Information
        Socket Designation: CPU2
        Type: Central Processor
        Family: Xeon
        Manufacturer: Intel
        ID: A5 06 01 00 FF FB EB BF
        Signature: Type 0, Family 6, Model 26, Stepping 5
        Flags:
                FPU (Floating-point unit on-chip)
                VME (Virtual mode extension)
                DE (Debugging extension)
                PSE (Page size extension)
                TSC (Time stamp counter)
                MSR (Model specific registers)
                PAE (Physical address extension)
                MCE (Machine check exception)
                CX8 (CMPXCHG8 instruction supported)
                APIC (On-chip APIC hardware supported)
                SEP (Fast system call)
                MTRR (Memory type range registers)
                PGE (Page global enable)
                MCA (Machine check architecture)
                CMOV (Conditional move instruction supported)
                PAT (Page attribute table)
                PSE-36 (36-bit page size extension)
                CLFSH (CLFLUSH instruction supported)
                DS (Debug store)
                ACPI (ACPI supported)
                MMX (MMX technology supported)
                FXSR (FXSAVE and FXSTOR instructions supported)
                SSE (Streaming SIMD extensions)
                SSE2 (Streaming SIMD extensions 2)
                SS (Self-snoop)
                HTT (Multi-threading)
                TM (Thermal monitor supported)
                PBE (Pending break enabled)
        Version: Intel(R) Xeon(R) CPU           E5504  @ 2.00GHz
        Voltage: 1.2 V
        External Clock: 4800 MHz
        Max Speed: 3600 MHz
        Current Speed: 1600 MHz
        Status: Populated, Idle
        Upgrade: Socket LGA1366
        L1 Cache Handle: 0x0703
        L2 Cache Handle: 0x0704
        L3 Cache Handle: 0x0705
        Serial Number: Not Specified
        Asset Tag: Not Specified
        Part Number: Not Specified
        Core Count: 4
        Core Enabled: 4
        Thread Count: 4
        Characteristics:
                64-bit capable
[root@db-172-16-3-150 ~]# cat /proc/cpuinfo |grep "core id"
core id         : 0
core id         : 0
core id         : 1
core id         : 1
core id         : 2
core id         : 2
core id         : 3
core id         : 3
[root@db-172-16-3-150 ~]# cat /proc/cpuinfo |grep "physical id"
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0

在没有运行业务的情况下, 我们要测试PostgreSQL某SQL的IO请求次数和请求时间.
启动PostgreSQL数据库时, 把CPU亲和设置为0. (也就是文章开头提到的性能最烂的做法.)
pg93@db-172-16-3-150-> taskset -c 0 /home/pg93/pgsql9.3.1/bin/postgres >/dev/null 2>&1
使用亲和1启动psql.
pg93@db-172-16-3-150-> taskset -c 1 psql
psql (9.3.1)
Type "help" for help.
digoal=# select pg_backend_pid();
 pg_backend_pid 
----------------
          24781
(1 row)

然后在启动stap时把亲和设置为和PostgreSQL不一样的CPU id. 例如3(这样也确保了不在一个物理core上.). 
[root@db-172-16-3-150 ~]# taskset -c 3 stap -e '
global a
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__start") {
  delete a
  println("query__start ", user_string($arg1), "pid:", pid())
}
probe vfs.read.return {
  t = gettimeofday_ns() - @entry(gettimeofday_ns())
  # if (execname() == "postgres" && devname != "N/A")  # 这句判断也有一定的影响, 所以不用. 我们使用target模式.
    a[pid()] <<< t
}
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__done") {
  printdln("**", pid(), @count(a[pid()]), @avg(a[pid()]))
  println("query__done ", user_string($arg1), "pid:", pid())
  delete a
}' -x 24781

在psql中连续快速的执行SQL
digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.023..3154.782 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared read=94393
 Total runtime: 4710.788 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.040..3151.680 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=32 read=94361
 Total runtime: 4696.888 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.035..3158.134 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=64 read=94329
 Total runtime: 4706.135 ms
(4 rows)
stap输出
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781
24781**94417**5461
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781
24781**94361**5429
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781
24781**94329**5529
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24781
我们看到每次read IO的时间为5461纳秒左右.

如果把stap的亲和设置为非0, 即使与stap的亲和一致也没关系. 那么会发生什么情况呢?
pg93@db-172-16-3-150-> taskset -c 3 /home/pg93/pgsql9.3.1/bin/postgres >/dev/null 2>&1
pg93@db-172-16-3-150-> taskset -c 3 psql
psql (9.3.1)
Type "help" for help.

digoal=# select pg_backend_pid();
 pg_backend_pid 
----------------
          24829
(1 row)
[root@db-172-16-3-150 ~]# taskset -c 3 stap -e '
global a
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__start") {
  delete a
  println("query__start ", user_string($arg1), "pid:", pid())
}
probe vfs.read.return {
  t = gettimeofday_ns() - @entry(gettimeofday_ns())
  # if (execname() == "postgres" && devname != "N/A")  # 这句判断也有一定的影响, 所以不用. 建议使用target模式.
    a[pid()] <<< t
}
probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__done") {
  printdln("**", pid(), @count(a[pid()]), @avg(a[pid()]))
  println("query__done ", user_string($arg1), "pid:", pid())
  delete a
}' -x 24829

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.023..2039.248 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared read=94393
 Total runtime: 3065.989 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.036..2039.643 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=32 read=94361
 Total runtime: 3066.841 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.034..2039.005 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=64 read=94329
 Total runtime: 3066.746 ms
(4 rows)

query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94417**3277
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94361**3272
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94329**3273
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
每次IO响应时间降到了3273纳秒左右, 几乎下降了一半, 所以说CPU 0不行啊.

最后把STAP的亲和设置为4, (与postgres进程分开使用物理CPU)那么会怎么样呢?
digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.055..2032.953 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=96 read=94297
 Total runtime: 3059.635 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.044..2034.114 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=128 read=94265
 Total runtime: 3060.832 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.043..2038.548 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=160 read=94233
 Total runtime: 3065.674 ms
(4 rows)
测得的IO响应时间又有了一定的下降, 说明还是有一定效果的. 但是相比不使用CPU 0 这里显得效果比较弱了.
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94297**3194
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94265**3200
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829
24829**94233**3195
query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:24829

最后在附一个未开启stap得到的结果.
digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.037..1812.712 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=192 read=94201
 Total runtime: 2833.760 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.038..1814.074 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=224 read=94169
 Total runtime: 2835.363 ms
(4 rows)

digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;
                                                               QUERY PLAN                                                           
    
------------------------------------------------------------------------------------------------------------------------------------
----
 Seq Scan on postgres.tbl_cost_align  (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.039..1814.641 rows=10100000 loops
=1)
   Output: id, info, crt_time
   Buffers: shared hit=256 read=94137
 Total runtime: 2840.508 ms
(4 rows)

stap开启后, 查询时间长了220毫秒左右. 本文中每个handler带来的额外开销是 : 
digoal=# select 220/94137.0;
        ?column?        
------------------------
 0.00233701945037551653
(1 row)
0.002337毫秒. 
2337纳秒.

[参考]
1. man taskset
3.  http://docs.mongodb.org/manual/administration/production-notes/#mongodb-on-numa-hardware
目录
相关文章
|
分布式计算 Hadoop 大数据
|
数据可视化 Linux 调度
译 | Linux perf_events Off-CPU Time Flame Graph
译 | Linux perf_events Off-CPU Time Flame Graph
109 0
|
存储 缓存 固态存储
Linux系统之Wait CPU time解析
上篇文章我们简要解析了用户CPU时间相关概念及应用实践,具体可参考链接🔗: Linux系统之User CPU time解析。 回顾之前的内容:在Linux操作系统中,通常采用8个不同的指标来研究Linux / Unix操作系统中的CPU消耗:用户CPU时间(us)、系统CPU时间(sy)、良好的CPU时间(ni)、空闲CPU时间(id)、等待CPU时间(wa)、硬件中断CPU时间(hi),软件中断CPU时间(si),被盗CPU时间(st)。在本文中,我们主要针对“等待CPU时间”进行解析。
221 0
Linux系统之Wait CPU time解析
|
存储 Unix Linux
Linux系统之 Steal CPU time解析
正如之前文章讲过:在 Unix / Linux 体系中,常常使用“用户” CPU 时间(us)、“系统” CPU 时间(sy)、“良好”的 CPU 时间(ni)、“空闲” CPU 时间(id)、“等待”CPU 时间(wa)、“硬件中断” CPU 时间(hi)、“软件中断” CPU 时间(si)以及“被盗” CPU 时间(st)等 8 个不同的指标来评判操作系统的 CPU 资源使用情况。
558 0
|
Java Unix Linux
Linux系统之User CPU time解析
在Linux操作系统中,通常采用8个不同的指标来研究Unix / Linux操作系统中的CPU消耗:用户CPU时间(us)、系统CPU时间(sy)、良好的CPU时间(ni)、空闲CPU时间(id)、等待CPU时间(wa)、硬件中断CPU时间(hi),软件中断CPU时间(si),被盗CPU时间(st)。在本文中,我们主要对“用户CPU时间”进行解析。
541 0
|
缓存 Linux
Linux 的 top命令参数详解 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
Linux 的 top命令参数详解 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
506 0
|
SQL AliSQL 关系型数据库
AliSQL 20180124 版本发布 支持 CPU TIME 和优化 BINLOG 应用时索引选择
AliSQL 在线程级别增加了 CPU time 的性能指标,可以用来度量某个SQL或者执行单元所耗费的CPU 时间片,便于性能诊断。 当 Binlog 为 Row 模式的情况下,备库在同步更新时,索引的选择是基于简单规则的,没有走优化器的代价模型,所以有可能会因为索引选择不当导致主备有较大的延迟。
2222 0
|
数据库 关系型数据库 Oracle
|
Linux 应用服务中间件 nginx