解读PostgreSQL Oracle 兼容性之 - performance insight(性能洞察)

本文涉及的产品
云原生数据库 PolarDB MySQL 版,通用型 2核4GB 50GB
日志服务 SLS,月写入数据量 50GB 1个月
云原生数据库 PolarDB PostgreSQL 版,标准版 2核4GB 50GB
简介: 标签 PostgreSQL , perf insight , 等待事件 , 采样 , 发现问题 , Oracle 兼容性 背景 通常普通的监控会包括系统资源的监控: cpu io 内存 网络 等,但是仅凭资源的监控,当问题发生时,如何快速的定位到问题在哪里?需要更高级的监控: 更高级的监控方法通常是从数据库本身的

标签

PostgreSQL , perf insight , 等待事件 , 采样 , 发现问题 , Oracle 兼容性


背景

通常普通的监控会包括系统资源的监控:

cpu    
    
io    
    
内存    
  
网络  

等,但是仅凭资源的监控,当问题发生时,如何快速的定位到问题在哪里?需要更高级的监控:

更高级的监控方法通常是从数据库本身的特性触发,但是需要对数据库具备非常深刻的理解,才能做出好的监控和诊断系统。属于专家型或叫做经验型的监控和诊断系统。

《[未完待续] PostgreSQL 一键诊断项 - 珍藏级》

《PostgreSQL 实时健康监控 大屏 - 低频指标 - 珍藏级》

《PostgreSQL 实时健康监控 大屏 - 高频指标(服务器) - 珍藏级》

《PostgreSQL 实时健康监控 大屏 - 高频指标 - 珍藏级》

《PostgreSQL pgmetrics - 多版本、健康监控指标采集、报告》

《PostgreSQL pg_top pgcenter - 实时top类工具》

《PostgreSQL、Greenplum 日常监控 和 维护任务 - 最佳实践》

《PostgreSQL 如何查找TOP SQL (例如IO消耗最高的SQL) (包含SQL优化内容) - 珍藏级》

《PostgreSQL 锁等待监控 珍藏级SQL - 谁堵塞了谁》

然而数据库在不断的演进,经验型的诊断系统好是好,但是不通用,有没有更加通用,有效的发现系统问题的方法?

AWS与Oracle perf insight的思路非常不错,实际上就是等待事件的统计追踪,作为性能诊断的方法。

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html

《AWS performance insight》

简单来说就是对系统不停的打点,例如每秒一个采样,仅记录这一秒数据库活跃的会话(包括等待中的会话),等待事件,QUERY,时间,用户,数据库。这几个指标。

活跃度会话,不管是在耗费CPU,还是在等待(锁,IO)或者其他,实际上都是占用了资源的。可以算出平均的活跃会话(例如10秒的平均值,5秒的平均值)(avg active sessions)。

这个avg active sessions是一个值,这个值和数据库实例的CPU个数进行比较,就可以衡量出系统是否存在瓶颈(当avg active sessions超过CPU个数时,说明存在瓶颈)。

当某个时间窗口存在瓶颈,瓶颈在哪里,则可以通过这个时间窗口内的打点明细,进行统计。等待事件,QUERY,用户,数据库。

PostgreSQL打点的方法也很多:

1、(推荐)通过pg_stat_activity 内存中的动态视图获取,每秒取一次ACTIVE的内容(例如:会话ID,等待事件,QUERY,时间,用户,数据库)。

https://www.postgresql.org/docs/11/monitoring-stats.html#MONITORING-STATS-VIEWS

2、(不推荐)开启审计日志,在审计日志中获取,这个在高并发系统中,不太好用。并且审计日志是在结束时打印,一个QUERY的中间执行过程并不完全是占用CPU或其他资源的,所以审计日志获取的信息对于perf insight并没有什么效果。

perf insight的入门门槛低,可以摆平很多问题,在出现问题时快速定位到问题SQL,问题的等待事件在哪里。结合经验型的监控,可以构建PG非常强大的监控、诊断、优化体系。

perf insight 实现讲解

pic

pic

pic

pic

pic

pic

pic

举例1

会话1

postgres=# begin;      
BEGIN      
postgres=# lock table abc in access exclusive mode ;      
LOCK TABLE      

会话2

postgres=# select * from abc;      

从pg_stat_activity获取状态,可以看到会话2在等待,会话处于active状态,这种消耗需要被记录到avg active session中,用来评估资源消耗指标。

postgres=# select now(),state,datname,usename,wait_event_type,wait_event,query from pg_stat_activity where state in ('active', 'fastpath function call');      
              now              | state  | datname  | usename  | wait_event_type | wait_event |                                           query                                                  
-------------------------------+--------+----------+----------+-----------------+------------+--------------------------------------------------------------------------------------------      
 2019-01-25 21:17:28.540264+08 | active | postgres | postgres |                 |            | select datname,usename,query,state,wait_event_type,wait_event,now() from pg_stat_activity;      
 2019-01-25 21:17:28.540264+08 | active | postgres | postgres | Lock            | relation   | select * from abc;      
(2 rows)      

举例2

使用pgbench压测数据库,每秒打点,后期进行可视化展示

pgbench -i -s 100      

1、压测只读

pgbench -M prepared -n -r -P 1 -c 64 -j 64 -T 300 -S      

2、查看压测时的活跃会话状态

postgres=#     
select now()::timestamptz(0),state,    
datname,usename,wait_event_type,wait_event,query     
from pg_stat_activity     
where state in     
('active', 'fastpath function call')     
and pid<>pg_backend_pid();      
    
         now         | state  | datname  | usename  | wait_event_type | wait_event |                         query                               
---------------------+--------+----------+----------+-----------------+------------+-------------------------------------------------------      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres | Client          | ClientRead | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
 2019-01-25 21:28:52 | active | postgres | postgres |                 |            | SELECT abalance FROM pgbench_accounts WHERE aid = $1;      
(46 rows)      

3、为了方便统计,可以在本地建表,用于收集pg_stat_activity的内容,在实际的生产中,可以把这个信息读走,存到其他地方(例如专用于监控的其他数据库)。

postgres=# create unlogged table perf_insight as     
select now()::timestamptz(0) as ts,    
extract(epoch from backend_start)||'.'||pid as sessid,    
state,datname,usename,    
wait_event_type||'_'||wait_event as waiting ,    
query from     
pg_stat_activity     
where state in     
('active', 'fastpath function call')     
and pid<>pg_backend_pid();      
    
SELECT 48      

4、试着写入当时pg_stat_activity状态

postgres=#     
    
insert into perf_insight     
select now()::timestamptz(0),    
extract(epoch from backend_start)||'.'||pid,    
state,datname,    
usename,wait_event_type||'_'||wait_event,    
query from pg_stat_activity     
where state in ('active', 'fastpath function call')     
and pid<>pg_backend_pid();      
    
INSERT 0 42      

5、使用psql watch,每秒打一个点

postgres=# \watch 1      

6、只读压测,压测结果,130万QPS

pgbench -M prepared -n -r -P 1 -c 64 -j 64 -T 300 -S      
      
    
    
transaction type: <builtin: select only>      
scaling factor: 100      
query mode: prepared      
number of clients: 64      
number of threads: 64      
duration: 300 s      
number of transactions actually processed: 390179555      
latency average = 0.049 ms      
latency stddev = 0.026 ms      
tps = 1300555.237752 (including connections establishing)      
tps = 1300584.885231 (excluding connections establishing)      
statement latencies in milliseconds:      
         0.001  \set aid random(1, 100000 * :scale)      
         0.049  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;      

7、接下来,开启一个读写压测,9.4万TPS(yue 47万qps)

pgbench -M prepared -n -r -P 1 -c 64 -j 64 -T 300       
      
      
      
transaction type: <builtin: TPC-B (sort of)>      
scaling factor: 100      
query mode: prepared      
number of clients: 64      
number of threads: 64      
duration: 300 s      
number of transactions actually processed: 28371829      
latency average = 0.677 ms      
latency stddev = 0.413 ms      
tps = 94569.412707 (including connections establishing)      
tps = 94571.934011 (excluding connections establishing)      
statement latencies in milliseconds:      
         0.002  \set aid random(1, 100000 * :scale)      
         0.001  \set bid random(1, 1 * :scale)      
         0.001  \set tid random(1, 10 * :scale)      
         0.001  \set delta random(-5000, 5000)      
         0.045  BEGIN;      
         0.108  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;      
         0.069  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;      
         0.091  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;      
         0.139  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;      
         0.068  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);      
         0.153  END;      

8、perf insight 可视化需要的素材

时间、状态、会话ID、数据库名、用户名、等待事件、查询

当然,我们可以再细化,例如增加会话ID字段,可以针对一个会话来进行展示和统计。

postgres=# \d perf_insight     
                   Unlogged table "public.perf_insight"    
 Column  |              Type              |     
---------+--------------------------------+-    
 ts      | timestamp(0) with time zone    | 时间戳    
 sessid  | text                           | 会话ID    
 state   | text                           | 状态    
 datname | name                           | 数据库    
 usename | name                           | 用户    
 waiting | text                           | 等待事件    
 query   | text                           | SQL语句    

9、查看perf insight素材内容

postgres=# select * from perf_insight limit 10;     
         ts          |         sessid         | state  | datname  | usename  |         waiting          |                                query                                     
---------------------+------------------------+--------+----------+----------+--------------------------+----------------------------------------------------------------------    
 2019-01-26 09:43:28 | 1548467007.4805.32968  | active | postgres | postgres | Lock_transactionid       | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;    
 2019-01-26 09:43:28 | 1548467007.47991.32966 | active | postgres | postgres | Client_ClientRead        | END;    
 2019-01-26 09:43:28 | 1548467007.48362.32979 | active | postgres | postgres | Lock_transactionid       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;    
 2019-01-26 09:43:28 | 1548467007.48388.32980 | active | postgres | postgres | Lock_tuple               | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;    
 2019-01-26 09:43:28 | 1548467007.48329.32978 | active | postgres | postgres | Lock_transactionid       | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;    
 2019-01-26 09:43:28 | 1548467007.48275.32976 | active | postgres | postgres | Lock_tuple               | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;    
 2019-01-26 09:43:28 | 1548467007.48107.32970 | active | postgres | postgres | Lock_transactionid       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;    
 2019-01-26 09:43:28 | 1548467007.48243.32975 | active | postgres | postgres | Lock_transactionid       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;    
 2019-01-26 09:43:28 | 1548467007.48417.32981 | active | postgres | postgres | IPC_ProcArrayGroupUpdate | SELECT abalance FROM pgbench_accounts WHERE aid = $1;    
 2019-01-26 09:43:28 | 1548467007.48448.32982 | active | postgres | postgres | Lock_tuple               | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;    
(10 rows)     

10、查看在这段时间中,有多少种等待事件

postgres=# select distinct waiting from perf_insight ;      
         waiting                
--------------------------      
 LWLock_wal_insert      
       
 LWLock_XidGenLock      
 Lock_extend      
 LWLock_ProcArrayLock      
 Lock_tuple      
 Lock_transactionid      
 LWLock_lock_manager      
 Client_ClientRead      
 IPC_ProcArrayGroupUpdate      
 LWLock_buffer_content      
 IPC_ClogGroupUpdate      
 LWLock_CLogControlLock      
 IO_DataFileExtend      
(14 rows)      

perf insight 可视化,统计

采集粒度为1秒,可以对n秒的打点求平均值(分不同维度),得到可视化图形:

1、总avg active sessions ,用于告警。

2、其他维度,用于分析造成性能瓶颈问题的权重:

2.1、等待事件维度(NULL表示无等待,纯CPU time) avg active sessions

2.2、query 维度 avg active sessions

2.3、数据库维度 avg active sessions

2.4、用户维度 avg active sessions

如何判断问题:

例如,对于一个64线程的系统:

avg active sessions 在64以下时,可以认为是没有问题的。

1 总 avg active sessions,用于告警。

5秒统计间隔。

select     
  coalesce(t1.ts, t2.ts) ts,     
  coalesce(avg_active_sessions,0) avg_active_sessions       
from       
(    
select     
  to_timestamp((extract(epoch from ts))::int8/5*5) ts,     
  count(*)/5::float8 avg_active_sessions     
from perf_insight     
group by 1    
) t1      
full outer join       
(select     
  generate_series(    
    to_timestamp((extract(epoch from min(ts)))::int8/5*5),    
    to_timestamp((extract(epoch from max(ts)))::int8/5*5),    
    interval '5 s'    
  ) ts     
from perf_insight    
) t2      
on (t1.ts=t2.ts);      
      
      
           ts           | avg_active_sessions       
------------------------+---------------------      
 2019-01-26 05:39:20+08 |                14.2      
 2019-01-26 05:39:25+08 |                30.4      
 2019-01-26 05:39:30+08 |                35.8      
 2019-01-26 05:39:35+08 |                41.8      
 2019-01-26 05:39:40+08 |                38.6      
 2019-01-26 05:39:45+08 |                38.2      
 2019-01-26 05:39:50+08 |                34.6      
 2019-01-26 05:39:55+08 |                35.6      
 2019-01-26 05:40:00+08 |                42.4      
 2019-01-26 05:40:05+08 |                36.8      
 2019-01-26 05:40:10+08 |                36.2      
 2019-01-26 05:40:15+08 |                39.4      
 2019-01-26 05:40:20+08 |                  40      
 2019-01-26 05:40:25+08 |                35.8      
 2019-01-26 05:40:30+08 |                37.2      
 2019-01-26 05:40:35+08 |                36.4      
 2019-01-26 05:40:40+08 |                40.6      
 2019-01-26 05:40:45+08 |                39.2      
 2019-01-26 05:40:50+08 |                36.6      
 2019-01-26 05:40:55+08 |                37.4      
 2019-01-26 05:41:00+08 |                  38      
 2019-01-26 05:41:05+08 |                38.6      
 2019-01-26 05:41:10+08 |                38.4      
 2019-01-26 05:41:15+08 |                40.4      
 2019-01-26 05:41:20+08 |                35.8      
 2019-01-26 05:41:25+08 |                40.6      
 2019-01-26 05:41:30+08 |                39.4      
 2019-01-26 05:41:35+08 |                37.4      
 2019-01-26 05:41:40+08 |                36.6      
 2019-01-26 05:41:45+08 |                39.6      
 2019-01-26 05:41:50+08 |                36.2      
 2019-01-26 05:41:55+08 |                37.4      
 2019-01-26 05:42:00+08 |                37.8      
 2019-01-26 05:42:05+08 |                  39      
 2019-01-26 05:42:10+08 |                36.2      
 2019-01-26 05:42:15+08 |                  37      
 2019-01-26 05:42:20+08 |                36.4      
 2019-01-26 05:42:25+08 |                  36      
 2019-01-26 05:42:30+08 |                37.6      
 2019-01-26 05:42:35+08 |                   0      
 2019-01-26 05:42:40+08 |                   0      
 2019-01-26 05:42:45+08 |                   0      
 2019-01-26 05:42:50+08 |                 8.4      
 2019-01-26 05:42:55+08 |                40.6      
 2019-01-26 05:43:00+08 |                42.4      
 2019-01-26 05:43:05+08 |                37.4      
 2019-01-26 05:43:10+08 |                44.8      
 2019-01-26 05:43:15+08 |                36.2      
 2019-01-26 05:43:20+08 |                39.6      
 2019-01-26 05:43:25+08 |                41.4      
 2019-01-26 05:43:30+08 |                34.2      
 2019-01-26 05:43:35+08 |                41.8      
 2019-01-26 05:43:40+08 |                37.4      
 2019-01-26 05:43:45+08 |                30.2      
 2019-01-26 05:43:50+08 |                36.6      
 2019-01-26 05:43:55+08 |                  36      
 2019-01-26 05:44:00+08 |                33.8      
 2019-01-26 05:44:05+08 |                37.8      
 2019-01-26 05:44:10+08 |                39.2      
 2019-01-26 05:44:15+08 |                36.6      
 2019-01-26 05:44:20+08 |                39.8      
 2019-01-26 05:44:25+08 |                35.2      
 2019-01-26 05:44:30+08 |                35.8      
 2019-01-26 05:44:35+08 |                42.8      
 2019-01-26 05:44:40+08 |                40.8      
 2019-01-26 05:44:45+08 |                39.4      
 2019-01-26 05:44:50+08 |                  40      
 2019-01-26 05:44:55+08 |                40.2      
 2019-01-26 05:45:00+08 |                41.2      
 2019-01-26 05:45:05+08 |                41.6      
 2019-01-26 05:45:10+08 |                40.6      
 2019-01-26 05:45:15+08 |                33.8      
 2019-01-26 05:45:20+08 |                35.8      
 2019-01-26 05:45:25+08 |                42.2      
 2019-01-26 05:45:30+08 |                37.8      
 2019-01-26 05:45:35+08 |                37.6      
 2019-01-26 05:45:40+08 |                40.2      
 2019-01-26 05:45:45+08 |                37.4      
 2019-01-26 05:45:50+08 |                38.2      
 2019-01-26 05:45:55+08 |                39.6      
 2019-01-26 05:46:00+08 |                41.6      
 2019-01-26 05:46:05+08 |                  36      
 2019-01-26 05:46:10+08 |                34.6      
 2019-01-26 05:46:15+08 |                37.8      
 2019-01-26 05:46:20+08 |                40.8      
 2019-01-26 05:46:25+08 |                  42      
 2019-01-26 05:46:30+08 |                36.4      
 2019-01-26 05:46:35+08 |                44.6      
 2019-01-26 05:46:40+08 |                38.8      
 2019-01-26 05:46:45+08 |                  35      
 2019-01-26 05:46:50+08 |                36.2      
 2019-01-26 05:46:55+08 |                37.2      
 2019-01-26 05:47:00+08 |                  36      
 2019-01-26 05:47:05+08 |                38.2      
 2019-01-26 05:47:10+08 |                37.2      
 2019-01-26 05:47:15+08 |                42.8      
 2019-01-26 05:47:20+08 |                  32      
 2019-01-26 05:47:25+08 |                  41      
 2019-01-26 05:47:30+08 |                  44      
 2019-01-26 05:47:35+08 |                37.4      
 2019-01-26 05:47:40+08 |                36.2      
 2019-01-26 05:47:45+08 |                  39      
 2019-01-26 05:47:50+08 |                27.8      
(103 rows)      

10秒统计间隔的SQL

select     
  coalesce(t1.ts,t2.ts) ts,     
  coalesce(avg_active_sessions,0) avg_active_sessions       
from       
(    
select     
  to_timestamp((extract(epoch from ts))::int8/10*10) ts,     
  count(*)/10::float8 avg_active_sessions     
from perf_insight     
group by 1    
) t1      
full outer join       
(    
select     
  generate_series(    
    to_timestamp((extract(epoch from min(ts)))::int8/10*10),    
    to_timestamp((extract(epoch from max(ts)))::int8/10*10),    
    interval '10 s'    
  ) ts     
from perf_insight    
) t2      
on (t1.ts=t2.ts);      
      
      
           ts           | avg_active_sessions       
------------------------+---------------------      
 2019-01-26 05:39:20+08 |                22.3      
 2019-01-26 05:39:30+08 |                38.8      
 2019-01-26 05:39:40+08 |                38.4      
 2019-01-26 05:39:50+08 |                35.1      
 2019-01-26 05:40:00+08 |                39.6      
 2019-01-26 05:40:10+08 |                37.8      
 2019-01-26 05:40:20+08 |                37.9      
 2019-01-26 05:40:30+08 |                36.8      
 2019-01-26 05:40:40+08 |                39.9      
 2019-01-26 05:40:50+08 |                  37      
 2019-01-26 05:41:00+08 |                38.3      
 2019-01-26 05:41:10+08 |                39.4      
 2019-01-26 05:41:20+08 |                38.2      
 2019-01-26 05:41:30+08 |                38.4      
 2019-01-26 05:41:40+08 |                38.1      
 2019-01-26 05:41:50+08 |                36.8      
 2019-01-26 05:42:00+08 |                38.4      
 2019-01-26 05:42:10+08 |                36.6      
 2019-01-26 05:42:20+08 |                36.2      
 2019-01-26 05:42:30+08 |                18.8      
 2019-01-26 05:42:40+08 |                   0      
 2019-01-26 05:42:50+08 |                24.5      
 2019-01-26 05:43:00+08 |                39.9      
 2019-01-26 05:43:10+08 |                40.5      
 2019-01-26 05:43:20+08 |                40.5      
 2019-01-26 05:43:30+08 |                  38      
 2019-01-26 05:43:40+08 |                33.8      
 2019-01-26 05:43:50+08 |                36.3      
 2019-01-26 05:44:00+08 |                35.8      
 2019-01-26 05:44:10+08 |                37.9      
 2019-01-26 05:44:20+08 |                37.5      
 2019-01-26 05:44:30+08 |                39.3      
 2019-01-26 05:44:40+08 |                40.1      
 2019-01-26 05:44:50+08 |                40.1      
 2019-01-26 05:45:00+08 |                41.4      
 2019-01-26 05:45:10+08 |                37.2      
 2019-01-26 05:45:20+08 |                  39      
 2019-01-26 05:45:30+08 |                37.7      
 2019-01-26 05:45:40+08 |                38.8      
 2019-01-26 05:45:50+08 |                38.9      
 2019-01-26 05:46:00+08 |                38.8      
 2019-01-26 05:46:10+08 |                36.2      
 2019-01-26 05:46:20+08 |                41.4      
 2019-01-26 05:46:30+08 |                40.5      
 2019-01-26 05:46:40+08 |                36.9      
 2019-01-26 05:46:50+08 |                36.7      
 2019-01-26 05:47:00+08 |                37.1      
 2019-01-26 05:47:10+08 |                  40      
 2019-01-26 05:47:20+08 |                36.5      
 2019-01-26 05:47:30+08 |                40.7      
 2019-01-26 05:47:40+08 |                37.6      
 2019-01-26 05:47:50+08 |                13.9      
(52 rows)      

2 具体到一个时间段内,是什么问题

例如2019-01-26 05:45:20+08,这个时间区间,性能问题钻取:

1、数据库维度的资源消耗时间占用,判定哪个数据库占用的资源最多

postgres=#     
    
select     
  datname,    
  count(*)/10::float8 cnt     
from perf_insight     
where     
  to_timestamp((extract(epoch from ts))::int8/10*10)   -- 以10秒统计粒度的图形为例    
  ='2019-01-26 05:45:20+08'   -- 问题时间点    
group by 1     
order by cnt desc;      
    
    
 datname  | cnt       
----------+-----      
 postgres |  39      
(1 row)      

2、用户维度的资源消耗时间占用,判定哪个用户占用的资源最多

postgres=#     
    
select     
  usename,    
  count(*)/10::float8 cnt     
from perf_insight     
where     
  to_timestamp((extract(epoch from ts))::int8/10*10)   -- 以10秒统计粒度的图形为例    
  ='2019-01-26 05:45:20+08'   -- 问题时间点    
group by 1     
order by cnt desc;      
    
    
 usename  | cnt       
----------+-----      
 postgres |  39      
(1 row)      

3、等待事件维度的资源消耗时间占用,判定问题集中在哪些等待事件上,可以针对性的优化、加资源。

postgres=#     
    
select     
  coalesce(waiting, 'CPU_TIME') waiting,    
  count(*)/10::float8 cnt     
from perf_insight     
where     
  to_timestamp((extract(epoch from ts))::int8/10*10)   -- 以10秒统计粒度的图形为例    
  ='2019-01-26 05:45:20+08'   -- 问题时间点    
group by 1     
order by cnt desc;      
    
    
         waiting          | cnt        
--------------------------+------      
 CPU_TIME                 | 15.3      
 Client_ClientRead        | 10.6      
 IPC_ProcArrayGroupUpdate |  6.1      
 Lock_transactionid       |  5.4      
 Lock_tuple               |  0.5      
 LWLock_wal_insert        |  0.3      
 LWLock_ProcArrayLock     |  0.2      
 LWLock_buffer_content    |  0.2      
 IPC_ClogGroupUpdate      |  0.2      
 LWLock_lock_manager      |  0.1      
 LWLock_CLogControlLock   |  0.1      
(11 rows)      

4、SQL维度的资源消耗时间占用,判定问题集中在哪些SQL上,可以针对性的优化。

postgres=#     
    
select     
  query,    
  count(*)/10::float8 cnt     
from perf_insight     
where     
  to_timestamp((extract(epoch from ts))::int8/10*10)  -- 以10秒统计粒度的图形为例    
  ='2019-01-26 05:45:20+08'   -- 问题时间点    
group by 1     
order by cnt desc;       
    
                                                 query                                                 | cnt        
-------------------------------------------------------------------------------------------------------+------      
 END;                                                                                                  | 11.5      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | 11.3      
 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;                                  |  6.8      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   |  4.5      
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP); |  2.3      
 SELECT abalance FROM pgbench_accounts WHERE aid = $1;                                                 |  2.1      
 BEGIN;                                                                                                |  0.5      
(7 rows)      

5、单条QUERY在不同等待事件上的资源消耗时间占用,判定问题SQL的突出等待事件,可以针对性的优化、加资源。

postgres=#     
    
select     
  query,     
  coalesce(waiting, 'CPU_TIME') waiting,     
  count(*)/10::float8 cnt     
from perf_insight     
where     
  to_timestamp((extract(epoch from ts))::int8/10*10)  -- 以10秒统计粒度的图形为例    
  ='2019-01-26 05:45:20+08'  -- 问题时间点    
group by 1,2     
order by 1,cnt desc;     
    
    
                                                 query                                                 |         waiting          | cnt       
-------------------------------------------------------------------------------------------------------+--------------------------+-----      
 BEGIN;                                                                                                | Client_ClientRead        | 0.3      
 BEGIN;                                                                                                | CPU_TIME                 | 0.2      
 END;                                                                                                  | CPU_TIME                 | 4.6      
 END;                                                                                                  | IPC_ProcArrayGroupUpdate | 3.7      
 END;                                                                                                  | Client_ClientRead        | 3.1      
 END;                                                                                                  | IPC_ClogGroupUpdate      | 0.1      
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP); | CPU_TIME                 |   1      
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP); | Client_ClientRead        | 0.6      
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP); | IPC_ProcArrayGroupUpdate | 0.6      
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP); | IPC_ClogGroupUpdate      | 0.1      
 SELECT abalance FROM pgbench_accounts WHERE aid = $1;                                                 | CPU_TIME                 | 1.2      
 SELECT abalance FROM pgbench_accounts WHERE aid = $1;                                                 | Client_ClientRead        | 0.6      
 SELECT abalance FROM pgbench_accounts WHERE aid = $1;                                                 | Lock_transactionid       | 0.3      
 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;                                  | CPU_TIME                 | 3.8      
 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;                                  | Client_ClientRead        | 2.9      
 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;                                  | LWLock_wal_insert        | 0.1      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | Lock_transactionid       |   4      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | CPU_TIME                 | 2.5      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | Client_ClientRead        | 2.1      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | IPC_ProcArrayGroupUpdate | 1.7      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | Lock_tuple               | 0.5      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | LWLock_buffer_content    | 0.2      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | LWLock_ProcArrayLock     | 0.2      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | LWLock_wal_insert        | 0.1      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | CPU_TIME                 |   2      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | Lock_transactionid       | 1.1      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | Client_ClientRead        |   1      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | IPC_ProcArrayGroupUpdate | 0.1      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | LWLock_CLogControlLock   | 0.1      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | LWLock_lock_manager      | 0.1      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | LWLock_wal_insert        | 0.1      
(31 rows)      

6、点中单条QUERY,在不同等待事件上的资源消耗时间占用,判定问题SQL的突出等待事件,可以针对性的优化、加资源。

通过4,发现占用最多的是END这条SQL,那么这条SQL的等待时间分布如何?是什么等待引起的?

postgres=#     
    
select     
  coalesce(waiting, 'CPU_TIME') waiting,    
  count(*)/10::float8 cnt     
from perf_insight     
where     
  to_timestamp((extract(epoch from ts))::int8/10*10)   -- 以10秒统计粒度的图形为例    
  ='2019-01-26 05:45:20+08'   -- 问题时间点    
  and query='END;'     
group by 1     
order by cnt desc;      
    
    
         waiting          | cnt       
--------------------------+-----      
 CPU_TIME                 | 4.6      
 IPC_ProcArrayGroupUpdate | 3.7      
 Client_ClientRead        | 3.1      
 IPC_ClogGroupUpdate      | 0.1      
(4 rows)      

3 开启一个可以造成性能问题的压测场景,通过perf insight直接发现问题

1、开启640个并发,读写压测,由于数据量小,并发高,直接导致了ROW LOCK冲突的问题,使用perf insight问题毕现。

pgbench -M prepared -n -r -P 1 -c 640 -j 640 -T 300       
postgres=#     
    
select     
  query,    
  coalesce(waiting, 'CPU_TIME') waiting,    
  count(*)/10::float8 cnt     
from perf_insight     
where     
  to_timestamp((extract(epoch from ts))::int8/10*10)   -- 以10秒统计粒度的图形为例    
  ='2019-01-26 06:38:20+08'   -- 问题时间点    
group by 1,2     
order by 1,cnt desc;    
    
    
                                       query                                                 |         waiting          |  cnt        
-------------------------------------------------------------------------------------------------------+--------------------------+-------      
 BEGIN;                                                                                                | Lock_transactionid       |   0.3      
 BEGIN;                                                                                                | Lock_tuple               |   0.3      
 BEGIN;                                                                                                | LWLock_lock_manager      |   0.1      
 END;                                                                                                  | IPC_ProcArrayGroupUpdate |  29.5      
 END;                                                                                                  | CPU_TIME                 |  14.1      
 END;                                                                                                  | Lock_transactionid       |    13      
 END;                                                                                                  | Client_ClientRead        |   8.4      
 END;                                                                                                  | Lock_tuple               |   8.1      
 END;                                                                                                  | LWLock_lock_manager      |     3      
 END;                                                                                                  | LWLock_ProcArrayLock     |   0.4      
 END;                                                                                                  | LWLock_buffer_content    |   0.3      
 END;                                                                                                  | IPC_ClogGroupUpdate      |   0.1      
 END;                                                                                                  | LWLock_wal_insert        |   0.1      
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP); | IPC_ProcArrayGroupUpdate |   1.3      
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP); | CPU_TIME                 |   0.4      
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP); | Lock_transactionid       |   0.3      
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP); | Lock_tuple               |   0.2      
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP); | Client_ClientRead        |   0.2      
 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES ($1, $2, $3, $4, CURRENT_TIMESTAMP); | LWLock_lock_manager      |   0.1      
 SELECT abalance FROM pgbench_accounts WHERE aid = $1;                                                 | Lock_tuple               |   0.9      
 SELECT abalance FROM pgbench_accounts WHERE aid = $1;                                                 | Lock_transactionid       |   0.9      
 SELECT abalance FROM pgbench_accounts WHERE aid = $1;                                                 | IPC_ProcArrayGroupUpdate |   0.4      
 SELECT abalance FROM pgbench_accounts WHERE aid = $1;                                                 | Client_ClientRead        |   0.3      
 SELECT abalance FROM pgbench_accounts WHERE aid = $1;                                                 | CPU_TIME                 |   0.1      
 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;                                  | Lock_transactionid       |   1.7      
 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;                                  | IPC_ProcArrayGroupUpdate |   1.4      
 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;                                  | Lock_tuple               |   0.9      
 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;                                  | LWLock_lock_manager      |   0.1      
 UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;                                  | CPU_TIME                 |   0.1      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | Lock_transactionid       | 161.5  # 突出问题在这里      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | IPC_ProcArrayGroupUpdate |  27.2      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | Lock_tuple               |  27.2      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | LWLock_lock_manager      |  19.6      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | CPU_TIME                 |  12.3      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | Client_ClientRead        |     4      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | LWLock_buffer_content    |   3.3      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | LWLock_ProcArrayLock     |   0.3      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | LWLock_wal_insert        |   0.1      
 UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;                                  | IPC_ClogGroupUpdate      |   0.1      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | Lock_transactionid       | 178.4  # 突出问题在这里      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | Lock_tuple               |  83.7  # 突出问题在这里      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | CPU_TIME                 |   5.6      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | IPC_ProcArrayGroupUpdate |   5.3      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | LWLock_lock_manager      |   3.8      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | Client_ClientRead        |     2      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | LWLock_ProcArrayLock     |   0.1      
 UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;                                   | LWLock_buffer_content    |   0.1      
(47 rows)      
postgres=#     
    
select     
  coalesce(waiting, 'CPU_TIME') waiting,    
  count(*)/10::float8 cnt     
from perf_insight     
where     
  to_timestamp((extract(epoch from ts))::int8/10*10)  -- 以10秒统计粒度的图形为例    
  ='2019-01-26 06:38:20+08'   -- 问题时间点    
group by 1     
order by cnt desc;    
    
    
         waiting          |  cnt      
--------------------------+-------    
 Lock_transactionid       | 356.1    
 Lock_tuple               | 121.3    
 IPC_ProcArrayGroupUpdate |  65.1    
 CPU_TIME                 |  32.6    
 LWLock_lock_manager      |  26.7    
 Client_ClientRead        |  14.9    
 LWLock_buffer_content    |   3.7    
 LWLock_ProcArrayLock     |   0.8    
 LWLock_wal_insert        |   0.2    
 IPC_ClogGroupUpdate      |   0.2    
(10 rows)    

其他压测场景使用perf insight发现问题的例子

1、批量数据写入,BLOCK extend或wal insert lock瓶颈,或pglz压缩瓶颈。

create table test(id int, info text default repeat(md5(random()::text),1000));    
    
    
vi test.sql    
insert into test(id) select generate_series(1,10);    
    
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 300    
postgres=#     
select     
  to_timestamp((extract(epoch from ts))::int8/10*10) ts,     
  coalesce(waiting, 'CPU_TIME') waiting,     
  count(*)/10::float8 cnt     
from perf_insight     
group by 1,2     
order by 1,cnt desc;    
    
    
           ts           |         waiting          | cnt      
------------------------+--------------------------+------    
 2019-01-26 10:28:50+08 | IO_DataFileExtend        |  0.1    
 2019-01-26 10:29:00+08 | CPU_TIME                 |   50    
 2019-01-26 10:29:00+08 | Lock_extend              | 11.9  -- 扩展数据文件    
 2019-01-26 10:29:00+08 | Client_ClientRead        |  0.3    
 2019-01-26 10:29:00+08 | IO_DataFileExtend        |  0.2    
 2019-01-26 10:29:00+08 | LWLock_lock_manager      |  0.1    
 2019-01-26 10:29:10+08 | CPU_TIME                 | 47.1    
 2019-01-26 10:29:10+08 | Lock_extend              | 13.5    
 2019-01-26 10:29:10+08 | Client_ClientRead        |  0.7    
 2019-01-26 10:29:10+08 | IO_DataFileExtend        |  0.3    
 2019-01-26 10:29:10+08 | LWLock_buffer_content    |  0.2    
 2019-01-26 10:29:10+08 | LWLock_lock_manager      |  0.1    
 2019-01-26 10:29:20+08 | CPU_TIME                 | 54.5    
 2019-01-26 10:29:20+08 | Lock_extend              |  6.7    
 2019-01-26 10:29:20+08 | Client_ClientRead        |  0.2    
 2019-01-26 10:29:20+08 | IO_DataFileExtend        |  0.1    
 2019-01-26 10:29:30+08 | CPU_TIME                 | 61.9  -- CPU,通过perf top来看是 pglz接口的瓶颈(pglz_compress)     
 2019-01-26 10:29:30+08 | Client_ClientRead        |  0.2    
 2019-01-26 10:29:40+08 | CPU_TIME                 | 30.9    
 2019-01-26 10:29:40+08 | LWLock_wal_insert        |  0.2    
 2019-01-26 10:29:40+08 | Client_ClientRead        |  0.1    
(28 rows)    

所以上面这个问题,如果改成不压缩,那么瓶颈就会变成其他的:

alter table test alter COLUMN info set storage external;    
    
    
postgres=# \d+ test    
                                                  Table "public.test"    
 Column |  Type   | Collation | Nullable |               Default               | Storage  | Stats target | Description     
--------+---------+-----------+----------+-------------------------------------+----------+--------------+-------------    
 id     | integer |           |          |                                     | plain    |              |     
 info   | text    |           |          | repeat(md5((random())::text), 1000) | external |              |     

瓶颈就会变成其他的:

 2019-01-26 10:33:50+08 | Lock_extend              | 43.2    
 2019-01-26 10:33:50+08 | LWLock_buffer_content    | 14.8    
 2019-01-26 10:33:50+08 | CPU_TIME                 |  4.6    
 2019-01-26 10:33:50+08 | LWLock_lock_manager      |  0.5    
 2019-01-26 10:33:50+08 | LWLock_wal_insert        |  0.4    
 2019-01-26 10:33:50+08 | IO_DataFileExtend        |  0.4    
 2019-01-26 10:33:50+08 | Client_ClientRead        |  0.1    
 2019-01-26 10:34:00+08 | Lock_extend              | 55.6    
 2019-01-26 10:34:00+08 | LWLock_buffer_content    |  6.3    
 2019-01-26 10:34:00+08 | CPU_TIME                 |  1.2    
 2019-01-26 10:34:00+08 | IO_DataFileExtend        |  0.8    
 2019-01-26 10:34:00+08 | LWLock_wal_insert        |  0.1    
 2019-01-26 10:34:10+08 | Lock_extend              |  6.3    
 2019-01-26 10:34:10+08 | LWLock_buffer_content    |  5.8    
 2019-01-26 10:34:10+08 | CPU_TIME                 |  0.7    

因此治本的方法是提供更好的压缩接口,这也是PG 12的版本正在改进的:

《[未完待续] PostgreSQL 开放压缩接口 与 lz4压缩插件》

《[未完待续] PostgreSQL zstd 压缩算法 插件》

2、秒杀,单条UPDATE。行锁瓶颈。

create table t_hot (id int primary key, cnt int8);    
insert into t_hot values (1,0);    
    
vi test.sql    
update t_hot set cnt=cnt+1 where id=1;    
    
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 300    
    
    
postgres=#     
select     
  to_timestamp((extract(epoch from ts))::int8/10*10) ts,     
  coalesce(waiting, 'CPU_TIME') waiting,     
  count(*)/10::float8 cnt     
from perf_insight     
group by 1,2     
order by 1,cnt desc;    
    
 2019-01-26 10:37:50+08 | Lock_tuple               | 29.6  -- 瓶颈为行锁冲突    
 2019-01-26 10:37:50+08 | LWLock_lock_manager      | 11.4  -- 伴随热点块    
 2019-01-26 10:37:50+08 | LWLock_buffer_content    |  8.4    
 2019-01-26 10:37:50+08 | Lock_transactionid       |  7.6    
 2019-01-26 10:37:50+08 | CPU_TIME                 |  6.5    
 2019-01-26 10:37:50+08 | Client_ClientRead        |  0.2    
 2019-01-26 10:38:00+08 | Lock_tuple               | 29.2  -- 瓶颈为行锁冲突    
 2019-01-26 10:38:00+08 | LWLock_buffer_content    | 15.6  -- 伴随热点块    
 2019-01-26 10:38:00+08 | CPU_TIME                 |  7.9    
 2019-01-26 10:38:00+08 | LWLock_lock_manager      |  7.2    
 2019-01-26 10:38:00+08 | Lock_transactionid       |  3.7    

秒杀的场景,优化方法

《PostgreSQL 秒杀4种方法 - 增加 批量流式加减库存 方法》

《HTAP数据库 PostgreSQL 场景与性能测试之 30 - (OLTP) 秒杀 - 高并发单点更新》

《聊一聊双十一背后的技术 - 不一样的秒杀技术, 裸秒》

《PostgreSQL 秒杀场景优化》

3、未优化SQL,全表扫描filter,CPU time瓶颈。

postgres=# create table t_bad (id int, info text);    
CREATE TABLE    
postgres=# insert into t_bad select generate_series(1,10000), md5(random()::Text);    
INSERT 0 10000    
    
vi test.sql    
\set id random(1,10000)    
select * from t_bad where id=:id;    
    
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 300    

瓶颈

postgres=#     
select     
  to_timestamp((extract(epoch from ts))::int8/10*10) ts,     
  coalesce(waiting, 'CPU_TIME') waiting,     
  count(*)/10::float8 cnt     
from perf_insight     
group by 1,2     
order by 1,cnt desc;    
    
 2019-01-26 10:41:40+08 | CPU_TIME                 | 61.3    
 2019-01-26 10:41:40+08 | Client_ClientRead        |  0.9    
 2019-01-26 10:41:50+08 | CPU_TIME                 | 61.7    
 2019-01-26 10:41:50+08 | Client_ClientRead        |  0.1    
 2019-01-26 10:42:00+08 | CPU_TIME                 | 60.7    
 2019-01-26 10:42:00+08 | Client_ClientRead        |  0.5    

perf insight 的基准线

如果要设置一个基准线,用于报警。那么:

1、基准线跟QPS没什么关系。

2、基准线跟avg active sessions有莫大关系。avg active sessions大于实例CPU核数时,说明有性能问题。

perf insight 不是万能的

perf insight 发现当时的问题是非常迅速的。

神医华佗说,不治已病治未病才是最高境界,perf insight实际上是发现已病,而未病是发现不了的。

未病还是需要对引擎的深刻理解和丰富的经验积累。

例如:

1、年龄

2、FREEZE风暴

3、sequence耗尽

4、索引推荐

5、膨胀

6、安全风险

7、不合理索引

8、增长趋势

9、碎片

10、分区建议

11、冷热分离建议

12、TOP SQL诊断与优化

13、扩容(容量、计算资源、IO、内存...)建议

14、分片建议

15、架构优化建议

等。

除此之外,perf insight对于这类情况也是发现不了的:

1、long query (waiting (ddl, block one session)),当long query比较少,总体avg active session低于基准水位时,实际上long query的问题就无法暴露。

然而long query是有一些潜在问题的,例如可能导致膨胀。

perf insight + 经验型监控、诊断,可以使得你的数据库监测系统更加强壮。

其他知识点、内核需改进点

1、会话ID,使用backend的启动时间,backend pid两者结合,就可以作为PG数据库的唯一session id。

有了session id,就可以基于SESSION维度进行性能诊断和可视化展示。

select extract(epoch from backend_start)||'.'||pid as sessid     
from pg_stat_activity ;    
    
         sessid             
------------------------    
 1547978042.41326.13447    
 1547978042.41407.13450    

2、对于未使用绑定变量的SQL,要做SQL层的统计透视,就会比较悲剧了,因为只要输入的变量不同在pg_stat_activity的query

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
4月前
|
缓存 关系型数据库 数据库
PostgreSQL性能
【8月更文挑战第26天】PostgreSQL性能
76 1
|
2月前
|
SQL Oracle 关系型数据库
Oracle SQL:了解执行计划和性能调优
Oracle SQL:了解执行计划和性能调优
76 1
|
3月前
|
Oracle NoSQL 关系型数据库
主流数据库对比:MySQL、PostgreSQL、Oracle和Redis的优缺点分析
主流数据库对比:MySQL、PostgreSQL、Oracle和Redis的优缺点分析
661 2
|
3月前
|
缓存 关系型数据库 数据库
如何优化 PostgreSQL 数据库性能?
如何优化 PostgreSQL 数据库性能?
162 2
|
2月前
|
存储 关系型数据库 MySQL
四种数据库对比MySQL、PostgreSQL、ClickHouse、MongoDB——特点、性能、扩展性、安全性、适用场景
四种数据库对比 MySQL、PostgreSQL、ClickHouse、MongoDB——特点、性能、扩展性、安全性、适用场景
|
3月前
|
缓存 关系型数据库 数据库
PostgreSQL的性能
PostgreSQL的性能
196 2
|
4月前
|
缓存 关系型数据库 数据库
PostgreSQL 查询性能
【8月更文挑战第5天】PostgreSQL 查询性能
88 8
|
4月前
|
关系型数据库 Java 数据库
PostgreSQL性能
【8月更文挑战第5天】PostgreSQL性能
137 7
|
4月前
|
监控 关系型数据库 数据库
如何优化PostgreSQL的性能?
【8月更文挑战第4天】如何优化PostgreSQL的性能?
282 7
|
4月前
|
监控 Oracle 关系型数据库
"深度剖析:Oracle SGA大小调整策略——从组件解析到动态优化,打造高效数据库性能"
【8月更文挑战第9天】在Oracle数据库性能优化中,系统全局区(SGA)的大小调整至关重要。SGA作为一组共享内存区域,直接影响数据库处理能力和响应速度。本文通过问答形式介绍SGA调整策略:包括SGA的组成(如数据缓冲区、共享池等),如何根据负载与物理内存确定初始大小,手动调整SGA的方法(如使用`ALTER SYSTEM`命令),以及利用自动内存管理(AMM)特性实现智能调整。调整过程中需注意监控与测试,确保稳定性和性能。
393 2

推荐镜像

更多