[20180305]手工模拟buffer busy wait.txt
--//一般出现buffer busy wait原因,主要是对热块,大量dml操作.
--//一种提法:oracle读不会阻塞写,写不会阻塞读,实际上写一定程度会阻塞读,只不过时间很短罢了.
--//以前vage提到过一种特殊情况导致buffer busy wait,那就是写日志缓慢,也会导致这种情况出现,昨天看
--//http://www.askmaclean.com/archives/why-slow-redo-write-cause-buffer-busy-wait.html
--//自己也重新测试看看:
1.环境:
SCOTT@book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
2.测试:
SCOTT@book> @ &r/s
SCOTT@book(41,2229)> @ &r/spid
SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50
---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------
41 2229 11184 DEDICATED 11185 27 241 alter system kill session '41,2229' immediate;
--//进程号=11185
SCOTT@book(41,2229)> create table deptx as select * from dept;
Table created.
SCOTT@book(41,2229)> select rowid,deptx.* from deptx where rownum=1;
ROWID DEPTNO DNAME LOC
------------------ ---------- -------------- -------------
AAAWGMAAEAAAAIrAAA 10 ACCOUNTING NEW YORK
SCOTT@book(41,2229)> @ &r/rowid AAAWGMAAEAAAAIrAAA
OBJECT FILE BLOCK ROW ROWID_DBA DBA TEXT
---------- ---------- ---------- ---------- -------------------- -------------------- ----------------------------------------
90508 4 555 0 0x100022B 4,555 alter system dump datafile 4 block 555 ;
--//终端:
gdb $(which oracle) 11185
...
(gdb) b kcrf_commit_force
Breakpoint 1 at 0x96cf4e4
--//session 1:
SCOTT@book(41,2229)> update deptx set dname=lower(dname) where deptno=10;
1 row updated.
SCOTT@book(41,2229)> commit ;
--//挂起!!
--//终端界面:
Breakpoint 1, 0x00000000096cf4e4 in kcrf_commit_force ()
--//session 2:
SCOTT@book> @ &r/s
SCOTT@book(54,1201)> @ &r/spid
SID SERIAL# PROCESS SERVER SPID PID P_SERIAL# C50
---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------
54 1201 11208 DEDICATED 11209 28 38 alter system kill session '54,1201' immediate;
--//进程号=11209
SCOTT@book(54,1201)> @ &r/wait
no rows selected
SCOTT@book(54,1201)> select * from deptx;
--//挂起!!
--//session 3:
SYS@book> @ &r/wait
P1RAW P2RAW P3RAW P1 P2 P3 SID SERIAL# SEQ# EVENT STATUS STATE WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00 1650815232 1 0 41 2229 85 SQL*Net message from client ACTIVE WAITED KNOWN TIME 6655081 21 Idle
0000000000000004 000000000000022B 0000000000000001 4 555 1 54 1201 51 buffer busy waits ACTIVE WAITING 3121034 3 Concurrency
SYS@book> @ &r/ev_name 'buffer busy waits'
EVENT# EVENT_ID NAME PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS
---------- ---------- ---------------------------------------- -------------------- -------------------- -------------------- ------------- ----------- --------------------
95 2161531084 buffer busy waits file# block# class# 3875070507 4 Concurrency
--//sid=54出现buffer busy waits,奇怪的是sid=41,出现的是SQL*Net message from client.
--//你可以发现buffer busy waits 的P1,P2参数与前面查询访问的块号一致dba=4,555.
--//终端界面:
(gdb) bt
#0 0x00000000096cf4e4 in kcrf_commit_force ()
#1 0x00000000096cb7bf in kcrfw_redo_gen ()
#2 0x0000000000e3d60b in kcb_commit_main ()
#3 0x0000000000e3c949 in kcb_commit ()
#4 0x0000000000f6412f in ktiCommitCleanout ()
#5 0x0000000000f63806 in kticmt ()
#6 0x0000000000f1e6dc in ktucmt ()
#7 0x00000000094facf5 in ktcCommitTxn ()
#8 0x000000000304023c in ktdcmt ()
#9 0x000000000210f400 in k2lcom ()
#10 0x0000000002128f65 in k2send ()
#11 0x0000000000f05073 in xctctl ()
#12 0x0000000000f03392 in xctCommitTxn ()
#13 0x0000000001ed651b in kksExecuteCommand ()
#14 0x00000000095c8398 in opiexe ()
#15 0x0000000001ba5992 in kpoal8 ()
#16 0x00000000095bbdad in opiodr ()
#17 0x00000000097a629f in ttcpip ()
#18 0x000000000186470e in opitsk ()
#19 0x0000000001869235 in opiino ()
#20 0x00000000095bbdad in opiodr ()
#21 0x00000000018607ac in opidrv ()
#22 0x0000000001e3a48f in sou2o ()
#23 0x0000000000a29265 in opimai_real ()
#24 0x0000000001e407ad in ssthrdmain ()
#25 0x0000000000a291d1 in main ()
--//如果在一个块内大量的修改与查询,就很容易出现buffer busy waits,我这里在commit设置断点,很明显
--//要写redo 缓存到redo文件.如果在应用中出现log file sync+buffer busy wait优先解决log file sync.
--//当然一些dml的热块也要注意.
--//转载:http://www.askmaclean.com/archives/why-slow-redo-write-cause-buffer-busy-wait.html
3.简单来说这个示例说明了几点:
OLTP类型的小DML操作一般都会是immediate block cleanout的,这要求在commit之前对block做change kcbchg
在commit kcrf_commit_force完成前都不会释放对该block buffer的buffer pin
由上述2点造成的buffer pin最终会影响select和其他insert/update/delete 形成buffer busy wait
由于慢的lgwr写redo log会造成 kcrf_commit_force commit的缓慢,表现在等待事件上就是log file sync
由于block cleanout时pin block buffer且commit 慢,则会导致更长时间的buffer busy wait
若log file sync是由lgwr 写redo log慢(log file parallel write)引起的,则它的另一个效应就是buffer busy wait增多
若看到AWR中log file sync+buffer busy wait是主要等待事件,则优先解决log file sync ,因为buffer busy wait实际可能是受害者
AWR中与commit cleanout相关的 Instance activity 有好几个
commit cleanout failures: block lost
commit cleanout failures: buffer being written
commit cleanout failures: callback failure
commit cleanout failures: cannot pin
commit cleanouts
commit cleanouts successfully completed
4.补充:
--//以上测试等一会:
--//session 3:
SYS@book> @ &r/wait
P1RAW P2RAW P3RAW P1 P2 P3 SID SERIAL# SEQ# EVENT STATUS STATE WAIT_TIME_MICRO SECONDS_IN_WAIT WAIT_CLASS
---------------- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------------------------------- -------- ------------------- --------------- --------------- --------------------
0000000062657100 0000000000000001 00 1650815232 1 0 41 2229 85 SQL*Net message from client ACTIVE WAITED KNOWN TIME 6655081 255 Idle
0000000000000004 000000000000022B 0000000000000001 4 555 1 54 1201 51 buffer busy waits ACTIVE WAITING 237183753 237 Concurrency
0000000000000003 0000000000000090 0000000000000013 3 144 19 222 1 25198 buffer busy waits ACTIVE WAITING 225469340 225 Concurrency
SYS@book> @ &r/ev_name 'buffer busy waits'
EVENT# EVENT_ID NAME PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS
---------- ---------- ---------------------------------------- -------------------- -------------------- -------------------- ------------- ----------- --------------------
95 2161531084 buffer busy waits file# block# class# 3875070507 4 Concurrency
--//又出现一个会话阻塞.buffer busy waits的P1,P2的参数是file#,block#.
SYS@book> @ &r/bh 3 144
HLADDR DBARFIL DBABLK CLASS CLASS_TYPE STATE TCH CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL CR_UBA_BLK CR_UBA_SEQ BA OBJECT_NAME
---------------- ------- ----------- ----- ------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- --------------------
0000000084F763C0 3 144 19 xcur 173 0 0 0 0 0 000000007A33C000
SYS@book> @ &r/bh 4 555
HLADDR DBARFIL DBABLK CLASS CLASS_TYPE STATE TCH CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL CR_UBA_BLK CR_UBA_SEQ BA OBJECT_NAME
---------------- ------- ----------- ----- ------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- --------------------
0000000084EB03C8 4 555 1 data block xcur 2 0 0 0 0 0 00000000648CE000 DEPTX
0000000084EB03C8 4 555 1 data block cr 2 392461831 3 0 0 0 00000000648D0000 DEPTX
0000000084EB03C8 4 555 1 data block cr 1 392461277 3 0 0 0 00000000733B2000 DEPTX
0000000084EB03C8 4 555 1 data block free 0 0 0 0 0 0 0000000070730000
--//HLADDR 不一样,不知道为什么出现buffer busy waits sid=222.
SYS@book> select PROGRAM,sid,serial#,sql_id from v$session where sid=222;
PROGRAM SID SERIAL# SQL_ID
-------------------------- ---------- ---------- -------------
oracle@gxqyydg4 (MMON) 222 1
SYS@book> select PROGRAM,sid,serial#,sql_id,PREV_SQL_ID from v$session where sid=222;
PROGRAM SID SERIAL# SQL_ID PREV_SQL_ID
-------------------------- ---------- ---------- ------------- -------------
oracle@gxqyydg4 (MMON) 222 1 fsbqktj5vw6n9
SYS@book> @ &r/sql_id fsbqktj5vw6n9
old 1: select sql_id,sql_fulltext sqltext from gv$sqlarea where sql_id='&1'
new 1: select sql_id,sql_fulltext sqltext from gv$sqlarea where sql_id='fsbqktj5vw6n9'
SQL_ID SQLTEXT
------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
fsbqktj5vw6n9 select next_run_date, obj#, run_job, sch_job from (select decode(bitand(a.flags, 16384), 0, a.next_run_date, a.last_enabled_time) next_run_date, a.obj# obj#, decode(bitand(a.flags
, 16384), 0, 0, 1) run_job, a.sch_job sch_job from (select p.obj# obj#, p.flags flags, p.next_run_date next_run_date, p.job_status job_status, p.class_oid class_oid, p.last_enabled_time l
ast_enabled_time, p.instance_id instance_id, 1 sch_job from sys.scheduler$_job p where bitand(p.job_status, 3) = 1 and ((bitand(p.flags, 134217728 + 268435456) = 0) or (bitand(p.jo
b_status, 1024) <> 0)) and bitand(p.flags, 4096) = 0 and p.instance_id is NULL and (p.class_oid is null or (p.class_oid is not null and p.class_oid in (select b.obj# from sys.schedu
ler$_class b where b.affinity is null))) UNION ALL select q.obj#, q.flags, q.next_run_date, q.job_status, q.class_oid, q.last_enabled_time, q.instance_id, 1 from sy
s.scheduler$_lightweight_job q where bitand(q.job_status, 3) = 1 and ((bitand(q.flags, 134217728 + 268435456) = 0) or (bitand(q.job_status, 1024) <> 0)) and bitand(q.flags, 4096) = 0
and q.instance_id is NULL and (q.class_oid is null or (q.class_oid is not null and q.class_oid in (select c.obj# from sys.scheduler$_class c where c.affinity is
null))) UNION ALL select j.job, 0, from_tz(cast(j.next_date as timestamp), to_char(systimestamp,'TZH:TZM')), 1, NULL, from_tz(cast(j.next_date as timestamp),to_char(systimestamp,'TZH:TZ
M')), NULL, 0 from sys.job$ j where (j.field1 is null or j.field1 = 0) and j.this_date is null) a order by 1) where rownum = 1
--//放弃,不知道为什么出现后续的buffer busy waits的等待事件.
--//在终端界面上执行:
(gdb) c
Continuing.
Breakpoint 1, 0x00000000096cf4e4 in kcrf_commit_force ()
(gdb) c
Continuing.
--//session 1:
SCOTT@book(41,2229)> commit ;
Commit complete.
--//提交成功.
--//session 2:
SCOTT@book(54,1201)> select * from deptx;
DEPTNO DNAME LOC
---------- -------------- -------------
10 accounting NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
--//session 3:
SYS@book> @ &r/wait
no rows selected