需求:
数据库中过去某个时间发生enq: TX - row lock contention等待,处理完恢复正常后。
开发人员往往提出是否可以找出历史操作,哪个机器、哪个用户、执行了哪条具体SQL的操作未及时提交,引发该问题。
本文档有效的前提条件:数据库在故障发生前已打开最小日志(MIN为YES)
SELECT supplemental_log_data_min min,
supplemental_log_data_pk pk,
supplemental_log_data_ui ui,
supplemental_log_data_fk fk,
supplemental_log_data_all allc
FROM v$database;
MIN PK UI FK ALL
-------- --- --- --- ---
YES NO NO NO NO
附:
----打开最小附加日志
alter database drop supplemental log data;
----关闭最小附加日志
alter database add supplemental log data;
DROP TABLE tx_eg;
CREATE TABLE tx_eg ( num number, txt varchar2(10), sex varchar2(10) ) INITRANS 1 MAXTRANS 1;
INSERT into tx_eg VALUES ( 1, 'First','FEMALE' );
INSERT into tx_eg VALUES ( 2, 'Second','MALE' );
INSERT into tx_eg VALUES ( 3, 'Third','MALE' );
INSERT into tx_eg VALUES ( 4, 'Fourth','MALE' );
INSERT into tx_eg VALUES ( 5, 'Fifth','MALE' );
COMMIT;
--Ses#1:
UPDATE tx_eg SET txt='Garbage' WHERE num=1;
--Ses#2:
UPDATE tx_eg SET txt='Garbage' WHERE num=1;
--DBA:
SELECT sid,type,id1,id2,lmode,request FROM v$lock WHERE type='TX';
1. 在发生等待的时间段,创建临时表m_ash保存ash采样数据
create table m_ash as select * from dba_hist_active_sess_history where SAMPLE_TIME between TO_TIMESTAMP ('2022-02-10 18:00:00',
'YYYY-MM-DD HH24:MI:SS') and TO_TIMESTAMP ('2022-02-10 19:00:00', 'YYYY-MM-DD HH24:MI:SS');
set line 300
select min(sample_time),max(sample_time) from m_ash;
MIN(SAMPLE_TIME) MAX(SAMPLE_TIME)
-------------------------------------------------- --------------------------------------------------
2022/2/10 18:24:04.019 2022/2/10 18:59:57.624
已选择 1 行。
2. 查看m_ash表中等待'enq: TX - row lock contention'事件的session
alter session set nls_date_format = 'yyyy-mm-dd hh24:mi:ss';
set line 400
col SAMPLE_TIME for a30
col EVENT for a30
col machine for a15
select /*+ parallel 8 */
level lv,
connect_by_isleaf isleaf,
t.sample_time,
t.sample_id,
t.instance_number,
t.session_id,
t.session_serial#,
t.machine,
t.sql_id,
t.SQL_EXEC_ID,
t.sql_exec_start,
t.session_type,
t.event,
t.session_state,
t.blocking_session,
t.blocking_session_serial# serial#,
t.blocking_session_status b_s_status,
t.blocking_inst_id INST_ID,
t.current_obj#,t.current_file#,t.current_block#,t.current_row#
from m_ash t
start with event = 'enq: TX - row lock contention'
connect by nocycle prior blocking_session = session_id
and prior sample_id = sample_id;
LV ISLEAF SAMPLE_TIME SAMPLE_ID INSTANCE_NUMBER SESSION_ID SESSION_SERIAL# MACHINE SQL_ID SQL_EXEC_ID SQL_EXEC_START SESSION_TYPE EVENT SESSION_STATE BLOCKING_SESSION SERIAL# B_S_STATUS INST_ID CURRENT_OBJ# CURRENT_FILE# CURRENT_BLOCK# CURRENT_ROW#
---------- ---------- ------------------------------ ---------- --------------- ---------- --------------- --------------- ------------- ----------- --------------------- ------------ ------------------------------ ------------- ---------------- ---------- ----------- ---------- ------------ ------------- -------------- ------------
1 1 2022/2/10 18:24:04.019 29795835 1 1136 57577 dgvxl6632 1ssx3w9sj0sub 16777217 2022-02-10 18:24:02 FOREGROUND enq: TX - row lock contention WAITING 1139 58717 VALID 1 138476 10 8069 0
1 1 2022/2/10 18:24:14.029 29795845 1 1136 57577 dgvxl6632 1ssx3w9sj0sub 16777217 2022-02-10 18:24:02 FOREGROUND enq: TX - row lock contention WAITING 1139 58717 VALID 1 138476 10 8069 0
1 1 2022/2/10 18:24:24.039 29795855 1 1136 57577 dgvxl6632 1ssx3w9sj0sub 16777217 2022-02-10 18:24:02 FOREGROUND enq: TX - row lock contention WAITING 1139 58717 VALID 1 138476 10 8069 0
.........
.........
1 1 2022/2/10 18:24:24.039 29795855 1 2839 30913 dgvxl6632 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 FOREGROUND enq: TX - row lock contention WAITING 1139 58717 VALID 1 138476 10 8069 0
1 1 2022/2/10 18:24:34.079 29795865 1 2839 30913 dgvxl6632 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 FOREGROUND enq: TX - row lock contention WAITING 1139 58717 VALID 1 138476 10 8069 0
1 1 2022/2/10 18:24:44.089 29795875 1 2839 30913 dgvxl6632 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 FOREGROUND enq: TX - row lock contention WAITING 1139 58717 VALID 1 138476 10 8069 0
1 1 2022/2/10 18:24:54.099 29795885 1 2839 30913 dgvxl6632 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 FOREGROUND enq: TX - row lock contention WAITING 1139 58717 VALID 1 138476 10 8069 0
1 1 2022/2/10 18:25:04.125 29795895 1 2839 30913 dgvxl6632 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 FOREGROUND enq: TX - row lock contention WAITING 1139 58717 VALID 1 138476 10 8069 0
1 1 2022/2/10 18:25:14.135 29795905 1 2839 30913 dgvxl6632 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 FOREGROUND enq: TX - row lock contention WAITING 1139 58717 VALID 1 138476 10 8069 0
session 2839 当时正在执行的语句
set line 200
select sql_id,sql_text from dba_hist_sqltext where sql_id='1ssx3w9sj0sub';
SQL_ID SQL_TEXT
------------- ------------------------------------------------------------------------------------
1ssx3w9sj0sub UPDATE tx_eg SET txt='Garbage' WHERE num=1
已选择 1 行。
查看session 1139 是否有被采样
select * from m_ash where SESSION_ID=1139;
未选择任何行。
分析结果
引发1ssx3w9sj0sub产生enq: TX - row lock contention等待的是session 1139,
但session 1139是个空闲的session因此没有采样,session 1139可能在很早前执行了某一个SQL并且没有提交,接着空闲,这样就会造成enq: TX - row lock contention等待。
但oracle并没有一个视图来记载某个session历史执行的所有SQL,为了分析该问题,我们需要挖掘archive或者online redo。
3. 找出session 2839等待的事务信息
set line 400
col SAMPLE_TIME for a30
col EVENT for a30
col machine for a15
select /*+ parallel 8 */
t.sample_time,
t.session_id,
t.sql_id,
t.SQL_EXEC_ID,
t.sql_exec_start,
t.event,
t.p1,
t.p2,
t.p3,
trunc(t.p2 / power(2, 16)) xidusn,
bitand(t.p2, to_number('ffff', 'xxxx')) xidslot,
t.p3 xidsqn
from m_ash t
where session_id = 2839
order by sample_time;
SAMPLE_TIME SESSION_ID SQL_ID SQL_EXEC_ID SQL_EXEC_START EVENT P1 P2 P3 XIDUSN XIDSLOT XIDSQN
------------------------------ ---------- ------------- ----------- --------------------- ------------------------------ ---------- ---------- ---------- ---------- ---------- ----------
2022/2/10 18:24:24.039 2839 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 enq: TX - row lock contention 1415053318 131105 98064 2 33 98064
2022/2/10 18:24:34.079 2839 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 enq: TX - row lock contention 1415053318 131105 98064 2 33 98064
2022/2/10 18:24:44.089 2839 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 enq: TX - row lock contention 1415053318 131105 98064 2 33 98064
2022/2/10 18:24:54.099 2839 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 enq: TX - row lock contention 1415053318 131105 98064 2 33 98064
2022/2/10 18:25:04.125 2839 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 enq: TX - row lock contention 1415053318 131105 98064 2 33 98064
2022/2/10 18:25:14.135 2839 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 enq: TX - row lock contention 1415053318 131105 98064 2 33 98064
2022/2/10 18:25:24.145 2839 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 enq: TX - row lock contention 1415053318 131105 98064 2 33 98064
2022/2/10 18:25:34.195 2839 1ssx3w9sj0sub 16777218 2022-02-10 18:24:16 enq: TX - row lock contention 1415053318 131105 98064 2 33 98064
4. 通过查询v$archived_log找出问题发生前的归档日志,倒序排列
set line 400
col name for a70
select name, first_time,next_time
from v$archived_log
where next_time <= to_date('2022-02-10 19:40:00', 'yyyy-mm-dd hh24:mi:ss') and standby_dest='NO'
order by next_time desc;
NAME FIRST_TIME NEXT_TIME
---------------------------------------------------------------------- --------------------- ---------------------
/data01/arch01/orcl/1_18063_1066151367.arc 2022-02-10 18:51:01 2022-02-10 19:21:02
/data01/arch01/orcl/1_18062_1066151367.arc 2022-02-10 18:21:00 2022-02-10 18:51:01
/data01/arch01/orcl/1_18061_1066151367.arc 2022-02-10 17:51:02 2022-02-10 18:21:00
/data01/arch01/orcl/1_18060_1066151367.arc 2022-02-10 17:21:01 2022-02-10 17:51:02
/data01/arch01/orcl/1_18059_1066151367.arc 2022-02-10 16:51:01 2022-02-10 17:21:01
/data01/arch01/orcl/1_18058_1066151367.arc 2022-02-10 16:21:00 2022-02-10 16:51:01
/data01/arch01/orcl/1_18057_1066151367.arc 2022-02-10 15:51:02 2022-02-10 16:21:00
/data01/arch01/orcl/1_18056_1066151367.arc 2022-02-10 15:21:01 2022-02-10 15:51:02
/data01/arch01/orcl/1_18055_1066151367.arc 2022-02-10 14:51:00 2022-02-10 15:21:01
/data01/arch01/orcl/1_18054_1066151367.arc 2022-02-10 14:21:02 2022-02-10 14:51:00
5. 此时将日志按顺序添加到logminer进行分析,注意这里要按时间先后顺序。
begin
sys.dbms_logmnr.add_logfile(logfilename => '/data01/arch01/orcl/1_18057_1066151367.arc', ----归档日志路径
options => sys.dbms_logmnr.new); ----第一个日志要用new
sys.dbms_logmnr.add_logfile(logfilename => '/data01/arch01/orcl/1_18058_1066151367.arc',
options => sys.dbms_logmnr.addfile); ----之后的用addfile
sys.dbms_logmnr.add_logfile(logfilename => '/data01/arch01/orcl/1_18059_1066151367.arc',
options => sys.dbms_logmnr.addfile);
sys.dbms_logmnr.add_logfile(logfilename => '/data01/arch01/orcl/1_18060_1066151367.arc',
options => sys.dbms_logmnr.addfile);
sys.dbms_logmnr.add_logfile(logfilename => '/data01/arch01/orcl/1_18061_1066151367.arc',
options => sys.dbms_logmnr.addfile);
sys.dbms_logmnr.add_logfile(logfilename => '/data01/arch01/orcl/1_18062_1066151367.arc',
options => sys.dbms_logmnr.addfile);
end;
可以用下述方法查询加进来分析的日志包含的SCN范围和日期范围
select low_time,high_time,low_scn,next_scn from v$logmnr_logs;
LOW_TIME HIGH_TIME LOW_SCN NEXT_SCN
--------------------- --------------------- ---------- ----------
2022-02-10 16:51:01 2022-02-10 17:21:01 7.0395E+12 7.0395E+12
2022-02-10 17:21:01 2022-02-10 17:51:02 7.0395E+12 7.0395E+12
2022-02-10 17:51:02 2022-02-10 18:21:00 7.0395E+12 7.0395E+12
已选择 3 行。
6. 调用dbms_logmnr系统包,启动Log Miner开始挖掘日志:
begin
sys.dbms_logmnr.start_logmnr(options=>dbms_logmnr.dict_from_online_catalog);
end;
7. 从v$logmnr_contents表中,查看是否包含此事务的SQL语句:
set line 400
col MACHINE_NAME for a15
col OPERATION for a10
col SESSION_INFO for a60
col SQL_REDO for a100
select scn,timestamp,session#,serial#,session_info,operation,sql_redo,xidusn,xidslt,xidsqn,xid,row_id,machine_name
from v$logmnr_contents
where xidusn = 2
and xidslt = 33
and xidsqn = 98064;
SCN TIMESTAMP SESSION# SERIAL# SESSION_INFO OPERATION SQL_REDO XIDUSN XIDSLT XIDSQN XID ROW_ID MACHINE_NAME
---------- --------------------- ---------- ---------- ------------------------------------------------------------ ---------- ---------------------------------------------------------------------------------------------------- ---------- ---------- ---------- ---------------- ------------------ ---------------
7.0395E+12 2022-02-10 18:23:56 1139 58717 login_username=QM client_info= OS_username=oracle Machine_na START set transaction read write; 2 33 98064 02002100107F0100 AAAhzsAAAAAAAAAAAA dgvxl6632
me=dgvxl6632 OS_terminal=pts/1 OS_process_id=14052 OS_progra
m_name=sqlplus@dgvxl6632 (TNS V1-V3)
7.0395E+12 2022-02-10 18:23:56 1139 58717 login_username=QM client_info= OS_username=oracle Machine_na UPDATE update "QM"."TX_EG" set "TXT" = 'Garbage' where "TXT" = 'First' and ROWID = 'AAAhzsAAKAAAB+FAAA'; 2 33 98064 02002100107F0100 AAAhzsAAKAAAB+FAAA dgvxl6632
me=dgvxl6632 OS_terminal=pts/1 OS_process_id=14052 OS_progra
m_name=sqlplus@dgvxl6632 (TNS V1-V3)
已选择 2 行。
8. 最后,结束分析
begin
sys.dbms_logmnr.end_logmnr;
end;
注意:
可能需要尝试很多日志才能找到该SQL,因为不知道具体该holder在什么时间发起的事务。
您可以每次尝试一个归档,然后按时间顺序向前推溯。
注意找到的SQL并不是原始执行的SQL,而是变成rowid的形式。
9. 总结:
- 从ASH dump可知session 2839 和session 1136 从2022-02-10 18:24:16后一直等待enq: TX - row lock contention,其holder为session 1139
- 根据等待的p1 p2 p3可以得知holder session 1139 的transaction的XIDUSN XIDSLOT XIDSQN分别为 2 33 98064
- 根据XIDUSN XIDSLOT XIDSQN使用logminer找出2022-02-10 18:24:16之前的archive log进行挖掘,
可知session 1139 (holder,machine=dgvxl6632)在2022-02-10 18:23:56发起了一个transaction执行了如下SQL,但是未提交(未见commit):update "QM"."TX_EG" set "TXT" = 'Garbage' where "TXT" = 'First' and ROWID = 'AAAhzsAAKAAAB+FAAA';
- 再结合waiter session (session 2839 )的执行语句 1ssx3w9sj0sub:
UPDATE tx_eg SET txt='Garbage' WHERE num=1
可知update 表 tx_eg 时,如果更新的记录重复则会产生enq: TX - row lock contention。
参考文档:
Waits for 'Enq: Tx - Row Lock Contention' - Wait Scenario Examples (Doc ID 62354.1)
Analyzing Locking Issues with LOGMINER (Doc ID 198828.1)