[20170104]ORA-08103错误的模拟和诊断.txt
参考连接http://www.oratea.com/2016/12/10/ora-08103%e9%94%99%e8%af%af%e7%9a%84%e6%a8%a1%e6%8b%9f%e5%92%8c%e8%af%8a%e6%96%ad/,重复测试:
$ oerr ora 8103
08103, 00000, "object no longer exists"
// *Cause: The object has been deleted by another user since the operation
// began, or a prior incomplete recovery restored the database to
// a point in time during the deletion of the object.
// *Action: Delete the object if this is the result of an incomplete
// recovery.
在 Master Note for Handling Oracle Database Corruption Issues (文档 ID 1088018.1)中对ORA-8103错误的描述如下:
The object has been deleted by another user since the operation began.
If the error is reproducible, following may be the reasons:-
a.) The header block has an invalid block type.
b.) The data_object_id (seg/obj) stored in the block is different than the data_object_id stored in the segment header.
See dba_objects.data_object_id and compare it to the decimal value stored in the block (field seg/obj).
文档解释为:ORA-8103错误如果只发生1次,那表明该会话操作前,对象被其它会话删除了。
如果ORA-8103错误重复发生,可能有两个原因:
a) 段头是坏块
b) 数据块上存储的data_object_id和段头上存储的data_object_id不一致。
1.环境:
SCOTT@book> @ &r/ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
SCOTT@book> create table t1 tablespace users as select * from all_objects;
Table created.
SCOTT@book> select object_id,data_object_id from dba_objects where object_name = 'T1' and owner = user;
OBJECT_ID DATA_OBJECT_ID
---------- --------------
89097 89097
--89097 = 0x15c09
SCOTT@book> select segment_type, HEADER_FILE, HEADER_BLOCK from dba_segments where owner = user and segment_name = 'T1';
SEGMENT_TYPE HEADER_FILE HEADER_BLOCK
------------------ ----------- ------------
TABLE 4 522
SCOTT@book> select FILE_ID, block_id, blocks from dba_extents where owner = user and segment_name = 'T1';
FILE_ID BLOCK_ID BLOCKS
---------- ---------- ----------
4 520 8
4 528 8
4 536 8
4 544 8
4 552 8
4 560 8
4 568 8
4 576 8
4 584 8
4 592 8
4 600 8
4 608 8
4 616 8
4 624 8
4 632 8
4 640 8
4 768 128
4 896 128
4 1024 128
4 1152 128
4 1280 128
4 1408 128
4 1536 128
4 1664 128
4 1792 128
25 rows selected.
2.模拟段头损坏:
SCOTT@book> alter system checkpoint ;
System altered.
BBED> set dba 4,522
DBA 0x0100020a (16777738 4,522)
BBED> dump /v count 32
File: /mnt/ramdisk/book/users01.dbf (4)
Block: 522 Offsets: 0 to 31 Dba:0x0100020a
-----------------------------------------------------------------------------------------------------------
23a20000 0a020001 cf113400 00000104 35f20000 00000000 00000000 00000000 l #.........4.....5...............
<32 bytes per line>
BBED> m /x 24a2 offset 0
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
File: /mnt/ramdisk/book/users01.dbf (4)
Block: 522 Offsets: 0 to 31 Dba:0x0100020a
---------------------------------------------------------------------------------------------------
24a20000 0a020001 cf113400 00000104 35f20000 00000000 00000000 00000000
<64 bytes per line>
SCOTT@book> alter system flush buffer_cache;
System altered.
SCOTT@book> select count(*) from t1;
select count(*) from t1
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 522)
ORA-01110: data file 4: '/mnt/ramdisk/book/users01.dbf'
--//如果段头为坏块,则读取直接报坏块。
3.修改段头上的data_object_id:
--//还原
BBED> m /x 23a2 offset 0
File: /mnt/ramdisk/book/users01.dbf (4)
Block: 522 Offsets: 0 to 31 Dba:0x0100020a
---------------------------------------------------------------------------------------------------
23a20000 0a020001 cf113400 00000104 32f20000 00000000 00000000 00000000
<64 bytes per line>
BBED> sum apply
Check value for File 4, Block 522:
current = 0xf235, required = 0xf235
SCOTT@book> alter system flush buffer_cache;
System altered.
SCOTT@book> select count(*) from t1;
COUNT(*)
----------
84752
SCOTT@book> @ &r/10to16 89097
10 to 16 HEX REVERSE16
-------------- ------------------
0000000015c09 0x095c0100
SCOTT@book> @ &r/bbvi 4 522
BVI_COMMAND
----------------------------------------------------
bvi -b 4276224 -s 8192 /mnt/ramdisk/book/users01.dbf
--//检索095c0100,09换成0a.
BBED> set dba 4,522
DBA 0x0100020a (16777738 4,522)
BBED> sum apply
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
Check value for File 4, Block 522:
current = 0xf236, required = 0xf236
SCOTT@book> alter system flush buffer_cache;
System altered.
SCOTT@book> select count(*) from t1;
select count(*) from t1
*
ERROR at line 1:
ORA-08103: object no longer exists
--总结:段头上的data_object_id出现问题,导致ORA-08103错误发生
4.模拟数据块data_object_id与段头不一致:
--还原略.
select owner, object_name,
dbms_rowid.rowid_relative_fno(rowid) as fno,
dbms_rowid.rowid_block_number(rowid) as bno,
dbms_rowid.rowid_row_number(rowid) as rno,
to_char(dbms_utility.make_data_block_address(dbms_rowid.rowid_relative_fno(rowid),
dbms_rowid.rowid_block_number(rowid)), 'xxxxxxxxxx') as dba
7 from t1 where object_id = 7840;
OWNER OBJECT_NAME FNO BNO RNO DBA
------ -------------------- ---------- ---------- ---------- -----------
SYS V_$DIAG_EM_TARGET_IN 4 625 46 1000271
FO
BBED> set dba 4,625
DBA 0x01000271 (16777841 4,625)
BBED> p /d ktbbh.ktbbhsid
union ktbbhsid, 4 bytes @24
ub4 ktbbhsg1 @24 89097
ub4 ktbbhod1 @24 89097
--查询相关文档,ktbbhsg1=>对象段号,ktbbhod1==>对象号.感觉不对,这个类型union.
BBED> assign ktbbh.ktbbhsid.ktbbhsg1=89098
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
ub4 ktbbhsg1 @24 0x00015c0a
BBED> p /d ktbbh.ktbbhsid
union ktbbhsid, 4 bytes @24
ub4 ktbbhsg1 @24 89098
ub4 ktbbhod1 @24 89098
--噢,一起修改,也验证了前面的判断.
BBED> sum apply
Check value for File 4, Block 625:
current = 0xee92, required = 0xee92
SCOTT@book> alter system flush buffer_cache;
System altered.
SCOTT@book> select count(*) from t1;
select count(*) from t1
*
ERROR at line 1:
ORA-08103: object no longer exists
SCOTT@book> select count(*) from t1 where rownum<=10 ;
COUNT(*)
----------
10
--//问题来了如何判断.那个块存在问题.
$ dbv file=/mnt/ramdisk/book/users01.dbf
DBVERIFY: Release 11.2.0.4.0 - Production on Wed Jan 4 09:41:27 2017
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
DBVERIFY - Verification starting : FILE = /mnt/ramdisk/book/users01.dbf
DBVERIFY - Verification complete
Total Pages Examined : 16000
Total Pages Processed (Data) : 11459
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 3585
Total Pages Failing (Index): 0
Total Pages Processed (Other): 690
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 266
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Total Pages Encrypted : 0
Highest block SCN : 3412431 (0.3412431)
RMAN> validate datafile 4;
Starting validate at 2017-01-04 09:42:19
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=35 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=46 device type=DISK
allocated channel: ORA_DISK_3
channel ORA_DISK_3: SID=56 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00004 name=/mnt/ramdisk/book/users01.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4 OK 0 266 16027 3412431
File Name: /mnt/ramdisk/book/users01.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 11459
Index 0 3585
Other 0 690
Finished validate at 2017-01-04 09:42:21
--这样检查根本看不出来那块存在问题.另外backup check logical database;也不用.
SCOTT@book> alter session set events '8103 trace name errorstack level 3';
Session altered.
--//感觉监测8103就足够了.
SCOTT@book> select count(*) from t1;
select count(*) from t1
*
ERROR at line 1:
ORA-08103: object no longer exists
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)
----- Error Stack Dump -----
ORA-08103: object no longer exists
----- Current SQL Statement for this session (sql_id=5bc0v4my7dvr5) -----
select count(*) from t1
...
kcbzibmlt: dump suspect buffer, err=8103
buffer tsn: 4 rdba: 0x01000271 (4/625)
scn: 0x0000.003411ae seq: 0x02 flg: 0x04 tail: 0x11ae0602
frmt: 0x02 chkval: 0xee92 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x0000000062E16000 to 0x0000000062E18000
...
Block header dump: 0x01000271
Object id on Block? Y
seg/obj: 0x15c0a csc: 0x00.34118b itc: 3 flg: E typ: 1 - DATA
--obj不对.15c0a = 89098
brn: 1 bdba: 0x1000268 ver: 0x01 opc: 0
inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.0034118b
0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x03 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
bdba: 0x01000271
data_block_dump,data header at 0x62e1607c
===============
tsiz: 0x1f80
hsiz: 0xa6
pbl: 0x62e1607c
76543210
flag=--------
ntab=1
nrow=74
frre=-1
fsbo=0xa6
fseo=0x408
avsp=0x362
tosp=0x362
...
Dump of buffer cache at level 8 for tsn=4 rdba=16777841
BH (0x62fea448) file#: 4 rdba: 0x01000271 (4/625) class: 1 ba: 0x62e16000
set: 69 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 2,19
dbwrid: 2 obj: 89097 objn: 89097 tsn: 4 afn: 4 hint: f
hash: [0x73fef7e0,0x849e1bd0] lru: [0x62fea670,0x842000a8]
ckptq: [NULL] fileq: [NULL] objq: [0x633e7ab8,0x7d249880] objaq: [0x633e7ac8,0x7d249870]
use: [0x8448b790,0x8448b790] wait: [NULL]
st: READING md: EXCL tch: 0
flags: only_sequential_access
Using State Objects
----------------------------------------
SO: 0x8448b710, type: 38, owner: 0x855656e8, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
proc=0x8591f618, name=buffer handle, file=kcb2.h LINE:2761, pg=0
(buffer) (CR) PR: 0x8591f618 FLG: 0x0
class bit: 0x0
cr[0]:
sh[0]:
kcbbfbp: [BH: 0x62fea448, LINK: 0x8448b790]
type: normal pin
multiblock read ptr: 0x7da5dc80, count: 127
where: kdswh11: kdst_fetch, why: 0
BH (0x73fef728) file#: 4 rdba: 0x01000271 (4/625) class: 1 ba: 0x73e9e000
set: 64 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 4,19
dbwrid: 0 obj: 89075 objn: 89075 tsn: 4 afn: 4 hint: f
hash: [0x849e1bd0,0x62fea500] lru: [0x73fef950,0x73fef6e0]
lru-flags: on_auxiliary_list
ckptq: [NULL] fileq: [NULL] objq: [NULL] objaq: [NULL]
st: FREE md: NULL fpin: 'kdswh02: kdsgrp' tch: 0 lfb: 33
flags:
--//在数据缓存是对的.
5.换一种方法测试(利用10231事件):
$ oerr ora 10231
10231, 00000, "skip corrupted blocks on _table_scans_"
// *Cause:
// *Action: such blocks are skipped in table scans, and listed in trace files
--alter session set db_file_multiblock_read_count=1 ;
ALTER SESSION SET EVENTS '10231 TRACE NAME CONTEXT FOREVER, LEVEL 10';
ALTER SESSION SET events '10046 trace name context forever, level 12';
--退出重新登录.补充我感觉不需要设置设置db_file_multiblock_read_count=1 ; 估计10046事件也不需要,看上面的信息.
ALTER SESSION SET EVENTS '10231 TRACE NAME CONTEXT FOREVER, LEVEL 10';
ALTER SESSION SET events '10046 trace name context forever, level 12';
SCOTT@book> select count(*) from t1;
select count(*) from t1
*
ERROR at line 1:
ORA-08103: object no longer exists
--有点意外,10231事件并没有跳过这个有问题的块,直接报错.想想也正常,这个块通过了dbv以及rman检查测试.
select count(*) from t1
END OF STMT
PARSE #139737249221224:c=67990,e=68161,p=24,cr=440,cu=0,mis=1,r=0,dep=0,og=1,plh=3724264953,tim=1483494987390074
EXEC #139737249221224:c=0,e=26,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3724264953,tim=1483494987390185
WAIT #139737249221224: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1483494987390254
WAIT #139737249221224: nam='Disk file operations I/O' ela= 29 FileOperation=2 fileno=4 filetype=2 obj#=89097 tim=1483494987390566
WAIT #139737249221224: nam='direct path read' ela= 145 file number=4 first dba=523 block cnt=13 obj#=89097 tim=1483494987390762
WAIT #139737249221224: nam='direct path read' ela= 134 file number=4 first dba=537 block cnt=15 obj#=89097 tim=1483494987390977
WAIT #139737249221224: nam='direct path read' ela= 66 file number=4 first dba=553 block cnt=15 obj#=89097 tim=1483494987391204
WAIT #139737249221224: nam='direct path read' ela= 43 file number=4 first dba=569 block cnt=15 obj#=89097 tim=1483494987391402
WAIT #139737249221224: nam='direct path read' ela= 43 file number=4 first dba=585 block cnt=15 obj#=89097 tim=1483494987391606
WAIT #139737249221224: nam='direct path read' ela= 44 file number=4 first dba=601 block cnt=15 obj#=89097 tim=1483494987391818
WAIT #139737249221224: nam='direct path read' ela= 43 file number=4 first dba=617 block cnt=15 obj#=89097 tim=1483494987392017
WAIT #139737249221224: nam='direct path read' ela= 43 file number=4 first dba=633 block cnt=15 obj#=89097 tim=1483494987392208
WAIT #139737249221224: nam='db file sequential read' ela= 9 file#=4 block#=625 blocks=1 obj#=89097 tim=1483494987392416
kcbzibmlt: dump suspect buffer, err=8103
buffer tsn: 4 rdba: 0x01000271 (4/625)
scn: 0x0000.003411ae seq: 0x02 flg: 0x04 tail: 0x11ae0602
frmt: 0x02 chkval: 0xee92 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x0000000061B84000 to 0x0000000061B86000
--我后面测试仅仅10046也能发现问题. 单独使用10231也可以,不在另行测试.
6.如何跳过,这种情况在真正的生产环境很难出现:
--应该损坏的块对于的表结构跟原来的块对于的表结构一样.要通过扫描,先破坏这个块.设置10231事件.
BBED> set dba 4,625
DBA 0x01000271 (16777841 4,625)
BBED> assign kcbh.type_kcbh=0x07
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
ub1 type_kcbh @0 0x07
BBED> sum apply
Check value for File 4, Block 625:
current = 0xee93, required = 0xee93
SCOTT@book> ALTER SESSION SET EVENTS '10231 TRACE NAME CONTEXT FOREVER, LEVEL 10';
Session altered.
SCOTT@book> alter system flush buffer_cache;
System altered.
SCOTT@book> select count(*) from t1;
COUNT(*)
----------
84678
--检查跟踪可以发现坏块.dba=4,625.
7.补充学习bbed修复:
BBED> set dba 4,625
DBA 0x01000271 (16777841 4,625)
BBED> assign kcbh.type_kcbh=0x06
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
ub1 type_kcbh @0 0x06
BBED> assign ktbbh.ktbbhsid.ktbbhsg1=89097
ub4 ktbbhsg1 @24 0x00015c09
BBED> p /d ktbbh.ktbbhsid
union ktbbhsid, 4 bytes @24
ub4 ktbbhsg1 @24 89097
ub4 ktbbhod1 @24 89097
BBED> sum apply
Check value for File 4, Block 625:
current = 0xee91, required = 0xee91
SCOTT@book> alter system flush buffer_cache;
System altered.
SCOTT@book> select count(*) from t1;
COUNT(*)
----------
84752
--对比前面比原来的数据多.