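Yao Yuan does database support at a company with twenty thousand customers and runs into every kind of odd problem. One customer's database kept generating huge volumes of trace files that regularly filled the disk. Here is how Yao Yuan dealt with it.
Trace files follow a naming rule. For foreground (server) processes the name is <instance name>_ora_<process ID>.trc; for background processes, ora is replaced by the process name. Matching the process IDs and timestamps showed that these trace files were produced by the RMAN process that runs the incrementally updated backup at 03:30 every morning. Oracle's MetaLink site (now My Oracle Support) documents the fix in Document 29061016.8: applying the patch resolves the problem.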
Bug 29061016 - huge tracefile generated by rman incremental backups with kcbtse structure (Doc ID 29061016.8) Oracle Support
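To work out which process wrote a given trace file, the naming rule can be applied mechanically. The function below is only a sketch: parse_trace_name is a hypothetical helper, the PID 12345 is made up for illustration, and it assumes that neither the process name nor the PID contains an underscore (the instance name may).

```shell
#!/bin/sh
# Split a trace file name of the form <instance>_<process>_<pid>.trc
# into its three parts. parse_trace_name is a hypothetical helper.
parse_trace_name() {
    base=$(basename "$1" .trc)   # strip directory and .trc suffix
    pid=${base##*_}              # text after the last underscore
    rest=${base%_*}              # everything before the PID
    proc=${rest##*_}             # "ora" for foreground, else the background process name
    inst=${rest%_*}              # whatever remains is the instance name
    echo "$inst $proc $pid"
}

# Example with a made-up PID; prints: small ora 12345
parse_trace_name /u01/app/oracle/diag/rdbms/small/small/trace/small_ora_12345.trc
```

A second field of ora means the file came from a foreground process such as an RMAN channel session; any other value names a background process.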
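Patching a production system, however, requires a lengthy approval process at this customer, so for now the only option is conservative treatment: deleting the trace files by hand. Yao Yuan recommended doing this from adrci. For example, to delete all trace files older than one day (the -age value is given in minutes, so one day is 1440):
adrci> purge -age 1440 -type trace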
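Because purge -age takes minutes, it is easy to get the number wrong when thinking in days. A small sketch that does the conversion and prints the command to run; trace_purge_cmd is a hypothetical helper name, not part of adrci.

```shell
#!/bin/sh
# Build an adrci purge command that removes trace files older than N days.
# adrci expects the age in minutes, so days are converted first.
# trace_purge_cmd is a hypothetical helper, not an adrci feature.
trace_purge_cmd() {
    days=$1
    minutes=$((days * 24 * 60))
    echo "purge -age $minutes -type trace"
}

trace_purge_cmd 1    # prints: purge -age 1440 -type trace

# To run it against one ADR home (requires adrci in PATH):
#   adrci exec="set home diag/rdbms/small/small; $(trace_purge_cmd 1)"
```

Printing the command rather than executing it makes it possible to review what will be purged before touching a production home.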
A better approach is to set an automatic purge policy. First, check the current settings:
adrci> show home
ADR Homes:
diag/rdbms/small/small
diag/rdbms/orcl1/orcl1
diag/rdbms/aurreum/aurreum
diag/rdbms/orcl/orcl
diag/clients/user_oracle/host_3498212516_110
diag/tnslsnr/dell/listener1
adrci> set home diag/rdbms/small/small
adrci> show control

ADR Home = /u01/app/oracle/diag/rdbms/small/small:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME                              SIZEP_POLICY         PURGE_PERIOD         FLAGS                PURGE_THRESHOLD
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ---------------------------------------- -------------------- -------------------- -------------------- --------------------
114559742            720                  8760                 2021-06-25 11:56:19.334671 +08:00        2022-08-05 12:21:51.844023 +08:00                                                 1                    2                    110                  1                    2021-06-25 11:56:19.334671 +08:00        18446744073709551615 0                    0                    95
1 row fetched
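- SHORTP_POLICY is 720 hours (30 days). It is the retention period for short-lived data such as trace and core dump files.
- LONGP_POLICY is 8760 hours (365 days). It is the retention period for long-lived data such as incidents and Health Monitor warnings.
- LAST_AUTOPRG_TIME is the time of the last automatic purge.
- LAST_MANUPRG_TIME is empty, meaning no manual purge has been run yet.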
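The policy values are easier to judge in days than in hours. The sketch below reads show control output on standard input and prints both policies in days. policy_days is a hypothetical helper, and it assumes the data row begins with ADRID, SHORTP_POLICY and LONGP_POLICY in that order, as in the output shown earlier.

```shell
#!/bin/sh
# Read "show control" output on stdin and print the two retention policies in days.
# Assumes the data row starts with ADRID, SHORTP_POLICY, LONGP_POLICY (all numeric).
# Lines such as "1 row fetched" are skipped because their second field is not a number.
policy_days() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
        printf "SHORTP_POLICY=%d days, LONGP_POLICY=%d days\n", $2 / 24, $3 / 24
    }'
}

# Sample row with the values from this ADR home (trailing columns omitted).
# Prints: SHORTP_POLICY=30 days, LONGP_POLICY=365 days
echo "114559742 720 8760 2021-06-25 11:56:19.334671 +08:00" | policy_days

# Against a live system (requires adrci in PATH):
#   adrci exec="set home diag/rdbms/small/small; show control" | policy_days
```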
The commands below set both policies to 168 hours (one week); 72 hours (three days) would be another reasonable choice.
adrci> set control (SHORTP_POLICY=168)
adrci> set control (LONGP_POLICY=168)
adrci> show control

ADR Home = /u01/app/oracle/diag/rdbms/small/small:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME                              SIZEP_POLICY         PURGE_PERIOD         FLAGS                PURGE_THRESHOLD
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ---------------------------------------- -------------------- -------------------- -------------------- --------------------
114559742            168                  168                  2022-08-05 15:47:32.029723 +08:00        2022-08-05 12:21:51.844023 +08:00                                                 1                    2                    110                  1                    2021-06-25 11:56:19.334671 +08:00        18446744073709551615 0                    0                    95
1 row fetched
After running the purge command below, LAST_MANUPRG_TIME is populated and the trace files have been deleted.
adrci> purge
adrci> show control

ADR Home = /u01/app/oracle/diag/rdbms/small/small:
*************************************************************************
ADRID                SHORTP_POLICY        LONGP_POLICY         LAST_MOD_TIME                            LAST_AUTOPRG_TIME                        LAST_MANUPRG_TIME                        ADRDIR_VERSION       ADRSCHM_VERSION      ADRSCHMV_SUMMARY     ADRALERT_VERSION     CREATE_TIME                              SIZEP_POLICY         PURGE_PERIOD         FLAGS                PURGE_THRESHOLD
-------------------- -------------------- -------------------- ---------------------------------------- ---------------------------------------- ---------------------------------------- -------------------- -------------------- -------------------- -------------------- ---------------------------------------- -------------------- -------------------- -------------------- --------------------
114559742            168                  168                  2022-08-05 15:47:32.029723 +08:00        2022-08-05 12:21:51.844023 +08:00        2022-08-05 15:48:43.927238 +08:00        1                    2                    110                  1                    2021-06-25 11:56:19.334671 +08:00        18446744073709551615 0                    0                    95
1 row fetched
Yao Yuan provides a script that applies these settings to every ADR home in one pass:
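#!/bin/sh
# Apply a one-week retention policy to every ADR home, then purge.
for ADRHOME in `adrci exec="show home"`
do
    # "show home" prints an "ADR Homes:" header; skip its two words.
    if [ "$ADRHOME" = "ADR" ] || [ "$ADRHOME" = "Homes:" ]
    then
        continue
    fi
    echo "$ADRHOME"
    # Feed the commands to adrci for this home.
    adrci <<EOF
set home $ADRHOME
set control (SHORTP_POLICY=168)
set control (LONGP_POLICY=168)
purge
exit
EOF
done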
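An alternative to skipping the header words one by one is to keep only the lines that look like home paths. This is a sketch under one assumption: every home path printed by show home starts with diag/, as it does in the listing earlier in this article. adr_homes is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only ADR home paths from "show home" output read on stdin.
# Assumption: each home path starts with "diag/"; anything else
# (such as the "ADR Homes:" header) is dropped.
adr_homes() {
    awk '$1 ~ /^diag\// { print $1 }'
}

# Sample input copied from the "show home" listing (shortened).
printf '%s\n' 'ADR Homes:' 'diag/rdbms/small/small' 'diag/tnslsnr/dell/listener1' | adr_homes

# Against a live system (requires adrci in PATH):
#   for ADRHOME in $(adrci exec="show home" | adr_homes); do echo "$ADRHOME"; done
```

Filtering on the path prefix means the loop body never sees header text, so the explicit continue check is no longer needed.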