oracle IO性能分析

简介: 我们在IO问题发生时间段:1,可以查dba_hist_system_event 来看该IO相关等待时间的平均响应时间的变化趋势, 2,直接查dba_hist_event_histogram或者sys.WRH$_EVENT_HISTOGRAM , 来看该IO相关等待事件的等待时间直方图分布。

我们在IO问题发生时间段:
1,可以查dba_hist_system_event 来看该IO相关等待时间的平均响应时间的变化趋势,
2,直接查dba_hist_event_histogram或者sys.WRH$_EVENT_HISTOGRAM , 来看该IO相关等待事件的等待时间直方图分布。
3, 我们也可以从整体数据库的IO变化量上(dba_hist_sysstat), 看出问题时段的总体IO量, 比正常时候的IO量到底是否有增加。
4, 我们也可以从某个IO function 即某个进程上面(dba_hist_iostat_function),看问题时间段某个进程的IO request次数、IO 的量。
比如说LGWR进程的写request次数、 写的量。

说明:
1, 如上视图均可关联dba_hist_snapshot
2, 所有dba_hist 相关视图都是由对应的v$ 动态性能视图汇聚而来, 当时的数据可以查对应的v$视图。
福利: 后附sql。

例2:
直方图分布可能更加精确, 因为查看平均响应时间可能会因为粒度太粗、时间跨度长而被平均了。
正常情况下,应该<8ms的占据了90%甚至更多, > 32ms 的应该占比接近为0。
可以看到, 昨天smscp异常时候,在16 – 32 ms 之间的已经占比很多了。

   END_INTERVAL_TIME    SNAP_ID    TIME_HISTG    TOTAL_WAIT    PER_OF_WAITS

1 22-10月-14 03.00.49.801 下午 20536 <1ms 89464 0.71
2 22-10月-14 03.00.49.801 下午 20536 <2ms 89464 11.32
3 22-10月-14 03.00.49.801 下午 20536 <4ms 89464 26.39
4 22-10月-14 03.00.49.801 下午 20536 <8ms 89464 18.28
5 22-10月-14 03.00.49.801 下午 20536 <16ms 89464 13.16
6 22-10月-14 03.00.49.801 下午 20536 <32ms 89464 8.21
7 22-10月-14 03.00.49.801 下午 20536 <64ms 89464 7.56
8 22-10月-14 03.00.49.801 下午 20536 <128ms 89464 6.06
9 22-10月-14 03.00.49.801 下午 20536 <256ms 89464 4.72
10 22-10月-14 03.00.49.801 下午 20536 <512ms 89464 2.72
11 22-10月-14 03.00.49.801 下午 20536 <1s 89464 0.83
12 22-10月-14 03.00.49.801 下午 20536 <2s 89464 0.03
13 22-10月-14 03.00.49.801 下午 20536 <4s 89464 0
14 22-10月-14 03.00.49.801 下午 20536 >=4s 89464 0

正常情况下是这样的,8ms以上的占比很少:

   END_INTERVAL_TIME    SNAP_ID    TIME_HISTG    TOTAL_WAIT    PER_OF_WAITS

1 22-10月-14 07.00.49.701 下午 20540 <1ms 1091640 38.28
2 22-10月-14 07.00.49.701 下午 20540 <2ms 1091640 51.09
3 22-10月-14 07.00.49.701 下午 20540 <4ms 1091640 9.03
4 22-10月-14 07.00.49.701 下午 20540 <8ms 1091640 1.37
5 22-10月-14 07.00.49.701 下午 20540 <16ms 1091640 0.17
6 22-10月-14 07.00.49.701 下午 20540 <32ms 1091640 0.04
7 22-10月-14 07.00.49.701 下午 20540 <64ms 1091640 0.01
8 22-10月-14 07.00.49.701 下午 20540 <128ms 1091640 0
9 22-10月-14 07.00.49.701 下午 20540 <256ms 1091640 0
10 22-10月-14 07.00.49.701 下午 20540 <512ms 1091640 0
11 22-10月-14 07.00.49.701 下午 20540 <1s 1091640 0
12 22-10月-14 07.00.49.701 下午 20540 <2s 1091640 0
13 22-10月-14 07.00.49.701 下午 20540 <4s 1091640 0
14 22-10月-14 07.00.49.701 下午 20540 >=4s 1091640 0

例3:
可以看到昨天下午的整库的physical read total IO requests 并没有明显增加, 但是physical read total bytes 增加到了3000GB 一个小时。
其它时候也有这么高的,但是都是在晚上, 只有昨天是在下午15点这个生产时间的snapshot,开发有大量direct path read 所以对应用产生了较大影响。

例4:可以看到问题发生时段,10.22 14点-15点, LGWR的large_write_megabytes 和requests有明显增加, 而 平时有如此高峰的是有晚上和凌晨。
Smll read 的bytes和request 都没有增加。

附sql:
--- 查等待事件的等待时间直方图分布:
select b.end_interval_time,a.*
from
(select snap_id,

   decode(wait_time_milli,
          1,
          '<1ms',
          2,
          '<2ms',
          4,
          '<4ms',
          8,
         '<8ms',
         16,
         '<16ms',
         32,
         '<32ms',
         64,
         '<64ms',
         128,
         '<128ms',
         256,
         '<256ms',
         512,
         '<512ms',
         1024,
         '<1s',
         2048,
         '<2s',
         4096,
         '<4s',
         8192,
         '>=4s',
         wait_time_milli) time_histg,
  total_wait,
  round(delta_wait_count / total_wait * 100, 2) as per_of_waits

from (select snap_id,

          wait_time_milli,
          delta_wait_count,
          sum(delta_wait_count) over(partition by snap_id order by snap_id) as total_wait
     from (select xx.snap_id,
                  xx.wait_time_milli,
                  xx.wait_count - yy.wait_count as delta_wait_count
             from dba_hist_event_histogram xx,
                  dba_hist_event_histogram yy
            where xx.event_name = 'log file parallel write'
              and xx.event_id = yy.event_id
              and xx.snap_id = yy.snap_id + 1
              and xx.instance_number = yy.instance_number
              and xx.wait_time_milli = yy.wait_time_milli
              and xx.snap_id in (20540)))

order by snap_id, wait_time_milli) a , dba_hist_snapshot b
where a.snap_id=b.snap_id;

--- 查等待事件的平均等待时间的变化趋势:
with t as (
select s.snap_id, s.instance_number, s.end_interval_time ,

     total_waits - lag(total_waits, 1) over (partition by s.instance_number order by s.snap_id) waits,
     (time_waited_micro - lag(time_waited_micro, 1) over (partition by s.instance_number order by s.snap_id))/1000 twt
from dba_hist_system_event ev, dba_hist_snapshot s

where ev.instance_number = s.instance_number and ev.snap_id = s.snap_id

 and event_name = 'log file parallel write'
 and s.end_interval_time BETWEEN TO_DATE('20140401 10:00:00', 'yyyymmdd hh24:mi:ss') AND  TO_DATE('20140414 11:00:00', 'yyyymmdd hh24:mi:ss') )

select to_char(end_interval_time,'YYYYMMDD HH24:MI'), instance_number, sum(waits), sum(twt), round(sum(twt)/sum(waits), 2) wt
from t
where (to_char(end_interval_time, 'hh24') between 9 and 12 or to_char(end_interval_time, 'hh24') between 14 and 17)
group by to_char(end_interval_time,'YYYYMMDD HH24:MI'), instance_number
order by 1, instance_number;

--- 查整库IO相关指标:

--get_stat_10g.sql
with s as (
select sn.instance_number, sn.snap_id, sn.end_interval_time stime, (s.value - lag(s.value, 1) over (partition by sn.instance_number order by sn.snap_id)) value
--(sn.end_interval_time - lag(sn.end_interval_time, 1) over (order by sn.snap_id))2460*60 elap_time,
--round((s.value - lag(s.value, 1) over (order by sn.snap_id))/((sn.end_interval_time - lag(sn.end_interval_time, 1) over (order by sn.snap_id))2460*60)) value_s
from dba_hist_snapshot sn, dba_hist_sysstat s
where s.snap_id = sn.snap_id and s.stat_name = '&1' and s.instance_number = sn.instance_number
)
select instance_number , snap_id, stime, value ,
rank() over (partition by instance_number order by value desc) ranking, min(value) over (partition by instance_number) min_phy,
max(value) over (partition by instance_number) max_phy, round(avg(value) over (partition by instance_number)) avg_phy
from s
where (to_char(stime, 'hh24') between 9 and 12 or to_char(stime, 'hh24') between 14 and 17)
and to_char(stime, 'D') not in ('1', '7')
and stime between to_date(&2, 'yyyymmdd') and to_date(&3, 'yyyymmdd')
order by instance_number, snap_id
/

---eg: @get_stat_10g.sql 'parse count (hard)' 20100716 20100726

---physical read total IO requests
---physical read total bytes
---physical write total IO requests
---physical write total bytes
---以上四个指标即IOPS(requests)、IO吞吐量(bytes)

--- 查整库IO相关指标2:

select e.snap_id,

   to_char(e.startup_time, 'yyyy-mm-dd hh24:mi:ss') instance_startup_time,
   to_char(e.end_interval_time, 'yyyy-mm-dd hh24:mi:ss') snapshot_end_time,
   b.value - a.value IO_requests,
   round((d.value - c.value)/1024/1024/1024) IO_GB

from dba_hist_sysstat a,

   dba_hist_sysstat  b,
   dba_hist_sysstat  c,
   dba_hist_sysstat  d,
   dba_hist_snapshot e

where a.stat_name = 'physical read total IO requests'
and b.stat_name = 'physical read total IO requests'
and c.stat_name = 'physical read total bytes'
and d.stat_name = 'physical read total bytes'
and a.snap_id = e.snap_id - 1
and b.snap_id = e.snap_id
and c.snap_id = e.snap_id - 1
and d.snap_id = e.snap_id
and e.end_interval_time BETWEEN

   TO_DATE('20141021 09:00:00', 'yyyymmdd hh24:mi:ss') AND
   TO_DATE('20141023 18:00:00', 'yyyymmdd hh24:mi:ss')

order by e.begin_interval_time;

--- 查某个IO 进程的读写指标, 例如LGWR:

SQL> select * from dba_hist_iostat_function_name;

  DBID FUNCTION_ID FUNCTION_NAME

---------- ----------- ------------------------------
1154030563 0 RMAN
1154030563 1 DBWR
1154030563 2 LGWR
1154030563 3 ARCH
1154030563 4 XDB
1154030563 5 Streams AQ
1154030563 6 Data Pump
1154030563 7 Recovery
1154030563 8 Buffer Cache Reads
1154030563 9 Direct Reads
1154030563 10 Direct Writes
1154030563 11 Smart Scan
1154030563 12 Archive Manager
1154030563 13 Others

14 rows selected

SQL>

--- 查某个IO进程例如LGWR的IO requset和IO bytes dba_hist_iostat_function:

select e.snap_id,

   to_char(e.end_interval_time, 'yyyy-mm-dd hh24:mi:ss') snapshot_end_time,
   b.SMALL_READ_MEGABYTES  - a.SMALL_READ_MEGABYTES     SMALL_READ_MEGABYTES   ,                                                                                                
   b.SMALL_WRITE_MEGABYTES - a.SMALL_WRITE_MEGABYTES    SMALL_WRITE_MEGABYTES  ,                                                                                              
   b.LARGE_READ_MEGABYTES  - a.LARGE_READ_MEGABYTES     LARGE_READ_MEGABYTES   ,                                                                                             
   b.LARGE_WRITE_MEGABYTES - a.LARGE_WRITE_MEGABYTES    LARGE_WRITE_MEGABYTES  ,                                                                                            
   b.SMALL_READ_REQS       - a.SMALL_READ_REQS          SMALL_READ_REQS        ,                                                                                           
   b.SMALL_WRITE_REQS      - a.SMALL_WRITE_REQS         SMALL_WRITE_REQS       ,                                                                                          
   b.LARGE_READ_REQS       - a.LARGE_READ_REQS          LARGE_READ_REQS        ,                                                                                         
   b.LARGE_WRITE_REQS      - a.LARGE_WRITE_REQS         LARGE_WRITE_REQS       ,                                                                                        
   b.NUMBER_OF_WAITS       - a.NUMBER_OF_WAITS          NUMBER_OF_WAITS        ,                                                                                       
   b.WAIT_TIME             - a.WAIT_TIME                WAIT_TIME                                                                        

from dba_hist_iostat_function a,

   dba_hist_iostat_function  b,
   dba_hist_snapshot e

where a.FUNCTION_NAME = 'LGWR'
and b.FUNCTION_NAME = 'LGWR'
and a.snap_id = e.snap_id - 1
and b.snap_id = e.snap_id
and e.end_interval_time BETWEEN

   TO_DATE('20141021 09:00:00', 'yyyymmdd hh24:mi:ss') AND
   TO_DATE('20141023 18:00:00', 'yyyymmdd hh24:mi:ss')

order by e.begin_interval_time;

目录
相关文章
|
3月前
|
存储 监控 Linux
性能分析之从 IO 高定位到具体文件
【8月更文挑战第21天】性能分析之从 IO 高定位到具体文件
44 0
性能分析之从 IO 高定位到具体文件
|
3月前
|
关系型数据库 MySQL Linux
性能分析之解决 jbd2 引起 IO 高问题
【8月更文挑战第19天】性能分析之解决 jbd2 引起 IO 高问题
216 0
|
SQL Oracle 关系型数据库
解决Oracle的状态: 失败 -测试失败: IO 错误: The Network Adapter could not establish the connection
解决Oracle的状态: 失败 -测试失败: IO 错误: The Network Adapter could not establish the connection
2100 0
解决Oracle的状态: 失败 -测试失败: IO 错误: The Network Adapter could not establish the connection
|
Oracle 关系型数据库
Oracle中filesystemio_options 和 disk_asynch_io 参数的设置
参考文档Doc ID 1987437.1 filesystemio_options参数,中间是这个参数的配置值。
368 0
|
SQL Oracle 网络协议
SQL Developer 连接 oracle数据库 报错 Io 异常 The Network Adapter could not establish the connection的三种解决方法
SQL Developer 连接 oracle数据库 报错 Io 异常 The Network Adapter could not establish the connection的三种解决方法
1316 0
SQL Developer 连接 oracle数据库 报错 Io 异常 The Network Adapter could not establish the connection的三种解决方法
|
SQL Oracle 关系型数据库
循序渐进解读Oracle AWR性能分析报告
Oracle中的AWR为我们分析数据库提供了非常好的便利条件,那如何解读AWR的数据呢?本文针对最为常见的一种报告——《AWR数据库报告》进行说明。
|
Oracle 关系型数据库 测试技术
|
1月前
|
存储 Oracle 关系型数据库
Oracle数据库的应用场景有哪些?
【10月更文挑战第15天】Oracle数据库的应用场景有哪些?
171 64
|
25天前
|
SQL Oracle 关系型数据库
Oracle数据库优化方法
【10月更文挑战第25天】Oracle数据库优化方法
33 7