最近客户有个测试项目,需要将大量的文本文件用sqlloder加载到oracle数据库中。特此做出如下实验,演练一下sqlloader的几个重要参数。
主要参考官方文档上关于redo产生的控制
Minimize use of the redo log
One way to speed a direct load dramatically is to minimize use of the redo log. There are three ways to do this. You can disable archiving, you can specify that the load is unrecoverable, or you can set the SQL NOLOGGING parameter for the objects being loaded. This section discusses all methods.
关闭归档日志模式
关闭所有实例
# srvctl stop instance -d devdb -i "devdb1,devdb2”
mount节点1
# srvctl start instance -d devdb -n node1 -o mount
关闭归档
SYS@devdb1 >alter database noarchivelog;
关闭所有实例
# srvctl stop instance -d devdb -i "devdb1,devdb2”
启动所有实例
# srvctl start instance -d devdb -i "devdb1,devdb2"
查看sqlloader所需的文件及大小
$ du -sh *
4.0K c_stock.ctl <==控制文件
1021M c_stock_large.txt <==数据文件为大小为 1021MB将近1GB,同等大小的文件还有N个
4.0K c_stock.log <==输入日志文件
控制文件
$ cat c_stock.ctl
load data
into table tpcc.c_stock
append
FIELDS TERMINATED BY '|'
(
s_i_id,
s_w_id,
s_quantity,
s_dist_01,
s_dist_02,
s_dist_03,
s_dist_04,
s_dist_05,
s_dist_06,
s_dist_07,
s_dist_08,
s_dist_09,
s_dist_10,
s_data,
s_ytd,
s_order_cnt,
s_remote_cnt
)
数据文件
$ cat c_stock.txt
1|1|80|RfIBeBlKPqdPouSIcIdvFmh3|3gZUHn1oqzPvY03Df4iroZZo|x2J0ePT5VJPsUGH93ksRbjie|drvpekDBabpcQ85BFKUqepn5|zFKyOSAP7V4b3xA1wP3cMWEh|wmvyBIrYbZnNINsDxkDoG3g2|H75jUxoeJKB8gSUiI5Te
JcH4|m8iQ0mJxkEE2ZS4wqvsxccDK|8mMFmhzxgFd6QMrYeTQ1BKnQ|4Fb6GEEEOg9ZnwmtZnnFYEt0|7SLI5rAzsw5U0zy6wnhQxBFLM9zfspqq71W1imqyIjiwGHq|0|0|0
2|1|71|SxGgW9Rz6eCprSDpKneQzsf5|znr5SFVB2pHPmmcjoEwFmXSU|aWzxcEsBPHuycfX2ssFDCHMQ|96lkT22ROrePTuez1zV1EGRV|YnoqSY4SUf2B7UiKaknTErgt|QZkjv12hcgvV2pDKttbobhWb|r9VVrZe5Pme8b6geazX2
PoC8|HBmIG7IX6rGlgIhTSjSSeWWc|lJ3YVvWsV7Yr4wdYNHaSefBW|mhEq3ordVk1GFMYoHLGB5HpJ|npisQ5FWxTNMaQvkG2OiO2Pnwej3rS6F5eXL|0|0|0
…….
……. 略去剩余部分
观察当前的log序列号
SYS@devdb1 >select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARCHIVED STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
1 1 133 52428800 512 2 NO CURRENT 1935103 2014/07/23 19:16:57 2.8147E+14
2 1 132 52428800 512 2 NO INACTIVE 1929910 2014/07/23 19:16:45 1935103 2014/07/23 19:16:57
3 2 51 52428800 512 2 NO CURRENT 1696072 2014/07/23 18:39:50 2.8147E+14
4 2 50 52428800 512 2 NO INACTIVE 1660718 2014/07/23 13:53:34 1696072 2014/07/23 18:39:50
在oracle用户下执行sqlloader,参数silent=all 不输出屏幕信息
$ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt silent=all
查看日志输出
$ cat c_stock.log
SQL*Loader: Release 11.2.0.3.0 - Production on Wed Jul 23 19:31:46 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Control File: c_stock.ctl
Data File: c_stock_large.txt
Bad File: c_stock_large.bad
Discard File: none specified
(Allow all discards)
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array: 64 rows, maximum of 256000 bytes
Continuation: none specified
Path used: Conventional
Silent options: FEEDBACK, ERRORS and DISCARDS
Table TPCC.C_STOCK, loaded from every logical record.
Insert option in effect for this table: APPEND
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
S_I_ID FIRST * | CHARACTER
S_W_ID NEXT * | CHARACTER
S_QUANTITY NEXT * | CHARACTER
S_DIST_01 NEXT * | CHARACTER
S_DIST_02 NEXT * | CHARACTER
S_DIST_03 NEXT * | CHARACTER
S_DIST_04 NEXT * | CHARACTER
S_DIST_05 NEXT * | CHARACTER
S_DIST_06 NEXT * | CHARACTER
S_DIST_07 NEXT * | CHARACTER
S_DIST_08 NEXT * | CHARACTER
S_DIST_09 NEXT * | CHARACTER
S_DIST_10 NEXT * | CHARACTER
S_DATA NEXT * | CHARACTER
S_YTD NEXT * | CHARACTER
S_ORDER_CNT NEXT * | CHARACTER
S_REMOTE_CNT NEXT * | CHARACTER
value used for ROWS parameter changed from 64 to 58
Table TPCC.C_STOCK:
3500000 Rows successfully loaded. <==成功读取350万条记录
0 Rows not loaded due to data errors. <==没有错误
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Space allocated for bind array: 254388 bytes(58 rows)
Read buffer bytes: 1048576
Total logical records skipped: 0
Total logical records read: 3500000
Total logical records rejected: 0
Total logical records discarded: 0
Run began on Wed Jul 23 19:31:46 2014
Run ended on Wed Jul 23 19:36:11 2014
Elapsed time was: 00:04:24.27 <==执行时间4分钟24秒27
CPU time was: 00:00:50.02
查看执行sqlloader后log的变化,测试环境没有其他应用,可以认为log的变化全部来自于sqlloder加载数据产生。
SYS@devdb1 >select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARCHIVED STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
1 1 159 52428800 512 2 NO CURRENT 2153818 2014/07/23 19:36:03 2.8147E+14
2 1 158 52428800 512 2 NO INACTIVE 2148648 2014/07/23 19:35:53 2153818 2014/07/23 19:36:03
3 2 51 52428800 512 2 NO CURRENT 1696072 2014/07/23 18:39:50 2.8147E+14
4 2 50 52428800 512 2 NO INACTIVE 1660718 2014/07/23 13:53:34 1696072 2014/07/23 18:39:50
本机是rac双节点环境,sqlloader在节点一的服务器上执行,可以看出日志的序列从刚才的133,132变化到了159,158。 节点二的日志不发生变化。
下面开始优化sqlloader
一,优化:直接路径加载
SYS@devdb1 >truncate table tpcc.c_stock; <==清理环境
$ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt direct=true
$ cat c_stock.log
…...
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Continuation: none specified
Path used: Direct <==直接路径加载
…..
3500000 Rows successfully loaded. <==成功读取350万条记录
…...
Run began on Wed Jul 23 19:54:40 2014
Run ended on Wed Jul 23 19:55:10 2014
Elapsed time was: 00:00:30.31 <==执行时间0分钟30秒31, 效果显著!
CPU time was: 00:00:14.94
SYS@devdb1 >select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARCHIVED STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
1 1 159 52428800 512 2 NO INACTIVE 2153818 2014/07/23 19:36:03 2161051 2014/07/23 19:54:58
2 1 160 52428800 512 2 NO CURRENT 2161051 2014/07/23 19:54:58 2.8147E+14
3 2 51 52428800 512 2 NO CURRENT 1696072 2014/07/23 18:39:50 2.8147E+14
4 2 50 52428800 512 2 NO INACTIVE 1660718 2014/07/23 13:53:34 1696072 2014/07/23 18:39:50
虽然直接加载的方式不经过sql引擎,不会产生redo和undo,但是我们看到还是会有微乎其微的日志增长。
二,优化:nologging
SYS@devdb1 >truncate table tpcc.c_stock; <==清理环境
SYS@devdb1 >alter table tpcc.c_stock nologging; <==更改表属性,强制不生成日志
$ cat c_stock.log
…..
Path used: Direct
Run began on Wed Jul 23 20:07:54 2014
Run ended on Wed Jul 23 20:08:30 2014
Elapsed time was: 00:00:35.18 <==执行时间0分钟35秒18, 比刚才慢了将近5秒
CPU time was: 00:00:17.18
SYS@devdb1 >select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARCHIVED STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
1 1 159 52428800 512 2 NO INACTIVE 2153818 2014/07/23 19:36:03 2161051 2014/07/23 19:54:58
2 1 160 52428800 512 2 NO CURRENT 2161051 2014/07/23 19:54:58 2.8147E+14
3 2 51 52428800 512 2 NO CURRENT 1696072 2014/07/23 18:39:50 2.8147E+14
4 2 50 52428800 512 2 NO INACTIVE 1660718 2014/07/23 13:53:34 1696072 2014/07/23 18:39:50
日志序列已经不再变化了,可是并没有使速度变得更快,反而慢了5秒
三,优化:unrecoverable
SYS@devdb1 >truncate table tpcc.c_stock;
unrecoverable <==在控制文件头加入该参数,不再向数据库控制文件中写入SCN
load data
into table tpcc.c_stock
append
FIELDS TERMINATED BY '|'
(
s_i_id,
s_w_id,
s_quantity,
s_dist_01,
s_dist_02,
s_dist_03,
s_dist_04,
s_dist_05,
s_dist_06,
s_dist_07,
s_dist_08,
s_dist_09,
s_dist_10,
s_data,
s_ytd,
s_order_cnt,
s_remote_cnt
)
$ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt direct=true silent=all
$ cat c_stock.log
…..
Path used: Direct
,,,
3500000 Rows successfully loaded.
….
Run began on Wed Jul 23 20:17:57 2014
Run ended on Wed Jul 23 20:18:29 2014
Elapsed time was: 00:00:32.90 <==执行时间0分钟32秒90
CPU time was: 00:00:16.33
效果平平,没有什么变化
四,优化 — parallel
SYS@devdb1 >truncate table tpcc.c_stock;
$ $ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt direct=true parallel=true
$ cat c_stock.log
…..
Path used: Direct - with parallel option. <==直接路径加载,并行全开启了
…..
Run began on Wed Jul 23 20:23:36 2014
Run ended on Wed Jul 23 20:24:12 2014
...
Elapsed time was: 00:00:35.53 <==执行时间0分钟35秒53, 最慢的一次。
CPU time was: 00:00:19.03
实验总结:
0,如果是测试环境做数据加载,一定要先关闭归档日志(alter database no archivelog)。如果归档日志不关闭,后面的所有优化都没有效果。
1, sqlloder的direct 加载方式效果最突出。
2,设置表为nologing虽然可以彻底避免日志的生成,但是优化的效果已经不明显了。
3,parallel需要在同时执行多个控制文件,访问多个数据文件是才会有效果。例如:
sqlldr USERID=scott CONTROL=load1.ctl DIRECT=TRUE PARALLEL=true
sqlldr USERID=scott CONTROL=load2.ctl DIRECT=TRUE PARALLEL=true
sqlldr USERID=scott CONTROL=load3.ctl DIRECT=TRUE PARALLEL=true
本文转自ICT时空 dbasdk博客,原文链接:SQL*Loader 笔记 (二) 性能优化 2014,如需转载请自行联系原博主。