SQL*Loader 笔记 (二) 性能优化 2014

简介:

最近客户有个测试项目,需要将大量的文本文件用sqlloder加载到oracle数据库中。特此做出如下实验,演练一下sqlloader的几个重要参数。

主要参考官方文档上关于redo产生的控制

Minimize use of the redo log 

One way to speed a direct load dramatically is to minimize use of the redo log. There are three ways to do this. You can disable archiving, you can specify that the load is unrecoverable, or you can set the SQL NOLOGGING parameter for the objects being loaded. This section discusses all methods.

关闭归档日志模式

关闭所有实例

# srvctl stop instance -d devdb -i "devdb1,devdb2”
mount节点1
# srvctl start instance -d devdb -n node1 -o mount
关闭归档
SYS@devdb1 >alter database noarchivelog;
关闭所有实例
# srvctl stop instance -d devdb -i "devdb1,devdb2”
启动所有实例
# srvctl start instance -d devdb -i "devdb1,devdb2"

查看sqlloader所需的文件及大小

$ du -sh *
4.0K    c_stock.ctl                <==控制文件
1021M   c_stock_large.txt   <==数据文件为大小为 1021MB将近1GB,同等大小的文件还有N个
4.0K    c_stock.log               <==输入日志文件

控制文件

$ cat c_stock.ctl
load data
into table tpcc.c_stock
append
FIELDS TERMINATED BY '|'
(
s_i_id,
s_w_id,
s_quantity,
s_dist_01,
s_dist_02,
s_dist_03,
s_dist_04,
s_dist_05,
s_dist_06,
s_dist_07,
s_dist_08,
s_dist_09,
s_dist_10,
s_data,
s_ytd,
s_order_cnt,
s_remote_cnt
)

数据文件

$ cat c_stock.txt
1|1|80|RfIBeBlKPqdPouSIcIdvFmh3|3gZUHn1oqzPvY03Df4iroZZo|x2J0ePT5VJPsUGH93ksRbjie|drvpekDBabpcQ85BFKUqepn5|zFKyOSAP7V4b3xA1wP3cMWEh|wmvyBIrYbZnNINsDxkDoG3g2|H75jUxoeJKB8gSUiI5Te
JcH4|m8iQ0mJxkEE2ZS4wqvsxccDK|8mMFmhzxgFd6QMrYeTQ1BKnQ|4Fb6GEEEOg9ZnwmtZnnFYEt0|7SLI5rAzsw5U0zy6wnhQxBFLM9zfspqq71W1imqyIjiwGHq|0|0|0
2|1|71|SxGgW9Rz6eCprSDpKneQzsf5|znr5SFVB2pHPmmcjoEwFmXSU|aWzxcEsBPHuycfX2ssFDCHMQ|96lkT22ROrePTuez1zV1EGRV|YnoqSY4SUf2B7UiKaknTErgt|QZkjv12hcgvV2pDKttbobhWb|r9VVrZe5Pme8b6geazX2
PoC8|HBmIG7IX6rGlgIhTSjSSeWWc|lJ3YVvWsV7Yr4wdYNHaSefBW|mhEq3ordVk1GFMYoHLGB5HpJ|npisQ5FWxTNMaQvkG2OiO2Pnwej3rS6F5eXL|0|0|0
…….
……. 略去剩余部分

观察当前的log序列号

SYS@devdb1 >select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
         1          1        133   52428800        512          2 NO        CURRENT          1935103 2014/07/23 19:16:57   2.8147E+14
         2          1        132   52428800        512          2 NO        INACTIVE         1929910 2014/07/23 19:16:45      1935103 2014/07/23 19:16:57
         3          2         51   52428800        512          2 NO        CURRENT          1696072 2014/07/23 18:39:50   2.8147E+14
         4          2         50   52428800        512          2 NO        INACTIVE         1660718 2014/07/23 13:53:34      1696072 2014/07/23 18:39:50

在oracle用户下执行sqlloader,参数silent=all 不输出屏幕信息

$ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt silent=all

查看日志输出

$ cat c_stock.log

SQL*Loader: Release 11.2.0.3.0 - Production on Wed Jul 23 19:31:46 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Control File:   c_stock.ctl
Data File:      c_stock_large.txt
Bad File:     c_stock_large.bad
Discard File:  none specified
 
 (Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array:     64 rows, maximum of 256000 bytes
Continuation:    none specified
Path used:      Conventional
Silent options: FEEDBACK, ERRORS and DISCARDS

Table TPCC.C_STOCK, loaded from every logical record.
Insert option in effect for this table: APPEND

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
S_I_ID                              FIRST     *   |       CHARACTER           
S_W_ID                               NEXT     *   |       CHARACTER           
S_QUANTITY                           NEXT     *   |       CHARACTER           
S_DIST_01                            NEXT     *   |       CHARACTER           
S_DIST_02                            NEXT     *   |       CHARACTER           
S_DIST_03                            NEXT     *   |       CHARACTER           
S_DIST_04                            NEXT     *   |       CHARACTER           
S_DIST_05                            NEXT     *   |       CHARACTER           
S_DIST_06                            NEXT     *   |       CHARACTER           
S_DIST_07                            NEXT     *   |       CHARACTER           
S_DIST_08                            NEXT     *   |       CHARACTER           
S_DIST_09                            NEXT     *   |       CHARACTER           
S_DIST_10                            NEXT     *   |       CHARACTER           
S_DATA                               NEXT     *   |       CHARACTER           
S_YTD                                NEXT     *   |       CHARACTER           
S_ORDER_CNT                          NEXT     *   |       CHARACTER           
S_REMOTE_CNT                         NEXT     *   |       CHARACTER           

value used for ROWS parameter changed from 64 to 58

Table TPCC.C_STOCK:
  3500000 Rows successfully loaded.      <==成功读取350万条记录
  0 Rows not loaded due to data errors.   <==没有错误
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.


Space allocated for bind array:                 254388 bytes(58 rows)
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:       3500000
Total logical records rejected:         0
Total logical records discarded:        0

Run began on Wed Jul 23 19:31:46 2014
Run ended on Wed Jul 23 19:36:11 2014

Elapsed time was:     00:04:24.27     <==执行时间4分钟24秒27
CPU time was:         00:00:50.02

查看执行sqlloader后log的变化,测试环境没有其他应用,可以认为log的变化全部来自于sqlloder加载数据产生。

SYS@devdb1 >select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
         1          1        159   52428800        512          2 NO        CURRENT          2153818 2014/07/23 19:36:03   2.8147E+14
         2          1        158   52428800        512          2 NO        INACTIVE         2148648 2014/07/23 19:35:53      2153818 2014/07/23 19:36:03
         3          2         51   52428800        512          2 NO        CURRENT          1696072 2014/07/23 18:39:50   2.8147E+14
         4          2         50   52428800        512          2 NO        INACTIVE         1660718 2014/07/23 13:53:34      1696072 2014/07/23 18:39:50

本机是rac双节点环境,sqlloader在节点一的服务器上执行,可以看出日志的序列从刚才的133,132变化到了159,158。 节点二的日志不发生变化。

下面开始优化sqlloader

一,优化:直接路径加载

SYS@devdb1 >truncate table tpcc.c_stock;    <==清理环境

$ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt direct=true

$ cat c_stock.log
…...
Number to load: ALL
Number to skip: 0
Errors allowed: 50
Continuation:    none specified
Path used:      Direct    <==直接路径加载
…..
  3500000 Rows successfully loaded.     <==成功读取350万条记录
…...
Run began on Wed Jul 23 19:54:40 2014
Run ended on Wed Jul 23 19:55:10 2014

Elapsed time was:     00:00:30.31   <==执行时间0分钟30秒31, 效果显著!
CPU time was:         00:00:14.94


SYS@devdb1 >select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
         1          1        159   52428800        512          2 NO        INACTIVE         2153818 2014/07/23 19:36:03      2161051 2014/07/23 19:54:58
         2          1        160   52428800        512          2 NO        CURRENT          2161051 2014/07/23 19:54:58   2.8147E+14
         3          2         51   52428800        512          2 NO        CURRENT          1696072 2014/07/23 18:39:50   2.8147E+14
         4          2         50   52428800        512          2 NO        INACTIVE         1660718 2014/07/23 13:53:34      1696072 2014/07/23 18:39:50

虽然直接加载的方式不经过sql引擎,不会产生redo和undo,但是我们看到还是会有微乎其微的日志增长。

二,优化:nologging

SYS@devdb1 >truncate table tpcc.c_stock;    <==清理环境

SYS@devdb1 >alter table tpcc.c_stock nologging;  <==更改表属性,强制不生成日志

$ cat c_stock.log
…..
Path used:      Direct
Run began on Wed Jul 23 20:07:54 2014
Run ended on Wed Jul 23 20:08:30 2014

Elapsed time was:     00:00:35.18  <==执行时间0分钟35秒18, 比刚才慢了将近5秒
CPU time was:         00:00:17.18

SYS@devdb1 >select * from v$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARCHIVED  STATUS     FIRST_CHANGE# FIRST_TIME          NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --------- ---------- ------------- ------------------- ------------ -------------------
         1          1        159   52428800        512          2 NO        INACTIVE         2153818 2014/07/23 19:36:03      2161051 2014/07/23 19:54:58
         2          1        160   52428800        512          2 NO        CURRENT          2161051 2014/07/23 19:54:58   2.8147E+14
         3          2         51   52428800        512          2 NO        CURRENT          1696072 2014/07/23 18:39:50   2.8147E+14
         4          2         50   52428800        512          2 NO        INACTIVE         1660718 2014/07/23 13:53:34      1696072 2014/07/23 18:39:50

日志序列已经不再变化了,可是并没有使速度变得更快,反而慢了5秒

三,优化:unrecoverable

SYS@devdb1 >truncate table tpcc.c_stock;

unrecoverable  <==在控制文件头加入该参数,不再向数据库控制文件中写入SCN
load data
into table tpcc.c_stock
append
FIELDS TERMINATED BY '|'
(
s_i_id,
s_w_id,
s_quantity,
s_dist_01,
s_dist_02,
s_dist_03,
s_dist_04,
s_dist_05,
s_dist_06,
s_dist_07,
s_dist_08,
s_dist_09,
s_dist_10,
s_data,
s_ytd,
s_order_cnt,
s_remote_cnt
) 


$ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt direct=true silent=all
$ cat c_stock.log
…..
Path used:      Direct
,,,
  3500000 Rows successfully loaded.
 ….
Run began on Wed Jul 23 20:17:57 2014
Run ended on Wed Jul 23 20:18:29 2014

Elapsed time was:     00:00:32.90  <==执行时间0分钟32秒90
CPU time was:         00:00:16.33
效果平平,没有什么变化

四,优化 — parallel

SYS@devdb1 >truncate table tpcc.c_stock;

$ $ sqlldr userid=tpcc/tpcc control=c_stock.ctl data=c_stock_large.txt direct=true parallel=true
$ cat c_stock.log
…..
Path used:      Direct - with parallel option.  <==直接路径加载,并行全开启了
…..
Run began on Wed Jul 23 20:23:36 2014
Run ended on Wed Jul 23 20:24:12 2014
...
Elapsed time was:     00:00:35.53  <==执行时间0分钟35秒53, 最慢的一次。
CPU time was:         00:00:19.03

实验总结:
0,如果是测试环境做数据加载,一定要先关闭归档日志(alter database no archivelog)。如果归档日志不关闭,后面的所有优化都没有效果。
1, sqlloder的direct 加载方式效果最突出。
2,设置表为nologing虽然可以彻底避免日志的生成,但是优化的效果已经不明显了。
3,parallel需要在同时执行多个控制文件,访问多个数据文件是才会有效果。例如:

sqlldr USERID=scott CONTROL=load1.ctl DIRECT=TRUE PARALLEL=true
sqlldr USERID=scott CONTROL=load2.ctl DIRECT=TRUE PARALLEL=true
sqlldr USERID=scott CONTROL=load3.ctl DIRECT=TRUE PARALLEL=true

本文转自ICT时空 dbasdk博客,原文链接:SQL*Loader 笔记 (二) 性能优化 2014,如需转载请自行联系原博主。

相关实践学习
日志服务之使用Nginx模式采集日志
本文介绍如何通过日志服务控制台创建Nginx模式的Logtail配置快速采集Nginx日志并进行多维度分析。
相关文章
|
2月前
|
SQL 关系型数据库 MySQL
《SQL必知必会》个人笔记(一)
《SQL必知必会》个人笔记(一)
42 0
|
7月前
|
SQL 数据库 索引
阿里P8整理出SQL笔记:收获不止SOL优化抓住SQL的本质
开头我先说: 有人就有江湖,有江湖就有IT系统,有IT系统就有数据库,有数据库就有SQL,SQL应用可一字概括:“"广"。加之其简单易学,SQL实现也可一字概括:“乐”。
|
5月前
|
SQL 分布式计算 HIVE
pyspark笔记(RDD,DataFrame和Spark SQL)1
pyspark笔记(RDD,DataFrame和Spark SQL)
55 1
|
4天前
|
SQL XML 前端开发
sql 性能优化基于explain调优(二)
sql 性能优化基于explain调优(二)
11 0
|
15天前
|
SQL 存储 关系型数据库
【MySQL实战笔记】02.一条SQL更新语句是如何执行的-2
【4月更文挑战第5天】两阶段提交是为确保`redo log`和`binlog`逻辑一致,避免数据不一致。若先写`redo log`, crash后数据可能丢失,导致恢复后状态错误;若先写`binlog`,crash则可能导致重复事务,影响数据库一致性。一天一备相较于一周一备,能缩短“最长恢复时间”,但需权衡额外的存储成本。
16 1
|
1月前
|
SQL 关系型数据库 MySQL
干货!SQL性能优化,书写高质量SQL语句
干货!SQL性能优化,书写高质量SQL语句
31 2
|
2月前
|
SQL 存储 数据库
【数据库SQL server】自学终极笔记
【数据库SQL server】自学终极笔记
94 0
|
2月前
|
SQL 存储 安全
《SQL必知必会》个人笔记(四)
《SQL必知必会》个人笔记(四)
32 0
|
2月前
|
SQL 关系型数据库 数据库
《SQL必知必会》个人笔记(三)
《SQL必知必会》个人笔记(三)
28 0
|
2月前
|
SQL 存储 Oracle
《SQL必知必会》个人笔记(二)
《SQL必知必会》个人笔记(二)
30 0