通过案例学调优之--Oracle 全文索引-阿里云开发者社区

全文检索(oracle text)

Oracle Text使Oracle9i具备了强大的文本检索能力和智能化的文本管理能力,Oracle Text 是 Oracle9i 采用的新名称,在 oracle8/8i 中被称为 oracle intermedia text,oracle8 以前是 oracle context cartridge。Oracle Text 的索引和查找功能并不局限于存储在数据库中的数据。它可以对存储于文件系统中的文档进行检索和查找,并可检索超过 150 种文档类型,包括 Microsoft Word、PDF和XML。Oracle Text查找功能包括模糊查找、词干查找(搜索mice 和查找 mouse)、通配符、相近性等查找方式,以及结果分级和关键词突出显示等。你甚至可以增加一个词典,以查找搭配词,并找出包含该搭配词的文档。

Oracle text 需要为可检索的数据项建立索引,用户才能够通过搜索查找内容,索引进程是根据管道建模的,在这个管道中,数据经过一系列的转换后,将其关键字会添加到索引中。该索引进程分为多个阶段,如下图

1.数据检索(Datastore):只是将数据从数据存储(例如 web 页面、数据库大型对象或本地文件系统)中取出,然后作为数据流传送到下一个阶段。

2. 过滤(Filter):过滤器负责将各种文件格式的数据转换为纯文本格式,索引管道中的其他组件只能处理纯文本数据,不能识别 Ms word 或 excel 等文件格式。

3. 分段(Sectioner):分段器添加关于原始数据项结构的元数据。

4. 词法分析(Lexer):根据数据项的语言将字符流分为几个字词。 5. 索引(Index):最后一个阶段将关键字添加到实际索引中。

全文检索和普通检索的区别

不使用Oracle text功能，当然也有很多方法可以在Oracle数据库中搜索文本，比如INSTR函数和LIKE操作：

1 、SELECT *FROM mytext WHERE INSTR (thetext, 'Oracle') > 0;

2 、SELECT * FROM mytext WHERE thetext LIKE '%Oracle%';

有很多时候，使用instr和like是很理想的, 特别是搜索仅跨越很小的表的时候。然而通过这些文本定位的方法将导致全表扫描，对资源来说消耗比较昂贵，而且实现的搜索功能也非常有限，因此对海量的文本数据进行搜索时，建议使用oralce提供的全文检索功能。

附：这里顺带记录一下INSTR和LIKE:

Oracle中，可以使用 Instr 函数对某个字符串进行判断，判断其是否含有指定的字符。其语法为：Instr(string, substring, position, occurrence)。

string：代表源字符串（写入字段则表示此字段的内容）。

substring：代表想从源字符串中查找的子串。

position：代表查找的开始位置，该参数可选的，默认为1。

occurrence：代表想从源字符中查找出第几次出现的substring，该参数也是可选的，默认为1。

position 的值为负数，那么代表从右往左进行查找。

instr和like的性能比较

其实从效率角度来看，谁能用到索引，谁的查询速度就会快。

like有时可以用到索引，例如：name like ‘李%’，而当下面的情况时索引会失效：name like ‘%李’。所以一般我们查找中文类似于‘%字符%’时，索引都会失效。与其他数据库不同的是，oracle支持函数索引。例如在name字段上建个instr索引，查询速度就比较快了，这也是为什么instr会比like效率高的原因。

注:instr(title,’手册’)>0 相当于like‘%手册%’

instr(title,’手册’)=0 相当于not like‘%手册%’

Oracle Text 索引原理

Oracle text 索引将文本中所有的字符转化成记号(token),如 www.taobao.com 会转化成 www,taobao,com 这样的记号。
Oracle10g 里面支持四种类型的索引：

context、ctxcat、ctxrule、ctxxpath

CONTEXT

用于对含有大量连续文本数据进行检索。支持 word、html、xml、text 等很多数据格式。支持范围(range)分区,支持并行创建索引(Parallel indexing)的索引类型。支持类型:VARCHAR2, CLOB, BLOB, CHAR, BFILE, XMLType, and URIType.

DML 操作后,需要 CTX_DDL.SYNC_INDEX 手工同步索引如果有查询包含多个词语,直接用空格隔开(如 oracle itpub)

案例分析：

设置全文检索

步骤步骤一：检查和设置数据库角色

首先检查数据库中是否有CTXSYS用户和CTXAPP脚色。如果没有这个用户和角色，意味着你的数据库创建时未安装intermedia功能（10G默认安装都有此用户和角色）。你必须修改数据库以安装这项功能。默认安装情况下，ctxsys用户是被锁定的，因此要先启用ctxsys的用户。

 
         11
         :
         53
         :
         13 
         SYS@ prod >select username,account_status 
         from 
         dba_users 
         where 
         username like 
         'CTX%'
         ; 
        
         USERNAME                       ACCOUNT_STATUS 
        
         ------------------------------ -------------------------------- 
        
         CTXSYS                         EXPIRED & LOCKED 
        
         11
         :
         54
         :
         17 
         SYS@ prod >alter user ctxsys identified by oracle account unlock; 
        
         User altered. 
        
         11
         :
         55
         :
         07 
         SYS@ prod >select username,account_status 
         from 
         dba_users 
         where 
         username like 
         'CTX%'
         ; 
        
         USERNAME                       ACCOUNT_STATUS 
        
         ------------------------------ -------------------------------- 
        
         CTXSYS                         OPEN 
        
         12
         :
         00
         :
         13 
         SYS@ prod >select role 
         from 
         dba_roles 
        
         12
         :
         00
         :
         23   
         2   
         where 
         role like 
         'CTX%'
         ; 
        
         ROLE 
        
         ------------------------------ 
        
         CTXAPP

步骤二：赋权

在ctxsys用户下，授予测试用户scott以下权限：

 
         [oracle@RH6 ~]$ cat t.sql 
        
         GRANT resource, CONNECT, ctxapp TO scott; 
        
         GRANT EXECUTE ON ctxsys.ctx_cls TO scott; 
        
         GRANT EXECUTE ON ctxsys.ctx_ddl TO scott; 
        
         GRANT EXECUTE ON ctxsys.ctx_doc TO scott; 
        
         GRANT EXECUTE ON ctxsys.ctx_output TO scott; 
        
         GRANT EXECUTE ON ctxsys.ctx_query TO scott; 
        
         GRANT EXECUTE ON ctxsys.ctx_report TO scott; 
        
         GRANT EXECUTE ON ctxsys.ctx_thes TO scott; 
        
         GRANT EXECUTE ON ctxsys.ctx_ulexer TO scott; 
        
         11
         :
         58
         :
         04 
         SYS@ prod >@/home/oracle/t.sql 
        
         Grant succeeded. 
        
         Elapsed: 
         00
         :
         00
         :
         00.15 
        
         Grant succeeded. 
        
         Elapsed: 
         00
         :
         00
         :
         00.21 
        
         Grant succeeded. 
        
         Elapsed: 
         00
         :
         00
         :
         00.09 
        
         Grant succeeded. 
        
         Elapsed: 
         00
         :
         00
         :
         00.09 
        
         Grant succeeded. 
        
         Elapsed: 
         00
         :
         00
         :
         00.13 
        
         Grant succeeded. 
        
         Elapsed: 
         00
         :
         00
         :
         00.07 
        
         Grant succeeded. 
        
         Elapsed: 
         00
         :
         00
         :
         00.09 
        
         Grant succeeded. 
        
         Elapsed: 
         00
         :
         00
         :
         00.10 
        
         Grant succeeded. 
        
         Elapsed: 
         00
         :
         00
         :
         00.07

步骤三：设置词法分析器(lexer)

Oracle实现全文检索，其机制其实很简单。即通过Oracle专利的词法分析器(lexer),将文章中所有的表意单元(Oracle 称为 term)找出来，记录在一组以dr$开头的表中，同时记下该term出现的位置、次数、hash值等信息。检索时，Oracle从这组表中查找相应的term，并计算其出现频率，根据某个算法来计算每个文档的得分(score),即所谓的‘匹配率’。而lexer则是该机制的核心，它决定了全文检索的效率。Oracle针对不同的语言提供了不同的lexer,而我们通常能用到其中的三个：

 
         basic_lexer: 
        
         针对英语。它能根据空格和标点来将英语单词从句子中分离，还能自动将一些出现频率过高已经失去检索意义的单词作为‘垃圾’处理，如
         if
         ,is等，具有较高的处理效率。但该lexer应用于汉语则有很多问题，由于它只认空格和标点，而汉语的一句话中通常不会有空格，因此，它会把整句话作为一个term,事实上失去检索能力。以‘中国人民站起来了’这句话为例，basic_lexer分析的结果只有一个term,就是‘中国人民站起来了’。此时若检索‘中国’，将检索不到内容。 
        
         chinese_vgram_lexer: 
        
         专门的汉语分析器，支持所有汉字字符集(ZHS16CGB231280 ZHS16GBK ZHT32EUC ZHT16BIG5 ZHT32TRIS ZHT16MSWIN950 ZHT16HKSCS UTF8 )。该分析器按字为单元来分析汉语句子。‘中国人民站起来了’这句话，会被它分析成如下几个term:‘中’，‘中国’，‘国人’，‘人民’，‘民站’，‘站起’，起来’，‘来了’，‘了’。可以看出，这种分析方法，实现算法很简单，并且能实现‘一网打尽’，但效率则是差强人意。 
        
         chinese_lexer: 
        
         这是一个新的汉语分析器，只支持utf8字符集。上面已经看到，chinese vgram lexer这个分析器由于不认识常用的汉语词汇，因此分析的单元非常机械，像上面的‘民站’，‘站起’在汉语中根本不会单独出现，因此这种term是没有意义的，反而影响效率。chinese_lexer的最大改进就是该分析器能认识大部分常用汉语词汇，因此能更有效率地分析句子，像以上两个愚蠢的单元将不会再出现，极大提高了效率。但是它只支持utf8,如果你的数据库是zhs16gbk字符集，则只能使用笨笨的那个Chinese vgram lexer。如果不做任何设置，Oracle缺省使用basic_lexer这个分析器。

 
    
     
       
       
         12
         :
         05
         :
         01 
         SYS@ prod >select userenv(
         'language'
         ) from dual; 
        
 
         USERENV(
         'LANGUAGE'
         ) 
        
 
         ---------------------------------------------------- 
        
 
         AMERICAN_AMERICA.ZHS16GBK 
        

            
        
 
         12
         :
         08
         :
         05 
         SCOTT@ prod >desc ctx_ddl 
        
 
         PROCEDURE CREATE_PREFERENCE 
        
 
          
         Argument Name                  Type                    In/Out Default? 
        
 
          
         ------------------------------ ----------------------- ------ -------- 
        
 
          
         PREFERENCE_NAME                VARCHAR2                IN 
        
 
          
         OBJECT_NAME                    VARCHAR2                IN 
        
 
            
        
 
         12
         :
         12
         :
         25 
         SCOTT@ prod >EXEC ctx_ddl.create_preference (
         'my_lexer'
         , 
         'chinese_vgram_lexer'
         ); 
        
 
         PL/SQL procedure successfully completed. 
        

            
        
 
         创建表 
        
 
         12
         :
         13
         :
         15 
         SCOTT@ prod >CREATE TABLE textdemo( 
        
 
         12
         :
         15
         :
         47   
         2       
         id NUMBER NOT NULL PRIMARY KEY, 
        
 
         12
         :
         15
         :
         47   
         3       
         book_author varchar2(
         100
         ),--作者 
        
 
         12
         :
         15
         :
         47   
         4       
         publish_time DATE,--发布日期 
        
 
         12
         :
         15
         :
         47   
         5       
         title varchar2(
         400
         ),--标题 
        
 
         12
         :
         15
         :
         47   
         6       
         book_abstract varchar2(
         2000
         ),--摘要 
        
 
         12
         :
         15
         :
         47   
         7       
         path varchar2(
         200
         )--路径 
        
 
         12
         :
         15
         :
         47   
         8  
         ); 
        
 
         Table created. 
        
 
     
 
    
  

 
         插入数据 
        
         14
         :
         53
         :
         20 
         SCOTT@ prod >
         insert 
         into 
         textdemo values (
         10
         ,
         'luyao'
         ,sysdate,
         'pingfan de world'
         ,
         'zhen shi de gushi'
         ,
         '/home/1.txt'
         ); 
        
         1 
         row created. 
        
         14
         :
         54
         :
         32 
         SCOTT@ prod >commit;

步骤四：在book_abstract字段建立索引使用刚刚设置的ORATEXT_LEXER ：chinese_vgram_lexer作为分析器。

 
         12
         :
         16
         :
         15 
         SCOTT@ prod >CREATE INDEX demo_abstract ON textdemo(book_abstract) indextype IS ctxsys.context parameters(
         'lexer my_LEXER'
         );   
        
         之后如上所述多出很多dr$开头的表和索引，系统会创建四个相关的表： 
        
         DR$DEMO_ABSTRACT$I(分词后的TOKEN表) 
        
         DR$DEMO_ABSTRACT$K 
        
         DR$DEMO_ABSTRACT$N  
        
         DR$DEMO_ABSTRACT$R 
        
         14
         :
         56
         :
         16 
         SCOTT@ prod >select * 
         from 
         tab; 
        
         TNAME                          TABTYPE  CLUSTERID 
        
         ------------------------------ ------- ---------- 
        
         BONUS                          TABLE 
        
         DEPT                           TABLE 
        
         DR$DEMO_ABSTRACT$I             TABLE 
        
         DR$DEMO_ABSTRACT$K             TABLE 
        
         DR$DEMO_ABSTRACT$N             TABLE 
        
         DR$DEMO_ABSTRACT$R             TABLE 
        
         EMP                            TABLE 
        
         SALGRADE                       TABLE 
        
         TEXTDEMO                       TABLE 
        
         9 
         rows selected. 
        
         14
         :
         56
         :
         36 
         SCOTT@ prod >desc DR$DEMO_ABSTRACT$I 
        
         Name                                                              Null?    Type 
        
         ----------------------------------------------------------------- -------- -------------------------------------------- 
        
         TOKEN_TEXT                                                        NOT NULL VARCHAR2(
         64
         ) 
        
         TOKEN_TYPE                                                        NOT NULL NUMBER(
         3
         ) 
        
         TOKEN_FIRST                                                       NOT NULL NUMBER(
         10
         ) 
        
         TOKEN_LAST                                                        NOT NULL NUMBER(
         10
         ) 
        
         TOKEN_COUNT                                                       NOT NULL NUMBER(
         10
         ) 
        
         TOKEN_INFO                                                                 BLOB 
        
         14
         :
         57
         :
         45 
         SCOTT@ prod >desc DR$DEMO_ABSTRACT$K 
        
         Name                                                              Null?    Type 
        
         ----------------------------------------------------------------- -------- -------------------------------------------- 
        
         DOCID                                                                      NUMBER(
         38
         ) 
        
         TEXTKEY                                                           NOT NULL ROWID 
        
         14
         :
         57
         :
         57 
         SCOTT@ prod >desc DR$DEMO_ABSTRACT$N 
        
         Name                                                              Null?    Type 
        
         ----------------------------------------------------------------- -------- -------------------------------------------- 
        
         NLT_DOCID                                                         NOT NULL NUMBER(
         38
         ) 
        
         NLT_MARK                                                          NOT NULL CHAR(
         1
         ) 
        
         14
         :
         58
         :
         11 
         SCOTT@ prod >desc DR$DEMO_ABSTRACT$R 
        
         Name                                                              Null?    Type 
        
         ----------------------------------------------------------------- -------- -------------------------------------------- 
        
         ROW_NO                                                                     NUMBER(
         3
         ) 
        
         DATA                                                                       BLOB 
        
         14
         :
         58
         :
         26 
         SCOTT@ prod >select index_name,index_type,table_name 
         from 
         user_indexes; 
        
         INDEX_NAME                     INDEX_TYPE                  TABLE_NAME 
        
         ------------------------------ --------------------------- ------------------------------ 
        
         DEMO_ABSTRACT                  DOMAIN                      TEXTDEMO 
        
         SYS_C0013418                   NORMAL                      TEXTDEMO 
        
         PK_EMP                         NORMAL                      EMP 
        
         SYS_IL0000076525C00002$$       LOB                         DR$DEMO_ABSTRACT$R 
        
         SYS_IOT_TOP_76528              IOT - TOP                   DR$DEMO_ABSTRACT$N 
        
         SYS_IOT_TOP_76523              IOT - TOP                   DR$DEMO_ABSTRACT$K 
        
         SYS_IL0000076520C00006$$       LOB                         DR$DEMO_ABSTRACT$I 
        
         DR$DEMO_ABSTRACT$X             NORMAL                      DR$DEMO_ABSTRACT$I 
        
         PK_DEPT                        NORMAL                      DEPT 
        
         9 
         rows selected.

下面的语句可以查看索引创建过程中是否发生了错误：

 
         SELECT * FROM ctx_USER_index_errors 
        
         附：对于建立索引的类型（例如ctxsys.context），包括四种：context,ctxcat,ctxrule,ctxxpath。 
        
         CONTEXT用于对含有大量连续文本数据进行检索。支持word、html、xml、text等很多数据格式。支持范围（range）分区，支持并行创建索引（Parallel indexing）的索引类型。 
        
         支持类型：VARCHAR2, CLOB, BLOB, 
         CHAR
         , BFILE, XMLType, and URIType.DML。操作后，需要CTX_DDL.SYNC_INDEX手工同步索引如果有查询包含多个词语，直接用空格隔开(如 oracle itpub)。 
        
         查询标识符CONTAINS 
        
         CTXCAT适用于混合查询语句（如查询条件包括产品id，价格，描述等）。适合于查询较小的具有一定结构的文本段。具有事务性。DML 操作后，索引会自动进行同步。 
        
         操作符：and,or,&gt,;<, =,between,in 
        
         查询标识符CATSEARCH 
        
         CTXRULE查询标识符MATCHES。 
        
         CTXXPATH（这两个索引没有去更多搜索相关内容） 
        
         一般来说我们建立CONTEXT类型的索引（CONTAINS来查询）。

步骤五：查询测试

 
         查看执行计划 
        
         15
         :
         04
         :
         36 
         SCOTT@ prod >r 
        
         1
         * select * 
         from 
         textdemo 
         where 
         contains(book_abstract,
         'gushi'
         )>
         0 
        
         Elapsed: 
         00
         :
         00
         :
         00.02 
        
         Execution Plan 
        
         ---------------------------------------------------------- 
        
         Plan hash value: 
         2570915478 
        
         --------------------------------------------------------------------------------------------- 
        
         | Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     | 
        
         --------------------------------------------------------------------------------------------- 
        
         |   
         0 
         | SELECT STATEMENT            |               |     
         1 
         |  
         1392 
         |     
         4   
         (
         0
         )| 
         00
         :
         00
         :
         01 
         | 
        
         |   
         1 
         |  TABLE ACCESS BY INDEX ROWID| TEXTDEMO      |     
         1 
         |  
         1392 
         |     
         4   
         (
         0
         )| 
         00
         :
         00
         :
         01 
         | 
        
         |*  
         2 
         |   DOMAIN INDEX              | DEMO_ABSTRACT |       |       |     
         4   
         (
         0
         )| 
         00
         :
         00
         :
         01 
         | 
        
         --------------------------------------------------------------------------------------------- 
        
         Predicate Information (identified by operation id): 
        
         --------------------------------------------------- 
        
         2 
         - access(
         "CTXSYS"
         .
         "CONTAINS"
         (
         "BOOK_ABSTRACT"
         ,
         'gushi'
         )>
         0
         ) 
        
         Note 
        
         ----- 
        
         - dynamic sampling used 
         for 
         this 
         statement (level=
         2
         ) 
        
         Statistics 
        
         ---------------------------------------------------------- 
        
         29  
         recursive calls 
        
         0  
         db block gets 
        
         33  
         consistent gets 
        
         0  
         physical reads 
        
         0  
         redo size 
        
         796  
         bytes sent via SQL*Net to client 
        
         415  
         bytes received via SQL*Net 
         from 
         client 
        
         2  
         SQL*Net roundtrips to/
         from 
         client 
        
         0  
         sorts (memory) 
        
         0  
         sorts (disk) 
        
         1  
         rows processed 
        
         15
         :
         04
         :
         37 
         SCOTT@ prod > 
        
         通过sql trace查看详细计划（部分内容） 
        
         SQL ID: 2rsr1z6zkp24p 
        
         Plan Hash: 
         2570915478 
        
         select * 
        
         from 
        
         textdemo 
         where 
         contains(book_abstract,:
         "SYS_B_0"
         )>:
         "SYS_B_1" 
        
         call     count       cpu    elapsed       disk      query    current        rows 
        
         ------- ------  -------- ---------- ---------- ---------- ----------  ---------- 
        
         Parse        
         1      
         0.00       
         0.00          
         0          
         0          
         0           
         0 
        
         Execute      
         1      
         0.01       
         0.01          
         0        
         250          
         0           
         0 
        
         Fetch        
         2      
         0.00       
         0.00          
         0          
         2          
         0           
         1 
        
         ------- ------  -------- ---------- ---------- ---------- ----------  ---------- 
        
         total        
         4      
         0.01       
         0.01          
         0        
         252          
         0           
         1 
        
         Misses 
         in 
         library cache during parse: 
         1 
        
         Optimizer mode: ALL_ROWS 
        
         Parsing user id: 
         101 
        
         Rows     Row Source Operation 
        
         -------  --------------------------------------------------- 
        
         1  
         TABLE ACCESS BY INDEX ROWID TEXTDEMO (cr=
         12 
         pr=
         0 
         pw=
         0 
         time=
         0 
         us cost=
         4 
         size=
         1392 
         card=
         1
         ) 
        
         1   
         DOMAIN INDEX  DEMO_ABSTRACT (cr=
         11 
         pr=
         0 
         pw=
         0 
         time=
         0 
         us cost=
         4 
         size=
         0 
         card=
         0
         ) 
        
         declare 
        
         cost sys.ODCICost := sys.ODCICost(NULL, NULL, NULL, NULL); 
        
         arg0 VARCHAR2(
         1
         ) := 
         null
         ; 
        
         begin 
        
         :
         1 
         := 
         "CTXSYS"
         .
         "TEXTOPTSTATS"
         .ODCIStatsFunctionCost( 
        
         sys.ODCIFuncInfo(
         'CTXSYS'
         , 
        
         'CTX_CONTAINS'
         , 
        
         'TEXTCONTAINS'
         , 
        
         2
         ), 
        
         cost, 
        
         sys.ODCIARGDESCLIST(sys.ODCIARGDESC(
         2
         , 
         'TEXTDEMO'
         , 
         'SCOTT'
         , 
         '"BOOK_ABSTRACT"'
         , NULL, NULL, NULL), sys.ODCIARG 
        
         DESC(
         1
         , NULL, NULL, NULL, NULL, NULL, NULL)) 
        
         , arg0, :
         5
         , 
        
         sys.ODCIENV(:
         6
         ,:
         7
         ,:
         8
         ,:
         9
         )); 
        
         if 
         cost.CPUCost IS NULL 
         then 
        
         :
         2 
         := 
         -1.0
         ; 
        
         else 
        
         :
         2 
         := cost.CPUCost; 
        
         end 
         if
         ; 
        
         if 
         cost.IOCost IS NULL 
         then 
        
         :
         3 
         := 
         -1.0
         ; 
        
         else 
        
         :
         3 
         := cost.IOCost; 
        
         end 
         if
         ; 
        
         if 
         cost.NetworkCost IS NULL 
         then 
        
         :
         4 
         := 
         -1.0
         ; 
        
         else 
        
         :
         4 
         := cost.NetworkCost; 
        
         end 
         if
         ; 
        
         exception 
        
         when others 
         then 
        
         raise; 
        
         end; 
        
         call     count       cpu    elapsed       disk      query    current        rows 
        
         ------- ------  -------- ---------- ---------- ---------- ----------  ---------- 
        
         Parse        
         2      
         0.00       
         0.00          
         0          
         0          
         0           
         0 
        
         Execute      
         2      
         0.00       
         0.00          
         0         
         18          
         0           
         2 
        
         Fetch        
         0      
         0.00       
         0.00          
         0          
         0          
         0           
         0 
        
         ------- ------  -------- ---------- ---------- ---------- ----------  ---------- 
        
         total        
         4      
         0.00       
         0.00          
         0         
         18          
         0           
         2 
        
         Misses 
         in 
         library cache during parse: 
         0 
        
         Optimizer mode: ALL_ROWS 
        
         Parsing user id: 
         101     
         (recursive depth: 
         1
         )

全文索引和DML操作

Insert 操作：

 
    
     
       
       
         15
         :
         19
         :
         21 
         SCOTT@ prod >
         insert 
         into 
         textdemo values (
         20
         ,
         'huoda'
         ,sysdate,
         'musilin de zangli'
         ,
         'meili de rensheng'
         ,
         '/home/2.txt'
         ); 
        
 
         1 
         row created. 
        
 
         15
         :
         20
         :
         10 
         SCOTT@ prod >commit; 
        

            
        
 
         15
         :
         21
         :
         35 
         SCOTT@ prod >select id,BOOK_ABSTRACT 
         from 
         textdemo 
         where 
         BOOK_ABSTRACT like 
         '%rensheng%' 
        
 
                 
         ID BOOK_ABSTRACT 
        
 
         ---------- -------------------------------------------------- 
        
 
                 
         20 
         meili de rensheng 
        
 
                   
        
 
         15
         :
         23
         :
         12 
         SCOTT@ prod >set autotrace 
         on 
        
 
         15
         :
         23
         :
         38 
         SCOTT@ prod >select id,BOOK_ABSTRACT 
         from 
         textdemo 
         where 
         contains(BOOK_ABSTRACT,
         'rensheng'
         )>
         0 
        
 
         no rows selected 
        
 
         Execution Plan 
        
 
         ---------------------------------------------------------- 
        
 
         Plan hash value: 
         2570915478 
        
 
         --------------------------------------------------------------------------------------------- 
        
 
         | Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     | 
        
 
         --------------------------------------------------------------------------------------------- 
        
 
         |   
         0 
         | SELECT STATEMENT            |               |     
         1 
         |  
         1027 
         |     
         4   
         (
         0
         )| 
         00
         :
         00
         :
         01 
         | 
        
 
         |   
         1 
         |  TABLE ACCESS BY INDEX ROWID| TEXTDEMO      |     
         1 
         |  
         1027 
         |     
         4   
         (
         0
         )| 
         00
         :
         00
         :
         01 
         | 
        
 
         |*  
         2 
         |   DOMAIN INDEX              | DEMO_ABSTRACT |       |       |     
         4   
         (
         0
         )| 
         00
         :
         00
         :
         01 
         | 
        
 
         --------------------------------------------------------------------------------------------- 
        
 
         Predicate Information (identified by operation id): 
        
 
         --------------------------------------------------- 
        
 
            
         2 
         - access(
         "CTXSYS"
         .
         "CONTAINS"
         (
         "BOOK_ABSTRACT"
         ,
         'rensheng'
         )>
         0
         ) 
        
 
         Note 
        
 
         ----- 
        
 
            
         - dynamic sampling used 
         for 
         this 
         statement (level=
         2
         ) 
        
 
         Statistics 
        
 
         ---------------------------------------------------------- 
        
 
                  
         23  
         recursive calls 
        
 
                   
         0  
         db block gets 
        
 
                  
         33  
         consistent gets 
        
 
                   
         0  
         physical reads 
        
 
                   
         0  
         redo size 
        
 
                 
         349  
         bytes sent via SQL*Net to client 
        
 
                 
         404  
         bytes received via SQL*Net 
         from 
         client 
        
 
                   
         1  
         SQL*Net roundtrips to/
         from 
         client 
        
 
                   
         0  
         sorts (memory) 
        
 
                   
         0  
         sorts (disk) 
        
 
                   
         0  
         rows processed 
        
 
         15
         :
         26
         :
         40 
         SYS@ prod >select * 
         from 
         ctxsys.dr$pending; 
        
 
            
         PND_CID    PND_PID PND_ROWID          PND_TIMES P 
        
 
         ---------- ---------- ------------------ --------- - 
        
 
               
         1082          
         0 
         AAASrlAAEAAAAI1AAB 
         21
         -NOV
         -14 
         N 
        
 
         15
         :
         26
         :
         26 
         SCOTT@ prod >alter index demo_abstract rebuild parameters(
         'sync'
         ); 
        
 
         Index altered. 
        
 
         15
         :
         30
         :
         10 
         SCOTT@ prod >select id,BOOK_ABSTRACT 
         from 
         textdemo 
         where 
         contains(BOOK_ABSTRACT,
         'rensheng'
         )>
         0
         ; 
        
 
                 
         ID BOOK_ABSTRACT 
        
 
         ---------- -------------------------------------------------- 
        
 
                 
         20 
         meili de rensheng 
        
 
     
 
    
  

在做Insert操作时，Oracle会把一条信息放入到CTXSYS.DR$PENDING表里，必须手工进行同步才能更新全文索引。

Delete 操作：

 
         15
         :
         30
         :
         37 
         SCOTT@ prod >
         delete 
         from 
         textdemo 
         where 
         id=
         20
         ; 
        
         1 
         row deleted. 
        
         15
         :
         33
         :
         06 
         SCOTT@ prod >select id,BOOK_ABSTRACT 
         from 
         textdemo; 
        
         ID BOOK_ABSTRACT 
        
         ---------- -------------------------------------------------- 
        
         10 
         zhen shi de gushi 
        
         15
         :
         33
         :
         39 
         SCOTT@ prod >rollback; 
        
         Rollback complete. 
        
         15
         :
         33
         :
         50 
         SCOTT@ prod >select id,BOOK_ABSTRACT 
         from 
         textdemo; 
        
         ID BOOK_ABSTRACT 
        
         ---------- -------------------------------------------------- 
        
         10 
         zhen shi de gushi 
        
         20 
         meili de rensheng

Delete 操作后，索引会立刻更新。

Update 操作：

 
         15
         :
         38
         :
         14 
         SCOTT@ prod >update textdemo set BOOK_ABSTRACT=
         'meili de gushi' 
         where 
         id=
         20
         ; 
        
         1 
         row updated. 
        
         15
         :
         39
         :
         48 
         SYS@ prod >select * 
         from 
         ctxsys.dr$
         delete
         ; 
        
         no rows selected 
        
         15
         :
         39
         :
         59 
         SYS@ prod >select * 
         from 
         ctxsys.dr$pending; 
        
         no rows selected 
        
         15
         :
         43
         :
         03 
         SCOTT@ prod >select id,BOOK_ABSTRACT 
         from 
         textdemo 
         where 
         contains(BOOK_ABSTRACT,
         'gushi'
         )>
         0
         ; 
        
         ID BOOK_ABSTRACT 
        
         ---------- -------------------------------------------------- 
        
         10 
         zhen shi de gushi 
        
         15
         :
         43
         :
         14 
         SCOTT@ prod >alter index demo_abstract rebuild parameters(
         'sync'
         ); 
        
         Index altered. 
        
         15
         :
         43
         :
         39 
         SCOTT@ prod >select id,BOOK_ABSTRACT 
         from 
         textdemo 
         where 
         contains(BOOK_ABSTRACT,
         'gushi'
         )>
         0
         ; 
        
         ID BOOK_ABSTRACT 
        
         ---------- -------------------------------------------------- 
        
         10 
         zhen shi de gushi 
        
         20 
         meili de gushi

对于update操作，应该是包含了Delete和Insert的操作，需要手工同步后才能更新索引。

对多字段建立全文索引

很多时候需要从多个文本字段中查询满足条件的记录，这时就需要建立针对多个字段的全文索引，例如需要从pmhsubjects(专题表)的 subjectname(专题名称)和briefintro(简介)上进行全文检索，则需要按以下步骤进行操作：

建立多字段索引的preference，以ctxsys登录，并执行：

BEGIN

ctx_ddl.create_preference('ctx_demo_abstract_title','MULTI_COLUMN_DATASTORE');

END;

建立preference对应的字段值(以ctxsys登录) 对应title path book_abstract三个字段建立索引：

BEGIN

ctx_ddl.set_attribute('ctx_demo_abstract_title ','columns','title,path');

END;

建立全文索引:

CREATE INDEX demo_abstract_title ON textdemo(book_abstract) indextype IS ctxsys.context parameters(' DATASTORE ctxsys. ctx_demo_ abstract_title lexer ORATEXT_LEXER');

commit;

测试

SELECT score(20),t.* FROM textdemo t WHERE contains(book_abstract,'移动城堡 or 俄罗斯',20)>0;

对大字段进行检索测试

CREATE TABLE mytable(id NUMBER PRIMARY KEY, docs CLOB);

INSERT INTO mytable VALUES(111555,'this text will be indexed');

INSERT INTO mytable VALUES(111556,'this is a direct_datastore example');

Commit;

CREATE INDEX myindex ON mytable(docs)

indextype IS ctxsys.context

parameters ('datastore ctxsys.default_datastore');

SELECT * FROM mytable WHERE contains(docs, 'text') > 0;

---以上内容是对Oracle 全文索引的一点理解，后续的学习在继续中，希望大家批评指正。

本文转自客居天涯 51CTO博客，原文链接：http://blog.51cto.com/tiany/1580942，如需转载请自行联系原作者

通过案例学调优之--Oracle 全文索引

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像

探索云世界

热门

云计算

大数据

云原生

人工智能

数据库

开发与运维

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

直播

下载

镜像站

技术资料