Oracle DBA手记 3:数据库性能优化与内部原理解析
上QQ阅读APP看本书,新人免费读10天
设备和账号都新为新人

TRUNCATE语句时间过长的诊断

有人提出问题:执行TRUNCATE语句为什么比执行DELETE语句还慢。默认情况下TRUNCATE要回收空间,而且会将所有的脏块写回到数据文件,因此这个DDL语句会涉及很多后台的递归调用。

基于上述原因,对于小数据量的表而言,TRUNCATE操作比DELETE慢也是正常的。不过在提问网友给出的例子中,TRUNCATE语句慢得太多。下面通过测试例子来看问题出在哪里。

测试案例

问题发生在网友的环境中,下面的这个例子是在找到问题原因之后,通过重新模拟问题得到的。为了更直观地说明这个问题,把这个例子放在最开始的部分:表中只有5条记录时TRUNCATE操作要比DELETE慢得多:

    SQL> SET TIMING ON
    SQL> DELETE T_BIG;
    4 rows deleted.
    Elapsed: 00:00:00.01
    SQL> TRUNCATE TABLE T_BIG;
    Table truncated.
    Elapsed: 00:00:03.25

如果表中数据量不大,TRUNCATE比DELETE慢一点是正常的,但是二者的执行时间一般是同一个数量级的:

    SQL> CREATE TABLE T_TRUNCATE (ID NUMBER);
    Table created.
    Elapsed: 00:00:00.06
    SQL> INSERT INTO T_TRUNCATE VALUES (1);
    1 row created.
    Elapsed: 00:00:00.00
    SQL> DELETE T_TRUNCATE;
    1 row deleted.
    Elapsed: 00:00:00.03
    SQL> TRUNCATE TABLE T_TRUNCATE;
    Table truncated.
    Elapsed: 00:00:00.10

上面是一个正常的例子,而文章开始模拟的例子中,TRUNCATE语句和DELETE语句相差了几个数量级,这肯定不是“数据量较小”就可以解释得通的。

提问者除了给出TRUNCATE语句和DELETE语句的时间对比外,还对TRUNCATE语句设置了10046事件进行跟踪(TRACE):

    SQL> ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';
    Session altered.
    Elapsed: 00:00:00.04
    SQL> TRUNCATE TABLE T_BIG;
    Table truncated.
    Elapsed: 00:00:01.52
    SQL> ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
    Session altered.
    Elapsed: 00:00:00.00

第二次运行TRUNCATE,虽然时间比第一次运行要短1/3,但是仍然要花费将近2秒的时间。

提问者使用TKPROF对得到的trace文件进行了格式化,并将格式化后的文件发到了论坛上,其中格式化后的结果类似于如下:

    [oracle@yans1 udump]$ more trace_result.txt
    TKPROF: Release 10.2.0.3.0- Production on星期四 7月 1 15:07:052010
    Copyright (c) 1982, 2005, Oracle.  All rights reserved.
    TRUNCATE TABLE T_BIG
    call    count       cpu    elapsed      disk     query     current           rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1     0.00      0.00          0         0         0                 0
    Execute      1     0.36      1.47      4453       365      9211                 0
    Fetch        0     0.00      0.00          0         0         0                 0
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        2     0.37      1.48      4453       365      9211                 0
    Misses in library cache during parse: 1
    Optimizer mode: ALL_ROWS
    Parsing user id: 74
    Elapsed times include waiting on following events:
      Event waited on                              Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      db file sequential read               4452       0.00         0.06
      local write wait                      2229       0.00         0.67
    ********************************************************************************
      . . .
    OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
    call     count      cpu    elapsed        disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        2     0.00       0.00           0          0          0           0
    Execute      2     0.36       1.47        4453        365       9211           0
    Fetch        0     0.00       0.00           0          0          0           0
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        4     0.37       1.48        4453        365       9211           0
    Misses in library cache during parse: 1
    Elapsed times include waiting on following events:
      Event waited on                              Times   Max. Wait  Total Waited
      . . .
      ----------------------------------------   Waited  ----------  ------------
      db file sequential read                  4452      0.00          0.06
      local write wait                          2229      0.00          0.67
    OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse       63      0.01       0.10          0          0          0           0
    Execute     80      0.04       0.14          0        110         39          28
    Fetch       36      0.00       0.02          1        246          0          29
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total      179      0.06       0.27          1        356         39          57
    Misses in library cache during parse: 20
    Misses in library cache during execute: 20
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      db file sequential read 1 0.01 0.01
      10  user  SQL statements in session.
      70  internal SQL statements in session.
      80  SQL statements in session.
    ********************************************************************************

根据格式化后的trace信息可以做出下面的判断:

Oracle的递归SQL的运行时间只占了很小的比重,绝大部分时间都是消耗在TRUNCATE语句上

TRUNCATE语句居然产生了4000多次的物理读和9000多次的逻辑读,而表本身并没有那么多的BLOCK存在。何况即使表本身比较大,在第一次执行了TRUNCATE操作以后,高水位线已经重置,再次执行TRUNCATE时,表本身应该已经很小才对。

TRUNCATE语句的主要等待事件是db file sequential read和local write wait,这两个等待时间一般和索引有关,而与表的关系并不大。

仅根据现有的信息无法判断具体的原因,因此要求提问者提供进一步的信息:包含表结构的DBMS_METADATA输出结果以及TKPROF格式化之前的原始trace文件。

其中DBMS_METADATA.GET_DDL的输出类似如下:

    SQL> SET LONG 10000
    SQL> SELECT DBMS_METADATA.GET_DDL('TABLE', 'T_BIG') FROM DUAL;
    DBMS_METADATA.GET_DDL('TABLE','T_BIG')
    --------------------------------------------------------------------------------
      CREATE TABLE "TEST"."T_BIG"
        (    "OWNER" VARCHAR2(30),
            "OBJECT_NAME" VARCHAR2(128),
            "SUBOBJECT_NAME" VARCHAR2(30),
            "OBJECT_ID" NUMBER,
            "DATA_OBJECT_ID" NUMBER,
            "OBJECT_TYPE" VARCHAR2(19),
            "CREATED" DATE,
            "LAST_DDL_TIME" DATE,
            "TIMESTAMP" VARCHAR2(19),
            "STATUS" VARCHAR2(7),
            "TEMPORARY" VARCHAR2(1),
            "GENERATED" VARCHAR2(1),
            "SECONDARY" VARCHAR2(1),
            PRIMARY KEY ("OBJECT_ID")
      USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
      STORAGE(INITIAL 1073741824 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
      PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
      TABLESPACE "GPO"  ENABLE ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
      STORAGE(INITIAL 973078528 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
      PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)  TABLESPACE "GPO"

其中表和主键索引的初始EXTENT的大小比较奇怪,既然表中记录很少,为什么表和主键的初始化INITIAL EXTENT会那么大。

进一步分析详细的trace信息如下:

    [oracle@yans1 udump]$ more test08_ora_20627.trc
    /opt/ora10g/admin/test08/udump/test08_ora_20627.trc
    . . .
    ORACLE_HOME = /opt/ora10g/product/10.2.0/db_1
    System name:    Linux
    Node name:     yans1
    Release:       2.6.9-42.0.0.0.1.ELsmp
    Version:       #1 SMP Sun Oct 15 15:13:57 PDT 2006
    Machine:       x86_64
    Instance name: test08
    Redo thread mounted by this instance: 1
    Oracle process number: 16
    Unix process pid: 20627, image: oracle@yans1 (TNS V1-V3)
    *** 2010-07-01 15:03:54.687
    *** ACTION NAME:() 2010-07-01 15:03:54.673
    *** MODULE NAME:(SQL*Plus) 2010-07-01 15:03:54.673
    *** SERVICE NAME:(SYS$USERS) 2010-07-01 15:03:54.673
    *** SESSION ID:(317.11354) 2010-07-01 15:03:54.673
    WAIT #3: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=147960
    tim=1248015463547954
    WAIT #3: nam='SQL*Net message from client' ela= 5759303 driver id=1650815232 #bytes=1 p3=0
    obj#=147960 tim=1248015469321192
    XCTEND rlbk=0, rd_only=1
    =====================
    PARSING IN CURSOR #10 len=20 dep=0 uid=74 oct=85 lid=74 tim=1248015469332441 hv=3136993528
    ad='ff547c80'
    TRUNCATE TABLE T_BIG
    END OF STMT
    PARSE #10:c=1000,e=9511,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1248015469330783
    BINDS #10:
    =====================
    . . .
    EXEC #5:c=2000,e=2298,p=0,cr=2,cu=2,mis=1,r=1,dep=1,og=4,tim=1248015469436750
    STAT #5 id=1 cnt=0 pid=0 pos=1 obj=0 op='UPDATE  OBJ$ (cr=2 pr=0 pw=0 time=264 us)'
    STAT #5 id=2 cnt=1 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=2 pr=0 pw=0 time=21 us)'
    WAIT #10: nam='db file sequential read' ela= 31 file#=51 block#=57365 blocks=1 obj#=147967
    tim=1248015469436975
    . . .
    WAIT #10: nam='db file sequential read' ela= 9 file#=54 block#=32261 blocks=1 obj#=147965
    tim=1248015469849171
    WAIT  #10:  nam='local  write  wait'  ela=  257  file#=54  block#=32261  p3=0  obj#=147965
    tim=1248015469849462
    . . .

可以看到TRUNCATE语句,也就是CURSOR 10在运行过程中出现了大量的等待,等待事件主要就是前面提到的db file sequential read和local write wait,而等待的对象在不断地变化,对应的OBJECT_ID从147967递减到147960。

下面检查这些对象:

    SQL> SELECT OBJECT_NAME, OBJECT_TYPE
      2  FROM DBA_OBJECTS
      3  WHERE OBJECT_ID IN (147967, 147966, 147965, 147964, 147963, 147962, 147961, 147960);
    OBJECT_NAME                     OBJECT_TYPE
    ------------------------------ -------------------
    IND_BIG_TEMP                    INDEX
    IND_BIG_TYPE                    INDEX
    IND_BIG_CREATED                 INDEX
    IND_BIG_STATUS                  INDEX
    IND_BIG_ID                      INDEX
    IND_BIG_NAME                    INDEX
    IND_BIG_OWNER                   INDEX
    T_BIG                           TABLE

这些索引就是T_BIG表上的索引,检查这些索引的大小如下:

    SQL> SELECT SEGMENT_NAME, SUM(BYTES)/1024/1024/1024
      2  FROM USER_SEGMENTS
      3  WHERE SEGMENT_NAME IN
      4  (SELECT INDEX_NAME
      5  FROM USER_INDEXES
      6  WHERE TABLE_NAME = 'T_BIG')
      7  GROUP BY SEGMENT_NAME;
    SEGMENT_NAME                    SUM(BYTES)/1024/1024/1024
    ------------------------------ -------------------------
    IND_BIG_ID                                               1
    IND_BIG_OWNER                                            1
    IND_BIG_CREATED                                          1
    IND_BIG_NAME                                             1
    IND_BIG_STATUS                                      1.0625
    IND_BIG_TEMP                                        1.0625
    IND_BIG_TYPE                                        1.0625
    7 rows selected.
    SQL> SELECT INDEX_NAME, INITIAL_EXTENT
      2  FROM USER_INDEXES
      3  WHERE TABLE_NAME = 'T_BIG';
    INDEX_NAME                      INITIAL_EXTENT
    ------------------------------ --------------
    IND_BIG_OWNER                       1073741824
    IND_BIG_NAME                        1073741824
    IND_BIG_ID                          1073741824
    IND_BIG_STATUS                      1098907648
    IND_BIG_CREATED                     1073741824
    IND_BIG_TYPE                        1098907648
    IND_BIG_TEMP                        1098907648
    7 rows selected.

这些索引每个都是1GB大小,而且这1GB空间就是初始EXTENT的大小。导致问题的原因已经明确了:表包含了多个索引,且每个索引的初始段太大,使得TRUNCATE语句执行时对索引执行了大量的db file sequential read操作,从而导致了TRUNCATE语句的性能问题。

出现问题的原因找到以后,解决问题就很简单了,只需要降低索引的初始EXTENT,就可以避免这个问题:

    SQL> ALTER INDEX IND_BIG_OWNER REBUILD STORAGE (INITIAL 1M);
    Index altered.
    SQL> ALTER INDEX IND_BIG_NAME REBUILD STORAGE (INITIAL 1M);
    Index altered.
    SQL> ALTER INDEX IND_BIG_ID REBUILD STORAGE (INITIAL 1M);
    Index altered.
    SQL> ALTER INDEX IND_BIG_STATUS REBUILD STORAGE (INITIAL 1M);
    Index altered.
    SQL> ALTER INDEX IND_BIG_CREATED REBUILD STORAGE (INITIAL 1M);
    Index altered.
    SQL> ALTER INDEX IND_BIG_TYPE REBUILD STORAGE (INITIAL 1M);
    Index altered.
    SQL> ALTER INDEX IND_BIG_TEMP REBUILD STORAGE (INITIAL 1M);
    Index altered.
    SQL> SET TIMING ON
    SQL> TRUNCATE TABLE T_BIG;
    Table truncated.
    Elapsed: 00:00:00.27

重建索引后,TRUNCATE语句执行时间基本恢复正常了。注意到表的初始化EXTENT也将近1GB,对表进行 MOVE缩小初始EXTENT,还可以进一步提高TRUNCATE操作的效率:

    SQL> ALTER TABLE T_BIG MOVE STORAGE (INITIAL 1M);
    Table altered.
    Elapsed: 00:00:00.09
    SQL> TRUNCATE TABLE T_BIG;
    Table truncated.
    Elapsed: 00:00:00.07

耗时恢复正常数量级。

小结

问题解决后深入分析,该问题看似是由于索引和表的INITIAL EXTENT造成的,但是真正导致问题的罪魁祸首是EXP的COMPRESS=Y。正是由于提问者在执行EXP导出时没有设置COMPRESS=N,因此EXP会尝试压缩表的EXTENT:根据表中所有的EXTENT的大小创建一个很大的初始EXTENT,将表的全部EXTENT压缩到一个EXTENT中。如果利用这种情况下导出的dmp文件导入到目标库,就会像例子中那样创建出初始EXTENT非常大的表和索引了。