`

double trouble

    博客分类:
  • CBO
 
阅读更多

double trouble

Filed under: Execution plans,Performance,Tuning — Jonathan Lewis @ 7:06 pm BST May 18,2010 

In the latest Quiz Night, I asked how you could make a query more efficient by changing a two table join into a three table join – with the clue that my third table was a repeat of the first table. Gary Myers, in comment 4,  provided the type of answer I was looking for. Sometimes it is more efficient to get a small amount of data from a table on a first pass then go back and get the rest of the data on a second pass – especially if the first pass is an ‘index only’ operation.

I’ve created a little demonstration that gives you some idea of the approach:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
 
create table t1
as
with generator as (
    select  --+ materialize
        rownum id
    from dual
    connect by
        rownum <= 10000
)
select
    rownum                  id,
    mod(rownum,100)             mod1,
    trunc(dbms_random.value(0,10000))   random1,
    lpad(rownum,10,'0')         small_vc,
    rpad('x',60)                padding
from
    generator   v1,
    generator   v2
where
    rownum <= 100000
;
 
create table t2
as
with generator as (
    select  --+ materialize
        rownum id
    from dual
    connect by
        rownum <= 10000
)
select
    rownum                  id,
    mod(rownum,100)             mod2,
    trunc(dbms_random.value(0,10000))   random2,
    lpad(rownum,10,'0')         small_vc,
    rpad('x',60)                padding
from
    generator   v1,
    generator   v2
where
    rownum <= 100000
;
 
create index t1_i1 on t1(mod1, random1);
create index t2_i1 on t2(mod2, random2);

This creates two tables of 100,000 (fairly short) rows. Note the mod columns which return 1,000 rows per value, and the random columns which return approximately 10 rows per value. When I give Oracle the following query, it overestimates the final result set and chooses what I know to be a relatively resource-intensive execution plan:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
select
    t1.padding,
    t2.padding
from
    t1, t2
where
    t1.mod1 = 50
and t2.random2 = t1.random1
and t2.mod2 = 50
;
 
-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1045 |   138K|   388 |
|*  1 |  HASH JOIN         |      |  1045 |   138K|   388 |
|*  2 |   TABLE ACCESS FULL| T1   |  1000 | 68000 |   193 |
|*  3 |   TABLE ACCESS FULL| T2   |  1000 | 68000 |   193 |
-----------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."RANDOM2"="T1"."RANDOM1")
   2 - filter("T1"."MOD1"=50)
   3 - filter("T2"."MOD2"=50)

This plan is dictated largely by the fact that I have to collect quite a lot of data from both tables then eliminate a large fraction of the data I have collected. This pattern is the driver for what I am about to do: I know that I want a small volume of data eventually but if I have to go to the table at every step of the plan then I will have to do a lot of redundant work and carry a lot of redundant data at some point. Remember – it’s often the case that “visiting the table” is the expensive part of any query.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
select
    /*+
        leading(t1 t2 t3)
        use_nl(t3)
        rowid(t3)
    */
    t3.padding,
    t2.padding
from
    t1,
    t2,
    t1  t3
where
    t1.mod1 = 50
and t2.random2 = t1.random1
and t2.mod2 = 50
and t3.rowid = t1.rowid
;
 
---------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |  1045 |   163K|  1244 |
|   1 |  NESTED LOOPS               |       |  1045 |   163K|  1244 |
|*  2 |   HASH JOIN                 |       |  1045 | 90915 |   199 |
|*  3 |    INDEX RANGE SCAN         | T1_I1 |  1000 | 19000 |     4 |
|*  4 |    TABLE ACCESS FULL        | T2    |  1000 | 68000 |   193 |
|   5 |   TABLE ACCESS BY USER ROWID| T1    |     1 |    73 |     1 |
---------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   3 - SEL$1 / T1@SEL$1
   4 - SEL$1 / T2@SEL$1
   5 - SEL$1 / T3@SEL$1
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."RANDOM2"="T1"."RANDOM1")
   3 - access("T1"."MOD1"=50)
   4 - filter("T2"."MOD2"=50)

I don’t think the optimizer can generate a plan like this at present – but I may be wrong. I’ve reduced my workload by taking advantage of an existing index on table t1 to do a range scan that picks up only the columns that I need to join to t2. In this case the t2 access path was still a full tablescan – but even so I have reduced the workload against t1, and by the time I revisit it by rowid I will only be visiting the (relatively) small number of rows I really need.

(Left as an exercise to the reader: I could have written the query as a four part join, visiting both table segments by rowid for just those rows that I really needed; have a go, and check that you’ve got it right. Don’t forget that any references to the “non-index” columns that appear in the query have to be changed to reference the second occurrence of the table – note how I’ve changed t1.padding in my original query to t3.padding in the rewrite.)

Footnote:
If you think this type of path is a little odd take a look at the typical stucture of a nested loop join that appears under “nlj_batching” in 11g (this isnt the same t1 and t2 as above, by the way):

1
2
3
4
5
6
7
8
9
10
11
12
13
select
        /*+ ordered use_nl(t1) index(t1(n1)) */
        t2.n1, t1.n2
from    t2,t1
where
        t2.n2 = 45
and     t1.n1 = t2.n1;
 
------------------------------------------------------
| Id  | Operation                    | Name  | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT             |       |   225 |
|   1 |  NESTED LOOPS                |       |       |
|   2 |   NESTED LOOPS               |       |   225 |
|*  3 |    TABLE ACCESS FULL         | T2    |    15 |
|*  4 |    INDEX RANGE SCAN          | T1_I1 |    15 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1    |    15 |
------------------------------------------------------

Notice how Oracle can present a single join as two nested loops – one into the index and a second into the table. This is why I think there may be options within the optimizer to do my little trick automatically – if not now, then soon.

Update June 2012

I’ve just had an exchange of email with an Oak Table member who has pointed me to US patent 8103658 (dated November 2009) – which looks like a remarkably good description of this technique. So maybe the method will become an automatic option for the optimizer some time in the next couple of years.

Comment it's also very fantastic, please enjoy on the original post address, which you can find below

 

参考至:http://jonathanlewis.wordpress.com/2010/05/18/double-trouble/#comment-36301

如有错误,欢迎指正

邮箱:czmcj@163.com

分享到:
评论

相关推荐

    Mathematics for Computer Science 2017.7z

    14.6 Double Trouble 14.7 渐进的符号表示(Asymptotic Notation) 15 基数法则(Cardinality Rules) 15.1 由计算另一项计算该项(Counting One Thing by Counting Another) 15.2 计算序列(Counting Sequences) ...

    js_double-trouble

    双重麻烦 开始之前请阅读指南

    HEX-Float转换工具 16进制转成float 或double类型数据的一个小工具

    16进制转成float 或double类型数据的一个小工具。

    double-trouble:适用于您的 BIOE 双专业的网络实用程序

    生物工程双专业网页工具这是一个网络实用程序,可帮助正在考虑攻读双学位的西雅图华盛顿大学生物工程专业的学生。

    华为刷机工具

    测试 * FTP dialogs have trouble with very long directory path. * FTP Open account drop down list very small on Windows 2000. * Relative file paths no longer work with FTP. * Incorrect behavior of ...

    6级作文模板轻松写六级作文

    3To conclude, …..are just like a double-edged sword. With them we may have less trouble dealing with problems in life and enjoy a better-off life. However, one point should be kept in mind that we ...

    Saliency Filters的Matlab实现

    Name: Saliency Filters Reference: Perazzi, Federico, et al. "Saliency filters: Contrast based ...3. if you have a trouble in mexGenerateSuperPixel.mexw64, please use another superpixel method instead.

    经典6级考试作文模板,不看不要后悔有

    3To conclude, …..are just like a double-edged sword. With them we may have less trouble dealing with problems in life and enjoy a better-off life. However, one point should be kept in mind that we ...

    DELPHI7托盘图标控件,230(好用).zip

    give trouble if it was 64 chars. Thanks to Toms Baugis and Teus de Jong. 3) The popup menu would not close itself auto- matically if the StartMinimized property was true. Thanks to Ingo Krueger,...

    arcgis工具

    Dim Output as double Dim pArea as Iarea Set pArea = [shape] 在最后的一个空格里面写入代码(即:字段名)pArea.area 长度计算: Dim Output as double Dim pCurve as ICurve Set pCurve = [shape] ...

    雷达技术知识

    关于雷达方面的知识! EFFECTIVENESS OF EXTRACTING WATER SURFACE SLOPES FROM LIDAR DATA WITHIN THE ACTIVE CHANNEL: SANDY RIVER, OREGON, USA by JOHN THOMAS ENGLISH A THESIS Presented to the Department ...

    WordPress 3 Plugin Development Essentials.pdf

    Double check your interface 230 Documentation 230 Identify the purpose 230 Learning to drive: Keeping it relevant 231 Phrasebooks vs. dictionaries: Give examples 232 Analogy: The three bears 232...

    xplite_trial

    you trouble you can completely remove it, and then reinstall it as cleanly as the day Windows was first installed on your computer. The ultimate security system? There can be little argument that ...

    S7A驱动720版本

    The driver automatically opens its window which shows trace log messages useful for trouble shooting, when the "Yes" Radio button is on. - New HTML Help Build 218 : Solved problems: - Problems ...

    Commercial Wireless Circuits and Components Handbook

    22 Nonlinear RF and Microwave Circuit Analysis Michael B. Steer and John F. Sevic 22.1 Introduction .......................................................................................................

    Google C++ Style Guide(Google C++编程规范)高清PDF

    Table of Contents Header Files The #define Guard Header File Dependencies Inline Functions The -inl.h Files Function Parameter Ordering Names and Order of Includes Scoping Namespaces Nested Classes ...

    java压缩上传图片的工具方法

    java上传图片的压缩方法,只是一个初入java的小白的一些简单积累。还望各个大神指点!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!...

Global site tag (gtag.js) - Google Analytics