`

《Pro Oracle SQL》Chapter 3 -- 3.3.1 Nested Loops Joins

阅读更多

Nested Loops Joins  嵌套循环连接       (page 111)
    Nested loops joins use each row of the query result reached through one access operation to drive into another table.  These joins are typically most effective if the result set is limited in size and indexes
are present on the columns used for the join. 
With nested loops, the cost of the operation is based on
reading each row of the outer row source and joining it with the matching row of the inner row source.  
    嵌套循环连接使用触及到的查询结果集中每一行,通过一次访问操作驱动另一张表。 如果结果集是有限大小且用于连接的列上有索引存在时,这些连接通常是最有效率的。 因为用了嵌套循环,操作的成本是基于读取外部行源的每一行同内部行源的匹配行相连接(算出来的)。
    A nested loops join is, as its name implies, a loop inside a loop.  The outer loop is basically a query
against the driving table that uses only the conditions from the WHERE  clause that pertain to that table.  As rows pass the outer conditional check and are confirmed to match the request, they are passed into
the second inner loop
one at time. Each row is then checked to see if it matches the joined-to table
based on the join column. If the row matches this second check, it is then passed on to the next step in
the plan or is included in the final result set if no further steps are present.
    嵌套循环连接是,如名字所暗示的,一循环在另一循环中。外部循环是对驱动表的基本的查询,只使用WHERE子句中属于那张表的条件。通过了外部条件检查和确认匹配需求的行集,它们一次一个的传给第二个内循环。 接着根据连接列,检查每一行是否匹配被连接表。如果行匹配第二次检查,它再传到计划中的下一步或者若没有再多的步骤了则包含到最终的结果集中。
    These kinds of joins are quite robust in that they use very little memory. Since row sets are built one row at a time, there is little overhead required. For that reason, they are actually good for huge result sets except for the fact that building a huge result set one row at a time can take quite a long time. That's why I mentioned earlier that nested loops are typically best when the result sets are smaller. The primary measurement for nested loops is the number of block accesses required to prepare the final result set.
    这种连接非常稳健,因为他们使用非常少的内存。由于行集是一次构建一行,所以开销较小。因此,他们很适合巨型结果集,除了事实上构造巨型结果集一次一行很耗时间外。这就是为什么我之前提到的,嵌套循环典型的最适用于当结果集较小时(的场景)。嵌套循环的(成本的)主要衡量标准是准备最终结果集需要访问的块数。
    Let's take a simple query and break it down into how the nested loop join would be processed.
    我们考察一个简单的查询再分解它,看看嵌套循环是如何进行的。
    select empno,ename,dname,loc
     from emp,dept
    where emp.deptno = dept.deptno
    This query would be processed as if it were written like the following pseudocode:
    该查询的处理过程将如下面所写伪代码所示:
    for each row in (select empno,ename, deptno from emp) loop
                 for (select dname, loc from dept where deptno = outer.deptno) loop
                       If match then pass the row on to the next step                  如果匹配则把行传递到下一步
                       If inner join and no match then discard the row                  如果是内连接而不匹配则抛弃该行
                       If outer join and no match set inner column values to null    如果是外连接而不匹配则设置内部列值为null
                                  and pass the row on to the next step                             再把行传递到下一步
                 end loop
          end loop
    List 3-17 shows the plan for this query .     列表3-17 展示这个计划的查询
List 3-17. Nested Loops
SQL> set autotrace traceonly explain
SQL>
SQL> select empno,ename,dname,loc
     2   from emp,dept
     3   where emp.deptno = dept.deptno;
    Execution Plan
    -------------------------------------------------------------------------------------------------------
    Plan hash value: 351108634
    ----------------------------------------------------------------------------------------------------------
    | Id |  Operation                                             | Name          | Rows | Bytes |  Cost (%CPU) |
    ----------------------------------------------------------------------------------------------------------
    |   0|  SELECT  STATEMENT                         |                    |   14   |     462 |      4(0) |
    |   1|     NESTED  LOOPS                               |                   |   14    |    462 |     4(0) |
    |   2|       TABLE ACCESS FULL                     |   EMP          |   14    |   182 |     3(0) |
    |   3|       TABLE ACCESS BY INDEX ROWID |  DEPT         |      1   |     20  |     1(0) |
    | * 4 |        INDEX UNIQUE SCAN                   |  PK_DEPT   |    1     |          |     0(0) |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
 
    The plan shows the nested loops method with the emp table as the driving table and the  dept table
as the inner (or driven-to) table.  With a nested loops plan, the first table listed after the  NESTED LOOPS
operation is the driving table. 
That table will be accessed via the method chosen for it.  In this case, it is
a full table scan on  emp.  That means that all the blocks in the  emp table are read using multiblock reads, then each row is accessed one at a time, and the deptno (the join column) is passed to the inner loop query against the dept table.  For an inner join, each row where there is a match on the dept table’s
deptno column, the row will be returned.  For an outer join, each row from  emp will be returned and null
values will be used to populate the columns from  dept.
    计划显示嵌套循环方法用emp表作为驱动表而dept表作为内部表(或者被驱动表)。从嵌套循环计划可以看出,在NESTED LOOPS操作之后的第一张表就是驱动表。 该表被选择的方法访问。在本例中,在emp表上全表扫描。这意味着emp表的所有块通过多块读取得,再一次一行的访问,再把deptno(连接列)传递给查询dept表的内循环。对于内连接,匹配dept表deptno列的每一行将会返回。对于外循环,来至emp的每行都将返回且null值将用于填充dept的列。
    If you’re wondering why the  emp table was chosen as the driving table, just take a second to think
about the query.  The query is asking for all rows where there is a match between the two tables on
deptno.  In my test, the emp table did not have an index on  deptno so the only way it could be accessed was with a full table scan.  Since the way a nested loops join works is to process the inner join for each row of the outer table, if the dept table had been the driving table, for every row in  dept a full table scan on  emp  would have occurred.  On the other hand, driving the join with the  emp table means that only one full table scan is needed, and since there is an index on deptno in the dept table (it’s the
primary key), the inner loop can directly access the row it needs from dept. 
Listing 3-18 shows the
comparison of the autotrace statistics output for both join orders.
    如果你奇怪为什么emp表被选作驱动表,只需要花点时间思考一下查询。查询要求获取所有在两表之间匹配deptno列的行集。在我的测试中,emp表在 deptno上没有索引,所以只能通过全表扫描访问。由于嵌套循环连接的工作方式是对外表的每一行进行内连接处理,如果dept表作为驱动表,对于 dept中的每行都要在emp表上发生一次全表扫描。另一方面,用emp表驱动连接意味着只需要一次全表扫描,而且在dept表的deptno上有索引 (它是主键),内部循环能直接从dept访问所需的行。 列表3-18展示了两种连接顺序的autotrace统计输出的对比。

Listing 3-18. Nested Loops Join Order Comparison
SQL> set autotrace traceonly statistics
SQL>
SQL> select empno, ename, dname, loc
  2  from scott.emp, scott.dept
  3  where emp.deptno = dept.deptno;
 
Statistics
----------------------------------------------------
          0  recursive calls
          0  db block gets
         24  consistent gets
          0  physical reads
          0  redo size
        999  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         14  rows processed
 
SQL> select /*+ ordered use_nl (dept emp) */ empno, ename, dname, loc
  2  from scott.dept, scott.emp
  3  where emp.deptno = dept.deptno;
Statistics
-----------------------------------------------------
          0  recursive calls
          0  db block gets
         37  consistent gets
          0  physical reads
          0  redo size
        853  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         14  rows processed
 
    I had to use hints ( hints will be covered later in the book) to force the optimizer to choose a plan
with the dept table as the driving table.  Notice that when the join is driven by dept, the logical reads
( consistent gets ) are higher than when the join is driven by the emp  table.  So, the optimizer made the
correct join order choice by choosing to lead with  emp.  One of the keys to optimizing performance is to
make sure that only work that needs to happen is done. 
The extra work (i.e. extra logical reads) that
would have occurred if the dept table had been the driving table was avoided with this join order
choice.
    我必须使用提示(提示将在本书后面的章节讲解)强制优化器选择使用dept表作为驱动表的计划。注意连接是由dept驱动的,逻辑读(一致获取)将高于由 emp表做驱动的连接。如此,优化器通过前置emp做出正确的连接顺序选择。优化性能的一个重要原则是务必只是把所需的工作做完即可。 若dept表作为驱 动表就会产生额外的工作(如:额外的逻辑读),通过连接顺序的选择就能避免。

 

 

2
0
分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics