
《Pro Oracle SQL》Chapter 3 -- 3.3.2 Sort-Merge Joins

Sort-Merge Joins (page 114)
    Sort-merge joins read the two tables to be joined independently, sort the rows from each table (but only those rows that meet the conditions for the table in the WHERE clause) in order by the join key, and then merge the sorted rowsets. The sort operations are the expensive part of this join method. For large row sources that won’t fit into memory, the sorts will end up using temporary disk space to complete, which can be quite memory-intensive and time-consuming. But once the rowsets are sorted, the merge happens quickly. To merge, the database alternates down the two lists, compares the top rows, discards rows that are earlier in the sort order than the top of the other list, and returns only matching rows.
    (Translator's note: read literally, the last sentence says a row is discarded when it sorts earlier than the top row of the other list. The effect is that rows whose join key appears in one sorted list but not in the other are dropped.)
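The sort and merge steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Oracle's implementation: the function name `sort_merge_join` and the in-memory lists of dicts are hypothetical, and the sample rows are a small subset of emp and dept data.

```python
from operator import itemgetter


def sort_merge_join(left, right, key):
    """Inner equi-join of two lists of dicts: sort both inputs, then merge."""
    left = sorted(left, key=itemgetter(key))    # the sorts are the expensive part
    right = sorted(right, key=itemgetter(key))
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1      # left row sorts before the top of the right list: discard
        elif lk > rk:
            j += 1      # right row sorts before the top of the left list: discard
        else:
            # Find the run of right rows sharing this key, then pair every
            # left row that has the same key with each row in that run.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result


# Sample rows (assumed for the example), deliberately unsorted.
emp = [
    {"empno": 7698, "ename": "BLAKE", "deptno": 30},
    {"empno": 7782, "ename": "CLARK", "deptno": 10},
    {"empno": 8001, "ename": "Tomcat", "deptno": 11},
    {"empno": 7566, "ename": "JONES", "deptno": 20},
    {"empno": 7934, "ename": "MILLER", "deptno": 10},
]
dept = [
    {"deptno": 40, "dname": "OPERATIONS", "loc": "BOSTON"},
    {"deptno": 10, "dname": "ACCOUNTING", "loc": "NEW YORK"},
    {"deptno": 30, "dname": "SALES", "loc": "CHICAGO"},
    {"deptno": 20, "dname": "RESEARCH", "loc": "DALLAS"},
]

joined = sort_merge_join(emp, dept, "deptno")
for row in joined:
    print(row["empno"], row["ename"], row["dname"], row["loc"])
```

The emp row with deptno 11 and the dept row with deptno 40 never meet an equal key on the other side, so each is skipped by one of the two "discard" branches.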
    Let’s use the same query used earlier and break it down into how the sort-merge join would be processed.
select empno, ename, dname, loc
from emp, dept
where emp.deptno = dept.deptno
 
This query would be processed as if it were rewritten like the following pseudocode:
 
select empno, ename, deptno from emp order by deptno
 
select dname, loc, deptno from dept order by deptno
 
compare the rowsets and return rows where deptno in both lists match
 
for an outer join, compare the rowsets and return all rows from the first list,
setting column values for the other table to null
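Note: the result sets of the two queries above are as follows.

emp, ordered by deptno:

     #  EMPNO  ENAME   DEPTNO
     1   7782  CLARK       10
     2   7934  MILLER      10
     3   7839  KING        10
     4   8001  Tomcat      11  *
     5   8000  Jack        11  *
     6   7566  JONES       20
     7   7369  SMITH       20
     8   7902  FORD        20
     9   7876  ADAMS       20
    10   7788  SCOTT       20
    11   7521  WARD        30
    12   7844  TURNER      30
    13   7499  ALLEN       30
    14   7900  JAMES       30
    15   7654  MARTIN      30
    16   7698  BLAKE       30

dept, ordered by deptno:

    DEPTNO  DNAME       LOC
        10  ACCOUNTING  NEW YORK
        20  RESEARCH    DALLAS
        30  SALES       CHICAGO
        40  OPERATIONS  BOSTON    *

Rows marked * are discarded during the merge because their deptno has no match in the other list: the two emp rows with deptno 11 on the left, and the dept row with deptno 40 on the right.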
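The outer-join branch of the pseudocode can be sketched the same way. This is a hedged illustration, not Oracle's implementation: `merge_left_outer` and its `second_columns` parameter are hypothetical names, both inputs are assumed to be already sorted by the join key, and the rows are a subset of the result sets above.

```python
def merge_left_outer(first, second, key, second_columns):
    """Merge two lists already sorted by `key`, keeping every row of `first`.

    Rows of `first` without a match get None for the columns of `second`.
    """
    out = []
    j = 0
    for row in first:
        # Skip rows of `second` that sort before the current key.
        while j < len(second) and second[j][key] < row[key]:
            j += 1
        # Emit one output row per matching row of `second`. `j` stays at the
        # start of the run so the next row of `first` with the same key can
        # reuse it.
        k = j
        matched = False
        while k < len(second) and second[k][key] == row[key]:
            out.append({**row, **second[k]})
            matched = True
            k += 1
        if not matched:
            out.append({**row, **{col: None for col in second_columns}})
    return out


emp_sorted = [
    {"empno": 7782, "ename": "CLARK", "deptno": 10},
    {"empno": 8001, "ename": "Tomcat", "deptno": 11},
    {"empno": 7566, "ename": "JONES", "deptno": 20},
]
dept_sorted = [
    {"deptno": 10, "dname": "ACCOUNTING", "loc": "NEW YORK"},
    {"deptno": 20, "dname": "RESEARCH", "loc": "DALLAS"},
    {"deptno": 40, "dname": "OPERATIONS", "loc": "BOSTON"},
]

outer_rows = merge_left_outer(emp_sorted, dept_sorted, "deptno", ["dname", "loc"])
for r in outer_rows:
    print(r["ename"], r["deptno"], r["dname"], r["loc"])
```

Every emp row comes back, with the deptno 11 row carrying None for dname and loc, while the unmatched dept row (deptno 40) is still dropped because only the first list is preserved.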



Listing 3-19 shows the plan for this query.
Listing 3-19. Sort-Merge Join 
SQL> select /*+ ordered */ empno, ename, dname, loc
  2  from scott.dept, scott.emp
  3  where emp.deptno = dept.deptno;
 
------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |    14 |   462 |     6  (17)|
|   1 |  MERGE JOIN                  |         |    14 |   462 |     6  (17)|
|   2 |   TABLE ACCESS BY INDEX ROWID| DEPT    |     4 |    80 |     2   (0)|
|   3 |    INDEX FULL SCAN           | PK_DEPT |     4 |       |     1   (0)|
|*  4 |   SORT JOIN                  |         |    14 |   182 |     4  (25)|
|   5 |    TABLE ACCESS FULL         | EMP     |    14 |   182 |     3   (0)|
------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
       filter("EMP"."DEPTNO"="DEPT"."DEPTNO")
 
    I used the same query as before but had to force the plan with an ordered hint. Notice how the plan operations show a MERGE JOIN operation followed by an index access on the dept table and a SORT JOIN operation over a full table scan of the emp table. The first thing to note is the use of the index scan on dept. In this case, the optimizer chose to read the table data by way of the index since the index returns the data in sorted order. That means a separate sort step could be avoided. The emp table was full scanned and required a separate sort step since there was no index on deptno that could be used. After both rowsets were ready and in sorted order, they were merged together.
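The shape of this plan, one input arriving already sorted through an index and the other needing an explicit sort, can be mimicked in a few lines of Python. Everything here is a hypothetical stand-in rather than an Oracle structure: a "heap table" is an unordered list whose positions act as rowids, an "index" is a list of (key, rowid) pairs kept in key order ahead of query time, and the function names only echo the plan operations.

```python
from operator import itemgetter

# Unordered "heap tables"; the list position plays the role of a rowid.
dept_heap = [
    {"deptno": 30, "dname": "SALES"},
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 40, "dname": "OPERATIONS"},
    {"deptno": 20, "dname": "RESEARCH"},
]
emp_heap = [
    {"ename": "BLAKE", "deptno": 30},
    {"ename": "CLARK", "deptno": 10},
    {"ename": "SMITH", "deptno": 20},
    {"ename": "KING", "deptno": 10},
]

# Stand-in for PK_DEPT: (key, rowid) pairs in key order. A real index is
# maintained as rows change, so this ordering is not query-time work.
pk_dept = sorted((row["deptno"], rowid) for rowid, row in enumerate(dept_heap))

sorts_performed = []  # records which inputs needed an explicit sort


def index_full_scan(index, heap):
    """Walk the index in key order and fetch each row by rowid: no sort step."""
    return [heap[rowid] for _key, rowid in index]


def sort_join(rows, key, label):
    """Full scan followed by an explicit sort, as for emp in the plan."""
    sorts_performed.append(label)
    return sorted(rows, key=itemgetter(key))


def merge_join(first, second, key):
    """Merge two inputs sorted on `key`; assumes keys in `first` are unique."""
    out, j = [], 0
    for f in first:
        while j < len(second) and second[j][key] < f[key]:
            j += 1
        while j < len(second) and second[j][key] == f[key]:
            out.append({**second[j], **f})
            j += 1
    return out


dept_rows = index_full_scan(pk_dept, dept_heap)    # sorted for free
emp_rows = sort_join(emp_heap, "deptno", "emp")    # separate sort required
plan_result = merge_join(dept_rows, emp_rows, "deptno")

print(sorts_performed)
print([(r["ename"], r["dname"]) for r in plan_result])
```

Only emp lands in `sorts_performed`, which mirrors the single SORT JOIN line in the plan.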
    A sort-merge join will access the blocks needed and then do the work to sort and merge them in memory (or by using temp disk space if there isn’t enough memory). So, when you compare the logical reads of a sort-merge join with those of a nested loops join, particularly for a query against a larger row source, you will likely find that the nested loops join requires more block accesses. Does that mean that the sort-merge is a better choice? It depends. You have to take into account all the work required to complete the sort and merge steps and realize that this work may end up taking much more time than the additional block accesses would.
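A rough back-of-the-envelope model makes the trade-off concrete. It is a toy under loud assumptions, not a description of how Oracle counts logical reads or costs a plan: every figure is made up, the per-probe visit count is an assumed constant, and the sort work is approximated as n*log2(n) comparisons.

```python
import math


def nested_loops_block_visits(outer_blocks, outer_rows, visits_per_probe):
    """Read the outer table once, then probe the inner table once per outer row."""
    return outer_blocks + outer_rows * visits_per_probe


def sort_merge_block_visits(first_blocks, second_blocks):
    """Read each table's blocks once; sort and merge work is not a block visit."""
    return first_blocks + second_blocks


def approx_sort_comparisons(rows):
    """Rough n*log2(n) comparison count for sorting `rows` rows."""
    return int(rows * math.log2(rows)) if rows > 1 else 0


# Hypothetical sizes, chosen only for illustration.
outer_rows, outer_blocks = 100_000, 1_000
inner_rows, inner_blocks = 200_000, 2_000
visits_per_probe = 3  # assumed: a couple of index blocks plus one table block

nl_visits = nested_loops_block_visits(outer_blocks, outer_rows, visits_per_probe)
sm_visits = sort_merge_block_visits(outer_blocks, inner_blocks)
sort_work = approx_sort_comparisons(outer_rows) + approx_sort_comparisons(inner_rows)

print("nested loops block visits:", nl_visits)
print("sort-merge block visits:  ", sm_visits)
print("sort-merge comparisons:   ", sort_work)
```

In this model the nested loops join touches far more blocks, while the sort-merge join pays in sort comparisons that the block counts do not show, which is why the block-access figure alone cannot decide between the two.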
    Sort-merge joins are typically best suited to queries that have limited data filtering and return lots of rows. They are also often a better choice if there are no suitable indexes that can be used to access the data more directly. Finally, a sort-merge is often the best choice when the join is an inequality. For example, a join condition of WHERE table1.column1 between table2.column1 and table2.column2 would be a candidate for a sort-merge. As you’ll see in the next section, a hash join is not possible for such a join; if the row sources are large, the sort-merge will likely be the only viable choice.
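To see why sorted inputs help with a range condition where hashing cannot, here is a small sweep over two sorted lists that evaluates a BETWEEN-style join. It is an illustrative sketch, not the algorithm Oracle uses internally: `band_join`, the tuple layout, and the sample numbers are all assumptions made for the example.

```python
import heapq


def band_join(values, ranges):
    """Return (v, (lo, hi)) for every pair with lo <= v <= hi.

    `values` stands in for table1.column1 and `ranges` for
    (table2.column1, table2.column2). Both are sorted first, then swept once.
    """
    values = sorted(values)
    ranges = sorted(ranges)        # ordered by lower bound
    active = []                    # min-heap on hi: ranges with lo <= current v
    out = []
    j = 0
    for v in values:
        # Admit every range whose lower bound has been reached.
        while j < len(ranges) and ranges[j][0] <= v:
            lo, hi = ranges[j]
            heapq.heappush(active, (hi, lo))
            j += 1
        # Retire ranges that end before v. Values only grow from here on,
        # so a retired range can never match a later value.
        while active and active[0][0] < v:
            heapq.heappop(active)
        for hi, lo in active:
            out.append((v, (lo, hi)))
    return out


values = [5, 12, 20, 7]
ranges = [(1, 10), (6, 15), (18, 25)]
pairs = band_join(values, ranges)
for v, (lo, hi) in sorted(pairs):
    print(v, "between", lo, "and", hi)
```

A hash table can only answer "is there an equal key", so it has no way to find the ranges that contain a value, whereas ordering both inputs lets one pass pair them up.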

 

 


