`

《Pro Oracle SQL》Chapter 3 -- 3.3.4 Cartesian Joins

阅读更多

Cartesian Joins     笛卡尔连接      (page 118)
    Cartesian joins occur when all the rows from one table are joined to all the rows of another table.  Therefore, the total number of rows resulting from the join equals the number of rows from one table ( A) multiplied by the number of rows in the other table ( B) such that  A x B = total rows in the result set.  Cartesian joins often occur when a join condition is overlooked or left out such that there isn’t a specified join column so the only operation possible is to simply join everything from one row source to everything from the other.  
    当一表的全部行连接到另一表全部行(两两互连接)时就发生了笛卡尔连接。因此,连接生成的总行数等于某表A的行数乘以另一表B的行数,既A x B = 结果集的总行数。 笛卡尔连接通常在某连接条件被忽略或者左外连接而又没有指定连接列,使得唯一可能的操作是简单的把一行源的所有行连接到另一个(行源)上。
    Let’s use the same query used earlier, but leave off the WHERE  clause, and break it down into how
the Cartesian join would be processed.  
    我们还是使用之前同样的查询,但是去掉WHERE子句,分析它,看看笛卡尔连接是如何运行的。
select empno, ename, dname, loc
from emp, dept
 
This query would be processed as if it we rewritten like the following pseudocode:     该查询的处理过程如下重写的伪代码:
 
determine the smaller table
 
select dname, loc from dept
 
select empno, ename from emp
 
for each row in dept match it to every row in emp retaining all rows    对于dept表的每行匹配到emp表的每行保留所有行集
 
Listing 3-22 shows the plan for this query.
Listing 3-22. Cartesian Join 
------------------------------------------------------------------
| Id  | Operation                             | Name | Rows  | Bytes  | Cost (%CPU)|
------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |    56    |  1568  |     9   (0)|
|   1 |   MERGE JOIN CARTESIAN    |           |    56    |  1568  |     9   (0)|
|   2 |      TABLE ACCESS FULL     | DEPT |     4     |    72    |     3   (0)|
|   3 |      BUFFER SORT                |           |    14    |   140   |     6   (0)|
|   4 |        TABLE ACCESS FULL   | EMP  |    14     |   140   |     2   (0)|
------------------------------------------------------------------
 
    Notice the rows estimates in the plan and how the final row estimate is the product of the rows
from the two tables (4 x 14  = 56).  What you end up with in this case is likely a result set that has a whole lot more rows than you want or intended to have.  When plans aren’t checked properly while
developing SQL, Cartesian joins may end up causing the result set to appear to have numerous
duplicate rows.  And, unfortunately, the first thing many people will do is to add a  distinct operator to
the SQL.  This has the effect of getting rid of the duplicates so that the result set is correct, but at a
significant cost.  The duplicates shouldn’t have been there in the first place but since they’re there,
adding distinct will cause a sort to occur and then all the duplicates will be eliminated. 
That’s a lot of
wasted work.  So, make sure to always verify the plan for Cartesian joins if you end up with unexpected
duplicate rows in your result set before you simply add  distinct out of hand
    注意计划中评估的Rows和最后评估的行数是来至两表行数的乘积 (4 x 14  = 56)。在本例中你最终获得的结果集远大于你想要的或者意图有的(数目)。当开发SQL时,没有适当的检查计划,笛卡尔连接可能最终导致结果集出现大量的重复行。然而不幸的是,很多人可能会做的第一件事是给SQL加distinct运算符。首先 重复就不应该出现但是由于他们存在,加distinct 将会导致产生一次排序接着又要去除重复行。 做了很多无用功。因此,如果在你的结果集中出现了不期望的重复行,在你脱手 加distinct之前,务必总是确认计划中是否有笛卡尔连接。
    One thing you’ll notice about the Cartesian join plan is the presence of the  BUFFER SORT  operation. 
This isn’t really a sort but since Oracle is joining every row to every row, using the buffer sort
mechanism to copy the blocks from the second row source out of the buffer cache and into private
memory has the benefit of not requiring the same blocks in the buffer cache to be revisited over and
over. 
These revisits would require a lot more logical reads and would also create more opportunity for
contention on these blocks in the buffer cache.  So, buffering the blocks into a private memory area can
be a much more efficient way to accomplish the repeated join.

    关于笛卡尔连接计划你需要注意的一点是BUFFER SORT操作的出现。这实际上不是排序,但是因为Oralce是每行到每行的连接,使用buffer sort机制从缓冲区缓存拷贝出第二个行源的块,再放入私有内存,有一个好处,不需要一遍又一遍的重复访问在缓冲区缓存中相同的块。 这种重复访问需要多很多逻辑读,也将在缓冲区缓存的这些块上创建更多的竞争机会。因此,缓存这些块到私有内存区PGA将非常更有效率的完成重复连接

 

 

0
0
分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics