`

《Pro Oracle SQL》Chapter3--3.2.2 Index Scan Types

阅读更多

Index Scan Types  索引扫描类型      (page 97)
    There are several different types of index scans but each share some common ground in how they must traverse the index structure to access the leaf block entries that match the values being searched.  First, the root block of the index is accessed with a single block read.  The next step is to read a branch block.  Depending on the height of the index, one or more branch blocks may need to be read.   Each read is for a separate single block.  Finally, the first index leaf block that contains the start of the index entries
needed is read.  If the height of an index is 4, to get to the leaf block needed, 4 single block reads will be
performed.  At this point, the rowid for the first matching index value in the leaf block is read and used
to make a single block read call to retrieve the table block where the entire row resides.  Therefore, in
this example, to retrieve a single row from a table using an index, Oracle would have to read 5 blocks: 4
index blocks and 1 table block.   
    有好几种类型的索引扫描,但是都享有一些共同点,它们如何必须遍历索引结构获取访问匹配搜索值的叶块记录。 首先,用一次单块读访问索引的根块。下一步是读一个分支块。依赖于索引的高度,可能需要读取一个或多个分支块。 对每个独立的单块都要读取一次。最后,第一个索引叶块,包含所需索引记录的开端(的那个块),被读入。如果索引的高度是4,要获取所需的叶块,就要执行4次单块读。从这点起,在叶块中第一次匹配的索引值的rowid被读入,且用于一单块读调用检索整个行驻留的表块。因此,在本例中,用一索引从表中检索一单行,Oracle必须读5个块:4个索引块和一个表块。
    The various index scan types you will review are index range scan, index unique scan, index full
scan, index skip scan, and index fast full scan.  An index fast full scan is actually more like a full table
scan, but since they are scans against an index structure I’ll cover them in this section.
    你将学到各种索引扫描类型:索引范围扫描,索引唯一扫描,索引全扫描,索引跳跃扫描,索引快速全扫描。索引快速全扫描实际上更像全表扫描,但是他是扫描索引结构,我将在本节中讨论它。
    Before I review the different scan types, I want to point out a very important index statistic called
clustering factorThe clustering factor statistic of an index helps the optimizer generate the cost of
using the index and is a measure of how well ordered the table data is as related to the indexed values.
 
Recall that index entries are stored in sorted order while table data is stored in random order.  Unless
an effort has been made to specifically load data into a table in a specific order, you are not
guaranteed where individual rows of data will end up.
  For example, rows from the orders table that
share the same order_date may not all reside in the same blocks.  They are likely to be scattered
randomly across the blocks in the table.
    在我讨论不同的扫描类型之前,我要指出一个非常重要的索引统计信息称为“(聚)簇因子 ”。一个索引的簇因子统计帮助优化器产生索引使用的成本而且是相关索引值的表数据如何有序排列的度量。 回想一下,索引记录是按顺序存储的而表数据是随机顺序存储的。除非故意以特定的顺序向表装载数据,你不能保证行之间的数据会结束。 例如,orders表的行共享相同的order_date列,可能不会驻留在相同的块中。它们可能随机的、跨越的分布的在表的多个块中。
    The clustering factor of an index indicates to the optimizer if data rows containing the same
indexed values will be located in the same or a small set of contiguous blocks, or if rows will be
scattered across numerous table blocks. 
Figure 3-5 shows how the rows might be stored physically in
the table’s data blocks.
    一索引的簇因子向优化器指出是否包含相同的索引值的数据行将位于相同的或一小段连续的块中,或者行是否跨越了很多表块。 图3-5展示了行是如何可能物理的存入表的数据块中。
    In the diagram showing table  T1, you see how rows containing the value 2 were loaded into the
same block.  But, in table  T2, rows with a value of 2 are not loaded in contiguous blocks.  In this
example, an index on this column for table  T1 would have a lower clustering factor.  Lower numbers
that are closer to the number of table blocks are used to indicate highly ordered, or clustered, rows of
data based on the indexed value. 
The clustering factor for this column in table T2, however, would be
higher and typically closer to the number of rows in the table.  Listing 3-8 shows the clustering factor
statistic for each of these two tables.

    在图中所示表T1,你看到包含值2的行是如何装载在相同的块中的。但是在表T2中,带有值2的行没有装载进连续的块中。在本例中,在表T1这列的索引将有一个低的簇因子。数值越低、接近表的块数表明基于索引值的数据行是高有序的,或聚集的。然而T2表的簇因子将会较高而且典型的接近表中的行数。 列表3-8 展示了这两张表各自的簇因子统计信息。


 
Figure 3-5. Diagram of random vs. sequentially loaded row values     随机和顺序装载行记录比较图


Listing 3-8. Index clustering_factor
SQL> select t.table_name||'.'||i.index_name idx_name,
 2          i.clustering_factor, t.blocks, t.num_rows
 3     from user_indexes i, user_tables t
 4    where i.table_name = t.table_name
 5      and t.table_name in ('T1','T2')
 6    order by t.table_name, i.index_name;
 
IDX_NAME        CLUSTERING_FACTOR          BLOCKS        NUM_ROWS
--------------- ----------------- --------------- ---------------
T1.T1_N1                      152             164           10000
T2.T2_N1                    10000             164           10000
 
2 rows selected.

     As demonstrated earlier in this chapter (see Listings 3-3 and 3-4), the optimizer would choose an
index scan when querying table T1 but a full table scan when querying table  T2.  The  clustering_factor 
was the key piece of information that helped the optimizer make that decision.

So, while clustering factor is a statistic associated with an index, it is computed by looking at the
blocks of data in the table.
  When computing clustering factor, Oracle will do something similar to what
is shown in Listing 3-9.  
    正如本章之前所示(查看列表3-3和3-4),查询表T1时优化器将选择索引扫描,但是当查询T2时将是全表扫描。簇因子是帮助优化器做出决定的关键信息。因此,虽然簇因子是与索引关联的统计信息,它的计算需查看表中的数据块。 当计算簇因子时,Oracle将会做类似列表3-9所示的那些事。
Listing 3-9. Computing Index clustering_factor
SQL> select t.table_name||'.'||i.index_name idx_name,
  2        i.clustering_factor, t.blocks, t.num_rows
  3   from all_indexes i, all_tables t
  4  where i.table_name = t.table_name
  5    and t.table_name = 'EMPLOYEES'
  6    and t.owner = 'HR'
  7    and i.index_name = 'EMP_DEPARTMENT_IX'
  8  order by t.table_name, i.index_name;
 
IDX_NAME                        CLUSTERING_FACTOR BLOCKS NUM_ROWS
------------------------------- ----------------- ------ --------
EMPLOYEES.EMP_DEPARTMENT_IX                     7      5      107
 
1 row selected.
SQL> select department_id, last_name, blk_no,
  2         lag (blk_no,1,blk_no) over (order by department_id) prev_blk_no,
  3         case when blk_no != lag (blk_no,1,blk_no) over (order by department_id)
  4                or rownum = 1
  5              then '***  +1'
  6              else null
  7        end cluf_ct
  8  from (
  9  select department_id, last_name,
 10          dbms_rowid.rowid_block_number(rowid) blk_no
 11    from hr.employees
 12   where department_id is not null
 13   order by department_id
 14  );
 
DEPARTMENT_ID     LAST_NAME          BLK_NO         PREV_BLK_NO      CLUF_CT
-------------------------    ---------------           -------              ------------               -------
           10                   Whalen                 84                             84                  ***  +1
           20                   Hartstein               84                             84
           20                   Fay                      84                             84
           30                   Raphaely              88                             84                  ***  +1
           30                   Colmenares          88                             88
...
           30                   Himuro                  88                             88
           40                   Mavris                   84                            88                  *** +1
           50                   OConnell               84                            84
           50                   Grant                    84                            84
           50                   Weiss                   88                            84                  *** +1
           50                   Fripp                     88                            88
           50                   Kaufling                 88                            88
...
           70                   Baer                     84                            88                  *** +1
           80                   Bates                   88                            84                   *** +1
           80                   Smith                   88                            88
          100                  Sciarra                 88                            88
          110                  Gietz                    84                            88                   *** +1
          110                  Higgins                 84                            84
 
106 rows selected.
 
    As I mentioned, this isn’t precisely how the clustering factor is computed, but this query can help
you see how it is done in general terms.  Note that I deleted some of the output rows for brevity, but left
enough of the output so you could see where the block number for the row changed from the previous
row’s block number.
  Clustering factor is computed by adding one to a counter each time the block
number for the current row is different from the previous row.
  In this example, that happens seven
times.  What this number is supposed to represent is seven different  table blocks that hold data for this
table.  As you can see from the output, there are really only two blocks that contain data (block
numbers 84 and 88).  In reality, the clustering factor isn’t exactly accurate.  In this case, it is off by a
factor of 3.5.  

    我所示的,簇因子是如何计算的是不精确的,但是这个查询有助于你理解一般 意义上它是如何做的。注意为了简单起见我删除 了一些行输出,但是留下足够的输出可让你从之前行的块号看出行改变处的块号。簇因子计算就是:每次当前行的块号不同于之前行的块号时就对计数器加1。 你可以从输出中看出,实际上只有两个块包含数据(块号84和88)。实际上,簇因子不是很准确的。例子中,它的因子是3.5
    Although most of the time this inaccuracy in the way clustering_factor  is computed won’t make
enough difference to cause the optimizer to over-cost the index enough to prevent it from being
chosen, it is possible that situation could occur.  If the optimizer doesn ’t choose the index you expect, it
may choose another index that can satisfy the predicate that contains similar columns.  In these
situations, you may need to do a careful analysis of the indexes you have created to see if there is a way
to consolidate several indexes into a single compound index.
  Do not make the mistake of rebuilding
the index thinking it will help “fix” the  clustering_factor .  As I have demonstrated here, the
clustering_factor  is related to the table data, not the index.
  So, rebuilding the index won’t have any
effect on it. 

    虽然大部分时候簇因子计算的不准确,不会产生足够的差异使得优化器高估索引成本 ,而不选择它。这种情况可能发生。如果优化器没有选择你期望的索引,它可能选择其他索引,能满足包含相似列的谓词。在这种情况下,你可能需要小心的分析你所创建的索引,看是否可能联合几个索引成一个复合索引。不要错误的认为重构索引将帮助“修复”簇因子。正如我在这里所示的,簇因子与表数据相关,不是和索引。 所以,重建索引不会对它有任何影响。
    On the other hand, if you start to consider rebuilding the table to improve the  clustering_factor ,
proceed with caution.  Tables typically have numerous indexes.  You can’t rebuild the table to make the
order match one index without causing it to be less ordered by other columns.  So, a rebuild may help in
relation to one index but hurt others.  Also, rebuilding tables is typically a time-consuming and
resource-intensive process.  Just because you rebuild the table in a particular order today doesn’t
mean it’s going to stay in that order over time as rows are inserted, updated, and deleted.  As you
proceed through the rest of the book, you’ll learn enough to understand when clustering_factor  may
be part of a problem and you’ll likely be able to find ways to adjust for it if needed.  
    另一方面,如果你开始考虑重建表来改进簇因子,请小心进行。表典型的都有好几个索引。你不能重建表使得(行的)顺序匹配一个索引而又使它对于其它的列失序。因此,重构表可能有助于(改善)与一个索引的关系但是伤及其它。 而且,重建表通常是消耗时间和资源密集型 的操作。只是因为今天你重建表成特定的顺序并不意味着它保持那个顺序,随着行的插入、更新和删除。当你学完本书其它部分,你将充分理解何时簇因子可能是问题的一部分,而如果必要你将可能找出方法调整它。
   NOTE   In each of the following examples that explain plan output, the output has been edited; I’ve removed the Time column for brevity.
    注意   在下面的每个例子中的解释计划输出,输出结果被编辑过;为了简洁我去除了Time列。

  • 大小: 64.9 KB
0
1
分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics