`

《Pro Oracle SQL》--Chapter 5--5.5 Questions about Data

阅读更多

Questions about Data  关于数据的问题 (page 134)
    I hope at this point you agree that you do need to concern yourself with how data is stored and how it should be accessed.  Where do you find this information?  The database can give you most of the answers you need by executing a few simple queries.  Once you have this data, you then need to determine how data should be accessed.  This comes from understanding how the various access and join methods work and when it is appropriate to use each.  I’ve already covered access and join methods, so you’ve got the information you need to help you there.  But how do you discover how the data is stored?  Let’s walk through the questions you need to ask and queries you can execute to get the answers.
    我希望现在起你开始关心数据是如何存储及访问了。你能从哪获得这些信息?你只要执行几个简单的查询数据库就能给你大多数答案。一旦你有了数据,你就判断数 据是如何访问的。这需要理解各种访问和链接方法是如何工作的以及何时适当的使用它们。我已经讲解过访问和连接方法,你可以在那(第四章)获得这些(知识) 信息。但是你如何发现数据是怎么存储的?让我们一起过一下你需要问的问题以及演示运行能获得答案的查询。
    As a first step, try to think like the optimizer would.  The optimizer needs statistics and instance
parameter values to be able to compute a plan.  Therefore, it’s a good idea for you to put yourself in the optimizer’s place and gather the information that will help to formulate the execution plan.  You should always seek out the answers to the following questions about the data:
•  Which tables will be needed to gather all the data required?
•  Are any of the tables partitioned, and if so, how are the partitions defined?
•  What columns are in each table?
•  What indexes are available in each table?
•  What are the statistics for each table, column, and index?
•  Are there histograms on any of the columns?
     第一步,像优化器一样思考。优化器需要统计信息和实例参数值来计算(生成)一个(执行)计划。因此,你设身处地的站在优化器的立场上收集能计算执行计划信息是聪明的想法。你应当总是查找下面关于数据的问题的答案:
• 为了收集所有需要的数据要用到哪些表?
• 有哪些表分区了?如果是这样,分区是如何定义的?
• 每张表有哪些列?
• 每张表,每个列,每个索引的统计信息有哪些?
• 有哪些列分簇了?
    Statistics help the optimizer paint a picture of how the various ways of accessing and joining data
will perform.  You can know what the optimizer knows.  All you need to be able to do is query the
information from the data dictionary.  One thing to keep in mind when you’re reviewing statistics is that statistics may or may not accurately represent your data.  If the statistics are stale, missing, or poorly collected, it’s possible that they paint the wrong picture.  The optimizer can only know what the
statistics tell it.  You, on the other hand, have the ability to determine if the statistics make sense.  For
example, if a date column in one of your tables has a high value of six months ago, you can quickly see
that and know that rows exist with current date values.  That visual inspection can help you determine if statistics need to be updated.  But you can’t know these kinds of things unless you look.  A key question you must always ask is whether or not the statistics accurately represent your data.  Listing 5-3 uses a single script named st-all.sql (previously used in Chapter 2) to answer each of the questions listed above in one simple script. It gives you a single source to review to verify how representative the available statistics really are.
    统计信息帮助优化器绘制出一幅如何访问和连接数据的运行图。你能知道优化器知道什么。你都能查询数据字典获得这些信息。请记住一件事:你查看的统计信息可能,也可能不代表你的数据。如果统计信息过时了,遗漏了,或者收集不良。有可能就会绘制错误的(运行)图。优化器只知道统计所告诉它的信息。而你,从另一 反面,有能力判断统计是否合理。例如,如果你的表的一个数据列有一个6个月前的高位值,你能快速的看出存在当前日期的行。目视检查能帮你确定是否统计需要 更新。除非你检查才知道这种事。一个关键的你需要不断问的问题是:统计是否准确的代表你当前的数据。列表5-3用一个名叫st-all.sql的简单脚本 (之前在第二章使用过)来解答所有上面列出的问题。它作为单一来源让你去复习,确认有效的统计实际上是如何具有代表性的。
SQL> @st-all
Enter the owner name: sh
Enter the table name: sales
======================================================================================
  TABLE STATISTICS
======================================================================================
Owner         : sh
Table name    : sales
Tablespace    : EXAMPLE
Partitioned   : yes
Last analyzed : 09/03/2010 20:17:03
Sample size   : 918843
Degree        : 1
# Rows        : 918843
# Blocks      : 1769
Empty Blocks  : 0

Avg Space     : 0
Avg Row Length: 29
Monitoring?   : yes
======================================================================================
  PARTITION INFORMATION
======================================================================================
 Part# Partition Name      Sample Size          # Rows        # Blocks 
------ --------------- --------------- --------------- --------------- 
     1 SALES_1995      .                             0               0 
     2 SALES_1996      .                             0               0 
     3 SALES_H1_1997   .                             0               0 
     4 SALES_H2_1997   .                             0               0 
     5 SALES_Q1_1998             43687           43687              90 
...
    28 SALES_Q4_2003   .                             0               0 
 
 Part# Partition Name  Partition Bound                                                
------ --------------- -------------------------------------------------------------
     1 SALES_1995      TO_DATE(' 1996-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', ... 
     2 SALES_1996      TO_DATE(' 1997-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', ... 
     3 SALES_H1_1997   TO_DATE(' 1997-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', ... 
     4 SALES_H2_1997   TO_DATE(' 1998-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', ... 
     5 SALES_Q1_1998   TO_DATE(' 1998-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', ... 
...

    28 SALES_Q4_2003   TO_DATE(' 2004-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', ...     
======================================================================================   
  COLUMN STATISTICS
======================================================================================
 Name           Null?  NDV      Density  # Nulls   # Bkts  AvgLen  Lo-Hi Values
======================================================================================
amount_sold     N      3586     .000279  0         1       5       6.4 | 1782.72
channel_id      N      4        .250000  0         1       3       2 | 9
cust_id         N      7059      .000142  0         1       5       2 | 101000
prod_id         N      72       .000001  0         72      4       13 | 148
promo_id        N      4        .000001  0         4       4       33 | 999
quantity_sold   N      1       1.000000  0         1       3       1 | 1
time_id         N      1460     .000685  0         1       8       01/01/1998 00:00:00 | 
                                                                   12/31/2001 00:00:00

======================================================================================
  HISTOGRAM STATISTICS     Note: Only columns with buckets containing > 5% are shown.
======================================================================================
 
PROMO_ID (4 buckets)
1 97%
 
======================================================================================
  INDEX INFORMATION
======================================================================================
 
Index Name                                Dstnct  Lf/Blks Dt/Blks Cluf Unq? Type Part? 
                    BLevel Lf Blks # Rows   Keys  /Key    /Key
------------------ ------- ------- ------ ------ ------- ------- ----- ---- ---- ----- 
SALES_CHANNEL_BIX        1      47     92      4      11      23    92 NO   BITM YES   
SALES_CUST_BIX            1     475  35808   7059       1       5 35808 NO   BITM YES   
SALES_PROD_BIX           1      32   1074     72       1      14  1074 NO   BITM YES   
SALES_PROMO_BIX          1      30     54      4       7      13    54 NO   BITM YES   
SALES_TIME_BIX           1      59   1460   1460       1       1  1460 NO   BITM YES   
 
Index Name                           Pos# Order Column Name
------------------------------ ---------- ----- ------------------------------
sales_channel_bix                       1 ASC   channel_id
sales_cust_bix                          1 ASC   cust_id
 
sales_prod_bix                          1 ASC   prod_id
 
sales_promo_bix                         1 ASC   promo_id
 
sales_time_bix                          1 ASC   time_id
 
======================================================================================    
  PARTITIONED INDEX INFORMATION
======================================================================================    
 
Index: SALES_CHANNEL_BIX
                                           Dst LfBlk DtBlk 
Part# Partition Name  BLevel LfBlks # Rows Keys /Key  /Key CluF Partition Bound
----- --------------- ------ ------ ------ ---- ----  ---- ---- ----------------------
    1 SALES_1995           0      0      0    0    0     0    0 TO_DATE('1996-01-01...
    2 SALES_1996           0      0      0    0    0     0    0 TO_DATE('1997-01-01...
    3 SALES_H1_1997        0      0      0    0    0     0    0 TO_DATE('1997-07-01...
    4 SALES_H2_1997        0      0      0    0    0     0    0 TO_DATE('1998-01-01...
    5 SALES_Q1_1998        1      2      5    4    1     1    5 TO_DATE('1998-04-01...
...
   28 SALES_Q4_2003        0      0      0    0    0     0    0 TO_DATE('2004-01-01...
                                                                                  
Index: SALES_CUST_BIX                                                              
                                                                                  
    1 SALES_1995           0      0      0    0    0     0    0 TO_DATE('1996-01-01...
    2 SALES_1996           0      0      0    0    0     0    0 TO_DATE('1997-01-01...
    3 SALES_H1_1997        0      0      0    0    0     0    0 TO_DATE('1997-07-01...
    4 SALES_H2_1997        0      0      0    0    0     0    0 TO_DATE('1998-01-01...
    5 SALES_Q1_1998        1     28   3203 3203    1     1 3203 TO_DATE('1998-04-01...
...
   28 SALES_Q4_2003        0      0      0    0    0     0    0 TO_DATE('2004-01-01...
                                                                                  
Index: SALES_PROD_BIX                                                             
    1 SALES_1995           0      0      0    0    0     0    0 TO_DATE('1996-01-01...
    2 SALES_1996           0      0      0    0    0     0    0 TO_DATE('1997-01-01...
    3 SALES_H1_1997        0      0      0    0    0     0    0 TO_DATE('1997-07-01...
    4 SALES_H2_1997        0      0      0    0    0     0    0 TO_DATE('1998-01-01...
    5 SALES_Q1_1998        1      2     60   60    1     1   60 TO_DATE('1998-04-01...
...
   28 SALES_Q4_2003        0      0      0    0    0     0    0 TO_DATE('2004-01-01...
                                                                                  
Index: SALES_PROMO_BIX                                                            
                                                                                  
    1 SALES_1995           0      0      0    0    0     0    0 TO_DATE('1996-01-01...
    2 SALES_1996           0      0      0    0    0     0    0 TO_DATE('1997-01-01...
    3 SALES_H1_1997        0      0      0    0    0     0    0 TO_DATE('1997-07-01...
    4 SALES_H2_1997        0      0      0    0    0     0    0 TO_DATE('1998-01-01...
    5 SALES_Q1_1998        0      1      3    2    1     1    3 TO_DATE('1998-04-01...
...
   28 SALES_Q4_2003        0      0      0    0    0     0    0 TO_DATE('2004-01-01...
                                                                                  
Index: SALES_TIME_BIX                                                             
                                                                                  
    1 SALES_1995           0      0      0    0    0     0    0 TO_DATE('1996-01-01...
    2 SALES_1996           0      0      0    0    0     0    0 TO_DATE('1997-01-01...
    3 SALES_H1_1997        0      0      0    0    0     0    0 TO_DATE('1997-07-01...
    4 SALES_H2_1997        0      0      0    0    0     0    0 TO_DATE('1998-01-01...
    5 SALES_Q1_1998        1      3     90   90    1     1   90 TO_DATE('1998-04-01...
...
   27 SALES_Q3_2003        0      0      0    0    0     0    0 TO_DATE('2003-10-01...
   28 SALES_Q4_2003        0      0      0    0    0     0    0 TO_DATE('2004-01-01...
    With this information, you can answer almost any question about the data.  It is best if these
statistics are from your production database where the SQL you are writing will be executed.  If your
development database doesn’t have a copy of the production statistics, it’s a good idea to request that the production stats be imported into the development database so that the optimizer is formulating plans based on information that is as close to production as possible.  Even if the data doesn’t match, remember that it’s the statistics that the optimizer uses to determine the plan.
   通过上面的信息,你可以解答几乎任何关于数据的问题。如果这些统计来自于你所写SQL执行的生产数据库那将是最好的。如果开发数据库没有复制生产数据库的统计。最好是请求将生产数据库统计导出到开发数据库中这样优化器就能基于尽可能接近生产数据库的信息来计算生成计划。即使数据不匹配,也要记住优化器使 用统计去决定计划。
    Now that you’ve obtained the statistics, you can use the information to ask, and answer, questions
about what you’d expect the optimizer to do with your SQL.  For example, if you were writing a query
that needed to return all sales data for a specified customer (cust_id), you might want to know how
many rows the optimizer will estimate the query to return.  With the statistics information you have
queried, you could compute the number of rows estimated to be returned by the query to be 130
(918,843 total rows x 1/7,059 distinct values).  You can see that there is an index on cust_id, so the
proper access operation to use to satisfy the query should be the SALES_CUST_BIX index.  When you
execute the query, you can verify this operation is selected by checking the execution plan.
    既然你获得了统计信息,你能用这些信息,思考优化器对你的SQL做什么优化。例如,如果你写一个查询需要返回针对某一顾客(customer)所有销售记 录,你可能想要知道优化器估计查询查询返回多少行记录。有了刚查出的统计,你就可以算出(优化器)估计查询返回的条数是130行(总数918,843条 /distinct值7059)。你可以看到这里有一个索引在cust_id上,所以满足查询的适当访问操作应该(使用)SALES_CUST_BIX索 引。当你执行查询,你就能通过检查执行计划确认这个操作被选择了。
     In Chapter 3, I discussed the index statistic called clustering factor.  This statistic helps the optimizer compute how many blocks of data will be accessed.  Basically, the closer the clustering factor is to the number of blocks in the table, the fewer the estimated number of blocks to be accessed when using the index will be.  The closer the clustering factor is to the number of rows in the table, the greater the estimated number of blocks will be.  The fewer blocks to be accessed, the lower the cost of using that index and the more likely it is that the optimizer will choose that index for the plan.  Therefore, you can check this statistic to determine how favorable the index will appear.  Listing 5-4 shows the clustering factor statistics for the SALES table.
   在第三章,我讨论了索引统计被称为簇因子(clustering factor). 这个统计帮助优化器计算有多少个块将被访问。基本上,簇因子越接近表的块数,估计的访问块数就越小,将使用索引。簇因子越接近表的行数,估计的访问块数就 越大。访问的块越少,使用索引访问的成本就越低,就越有可能优化器将在执行计划中使用索引。因此,你能检查这个统计信息来判断使用索引的可能性有多大。列 表5-4显示了SALES表的簇因子统计信息。
Listing 5-4. Index clustering_factor
SQL> select t.table_name||'.'||i.index_name idx_name,
 2          i.clustering_factor, t.blocks, t.num_rows
 3     from user_indexes i, user_tables t
 4    where i.table_name = t.table_name
 5      and t.table_name = 'SALES'
 6    order by t.table_name, i.index_name;
 
IDX_NAME                 Clustering Factor        # Blocks          # Rows
------------------------ ----------------- --------------- ---------------
SALES.SALES_CHANNEL_BIX                 92            1769          918843
SALES.SALES_CUST_BIX                 35808            1769          918843
SALES.SALES_PROD_BIX                  1074            1769          918843
SALES.SALES_PROMO_BIX                   54            1769          918843
SALES.SALES_TIME_BIX                  1460            1769          918843
 
5 rows selected.

     In this case, the clustering factors for all of the indexes for the SALES table have a low value (i.e.
closer to the number of blocks in the table).  That is a good indication that when the optimizer computes the cost of using these indexes, they will not be weighted too heavily based on the estimated number of blocks they will return if used.
    在这个例子中,SALES表的所有索引的簇因子都是低位值(例如,接近表的块数)。当优化器计算使用这些索引的成本时,这是一个好的指标。如果使用索引,基于估计的它们返回的块数,成本不会评估的很高。
    In addition to using statistics, you can execute actual queries against the tables to get an idea of the data and number of rows that will be accessed or returned from a single table.  Regardless of how
complex a statement is, you can do just what the optimizer would do and break the statement down into single table accesses.  For each table involved, simply execute one or more queries to count and review the data that would be returned using the filter conditions your SQL will use.  As discussed previously, always think “divide and conquer.”  Breaking a statement down into small increments will help you understand how best to put it together in the most efficient way to arrive at the final result.
    除了使用统计,你能通过实际查询表来获得从单表访问或返回的数据和行数的印象。无论多复杂的语句,你可以做的像优化器一样,分解它成一个个单表访问语句。 对于涉及的每张表,简单的执行一个或多个查询去count和检查被你的SQL使用的过滤条件返回的数据。正如之前讨论的,不断的考虑“分而治之”。将一个 语句拆分成小块,将帮助你理解如何用最有效的方式组合它们成为最佳结果。

1
2
分享到:
评论

相关推荐

    Pro Oracle SQL-成为SQL语言编写专家

    ### Pro Oracle SQL - 成为SQL语言编写专家 #### 核心概念回顾与SQL语言能力介绍 本书《Pro Oracle SQL》由Karen Morton、Kerry Osborne、Robyn Sands、Riyaj Shamsudeen 和 Jared Still 共同撰写,旨在帮助读者...

    flink-sql-connector-oracle-cdc-2.5-SNAPSHOT.jar

    flink-sql-connector-oracle-cdc 2.5-SNAPSHOT

    c3p0-oracle-thin-extras-0.9.5.5.jar

    c3p0-0.9.5.5.bin.tgz的lib包,含有此c3p0-oracle-thin-extras-0.9.5.5.jar文件。

    pro oracle sql pdf

    综上,文件提供的信息主要涵盖了关于“Pro Oracle SQL”这本书的出版细节、版权信息和内容摘要,通过这些信息我们可以推断出该书将为读者提供深入的Oracle SQL知识,并针对Oracle数据库的特定特性进行详细讲解。

    Oracle-SQL-Developer-使用教程

    Oracle-SQL-Developer-使用教程

    Pro Oracle SQL

    Readers should already know the basic four SQL statements, and be ready to learn deeply about Oracle’s specific implementation of the language, including Oracle-specific features and syntax....

    Pro Oracle SQL (2010)

    ### Pro Oracle SQL (2010) 知识点概览 #### 一、书籍简介与作者背景 《Pro Oracle SQL》是一本于2010年出版的专业性极强的技术书籍,由Karen Morton、Kerry Osborne、Robyn Sands、Riyaj Shamsudeen 和 Jared ...

    《Pro Oracle SQL》Chapter10 -- 10.2 Optimizing SQL -10.2.1Testing Execution Plans

    《Pro Oracle SQL》一书的第10章深入探讨了SQL优化,特别是10.2节,重点关注如何测试执行计划,这是SQL性能调优的关键环节。在这个部分,作者旨在帮助读者理解如何有效地评估和改进SQL查询的性能,以提高数据库系统...

    oracle-pl-sql-programming-5th-edition

    Optimize PL/SQL performance with the aid of a brand-new chapter in the fifth edition Explore datatypes, conditional and sequential control statements, loops, exception handling, security features, ...

    flink-sql-connector-oracle-cdc-2.3.0.jar

    flinkcdc oracle 2.3.0

    SQL-SERVER-64位配置ORACLE连接-中文乱码问题

    ### SQL-SERVER-64位配置ORACLE连接-中文乱码问题 在IT行业中,不同数据库之间的连接配置是一项常见的任务,特别是在需要实现跨平台数据交换的场景下。本文将详细介绍如何解决64位系统下的SQL Server连接Oracle...

    oracle-SQL-note.rar_oracle

    本文档“oracle-SQL-note.rar_oracle”显然是一份关于Oracle SQL的练习集,旨在帮助用户深入理解和熟练掌握SQL的基本语法和用法。 首先,SQL(Structured Query Language,结构化查询语言)是所有关系型数据库管理...

    oracle-sql-the-essential-reference

    - **Datatypes**(数据类型):详细说明了Oracle SQL支持的各种数据类型,包括数值型、字符型、日期时间型等,并解释了每种数据类型的适用场景。 ##### 2. **数据操作** - **Data Conversion**(数据转换):探讨...

    mybatis-sql-dialect

    通过使用SQL方言包,MyBatis能够更好地适应各种数据库,如MySQL、Oracle和DB2,使得在切换数据库时无需对SQL语句进行大量修改。 1. **MyBatis框架概述** MyBatis是一个轻量级的ORM(对象关系映射)框架,它消除了...

    《Pro Oracle SQL》 读书笔记--Chapter 6--6.2 Execution Plans--之四

    《Pro Oracle SQL》是Oracle数据库查询优化的经典之作,第六章主要聚焦在Execution Plans(执行计划)上,这是数据库查询性能优化的关键。本章节的第四部分深入探讨了如何理解和解析执行计划,以及它对SQL性能的影响...

    oracleasm-support-2.1.8-1.el6.x86_64.rpm

    oracleasm-support-2.1.8-1.el6.x86_64.rpm

    Oracle Database 12C-SQL-Chp5

    Oracle Database 12C-SQL-Chp5

    Oracle-PL/SQL-windows-32位-客户端

    这个压缩包“Oracle-PL/SQL-windows-32位-客户端”包含了Oracle数据库32位客户端所需的组件,主要用于在Windows环境下进行数据库管理和开发工作。 1. **Oracle Instant Client**: `instantclient_11_2`是Oracle ...

    Pro Oracle SQL 无水印pdf

    Pro Oracle SQL 英文无水印pdf pdf所有页面使用FoxitReader和PDF-XChangeViewer测试都可以打开 本资源转载自网络,如有侵权,请联系上传者或csdn删除 本资源转载自网络,如有侵权,请联系上传者或csdn删除

Global site tag (gtag.js) - Google Analytics