`

《Pro Oracle SQL》Chapter 8 -- 8.7 Other Analytic Functions 之二

阅读更多
Ratio_to_report         (page 259)
    The analytic function ratio_to_report calculates the ratio of a value to the sum of values in the data
partition. If the partitioning clause is not specified, this function calculates the ratio of a value to the sum
values in the whole result set.
This analytic function is very useful in calculating ratios at various levels
without a need for self-joins.
    分析函数ratio_to_report计算在数据分区中对汇总的比率值。如果分区子句没有指定,函数在整个结果集上计算对汇总的比率值。 该分析函数对计算各个层次上的比率,而不需要自连接,是很有用的。
    Ratio_to_report is useful in computing the percentage of a value compared to the total value in a
report.
For example, consider a sales report of a product in a retail chain. Each outlet in the retail chain
contributed to the total sum of sales computed for that product
, and knowing what percentage of sales is generated from an outlet is quite useful for market trend analysis. Ratio_to_report allows you to
compute the percentage easily. Further, this ratio can be calculated at various levels such as district,
region, and country. Essentially, data can be diced and sliced at various ways for market trend analysis.
    Ratio_to_report 在报告中计算某数值相对于总数的百分比是有用的。 例如,考虑一零售链上某产品的销售报告。在零售链中的每一分店某款产品的销售额汇总成了那款产品的总销售金额 ,知道某一分店的销售额百分比,对于市场趋势分析是非常有用的。Ratio_to_report使得你计算百分比非常容易。再者,这个比率能在各种层次上计算,诸如district(地区),region(区域), and country(国家). 本质上,数据可以用各种方式切割 以利于市场趋势分析。
    In the Listing 8-14, the SQL statement is computing two ratios: Sales_ratio_yr is computed at
product, country, region, and year level and the ratio sales_ratio_prod is computed at the product,
country, and region level. Ratio_to_report function returns a ratio and it is multipled by 100 to compute
percentage.
    在列表8-14,SQL语句计算两个比率值:Sales_ratio_yr 计算product, country, region,和 year 层的,sales_ratio_prod计算product, country, 和region 层次的。Ratio_to_report函数返回比率值再乘以100得到百分比。
    The ratio_to_report(sale) over( partition by product, country, region, year) clause
calculates the ratio of Sale column value to the sum of Sale column values in a data partition, partitioned
by the columns Product, Country, Region, and Year. The next clause ratio_to_report(sale) over(
partition by product, country, region) is different as the Year column is not included in the
partitioning columns. So, the ratio is calculated for all years.
    ratio_to_report(sale) over( partition by product, country, region, year) 子句在数据分区中计算Sale列值对汇总Sale列值的比率,按列Product, Country, Region, 和Year分区。下一个子句 ratio_to_report(sale) over( partition by product, country, region) 的区别是Year列没有包含在数据分区中。这样计算的是在所有年份中的比率。
    The ratio_to_report function will return a null value if the expression or column specified in the
function returns null values. But other null values in the data partition will be handled as either zero
values or empty strings, similar to aggregation functions.
    如果在函数中指定的表达式或列返回null值则 ratio_to_report 函数将返回null值。但是在数据分区中的其他的null值将被处理成要么0值要么空串,类似于聚合函数。
 
Listing 8-14. ratio_to_report Function
1 select year, week,sale,
2 trunc(100*
3         ratio_to_report(sale) over(partition by product, country, region ,year)
4         ,2) sales_yr,
5 trunc(100*
6         ratio_to_report(sale) over(partition by product, country, region)
7         ,2) sales_prod
8 from sales_fact
9 where country in ('Australia') and product ='Xtend Memory'
10 order by product, country,year, week
/
YEAR      WEEK         SALE               SALES_YR         SALES_PROD
----          ----         ----------       ----------               ----------
2000         1              46.70           1.31                       .35
2000         3              93.41           2.63                       .7
2000         4              46.54           1.31                        .35
2000         5               46.70          1.31                        .35
2000         7               70.80               2                        .53
2000         8               46.54           1.31                      .35
...

Percent_rank

    The  percent_rank function returns the rank of a value in a data partition, expressed as a fraction between 0 and 1.  Percent_rank is calculated as (rank -1)/(N-1) whereas N is the number of elements in the data partition if the partitioning clause is specified or the total number of rows in the result set if the
partitioning clause is not specified.
The percent_rank function is useful to compute the relative standing
of a value in a result set, as a percentile. 
    percent_rank 返回数据分区中的秩(rank)值,用一个0到1之间分数表示。percent_rank的计算公式是(rank -1)/(N-1),这里的N,如果分区子句指定了则是数据分区中的元素数量,若没有指定分区子句则是结果集中的所有行数。 percent_rank函数 在计算一个结果集中的值的相对位置时是有用的,用百分率表示。
    This relative rank can be calculated relative to a partition or the whole result set. For example,
computing the sales percentile of a retail outlet in a district or region helps find the top performing
outlets or the worst performing outlet. 
    相对秩能相对于分区或整个结果集来计算。例如,计算在一地区或区域某零售商店的销售百分比有助于找出业绩最佳和最差的商店。
    In Listing 8-15, you calculate the top fifty sale percentile by year using the  percent_rank function.
The clause percent_rank() over(partition by product, country, region , year order by sale desc)
calculates the percent rank of the Sale column in a data partition defined by the partitioning columns
Product, Country, Region, and Year. Rows are ordered by the Sale column in descending order. Function
output is multiplied by 100 to compute percentile. 
    在列表8-15中,你使用percent_rank函数按年计算前50名销售额的百分比值。子句percent_rank() over(partition by product, country, region , year order by sale desc) 计算分区列所定义数据分区中Sale列的百分比秩。
Listing 8-15.  percent_rank Function 
 1  select * from (
 2  select  year, week,sale,
 3     100 * percent_rank() over(
 4            partition by product, country, region, year
 5            order by sale desc
 6            ) pr
 7    from sales_fact
 8    where country in ('Australia')  and product ='Xtend Memory'
 9  ) where pr <50
10* order by year, sale desc
 
 YEAR      WEEK         SALE             PR
-----       ----           ----------    -------
2001        16              278.44        .00
2001         4               256.70        2.27
2001         21             233.70        4.55
2001         48             182.96        6.82
...

Percentile_cont

    The  percentile_cont function is useful to compute the interpolated values, such as the median
household income per region or city.
The percentile_cont function takes a probability value between 0
and 1 and returns an interpolated percentile value that would equal the  percent_rank value with respect
to the sort specification. In fact,  percentile_cont function performs inverse of percent_rank function
and it is easier to understand the percentile_cont function in conjunction with the output of
percent_rank function.  

    percentile_cont函数用于计算内插值,诸如每一区域或城市家庭收入的中位数。percentile_cont函数接收一个0到1的概率值, 返回一内插的百分比值,它等于percent_rank相对于排序规范的值。事实上,percentile_cont函数是 percent_rank函数的倒数,结合percent_rank函数的输出就容易理解percentile_cont函数了。
    The  percentile_cont  function retrieves the column value matching (or interpolated) with the
percent_rank of the argument. For example, the clause percentile_cont(0.25) retrieves the value that
has  percent_rank of 0.25, assuming matching sort order for these two functions. Another example is
computing the median household income in a city or region. The median value will have a percent_rank
of 0.5 by the definition of  median value.
The clause  percentile_cont(0.5)  will return the median value
as the  percentile_cont function is calculating the value with a  percent_rank of 0.5. In fact, median
function is a specific case of the  percentile_cont function with a default value of 0.5.

    percentile_cont函数检索匹配(或者差值替换)percent_rank参数的列值。例如,子句 percentile_cont(0.25) 检索percent_rank为0.25的值,假定这两个函数的排序顺序一致。另一例子是计算某一城市或区域家庭收入的中位数。依照中位数值的定义中位数对应的percent_rank值是0.5。 子句 percentile_cont(0.5) 将返回中位数,因为percentile_cont函数计算的是percent_rank为0.5的值。事实上,median函数是 percentile_cout 函数默认值为0.5的特例。
    Nulls are ignored by the function. This function does not support windowing clauses either.
    percentile_cont函数忽略Null。该函数也不支持开窗子句。
Syntax for the percentile_cont function is:      percentile_cont 函数的语法是:
Percentile_cont(expr) within group (sort-clause) 
     over (partition-clause order-by-clause)

 
    The syntax for the percentile_cont function is slightly different from the analytic functions
discussed so far. A new clause  within group (order by sale desc) replaces the  order-by  clause, and it
is functionally same as specifying an  order-by clause.
In Listing 8-16, the clause  percentile_cont (0.5)
within group (order by sale desc) over( partition by product, country, region , year)  is calling
the  percentile_cont function and passing a probability value of 0.5. Sort order is defined by the clause
within group (order by sale desc) . The  partition-by clause  over( partition by product, country,
region , year) is specifying the partitioning columns. 
    percentile_cont函数的句法不同于现在为止所讨论的那些分析函数。新子句group (order by sale desc)代替了order-by子句,且它的功能和order-by子句一样。 列表8-16中,子句percentile_cont (0.5) within group (order by sale desc) over( partition by product, country, region , year) 调用percentile_cont函数传入概率值0.5。排序顺序由子句group (order by sale desc) 定义。分区(partition-by)子句 over( partition by product, country, region , year) 指定分区列。
    Listing 8-16 shows the output of  percent_rank in a side-by-side comparison to that from
percentile_cont , with a similar  partition-by clause and  order-by clause. Notice that for the column
values year=2001 and week=5, the Sale column value is 93.44 and the  percent_rank of that value is 0.5.
Essentially, value of 93.44 is occurring with a percent_rank of 0.5 in the descending order of Sale column
values in that data partition. In a nutshell, the value of 93.44 is a median value and thus the  percent_rank is 0.5. Hence the  percent_rank function with an argument of 0.5 returns a value of 93.44.
    列表8-16展示了,带有相似分区(partition-by)子句和排序(order-by)子句的,percent_rank一对一 percentile_cont 的输出。注意列值year=2001 and week=5,Sale列值是93.44,percent_rank值是0.5 。实质上,值93.44出现在那个数据分区降序排列的Sale列值的percent_rank为0.5处。简而言之,值93.44是中位数且 percent_rank是0.5。因此percent_rank函数带有参数0.5返回的是值93.44 。
    Further, note that the output row for the column values with year=2000. There is no Sale column
value with a  percent_rank matching exactly to 0.5 in the data partition. If there is no value matching
exactly, then the  percentile_cont function computes an interpolated value using the nearest values.

Note that there is a row in that data partition with a percent_rank of 0.48 for the Sale column value of
79.36 and the next row in that sort order has a percent_rank of 0.51 for the Sale column value of 78.82.
Since the specified  percent_rank of 0.5 is between 0.48 and 0.51, the percentile_cont function
interpolated these two corresponding Sale column values 79.36 and 78.82 and calculated
percentile_cont (0.5) as 79.09, an average of those two Sale column values. Values are averaged as this
function assumes continuous distribution.

    还有,注意列值为year=2000所在输出行。在数据分区中没有percent_rank准确匹配0.5的Sale列值。如果没有准确的匹配值,则 percentile_cout函数计算一个最接近的内插值。 注意在数据分区中有一行的percent_rank值是0.48和Sale列值为79.36 且顺序的下一行的percent_rank值是0.51和Sale列值是78.82。因此指定的percent_rank值0.5介于0.48和0.51 之间,percentile_cont函数内插于这两者对应的Sale列值79.36和78.82中且计算percentile_cont (0.5) 是79.09,这两个Sale列值的平均值。因为函数假定是连续分布所以取的平均值。
    Notice that output rows are not sorted in the Listing 8-16. Reason is that, even though there is an
order-by clause specified in the analytic function specification (line 3 and line 7), there is no order-by
clause in the main body of the query.
Should you need rows to be sorted, you need to specify sorting
order explicitly in the main body of the query also.

    注意在列表8-16中输出行没有排序。因为,即使在分析函数规范中指定order-by子句(行3和行7),在查询的主体中没有order-by子句。如果你需要行有序,就需要在查询主体中明确的指定排序顺序。

Listing 8-16.  The percentile_cont Function
  1  select  year, week,sale,
  2     percentile_cont (0.5)  within group
  3       (order by sale desc)
  4       over( partition by product, country, region , year ) pc,
  5     percent_rank () over (
  6            partition by product, country, region , year
  7            order by sale desc ) pr
  8    from sales_fact
  9*   where country in ('Australia')  and product ='Xtend Memory'
 
YEAR    WEEK    SALE         PC             PR
----          ----       ----------      ----------      ----------
...
2001      27         94.48          93.44        .454545455
2001      46         93.58          93.44        .477272727
2001        5         93.44          93.44               .5
2001      37         93.16          93.44         .522727273
2001        9         92.67          93.44         .545454545
...
2000      40          89.56          79.09         .435897436
2000      28          88.96          79.09         .461538462
2000      38          79.36          79.09         .487179487
2000      35          78.82          79.09         .512820513
2000        7          70.80          79.09         .538461538
2000      15          70.47          79.09         .564102564
...


Percentile_disc
    The percentile_disc function is functionally similar to percentile_cont except that the percentile_cont function uses a continuous distribution model and the percentile_disc function assumes a discrete distribution model. As discussed in the earlier section, when there is no value matching exactly with the specified percent_rank, then the percentile_cont(0.5) computes an average of two nearest values. In contrast, the percentile_disc function retrieves the value with a percent_rank just greater than the passed argument, in the case of ascending order. In the case of descending order, the percentile_cont function retrieves the value that has a percent_rank just smaller than the passed argument.
    percentile_dise函数在功能上相似于percentile_cont函数,只不过percentile_cont函数使用连续分布模型而 percentile_disc函数假定离散分布模型。 正如前面一节讨论的,当没有值准确的匹配指定的percent_rank,则 percentile_cont(0.5)计算两个最接近的平均值。相比而言,若升序percentile_disc函数检索其percent_rank 大于传入参数的值。若降序,percentile_disc函数检索其percent_rank小于传入参数的值。
    In Listing 8-17, the percentile_cont function is replaced by two calls to the percentile_disc function. The first call to the function starting in line 2 specifies decending sort order, and the next call in line 4 specifies no sort order, so it defaults to ascending sort order. In both calls to the percentile_disc function, and argument of 0.5 is passed. as there is no row with a percent_rank of 0.5, the percentile_disc function with the descending sort order specification returns a value of 79.36, as this value has a percent_rank of 0.48 just below the specified argument 0.5. For the ascending order, this function return a value of 78.82 as this value has a percent_rank of 0.51 ---- just above 0.5 .
    在列表8-17中,percentile_cont 函数被两次percentile_disc函数的调用所取代。 第一次函数调用开始于第2行,指定为降序。下一次调用在第4行没有指定排序顺序,所以取默认 值升序排列。两次percentile_disc函数调用,传入的参数都是0.5. 因为没有任何行的percent_rank为0.5,指定降序排列的percentile_disc函数返回值79.36,因为这个值的 percent_rank是0.48低于指定的参数0.5。对于升序,函数返回值78.82该值的percent_rank是0.51 ---- 正好高于0.5  。
Listing 8-17. The percentile_disc Function
1     select  year, week, sale,
2     percentile_disc(0.5)    within group( order by sale desc )
3            over( partition by product, country, region, year) pd_desc,
4     percentile_disc(0.5)    within group( order by sale )
5            over( partition by product, country, region, year) pd_asc,
6     percent_rank() over(
7              partition by product, country, region, year
8               order by sale desc ) pr
9       from sales_fact
10*    where country in ('Australia') and  product = 'Xtend  Memory'
YEAR   WEEK       SALE       PD_DESC     PD_ASC         PR
----        ----          ----------    ----------       ----------          ----------
2000     3            93.41        79.36           78.82            .41025641
2000     40          89.56        79.36           78.82            .435897436
2000     28          88.96        79.36           78.82            .461538462
2000     38          79.36        79.36           78.82            .487179487
2000     35          78.82        79.36           78.82            .512820513
2000      7          70.80         79.36           78.82            .538461538
2000     15         70.47         79.36           78.82            .564102564
2000     45         67.62         79.36           78.82            .58974359
2000     52         67.45         79.36           78.82            .615384615

1
1
分享到:
评论

相关推荐

    Pro Oracle SQL

    Pro Oracle SQL unlocks the power of SQL in the Oracle Database—one of the most potent SQL implementations on the market today. To master it requires a three-pronged approach: learn the language ...

    Advanced-SQL-MYSQL-for-Analytic-Business-Intelligence

    本课程"Advanced-SQL-MYSQL-for-Analytic-Business-Intelligence"深入探讨了如何利用高级SQL技术与MySQL的强大功能来提升数据分析能力,以支持更高效、更精准的商业决策。 首先,课程会介绍SQL的基本语法,包括数据...

    SQL reference manual -- vertica

    根据给定的文件信息,我们可以深入探讨与Vertica数据库及其SQL参考手册相关的多个关键知识点。 ### SQL Reference Manual -- Vertica #### 1. **Vertica® Analytic Database 3.5 Preview** - **版本**: 此处提及...

    oracle sql program

    二、Oracle SQL高级特性 1. 分区(Partitioning):Oracle支持基于不同方式(如范围、列表、哈希等)的表分区,以提高查询性能和管理大规模数据。 2. 索引(Indexes):索引能加快查询速度,Oracle支持B树、位图、...

    Oracle Hyperion Essbase - Storing Analytic Data - 培训资料

    ### Oracle Hyperion Essbase - 存储分析数据 #### 一、Essbase 数据存储选项 在 Oracle Hyperion Essbase 中,提供了两种主要的数据存储选项:块存储(Block Storage)和聚合存储(Aggregate Storage)。这两种...

    129255-oracle_timesten_in-memory_database

    - **标准接口**:支持通过 ODBC、JDBC、ODP.NET、OCI 和 Pro*C/C++ 等多种方式访问 SQL 和 PL/SQL。 ### 关键优势 #### 实时性能 (Real-Time Performance) - **改变数据存储位置的假设**:与传统基于磁盘的数据库...

    oracle高级sql讲座

    4. **窗口函数(Analytic Functions)**:Oracle的窗口函数允许你在结果集的特定“窗口”内执行计算,如RANK()、ROW_NUMBER()、LAG()、LEAD()等,它们在数据分析和排名问题中非常有用。 5. **集合操作(UNION、...

    ORACLE SQL性能調整

    ### ORACLE SQL性能调整 #### 1. 选用适合的ORACLE优化器 ORACLE提供了三种优化器选项:基于规则的优化器(RULE)、基于成本的优化器(COST)和选择性优化器(CHOOSE)。默认情况下,ORACLE使用CHOOSE优化器,其...

    Oracle中的分析函数详解

    分析函数(Analytic Functions)是Oracle SQL中的高级特性,它们在数据集上执行计算,并返回基于分组或排序的数据结果。与聚合函数(如SUM, AVG, COUNT等)不同,分析函数可以在每个行级别上返回结果,而不只是返回...

    alibabacloud-AnalyticDB-python-demo-AI:阿里云AnalyticDB-python-demo-AI

    首先需要获取AnalyticDB PostgreSQL版连接信息 获得AnalyticDB连接串. 可以加入下方钉钉群1元试用. 运行demo cd $PROJ_DIR pip install -r requirements.txt cp config/config_template config/config.yml # 将链接...

    Microsoft SQL Server 2008 - T-SQL Querying

    serving informational or analytic needs. ■■ The supporting technology for operational processing is fundamentally different from the technology used to support informational or analytical needs. ■...

    Large-Scale-Log-Analytic-and-DRS-DPM:使用 Logstash 和 MongoDB 进行大规模日志分析

    大规模对数分析和 DRS-DPM 使用 Logstash 和 MongoDB 进行大规模日志分析第 1 部分:从数据中心的 VM 收集实时日志。 这些日志使用 Logstash 结构化、解析并转发到 MongoDB。 使用谷歌图表和 Tableau 可视化的日志。...

    Oracle SQL优化

    Oracle SQL优化是一个重要的数据库管理技能,对于提升数据库性能和系统响应速度至关重要。本文将深入探讨Oracle SQL优化的几个关键方面,以帮助初学者理解和实践。 首先,了解Oracle的优化器是优化过程的基础。...

    real-time-sentiment-analytic

    Oracle JDK &gt;= 1.7.x Apache Maven &gt;= 3.0.5 克隆这个 repo 并作为一个现有的 Maven 项目导入到 Eclipse IDE 或 IntelliJ IDEA。 此应用程序使用使生活变得简单,同时使用集合和其他通用内容。 此应用程序还...

    藏经阁-StreamSets and Spark_ Analytic.pdf

    藏经阁-StreamSets and Spark_ Analytic.pdf

    Oracle试题及答案

    - **分析函数(Analytic Functions)**:如ROW_NUMBER()、RANK()等,用于处理分组数据或执行复杂的数据排序和窗口运算。 - **分组函数(Grouping Functions)**:如GROUP_CONCAT,用于组合多行数据成一个字符串。 - *...

    程序员考试刷题-msds-computation-for-analytic:msds-计算分析

    程序员考试刷题 分析计算旧金山大学的 MSDS 501 秋季 2020 课程说明 本课程是该课程的一部分,专门设计用于促进那些还不是熟练程序员的人提高计算机编程技能。 本课程将侧重于与数据科学直接相关的应用计算机编程...

Global site tag (gtag.js) - Google Analytics