NTILE
(page 263)
The NTILE function divides an ordered set of rows in a data partition, groups them in to
buckets
, and
assigns a unique group number to each group.
This function is useful in statistical analysis. For example,
if you want to remove the outliers (values that are outside the norm), you can group them in the top or
bottom buckets and eliminate those values from the statistical analysis. Oracle Database statistics
collection packages also use NTILE functions to calculate histogram boundaries. In statistical
terminology, the NTILE function creates equi-width histograms.
NTILE函数在一数据分区中划分一有序行集,将他们分组成
桶
,再赋予每一组唯一的组号。该函数在统计分析中很有用。
例如,如果你想要去除极端值(超出正常范围的值),你能把它们归组到头或者末位桶,再从统计分析的数据中把它们清除掉。Oracle数据库统计集合包也使用NTILE函数计算柱状图边界。用统计术语说,NTILE函数创建等宽柱状图。
The number of buckets is passed as the argument to this analytic function. For example, ntile(100)
will group the rows into 100 buckets, assigning an unique number for each bucket. This function does
not support windowing clauses, however.
桶的数量作为参数传给该分析函数。例如,ntile(100)将把行集分成100桶,赋予每桶唯一的数。然而,该函数不支持开窗子句。
In the Listing 8-18, you split a data partition into 10 buckets using the clause ntile (10). Rows are
sorted by the Sale column in the descending order. The NTILE function groups rows into buckets with
each bucket containing equal number of rows. Since the rows are sorted by the Sale column values in
descending order, rows with lower group numbers have higher Sale column value. Outliers in the data
can be easily removed with this technique.
在列表8-18中,你使用子句ntile(10)划分数据区成10桶。行集按Sale列降序排列。NTILE函数将行集归组成桶,使得各桶包含相等数量的行。由于行集按Sale列降序排列,低组号的行集对应的Sale列值高。使用这种技术数据区中的极端值容易被去除。
There may be a row count difference of at most 1 between the buckets if the rows can not be divided
equally.
In this example, rows for the year= 2001 is divided in to 10 buckets, each bucket having 5 rows,
but the last bucket 10 has only 4 rows.
如果行集不能均匀划分可能使得桶之间的行数最多相差1。
在本例中,year=2001的行集划分成10桶,每桶有5行,但是最后第10桶只有4行。
Listing 8-18. NTILE Function
1 select year, week,sale,
2 ntile (10) over(
3 partition by product, country, region , year
4 order by sale desc
5 ) group#
6 from sales_fact
7* where country in ('Australia') and product ='Xtend Memory'
YEAR WEEK SALE GROUP#
----- ---- ---------- ----------
2001 16 278.44 1
2001 4 256.70 1
2001 21 233.70 1
2001 48 182.96 1
2001 14 162.91 1
...
2001 52 23.14 9
2001 50 23.14 10
2001 6 22.44 10
2001 23 22.38 10
2001 18 22.37 10
The NTILE function is useful in real world applications such as dividing total work among N parallel
processes. Let’s say you have ten parallel processes; you can divide the total work into 10 buckets and
assign each bucket to a process.
NTILE 函数在现实应用中很有用,诸如划分整体工作成N个并发的进程。我们说你有十个并发的进程;你可划分整个工作成10个桶然后赋予每个桶一个进程。
Stddev
The stddev function can be used to calculate standard deviation among a set of rows in a data partition or in the result set if no partitioning clause is specified.
This function calculates the standard deviation, defined as square root of variance,
for a data partition specified using a partitioning clause. If
partitioning clause is not specified, this function calculates the stddev for all rows in the result set.
stddev能被用于
计算
在一数据分区,或者若没有指定分区子句则在结果集,的行集中的标准差。
对于一使用分区子句指定的数据分区,该函数计算标准差,定义平方根的变化
。如果分区子句没有指定,该函数计算结果集所有行的标准差。
In the Listing 8-19, the clause stddev (sale) is calculating the stddev on Sale column among the
rows in a data partition. Partitioning clause partition by product, country, region, year specifies
the partitioning columns. The windowing clause rows between unbounded preceding and unbounded
following specifies the window as all rows in that data partition. Essentially, this SQL is calculating the
standard deviation on Sale column amongst all rows in a data partition.
在列表8-19中,子句stdev(sale)计算在一数据分区行集中Sale列的标准差。分区子句 partition by product,
country, region, year 指定分区列。开窗子句 rows between unbounded preceding and
unbounded following指定那个数据分区的所有行为窗口。本质上,这条SQL计算在数据分区上所有行间Sale列的标准差。
Standard deviation can be calculated at coarser
or granular
level by specifying appropriate
partition-by clause and windowing clause.
能够通过指定适当的分区子句和开窗子句在粗粒度
或细粒度
层上计算标准差。
Listing 8-19. STDDEV Function
1 select year, week,sale,
2 stddev (sale) over(
3 partition by product, country, region , year
4 order by Sale desc
5 rows between unbounded preceding and unbounded following
6 ) stddv
7 from sales_fact
8 where country in ('Australia') and product ='Xtend Memory'
9* order by year, week
YEAR WEEK SALE STDDV
----- ---- ---------- ----------
...
2000 50 21.19 49.8657423
2000 52 67.45 49.8657423
2001 1 92.26 59.1063592
2001 2 118.38 59.1063592
2001 3 47.24 59.1063592
...
There are various other statistics functions that can be used to calculate statistical metrics; for
example, stddev_samp calculates the cumulative sample standard deviation
, stddev_pop calculates the
population standard deviation
, etc. Detailed discussion about various statistics functions is out of the
scope of this book, however.
有多种其他的统计函数能用来计算统计标准;例如stddev_samp计算
累积样本标准差
,stddev_pop计算总体标准差
,等等。然而,关于
各种统计函数的
详细讨论超出了本书的范围。
Listagg
Oracle Database version 11gR2 introduced another analytic function, the Listagg function, which is very useful in string manipulation. This analytic function provides the ability to convert column values from
multiple rows in to a list format.
For example, if you want to concatenate all the employee names in a
department, then you can use this function to concatenate all names in to a list.
Oracle数据库版本11gR2
引入了另一个分析函数,Listagg函数,在字符串操作上十分有用。该分析函数提供把多行列值转换成一格式化列表的能力。
例如,如果你想连结某部门的所有员工名字,你就能用这个函数吧所有名字连成一列表。
Syntax for this function is of the format: 该函数的语法格式:
Listagg (string, separator ) within group (order-by-clause)
Over (partition-by-clause )
Syntax for the Listagg function uses the clause within group (order-by-clause) to specify sorting
order. This clause is similar to order-by clause in other analytic functions. The first argument to this
function is the string or column name to concatenate. The second argument is the separator for the
values.
In the Listing 8-20, the partitioning clause is not specified and rows are ordered by the Country
column in the descending order. The output shows that country names are converted to a list separated
by comma.
Listagg 函数使用子句
within group (order-by-clause) 定义排序顺序。该子句类似于
其他分析函数中的order-by子句。该函数的第一个参数是要连结的字符串或者列名
。第二个参数是这些值的分隔符。
在列表8-20中,分区子句没有指定且行按照Country列降序排列。输出展示,国家名称被转换成由逗号分隔的列表了。
Note that Listagg function does not support windowing clauses.
注意Listagg函数也不支持开窗子句。
Listing 8-20. LISTAGG Function
1 select listagg (country, ',')
2 within group (order by country desc)
3 from (
4 select distinct country from sales_fact
5 order by country
6* )
LISTAGG(COUNTRY,',')WITHINGROUP(ORDERBYCOUNTRYDESC)
--------------------------------------------------------------
United States of America,United Kingdom,Turkey,Spain,Singapore,
Saudi Arabia,Poland,New Zealand, Japan,Italy,Germany,France,
Denmark,China,Canada,Brazil,Australia,Argentina
分享到:
相关推荐
Pro Oracle SQL unlocks the power of SQL in the Oracle Database—one of the most potent SQL implementations on the market today. To master it requires a three-pronged approach: learn the language ...
本课程"Advanced-SQL-MYSQL-for-Analytic-Business-Intelligence"深入探讨了如何利用高级SQL技术与MySQL的强大功能来提升数据分析能力,以支持更高效、更精准的商业决策。 首先,课程会介绍SQL的基本语法,包括数据...
根据给定的文件信息,我们可以深入探讨与Vertica数据库及其SQL参考手册相关的多个关键知识点。 ### SQL Reference Manual -- Vertica #### 1. **Vertica® Analytic Database 3.5 Preview** - **版本**: 此处提及...
Oracle SQL是Oracle数据库管理系统中的结构化查询语言,用于管理和操作数据。它允许用户查询、更新、插入和删除数据库中的信息,以及创建和修改数据库结构。本篇将深入探讨Oracle SQL的相关知识点。 一、SQL基础 ...
### Oracle Hyperion Essbase - 存储分析数据 #### 一、Essbase 数据存储选项 在 Oracle Hyperion Essbase 中,提供了两种主要的数据存储选项:块存储(Block Storage)和聚合存储(Aggregate Storage)。这两种...
- **标准接口**:支持通过 ODBC、JDBC、ODP.NET、OCI 和 Pro*C/C++ 等多种方式访问 SQL 和 PL/SQL。 ### 关键优势 #### 实时性能 (Real-Time Performance) - **改变数据存储位置的假设**:与传统基于磁盘的数据库...
### ORACLE SQL性能调整 #### 1. 选用适合的ORACLE优化器 ORACLE提供了三种优化器选项:基于规则的优化器(RULE)、基于成本的优化器(COST)和选择性优化器(CHOOSE)。默认情况下,ORACLE使用CHOOSE优化器,其...
4. **窗口函数(Analytic Functions)**:Oracle的窗口函数允许你在结果集的特定“窗口”内执行计算,如RANK()、ROW_NUMBER()、LAG()、LEAD()等,它们在数据分析和排名问题中非常有用。 5. **集合操作(UNION、...
分析函数(Analytic Functions)是Oracle SQL中的高级特性,它们在数据集上执行计算,并返回基于分组或排序的数据结果。与聚合函数(如SUM, AVG, COUNT等)不同,分析函数可以在每个行级别上返回结果,而不只是返回...
首先需要获取AnalyticDB PostgreSQL版连接信息 获得AnalyticDB连接串. 可以加入下方钉钉群1元试用. 运行demo cd $PROJ_DIR pip install -r requirements.txt cp config/config_template config/config.yml # 将链接...
Oracle SQL优化是一个重要的数据库管理技能,对于提升数据库性能和系统响应速度至关重要。本文将深入探讨Oracle SQL优化的几个关键方面,以帮助初学者理解和实践。 首先,了解Oracle的优化器是优化过程的基础。...
serving informational or analytic needs. ■■ The supporting technology for operational processing is fundamentally different from the technology used to support informational or analytical needs. ■...
大规模对数分析和 DRS-DPM 使用 Logstash 和 MongoDB 进行大规模日志分析第 1 部分:从数据中心的 VM 收集实时日志。 这些日志使用 Logstash 结构化、解析并转发到 MongoDB。 使用谷歌图表和 Tableau 可视化的日志。...
Oracle JDK >= 1.7.x Apache Maven >= 3.0.5 克隆这个 repo 并作为一个现有的 Maven 项目导入到 Eclipse IDE 或 IntelliJ IDEA。 此应用程序使用使生活变得简单,同时使用集合和其他通用内容。 此应用程序还...
藏经阁-StreamSets and Spark_ Analytic.pdf
- **分析函数(Analytic Functions)**:如ROW_NUMBER()、RANK()等,用于处理分组数据或执行复杂的数据排序和窗口运算。 - **分组函数(Grouping Functions)**:如GROUP_CONCAT,用于组合多行数据成一个字符串。 - *...
程序员考试刷题 分析计算旧金山大学的 MSDS 501 秋季 2020 课程说明 本课程是该课程的一部分,专门设计用于促进那些还不是熟练程序员的人提高计算机编程技能。 本课程将侧重于与数据科学直接相关的应用计算机编程...