NTILE
(page 263)
The NTILE function divides an ordered set of rows in a data partition, groups them in to
buckets
, and
assigns a unique group number to each group.
This function is useful in statistical analysis. For example,
if you want to remove the outliers (values that are outside the norm), you can group them in the top or
bottom buckets and eliminate those values from the statistical analysis. Oracle Database statistics
collection packages also use NTILE functions to calculate histogram boundaries. In statistical
terminology, the NTILE function creates equi-width histograms.
NTILE函数在一数据分区中划分一有序行集,将他们分组成
桶
,再赋予每一组唯一的组号。该函数在统计分析中很有用。
例如,如果你想要去除极端值(超出正常范围的值),你能把它们归组到头或者末位桶,再从统计分析的数据中把它们清除掉。Oracle数据库统计集合包也使用NTILE函数计算柱状图边界。用统计术语说,NTILE函数创建等宽柱状图。
The number of buckets is passed as the argument to this analytic function. For example, ntile(100)
will group the rows into 100 buckets, assigning an unique number for each bucket. This function does
not support windowing clauses, however.
桶的数量作为参数传给该分析函数。例如,ntile(100)将把行集分成100桶,赋予每桶唯一的数。然而,该函数不支持开窗子句。
In the Listing 8-18, you split a data partition into 10 buckets using the clause ntile (10). Rows are
sorted by the Sale column in the descending order. The NTILE function groups rows into buckets with
each bucket containing equal number of rows. Since the rows are sorted by the Sale column values in
descending order, rows with lower group numbers have higher Sale column value. Outliers in the data
can be easily removed with this technique.
在列表8-18中,你使用子句ntile(10)划分数据区成10桶。行集按Sale列降序排列。NTILE函数将行集归组成桶,使得各桶包含相等数量的行。由于行集按Sale列降序排列,低组号的行集对应的Sale列值高。使用这种技术数据区中的极端值容易被去除。
There may be a row count difference of at most 1 between the buckets if the rows can not be divided
equally.
In this example, rows for the year= 2001 is divided in to 10 buckets, each bucket having 5 rows,
but the last bucket 10 has only 4 rows.
如果行集不能均匀划分可能使得桶之间的行数最多相差1。
在本例中,year=2001的行集划分成10桶,每桶有5行,但是最后第10桶只有4行。
Listing 8-18. NTILE Function
1 select year, week,sale,
2 ntile (10) over(
3 partition by product, country, region , year
4 order by sale desc
5 ) group#
6 from sales_fact
7* where country in ('Australia') and product ='Xtend Memory'
YEAR WEEK SALE GROUP#
----- ---- ---------- ----------
2001 16 278.44 1
2001 4 256.70 1
2001 21 233.70 1
2001 48 182.96 1
2001 14 162.91 1
...
2001 52 23.14 9
2001 50 23.14 10
2001 6 22.44 10
2001 23 22.38 10
2001 18 22.37 10
The NTILE function is useful in real world applications such as dividing total work among N parallel
processes. Let’s say you have ten parallel processes; you can divide the total work into 10 buckets and
assign each bucket to a process.
NTILE 函数在现实应用中很有用,诸如划分整体工作成N个并发的进程。我们说你有十个并发的进程;你可划分整个工作成10个桶然后赋予每个桶一个进程。
Stddev
The stddev function can be used to calculate standard deviation among a set of rows in a data partition or in the result set if no partitioning clause is specified.
This function calculates the standard deviation, defined as square root of variance,
for a data partition specified using a partitioning clause. If
partitioning clause is not specified, this function calculates the stddev for all rows in the result set.
stddev能被用于
计算
在一数据分区,或者若没有指定分区子句则在结果集,的行集中的标准差。
对于一使用分区子句指定的数据分区,该函数计算标准差,定义平方根的变化
。如果分区子句没有指定,该函数计算结果集所有行的标准差。
In the Listing 8-19, the clause stddev (sale) is calculating the stddev on Sale column among the
rows in a data partition. Partitioning clause partition by product, country, region, year specifies
the partitioning columns. The windowing clause rows between unbounded preceding and unbounded
following specifies the window as all rows in that data partition. Essentially, this SQL is calculating the
standard deviation on Sale column amongst all rows in a data partition.
在列表8-19中,子句stdev(sale)计算在一数据分区行集中Sale列的标准差。分区子句 partition by product,
country, region, year 指定分区列。开窗子句 rows between unbounded preceding and
unbounded following指定那个数据分区的所有行为窗口。本质上,这条SQL计算在数据分区上所有行间Sale列的标准差。
Standard deviation can be calculated at coarser
or granular
level by specifying appropriate
partition-by clause and windowing clause.
能够通过指定适当的分区子句和开窗子句在粗粒度
或细粒度
层上计算标准差。
Listing 8-19. STDDEV Function
1 select year, week,sale,
2 stddev (sale) over(
3 partition by product, country, region , year
4 order by Sale desc
5 rows between unbounded preceding and unbounded following
6 ) stddv
7 from sales_fact
8 where country in ('Australia') and product ='Xtend Memory'
9* order by year, week
YEAR WEEK SALE STDDV
----- ---- ---------- ----------
...
2000 50 21.19 49.8657423
2000 52 67.45 49.8657423
2001 1 92.26 59.1063592
2001 2 118.38 59.1063592
2001 3 47.24 59.1063592
...
There are various other statistics functions that can be used to calculate statistical metrics; for
example, stddev_samp calculates the cumulative sample standard deviation
, stddev_pop calculates the
population standard deviation
, etc. Detailed discussion about various statistics functions is out of the
scope of this book, however.
有多种其他的统计函数能用来计算统计标准;例如stddev_samp计算
累积样本标准差
,stddev_pop计算总体标准差
,等等。然而,关于
各种统计函数的
详细讨论超出了本书的范围。
Listagg
Oracle Database version 11gR2 introduced another analytic function, the Listagg function, which is very useful in string manipulation. This analytic function provides the ability to convert column values from
multiple rows in to a list format.
For example, if you want to concatenate all the employee names in a
department, then you can use this function to concatenate all names in to a list.
Oracle数据库版本11gR2
引入了另一个分析函数,Listagg函数,在字符串操作上十分有用。该分析函数提供把多行列值转换成一格式化列表的能力。
例如,如果你想连结某部门的所有员工名字,你就能用这个函数吧所有名字连成一列表。
Syntax for this function is of the format: 该函数的语法格式:
Listagg (string, separator ) within group (order-by-clause)
Over (partition-by-clause )
Syntax for the Listagg function uses the clause within group (order-by-clause) to specify sorting
order. This clause is similar to order-by clause in other analytic functions. The first argument to this
function is the string or column name to concatenate. The second argument is the separator for the
values.
In the Listing 8-20, the partitioning clause is not specified and rows are ordered by the Country
column in the descending order. The output shows that country names are converted to a list separated
by comma.
Listagg 函数使用子句
within group (order-by-clause) 定义排序顺序。该子句类似于
其他分析函数中的order-by子句。该函数的第一个参数是要连结的字符串或者列名
。第二个参数是这些值的分隔符。
在列表8-20中,分区子句没有指定且行按照Country列降序排列。输出展示,国家名称被转换成由逗号分隔的列表了。
Note that Listagg function does not support windowing clauses.
注意Listagg函数也不支持开窗子句。
Listing 8-20. LISTAGG Function
1 select listagg (country, ',')
2 within group (order by country desc)
3 from (
4 select distinct country from sales_fact
5 order by country
6* )
LISTAGG(COUNTRY,',')WITHINGROUP(ORDERBYCOUNTRYDESC)
--------------------------------------------------------------
United States of America,United Kingdom,Turkey,Spain,Singapore,
Saudi Arabia,Poland,New Zealand, Japan,Italy,Germany,France,
Denmark,China,Canada,Brazil,Australia,Argentina
分享到:
相关推荐
Pro Oracle SQL unlocks the power of SQL in the Oracle Database—one of the most potent SQL implementations on the market today. To master it requires a three-pronged approach: learn the language ...
Advanced-SQL-MYSQL-for-Analytic-Business-Intelligence
Aster Data SQL and MapReduce by Tom Coffing, John Nolan Publisher: Coffing Publishing Published: Feb, 2014 ...Chapter 10 – Aster Windows Functions 122 Chapter 11 – SQL-MapReduce 159
Take advantage of advanced Oracle SQL features such as support for hierarchical queries, analytic and window functions, flashback to a given point in time, and more Get a handle on Oracle Database’s...
Oracle Hyperion Essbase - Storing Analytic Data - 培训资料 Describe Essbase data storage options Explain the concepts of dense and sparse dimensions Describe data blocks and index structure View the ...
首先需要获取AnalyticDB PostgreSQL版连接信息 获得AnalyticDB连接串. 可以加入下方钉钉群1元试用. 运行demo cd $PROJ_DIR pip install -r requirements.txt cp config/config_template config/config.yml # 将链接...
中文名: SQL袖珍参考手册(第3版,涵盖DB2,MySQL,Oracle,PostgreSQL,SQL Server) 原名: SQL Pocket Guide: A Guide to SQL Usage 作者: Jonathan Gennick 资源格式: PDF 版本: 英文文字版/更新源代码 出版社: O'...
serving informational or analytic needs. ■■ The supporting technology for operational processing is fundamentally different from the technology used to support informational or analytical needs. ■...
大规模对数分析和 DRS-DPM 使用 Logstash 和 MongoDB 进行大规模日志分析第 1 部分:从数据中心的 VM 收集实时日志。 这些日志使用 Logstash 结构化、解析并转发到 MongoDB。 使用谷歌图表和 Tableau 可视化的日志。...
42-董永鹏-GEM-Tree_Tree-Based_Analytic_Geometrical_Multi-Dim1
Oracle JDK >= 1.7.x Apache Maven >= 3.0.5 克隆这个 repo 并作为一个现有的 Maven 项目导入到 Eclipse IDE 或 IntelliJ IDEA。 此应用程序使用使生活变得简单,同时使用集合和其他通用内容。 此应用程序还...
# skipbranch............1 = Skip 'branch of' state objects cause by SQL*NET # loopback sessions etc (default) # 0 = don't skip 'branch of' transactions # seqinfo...............1 = Output sequence ...
藏经阁-StreamSets and Spark_ Analytic.pdf
Oracle Hyperion Essbase - Essbase介绍 - 培训资料 At the end of this lesson, you should be able to: Describe the Hyperion analytic solution List the key features of the Essbase products Describe the ...
程序员考试刷题 分析计算旧金山大学的 MSDS 501 秋季 2020 课程说明 本课程是该课程的一部分,专门设计用于促进那些还不是熟练程序员的人提高计算机编程技能。 本课程将侧重于与数据科学直接相关的应用计算机编程...
藏经阁-StreamSets and Spark_Analytic Insights In Retail.pdf
java采购供应平台源码phonegap-ios-google-analytics Phonegap 的分析插件 分析客户端允许您向 Google Analytics 发送页面浏览量、事件、时间。 它还允许您设置已在此帐户上设置的自定义维度(见下文)。...
藏经阁-AnalyticDB基础版:云原生My SQL 敏捷数仓.pdf
权限:Dynamic SQL Analytic Privilege in SAP HANA 权限:Dynamic SQL Analytic Privilege in SAP HANA