Excerpted from "Data Mining - Concepts and Techniques"
Measures can be organized into three categories (i.e., distributive, algebraic, holistic), based on the
kind of aggregate functions used.
Distributive: An aggregate function is distributive if it can be computed in a
distributed manner as follows. Suppose the data are partitioned into n sets. We
apply the function to each partition, resulting in n aggregate values. If the
result derived by applying the function to the n aggregate values is the same
as that derived by applying the function to the entire data set (without
partitioning), the function can be computed in a distributed manner. For
example, count() can be computed for a data cube by first partitioning the cube
into a set of subcubes, computing count() for each subcube, and then summing up
the counts obtained for each subcube. Hence, count() is a distributive
aggregate function. For the same reason, sum(), min(), and max() are
distributive aggregate functions. A measure is distributive if it is obtained
by applying a distributive aggregate function. Distributive measures can be
computed efficiently because they can be computed in a distributed manner.
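The partition-then-combine idea can be sketched in a few lines of Python. This is only an illustration under assumed data: the list `data` and its split into `partitions` are made up here and do not come from the book. Note that the per-partition counts are combined by summing them, as the passage describes.

```python
# Hypothetical data set, split into three partitions for illustration.
data = [4, 8, 15, 16, 23, 42, 7, 1]
partitions = [data[0:3], data[3:6], data[6:8]]

# Step 1: apply each aggregate function to every partition separately.
partial_counts = [len(p) for p in partitions]
partial_sums = [sum(p) for p in partitions]
partial_mins = [min(p) for p in partitions]
partial_maxs = [max(p) for p in partitions]

# Step 2: combine the n partial results into one final value.
total_count = sum(partial_counts)  # counts are combined by summing
total_sum = sum(partial_sums)
total_min = min(partial_mins)
total_max = max(partial_maxs)

# Each combined value matches the aggregate over the unpartitioned data.
print(total_count == len(data), total_sum == sum(data))
print(total_min == min(data), total_max == max(data))
```

Each partition contributes a single number per aggregate, so the storage needed for the partial results does not grow with the size of the partition.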
Algebraic: An aggregate function is
algebraic if it can be computed by an algebraic function with M arguments
(where M is a bounded positive integer), each of which is obtained by applying
a distributive aggregate function. For example, avg() (average) can be computed
by sum()/count(), where both sum() and count() are distributive aggregate
functions. Similarly, it can be shown that min_N() and max_N() (which find the
N minimum and N maximum values, respectively, in a given set) and
standard_deviation() are algebraic aggregate functions. A measure is algebraic if it is
obtained by applying an algebraic aggregate function.
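As a rough sketch of the algebraic case, avg() and the population standard deviation can both be derived from a fixed-size summary of three distributive aggregates: count, sum, and sum of squares. The helper names (`summarize`, `merge`, `avg`, `population_std`) and the sample data are assumptions made for this example, not part of the book. The sum-of-squares formula is used for brevity and can lose precision on large or badly scaled inputs.

```python
import math
from functools import reduce

# Hypothetical data set and partitioning, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
partitions = [data[0:3], data[3:5], data[5:8]]

def summarize(values):
    """Return (count, sum, sum of squares): three distributive aggregates."""
    return (len(values), sum(values), sum(v * v for v in values))

def merge(a, b):
    """Combine two summaries; each component is merged by addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def avg(summary):
    """avg() as an algebraic function of sum() and count()."""
    n, s, _ = summary
    return s / n

def population_std(summary):
    """Population standard deviation from count, sum, and sum of squares."""
    n, s, sq = summary
    mean = s / n
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(sq / n - mean * mean, 0.0))

# M = 3 bounded arguments, regardless of how large the data set is.
combined = reduce(merge, (summarize(p) for p in partitions))
print(avg(combined), population_std(combined))
```

Here M is 3: however the data are partitioned, only three numbers per partition have to be kept and merged.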
Holistic: An aggregate function is holistic if there is no constant bound on
the storage size needed to describe a subaggregate. That is, there does not
exist an algebraic function with M arguments (where M is a constant) that
characterizes the computation. Common examples of holistic functions include
median(), mode(), and rank(). A measure is holistic if it is obtained by
applying a holistic aggregate function.
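A small counterexample, sketched in Python with made-up data, shows why median() does not fit the distributive pattern: taking the median of the per-partition medians can give a different answer than the median of the whole data set. The values and the partitioning below are assumptions chosen for illustration.

```python
import statistics

# Hypothetical data set and partitioning, for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
partitions = [[1, 2, 9], [3, 4, 8], [5, 6, 7]]

# Median over the whole data set.
true_median = statistics.median(data)

# Naive attempt: reduce each partition to one value, then combine.
partial_medians = [statistics.median(p) for p in partitions]
median_of_medians = statistics.median(partial_medians)

# The two results disagree, so one number per partition is not enough.
print(true_median, partial_medians, median_of_medians)
```

In general an exact median may require keeping all the values of a partition, which is why no constant-size subaggregate describes it.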