
THREE TYPES OF MEASURES


Excerpted from "Data Mining - Concepts and Techniques"

 

Measures can be organized into three categories (i.e., distributive, algebraic, holistic), based on the kind of aggregate functions used.

Distributive: An aggregate function is distributive if it can be computed in a distributed manner as follows. Suppose the data are partitioned into n sets. We apply the function to each partition, resulting in n aggregate values. If the result derived by applying the function to the n aggregate values is the same as that derived by applying the function to the entire data set (without partitioning), the function can be computed in a distributed manner. For example, count() can be computed for a data cube by first partitioning the cube into a set of subcubes, computing count() for each subcube, and then summing up the counts obtained for each subcube. (Note that the partial counts are combined with sum(), not count(); what matters is that the partial results can be combined into the exact overall value.) Hence, count() is a distributive aggregate function. For the same reason, sum(), min(), and max() are distributive aggregate functions. A measure is distributive if it is obtained by applying a distributive aggregate function. Distributive measures can be computed efficiently because each partition can be aggregated independently and only one value per partition needs to be combined.
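The partition-then-combine procedure can be sketched in Python as follows. This is a minimal illustration, not code from the book: the names `DISTRIBUTIVE`, `partition`, and `distributed_aggregate` are hypothetical, and a plain list stands in for a data cube. Each aggregate is paired with the function applied to one partition and the function that combines the partial results.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical table: each distributive aggregate is paired with
# (function applied to one partition, function that combines the partial results).
# For count() the partial counts are combined with sum(), not with count().
DISTRIBUTIVE: Dict[str, Tuple[Callable, Callable]] = {
    "count": (len, sum),
    "sum": (sum, sum),
    "min": (min, min),
    "max": (max, max),
}


def partition(data: Sequence[float], n: int) -> List[List[float]]:
    """Split non-empty data into at most n contiguous, non-empty chunks."""
    if not data or n < 1:
        raise ValueError("data must be non-empty and n must be >= 1")
    size = -(-len(data) // n)  # ceiling division
    return [list(data[i:i + size]) for i in range(0, len(data), size)]


def distributed_aggregate(name: str, data: Sequence[float], n: int) -> float:
    local, combine = DISTRIBUTIVE[name]
    # Step 1: aggregate each partition independently (could run on separate nodes).
    partials = [local(chunk) for chunk in partition(data, n)]
    # Step 2: combine the per-partition results into the final value.
    return combine(partials)


if __name__ == "__main__":
    data = [4, 8, 15, 16, 23, 42, 7, 1]
    for name, (local, _) in DISTRIBUTIVE.items():
        print(name, local(data), distributed_aggregate(name, data, n=3))
```

For every entry the distributed result matches the result computed over the whole list, regardless of how many partitions are used.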

Algebraic: An aggregate function is algebraic if it can be computed by an algebraic function with M arguments (where M is a bounded positive integer), each of which is obtained by applying a distributive aggregate function. For example, avg() (average) can be computed as sum()/count(), where both sum() and count() are distributive aggregate functions. Similarly, it can be shown that min_N() and max_N() (which find the N minimum and N maximum values, respectively, in a given set) and standard_deviation() are algebraic aggregate functions; standard_deviation(), for instance, can be derived from just three distributive values: count(), sum(), and the sum of squares. A measure is algebraic if it is obtained by applying an algebraic aggregate function.
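A minimal Python sketch of the algebraic case, assuming each partition is a plain list of numbers. The helper names (`summarize`, `merge`, `avg`, `std_dev`, `min_n`) are hypothetical. Each partition is reduced to a fixed-size summary of distributive values, here M = 3 for (count, sum, sum of squares), and avg() and the population standard deviation are then computed from the merged summary alone. min_N() is shown the same way: each partition keeps at most N values, a bounded summary.

```python
import heapq
import math
import statistics
from typing import List, Sequence, Tuple

# Per-partition summary: (count, sum, sum of squares).
# All three components are distributive, so the summary size is fixed
# no matter how large each partition is.
Summary = Tuple[int, float, float]


def summarize(chunk: Sequence[float]) -> Summary:
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def merge(summaries: List[Summary]) -> Summary:
    # Combining distributive components is element-wise addition.
    return (
        sum(s[0] for s in summaries),
        sum(s[1] for s in summaries),
        sum(s[2] for s in summaries),
    )


def avg(summary: Summary) -> float:
    n, total, _ = summary
    return total / n


def std_dev(summary: Summary) -> float:
    # Population standard deviation: sqrt(E[x^2] - (E[x])^2).
    n, total, total_sq = summary
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)  # clamp tiny negative rounding error
    return math.sqrt(variance)


def min_n(chunks: Sequence[Sequence[float]], n: int) -> List[float]:
    # Each partition keeps only its n smallest values (a bounded summary);
    # the global n smallest values must appear among those lists.
    partials = [heapq.nsmallest(n, c) for c in chunks]
    return heapq.nsmallest(n, (x for p in partials for x in p))


if __name__ == "__main__":
    chunks = [[4, 8, 15], [16, 23, 42], [7, 1]]
    merged = merge([summarize(c) for c in chunks])
    flat = [x for c in chunks for x in c]
    print(avg(merged), statistics.mean(flat))
    print(std_dev(merged), statistics.pstdev(flat))
    print(min_n(chunks, 3))
```

The sum-of-squares formula is chosen for brevity; it can lose precision when values are large relative to their spread, so a production system might merge (count, mean, M2) summaries instead, which is still a bounded, algebraic computation.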

Holistic: An aggregate function is holistic if there is no constant bound on the storage size needed to describe a subaggregate. That is, there does not exist an algebraic function with M arguments (where M is a constant) that characterizes the computation. Common examples of holistic functions include median(), mode(), and rank(). A measure is holistic if it is obtained by applying a holistic aggregate function.
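To make the holistic case concrete, the Python sketch below contrasts a naive bounded summary for median() (one median per partition) with the exact computation. The function names are hypothetical. A single counterexample like this only illustrates that partition medians are not enough; it does not by itself prove that no bounded summary exists.

```python
import statistics
from typing import List, Sequence


def median_of_medians(chunks: Sequence[Sequence[float]]) -> float:
    # Naive attempt: summarize each partition by one value (its median)
    # and combine those. This is NOT generally correct for median().
    return statistics.median([statistics.median(c) for c in chunks])


def exact_median(chunks: Sequence[Sequence[float]]) -> float:
    # Correct, but the "subaggregate" passed up from each partition is the
    # partition's full list of values, whose size grows with the data.
    merged: List[float] = []
    for c in chunks:
        merged.extend(c)
    return statistics.median(merged)


if __name__ == "__main__":
    chunks = [[1, 2, 100], [3, 4]]
    print(median_of_medians(chunks), exact_median(chunks))
```

Here the partition medians are 2 and 3.5, so the naive approach returns 2.75, while the true median of 1, 2, 3, 4, 100 is 3.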
