
THREE TYPES OF MEASURES

 


Excerpted from "Data Mining: Concepts and Techniques"

 

Measures can be organized into three categories (i.e., distributive, algebraic, holistic), based on the kind of aggregate functions used.

Distributive: An aggregate function is distributive if it can be computed in a distributed manner as follows. Suppose the data are partitioned into n sets. We apply the function to each partition, resulting in n aggregate values. If the result derived by applying the function to the n aggregate values is the same as that derived by applying the function to the entire data set (without partitioning), the function can be computed in a distributed manner. For example, count() can be computed for a data cube by first partitioning the cube into a set of subcubes, computing count() for each subcube, and then summing up the counts obtained for each subcube. Hence, count() is a distributive aggregate function. For the same reason, sum(), min(), and max() are distributive aggregate functions. A measure is distributive if it is obtained by applying a distributive aggregate function. Distributive measures can be computed efficiently because they can be computed in a distributed manner.
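The partition-then-combine idea can be sketched in a few lines of Python. This is only an illustration: the sample data and the `partition` helper are hypothetical, and a plain list stands in for a data cube.

```python
def partition(values, n):
    """Split values into at most n contiguous chunks (hypothetical helper)."""
    size = -(-len(values) // n)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

data = [7, 3, 9, 1, 4, 8, 2, 6]  # made-up sample values
parts = partition(data, 3)

# Step 1: compute one sub-aggregate per partition.
sub_counts = [len(p) for p in parts]
sub_sums = [sum(p) for p in parts]
sub_mins = [min(p) for p in parts]
sub_maxs = [max(p) for p in parts]

# Step 2: combine the sub-aggregates.
# Counts and sums are combined by summing; min and max by min and max.
total_count = sum(sub_counts)
total_sum = sum(sub_sums)
total_min = min(sub_mins)
total_max = max(sub_maxs)

# Each combined result equals the aggregate over the unpartitioned data.
print(total_count == len(data), total_sum == sum(data))
print(total_min == min(data), total_max == max(data))
```

Each partition contributes a single value per function, so the amount of state passed to the combining step does not depend on the partition size.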

Algebraic: An aggregate function is algebraic if it can be computed by an algebraic function with M arguments (where M is a bounded positive integer), each of which is obtained by applying a distributive aggregate function. For example, avg() (average) can be computed by sum()/count(), where both sum() and count() are distributive aggregate functions. Similarly, it can be shown that min_N() and max_N() (which find the N minimum and N maximum values, respectively, in a given set) and standard_deviation() are algebraic aggregate functions. A measure is algebraic if it is obtained by applying an algebraic aggregate function.
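As a rough sketch of the algebraic case, avg() and the population standard deviation can both be derived from three distributive sub-aggregates: count, sum, and sum of squares (so M = 3 here). The partitions and the names `summarize`, `merge`, and `finalize` are assumptions made for illustration, not taken from the book.

```python
import math
import statistics

def summarize(chunk):
    """Distributive sub-aggregates of one partition: (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))

def merge(a, b):
    """Combine two summaries by adding them component-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def finalize(summary):
    """Algebraic step: derive the mean and population standard deviation."""
    n, s, sq = summary
    mean = s / n
    variance = sq / n - mean * mean
    return mean, math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding

parts = [[2, 4, 4], [4, 5], [5, 7, 9]]  # made-up partitions

total = (0, 0, 0)
for p in parts:
    total = merge(total, summarize(p))

mean, std = finalize(total)

# Cross-check against the whole data set using the standard library.
flat = [x for p in parts for x in p]
print(math.isclose(mean, statistics.fmean(flat)))
print(math.isclose(std, statistics.pstdev(flat)))
```

The sum-of-squares formula is the simplest one to show, but it can lose precision when the mean is large relative to the spread; a numerically safer merge would be a reasonable substitute in practice.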

Holistic: An aggregate function is holistic if there is no constant bound on the storage size needed to describe a subaggregate. That is, there does not exist an algebraic function with M arguments (where M is a constant) that characterizes the computation. Common examples of holistic functions include median(), mode(), and rank(). A measure is holistic if it is obtained by applying a holistic aggregate function.
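A small counterexample, using made-up partitions, shows why median() does not reduce to a fixed-size summary per partition: combining the per-partition medians does not, in general, recover the median of the full data set.

```python
import statistics

parts = [[1, 2, 9], [3, 4, 5], [6, 7, 8]]  # made-up partitions

# One sub-aggregate per partition, as in the distributive scheme.
sub_medians = [statistics.median(p) for p in parts]
median_of_medians = statistics.median(sub_medians)

# Median computed over the unpartitioned data.
flat = [x for p in parts for x in p]
true_median = statistics.median(flat)

# The two values differ, so a single median per partition is not enough.
print(median_of_medians, true_median)
```

In this sketch, an exact answer requires access to the values themselves (or further passes over the data), which is the unbounded storage requirement the definition refers to.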
