jarod2008

Association Rules


Introduction

Association rule mining finds interesting associations and correlation relationships among large sets of data items. Association rules show attribute-value conditions that frequently occur together in a given dataset. A typical and widely used example of association rule mining is Market Basket Analysis.

For example, data are collected using bar-code scanners in supermarkets. Such ‘market basket’ databases consist of a large number of transaction records. Each record lists all items bought by a customer on a single purchase transaction. Managers would be interested to know if certain groups of items are consistently purchased together. They could use this data for adjusting store layouts (placing items optimally with respect to each other), for cross-selling, for promotions, for catalog design and to identify customer segments based on buying patterns. 

Association rules provide information of this type in the form of "if-then" statements. These rules are computed from the data and, unlike the if-then rules of logic, association rules are probabilistic in nature. 

In addition to the antecedent (the "if" part) and the consequent (the "then" part), an association rule has two numbers that express the degree of uncertainty about the rule. In association analysis the antecedent and consequent are sets of items (called itemsets) that are disjoint (do not have any items in common). 

The first number is called the support for the rule. The support is simply the number of transactions that include all items in the antecedent and consequent parts of the rule. (The support is sometimes expressed as a percentage of the total number of records in the database.) 

The other number is known as the confidence of the rule. Confidence is the ratio of the number of transactions that include all items in the consequent as well as the antecedent (namely, the support) to the number of transactions that include all items in the antecedent. 

For example, suppose a supermarket database has 100,000 point-of-sale transactions, of which 2,000 include both items A and B, and 800 of these also include item C. The association rule "If A and B are purchased then C is purchased on the same trip" has a support of 800 transactions (alternatively 0.8% = 800/100,000) and a confidence of 40% (= 800/2,000). One way to think of support is as the probability that a randomly selected transaction from the database contains all items in the antecedent and the consequent. Confidence is the conditional probability that a randomly selected transaction includes all the items in the consequent, given that it includes all the items in the antecedent.
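The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `support_count` and `rule_metrics` and the five-transaction toy dataset are made up for this example and do not come from any particular library or from the supermarket data.

```python
def support_count(transactions, itemset):
    """Count the transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= set(t))


def rule_metrics(transactions, antecedent, consequent):
    """Return (support count, support fraction, confidence) for a rule."""
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    # Antecedent and consequent must be disjoint itemsets.
    if antecedent & consequent:
        raise ValueError("antecedent and consequent must be disjoint")
    n_both = support_count(transactions, antecedent | consequent)
    n_antecedent = support_count(transactions, antecedent)
    support = n_both / len(transactions)
    # Confidence is undefined when the antecedent never occurs; report 0.0 here.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return n_both, support, confidence


# Hypothetical toy data: each set is one purchase transaction.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "D"},
]

count, support, confidence = rule_metrics(transactions, {"A", "B"}, {"C"})
print(count, support, round(confidence, 3))
```

In this toy dataset, three transactions contain both A and B and two of those also contain C, so the rule "if A and B then C" has a support of 2 transactions (40% of 5) and a confidence of 2/3.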

Lift is another parameter of interest in association analysis. Lift is the ratio of confidence to expected confidence. Expected confidence, in the example above, is the confidence we would observe if buying A and B did not change the probability of buying C. It is the number of transactions that include the consequent divided by the total number of transactions. Suppose 5,000 of the 100,000 transactions include C. The expected confidence is then 5,000/100,000 = 5%. For our supermarket example, lift = confidence / expected confidence = 40%/5% = 8. Lift therefore tells us how much more likely the "then" (consequent) part becomes given the "if" (antecedent) part.
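As a quick check of the arithmetic, the lift for the supermarket example can be computed directly from the four counts. The helper name `lift_from_counts` and its parameter names are hypothetical, chosen only for this sketch.

```python
def lift_from_counts(n_total, n_antecedent, n_consequent, n_both):
    """Lift = confidence / expected confidence, computed from raw counts."""
    confidence = n_both / n_antecedent          # e.g. 800 / 2,000 = 40%
    expected_confidence = n_consequent / n_total  # e.g. 5,000 / 100,000 = 5%
    return confidence / expected_confidence


# Counts taken from the supermarket example in the text.
lift = lift_from_counts(
    n_total=100_000,     # all point-of-sale transactions
    n_antecedent=2_000,  # transactions containing both A and B
    n_consequent=5_000,  # transactions containing C
    n_both=800,          # transactions containing A, B and C
)
print(f"lift = {lift:.1f}")
```

A lift above 1 suggests the antecedent makes the consequent more likely than its baseline rate, a lift near 1 suggests little association, and a lift below 1 suggests the consequent is less likely when the antecedent is present.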
