Introduction
Association rule mining finds
interesting associations and correlations among large sets
of data items. Association rules show attribute-value conditions that
frequently occur together in a given dataset. A typical and widely used
example of association rule mining is Market Basket Analysis.
In this setting, data are
collected using bar-code scanners in supermarkets. Such ‘market basket’
databases consist of a large number of transaction records. Each record
lists all items bought by a customer in a single purchase transaction.
Managers would be interested to know whether certain groups of items are
consistently purchased together. They
could use this information for adjusting store layouts (placing items optimally with
respect to each other), for cross-selling,
for promotions, for catalog design, and for identifying customer segments based
on buying patterns.
Association rules provide information of this type in
the form of "if-then" statements. These rules are computed from
the data and, unlike the if-then rules of logic, association rules are
probabilistic in nature.
In addition to the antecedent (the "if"
part) and the consequent (the "then" part), an association rule
has two numbers that express the degree of uncertainty about the rule. In
association analysis the antecedent and consequent are sets of items
(called itemsets) that are disjoint (do not have any items in common).
The
first number is called the support
for the rule. The support is
simply the number of transactions that include all items in the antecedent
and consequent parts of the rule. (The support is sometimes expressed as a
percentage of the total number of records in the database.)
The other
number is known as the confidence of the rule. Confidence
is the
ratio of the number of transactions that include all items in the
consequent as well as the antecedent (namely, the support) to the number
of transactions that include all items in the antecedent.
For example, if a
supermarket database has 100,000 point-of-sale transactions, out of which
2,000 include both items A and B and 800 of these include item C, the
association rule "If A and B are purchased then C is purchased
on the same trip" has a support of 800 transactions (alternatively
0.8% = 800/100,000) and a confidence of 40% (=800/2,000). One way to think
of support is that it is the probability that a randomly selected
transaction from the database will contain all items in the antecedent and
the consequent, whereas the confidence is the conditional probability that a randomly selected
transaction will include all the items in the consequent given that
the transaction includes all the items in the antecedent.
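The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a mining algorithm: the five transactions and the helper names `support_count` and `confidence` are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import FrozenSet, Iterable, List


def support_count(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Support of (antecedent + consequent) divided by support of the antecedent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    if a & c:
        raise ValueError("antecedent and consequent must be disjoint")
    antecedent_count = support_count(transactions, a)
    if antecedent_count == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support_count(transactions, a | c) / antecedent_count


# Hypothetical toy database of five purchase transactions.
transactions = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"A", "B", "C", "D"}),
    frozenset({"A", "C"}),
    frozenset({"B", "D"}),
]

# Rule: "If A and B are purchased then C is purchased".
rule_support = support_count(transactions, {"A", "B", "C"})   # 2 transactions
rule_support_fraction = rule_support / len(transactions)      # 2 of 5 transactions
rule_confidence = confidence(transactions, {"A", "B"}, {"C"})  # 2 of the 3 with A and B
```

In this toy data, A and B appear together in three transactions and two of those also contain C, so the rule has a support of 2 and a confidence of 2/3.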
Lift is one more parameter of interest in association analysis. Lift is the ratio of Confidence
to Expected Confidence. Expected Confidence in this case means, using the above
example, "the confidence if buying A and B does not enhance the probability of
buying C." It is the number of transactions that
include the consequent divided by the total number of transactions. Suppose the total number of
transactions that include C is 5,000. Then the Expected Confidence is 5,000/100,000 = 5%. For
our supermarket example, Lift = Confidence/Expected Confidence = 40%/5% = 8.
Hence Lift is a value
that tells us how much the probability of the "then"
(consequent) part increases given the "if" (antecedent) part. Here, a customer who buys
A and B is eight times as likely to buy C as a randomly selected customer.
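The supermarket figures used in this introduction can be checked with a short Python sketch. The function name `lift` and its parameter names are assumptions made for illustration. The ratio Confidence/Expected Confidence is computed in the algebraically equivalent form (n_rule × n_total) / (n_antecedent × n_consequent), which keeps the arithmetic in integers until the final division.

```python
def lift(n_total: int, n_antecedent: int, n_consequent: int, n_rule: int) -> float:
    """Lift = confidence / expected confidence.

    confidence          = n_rule / n_antecedent
    expected confidence = n_consequent / n_total
    """
    if n_antecedent == 0 or n_consequent == 0:
        raise ValueError("lift is undefined when antecedent or consequent never occurs")
    return (n_rule * n_total) / (n_antecedent * n_consequent)


# Counts taken from the supermarket example in the text.
n_total = 100_000   # all point-of-sale transactions
n_ab = 2_000        # transactions containing both A and B
n_abc = 800         # transactions containing A, B and C
n_c = 5_000         # transactions containing C

rule_confidence = n_abc / n_ab        # 40%
expected_confidence = n_c / n_total   # 5%
rule_lift = lift(n_total, n_ab, n_c, n_abc)
print(f"confidence={rule_confidence:.0%}, "
      f"expected={expected_confidence:.0%}, lift={rule_lift:g}")
```

A lift of 1 would mean that buying A and B tells us nothing about buying C; values above 1 indicate that the antecedent raises the probability of the consequent.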