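Top 10 Classic Algorithms in Data Mining
Below are the 18 candidate algorithms that took part in the selection. Any one of them can fairly be called a classic, and each has had a far-reaching influence on the data mining field. When studying data mining, these 18 algorithms can serve as a main thread: if you understand each of them, you will have covered most of the field. You can also use your familiarity with them to gauge how well you know the subject.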
Classification
==============
#1. C4.5
Quinlan, J. R. 1993. C4.5: Programs for Machine Learning.
Morgan Kaufmann Publishers Inc.
Google Scholar Count in October 2006: 6907
#2. CART
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.
Google Scholar Count in October 2006: 6078
#3. K Nearest Neighbours (kNN)
Hastie, T. and Tibshirani, R. 1996. Discriminant Adaptive Nearest Neighbor Classification. IEEE Trans. Pattern
Anal. Mach. Intell. (TPAMI). 18, 6 (Jun. 1996), 607-616.
DOI= http://dx.doi.org/10.1109/34.506411
Google Scholar Count: 183
#4. Naive Bayes
Hand, D.J., Yu, K., 2001. Idiot's Bayes: Not So Stupid After All?
Internat. Statist. Rev. 69, 385-398.
Google Scholar Count in October 2006: 51
Statistical Learning
====================
#5. SVM
Vapnik, V. N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag New York, Inc.
Google Scholar Count in October 2006: 6441
#6. EM
McLachlan, G. and Peel, D. (2000). Finite Mixture Models. J. Wiley, New York.
Google Scholar Count in October 2006: 848
Association Analysis
====================
#7. Apriori
Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. In Proc. of the 20th Int'l Conference on Very Large
Databases (VLDB '94), Santiago, Chile, September 1994.
http://citeseer.comp.nus.edu.sg/agrawal94fast.html
Google Scholar Count in October 2006: 3639
#8. FP-Tree
Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without
candidate generation. In Proceedings of the 2000 ACM SIGMOD
international Conference on Management of Data (Dallas, Texas, United
States, May 15 - 18, 2000). SIGMOD '00. ACM Press, New York, NY, 1-12.
DOI= http://doi.acm.org/10.1145/342009.335372
Google Scholar Count in October 2006: 1258
Link Mining
===========
#9. PageRank
Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual
Web search engine. In Proceedings of the Seventh International Conference on World Wide Web (WWW-7) (Brisbane, Australia). P. H. Enslow and A. Ellis, Eds. Elsevier Science
Publishers B. V., Amsterdam, The Netherlands, 107-117.
DOI= http://dx.doi.org/10.1016/S0169-7552(98)00110-X
Google Scholar Count: 2558
#10. HITS
Kleinberg, J. M. 1998. Authoritative sources in a hyperlinked
environment. In Proceedings of the Ninth Annual ACM-SIAM Symposium on
Discrete Algorithms (San Francisco, California, United States, January
25 - 27, 1998). Symposium on Discrete Algorithms. Society for
Industrial and Applied Mathematics, Philadelphia, PA, 668-677.
Google Scholar Count: 2240
Clustering
==========
#11. K-Means
MacQueen, J. B., Some methods for classification and analysis of
multivariate observations, in Proc. 5th Berkeley Symp. Mathematical
Statistics and Probability, 1967, pp. 281-297.
Google Scholar Count in October 2006: 1579
#12. BIRCH
Zhang, T., Ramakrishnan, R., and Livny, M. 1996. BIRCH: an efficient
data clustering method for very large databases. In Proceedings of the
1996 ACM SIGMOD international Conference on Management of Data
(Montreal, Quebec, Canada, June 04 - 06, 1996). J. Widom, Ed.
SIGMOD '96. ACM Press, New York, NY, 103-114.
DOI= http://doi.acm.org/10.1145/233269.233324
Google Scholar Count in October 2006: 853
Bagging and Boosting
====================
#13. AdaBoost
Freund, Y. and Schapire, R. E. 1997. A decision-theoretic
generalization of on-line learning and an application to
boosting. J. Comput. Syst. Sci. 55, 1 (Aug. 1997), 119-139.
DOI= http://dx.doi.org/10.1006/jcss.1997.1504
Google Scholar Count in October 2006: 1576
Sequential Patterns
===================
#14. GSP
Srikant, R. and Agrawal, R. 1996. Mining Sequential Patterns:
Generalizations and Performance Improvements. In Proceedings of the
5th international Conference on Extending Database Technology:
Advances in Database Technology (March 25 - 29, 1996). P. M. Apers,
M. Bouzeghoub, and G. Gardarin, Eds. Lecture Notes In Computer
Science, vol. 1057. Springer-Verlag, London, 3-17.
Google Scholar Count in October 2006: 596
#15. PrefixSpan
J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal and M-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In Proceedings of the 17th international Conference on Data Engineering (April 02 - 06, 2001). ICDE '01. IEEE Computer Society, Washington, DC.
Google Scholar Count in October 2006: 248
Integrated Mining
=================
#16. CBA
Liu, B., Hsu, W. and Ma, Y. M. Integrating classification and association rule mining. KDD-98, 1998, pp. 80-86.
http://citeseer.comp.nus.edu.sg/liu98integrating.html
Google Scholar Count in October 2006: 436
Rough Sets
==========
#17. Finding reduct
Zdzislaw Pawlak, Rough Sets: Theoretical Aspects of Reasoning about
Data, Kluwer Academic Publishers, Norwell, MA, 1992
Google Scholar Count in October 2006: 329
Graph Mining
============
#18. gSpan
Yan, X. and Han, J. 2002. gSpan: Graph-Based Substructure Pattern
Mining. In Proceedings of the 2002 IEEE International Conference on
Data Mining (ICDM '02) (December 09 - 12, 2002). IEEE Computer
Society, Washington, DC.
Google Scholar Count in October 2006: 155