https://www.ibm.com/developerworks/java/library/j-mahout/index.html
Quote:
Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous experiences. The field is closely related to data mining and often uses techniques from statistics, probability theory, pattern recognition, and a host of other areas.
Supervised learning is tasked with learning a function from labeled training data in order to predict the value of any valid input. Common examples of supervised learning include classifying e-mail messages as spam, labeling Web pages according to their genre, and recognizing handwriting. Many algorithms are used to create supervised learners, the most common being neural networks, Support Vector Machines (SVMs), and Naive Bayes classifiers.
Unsupervised learning, as you might guess, is tasked with making sense of data without any examples of what is correct or incorrect. It is most commonly used for clustering similar input into logical groups. It also can be used to reduce the number of dimensions in a data set in order to focus on only the most useful attributes, or to detect trends. Common approaches to unsupervised learning include k-Means, hierarchical clustering, and self-organizing maps.
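As a rough illustration of the k-Means approach named in the quote, here is a minimal sketch of the usual assign-then-update loop (Lloyd's algorithm). The function names, the fixed starting centroids, and the sample points are made up for this note; they are not taken from the IBM article or from Mahout.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Group each point with the index of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, centroids, iterations=10):
    """Alternate assignment and centroid update until nothing moves."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its cluster; an empty cluster keeps its centroid.
        updated = [
            tuple(sum(xs) / len(cluster) for xs in zip(*cluster)) if cluster else old
            for old, cluster in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
```

No labels are supplied anywhere: the two groups emerge only from the distances between points, which is what makes this unsupervised.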
Slides:
http://www.slideshare.net/Cataldo/tutoria-mahout-recommendation
http://www.slideshare.net/JeanPierreKnig/what-are-product-recommendations-and-how-do-they-work
http://www.slideshare.net/erikbern/collaborative-filtering-at-spotify-16182818
text:
http://www.bidn.com/blogs/cprice1979/ssas/4388/-mahout-recommendation-engines-part-2-ride-the-elephant
Algorithms:
https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
http://answers.google.com/answers/main?cmd=threadview&id=225316
LSH (Locality-sensitive hashing)
SimHash http://my.oschina.net/pathenon/blog/63747
MinHash http://my.oschina.net/pathenon/blog/65210
TF-IDF & VSM http://pyevolve.sourceforge.net/wordpress/?p=2497
Clustering - unsupervised learning - the resulting groups are not known in advance
Classification - supervised learning - there is a known, fixed set of outcomes
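To make the TF-IDF & VSM entry above a little more concrete, here is a minimal sketch of one common TF-IDF variant (term frequency normalized by document length, IDF as log(N / df)) with cosine similarity between the resulting vectors. Other weighting variants exist, and the function names and sample documents are invented for this note.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "mahout builds recommenders".split(),
    "mahout builds clusters".split(),
    "spotify uses collaborative filtering".split(),
]
vecs = tf_idf_vectors(docs)
```

In the vector space model each document becomes such a weight vector, so `cosine(vecs[0], vecs[1])` is positive (shared terms) while `cosine(vecs[0], vecs[2])` is zero (no terms in common).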
Machine Learning Terms:
Association Rules - if a consumer buys product A, how likely is that consumer to buy product B?
latent semantic analysis - LSA
probabilistic latent semantic analysis - pLSA
Latent Dirichlet Allocation - LDA
http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/
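The question behind the Association Rules entry above (given a purchase of A, how likely is a purchase of B?) is usually answered with the rule's confidence, support(A and B) / support(A). A minimal sketch follows; the helper names and the toy baskets are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


# Toy purchase histories, one set of products per consumer.
baskets = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B", "C"}]
rule_confidence = confidence(baskets, {"A"}, {"B"})
```

Here three baskets contain A and two of those also contain B, so the rule A -> B has confidence 2/3.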
Math Terms:
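Derivative - the rate of change of a function with respect to its variable
covariance - often abbreviated cov
standard deviation - often abbreviated stddev
dot product - also called the inner product or scalar product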
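For reference, the covariance, standard deviation, and dot product terms above can be written out in a few lines. This sketch uses the population forms (dividing by n rather than n - 1), which is an assumption on my part; the function names are my own.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Population covariance: mean product of deviations from the means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def stddev(xs):
    """Population standard deviation: square root of cov(xs, xs)."""
    return math.sqrt(covariance(xs, xs))


def dot(u, v):
    """Dot product: sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))
```

For example, `stddev([2, 4, 4, 4, 5, 5, 7, 9])` is 2.0 and `dot([1, 2, 3], [4, 5, 6])` is 32.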
Math symbols:
http://en.wikipedia.org/wiki/List_of_mathematical_symbols
http://zh.wikipedia.org/wiki/%E7%94%A8%E6%96%BC%E6%95%B8%E5%AD%B8%E3%80%81%E7%A7%91%E5%AD%B8%E5%92%8C%E5%B7%A5%E7%A8%8B%E7%9A%84%E5%B8%8C%E8%87%98%E5%AD%97%E6%AF%8D
Text processing:
How do Feature Selection, Feature Extraction, Terminology Extraction, and Keyword Extraction differ?
The difference between Feature Extraction and Feature Selection is explained here:
http://stackoverflow.com/questions/2163330/difference-between-feature-selection-feature-extraction-feature-weights
As for whether Feature Extraction, Terminology Extraction, and Keyword Extraction differ in any essential way, I honestly don't know; I only know that Wikipedia has separate entries for FE and TE.
Collocation - words that habitually occur together
n-gram - a contiguous sequence of n items from a given sequence of text.
Stemming - the process of reducing inflected (or sometimes derived) words to their stem, base, or root form, generally a written word form. For example, cats, catty, ... are all reduced to cat.
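The n-gram definition above is short enough to sketch directly. Counting bigrams is also a naive first pass at finding collocation candidates; real collocation extraction normally adds a statistical association test, which is not shown here. The function name and sample sentence are my own.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous runs of n tokens, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "new york is bigger than new haven but new york is louder".split()
bigrams = ngrams(tokens, 2)

# Frequent bigrams are rough collocation candidates.
bigram_counts = Counter(bigrams)
```

Here `bigram_counts[("new", "york")]` is 2, while `("new", "haven")` appears only once.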
Evaluation of recommendation:
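When preference values exist, evaluation can use RMSE (root mean squared error) and similar measures.
For boolean recommendations without preference values, how do you evaluate? In theory you can use the classic Information Retrieval metrics, Precision and Recall. Note, however, that evaluating recommendation results with Precision and Recall is not an ideal approach. See Sean:
http://lucene.472066.n3.nabble.com/Evaluating-Boolean-Preferences-for-Item-Recommenders-td688560.html
Quote: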
I am not surprised by low precision. It doesn't necessarily mean the recommender is bad (though it could!). I think a precision-recall test is somewhat flawed for recommenders. It measures how well the recommender returns things the user has already seen, which are not necessarily the best recommendations. That is, a recommender gets penalized in this test if it recommends something the user *would* like, but hasn't rated.
http://stackoverflow.com/questions/7529333/similarity-function-for-mahout-boolean-user-based-recommender
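Both evaluation styles discussed above fit in a few lines. This is a minimal sketch of the textbook definitions, not of Mahout's evaluator classes; the function names and sample data are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error over paired preference values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against held-out items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# The user's held-out items are "a" and "x"; the recommender returned four items.
precision, recall = precision_recall(["a", "b", "c", "d"], {"a", "x"})
```

The example gives precision 0.25 and recall 0.5. Items "b", "c", and "d" count as misses even if the user would have liked them, which is the flaw Sean describes in the quote.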
On the IR metrics implemented in Mahout, such as Precision, Recall, Fall-out, and nDCG:
http://en.wikipedia.org/wiki/Information_retrieval
http://stackoverflow.com/questions/16478192/how-to-interpret-irstatisticsimpl-data-in-mahout
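For the two less familiar metrics in that list, here is a minimal sketch of the textbook definitions with binary relevance. It may differ in detail from what Mahout actually computes, and the function names are my own.

```python
import math


def fall_out(recommended, relevant, all_items):
    """Share of all non-relevant items that were recommended anyway."""
    non_relevant = set(all_items) - set(relevant)
    if not non_relevant:
        return 0.0
    return len(set(recommended) & non_relevant) / len(non_relevant)


def ndcg(recommended, relevant):
    """Binary-relevance nDCG: a hit at 1-based rank r gains 1 / log2(r + 1)."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended, start=1)
        if item in relevant
    )
    # Ideal ordering puts every possible hit at the top of the list.
    ideal_hits = min(len(relevant), len(recommended))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

Unlike precision and recall, nDCG rewards placing relevant items near the top: `ndcg(["a", "x"], {"a"})` is 1.0, while `ndcg(["x", "a"], {"a"})` is lower.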