Original: http://chentingpc.me/article/?id=616
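Topic modeling is a rather remarkable thing. I had heard of it before but had not realized how important it is. Only after a nudge from Tang did I take a serious second look; it is arguably one of the foundations of text mining (a fairly advanced foundation?). The input is a collection of documents, the output is a set of topics in a low-dimensional space, and the algorithms are unsupervised. The basic line of development is LSI -> pLSI -> LDA -> various LDA extensions. pLSI and LDA are both generative models, and LDA in particular offers a fascinating way of looking at text. The idea behind LDA is simple, but the probabilistic derivations using EM, Gibbs sampling, and so on are not so simple to learn (at the time of writing I have not fully worked this part out; Tang says topic modeling is something you learn in one month, or in two or three months. Really? I don't know how high a bar he had in mind when he said that).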
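To make the "documents in, low-dimensional topics out" idea concrete, here is a minimal sketch of collapsed Gibbs sampling for LDA in plain Python. It is not Mallet's implementation and not taken from any of the papers listed in this post; the function name `lda_gibbs`, the hyperparameter values `alpha` and `beta`, the iteration count, and the toy corpus are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of tokenized documents (each a list of words).
    Returns (vocab, theta, phi): theta[d][k] estimates the topic mixture of
    document d, phi[k][v] estimates the word distribution of topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]        # n(d, k)
    topic_word = [[0] * V for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                      # n(k)

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | everything else) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Point estimates from the final counts.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta)
         for v in range(V)]
        for k in range(num_topics)
    ]
    return vocab, theta, phi


# Toy corpus (made up for this example): two "fruit" and two "sports" documents.
docs = [
    "apple banana fruit apple juice".split(),
    "banana fruit juice apple".split(),
    "goal match team goal player".split(),
    "team player match goal".split(),
]
vocab, theta, phi = lda_gibbs(docs, num_topics=2)
for k, row in enumerate(phi):
    top_words = [w for _, w in sorted(zip(row, vocab), reverse=True)[:3]]
    print("topic", k, top_words)
```

Each row of `theta` is the low-dimensional representation of one document, and each row of `phi` is one topic expressed as a distribution over the vocabulary. A real toolkit such as Mallet adds stop-word handling, hyperparameter optimization, and far more efficient sampling, so this is only meant to show the shape of the computation.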
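I have been reading about LDA closely for two or three days now, and tonight I also ran Mallet, so I have an intuitive feel for it. Below is a collection of introductory material (all of these papers can be downloaded freely online, so attaching them here should not count as infringement...):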
Survey
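- The Topic modeling page on David M. Blei's homepage, which has a lot of material (from tutorials to implementations)
- The Development of Topic Models in Natural Language Processing (自然语言处理中主题模型的发展, a Chinese-language survey)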
- Probabilistic Topic Models.pdf
- Introduction to Probabilistic Topic Models.pdf
Specific
- LSI : Latent semantic indexing a probabilistic analysis.pdf
- pLSI : Probabilistic Latent Semantic Indexing.pdf
- LDA : Latent Dirichlet Allocation.pdf
Video Lecture
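- A very good lecture by D. Blei. Because of my slow connection I could only view the slides, not the video, but it is no doubt a good lecture (after all, this was proposed by D. Blei and colleagues in 2003).
- Another lecture by D. Blei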
Open Source
Derived (not recommended for newcomers)
- dynamic LDA : dynamic_topic_models.pdf
- The Author-Topic Model for Authors and Documents
- Correlated Topic Models
- Automatic Labeling of Multinomial Topic Models