
Mahout: Integrate Jcseg with Mahout seq2sparse

 

Google global site URLs

https://github.com/justjavac/Google-IPs

 

JCSEG

http://www.oschina.net/p/jcseg

MMSEG

http://technology.chtsai.org/mmseg/

 

//convert maven project to eclipse project

#mvn eclipse:eclipse -DskipTests

 

//convert text docs to seq docs (Hadoop SequenceFile format)

#mahout seqdirectory -c UTF-8  -i mahout/topics/textdocs -o mahout/topics/seqdocs
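
//tokenize and vectorize the seq docs with a custom analyzer (sketch)

The tokenized-documents directory dumped further down is the kind of output that mahout seq2sparse writes, so the step in between probably looked roughly like the sketch below. The flags -i, -o, -a (analyzer class name) and -ow (overwrite) are standard seq2sparse options. The analyzer class name is a hypothetical placeholder, not taken from these notes: substitute the Lucene Analyzer class that your Jcseg build actually provides, and make sure the Jcseg jar is visible on Mahout's classpath. The snippet only prints the command so it can be checked before running.

```shell
# ASSUMPTION: placeholder class name; replace it with the real
# Lucene Analyzer class shipped in your Jcseg build.
ANALYZER_CLASS="com.example.JcsegAnalyzer"

# Input is the seqdirectory output; the output dir matches the path
# that the seqdumper command in these notes reads from.
INPUT_DIR="mahout/topics/seqdocs"
OUTPUT_DIR="mahout/topics/docsvectors2"

# -i input SequenceFiles, -o output directory,
# -a analyzer class used for tokenization, -ow overwrite old output.
CMD="mahout seq2sparse -i $INPUT_DIR -o $OUTPUT_DIR -a $ANALYZER_CLASS -ow"

# Dry run: print the assembled command for review.
echo "$CMD"
```

Once the printed command looks right, it can be pasted into the shell or run with eval "$CMD".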

 

//dump tokenized docs (seq format) to text format

#mahout seqdumper -i mahout/topics/docsvectors2/tokenized-documents -o ./tokenized-docs2

 

 

//recompile jcseg

#mvn clean package -DskipTests

 

 

Lucene Analyzer

http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/analysis/Analyzer.html

www.cnblogs.com/forfuture1978/archive/2010/06/06/1752837.html

http://www.360doc.com/content/12/0512/21/1542811_210601163.shtml

 

Full-text search with MongoDB: MongoDB + Lucene/Solr; MongoDB + Sphinx; Coreseek; MongoDB 2.6 text search is now usable in production

http://www.open-open.com/lib/view/1343210299443

http://www.gasimzade.org/2012/11/under-hood-architectural-overview-of.html

http://www.jayway.com/2010/11/14/full-text-search-with-mongodb-and-lucene-analyzers/

http://docs.mongodb.org/manual/tutorial/model-data-for-keyword-search/

http://lumongo.org/

http://baike.sogou.com/v54377490.htm
