Lucene 3.5: Chinese word segmentation with IKAnalyzer vs. unigram segmentation with StandardAnalyzer

sungang_1120


package com.txt.test2;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class AnalyzerTest02 {
    private static final String TEXT = "中国，古时通常泛指中原地区，与中华中夏中土中州含义相同。"
            + "古代华夏族、汉族建国于黄河流域一带，以为居天下之中，故称中国";

    public static void main(String[] args) throws IOException {
        // Dictionary-based Chinese word segmentation (IKAnalyzer)
        Analyzer analyzer = new IKAnalyzer();
        System.out.println("====== Chinese word segmentation: IKAnalyzer ======");
        showToken(analyzer, TEXT);

        // Unigram segmentation: StandardAnalyzer emits each Chinese character as its own token.
        // Version.LUCENE_35 matches the Lucene release this post targets.
        System.out.println("====== Unigram segmentation: StandardAnalyzer ======");
        Analyzer analyzer2 = new StandardAnalyzer(Version.LUCENE_35);
        showToken(analyzer2, TEXT);
    }

    /**
     * Tokenizes a string and prints every token with its start/end offsets.
     *
     * @param analyzer
     *            the analyzer used to tokenize the text
     * @param text
     *            the string to tokenize
     * @throws IOException
     *             if reading from the token stream fails
     */
    public static void showToken(Analyzer analyzer, String text) throws IOException {
        Reader reader = new StringReader(text);
        TokenStream stream = analyzer.tokenStream("", reader);

        // Attribute-based API: this is where Lucene 3.x differs from the Token-based Lucene 2.x API.
        // CharTermAttribute replaces TermAttribute, which is deprecated since Lucene 3.1.
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        OffsetAttribute offset = stream.addAttribute(OffsetAttribute.class);

        stream.reset();
        // Print each token followed by its start and end offsets in the original text
        while (stream.incrementToken()) {
            System.out.print(term.toString() + "|(" + offset.startOffset() + " "
                    + offset.endOffset() + ")");
        }
        System.out.println();
        stream.end();
        stream.close();
    }
}
Output:

[Screenshot of the program output; the image attachment was not preserved]
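The screenshot of the run is not reproduced here, but the difference between the two strategies can be illustrated without Lucene. The following is a minimal, stdlib-only Java sketch, not Lucene or IKAnalyzer code. unigram approximates how StandardAnalyzer treats Chinese text (each Han character becomes its own token and punctuation is dropped). forwardMaxMatch uses forward maximum matching as a simplified stand-in for dictionary-based segmentation; IKAnalyzer's actual algorithm and dictionary are different and more elaborate. The class name SegmentationSketch, both method names, and the six-word dictionary are hypothetical. Tokens print in the same term|(start end) layout as showToken, with exclusive end offsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene or IKAnalyzer.
public class SegmentationSketch {

    /** A token with [start, end) character offsets, similar in spirit to OffsetAttribute. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            // Same "term|(start end)" layout that showToken prints
            return term + "|(" + start + " " + end + ")";
        }
    }

    private static boolean isHan(char c) {
        return Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN;
    }

    /** Unigram split: every Han character is its own token; all other characters are skipped. */
    static List<Token> unigram(String text) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (isHan(text.charAt(i))) {
                tokens.add(new Token(String.valueOf(text.charAt(i)), i, i + 1));
            }
        }
        return tokens;
    }

    /**
     * Forward maximum matching: at each position take the longest dictionary word
     * of at most maxLen characters; fall back to a single Han character.
     */
    static List<Token> forwardMaxMatch(String text, Set<String> dict, int maxLen) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!isHan(text.charAt(i))) {
                i++; // skip punctuation and other non-Han characters
                continue;
            }
            int end = i + 1; // single-character fallback
            for (int len = Math.min(maxLen, text.length() - i); len > 1; len--) {
                if (dict.contains(text.substring(i, i + len))) {
                    end = i + len;
                    break;
                }
            }
            tokens.add(new Token(text.substring(i, end), i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "中国，古时通常泛指中原地区";
        // Hypothetical toy dictionary; IKAnalyzer ships its own, much larger one
        Set<String> dict = new HashSet<>(Arrays.asList("中国", "古时", "通常", "泛指", "中原", "地区"));
        System.out.println(unigram(text));
        System.out.println(forwardMaxMatch(text, dict, 4));
    }
}
```

Running main prints the unigram split on the first line and the dictionary-based split on the second:

[中|(0 1), 国|(1 2), 古|(3 4), 时|(4 5), 通|(5 6), 常|(6 7), 泛|(7 8), 指|(8 9), 中|(9 10), 原|(10 11), 地|(11 12), 区|(12 13)]
[中国|(0 2), 古时|(3 5), 通常|(5 7), 泛指|(7 9), 中原|(9 11), 地区|(11 13)]

Greedy longest-match is the simplest dictionary approach; it can mis-split ambiguous strings, which is one reason production segmenters such as IKAnalyzer use more elaborate logic.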


2012-12-28 12:54