How to get Tokens from a TokenStream in Lucene 3.5.0

yeshaoting


http://blog.csdn.net/hiphopmattshi/article/details/7226326

I read the Lucene 3.5.0 documentation and compared the API changes across releases. That turned up the relevant change:

LUCENE-2302: Deprecated TermAttribute and replaced by a new CharTermAttribute. The change is backwards compatible, so mixed new/old TokenStreams all work on the same char[] buffer independent of which interface they use. CharTermAttribute has shorter method names and implements CharSequence and Appendable. This allows usage like Java's StringBuilder in addition to direct char[] access. Also terms can directly be used in places where CharSequence is allowed (e.g. regular expressions). (Uwe Schindler, Robert Muir)
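CharTermAttribute implements CharSequence and Appendable, so a term can be built like a StringBuilder and passed directly to java.util.regex. The sketch below shows this. It assumes lucene-core 3.5.0 is on the classpath. For illustration only, it instantiates CharTermAttributeImpl directly, outside any TokenStream. The class name CharTermDemo and its helper methods are hypothetical.

```java
import java.util.regex.Pattern;

import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl;

// Hypothetical demo class: exercises CharTermAttribute outside a TokenStream.
public class CharTermDemo {

    /** Builds a term with Appendable-style calls and returns it as a String. */
    static String buildTerm() {
        CharTermAttribute term = new CharTermAttributeImpl();
        term.append("luc").append("ene"); // append(String) returns the attribute, so calls chain
        return term.toString();
    }

    /** Passes the attribute wherever a CharSequence is accepted, here a regex match. */
    static boolean looksLikeWord(CharTermAttribute term) {
        return Pattern.matches("[a-z]+", term);
    }

    /** Reads the raw char[] buffer; only the first length() chars are valid. */
    static String fromBuffer(CharTermAttribute term) {
        return new String(term.buffer(), 0, term.length());
    }

    public static void main(String[] args) {
        CharTermAttribute term = new CharTermAttributeImpl();
        term.append("lucene");
        System.out.println(buildTerm());         // lucene
        System.out.println(looksLikeWord(term)); // true
        System.out.println(fromBuffer(term));    // lucene
        term.setEmpty().append("3.5");           // setEmpty() clears the term and returns the attribute
        System.out.println(looksLikeWord(term)); // false
    }
}
```

When migrating code from TermAttribute, term() corresponds to toString(), and termBuffer()/termLength() correspond to buffer()/length().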

This means the old way of reading the term through TermAttribute is deprecated and should no longer be used to extract tokens:

StringReader reader = new StringReader(s);
TokenStream ts = analyzer.tokenStream(s, reader);
TermAttribute ta = ts.getAttribute(TermAttribute.class); // TermAttribute is deprecated by LUCENE-2302

The API documentation confirms that CharTermAttribute is the interface that replaces TermAttribute.

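So I wrote an example that extracts the tokens from a TokenStream through CharTermAttribute. IKAnalyzer is the third-party IK Analyzer Chinese word segmenter.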

package com.segment;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class Segment {

    /** Runs the analyzer over s and returns the tokens separated by spaces. */
    public static String show(Analyzer a, String s) throws IOException {
        // The first argument is a field name ("content" is an arbitrary choice), not the text to analyze.
        TokenStream ts = a.tokenStream("content", new StringReader(s));
        // Register the attribute once; the stream updates this same instance on every incrementToken().
        CharTermAttribute ta = ts.addAttribute(CharTermAttribute.class);
        StringBuilder sb = new StringBuilder();
        try {
            ts.reset();                   // prepare the stream before consuming it
            while (ts.incrementToken()) { // advance to the next token
                sb.append(ta.toString()).append(' ');
            }
            ts.end();                     // perform end-of-stream operations
        } finally {
            ts.close();                   // release the stream's resources
        }
        return sb.toString();
    }

    public String segment(String s) throws IOException {
        Analyzer a = new IKAnalyzer();
        return show(a, s);
    }

    public static void main(String[] args) {
        // Test sentence: "I am Junjie, I love programming, my test case"
        String name = "我是俊杰，我爱编程，我的测试用例";
        Segment s = new Segment();
        try {
            System.out.println(s.segment(name));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
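If IK Analyzer is not on the classpath, the same consumer loop can be tried with StandardAnalyzer from lucene-core 3.5.0. The sketch below is an assumed variant, not the author's code. It collects the terms into a List<String> instead of one concatenated string. The class name TokenCollector and the field name "content" are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical helper: same reset/incrementToken/end/close loop, results returned as a list.
public class TokenCollector {

    /** Returns every term the analyzer produces for text, in order. */
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        TokenStream ts = analyzer.tokenStream("content", new StringReader(text));
        // One attribute instance is reused for every token, so copy its text out with toString().
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        List<String> result = new ArrayList<String>();
        try {
            ts.reset();
            while (ts.incrementToken()) {
                result.add(term.toString());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        // StandardAnalyzer lowercases terms and drops English stop words such as "the".
        System.out.println(tokens(analyzer, "The Quick Brown Fox")); // [quick, brown, fox]
    }
}
```

Adding `term` itself to the list instead of `term.toString()` would produce a list containing one shared object repeated many times. That object would hold only whatever the last token left in it, because the stream overwrites the same attribute on each call.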
