Group-by queries in Lucene

mxdxm

Blog category: Lucene


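Removing duplicate results from a Lucene search.

I searched online for a long time without finding an answer, so I went to the Apache documentation: http://lucene.apache.org/java/2_4_1/queryparsersyntax.html

The query syntax has nothing resembling group by. So I changed my approach and thought of filters.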

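That led me to the class org.apache.lucene.search.DuplicateFilter. Its documentation explains: "Full" processing mode starts by setting all bits to false and only setting bits for documents that contain the given field and are identified as none-duplicates. In other words, this filter makes sure each value of the given field appears only once in the search results. That gives something similar to SQL's group by. It is not quite the same as group by: all I needed was to remove identical results, but with some modification this approach can also cover other group by features.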
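To make the quoted description more concrete, here is a minimal sketch in plain Java, with no Lucene involved, of the "start with all bits false, set bits only for non-duplicates" idea. It is not Lucene's implementation. The class name DuplicateBitsSketch and the method keepLastOccurrence are made up for illustration, each document is assumed to carry one untokenized value (null meaning the field is missing), and keeping the last occurrence of each value is an assumption about the filter's default behavior.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not how Lucene implements DuplicateFilter.
public class DuplicateBitsSketch {

    // Returns one bit per document id. A bit is set only for documents that
    // have a value and are the last document carrying that value.
    static BitSet keepLastOccurrence(String[] fieldValues) {
        // Remember the highest document id seen for each distinct value.
        Map<String, Integer> lastDocForValue = new HashMap<>();
        for (int docId = 0; docId < fieldValues.length; docId++) {
            String value = fieldValues[docId];
            if (value != null) { // documents without the field never get a bit
                lastDocForValue.put(value, docId);
            }
        }
        // Start with all bits false, then set only the surviving documents.
        BitSet bits = new BitSet(fieldValues.length);
        for (int docId : lastDocForValue.values()) {
            bits.set(docId);
        }
        return bits;
    }

    public static void main(String[] args) {
        // Sample values with three duplicates of 133700 at document ids 2, 3 and 4.
        String[] link = {"", "shtml#Ayi:263791429", "133700", "133700", "133700",
                "#Ayi:468534543", "#Ayi:-992539968", "#Ayi:442193484"};
        System.out.println(keepLastOccurrence(link));
    }
}
```

A search would then only return documents whose bit is set, so of the three documents holding 133700 at most one can come back.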
Enough talk; here is an example to study for yourself.

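import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.DuplicateFilter; // ships in lucene-queries-2.4.1.jar, not the core jar
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

public class DuplicateFilterDemo {

    public static void main(String[] args) throws Exception {
        // Build a throwaway in-memory index.
        RAMDirectory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true);

        // The array contains three duplicate values of 133700.
        String[] link = {"",
            "shtml#Ayi:263791429",
            "133700",
            "133700",
            "133700",
            "#Ayi:468534543",
            "#Ayi:-992539968",
            "#Ayi:442193484"};
        // Note: parentLink is declared but never indexed; "plink" stores "a" + i instead.
        String[] parentLink = {"110905.shtml",
            "110905.shtml",
            "110905.shtml",
            "110905.shtml",
            "905.shtml",
            "5.shtml",
            "110905.shtml",
            "1"};

        // One document per entry, with a "link" field and a "plink" field.
        for (int i = 0; i < link.length; i++) {
            Document doc = new Document();
            doc.add(new Field("link", link[i], Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("plink", "a" + i, Field.Store.YES, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }
        writer.optimize();
        writer.close();

        IndexSearcher indexSearcher = new IndexSearcher(directory);
        QueryParser queryParser = new QueryParser("link", new StandardAnalyzer());
        String queryString = "link:(133700)";

        // Instantiate DuplicateFilter; the argument is the name of the field to de-duplicate on.
        Filter filter = new DuplicateFilter("link");
        Query query = queryParser.parse(queryString);
        Hits hits = indexSearcher.search(query, filter);

        // Print the number of hits, then the "link" value of each hit.
        System.out.println(hits.length());
        for (int j = 0; j < hits.length(); j++) {
            Document doc = hits.doc(j);
            System.out.println(doc.get("link"));
        }
        indexSearcher.close();
    }
}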

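Note: DuplicateFilter is not in the Lucene core jar; it lives in lucene-queries-2.4.1.jar. If you cannot find that jar, download the Lucene source, where it sits under contrib\queries. Sigh, it really is not easy to find.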
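The post says the approach can be adapted to cover other group by features. One possible direction, shown here as a rough plain-Java sketch rather than anything Lucene provides, is to run the query without the filter, read the stored field value from each hit, and count occurrences per value, much like SQL's COUNT(*) with GROUP BY. The class name GroupCountSketch and the method countByValue are hypothetical, and the input array stands in for the values collected from the hits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, independent of Lucene: groups values and counts each group.
public class GroupCountSketch {

    // Counts how often each value occurs, keeping first-seen order of the groups.
    static Map<String, Integer> countByValue(String[] values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum); // add 1 to the group's running count
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the stored "link" values read from unfiltered search hits.
        String[] links = {"133700", "133700", "133700", "#Ayi:468534543"};
        for (Map.Entry<String, Integer> entry : countByValue(links).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Counting on the client side like this is only reasonable for small result sets; it is meant to show the idea, not to suggest a scalable design.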


2011-04-07 10:51