Lucene Full-Text Search Example (Indexing Large Text Files)

damies

Blog category: Java technology

Tags: full-text search, Lucene, Apache, Java, C

Building the index:

package com.pccw;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/**
 * Indexes every .txt file in a directory (written against the Lucene 2.x API).
 *
 * @author Shane in PCCW
 */
public class TextFileIndexer {
    public static void main(String[] args) throws Exception {
        /* Directory containing the files to index; here, the "s" folder on drive C. */
        File fileDir = new File("c:\\s");
        File[] textFiles = fileDir.listFiles();
        if (textFiles == null) {
            // listFiles() returns null when the path is not a readable directory.
            throw new IOException("Not a readable directory: " + fileDir.getPath());
        }

        /* Directory where the index files are stored. */
        File indexDir = new File("c:\\index");
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);
        // Raise the per-field term limit (very important): by default IndexWriter
        // indexes only the first 10,000 terms of a field, so the rest of a large
        // text would not be searchable.
        indexWriter.setMaxFieldLength(99999999);
        long startTime = System.currentTimeMillis();

        // Add one document per text file to the index.
        for (int i = 0; i < textFiles.length; i++) {
            if (textFiles[i].isFile()
                    && textFiles[i].getName().endsWith(".txt")) {
                System.out.println("File " + textFiles[i].getCanonicalPath()
                        + " is being indexed.");
                String temp = readFileAsString(textFiles[i].getCanonicalPath(), "GBK");
                // Debug output of the whole file; consider removing it for large files.
                System.out.println(temp);
                Document document = new Document();
                // The path is stored for display only and is not searchable.
                Field pathField = new Field("path", textFiles[i].getPath(),
                        Field.Store.YES, Field.Index.NO);
                // The body is stored, tokenized, and indexed with term vectors.
                Field bodyField = new Field("body", temp, Field.Store.YES,
                        Field.Index.TOKENIZED,
                        Field.TermVector.WITH_POSITIONS_OFFSETS);
                document.add(pathField);
                document.add(bodyField);
                indexWriter.addDocument(document);
            }
        }
        // optimize() merges the index segments to speed up searching.
        indexWriter.optimize();
        indexWriter.close();

        // Report how long indexing took.
        long endTime = System.currentTimeMillis();
        System.out.println("Indexing the documents in " + fileDir.getPath()
                + " took " + (endTime - startTime) + " ms.");
    }

    /** Reads the whole file into a String, decoding it with the given charset. */
    public static String readFileAsString(String fileName, String charset)
            throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(fileName), charset));
        // StringBuilder avoids the quadratic cost of String += in a loop.
        StringBuilder content = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append("\n");
            }
        } finally {
            reader.close();
        }
        return content.toString();
    }
}
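
Why the `setMaxFieldLength` call matters can be shown without Lucene at all. The following is only a minimal sketch using the standard library: `MaxFieldLengthSketch`, `tokenize`, and `isSearchable` are hypothetical names invented for this illustration, and the whitespace tokenizer is a stand-in for a real analyzer. It assumes the indexer keeps only the first N terms of a field and silently drops the rest, which is the behavior the limit controls.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MaxFieldLengthSketch {

    /** Splits text on whitespace and keeps at most maxTerms tokens, dropping the rest. */
    static List<String> tokenize(String text, int maxTerms) {
        List<String> kept = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            if (kept.size() >= maxTerms) {
                break; // everything past the cap is silently ignored
            }
            if (!token.isEmpty()) {
                kept.add(token);
            }
        }
        return kept;
    }

    /** Returns true if the term survives the cap and would therefore be searchable. */
    static boolean isSearchable(String text, String term, int maxTerms) {
        Set<String> indexed = new HashSet<String>(tokenize(text, maxTerms));
        return indexed.contains(term);
    }

    public static void main(String[] args) {
        // Build a long text whose only interesting word comes after 20,000 fillers.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("filler ");
        }
        sb.append("needle");
        String text = sb.toString();

        System.out.println(isSearchable(text, "needle", 10000));    // prints false
        System.out.println(isSearchable(text, "needle", 99999999)); // prints true
    }
}
```

With a cap of 10,000 terms the word at the end of the text never reaches the index, so a search for it finds nothing even though the file was "indexed" without any error.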

Searching:

package com.pccw;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class TestQuery {
    public static void main(String[] args) throws IOException, ParseException {
        String queryString = "中华";
        // Open the index directory written by TextFileIndexer.
        IndexSearcher searcher = new IndexSearcher("c:\\index");
        try {
            // Parse the query with the same analyzer that was used at index time.
            Analyzer analyzer = new StandardAnalyzer();
            QueryParser qp = new QueryParser("body", analyzer);
            // Let ParseException propagate: searching with a null query would fail.
            Query query = qp.parse(queryString);

            Hits hits = searcher.search(query);
            if (hits.length() > 0) {
                System.out.println("Found " + hits.length() + " result(s)!");
            }
        } finally {
            // Release the index files held by the searcher.
            searcher.close();
        }
    }
}
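
The query 中华 goes through the same `StandardAnalyzer` as the indexed text. In Lucene 2.x that analyzer, as far as I know, emits one token per CJK character, so the two-character query behaves like a phrase of two single-character terms that must appear next to each other. The sketch below imitates that idea with the standard library only; `CjkPhraseSketch`, `tokenize`, and `phraseMatches` are hypothetical names, and the per-character tokenizer is an assumption for illustration, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkPhraseSketch {

    /** Emits one token per non-whitespace character (assumes BMP characters only). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isWhitespace(c)) {
                tokens.add(String.valueOf(c));
            }
        }
        return tokens;
    }

    /** Returns true if the query tokens occur consecutively and in order in the body. */
    static boolean phraseMatches(String body, String query) {
        List<String> bodyTokens = tokenize(body);
        List<String> queryTokens = tokenize(query);
        if (queryTokens.isEmpty()) {
            return false;
        }
        for (int start = 0; start + queryTokens.size() <= bodyTokens.size(); start++) {
            if (bodyTokens.subList(start, start + queryTokens.size()).equals(queryTokens)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(phraseMatches("中华人民共和国", "中华")); // prints true
        System.out.println(phraseMatches("华中科技大学", "中华"));   // prints false
    }
}
```

The second call returns false because both characters are present but in the wrong order, which is the difference between a phrase match and a plain "contains both terms" match.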

2008-01-31 10:05
Comments (4)

#4 sunzongbao2007 2011-03-04

A 1 GB file will kill it for sure.

#3 凤凰山 2009-04-19

Thanks a lot for sharing, haha.

#2 eonn 2009-04-08

Really well explained, and quite detailed.

#1 cjc19762338 2008-12-28

Nice. Thanks for sharing~~
