Lucene can write its own operation log; I just discovered this in the source code. Here is a log file I just generated:
IFD [Wed Dec 22 22:08:20 CST 2010; main]: setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@15dfd77
IW 0 [Wed Dec 22 22:08:20 CST 2010; main]: setInfoStream: dir=org.apache.lucene.store.SimpleFSDirectory@G:\package\lucene_test_dir lockFactory=org.apache.lucene.store.NativeFSLockFactory@1027b4d mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@c55e36 mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@1ac3c08 ramBufferSizeMB=16.0 maxBufferedDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=
maxFieldLength 10000 reached for field contents, ignoring following tokens
(the warning above repeats many more times in the original log; the duplicates are omitted here)
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: optimize: index now
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: flush: now pause all indexing threads
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: flush: segment=_0 docStoreSegment=_0 docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=false numDocs=104 numBufDelTerms=0
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: index before flush
IW 0 [Wed Dec 22 22:08:23 CST 2010; main]: DW: flush postings as segment _0 numDocs=104
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: DW: oldRAMSize=2619392 newFlushedSize=1740286 docs/MB=62.663 new/old=66.439%
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flushedFiles=[_0.nrm, _0.tis, _0.fnm, _0.tii, _0.frq, _0.prx]
IFD [Wed Dec 22 22:08:24 CST 2010; main]: now checkpoint "segments_1" [1 segments ; isCommit = false]
IFD [Wed Dec 22 22:08:24 CST 2010; main]: now checkpoint "segments_1" [1 segments ; isCommit = false]
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: LMP: findMerges: 1 segments
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: LMP: level 6.2247195 to 6.2380013: 1 segments
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: now merge
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: index: _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: no more merges pending; now return
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: now merge
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: index: _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: no more merges pending; now return
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now flush at close
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flush: now pause all indexing threads
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flush: segment=null docStoreSegment=_0 docStoreOffset=104 flushDocs=false flushDeletes=true flushDocStores=true numDocs=0 numBufDelTerms=0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: index before flush _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flush shared docStore segment _0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flushDocStores segment=_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: closeDocStores segment=_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: DW: closeDocStore: 2 files to flush to segment _0 numDocs=104
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: flushDocStores files=[_0.fdt, _0.fdx]
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: now merge
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: index: _0:C104->_0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: CMS: no more merges pending; now return
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now call final commit()
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: startCommit(): start sizeInBytes=0
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: startCommit index=_0:C104->_0 changeCount=3
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.nrm
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.tis
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.fnm
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.tii
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.frq
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.fdx
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.prx
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: now sync _0.fdt
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: done all syncs
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: commit: pendingCommit != null
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: commit: wrote segments file "segments_2"
IFD [Wed Dec 22 22:08:24 CST 2010; main]: now checkpoint "segments_2" [1 segments ; isCommit = true]
IFD [Wed Dec 22 22:08:24 CST 2010; main]: deleteCommits: now decRef commit "segments_1"
IFD [Wed Dec 22 22:08:24 CST 2010; main]: delete "segments_1"
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: commit: done
IW 0 [Wed Dec 22 22:08:24 CST 2010; main]: at close: _0:C104->_0
Next come my indexing classes; the code is largely borrowed from the demo that ships with Lucene.
The Indexer class builds the index:
package my.firstest.copy;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {

    private static File INDEX_DIR = new File("G:/package/lucene_test_dir");
    private static final File docDir = new File("G:/package/lucene_test_docs");

    public static void main(String[] args) throws Exception {
        if (!docDir.exists() || !docDir.canRead()) {
            System.out.println("The directory to be indexed does not exist or is not readable!");
            System.exit(1);
        }
        // Clear out any index files left over from a previous run.
        int fileCount = INDEX_DIR.list().length;
        if (fileCount != 0) {
            System.out.println("The old files exist, begin to delete these files");
            File[] files = INDEX_DIR.listFiles();
            for (int i = 0; i < fileCount; i++) {
                files[i].delete();
                System.out.println("File " + files[i].getAbsolutePath() + " is deleted!");
            }
        }
        Date start = new Date();
        IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR),
                new StandardAnalyzer(Version.LUCENE_CURRENT), true,
                IndexWriter.MaxFieldLength.LIMITED);
        writer.setUseCompoundFile(false);
        //writer.setMergeFactor(2);
        // The key line: redirect IndexWriter's internal log into a file of our own.
        writer.setInfoStream(new PrintStream(new File("G:/package/lucene_test_log/log.txt")));
        System.out.println("MergeFactor -> " + writer.getMergeFactor());
        System.out.println("maxMergeDocs -> " + writer.getMaxMergeDocs());
        indexDocs(writer, docDir);
        writer.optimize();
        writer.close();
        Date end = new Date();
        System.out.println("takes " + (end.getTime() - start.getTime()) + " milliseconds");
    }

    protected static void indexDocs(IndexWriter writer, File file) throws IOException {
        if (file.canRead()) {
            if (file.isDirectory()) {
                String[] files = file.list();
                if (files != null) {
                    for (int i = 0; i < files.length; i++) {
                        indexDocs(writer, new File(file, files[i]));
                    }
                }
            } else {
                System.out.println("adding " + file);
                try {
                    writer.addDocument(FileDocument.Document(file));
                } catch (FileNotFoundException fnfe) {
                    // The file may have disappeared between listing and reading; skip it.
                }
            }
        }
    }
}
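Incidentally, the long run of "maxFieldLength 10000 reached for field contents, ignoring following tokens" lines in the log above comes from the IndexWriter.MaxFieldLength.LIMITED setting used here, which truncates each field after 10,000 tokens. A minimal sketch (assuming the same Lucene 2.9/3.x API as in the code above) of how that cap could be raised or removed if whole files should be indexed:

// Option 1: construct the writer without a field-length cap.
IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR),
        new StandardAnalyzer(Version.LUCENE_CURRENT), true,
        IndexWriter.MaxFieldLength.UNLIMITED);

// Option 2: keep the LIMITED constructor but raise the cap afterwards,
// here to 100,000 tokens per field (an arbitrary example value).
writer.setMaxFieldLength(100000);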
FileDocument:
package my.firstest.copy;

import java.io.File;
import java.io.FileReader;

import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class FileDocument {

    public static Document Document(File f) throws java.io.FileNotFoundException {
        Document doc = new Document();
        // The file path: stored in the index and indexed as a single, unanalyzed token.
        doc.add(new Field("path", f.getPath(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        // Last-modified time, stored with minute resolution.
        doc.add(new Field("modified",
                DateTools.timeToString(f.lastModified(), DateTools.Resolution.MINUTE),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        // The file contents: tokenized and indexed from a Reader, but not stored.
        doc.add(new Field("contents", new FileReader(f)));
        return doc;
    }

    private FileDocument() {
    }
}
The key line is writer.setInfoStream(new PrintStream(new File("G:/package/lucene_test_log/log.txt")));
Lucene's source code is full of snippets like this:
if (infoStream != null) {
    message("init: hit exception on init; releasing write lock");
}
and the message method is:
public void message(String message) {
    if (infoStream != null)
        infoStream.println("IW " + messageID + " [" + new Date() + "; "
                + Thread.currentThread().getName() + "]: " + message);
}
Here infoStream is a field of IndexWriter:
private PrintStream infoStream = null;
If you never set this field, it stays null and no log is written.
You can set it with writer.setInfoStream(PrintStream infoStream);
once it is set, the log messages are written automatically to the file you specified.
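For quick experiments there is no need to create a log file at all: since System.out is itself a PrintStream, the same mechanism can print straight to the console. A minimal sketch, under the same Lucene 2.9/3.x assumptions as above:

// Log IndexWriter's internal activity directly to the console instead of a file.
writer.setInfoStream(System.out);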