[Lucene] Using Document and Field More Sensibly

Tonyguxu

writer = ...; // #1
PreparedStatement pstmt = conn.prepareStatement(selectSql);
ResultSet rs = pstmt.executeQuery();
Document doc = null;
while (rs.next()) {
	// #2: a new Document is instantiated for every row
	doc = new Document();
	// #3: one Field per database column, value taken straight from the row
	doc.add(new Field(ConstantsUtil.ROW_ID, rs.getString("rowid"),
			Field.Store.YES, Field.Index.UN_TOKENIZED));
	doc.add(new Field(ConstantsUtil.FD_COMMAND_ID,
			String.valueOf(rs.getLong(ConstantsUtil.DB_COMMAND_ID)),
			Field.Store.YES, Field.Index.UN_TOKENIZED));
	// Optional columns are only indexed when they are not null
	if (rs.getString(ConstantsUtil.DB_DEST_ID) != null)
		doc.add(new Field(ConstantsUtil.FD_DEST_ID, rs.getString(ConstantsUtil.DB_DEST_ID),
				Field.Store.YES, Field.Index.TOKENIZED));
	if (rs.getString(ConstantsUtil.DB_SRC_ID) != null)
		doc.add(new Field(ConstantsUtil.FD_SRC_ID, rs.getString(ConstantsUtil.DB_SRC_ID),
				Field.Store.YES, Field.Index.TOKENIZED));
	doc.add(new Field(ConstantsUtil.FD_UP_MSG_ID,
			String.valueOf(rs.getLong(ConstantsUtil.DB_UP_MSG_ID)),
			Field.Store.YES, Field.Index.UN_TOKENIZED));
	// Dates are indexed as strings at minute resolution
	doc.add(new Field(ConstantsUtil.FD_CREATED_DATE,
			DateTools.dateToString(rs.getTimestamp(ConstantsUtil.DB_CREATED_DATE),
					DateTools.Resolution.MINUTE),
			Field.Store.YES, Field.Index.UN_TOKENIZED));
	if (rs.getString(ConstantsUtil.DB_STATION_ID) != null)
		doc.add(new Field(ConstantsUtil.FD_STATION_ID, rs.getString(ConstantsUtil.DB_STATION_ID),
				Field.Store.YES, Field.Index.UN_TOKENIZED));
	writer.addDocument(doc); // #4
}

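The design and code above have several problems:

1. A new Document is instantiated for every row of the ResultSet. "How to make indexing faster" recommends reusing Document and Field instances. (Concern: performance.)

2. Each database column maps to exactly one Field: the required columns are simply turned into Fields and added to the Document, and each Field's value is taken directly from the database. (This reflects a shallow understanding of Document, Field, and full-text search: is a Lucene Field the same thing as a database column?)

If values from other database columns later need to go into the index, what then? Under this approach the only option is to construct a Field for each new column and add it to the Document, which means editing this code to add another doc.add(field). (Concerns: flexibility when modifying, extensibility when adding requirements.)

Likewise, if a Field's value has to be derived from the raw database value (for example, by taking only part of a number), or a new requirement changes how a value is obtained, the code above must be modified again. (Concern: flexibility when modifying.)

Is a Lucene Field the same thing as a database column?

No.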

Re-use Document and Field instances

http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/lucene-java/HowTo wrote:

Re-use Document and Field instances As of Lucene 2.3 there are new setValue(...) methods that allow you to change the value of a Field. This allows you to re-use a single Field instance across many added documents, which can save substantial GC cost. It's best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and re-use them by changing their values for each added document. For example you might have an idField, bodyField, nameField, storedField1, etc. After the document is added, you then directly change the Field values (idField.setValue(...), etc), and then re-add your Document instance.

Note that you cannot re-use a single Field instance within a Document, and, you should not change a Field's value until the Document containing that Field has been added to the index. See Field for details.
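The reuse pattern described in the quote can be sketched roughly as follows. To keep the sketch runnable without the Lucene jar, SketchField, SketchDocument and SketchWriter are simplified, hypothetical stand-ins for Lucene's Field, Document and IndexWriter. Only setValue(String), add(...) and addDocument(...) mirror method names from the real API. The field names and the row layout are also assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's Field (NOT the real class).
class SketchField {
    static int created = 0;            // counts allocations, for illustration only
    final String name;
    private String value;

    SketchField(String name, String value) {
        this.name = name;
        this.value = value;
        created++;
    }

    // Mirrors Field.setValue(String), available as of Lucene 2.3 per the quote above.
    void setValue(String value) { this.value = value; }

    String stringValue() { return value; }
}

// Hypothetical stand-in for Lucene's Document.
class SketchDocument {
    final List<SketchField> fields = new ArrayList<>();
    void add(SketchField field) { fields.add(field); }
}

// Hypothetical stand-in for IndexWriter: it snapshots the field values
// at the moment the document is added.
class SketchWriter {
    final List<String> indexed = new ArrayList<>();

    void addDocument(SketchDocument doc) {
        StringBuilder sb = new StringBuilder();
        for (SketchField f : doc.fields) {
            if (sb.length() > 0) sb.append(';');
            sb.append(f.name).append('=').append(f.stringValue());
        }
        indexed.add(sb.toString());
    }
}

class ReuseSketch {
    // Each row is {rowId, destId}; this layout is an assumption for the sketch.
    static SketchWriter indexRows(String[][] rows) {
        SketchWriter writer = new SketchWriter();

        // Create the Document and its Fields once, outside the loop.
        SketchDocument doc = new SketchDocument();
        SketchField rowId = new SketchField("rowId", "");
        SketchField destId = new SketchField("destId", "");
        doc.add(rowId);
        doc.add(destId);

        for (String[] row : rows) {
            // Change the values only after the previous addDocument has returned.
            rowId.setValue(row[0]);
            destId.setValue(row[1]);
            writer.addDocument(doc);   // re-add the same Document instance
        }
        return writer;
    }
}
```

A fixed set of reused Fields does not directly cover columns that are sometimes absent (the null checks in the loop shown earlier); those would need separate handling, such as a Document variant without the optional Field.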

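Given the original design, how should the following situations be handled?

1. Fewer Fields are wanted, i.e. Fields should not map one-to-one to database columns. For example, a content field should be built by combining the values of several database columns.

2. A Field's value cannot be taken directly from the database and needs some processing first (either formatting or business-related).

3. Requirement change: the way a Field's value is obtained must change, e.g. it used to take the first 4 digits of a number and now should take the first 6.

4. New indexing requirement: a database column whose value was not previously in the index must now be indexed.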
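As a small illustration of situations 1 to 3, the value of a Field can be computed by a helper instead of being read straight from a column. This is only a sketch: FieldValues, joinColumns, digitPrefix and the column names used below are hypothetical, and a plain Map stands in for a ResultSet row.

```java
import java.util.Map;

class FieldValues {
    // Situation 1: build one "content" value from several columns, skipping empty ones.
    static String joinColumns(Map<String, String> row, String... columns) {
        StringBuilder sb = new StringBuilder();
        for (String column : columns) {
            String value = row.get(column);
            if (value == null || value.isEmpty()) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append(value);
        }
        return sb.toString();
    }

    // Situations 2 and 3: derive the value from the raw one. The prefix length
    // is a parameter, so moving from 4 to 6 digits is a change at the call site.
    static String digitPrefix(String number, int length) {
        if (number == null) return null;
        return number.length() <= length ? number : number.substring(0, length);
    }
}
```

With a row containing dest_id = "13800138000" and body = "hello", joinColumns(row, "dest_id", "src_id", "body") yields "13800138000 hello", and digitPrefix("13800138000", 6) yields "138001".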

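TODO: consider the questions above in light of section 3.1, on the maintainability of software systems, of 《Java与模式》 (Java and Patterns).

Under the original design, a Field's value is derived from a database row inside a method of this class (getValue(bean)), which returns the processed result. Changing a Field's value then means changing that method, and adding a Field means adding document.add(new Field(name, getValue(), xx)) together with a corresponding getValue method.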

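Refactoring: creating Field objects and obtaining their values

Use an inheritance hierarchy that takes responsibility for creating Fields and obtaining their values.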
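One possible shape for such a hierarchy is sketched below. It is not the author's implementation: FieldSource, ColumnSource, MinuteDateSource and RowMapper are hypothetical names, a Map stands in for a ResultSet row, and the mapper returns name/value pairs where real indexing code would create Lucene Fields. The date format is intended to resemble what DateTools produces at minute resolution, but that equivalence is an assumption.

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;

// Base class: one instance per index field; subclasses decide how the value is obtained.
abstract class FieldSource {
    private final String fieldName;

    FieldSource(String fieldName) { this.fieldName = fieldName; }

    String fieldName() { return fieldName; }

    // Returns null when the row has no value for this field.
    abstract String valueOf(Map<String, Object> row);
}

// Simplest case: the field value is the column value as a string.
class ColumnSource extends FieldSource {
    private final String column;

    ColumnSource(String fieldName, String column) {
        super(fieldName);
        this.column = column;
    }

    @Override
    String valueOf(Map<String, Object> row) {
        Object value = row.get(column);
        return value == null ? null : value.toString();
    }
}

// Derived value: a date column formatted at minute resolution in UTC.
// Works for java.util.Date and java.sql.Timestamp values.
class MinuteDateSource extends FieldSource {
    private static final DateTimeFormatter MINUTES =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);
    private final String column;

    MinuteDateSource(String fieldName, String column) {
        super(fieldName);
        this.column = column;
    }

    @Override
    String valueOf(Map<String, Object> row) {
        Object value = row.get(column);
        return (value instanceof Date) ? MINUTES.format(((Date) value).toInstant()) : null;
    }
}

// Applies every configured source to a row; fields without a value are skipped.
class RowMapper {
    private final List<FieldSource> sources;

    RowMapper(List<FieldSource> sources) { this.sources = sources; }

    List<String[]> map(Map<String, Object> row) {
        List<String[]> pairs = new ArrayList<>();
        for (FieldSource source : sources) {
            String value = source.valueOf(row);
            if (value != null) pairs.add(new String[] { source.fieldName(), value });
        }
        return pairs;
    }
}
```

With this arrangement, indexing an additional column means adding one more source to the list, and changing how a value is derived means changing or replacing one subclass. The indexing loop itself stays untouched.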


2012-03-27 09:39