How to use Lucene's score mechanism to implement keyword bid ranking

wutao8818

Blog categories:

lucene Apache algorithms Flash junit

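Lucene's built-in ordering ranks results by a score computed by its scoring algorithm. A document's boost changes the weight of that document, but it cannot be tied directly to a particular keyword.

For example, if document 1 is given a boost of 100, it is likely to appear at the top no matter which keyword is searched. That is not the result we want.

There is a way to rank by keyword instead.

Example: the keyword 玻璃 (glass)

When the Lucene index is built for the document that holds the information of the company that bought this keyword, an extra field is added by hand. Its data relates only to what the customer paid for, not to the real content.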

Field field = new Field("玻璃", "玻璃玻璃玻璃", Field.Store.YES, Field.Index.ANALYZED);
// Why the field name is written this way becomes clear in the search step below.
document.add(field);
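How many times the keyword counts as repeated depends on how the analyzer splits the field value. The sketch below only imitates whitespace tokenization with String.split. It is not Lucene's WhitespaceAnalyzer itself, and the class and method names are made up for illustration. It shows that a value written without spaces is a single token, while a space-separated value yields one token per repetition.

```java
public class WhitespaceTokenSketch {

    // Counts how often `term` occurs as a whole token when the field value
    // is split on whitespace (an approximation of whitespace tokenization).
    static int termFrequency(String fieldValue, String term) {
        int count = 0;
        for (String token : fieldValue.trim().split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // No spaces: the whole value is one token, so the keyword is not found.
        System.out.println(termFrequency("玻璃玻璃玻璃", "玻璃"));
        // Space-separated: each repetition is its own token.
        System.out.println(termFrequency("玻璃 玻璃 玻璃", "玻璃"));
    }
}
```

Under these assumptions the first call counts 0 and the second counts 3, so with a whitespace-based analyzer the paid field would need the space-separated form, which is what the test later in this post builds.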

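According to Lucene's score algorithm, the score is clearly not linear in the number of repetitions. In the test below the top-ranked document prints as 2, but the printed number is split(" ").length of the field's toString(), whose closing ">" adds one token, so the actual repetition count is one less than each value listed. The score also depends on the ANALYZER: different ANALYZERs tokenize the value differently and therefore give different repetition counts.

Here is some test data.

For a keyword n characters long, with ANALYZER = WhitespaceAnalyzer, the values printed in descending score order are:
2 5 17 8 11 6 16 20 10 15 19 7 18 14 3 9 4 13 12
With WhitespaceAnalyzer this order is fixed. Whatever the n characters are, the documents are ranked this way.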
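One possible explanation for the irregular order, offered as an assumption rather than something stated in this post: Lucene's default similarity uses tf = sqrt(freq) and a length norm of 1/sqrt(number of terms), and the norm is stored in a single byte. For a field that contains nothing but the keyword, the two factors would cancel exactly if the norm were stored without loss. The sketch models the loss by keeping only the top two mantissa bits of the float, which is my simplification of the byte encoding. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class NormQuantizationSketch {

    // Assumed model of the lossy norm storage: keep the sign, the exponent
    // and the top 2 mantissa bits of the float, truncating everything else.
    static float quantize(float f) {
        return Float.intBitsToFloat(Float.floatToIntBits(f) & 0xFFE00000);
    }

    // Part of the score that varies between documents when the field holds
    // exactly n copies of the keyword: tf * quantized length norm.
    static float relativeScore(int n) {
        float tf = (float) Math.sqrt(n);
        float lengthNorm = quantize((float) (1.0 / Math.sqrt(n)));
        return tf * lengthNorm;
    }

    // Repetition counts 1..maxCount ordered by descending relative score.
    // List.sort is stable, so equal scores keep ascending order, which
    // mimics ties being broken by insertion order.
    static List<Integer> rankedCounts(int maxCount) {
        List<Integer> counts = new ArrayList<>();
        for (int n = 1; n <= maxCount; n++) {
            counts.add(n);
        }
        counts.sort((a, b) -> Float.compare(relativeScore(b), relativeScore(a)));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(rankedCounts(19));
    }
}
```

Run as written, this prints [1, 4, 16, 7, 10, 5, 15, 19, 9, 14, 18, 6, 17, 13, 2, 8, 3, 12, 11], which is the sequence quoted above with one subtracted from each entry. Counts of 1, 4 and 16 tie for first place because their norms (1, 0.5 and 0.25) survive the truncation unchanged.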



import java.io.IOException;

import junit.framework.TestCase;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;

/**
 * <b>Rank results by purchased keyword</b>
 * @since 2009-5-7
 * @version 1.0
 */
public class KeywordSearchTest extends TestCase {

	private static final Analyzer ANALYZER = new WhitespaceAnalyzer();
	private static final String[] PAY_KEY = { "芯片", "大脑", "大菠萝", "家乐福" };
	// Number of companies (documents) indexed for each keyword.
	private static final int comNumber = 20;

	/**
	 * @throws Exception
	 */
	@SuppressWarnings( { "static-access", "deprecation" })
	public void testSearch2() throws Exception {
		
		for (int y = 0, len = PAY_KEY.length; y < len; y++) {
			System.out.println("PAY_KEY.length:" + PAY_KEY[y].length());
			test(y);
			System.out.println();
		}
	}

	private void test(int y) throws CorruptIndexException,
			LockObtainFailedException, IOException, ParseException {
		Directory ram = new RAMDirectory();
		IndexWriter writer = new IndexWriter(ram, ANALYZER, true,
				MaxFieldLength.UNLIMITED);

		for (int i = 0; i <comNumber; i++) {
			Document doc = new Document();
			Field field = new Field("companyName", "公司" + i,
					Field.Store.YES, Field.Index.NOT_ANALYZED);
			doc.add(field);

			// Paid field: repeat the purchased keyword i times, separated by
			// spaces, so that WhitespaceAnalyzer indexes it as i tokens.
			StringBuilder key = new StringBuilder();
			for (int x = 0; x < i; x++) {
				key.append(PAY_KEY[y]);
				key.append(" ");
			}
			String finalKey = key.toString();
			Field payField = new Field("pay_key_word", finalKey,
					Field.Store.YES, Field.Index.ANALYZED);
			doc.add(payField);
			writer.addDocument(doc);
			// System.out.println(doc);
		}
		writer.close();

		IndexSearcher searcher = new IndexSearcher(ram);
		QueryParser p = new QueryParser("pay_key_word", ANALYZER);
		Query q = p.parse(PAY_KEY[y]);
		Hits hits = searcher.search(q);
		System.out.println("total:"+hits.length());
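		for (int i = 0, size = hits.length(); i < size; i++) {
			Document doc = hits.doc(i);
			// Field.toString() ends with ">" after the trailing space, so this
			// count is one more than the number of keyword repetitions.
			int length = doc.getField("pay_key_word").toString().split(" ").length;
			System.out.print(length + " ");
		}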
		searcher.close();
	}

	 
}

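The index is now built. At search time, if a user enters 玻璃, first check the database to see whether this keyword has been purchased. If it has, add the clause "玻璃":"玻璃" (field name, then term) when constructing the query, put "玻璃" first in the sort array, and follow it with the ordinary fields, for example 玻璃 in the info field. The results should then come back in the order we intend.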

I still need to try this out, but it should work.
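The search-time step described above can be sketched as follows. This is only an illustration under stated assumptions: the purchasedKeywords set stands in for the database lookup, the class and method names are hypothetical, the output is a string in the field:term syntax accepted by Lucene's QueryParser, and special characters in the keyword are not escaped. The sort array is not covered here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PaidKeywordQuerySketch {

    // Builds a query string for the user's keyword. If the keyword has been
    // purchased, a clause on the paid field (named after the keyword itself)
    // is placed before the ordinary clause on the "info" field.
    static String buildQueryString(String keyword, Set<String> purchasedKeywords) {
        String normalClause = "info:" + keyword;
        if (purchasedKeywords.contains(keyword)) {
            String paidClause = keyword + ":" + keyword;
            return paidClause + " " + normalClause;
        }
        return normalClause;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the table of purchased keywords.
        Set<String> purchased = new HashSet<>(Arrays.asList("玻璃"));
        System.out.println(buildQueryString("玻璃", purchased));
        System.out.println(buildQueryString("芯片", purchased));
    }
}
```

For a purchased keyword the paid clause is added, so documents carrying the paid field can pick up extra score from it. For any other keyword only the ordinary info clause is used.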

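That said, there are many other ways to implement keyword ranking. Combining the search directly with the database would of course work as well.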


2009-05-08 09:46

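#1 chenlb 2009-08-19

This feels a bit off from the title; the important parts are barely covered. I thought it was going to be about extending the scorer.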
