Article list
package org.apache.lucene.util;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apa ...
package org.apache.lucene.analysis;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.FlagsAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncre ...
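Lucene was recently updated to version 3.0, and some 2.x usages are no longer supported at all.
The example below mainly covers using the Chinese analyzer IKAnalyzer and result highlighting in Lucene.
Some 2.x methods have been removed completely in Lucene 3.x, so the 3.x usage is mentioned in passing, although the version used here is still 2.x.
Lucene's built-in analyzers handle Chinese word segmentation very poorly, with results that can fairly be called dreadful, so IKAnalyzer is recommended for Chinese segmentation here.
IKAnalyzer is an excellent Chinese analyzer.
Below is the introduction from its official documentation:
It uses a distinctive "forward-iteration, finest-granularity segmentation algorithm" and can process 600,000 characters per second.
It uses a multi-sub-processor ...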
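To make the idea of fine-grained, dictionary-driven segmentation concrete, here is a minimal sketch. It is not IKAnalyzer's implementation: it is a plain forward scan that, at each position, emits every dictionary word starting there and falls back to a single character when nothing covers that position. The class name `FinestGrainedSketch`, the method `segment`, and the tiny dictionary are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; IKAnalyzer's real algorithm is more elaborate.
final class FinestGrainedSketch {

    // Scans left to right and emits every dictionary word found at each start position.
    static List<String> segment(String text, Set<String> dictionary) {
        // The longest dictionary entry bounds how far we look ahead from each position.
        int maxWordLength = 1;
        for (String word : dictionary) {
            maxWordLength = Math.max(maxWordLength, word.length());
        }

        List<String> tokens = new ArrayList<>();
        int coveredUntil = 0; // exclusive end of the text already covered by some token
        for (int start = 0; start < text.length(); start++) {
            boolean matched = false;
            int limit = Math.min(text.length(), start + maxWordLength);
            for (int end = start + 1; end <= limit; end++) {
                String candidate = text.substring(start, end);
                if (dictionary.contains(candidate)) {
                    tokens.add(candidate);
                    coveredUntil = Math.max(coveredUntil, end);
                    matched = true;
                }
            }
            // Fallback: a character that no dictionary word covers becomes its own token.
            if (!matched && start >= coveredUntil) {
                tokens.add(text.substring(start, start + 1));
                coveredUntil = start + 1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dictionary = new HashSet<>(Arrays.asList("中文", "分词", "分词器"));
        System.out.println(segment("中文分词器", dictionary));
    }
}
```

Emitting both "分词" and "分词器" is what "finest granularity" means in this sketch: overlapping words are all kept so that a query for either one can match.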
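/**
*Author: 夺天策; Baidu Space name: 刹那剑欣
*Please credit the source when reposting!
*/
Over the past few days I finished my Chinese word segmentation algorithm and set about adding it to Lucene. Searching Google and Baidu turned up some Chinese segmenters written by other people, along with ways of plugging them in, but those all conform to their ...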
http://lucene.apache.org/java/3_0_0/fileformats.html#Index File Formats
Index File Formats
Definitions
Inverted Indexing
Types of Fields
Segments
Document Numbers
Overview
File Naming
Summary of File Extensions
Primitive Types
Byte
UInt32
UInt64
VInt
Chars
String
Compound Types
Map& ...
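Of the primitive types in this outline, VInt is the one that benefits most from an example. In the Lucene file format a VInt stores an integer in one or more bytes: each byte carries seven bits of the value, low-order bits first, and the high bit is set when more bytes follow. The sketch below is a standalone illustration of that encoding, not Lucene's own code; the class name `VIntSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical standalone illustration of the VInt byte layout.
final class VIntSketch {

    // Encodes value as 1 to 5 bytes, seven payload bits per byte, low-order bits first.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            // More than seven bits remain: emit the low seven bits with the continuation bit set.
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value); // final byte, continuation bit clear
        return out.toByteArray();
    }

    // Decodes a single VInt starting at the beginning of bytes.
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // continuation bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        // 128 needs two bytes: 0x80 (low seven bits are zero, more follows) then 0x01.
        for (byte b : encode(128)) {
            System.out.printf("%02x ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Small values, which dominate in postings data such as document-number deltas, take a single byte under this scheme, which is the main reason for choosing it over a fixed four-byte integer.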
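Recently I have been developing the search feature for a shopping-mall system that 星网 is about to launch; the requirement is to use Lucene together with a database. Because Lucene is fully open source, anyone learning or using it should read and make use of such a good source-code resource. Only by reading plenty of source code do your own skills improve, and you will also use Lucene more efficiently.
A small example shows the benefit of reading the source.
Product search inevitably uses the org.apache.lucene.search.Searcher class from the core package. This class has eight search() methods, three of which are abstract and are implemented by Searcher's subclass IndexSearcher. Judging by the javadoc alone, the eight methods are:
Java code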
...
chapter?
Vocabulary use in classroom
teaching and textbooks
?.? Introduction
The description of vocabulary use in university contexts is an essential prerequisite
to the development of effective teaching materials and approaches. There are
many important research questions about word use in universit ...
Chapter 1. Introduction
diagnostic tools for the evaluation of exam prompts, using the corpus analyses as
a baseline.
In the first stage of the project, we constructed the T2K-SWAL Corpus, which
was designed to represent both spoken and written university registers, as well
as the major academic disciplin ...
Chapter 1. Introduction
of register, and register is the 'expression-plane' of genre; register is in turn the
'content-plane' of language. Lee (2001) surveys the use of these terms, providing
one of the most comprehensive discussions of how they have been used in previous
research (as well as terms like text ty ...
chapter?
Introduction
?.? The student perspective: Language in the university
Students who are beginning university studies face a bewildering range of obstacles
and adjustments, and many of these difficulties involve learning to use language in
new ways. The high school experiences of English-educated s ...
Here are some things to try to speed up the indexing speed of your Lucene application. Please see ImproveSearchingSpeed for how to speed up searching.
Be sure you really need to speed things up. Many of the ideas here are simple to try, but others will necessarily add some complexity to your applic ...
Sometimes Lucene runs afoul of bugs in Sun's Java implementation. In certain cases we whittle it down to a small test case, open an issue with Sun, and hopefully Sun fixes it. In other cases we know the bug is in the JRE but we haven't narrowed it down enough to open a bug with Sun. Sometimes we can work o ...
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using SF.Snowball.Ext;
using System.Collections.Generic;
using System.Collections;
using OpenNLP.Tools.PosTagger;
namespace Lucene.Net.Analysis.Snowball
{
// Entity class for a word; it has two properties
public class myEwordEntity
{
pub ...
Parsing? Tokenization? Analysis!
Lucene, an indexing and search library, accepts only plain text input.
Parsing
Applications that build their search capabilities upon Lucene may support documents in various formats – HTML, XML, PDF, Word – just to name a few. Lucene does not care about the Parsing of ...
11 This use of ...-attachment is identical to its original use in Church's parser (Church 1980).
12 Contact the Linguistic Data Consortium, 441 Williams Hall, University of Pennsylvania, Philadelphia PA 19104-605 e-mail to ldc@unagi.cis.upenn.edu for more information.
Mitchell P. Marcus et al.
Building a Larg ...