turingfellow
Article list
package org.apache.lucene.util; /** * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements.  See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apa ...
package org.apache.lucene.analysis; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.FlagsAttribute; import org.apache.lucene.analysis.tokenattributes.PayloadAttribute; import org.apache.lucene.analysis.tokenattributes.PositionIncre ...
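Lucene was recently updated to version 3.0, and some 2.x usages are no longer supported at all. The example below mainly shows how to use the Chinese analyzer IKAnalyzer together with Lucene highlighting. Some 2.x methods have been removed entirely in Lucene 3.x; I will touch on the 3.x usage in passing, although I am still using a 2.x version here. Lucene's built-in tokenization handles Chinese very poorly, to the point of being painful to look at, so I recommend IKAnalyzer for Chinese word segmentation. IKAnalyzer is an excellent Chinese analyzer. The official documentation describes it as follows: it uses a distinctive "forward iterative finest-granularity segmentation algorithm" and reaches a processing speed of 600,000 characters per second. It uses multiple sub-processors for ...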
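/** * Author: 夺天策      Baidu Space name: 刹那剑欣 * Please credit the source when reposting!   */     Having finished my Chinese word segmentation algorithm over the past few days, I set about integrating it into Lucene. A search on Google and Baidu did turn up Chinese segmenters written by other people, along with ways of plugging them in, but those all suit their ...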
http://lucene.apache.org/java/3_0_0/fileformats.html#Index File Formats Index File Formats Definitions Inverted Indexing Types of Fields Segments Document Numbers Overview File Naming Summary of File Extensions Primitive Types Byte UInt32 Uint64 VInt Chars String Compound Types Map& ...
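The VInt primitive type listed in the outline above is a variable-length integer: each byte carries seven data bits, low-order group first, and the high bit signals that more bytes follow. The following is a minimal standalone sketch of that encoding, not Lucene's own I/O code; the class name `VIntSketch` and its methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class; not part of Lucene.
public class VIntSketch {

    // Encode an int as a VInt: 7 data bits per byte, low-order group first.
    // The high bit is set on every byte except the last one.
    public static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
        return out.toByteArray();
    }

    // Decode a VInt from the start of the given byte array.
    public static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 127, 128, 16384}) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length + " byte(s), decoded " + decode(enc));
        }
    }
}
```

Small values therefore cost a single byte, while 128 needs two bytes (0x80 followed by 0x01), which is why the format suits the many small numbers an index stores.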
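I have recently been developing the search feature for the online store system that 星网 is about to launch; the requirement is to use Lucene together with a database. Because Lucene is fully open source, anyone learning or using it should read and make use of such a good body of source code: only by reading the source do you improve your own skills and use Lucene more efficiently. A small example shows the benefit of reading the source. Product search inevitably uses the org.apache.lucene.search.Searcher class in the core package. This class has eight search() methods, three of which are abstract and are implemented by its subclass IndexSearcher. Looking only at the javadoc, the eight methods are: Java code ...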
Chapter ?: Vocabulary use in classroom teaching and textbooks. ?.? Introduction. The description of vocabulary use in university contexts is an essential prerequisite to the development of effective teaching materials and approaches. There are many important research questions about word use in universit ...
Chapter 1. Introduction: diagnostic tools for the evaluation of exam prompts, using the corpus analyses as a baseline. In the first stage of the project, we constructed the T2K-SWAL Corpus, which was designed to represent both spoken and written university registers, as well as the major academic disciplin ...
Chapter 1. Introduction: of register, and register is the ‘expression-plane’ of genre; register is in turn the ‘content-plane’ of language. Lee (2001) surveys the use of these terms, providing one of the most comprehensive discussions of how they have been used in previous research (as well as terms like text ty ...
Chapter ?: Introduction. ?.? The student perspective: Language in the university. Students who are beginning university studies face a bewildering range of obstacles and adjustments, and many of these difficulties involve learning to use language in new ways. The high school experiences of English-educated s ...
Here are some things to try to speed up the indexing speed of your Lucene application. Please see ImproveSearchingSpeed for how to speed up searching. Be sure you really need to speed things up. Many of the ideas here are simple to try, but others will necessarily add some complexity to your applic ...
Sometimes Lucene runs afoul of bugs in Sun's Java implementation. In certain cases we whittle it down to a small test case, open an issue with Sun, and hopefully Sun fixes it. In other cases we know the bug is in the JRE but we haven't narrowed it enough to open a bug with Sun. Sometimes we can work o ...
using System; using Lucene.Net.Analysis; using Lucene.Net.Analysis.Standard; using SF.Snowball.Ext; using System.Collections.Generic; using System.Collections; using OpenNLP.Tools.PosTagger; namespace Lucene.Net.Analysis.Snowball {      /* Entity class for a word; it has two properties */ public class myEwordEntity     {         pub ...
Parsing? Tokenization? Analysis! Lucene, an indexing and search library, accepts only plain text input. Parsing: Applications that build their search capabilities upon Lucene may support documents in various formats – HTML, XML, PDF, Word – just to name a few. Lucene does not care about the Parsing of ...
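To make the idea of analysis concrete, here is a rough plain-Java sketch of what an analysis chain does to already-parsed plain text: split it into tokens, lowercase them, and drop stop words. It deliberately avoids the Lucene API; the class `AnalysisSketch`, the `analyze` method, and the tiny stop-word list are all assumptions made for illustration, not Lucene's defaults.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; it does not use or mirror Lucene's Analyzer API.
public class AnalysisSketch {

    // Illustrative stop-word list (an assumption, not Lucene's default set).
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Turn plain text into index terms: tokenize, lowercase, remove stop words.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading delimiter yields an empty first token
            }
            String term = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Parsing of HTML, XML and PDF"));
    }
}
```

Parsing (extracting text from HTML, PDF, and so on) happens before this step and is the application's job; the sketch only covers what happens once plain text is available.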
11 This use of -attachment is identical to its original use in Church's parser (Church 1980). 12 Contact the Data Consortium, 441 Williams Hall, University of Pennsylvania, Philadelphia PA 19104-605 e-mail to ldc@unagi.cis.upenn.edu for more information. Mitchell P. Marcus et al. Building a Larg ...