attributesource

Blog category:

lucene

package org.apache.lucene.util; /** * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional information regarding copyright ownership. * The ASF licenses this file to You under the Apa ...

2010-09-02 14:45
Views: 940
Comments (0)
Category: Enterprise Architecture

token

Blog category:

lucene

lucene prototype Apache performance

package org.apache.lucene.analysis; import org.apache.lucene.analysis.tokenattributes.OffsetAttribute; import org.apache.lucene.analysis.tokenattributes.FlagsAttribute; import org.apache.lucene.analysis.tokenattributes.PayloadAttribute; import org.apache.lucene.analysis.tokenattributes.PositionIncre ...

2010-09-02 14:37
Views: 1077
Comments (0)
Category: Mobile Development

Lucene 3.0 word segmentation with IKAnalyzer

Blog category:

postagger

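lucene Apache algorithm full-text-search search-engine

Lucene has recently been updated to version 3.0, and some 2.x usages are no longer supported at all. The example below mainly covers the Chinese tokenizer IKAnalyzer and Lucene highlighting. Some 2.x methods have been removed entirely in Lucene 3.x; 3.x usage is touched on in passing, although the version used here is still 2.x. Lucene's built-in analyzers handle Chinese very poorly, frankly appallingly, so IKAnalyzer is recommended for Chinese word segmentation. IKAnalyzer is an excellent Chinese tokenizer. The official documentation describes it as follows: it uses a distinctive "forward iterative finest-granularity segmentation algorithm" and can process 600,000 characters per second. It uses multi-sub-processor ...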

2010-09-02 12:46
Views: 1468
Comments (0)
Category: Web Front End

Rewriting Lucene's Analyzer to plug in your own Chinese word segmentation system

Blog category:

lucene

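lucene Apache algorithm Baidu Google

/** * Author: 夺天策, Baidu Space name: 刹那剑欣 * Please credit the source when reposting! */ Over the past few days I finished my Chinese word segmentation algorithm and set about adding it to Lucene. Searching Google and Baidu turned up some Chinese tokenizers written by other people, along with ways to plug them in, but those all suit their ...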

2010-09-02 12:44
Views: 1247
Comments (0)
Category: Web Front End

Apache Lucene - Index File Formats

Blog category:

lucene

Apache lucene IE Access OS

http://lucene.apache.org/java/3_0_0/fileformats.html#Index File Formats Index File Formats Definitions Inverted Indexing Types of Fields Segments Document Numbers Overview File Naming Summary of File Extensions Primitive Types Byte UInt32 Uint64 VInt Chars String Compound Types Map& ...
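The outline above lists VInt among the index's primitive types. As a rough illustration of that variable-length integer layout (seven data bits per byte, low-order group first, high bit set when more bytes follow), here is a minimal sketch. The class name `VIntSketch` and its methods are hypothetical and written for this note; this is not Lucene's own code, and it assumes non-negative values only.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only: VIntSketch is a hypothetical name, not a Lucene class.
final class VIntSketch {

    // Encode a non-negative int: 7 data bits per byte, low-order group first,
    // with the high bit set on every byte except the last.
    static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch handles non-negative values only");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
        return out.toByteArray();
    }

    // Decode a single value from the start of the array.
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value; // high bit clear: last byte of this value
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated VInt");
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 127, 128, 16384}) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length + " byte(s), decodes to " + decode(enc));
        }
    }
}
```

Small values stay in one byte, which is why this kind of encoding suits the many small numbers an inverted index stores.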

2010-09-01 10:34
Views: 969
Comments (0)
Category: Enterprise Architecture

[zz] When learning Lucene, read more of the source code

Blog category:

lucene

lucene Apache

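I have recently been building the search feature for an online store system that 星网 is about to launch; the requirement is to use Lucene together with a database. Because Lucene is fully open source, anyone learning and using it should read and make use of such a good source-code resource. Only by reading more source code do your own skills improve, and you will also use Lucene more efficiently. A small example shows the benefit of reading the source. Product search will certainly use the class org.apache.lucene.search.Searcher from the core package. This class has eight search() methods, three of which are abstract and are implemented by Searcher's subclass IndexSearcher. If you only read the javadoc, you can see that the eight methods are: Java code ...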

2010-08-31 14:45
Views: 1089
Comments (0)
Category: Enterprise Architecture

university 4/n (45)

Blog category:

postagger

Social Ant IDEA Access UP

Chapter ? Vocabulary use in classroom teaching and textbooks ?.? Introduction The description of vocabulary use in university contexts is an essential prerequisite to the development of effective teaching materials and approaches. There are many important research questions about word use in universit ...

2010-08-24 07:57
Views: 1071
Comments (0)
Category: Programming Languages

university 3/n

Blog category:

postagger

Office VB Social performance UP

Chapter 1. Introduction diagnostic tools for the evaluation of exam prompts, using the corpus analyses as a baseline. In the first stage of the project, we constructed the T2K-SWAL Corpus, which was designed to represent both spoken and written university registers, as well as the major academic disciplin ...

2010-08-24 07:55
Views: 892
Comments (0)
Category: Enterprise Architecture

university 2/n

Blog category:

jade

Social Office AIR 360

Chapter 1. Introduction of register, and register is the 'expression-plane' of genre; register is in turn the 'content-plane' of language. Lee (2001) surveys the use of these terms, providing one of the most comprehensive discussions of how they have been used in previous research (as well as terms like text ty ...

2010-08-24 07:54
Views: 904
Comments (0)
Category: Enterprise Architecture

university 1/n

Blog category:

jade

Social Office Exchange UP J#

Chapter ? Introduction ?.? The student perspective: Language in the university Students who are beginning university studies face a bewildering range of obstacles and adjustments, and many of these difficulties involve learning to use language in new ways. The high school experiences of English-educated s ...

2010-08-24 07:53
Views: 954
Comments (0)
Category: Enterprise Architecture

How to make indexing faster

Blog category:

lucene

lucene UP performance OS Cache

Here are some things to try to speed up indexing in your Lucene application. Please see ImproveSearchingSpeed for how to speed up searching. Be sure you really need to speed things up. Many of the ideas here are simple to try, but others will necessarily add some complexity to your applic ...

2010-08-23 09:02
Views: 760
Comments (0)
Category: Enterprise Architecture

Sun Java Bugs that affect lucene

Blog category:

jade

SUN lucene Java Windows performance

Sometimes Lucene runs afoul of bugs in Sun's Java implementation. In certain cases we whittle it down to a small test case, open an issue with Sun, and hopefully Sun fixes it. In other cases we know the bug is in the JRE but we haven't narrowed it enough to open a bug with Sun. Sometimes we can work o ...

2010-08-23 08:59
Views: 741
Comments (0)
Category: Programming Languages

Snowball tokenization

Blog category:

jade

lucene C C++ C# J#

using System; using Lucene.Net.Analysis; using Lucene.Net.Analysis.Standard; using SF.Snowball.Ext; using System.Collections.Generic; using System.Collections; using OpenNLP.Tools.PosTagger; namespace Lucene.Net.Analysis.Snowball { /* Entity class for a word; it has two properties */ public class myEwordEntity { pub ...

2010-08-22 13:07
Views: 1244
Comments (0)
Category: Programming Languages

lucene analyzer pos

Blog category:

postagger

lucene performance F# Apache Access

Parsing? Tokenization? Analysis! Lucene, an indexing and search library, accepts only plain text input. Parsing Applications that build their search capabilities upon Lucene may support documents in various formats (HTML, XML, PDF, Word, to name just a few). Lucene does not care about the Parsing of ...

2010-08-20 07:16
Views: 937
Comments (0)
Category: Enterprise Architecture

penn tree bank 6/6

Blog category:

jade

Scheme AIR IBM

11 This use of 12 Contact the -attachment is identical to its original use in Church's parser (Church 1980). Data Consortium, 441 Williams Hall, University of Pennsylvania, Philadelphia PA 19104-605 e-mail to ldc@unagi.cis.upenn.edu for more information. 326 Mitchell P. Marcus et al. Building a Larg ...

2010-08-20 07:09
Views: 920
Comments (0)
Category: Enterprise Architecture
