Summary:
1) In the last chapter, we introduced TermQuery, NumericRangeQuery and TermRangeQuery.
2) In this chapter, we will introduce PrefixQuery, WildcardQuery, BooleanQuery, PhraseQuery and FuzzyQuery.
1) Introduction to PrefixQuery
public void searchByPrefix(String fieldName, String fieldValue, int resultSize) {
    IndexSearcher indexSearcher = getSearcher();
    Query query = new PrefixQuery(new Term(fieldName, fieldValue));
    try {
        TopDocs tds = indexSearcher.search(query, resultSize);
        Document document = null;
        for (ScoreDoc sd : tds.scoreDocs) {
            document = indexSearcher.doc(sd.doc);
            System.out.println("id = " + document.get("id")
                    + ", name = " + document.get("name")
                    + ", password = " + document.get("password")
                    + ", gender = " + document.get("gender")
                    + ", score = " + document.get("score"));
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            indexSearcher.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

@Test
public void testSearchByPrefix() {
    testBuildIndex();
    searcherUtil.searchByPrefix("gender", "Ma", 100);
}

id = 11, name = Davy, password = Jones, gender = Male, score = 100
id = 22, name = Davy, password = Jones, gender = Male, score = 110
id = 33, name = Jones, password = Davy, gender = Male, score = 120
Comments:
1) The query is case-sensitive: searcherUtil.searchByPrefix("gender", "ma", 100); returns an empty result set.
2) The results are not sorted by the value of the field being searched.
3) When a field is indexed with Field.Index.ANALYZED, its value is broken into several terms.
A prefix search on such a field then tries to match the prefix against every term of the field, not only against the start of the whole value.
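Comments 1) and 3) can be illustrated with a small plain-Java model. This is only a sketch and does not call Lucene: PrefixSketch and matchesPrefix are made-up names, and the analyzed term list assumes an analyzer that splits on spaces and lowercases the text.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of prefix matching against a field's indexed terms (not Lucene code). */
final class PrefixSketch {

    /** Returns true if any indexed term starts with the prefix, compared case-sensitively. */
    static boolean matchesPrefix(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // NOT_ANALYZED: the whole field value is kept as a single term.
        List<String> notAnalyzed = Arrays.asList("Male");
        // ANALYZED (assumption: split on spaces and lowercased).
        List<String> analyzed = Arrays.asList("male", "aaa", "female");

        System.out.println(matchesPrefix(notAnalyzed, "Ma")); // true
        System.out.println(matchesPrefix(notAnalyzed, "ma")); // false: case differs
        System.out.println(matchesPrefix(analyzed, "fe"));    // true: third term matches
        System.out.println(matchesPrefix(analyzed, "Ma"));    // false: terms are lowercase
    }
}
```

The last call shows why a prefix that works on a NOT_ANALYZED field may stop matching once the same field is analyzed.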
2) Introduction to WildcardQuery
public void searchByWildcard(String fieldName, String fieldValue, int resultSize) {
    IndexSearcher indexSearcher = getSearcher();
    Query query = new WildcardQuery(new Term(fieldName, fieldValue));
    try {
        TopDocs tds = indexSearcher.search(query, resultSize);
        Document document = null;
        for (ScoreDoc sd : tds.scoreDocs) {
            document = indexSearcher.doc(sd.doc);
            System.out.println("id = " + document.get("id")
                    + ", name = " + document.get("name")
                    + ", password = " + document.get("password")
                    + ", gender = " + document.get("gender")
                    + ", score = " + document.get("score"));
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            indexSearcher.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

@Test
public void testSearchByWildcard() {
    testBuildIndex();
    searcherUtil.searchByWildcard("gender", "M*", 100);
}

id = 11, name = Davy, password = Jones, gender = Male, score = 100
id = 22, name = Davy, password = Jones, gender = Male, score = 110
id = 33, name = Jones, password = Davy, gender = Male, score = 120
Comments:
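1) WildcardQuery matches terms against a pattern that contains wildcard characters: * matches any sequence of characters (including none) and ? matches exactly one character.
2) For more information on wildcards: http://en.wikipedia.org/wiki/Wildcard_character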
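To make the two wildcard characters concrete, here is a minimal plain-Java sketch that translates a wildcard pattern into a regular expression. It is not how Lucene evaluates a WildcardQuery; WildcardSketch, toRegex and matches are hypothetical names, and only * and ? are treated as special.

```java
import java.util.regex.Pattern;

/** Toy wildcard matcher: '*' = any run of characters, '?' = exactly one character. */
final class WildcardSketch {

    /** Translates the wildcard pattern into an equivalent regular expression. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote everything else so it is matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** The whole term has to match the pattern, and the comparison is case-sensitive. */
    static boolean matches(String term, String wildcard) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("Male", "M*"));   // true
        System.out.println(matches("Male", "M?le")); // true
        System.out.println(matches("Male", "Ma?"));  // false: '?' stands for one character only
        System.out.println(matches("Male", "m*"));   // false: case differs
    }
}
```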
3) Introduction to BooleanQuery
1) BooleanQuery combines several sub-queries with the logical operators AND, OR and NOT, so that one search can apply multiple constraints.
public void searchByBooleanQuery(int resultSize) {
    IndexSearcher indexSearcher = getSearcher();
    BooleanQuery booleanQuery = new BooleanQuery();
    booleanQuery.add(new TermQuery(new Term("gender", "Male")), Occur.MUST);
    booleanQuery.add(new TermQuery(new Term("name", "Davy")), Occur.MUST);
    try {
        TopDocs tds = indexSearcher.search(booleanQuery, resultSize);
        Document document = null;
        for (ScoreDoc sd : tds.scoreDocs) {
            document = indexSearcher.doc(sd.doc);
            System.out.println("id = " + document.get("id")
                    + ", name = " + document.get("name")
                    + ", password = " + document.get("password")
                    + ", gender = " + document.get("gender")
                    + ", score = " + document.get("score"));
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            indexSearcher.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
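2) But what if we want to add a constraint on a NumericField such as "score"? One option is to add a numeric range query (NumericRangeQuery) on that field as one more clause.
3) Pay attention to the difference between Occur.MUST (the clause has to match, like AND), Occur.MUST_NOT (the clause must not match, like NOT) and Occur.SHOULD (the clause is optional, like OR).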
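How the three Occur values combine can be modelled in a few lines of plain Java. The sketch below does not use Lucene. OccurSketch, Clause and the doc helper are hypothetical, and the sketch assumes the usual rules: SHOULD clauses are optional once a MUST clause is present, at least one SHOULD clause has to match when there is no MUST clause, and a query made only of MUST_NOT clauses matches nothing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of how MUST, SHOULD and MUST_NOT clauses combine (not Lucene code). */
final class OccurSketch {

    enum Occur { MUST, SHOULD, MUST_NOT }

    /** One exact-match constraint on a field, similar in spirit to a TermQuery clause. */
    static final class Clause {
        final String field;
        final String value;
        final Occur occur;

        Clause(String field, String value, Occur occur) {
            this.field = field;
            this.value = value;
            this.occur = occur;
        }
    }

    /** Hypothetical helper: builds a document with only the two fields used here. */
    static Map<String, String> doc(String gender, String name) {
        Map<String, String> fields = new HashMap<>();
        fields.put("gender", gender);
        fields.put("name", name);
        return fields;
    }

    static boolean matches(Map<String, String> doc, List<Clause> clauses) {
        boolean hasMust = false;
        boolean shouldMatched = false;
        for (Clause clause : clauses) {
            boolean hit = clause.value.equals(doc.get(clause.field));
            switch (clause.occur) {
                case MUST:
                    hasMust = true;
                    if (!hit) return false; // a required clause failed
                    break;
                case MUST_NOT:
                    if (hit) return false;  // a prohibited clause matched
                    break;
                default: // SHOULD
                    shouldMatched |= hit;
            }
        }
        // With MUST clauses present, SHOULD clauses are optional;
        // without any MUST clause, at least one SHOULD clause has to match.
        return hasMust || shouldMatched;
    }

    public static void main(String[] args) {
        List<Clause> maleAndDavy = Arrays.asList(
                new Clause("gender", "Male", Occur.MUST),
                new Clause("name", "Davy", Occur.MUST));
        System.out.println(matches(doc("Male", "Davy"), maleAndDavy));  // true
        System.out.println(matches(doc("Male", "Jones"), maleAndDavy)); // false
    }
}
```

Replacing the second clause with Occur.MUST_NOT reverses the two results: the document named "Jones" matches and the one named "Davy" is excluded.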
4) Introduction to PhraseQuery --> only useful for fields indexed with Field.Index.ANALYZED.
1) It is very convenient for English phrase searches, but not applicable to Chinese phrase searches.
2) The index below is built with SimpleAnalyzer, so the text of every Field.Index.ANALYZED field is converted to lowercase.
So when we execute the query, we have to use lowercase terms to match that content; otherwise we may get an empty result set.
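The effect of analysis on the "gender" values can be approximated in plain Java. This is a rough imitation of what SimpleAnalyzer does (split at non-letter characters, then lowercase), not its actual implementation; AnalyzeSketch and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Rough imitation of SimpleAnalyzer: split at non-letters, lowercase each token. */
final class AnalyzeSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> terms = tokenize("Male aaa Female");
        System.out.println(terms);                  // [male, aaa, female]
        System.out.println(terms.contains("Male")); // false: only lowercase terms are indexed
        System.out.println(terms.contains("male")); // true
    }
}
```

Because only the lowercase forms end up in the term list, a query term such as "Male" finds nothing unless it is lowercased first.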
private void testBuildIndex() {
    List<Student> studentList = new ArrayList<Student>();
    Student student = new Student("11", "Davy", "Jones", "Male aaa Female", 100);
    studentList.add(student);
    student = new Student("22", "Davy", "Jones", "Male bbb Male", 110);
    studentList.add(student);
    student = new Student("33", "Jones", "Davy", "Male ccc Female", 120);
    studentList.add(student);
    student = new Student("44", "Calyp", "Jones", "Female ddd Male", 130);
    studentList.add(student);
    student = new Student("55", "Pso", "Caly", "Female eee Female", 140);
    studentList.add(student);
    searcherUtil.buildIndex(studentList);
}
public void buildIndex(List<Student> studentList) {
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
            new SimpleAnalyzer(Version.LUCENE_35));
    IndexWriter writer = null;
    Document doc = null;
    try {
        writer = new IndexWriter(directory, config);
        for (Student student : studentList) {
            doc = new Document();
            doc.add(new Field("id", student.getId(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("name", student.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("password", student.getPassword(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            // "gender" is the only analyzed field, so it is the one PhraseQuery can work on.
            doc.add(new Field("gender", student.getGender(), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new NumericField("score", Field.Store.YES, true)
                    .setIntValue(student.getScore()));
            writer.addDocument(doc);
        }
    } catch (CorruptIndexException e) {
        e.printStackTrace();
    } catch (LockObtainFailedException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // writer is still null if the IndexWriter constructor threw.
        if (writer != null) {
            try {
                writer.close();
            } catch (CorruptIndexException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
public void searchByPhrase(String fieldName, String startFieldValue,
        String endFieldValue, int slop, int resultSize) {
    IndexSearcher indexSearcher = getSearcher();
    PhraseQuery query = new PhraseQuery();
    query.setSlop(slop);
    query.add(new Term(fieldName, startFieldValue));
    query.add(new Term(fieldName, endFieldValue));
    try {
        TopDocs tds = indexSearcher.search(query, resultSize);
        Document document = null;
        for (ScoreDoc sd : tds.scoreDocs) {
            document = indexSearcher.doc(sd.doc);
            System.out.println("id = " + document.get("id")
                    + ", name = " + document.get("name")
                    + ", password = " + document.get("password")
                    + ", gender = " + document.get("gender")
                    + ", score = " + document.get("score"));
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            indexSearcher.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

@Test
public void testSearchByPhrase() {
    testBuildIndex();
    searcherUtil.searchByPhrase("gender", "male", "female", 1, 100);
}

id = 22, name = Davy, password = Jones, gender = Male bbb Male, score = 110
id = 11, name = Davy, password = Jones, gender = Male aaa Female, score = 100
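1) If we call searcherUtil.searchByPhrase("gender", "Male", "Female", 1, 100); instead, we get an empty result set, because the indexed terms are lowercase.
2) The "gender" field has to be analyzed in order to use PhraseQuery on it.
3) Slop is the maximum distance allowed between the terms; for two terms in this order, it is the maximum number of words that may stand between startFieldValue and endFieldValue.
4) SimpleAnalyzer splits the text at every non-letter character, so a space acts as a delimiter between terms.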
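The meaning of slop for two terms can be sketched in plain Java. This is a simplification and not Lucene's algorithm: it only covers two terms that appear in the given order and counts the words between them, whereas Lucene's slop counts position moves and may, with a larger slop, also match reordered terms. SlopSketch and matchesInOrder are hypothetical names, and the token lists are assumed to be already lowercased by the analyzer.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified two-term phrase check: 'second' follows 'first' with at most 'slop' words between. */
final class SlopSketch {

    static boolean matchesInOrder(List<String> tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            for (int j = i + 1; j < tokens.size(); j++) {
                int wordsBetween = j - i - 1;
                if (tokens.get(j).equals(second) && wordsBetween <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("male", "aaa", "female");
        System.out.println(matchesInOrder(tokens, "male", "female", 1)); // true: one word in between
        System.out.println(matchesInOrder(tokens, "male", "female", 0)); // false: terms are not adjacent
        System.out.println(matchesInOrder(tokens, "Male", "Female", 1)); // false: indexed terms are lowercase
    }
}
```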
5) Introduction to FuzzyQuery
1) What's the difference between FuzzyQuery and WildcardQuery?
1) WildcardQuery supports wildcard characters in the query term.
2) FuzzyQuery doesn't support wildcard characters; instead it matches terms that are similar to the query term, differing from it by only a few characters (edit distance).
When searching for "Female", we may use "Femall", "Bemale" or "Famale" instead.
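The notion of similar terms can be made concrete with the Levenshtein edit distance, that is, the number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a textbook implementation in plain Java, not Lucene's code; EditDistanceSketch and distance are hypothetical names, and how FuzzyQuery turns the distance into its similarity threshold is not modelled here.

```java
/** Textbook Levenshtein distance using two rolling rows (not Lucene code). */
final class EditDistanceSketch {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty string into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + substitution);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Female", "Femall")); // 1
        System.out.println(distance("Female", "Bemale")); // 1
        System.out.println(distance("Female", "Famale")); // 1
    }
}
```

All three misspellings are a single substitution away from "Female", which is why a fuzzy search can still find it.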
public void searchByFuzzy(String fieldName, String fieldValue, int resultSize) {
    IndexSearcher indexSearcher = getSearcher();
    Query query = new FuzzyQuery(new Term(fieldName, fieldValue));
    try {
        TopDocs tds = indexSearcher.search(query, resultSize);
        Document document = null;
        for (ScoreDoc sd : tds.scoreDocs) {
            document = indexSearcher.doc(sd.doc);
            System.out.println("id = " + document.get("id")
                    + ", name = " + document.get("name")
                    + ", password = " + document.get("password")
                    + ", gender = " + document.get("gender")
                    + ", score = " + document.get("score"));
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            indexSearcher.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

@Test
public void testSearchByFuzzy() {
    testBuildIndex();
    searcherUtil.searchByFuzzy("gender", "Femall", 100);
    System.out.println("======================================");
    searcherUtil.searchByFuzzy("gender", "Bemale", 100);
    System.out.println("======================================");
    searcherUtil.searchByFuzzy("gender", "Femile", 100);
}

id = 44, name = Calyp, password = Jones, gender = Female, score = 130
id = 55, name = Pso, password = Caly, gender = Female, score = 140
======================================
id = 44, name = Calyp, password = Jones, gender = Female, score = 130
id = 55, name = Pso, password = Caly, gender = Female, score = 140
======================================
id = 44, name = Calyp, password = Jones, gender = Female, score = 130
id = 55, name = Pso, password = Caly, gender = Female, score = 140
Summary:
1) FuzzyQuery and WildcardQuery should be used as little as possible, because evaluating them is more complex than an exact term search, so their cost is high.
They also don't support Chinese.
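One way to see where the extra cost may come from is a toy term dictionary kept in sorted order. This is only an illustration in plain Java and not Lucene's index structure; TermScanSketch and its methods are hypothetical. An exact lookup is a single probe, a prefix search can seek to the first candidate and stop early, but a pattern such as "*male" gives nothing to seek to, so every term has to be inspected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy sorted term dictionary comparing exact, prefix and suffix-pattern lookups. */
final class TermScanSketch {

    /** Exact lookup: one probe into the sorted set. */
    static boolean exact(TreeSet<String> terms, String term) {
        return terms.contains(term);
    }

    /** Prefix lookup: seek to the first term >= prefix, read while the prefix still matches. */
    static List<String> byPrefix(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can start with the prefix
            }
            hits.add(term);
        }
        return hits;
    }

    /** Suffix pattern such as "*male": collects hits and returns how many terms were inspected. */
    static int inspectAllForSuffix(TreeSet<String> terms, String suffix, List<String> hits) {
        int inspected = 0;
        for (String term : terms) {
            inspected++;
            if (term.endsWith(suffix)) {
                hits.add(term);
            }
        }
        return inspected;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("aaa", "bbb", "ccc", "ddd", "eee", "female", "male"));
        System.out.println(exact(terms, "male"));  // true
        System.out.println(byPrefix(terms, "fe")); // [female]
        List<String> hits = new ArrayList<>();
        int inspected = inspectAllForSuffix(terms, "male", hits);
        System.out.println(hits + " after inspecting " + inspected + " terms");
    }
}
```

With only seven terms the difference is invisible, but the full scan grows with the number of distinct terms in the field, while the exact probe does not.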