`

Spatial search with Lucene

阅读更多

At the Hadoop Hackathon this weekend at Sybit, we've worked on a getting tiled images into HDFS and HBase. A side-story was how to search for these tiles based on GPS coordinates. I took the task to see how you can perform queries based on spatial coordinates. That is, searching for locations which are within a specified radius from an origin:


 



With the Lucene-Spatial extension, it's quite easy to get this working and I'd like to share my results with you.

First, the usual setup in pom.xml:

<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-spatial</artifactId>
 <version>3.0.2</version>
</dependency>
<dependency>
 <groupId>org.apache.lucene</groupId>
 <artifactId>lucene-misc</artifactId>
 <version>3.0.2</version>
</dependency>


This fragment will add Lucene-Core, Lucene-Spatial and the Lucene-Misc modules to the classpath. Don't forget to update the project configuration via Maven > Update Project Configuration when you're using M2Eclipse.

Indexing

Then, let's begin the test case by setting up an example database:

@Test
public void testSpatialSearch() throws Exception {
	Directory dir = new RAMDirectory();
	IndexWriter writer = new IndexWriter(dir,
	  new WhitespaceAnalyzer(),MaxFieldLength.UNLIMITED);
	Examples.createExampleLocations(writer);
	writer.commit();
	writer.close(true);
	...


The Examples class contains a few named GPS locations with their longitude and latitude positions (taken from Wikipedia):

public class Examples {
  public static final Coordinate SYBIT = new Coordinate(47.740688, 8.972429);

  public static void createExampleLocations(IndexWriter w) throws IOException {
	Spatial s = new Spatial();
	s.addLoc(w, "Sybit-Radolfzell", SYBIT);
	s.addLoc(w, "Innovations-Immenstaad", new Coordinate(47.666667, 9.366667));
	s.addLoc(w, "Fino-Radolfzell", new Coordinate(47.736944, 8.969722));
	s.addLoc(w, "Zentrum-Berlin", new Coordinate(52.518611, 13.408056));
}
}



The Spatial class is a wrapper for a Lucene Document. It contains a lot of code, so i'm going to explain it in more detail:

public class Spatial {
  public static final String LAT_FIELD = "lat";
  public static final String LON_FIELD = "lon";
  public static final String TIER_PREFIX_FIELD = "_localTier";
  ...



These constants are used as field names in the Lucene index. Latitude and longitude are obviously for the position of the location. The TIER_PREFIX_FIELD constant is used as a prefix for multiple Document fields. For each "tier", a field is added to the Lucene Document, describing in which "box" the coordinates are located. Now, this sentence may cause some questions to emerge, so I'll explain ..

What is a tier and what is a box?

Spatial search has the problem of checking potentially millions of locations. As this would take a long time to process, the search space is optimized by using a Cartesian Grid. A Cartesian Grid splits the map (of Germany, in this case) into multiple rectangular regions.

What is a Cartesian Grid?

Assume the following map of Germany:



Now, the map is cut into four cells on the first tier. Then each of the cells on this tier are again cut into multiple cells - that's the next tier. This is continued until we reach a level of detail where one cell is approx. 1 mile. The cells are called boxes. The higher the tier, the more boxes exist.

tier = a grid, box = a cell of the grid



Now that we understand how the search space is narrowed down, let's go on with the Spatial.java class file:

public void addLoc(IndexWriter writer, String name, Coordinate coord) {
   Document doc = new Document();
   doc.add(new Field("name", name, Field.Store.YES, Index.ANALYZED));
   doc.add(new Field("metafile", "doc", Store.YES, Index.ANALYZED));
   addSpatialLcnFields(coord, doc);
   writer.addDocument(doc);
}


This code will create a new Lucene Document and add the name of the location, such as "Innovations-Immenstaad". The addSpatialLcnFields() method will add the exact coordinate of the location. In this case, that'd be 47.666667, 9.366667:

private void addSpatialLcnFields(Coordinate coord, Document document) {
  document.add(new Field(LAT_FIELD,
      NumericUtils.doubleToPrefixCoded(coord.getLat()), Field.Store.YES,
      Field.Index.NOT_ANALYZED));
   document.add(new Field(LON_FIELD,
      NumericUtils.doubleToPrefixCoded(coord.getLon()), Field.Store.YES,
      Field.Index.NOT_ANALYZED));
   addCartesianTiers(coord, document);
}


Storing the box for each tier

The interesting part, from a search algorithm point of view, is adding the box of our current coordinate per tier to the search index. That's going to be more complex, but not much:

private double maxMiles = 250, minMiles = 1;
private IProjector projector = new SinusoidalProjector();
private CartesianTierPlotter ctp = new CartesianTierPlotter(0, projector, TIER_PREFIX_FIELD);
// startTier is 14 for 25 miles, 15 for 1 miles in lucene 3.0
private int startTier = ctp.bestFit(maxMiles), endTier = ctp.bestFit(minMiles);

private void addCartesianTiers(Coordinate coord, Document document) {
   for (int tier = startTier; tier <= endTier; tier++) {
      ctp = new CartesianTierPlotter(tier, projector, TIER_PREFIX_FIELD);
      double boxId = ctp.getTierBoxId(coord.getLat(), coord.getLon());
      document.add(new Field(ctp.getTierFieldName(), NumericUtils
         .doubleToPrefixCoded(boxId), Field.Store.YES,
         Field.Index.NOT_ANALYZED_NO_NORMS));
   }
}



This code will iterate through multiple tiers and for each tier, it will generate an ID for the box. In simple terms, this would look like:





Now, as our location "Immenstaad" is near the Lake of Constance, the field "_localTier1" would contain the value "C" (the lower left cell of the yellow grid) and the field "_localTier2" would contain the value "N", which is the second cell at the bottom of the red grid. It will of course go on until it reaches the highest tier with the highest number of total boxes. But for each tier, it will only add the box where the location is, of course.

In reality, the boxId is not a single letter, but a specially encoded double value in the form "00013.00032", which means it's the 13th cell in the horizontal axis and the 32nd cell in the vertical axis. This is due to the nature how Lucene is storing values and saves space. (more about Cartesian Grid in Lucene)

Searching

Now, let's perform an actual query against the database:

   ...
   IndexSearcher searcher = new IndexSearcher(dir);
   {
      List locations = find(searcher, Examples.SYBIT, 5.0d);
      assertEquals(locations.toString(), 2, locations.size());
      assertTrue(locations.contains("Sybit-Radolfzell"));
      assertTrue(locations.contains("Fino-Radolfzell"));
   }
}

private List find(IndexSearcher searcher, Coordinate start,
     double miles)  {

      List result = new ArrayList();
      DistanceQueryBuilder dq = new DistanceQueryBuilder(start.getLat(),
         start.getLon(), miles, Spatial.LAT_FIELD, Spatial.LON_FIELD,
         Spatial.TIER_PREFIX_FIELD, true);

      Query query = new TermQuery(new Term("metafile", "doc"));
      TopDocs hits = searcher.search(dq.getQuery(query), 10);
      for (int i = 0; i < hits.totalHits; i++) {
         Document doc = searcher.doc(hits.scoreDocs[i].doc);
         result.add(doc.get("name"));
      }
      return result;
}



The test case starts the search at Examples.SYBIT, which is a pre-defined Coordinate object from the Examples class. The radius for the distance search is set to 5.0d miles. The expected result of the search are two locations: the Sybit location and the nearby restaurant called Fino (We actually didn't go there for lunch, but that's another story).

The DistanceQueryBuild is part of the Lucene-Spatial contribution and performs queries based on the distance of the start coordinate to all the coordinates in the search space. The search space however is not the whole Lucene index. It is pre-filtered based on a dervied best-fit value. The builder calculates the best matching tier for the search of 5-mile radiuses. When it has identified the tier, it will simply use all boxes within the 5 mile radius of the origin:



This narrows down the search to only a few possible locations - compared to all the locations of Germany or world-wide, this decreases the amount of work to a near minimum.

The last step is to actually calculate the distance between the search origin and all the locations in the boxes:



Finally, after the query has been executed by the IndexSearcher, the results are gathered into a list and returned.

 

http://www.java-community.de/archives/156-Spatial-search-with-Lucene.html

分享到:
评论

相关推荐

    lucene-spatial3d-6.6.0-API文档-中文版.zip

    赠送jar包:lucene-spatial3d-6.6.0.jar; 赠送原API文档:lucene-spatial3d-6.6.0-javadoc.jar; 赠送源代码:lucene-spatial3d-6.6.0-sources.jar; 赠送Maven依赖信息文件:lucene-spatial3d-6.6.0.pom; 包含...

    lucene-spatial-6.6.0-API文档-中文版.zip

    赠送jar包:lucene-spatial-6.6.0.jar; 赠送原API文档:lucene-spatial-6.6.0-javadoc.jar; 赠送源代码:lucene-spatial-6.6.0-sources.jar; 赠送Maven依赖信息文件:lucene-spatial-6.6.0.pom; 包含翻译后的API...

    Spatial Analytics with ArcGIS azw3

    Spatial Analytics with ArcGIS 英文azw3 本资源转载自网络,如有侵权,请联系上传者或csdn删除 本资源转载自网络,如有侵权,请联系上传者或csdn删除

    lucene-spatial3d-7.3.1-API文档-中英对照版.zip

    赠送jar包:lucene-spatial3d-7.3.1.jar; 赠送原API文档:lucene-spatial3d-7.3.1-javadoc.jar; 赠送源代码:lucene-spatial3d-7.3.1-sources.jar; 赠送Maven依赖信息文件:lucene-spatial3d-7.3.1.pom; 包含...

    lucene-spatial-extras-7.3.1-API文档-中英对照版.zip

    赠送jar包:lucene-spatial-extras-7.3.1.jar; 赠送原API文档:lucene-spatial-extras-7.3.1-javadoc.jar; 赠送源代码:lucene-spatial-extras-7.3.1-sources.jar; 赠送Maven依赖信息文件:lucene-spatial-extras...

    lucene-spatial-extras-7.2.1-API文档-中英对照版.zip

    赠送jar包:lucene-spatial-extras-7.2.1.jar; 赠送原API文档:lucene-spatial-extras-7.2.1-javadoc.jar; 赠送源代码:lucene-spatial-extras-7.2.1-sources.jar; 赠送Maven依赖信息文件:lucene-spatial-extras...

    K-DBSCAN: Identifying Spatial Clusters With Differing Density Levels

    In this paper, we propose a novel density based spatial clustering algorithm called K-DBSCAN with the main focus of identifying clusters of points with similar spatial density. This contrasts with ...

    lucene-spatial-6.6.0-API文档-中英对照版.zip

    赠送jar包:lucene-spatial-6.6.0.jar; 赠送原API文档:lucene-spatial-6.6.0-javadoc.jar; 赠送源代码:lucene-spatial-6.6.0-sources.jar; 赠送Maven依赖信息文件:lucene-spatial-6.6.0.pom; 包含翻译后的API...

    lucene-spatial-extras-6.6.0-API文档-中英对照版.zip

    赠送jar包:lucene-spatial-extras-6.6.0.jar; 赠送原API文档:lucene-spatial-extras-6.6.0-javadoc.jar; 赠送源代码:lucene-spatial-extras-6.6.0-sources.jar; 赠送Maven依赖信息文件:lucene-spatial-extras...

    lucene-spatial-4.9.0.jar

    lucene-spatial-4.9.0.jar

    lucene-spatial3d-6.6.0-API文档-中英对照版.zip

    赠送jar包:lucene-spatial3d-6.6.0.jar; 赠送原API文档:lucene-spatial3d-6.6.0-javadoc.jar; 赠送源代码:lucene-spatial3d-6.6.0-sources.jar; 赠送Maven依赖信息文件:lucene-spatial3d-6.6.0.pom; 包含...

    lucene-spatial3d-7.2.1-API文档-中文版.zip

    赠送jar包:lucene-spatial3d-7.2.1.jar; 赠送原API文档:lucene-spatial3d-7.2.1-javadoc.jar; 赠送源代码:lucene-spatial3d-7.2.1-sources.jar; 赠送Maven依赖信息文件:lucene-spatial3d-7.2.1.pom; 包含...

    Lucene5学习之Spatial地理位置搜索

    《深入探索Lucene5 Spatial:地理位置搜索》 在信息技术飞速发展的今天,地理位置搜索已经成为许多应用和服务不可或缺的一部分。Apache Lucene作为一个强大的全文搜索引擎库,其在5.x版本中引入了Spatial模块,使得...

    Applied spatial data analysis with R

    **标题**:“Applied Spatial Data Analysis with R” **描述**:“基于R软件的空间数据分析经典教程,一本不错的书......” **知识点概述**: 《应用空间数据分析》是一本专门针对空间数据分析的教材,其主要面向...

    Pro Spatial With Sql Server 2012

    Pro Spatial With Sql Server 2012, Alastair Aitchison, Apress

    lucene-spatial3d-7.7.0-API文档-中文版.zip

    赠送jar包:lucene-spatial3d-7.7.0.jar; 赠送原API文档:lucene-spatial3d-7.7.0-javadoc.jar; 赠送源代码:lucene-spatial3d-7.7.0-sources.jar; 赠送Maven依赖信息文件:lucene-spatial3d-7.7.0.pom; 包含...

    lucene-spatial3d-7.3.1-API文档-中文版.zip

    赠送jar包:lucene-spatial3d-7.3.1.jar; 赠送原API文档:lucene-spatial3d-7.3.1-javadoc.jar; 赠送源代码:lucene-spatial3d-7.3.1-sources.jar; 赠送Maven依赖信息文件:lucene-spatial3d-7.3.1.pom; 包含...

    Computing With Spatial Trajectories

    在信息技术领域,"Computing With Spatial Trajectories" 是一个重要的研究方向,它涉及地理信息系统(GIS)、数据挖掘、机器学习等多个领域。空间轨迹是指物体或个体在时间序列中的地理位置信息,如车辆行驶路径、...

    lucene-spatial-extras-7.7.0-API文档-中文版.zip

    赠送jar包:lucene-spatial-extras-7.7.0.jar; 赠送原API文档:lucene-spatial-extras-7.7.0-javadoc.jar; 赠送源代码:lucene-spatial-extras-7.7.0-sources.jar; 赠送Maven依赖信息文件:lucene-spatial-extras...

    lucene-spatial-extras-7.2.1-API文档-中文版.zip

    赠送jar包:lucene-spatial-extras-7.2.1.jar; 赠送原API文档:lucene-spatial-extras-7.2.1-javadoc.jar; 赠送源代码:lucene-spatial-extras-7.2.1-sources.jar; 赠送Maven依赖信息文件:lucene-spatial-extras...

Global site tag (gtag.js) - Google Analytics