Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

Geographical indexing in Lucene

 

 

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


evgeny.shadchnev at gmail

Oct 1, 2007, 8:40 AM

Post #1 of 4 (7465 views)
Permalink
Geographical indexing in Lucene

Hello,

As part of my MSc project this summer I developed geoLucene, a
modified version of Lucene (based on 2.3-dev checked out on 05.07)
that can index geographical data using R-trees. It has been shown to
be faster on geographic queries than the unmodified Lucene. The code
is hosted at https://sourceforge.net/projects/geolucene/.

This is an alpha release and it is not intended for production use.
However, I hope somebody will find it useful and will develop it
further. Though the code is far from being finished, I'm unable to
continue developing this project.

geoLucene is able to index documents in two indexes. First, a document
is indexed using inverted index, as usual. Second, it may be (if
developer wishes so) indexed using a spacial index, an R-tree. The
spacial index can be searched and the results may be combined with the
results from the inverted index.

I'll explain how it works by providing the code. So, the search is
done as follows:

GeoSearchArea gsa = new GeoSearchArea();
gsa.setWest(x1);
gsa.setEast(x2);
gsa.setSouth(y1);
gsa.setNorth(y2);
GeoQuery geoQuery = new GeoQuery(gsa, searchMode); // geoQuery extends
Query, search mode is intersection or enclosure
IndexSearcher RTreeSearcher = new IndexSearcher(dir);
Hits rTreeResults = RTreeSearcher.search(geoQuery);

The data is indexed as follows:

Document doc = new Document();
doc.add(new Field("latitude", Double.toString(lat), Field.Store.NO,
Field.Index.UN_TOKENIZED));
doc.add(new Field("longitude", Double.toString(lon), Field.Store.NO,
Field.Index.UN_TOKENIZED));
indexWriter.addDocument(doc);

Here, "latitude" and "longitude" are hard-coded (recompile to change)
field names that should contain geocoordinates. The documents that
contain such coordinates are intercepted and indexed in an R-tree
before being indexed in the inverted index.

This project was a part of my MSc course. More details about the
architecture, implementation and the performance results may be found
in my MSc thesis available online:
http://www.doc.ic.ac.uk/~es106/thesis/MultidimensionalIndexingInLucene.pdf.
There is a web demo, which is available at
http://geolucene.virtual.vps-host.net/ (IE, Firefox and latest
Konqueror only). The web-demo runs in a multitasking environment
(furthermore, the data is cached), so make a number of queries before
averaging the search times.

Examples of how to index and search files may be found in
TestGeoSearchSpeed.java file (in the root directory of SVN
repository). I used this file to run benchmarks.

This project is still in alpha and there are many changes to be made
before it can be called stable but I must warn you that it lacks one
essential feature. I did not implement document deletion method.
However, there is a not so complicated way of implementing it
(preserving coherent document numbers in both indexes). It is
described in the "Future Work" section of my thesis. Therefore, this
version of the library can be tested to evaluate the performance of an
R-tree (it's significantly faster than inverted index) only.

As I've said in the beginning, I am unable to develop this project
anymore, so I release it on SourceForge and I'm willing to pass the
code to anybody who wants to improve it. There is no documentation but
my thesis is a quite good (I hope) explanation of what's going on
inside geoLucene. I'm happy to answer any queries and generally help
with the project, if somebody wishes to develop it further.

Regards,
Evgeny Shadchnev

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


markharw00d at yahoo

Oct 1, 2007, 1:53 PM

Post #2 of 4 (7220 views)
Permalink
Re: Geographical indexing in Lucene [In reply to]

Great work, Evgeny!

I'm certainly interested in this area and will be dissecting this in
some detail.

I've done similar work before but making use of JTS (Java Topology
Suite), using the OpenGIS standards for spatial features/queries and
2-pass spatial queries (first rough pass is MBB only, 2nd pass does full
geometry tests but only for results that satisfied any Lucene-text
queries). What I haven't addressed (which you have here) is disk-based
spatial indexes which is obviously the key to scalability.

Should have more to discuss once I've dug deeper in...
I for one would be interested in making this part of Lucene.

Thanks again,
Mark




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


jdelgado at lendingclub

Oct 1, 2007, 3:38 PM

Post #3 of 4 (7223 views)
Permalink
Re: Geographical indexing in Lucene [In reply to]

Quadtrees and R-trees have been used as special "domain" indexes in Oracle
RDBMS for Spatial:
http://www.oracle.com/technology/products/spatial/htdocs/data_sheet_9i/9iR2_spatial_ds.html


Some lectures and papers:

http://csiweb.ucd.ie/staff/mbertolotto/home/lecture-notes4025-07-08.htm
http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/9217/29235/01320042.pdf
http://www.espatial.com/pdf/TWP_Spatial_Oracle_Spatial_and_Oracle_Locator_10gR2_0513.pdf

This is yet another evidence of the RDBMS/IR integration trend.

-- Joaquin


2007/10/1, markharw00d <markharw00d [at] yahoo>:
>
> Great work, Evgeny!
>
> I'm certainly interested in this area and will be dissecting this in
> some detail.
>
> I've done similar work before but making use of JTS (Java Topology
> Suite), using the OpenGIS standards for spatial features/queries and
> 2-pass spatial queries (first rough pass is MBB only, 2nd pass does full
> geometry tests but only for results that satisfied any Lucene-text
> queries). What I haven't addressed (which you have here) is disk-based
> spatial indexes which is obviously the key to scalability.
>
> Should have more to discuss once I've dug deeper in...
> I for one would be interested in making this part of Lucene.
>
> Thanks again,
> Mark
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> For additional commands, e-mail: java-dev-help [at] lucene
>
>


uwe at thetaphi

Oct 5, 2007, 3:31 AM

Post #4 of 4 (7179 views)
Permalink
Re: Geographical indexing in Lucene [In reply to]

Hi Evgeny,

you may look at http://www.panFMP.org

This software uses a similar approach for very fast range queries without
modifying Lucene. It works by storing the double values in a special encoded
form with different precisions in the index, similar to the well known
TRIEs.
It may be very interesting to compare your algorithm with mine!

Nice work,

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

Hello,

As part of my MSc project this summer I developed geoLucene, a
modified version of Lucene (based on 2.3-dev checked out on 05.07)
that can index geographical data using R-trees. It has been shown to
be faster on geographic queries than the unmodified Lucene. The code
is hosted at https://sourceforge.net/projects/geolucene/.

This is an alpha release and it is not intended for production use.
However, I hope somebody will find it useful and will develop it
further. Though the code is far from being finished, I'm unable to
continue developing this project.

geoLucene is able to index documents in two indexes. First, a document
is indexed using inverted index, as usual. Second, it may be (if
developer wishes so) indexed using a spacial index, an R-tree. The
spacial index can be searched and the results may be combined with the
results from the inverted index.

I'll explain how it works by providing the code. So, the search is
done as follows:

GeoSearchArea gsa = new GeoSearchArea();
gsa.setWest(x1);
gsa.setEast(x2);
gsa.setSouth(y1);
gsa.setNorth(y2);
GeoQuery geoQuery = new GeoQuery(gsa, searchMode); // geoQuery extends
Query, search mode is intersection or enclosure
IndexSearcher RTreeSearcher = new IndexSearcher(dir);
Hits rTreeResults = RTreeSearcher.search(geoQuery);

The data is indexed as follows:

Document doc = new Document();
doc.add(new Field("latitude", Double.toString(lat), Field.Store.NO,
Field.Index.UN_TOKENIZED));
doc.add(new Field("longitude", Double.toString(lon), Field.Store.NO,
Field.Index.UN_TOKENIZED));
indexWriter.addDocument(doc);

Here, "latitude" and "longitude" are hard-coded (recompile to change)
field names that should contain geocoordinates. The documents that
contain such coordinates are intercepted and indexed in an R-tree
before being indexed in the inverted index.

This project was a part of my MSc course. More details about the
architecture, implementation and the performance results may be found
in my MSc thesis available online:
http://www.doc.ic.ac.uk/~es106/thesis/MultidimensionalIndexingInLucene.pdf.
There is a web demo, which is available at
http://geolucene.virtual.vps-host.net/ (IE, Firefox and latest
Konqueror only). The web-demo runs in a multitasking environment
(furthermore, the data is cached), so make a number of queries before
averaging the search times.

Examples of how to index and search files may be found in
TestGeoSearchSpeed.java file (in the root directory of SVN
repository). I used this file to run benchmarks.

This project is still in alpha and there are many changes to be made
before it can be called stable but I must warn you that it lacks one
essential feature. I did not implement document deletion method.
However, there is a not so complicated way of implementing it
(preserving coherent document numbers in both indexes). It is
described in the "Future Work" section of my thesis. Therefore, this
version of the library can be tested to evaluate the performance of an
R-tree (it's significantly faster than inverted index) only.

As I've said in the beginning, I am unable to develop this project
anymore, so I release it on SourceForge and I'm willing to pass the
code to anybody who wants to improve it. There is no documentation but
my thesis is a quite good (I hope) explanation of what's going on
inside geoLucene. I'm happy to answer any queries and generally help
with the project, if somebody wishes to develop it further.

Regards,
Evgeny Shadchnev

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.