Mailing List Archive: Lucene: Java-User

GeoSort approach - your opinion


sascha.fahl at googlemail

Jul 19, 2008, 2:53 AM

Post #1 of 2 (171 views)
GeoSort approach - your opinion

Hi,

last week I implemented an approach for GeoSort in Lucene. Inspired by
"Lucene in Action", I modified the algorithm in the following way. When
an IndexReader for a certain index is created, a cache for
geoinformation is created - this is simply a two-dimensional int array.
That makes it possible to cache geoinformation for 1,000,000 docs in
around 8 MB. Every time the ScoreDocComparator.compare(ScoreDoc i, ScoreDoc j)
method is called, I fetch the int array with the geoinfo from the cache
and calculate the distance.
I think this is quite a good solution:
1. Only the distances of actual hits are calculated, so only the
necessary operations are done.
2. The geoinformation is not fetched via IndexReader.doc(i) but
directly from the cache, which is held in RAM.
3. All hits are returned, because this approach does not use a
bounding-box model that excludes documents outside a certain
radius (this is very annoying if there is a hit at a distance of 51
km and the radius is 50 km).
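
The idea above can be sketched roughly as follows. This is only an illustration in plain Java, not the code described in the post: it sorts bare document IDs instead of implementing Lucene's ScoreDocComparator, and the class name GeoSortSketch, the microdegree encoding, and the haversine distance are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: a per-reader coordinate cache plus a distance comparator.
class GeoSortSketch {
    private static final double EARTH_RADIUS_KM = 6371.0;

    // coords[0][doc] = latitude, coords[1][doc] = longitude,
    // both stored as microdegrees (an assumed encoding) so they fit in an int.
    private final int[][] coords;

    GeoSortSketch(int maxDoc) {
        coords = new int[2][maxDoc]; // three arrays in total
    }

    void put(int doc, double latDeg, double lonDeg) {
        coords[0][doc] = (int) Math.round(latDeg * 1e6);
        coords[1][doc] = (int) Math.round(lonDeg * 1e6);
    }

    // Great-circle (haversine) distance from a cached doc to a query point.
    double distanceKm(int doc, double latDeg, double lonDeg) {
        double lat1 = Math.toRadians(coords[0][doc] / 1e6);
        double lon1 = Math.toRadians(coords[1][doc] / 1e6);
        double lat2 = Math.toRadians(latDeg);
        double lon2 = Math.toRadians(lonDeg);
        double sinLat = Math.sin((lat2 - lat1) / 2);
        double sinLon = Math.sin((lon2 - lon1) / 2);
        double a = sinLat * sinLat
                 + Math.cos(lat1) * Math.cos(lat2) * sinLon * sinLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Orders doc IDs by ascending distance; distances are computed only
    // for the docs that are actually compared, i.e. the hits.
    Comparator<Integer> byDistanceFrom(double latDeg, double lonDeg) {
        return Comparator.comparingDouble(
                (Integer doc) -> distanceKm(doc, latDeg, lonDeg));
    }

    public static void main(String[] args) {
        GeoSortSketch cache = new GeoSortSketch(3);
        cache.put(0, 48.137, 11.575); // Munich
        cache.put(1, 52.520, 13.405); // Berlin
        cache.put(2, 53.551, 9.994);  // Hamburg

        Integer[] hits = {0, 1, 2};
        Arrays.sort(hits, cache.byDistanceFrom(52.390, 13.060)); // near Potsdam
        System.out.println(Arrays.toString(hits));
    }
}
```

Because no radius filter is applied, every hit stays in the result; the distance affects only the ordering.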

What do you think about this approach? I think the only possible
disadvantage is the cache, because I do not really know whether the JVM
is good at handling 10 MB of data in RAM.


Best regards,

Sascha Fahl
sascha.fahl[at]gmail.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe[at]lucene.apache.org
For additional commands, e-mail: java-user-help[at]lucene.apache.org


te at statsbiblioteket

Jul 21, 2008, 3:43 AM

Post #2 of 2 (134 views)
Re: GeoSort approach - your opinion

On Sat, 2008-07-19 at 11:53 +0200, Sascha Fahl wrote:
> last week I implemented an approach for GeoSort in Lucene. Inspired by
> "Lucene in Action", I modified the algorithm in the following way. When
> an IndexReader for a certain index is created, a cache for
> geoinformation is created - this is simply a two-dimensional int array.
> That makes it possible to cache geoinformation for 1,000,000 docs in
> around 8 MB.

Be aware that each array carries its own memory overhead, so
you'll want to use only 3 arrays in total and not 1,000,001:

// One outer array holding two long inner arrays (3 arrays in total).
int[][] coordinates = new int[2][];
coordinates[0] = new int[1000000];
coordinates[1] = new int[1000000];
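
To put rough numbers on the difference, here is a back-of-the-envelope estimate. The 16-byte array header and 4-byte reference sizes are assumptions (the real values depend on the JVM and its settings), alignment padding is ignored, and the class name ArrayLayoutEstimate is made up for this sketch.

```java
// Hypothetical estimate of heap usage for the two possible array layouts.
class ArrayLayoutEstimate {
    static final long ASSUMED_HEADER_BYTES = 16; // assumption, JVM-dependent
    static final long ASSUMED_REF_BYTES = 4;     // assumption, JVM-dependent
    static final long INT_BYTES = 4;

    // new int[2][docs]: one outer array with 2 references + 2 int arrays.
    static long columnLayoutBytes(long docs) {
        long outer = ASSUMED_HEADER_BYTES + 2 * ASSUMED_REF_BYTES;
        long inner = ASSUMED_HEADER_BYTES + INT_BYTES * docs;
        return outer + 2 * inner;
    }

    // new int[docs][2]: one outer array with `docs` references
    // + `docs` small arrays of 2 ints each.
    static long rowLayoutBytes(long docs) {
        long outer = ASSUMED_HEADER_BYTES + ASSUMED_REF_BYTES * docs;
        long inner = ASSUMED_HEADER_BYTES + INT_BYTES * 2;
        return outer + docs * inner;
    }

    public static void main(String[] args) {
        long docs = 1_000_000;
        System.out.println("int[2][docs]: " + columnLayoutBytes(docs) + " bytes");
        System.out.println("int[docs][2]: " + rowLayoutBytes(docs) + " bytes");
    }
}
```

Under these assumptions the 3-array layout stays at about 8 MB for 1,000,000 docs, while one small array per document comes to roughly 28 MB.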

[...]

> What do you think about this approach?

Sounds fine when the index rarely changes.

> I think the only possible disadvantage is the cache, because I do not
> really know whether the JVM is good at handling 10 MB of data in RAM.

The Sun JVM is perfectly capable of handling large arrays efficiently.
We use an array-based structure of ints and longs for quick facet
lookup that is approximately 300 MB.


