Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-User

R-Tree in lucene thoughts?

 

 

Lucene java-user RSS feed   Index | Next | Previous | View Threaded


ryantxu at gmail

Jan 7, 2010, 7:13 AM

Post #1 of 4 (834 views)
Permalink
R-Tree in lucene thoughts?

I'm getting back into the spatial search stuff and wanted to get some
feedback before starting down any path...

I'm building an app, where I need R-tree like functionality -- that is
search for all items within some extent / that touch some extent. If
anyone has ideas for how this could map to tokens, that would be
great... but i don't know of anything yet.

Currently, I index an extent as "XMin XMax YMin YMax", then have a
custom filter that loads an in memory R-Tree for each Reader from the
FieldCache. This works great, but the memory requirements get big as
the number of documents gets large. Also it takes a while to warm up.

With the new flexible indexing stuff, would it be possible to natively
write an rtree to disk in the index process? Somehow using the lucene
doc id? Alternatively, I can look at writing the spatial index to
some other format (perhaps http://hatbox.sourceforge.net/) and then
try to keep lucene doc ids in sync with application ids.

Any thoughts on what may be possible?

It looks like Lucy has considered something similar:
http://www.lucidimagination.com/search/document/75ac07b7e2d6160d/pluggable_indexreader_was_real_time_updates


thanks
ryan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


marvin at rectangular

Jan 7, 2010, 9:21 AM

Post #2 of 4 (766 views)
Permalink
Re: R-Tree in lucene thoughts? [In reply to]

On Thu, Jan 07, 2010 at 10:13:00AM -0500, Ryan McKinley wrote:
> With the new flexible indexing stuff, would it be possible to natively
> write an rtree to disk in the index process?

The question I'd have is, how would you handle interleaving of hits from
different segments? Meaning, if you have a priority queue with 10 slots, 10
hits from one segment, and 10 hits from another segment, how do you determine
which his win and which fall out the bottom of the queue?

> It looks like Lucy has considered something similar:
> http://www.lucidimagination.com/search/document/75ac07b7e2d6160d/pluggable_indexreader_was_real_time_updates

Yes. That post is from last spring; since then, a prototype of the Lucy
pluggable IndexReader design has been implemented in KinoSearch, so you could
write such a component today.

Marvin Humphrey


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


ryantxu at gmail

Jan 7, 2010, 10:03 AM

Post #3 of 4 (768 views)
Permalink
Re: R-Tree in lucene thoughts? [In reply to]

On Jan 7, 2010, at 12:21 PM, Marvin Humphrey wrote:

> On Thu, Jan 07, 2010 at 10:13:00AM -0500, Ryan McKinley wrote:
>> With the new flexible indexing stuff, would it be possible to
>> natively
>> write an rtree to disk in the index process?
>
> The question I'd have is, how would you handle interleaving of hits
> from
> different segments? Meaning, if you have a priority queue with 10
> slots, 10
> hits from one segment, and 10 hits from another segment, how do you
> determine
> which his win and which fall out the bottom of the queue?

(I am not familiar enough with the segment internals, so I may be way
off base...)

Perhaps each segment has its own r-tree. For each query, the r-tree
would only be used to say if something could be in the results, it
does not contribute to the score, so I don't think interleaving would
be any different then how it currently works (without knowing how it
currently works) Potentially, the node depth could influence the
score, but lets ignore that for now :)


ryan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


marvin at rectangular

Jan 7, 2010, 10:35 AM

Post #4 of 4 (765 views)
Permalink
Re: R-Tree in lucene thoughts? [In reply to]

On Thu, Jan 07, 2010 at 01:03:27PM -0500, Ryan McKinley wrote:

> Perhaps each segment has its own r-tree. For each query, the r-tree would
> only be used to say if something could be in the results, it does not
> contribute to the score, so I don't think interleaving would be any
> different then how it currently works (without knowing how it currently
> works) Potentially, the node depth could influence the score, but lets
> ignore that for now :)

Yes, if the results of the r-tree search are only for filtering, it wouldn't
matter. If they are used to contribute to a score, interleaving is also easy,
because you just compare scores and the higher score wins.

It's only important if you wanted to, say, sort by distance.

Marvin Humphrey


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene

Lucene java-user RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.