Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Lucene: Java-Dev

TrieRangeQuery for contrib? was: [jira] Commented: (LUCENE-1461) Cached filter for a single term field

 

 

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded


uwe at thetaphi

Nov 25, 2008, 3:28 AM

Post #1 of 2 (2272 views)
Permalink
TrieRangeQuery for contrib? was: [jira] Commented: (LUCENE-1461) Cached filter for a single term field

I do not want to attach this to the issue report, just for completeness:

I implemented (based on RangeFilter) another approach for faster
RangeQueries, based on longs stored in index in a special format.

The idea behind this is to store the longs in different precision in index
and partition the query range in such a way, that the outer boundaries are
search using terms from the highest precision, but the center of the search
Range with lower precision. The implementation stores the longs in 8
different precisions (using a class called TrieUtils). It also has support
for Doubles, using the IEEE 754 floating-point "double format" bit layout
with some bit mappings to make them binary sortable. The approach is used in
rather big indexes, query times are even on low performance desktop
computers <<100 ms (!) for very big ranges on indexes with 500000 docs.

I called this RangeQuery variant and format "TrieRangeRange" query because
the idea looks like the well-known Trie structures (but it is not identical
to real tries, but algorithms are related to it).

I was thinking about supplying this to Lucene contrib, any thoughts about
it?

More infos from the JavaDoc of the project:
http://www.panfmp.org/apidocs/de/pangaea/metadataportal/search/TrieRangeQuer
y.html
http://www.panfmp.org/apidocs/de/pangaea/metadataportal/utils/TrieUtils.html


Source:
http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/me
tadataportal/search/TrieRangeQuery.java?view=markup
http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/me
tadataportal/utils/TrieUtils.java?view=markup


Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe [at] thetaphi

> From: Michael McCandless (JIRA) [mailto:jira [at] apache]
> Sent: Tuesday, November 25, 2008 12:14 PM
> To: java-dev [at] lucene
> Subject: [jira] Commented: (LUCENE-1461) Cached filter for a single term
> field
>
>
> [ https://issues.apache.org/jira/browse/LUCENE-
> 1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel&focusedCommentId=12650530#action_12650530 ]
>
> Michael McCandless commented on LUCENE-1461:
> --------------------------------------------
>
> Should we absorb RangeMultiFilter into RangeFilter, and add a "setMethod"
> to RangeFilter?
>
> Someday, I think RangeFilter and RangeQuery should be implemented using
> hierarchical ranges (there was a reference to a page in the wiki,
> specifically about date range searching, recently), which would be another
> method.
>
> > Cached filter for a single term field
> > -------------------------------------
> >
> > Key: LUCENE-1461
> > URL: https://issues.apache.org/jira/browse/LUCENE-1461
> > Project: Lucene - Java
> > Issue Type: New Feature
> > Reporter: Tim Sturge
> > Attachments: DisjointMultiFilter.java, LUCENE-1461a.patch,
> RangeMultiFilter.java, RangeMultiFilter.java, TermMultiFilter.java
> >
> >
> > These classes implement inexpensive range filtering over a field
> containing a single term. They do this by building an integer array of
> term numbers (storing the term->number mapping in a TreeMap) and then
> implementing a fast integer comparison based DocSetIdIterator.
> > This code is currently being used to do age range filtering, but could
> also be used to do other date filtering or in any application where there
> need to be multiple filters based on the same single term field. I have an
> untested implementation of single term filtering and have considered but
> not yet implemented term set filtering (useful for location based
> searches) as well.
> > The code here is fairly rough; it works but lacks javadocs and
> toString() and hashCode() methods etc. I'm posting it here to discover if
> there is other interest in this feature; I don't mind fixing it up but
> would hate to go to the effort if it's not going to make it into Lucene.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> For additional commands, e-mail: java-dev-help [at] lucene



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene


lucene at mikemccandless

Nov 25, 2008, 9:41 AM

Post #2 of 2 (2087 views)
Permalink
Re: TrieRangeQuery for contrib? was: [jira] Commented: (LUCENE-1461) Cached filter for a single term field [In reply to]

I would love to see this find it's way into Lucene!

It puts a logarithmic bound (I think?) on the number of terms whose
docs must be visited to implement the range query, at the expense
of an increase in index size for those fields that need range search.

For large indexes this should make an incredible difference on
RangeQuery performance.

Mike

Uwe Schindler wrote:

> I do not want to attach this to the issue report, just for
> completeness:
>
> I implemented (based on RangeFilter) another approach for faster
> RangeQueries, based on longs stored in index in a special format.
>
> The idea behind this is to store the longs in different precision in
> index
> and partition the query range in such a way, that the outer
> boundaries are
> search using terms from the highest precision, but the center of the
> search
> Range with lower precision. The implementation stores the longs in 8
> different precisions (using a class called TrieUtils). It also has
> support
> for Doubles, using the IEEE 754 floating-point "double format" bit
> layout
> with some bit mappings to make them binary sortable. The approach is
> used in
> rather big indexes, query times are even on low performance desktop
> computers <<100 ms (!) for very big ranges on indexes with 500000
> docs.
>
> I called this RangeQuery variant and format "TrieRangeRange" query
> because
> the idea looks like the well-known Trie structures (but it is not
> identical
> to real tries, but algorithms are related to it).
>
> I was thinking about supplying this to Lucene contrib, any thoughts
> about
> it?
>
> More infos from the JavaDoc of the project:
> http://www.panfmp.org/apidocs/de/pangaea/metadataportal/search/TrieRangeQuer
> y.html
> http://www.panfmp.org/apidocs/de/pangaea/metadataportal/utils/TrieUtils.html
>
>
> Source:
> http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/me
> tadataportal/search/TrieRangeQuery.java?view=markup
> http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/me
> tadataportal/utils/TrieUtils.java?view=markup
>
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe [at] thetaphi
>
>> From: Michael McCandless (JIRA) [mailto:jira [at] apache]
>> Sent: Tuesday, November 25, 2008 12:14 PM
>> To: java-dev [at] lucene
>> Subject: [jira] Commented: (LUCENE-1461) Cached filter for a single
>> term
>> field
>>
>>
>> [ https://issues.apache.org/jira/browse/LUCENE-
>> 1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-
>> tabpanel&focusedCommentId=12650530#action_12650530 ]
>>
>> Michael McCandless commented on LUCENE-1461:
>> --------------------------------------------
>>
>> Should we absorb RangeMultiFilter into RangeFilter, and add a
>> "setMethod"
>> to RangeFilter?
>>
>> Someday, I think RangeFilter and RangeQuery should be implemented
>> using
>> hierarchical ranges (there was a reference to a page in the wiki,
>> specifically about date range searching, recently), which would be
>> another
>> method.
>>
>>> Cached filter for a single term field
>>> -------------------------------------
>>>
>>> Key: LUCENE-1461
>>> URL: https://issues.apache.org/jira/browse/
>>> LUCENE-1461
>>> Project: Lucene - Java
>>> Issue Type: New Feature
>>> Reporter: Tim Sturge
>>> Attachments: DisjointMultiFilter.java, LUCENE-1461a.patch,
>> RangeMultiFilter.java, RangeMultiFilter.java, TermMultiFilter.java
>>>
>>>
>>> These classes implement inexpensive range filtering over a field
>> containing a single term. They do this by building an integer array
>> of
>> term numbers (storing the term->number mapping in a TreeMap) and then
>> implementing a fast integer comparison based DocSetIdIterator.
>>> This code is currently being used to do age range filtering, but
>>> could
>> also be used to do other date filtering or in any application where
>> there
>> need to be multiple filters based on the same single term field. I
>> have an
>> untested implementation of single term filtering and have
>> considered but
>> not yet implemented term set filtering (useful for location based
>> searches) as well.
>>> The code here is fairly rough; it works but lacks javadocs and
>> toString() and hashCode() methods etc. I'm posting it here to
>> discover if
>> there is other interest in this feature; I don't mind fixing it up
>> but
>> would hate to go to the effort if it's not going to make it into
>> Lucene.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
>> For additional commands, e-mail: java-dev-help [at] lucene
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
> For additional commands, e-mail: java-dev-help [at] lucene
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe [at] lucene
For additional commands, e-mail: java-dev-help [at] lucene

Lucene java-dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.