Mailing List Archive: Lucene: General

OR'ed boolean queries


melola at seinet

Jul 21, 2005, 9:51 AM

OR'ed boolean queries

Hello

I don't understand exactly how PrefixQuery, WildcardQuery, RangeQuery and FuzzyQuery expand into a series of OR'ed boolean queries.

For example, I have an index with 200,000 records. Each record has two metadata fields, NAMEFILE and AGENCY. If I run the search
NAMEFILE:ef*
I get a TooManyClauses error, but if I run the search
AGENCY:ef*
I get the correct results without any error.

Both fields have 200,000 values, but the AGENCY field contains only about 30 distinct values, whereas in the NAMEFILE field each record has a unique value.

Both fields were indexed with Field.Text.

The same happens with RangeQuery. For example:

The user selects PAGE > 0. Internally this is translated to PAGE:{0000000000 TO 2147483647} (2147483647 is Integer.MAX_VALUE).
This returns 130,000 records with a value > 0 without a TooManyClauses error, but with other numeric fields I get a TooManyClauses error.

maxClauseCount is set to its default value (1024).

Could anybody explain how this works?



Thanks in advance


Mari Luz Elola


otis_gospodnetic at yahoo

Jul 21, 2005, 10:20 AM

Re: OR'ed boolean queries

The problem is that you have a lot of NAMEFILEs that start with "ef".
"A lot" means "more than 1024":
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/BooleanQuery.html#getMaxClauseCount()

You could change it with this:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/BooleanQuery.html#setMaxClauseCount(int)

Otis




melola at seinet

Jul 21, 2005, 10:28 AM

Re: OR'ed boolean queries

But the AGENCY field also has a lot of "ef" values, more than 1024.
What is the difference between NAMEFILE and AGENCY? Why do I get the
maxClause error with NAMEFILE and not with AGENCY?
If I set maxClauseCount to a large value, I get an OutOfMemoryError.




hossman_lucene at fucit

Jul 29, 2005, 6:33 PM

Re: OR'ed boolean queries

: But, the metadata AGENCY has a lot of "ef" too, more than 1024.
: What is the difference between NAMEFILE and AGENCY. Why I am getting
: maxClause error with NAMEFILE and not with AGENCY??

the issue is not the number of documents that have a value with that
prefix in that field -- the issue is the number of unique values that have
that prefix in that field -- regardless of the number of documents that
use each value.
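A minimal, self-contained Java model of the point above may help. It is not Lucene's code: it treats one field as a sorted map from term to document ids, rewrites `ef*` into one OR'ed clause per unique matching term, and fails once the clause count passes 1024, the default maxClauseCount from this thread. The class, method and exception names are hypothetical, and the field contents are invented to mirror the original question (30 AGENCY values shared by 200,000 documents versus one unique NAMEFILE value per document).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class PrefixExpansionModel {
    // Default limit discussed in the thread.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Hypothetical stand-in for the TooManyClauses error.
    static class TooManyClauses extends RuntimeException {
        TooManyClauses() {
            super("more than " + MAX_CLAUSE_COUNT + " clauses");
        }
    }

    // Toy inverted index for one field: unique term -> ids of the documents using it.
    static SortedMap<String, List<Integer>> index(List<String> values) {
        SortedMap<String, List<Integer>> terms = new TreeMap<>();
        for (int doc = 0; doc < values.size(); doc++) {
            terms.computeIfAbsent(values.get(doc), k -> new ArrayList<>()).add(doc);
        }
        return terms;
    }

    // Rewrites "prefix*" into OR'ed term clauses: one per UNIQUE matching term,
    // no matter how many documents share that term.
    static List<String> expandPrefix(SortedMap<String, List<Integer>> terms, String prefix) {
        List<String> clauses = new ArrayList<>();
        // Terms are sorted, so matches start at tailMap(prefix) and end at the first non-match.
        for (String term : terms.tailMap(prefix).keySet()) {
            if (!term.startsWith(prefix)) break;
            clauses.add(term);
            if (clauses.size() > MAX_CLAUSE_COUNT) throw new TooManyClauses();
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<String> agency = new ArrayList<>();
        List<String> namefile = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            agency.add("ef" + (i % 30));     // invented: 30 distinct values, each shared by many docs
            namefile.add("ef" + i + ".doc"); // invented: a unique value per document
        }
        System.out.println("AGENCY:ef* -> " + expandPrefix(index(agency), "ef").size() + " clauses");
        try {
            expandPrefix(index(namefile), "ef");
        } catch (TooManyClauses e) {
            System.out.println("NAMEFILE:ef* -> " + e.getMessage());
        }
    }
}
```

Run as-is, it prints `AGENCY:ef* -> 30 clauses` and `NAMEFILE:ef* -> more than 1024 clauses`, even though both fields cover the same 200,000 documents: the number of documents never enters the clause count.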

: If I change maxClauseCount to a big value, I am getting OutOfMemoryError.

you can either increase your memory footprint, or you can abandon the use
of prefix queries in this situation. there are a lot of other options for
achieving similar results -- using a custom filter, making more fields
that contain only the first few characters of the field for the purpose of
doing short prefix queries ... etc.
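Two of these alternatives can be sketched with the same kind of toy index, again in plain Java with hypothetical names rather than Lucene's own classes. The first is filter-style: it visits the matching terms one by one and marks their documents in a `java.util.BitSet`, so no clause per term is ever built and memory stays at roughly one bit per document. The second assumes an extra field (hypothetically holding the first two characters of each NAMEFILE value) so that `ef*` becomes a lookup of the single term `ef`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class PrefixAlternatives {
    // Toy inverted index for one field: unique term -> ids of the documents using it.
    static SortedMap<String, List<Integer>> index(List<String> values) {
        SortedMap<String, List<Integer>> terms = new TreeMap<>();
        for (int doc = 0; doc < values.size(); doc++) {
            terms.computeIfAbsent(values.get(doc), k -> new ArrayList<>()).add(doc);
        }
        return terms;
    }

    // Alternative 1 (filter-style): walk the matching terms and set one bit per matching
    // document. No per-term clause is built, so there is no clause limit to hit.
    static BitSet prefixFilter(SortedMap<String, List<Integer>> terms, String prefix, int numDocs) {
        BitSet bits = new BitSet(numDocs);
        for (Map.Entry<String, List<Integer>> e : terms.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // sorted: first non-match ends the run
            for (int doc : e.getValue()) bits.set(doc);
        }
        return bits;
    }

    // Alternative 2 (extra field): at index time, also store the first n characters of each
    // value, so a prefix query of exactly n characters becomes a single-term lookup.
    static SortedMap<String, List<Integer>> prefixField(List<String> values, int n) {
        List<String> heads = new ArrayList<>();
        for (String v : values) heads.add(v.length() <= n ? v : v.substring(0, n));
        return index(heads);
    }

    public static void main(String[] args) {
        List<String> namefile = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            namefile.add((i % 2 == 0 ? "ef" : "zz") + i + ".doc"); // invented unique file names
        }
        BitSet hits = prefixFilter(index(namefile), "ef", namefile.size());
        System.out.println("filter: " + hits.cardinality() + " documents");

        List<Integer> docs = prefixField(namefile, 2).getOrDefault("ef", new ArrayList<>());
        System.out.println("prefix field: " + docs.size() + " documents from 1 term");
    }
}
```

Both paths find the same 100,000 documents here. The extra-field approach only helps for prefixes of exactly the stored length; a query such as `efg*` would still need the original field (or another prefix field) to narrow the results.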

In my opinion, understanding the way PrefixQuery (and RangeQuery) expand
to BooleanQueries, and why it can cause TooManyClauses exceptions is the
second most important thing people using Lucene need to understand (after
Analyzers). Take the time to read up on it in the wiki, the mailing list
archives, and LIA (Lucene in Action) -- it's worth it.
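The same reasoning carries over to the RangeQuery case from the original question. The sketch below (plain Java, hypothetical names, invented data) zero-pads numbers to 10 digits, as in `PAGE:{0000000000 TO 2147483647}`, so that string order matches numeric order, then counts one clause per unique term strictly inside the exclusive `{...}` bounds. It assumes PAGE has few distinct values (500 here) while another numeric field, called SIZE here purely for illustration, has a different value in almost every document.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RangeExpansionModel {
    // Default limit discussed in the thread.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Zero-pad to 10 digits (enough for Integer.MAX_VALUE) so that
    // lexicographic term order matches numeric order.
    static String pad(int n) {
        return String.format("%010d", n);
    }

    // Number of OR'ed clauses an exclusive range {lower TO upper} would expand to:
    // one per UNIQUE indexed term strictly between the bounds. The map values
    // (document frequencies) are ignored, which is exactly the point.
    static int clausesForExclusiveRange(SortedMap<String, Integer> termDocFreq,
                                        String lower, String upper) {
        int clauses = 0;
        // subMap includes 'lower' and excludes 'upper'; skip 'lower' so both ends are exclusive
        for (String term : termDocFreq.subMap(lower, upper).keySet()) {
            if (!term.equals(lower)) clauses++;
        }
        return clauses;
    }

    static void report(String field, int clauses) {
        System.out.println(field + " -> " + clauses + " clauses ("
                + (clauses > MAX_CLAUSE_COUNT ? "over " : "within ") + MAX_CLAUSE_COUNT + ")");
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> page = new TreeMap<>();
        SortedMap<String, Integer> size = new TreeMap<>();
        for (int doc = 0; doc < 200_000; doc++) {
            page.merge(pad(doc % 500), 1, Integer::sum);   // invented: 500 distinct page counts
            size.merge(pad(doc * 7 + 1), 1, Integer::sum); // invented: a distinct value per document
        }
        String lower = pad(0), upper = pad(Integer.MAX_VALUE);
        report("PAGE", clausesForExclusiveRange(page, lower, upper));
        report("SIZE", clausesForExclusiveRange(size, lower, upper));
    }
}
```

With this data it prints `PAGE -> 499 clauses (within 1024)` and `SIZE -> 200000 clauses (over 1024)`: the range is identical for both fields, only the number of distinct indexed values differs.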





-Hoss
