
Mailing List Archive: Lucene: Java-User

benefit of combining fields into one vs. booleanQuery

 

 



ab at taktik

Aug 21, 2007, 1:38 PM

benefit of combining fields into one vs. booleanQuery

Hi everyone,

My question: I have media items with a "title" field, a "caption"
field and a "keywords" field.

I want to be able to search all three fields at the same time. For
example, if I search for "black car", the boolean query is this
combination of TermQueries:

(title=black OR keywords=black OR caption=black) AND (title=car OR
keywords=car OR caption=car)

So if "black" is in caption and "car" is in title I must find the media.
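
To make the shape of that query concrete, here is a minimal sketch that only builds the query string in Lucene's classic query parser syntax (`+` marks a required clause, `field:term` targets a field). The class and method names are hypothetical and not part of Lucene, the sketch assumes the parser's default OR operator inside each group, and it does not escape special characters in the terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: it only assembles a query string.
class MultiFieldQuerySketch {

    // For each term, emit one required (+) group that ORs the term across all fields.
    static String build(String[] fields, String[] terms) {
        List<String> groups = new ArrayList<>();
        for (String term : terms) {
            List<String> clauses = new ArrayList<>();
            for (String field : fields) {
                clauses.add(field + ":" + term);
            }
            groups.add("+(" + String.join(" ", clauses) + ")");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        String[] fields = {"title", "keywords", "caption"};
        String[] terms = {"black", "car"};
        // prints: +(title:black keywords:black caption:black) +(title:car keywords:car caption:car)
        System.out.println(build(fields, terms));
    }
}
```

The number of term clauses is fields × terms, so a five-word search over three fields gives fifteen clauses.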

I'm afraid that those boolean queries will be slow when there are a
lot of terms in the query.

I can, at index time, add a "fulltext" field to each media item that
contains the title, caption and keywords concatenated.

The query becomes: (fulltext=black AND fulltext=car), much simpler.

But I must still be able to search only in title, only in caption
or only in keywords, so I must still index the other fields, doubling
the indexed terms.
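
A rough sketch of that alternative, in plain Java with no Lucene calls: the document keeps its three original fields and gains a concatenated "fulltext" field, and the query needs only one required clause per term. The class name, method names and the use of a map to stand in for a document are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not part of Lucene: a map stands in for a document.
class FulltextFieldSketch {

    // Keeps the individual fields and adds "fulltext" as their space-joined concatenation.
    static Map<String, String> withFulltext(String title, String caption, String keywords) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", title);
        doc.put("caption", caption);
        doc.put("keywords", keywords);
        doc.put("fulltext", String.join(" ", title, caption, keywords));
        return doc;
    }

    // One required clause per term, all against the single combined field.
    static String buildQuery(String[] terms) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("+fulltext:").append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = withFulltext("Red car", "A black car at night", "auto vehicle");
        System.out.println(doc.get("fulltext"));
        // prints: +fulltext:black +fulltext:car
        System.out.println(buildQuery(new String[] {"black", "car"}));
    }
}
```

Every term now appears in two fields, which is the doubling mentioned above; the query side shrinks from fields × terms clauses to one clause per term.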

Has anyone done something similar? Is it worth it, or will the first
boolean query remain fast enough?

Thx,

Antoine




--
Antoine Baudoux
Development Manager
ab [at] taktik
Tél.: +32 2 333 58 44
GSM: +32 499 534 538
Fax.: +32 2 648 16 53


erickerickson at gmail

Aug 21, 2007, 1:47 PM

Re: benefit of combining fields into one vs. booleanQuery

A three-field boolean query isn't very complex, so I don't
think that's a problem, although it does depend a bit on
how many terms you allow.

But I'd try the simplest thing first, which would be to put
all the terms in a fulltext field as well as in the individual fields.

Then get some performance measurements and some
idea of what the total size of the index will be and make
some decisions at that point.
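
For the measurement step, a very rough timing helper might look like the sketch below. It is plain Java with hypothetical names, and the two tasks in `main` are placeholders where the real searches (multi-field query vs. fulltext query) would go. Wall-clock averages like this are noisy, so treat the numbers as indicative only.

```java
// Hypothetical timing helper, not part of Lucene.
class QueryTimingSketch {

    // Runs the task `warmup` times untimed, then `runs` times timed,
    // and returns the mean wall-clock time per timed run in nanoseconds.
    static double meanNanos(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) runs;
    }

    public static void main(String[] args) {
        // Placeholders: replace the bodies with the two real searches to compare.
        Runnable multiFieldSearch = () -> { /* run the three-field boolean query here */ };
        Runnable fulltextSearch = () -> { /* run the single-field query here */ };

        System.out.println("multi-field: " + meanNanos(multiFieldSearch, 100, 1000) + " ns");
        System.out.println("fulltext:    " + meanNanos(fulltextSearch, 100, 1000) + " ns");
    }
}
```

Running both variants against the same index with a realistic mix of query lengths gives the numbers to weigh against the extra index size.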

It's actually remarkably easy to switch from one
of those solutions to the other if performance isn't what
you need. In general, I've had better luck not worrying
about space and going for simple code, *then* changing
things around if there's a problem.

Best
Erick

