Mailing List Archive: Lucene: General

Searching by bit masks

 

 



ltaylor at employon

Nov 9, 2006, 11:30 AM

Post #1 of 2 (1385 views)
Searching by bit masks

Hello,

I am currently evaluating Lucene to see if it would be appropriate to
replace my company's current search software. So far everything has been
looking great; however, there is one requirement that I am not too certain
about.

What we need to do is to be able to store a bit mask specifying various
filter flags for a document in the index and then search this field by
specifying another bit mask with desired filters, returning documents that
have any of the specified flags set. In other words, we are doing a bitwise
AND on the stored filter bit mask and the specified filter bit mask and if it
is non-zero, we want to return the document.
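
For reference, the "any of the specified flags set" test can be sketched in a few lines of plain Java. This is only an illustration of the matching rule, not anything from Lucene; the class and method names are made up for the example. Note that the check is a bitwise AND compared against zero, since a bitwise OR would be non-zero whenever either mask has any bit set.

```java
// Hypothetical helper (not a Lucene API): decides whether a document's stored
// flag mask shares at least one set bit with the mask given in the search.
final class BitMaskFilter {

    // Returns true when at least one flag is set in both masks.
    static boolean matchesAny(long storedMask, long wantedMask) {
        return (storedMask & wantedMask) != 0L;
    }

    public static void main(String[] args) {
        long stored = 0b0000101L; // document has flags 1 and 3 set
        System.out.println(matchesAny(stored, 0b0000100L)); // asks for flag 3
        System.out.println(matchesAny(stored, 0b0000010L)); // asks for flag 2
    }
}
```

Applying this to every one of ~8,000,000 documents at query time would mean a full scan, which is why an index-friendly representation of the flags is worth asking about.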

Before I started toying around with various options myself, I wanted to see
if any of you good folks in the Lucene community had some suggestions for an
efficient way to implement this.

We currently need to index ~8,000,000 documents. We have several filter
flag fields, the most important of which currently has 7 possible flags with
any combination of the flags being valid. The number of flags is expected
to increase rather rapidly in the near future.

My preemptive thanks for your suggestions.

Lawrence Taylor
Senior Software Engineer
Employon
--
View this message in context: http://www.nabble.com/Searching-by-bit-masks-tf2603692.html#a7264721
Sent from the Lucene - General mailing list archive at Nabble.com.


yonik at apache

Nov 9, 2006, 12:23 PM

Post #2 of 2 (1261 views)
Re: Searching by bit masks [In reply to]

On 11/9/06, ltaylor.employon <ltaylor [at] employon> wrote:
> I am currently evaluating Lucene to see if it would be appropriate to
> replace my company's current search software. So far everything has been
> looking great; however, there is one requirement that I am not too certain
> about.
>
> What we need to do is to be able to store a bit mask specifying various
> filter flags for a document in the index and then search this field by
> specifying another bit mask with desired filters, returning documents that
> have any of the specified flags set. In other words, we are doing a bitwise
> AND on the stored filter bit mask and the specified filter bit mask and if it
> is non-zero, we want to return the document.

Lucene maintains an inverted index, so you don't need a bit mask...
you can actually use symbolic values.

doc {
  id = 1
  tags = tag1 tag3 tag7
}

doc {
  id = 2
  tags = tag1 tag2 tag5 tag9
}
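
If the flags already exist as bit masks, turning them into symbolic tokens at indexing time is mechanical. The following is a rough sketch in plain Java, assuming bit 0 maps to "tag1", bit 1 to "tag2", and so on; the class name, method names, and that naming scheme are assumptions made for illustration, not anything Lucene prescribes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Lucene API): expands a flag bit mask into the
// whitespace-separated tokens that would be stored in a "tags" field.
final class FlagTokens {

    // Assumed convention: bit n (0-based) becomes the token "tag" + (n + 1).
    static List<String> toTags(long mask) {
        List<String> tags = new ArrayList<>();
        for (int bit = 0; bit < Long.SIZE; bit++) {
            if ((mask & (1L << bit)) != 0L) {
                tags.add("tag" + (bit + 1));
            }
        }
        return tags;
    }

    // Joins the tokens with single spaces, ready for a whitespace tokenizer.
    static String toFieldValue(long mask) {
        return String.join(" ", toTags(mask));
    }

    public static void main(String[] args) {
        // Bits 0, 2 and 6 set, which corresponds to document 1 above.
        System.out.println(toFieldValue(0b1000101L));
    }
}
```

The same mapping can be applied to the mask supplied at search time to produce the list of terms to OR together, and new flags just become new tokens without changing the field layout.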

Then you can search via a BooleanQuery:

tags:(tag1 OR tag2 OR tag7)
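
To illustrate why an inverted index makes this cheap, here is a toy sketch in plain Java of what such an OR query amounts to conceptually: each term has a list of document ids, and the query result is the union of the lists for the requested terms. This is a simplified model with invented names (TagIndex, searchAny), not how Lucene is actually implemented and not its API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index (illustrative only): maps each tag to the sorted set of
// ids of the documents that carry it.
final class TagIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Records that the document with the given id carries each of the tags.
    void add(int docId, String... tags) {
        for (String tag : tags) {
            postings.computeIfAbsent(tag, k -> new TreeSet<>()).add(docId);
        }
    }

    // OR query: returns the ids of documents that carry at least one tag.
    SortedSet<Integer> searchAny(String... tags) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (String tag : tags) {
            SortedSet<Integer> docs = postings.get(tag);
            if (docs != null) {
                hits.addAll(docs);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TagIndex index = new TagIndex();
        index.add(1, "tag1", "tag3", "tag7");
        index.add(2, "tag1", "tag2", "tag5", "tag9");
        System.out.println(index.searchAny("tag1", "tag2", "tag7"));
    }
}
```

Only the lists for the requested tags are touched, so the cost depends on how many documents carry those tags rather than on the total number of documents in the index.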

If you are new to Lucene, you might check out Solr first. If nothing
else, it would be a gentle introduction to Lucene, and you could build
a custom Lucene implementation later if it doesn't meet your needs.


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
