Mailing List Archive: Lucene: Java-User

Restricting search results to a dynamic slice of documents



earlhood at gmail

May 4, 2012, 11:51 AM

Post #1 of 3
Restricting search results to a dynamic slice of documents

I require the ability to perform a search on a dynamic slice of documents in
an index. For a given event, only a select set of documents should be
considered when performing a query.

Looking at the API, it appears that I can use a Collector during the search
to filter out any documents that do not match the current allowed set.
However, the API docs state the following about the collect() method of
Collector:

Note: This is called in an inner search loop. For good search
performance, implementations of this method should not call
Searcher.doc(int) or IndexReader.document(int) on every hit. Doing so
can slow searches by an order of magnitude or more.

Unfortunately, it appears I need to use such methods since I will need to
access the specific document fields to determine if the document is part of
the allowable search set.

Is the performance hit considerable?

I noticed some information about term caching and filtering, but I'm a bit
fuzzy on exactly how to use it and whether it is applicable to what I'm
trying to do.
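
As a rough illustration of the term-caching idea mentioned above: instead of
loading a stored document for every hit, the per-document field values are read
once into an array indexed by doc ID, and each hit is checked against that
array. The sketch below uses plain Java only. The class name, the
GROUP_BY_DOC array, and the "group" values are hypothetical stand-ins, not
Lucene API; in Lucene the cached values would come from the field cache rather
than a hard-coded array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CachedFieldCollectorSketch {
    /** Hypothetical stand-in for a field cache: one value per document, indexed by doc ID. */
    public static final String[] GROUP_BY_DOC = {"a", "b", "a", "c", "b"};

    /** Keeps only the hits whose cached field value is in the allowed set. */
    public static List<Integer> collectAllowed(int[] hits, String[] fieldByDoc, Set<String> allowed) {
        List<Integer> kept = new ArrayList<>();
        for (int docId : hits) {
            // One array lookup plus one hash probe per hit; no stored document is loaded.
            if (allowed.contains(fieldByDoc[docId])) {
                kept.add(docId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] hits = {0, 1, 3, 4}; // doc IDs matched by some query (assumed)
        Set<String> allowed = new HashSet<>(Arrays.asList("a", "c"));
        System.out.println(collectAllowed(hits, GROUP_BY_DOC, allowed)); // prints [0, 3]
    }
}
```

The cost per hit is then independent of document size, which is the main
reason the collect() documentation steers away from Searcher.doc(int).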

Any help is appreciated,

--ewh

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


erickerickson at gmail

May 5, 2012, 9:38 AM

Post #2 of 3
Re: Restricting search results to a dynamic slice of documents [In reply to]

On the face of it, it looks like one of the subclasses of
org.apache.lucene.search.Filter should be what you're looking for. Or is the
"dynamic slice" something you couldn't formulate into a query?

Best
Erick



earl at earlhood

May 5, 2012, 4:04 PM

Post #3 of 3
Re: Restricting search results to a dynamic slice of documents [In reply to]

On Sat, May 5, 2012 at 11:38 AM, Erick Erickson wrote:
> On the face of it, it looks like one of the subclasses of
> org.apache.lucene.search.Filter should be what you're looking for. Or is the
> "dynamic slice" something you couldn't formulate into a query?

The query route is possible, but it would make for a large query.
I'm not sure how Lucene will cope with a large boolean query.

I looked into Filters and it appears FieldCacheTermsFilter may do what I
need, but I may have to develop a wrapper version of it, since at least
two different fields have to be checked to verify that a document is
allowed to be included in the result set.
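
One way to picture the two-field check: build one bit set of allowed doc IDs
per field, then intersect them, so a document passes only if both fields
qualify. The sketch below is plain Java with hypothetical names and data
("owner", "status", and the sample values are assumptions for illustration);
it is not the FieldCacheTermsFilter implementation. Lucene's contrib modules
of that era also shipped filter combinators such as BooleanFilter and
ChainedFilter, which may make a hand-written wrapper unnecessary, though that
is worth checking against the version in use.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class TwoFieldFilterSketch {
    /** Sets one bit per document whose cached field value is in the allowed set. */
    public static BitSet allowedDocs(String[] fieldByDoc, Set<String> allowed) {
        BitSet bits = new BitSet(fieldByDoc.length);
        for (int docId = 0; docId < fieldByDoc.length; docId++) {
            if (allowed.contains(fieldByDoc[docId])) {
                bits.set(docId);
            }
        }
        return bits;
    }

    /** Intersects two per-field bit sets without modifying either input. */
    public static BitSet both(BitSet first, BitSet second) {
        BitSet result = (BitSet) first.clone();
        result.and(second);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical cached field values, indexed by doc ID.
        String[] owner = {"ann", "bob", "ann", "cat"};
        String[] status = {"open", "open", "closed", "open"};

        BitSet byOwner = allowedDocs(owner, new HashSet<>(Arrays.asList("ann", "cat")));
        BitSet byStatus = allowedDocs(status, new HashSet<>(Arrays.asList("open")));

        System.out.println(both(byOwner, byStatus)); // prints {0, 3}
    }
}
```

The combined bit set can be built once per event and reused for every query
issued against that slice, so the per-field scans are not repeated per search.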

Looks like I have some experimentation to do.

Thanks for the response,

--ewh
