
Mailing List Archive: Lucene: Java-User

HitCollector

 

 



karthik at controlnet

Aug 13, 2004, 3:33 AM

Post #1 of 2 (929 views)
HitCollector

Hello

Could somebody please explain how to use a HitCollector with a simple
Searcher.search(query) to obtain the hits whose scores fall in the range
between 1.0f and 0.02456f?


Thx in advance



WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]






erickerickson at gmail

Jan 22, 2008, 10:32 AM

Post #2 of 2 (783 views)
Re: HitCollector [In reply to]

The bitset is just an example of a trivial operation in a
HitCollector. Rather than just setting a bit, you'll want to do
something like use TermDocs/TermEnum to see which category each
document is in and increment a per-category count. Or see the idea
at the end of this mail.

That said, I wouldn't do any sorting here. Just use a Sort object in your
original search. You can even do primary and secondary sorts based
upon different fields. Say category first and relevance second etc.
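
To make the "category first, relevance second" ordering concrete, here is a minimal plain-Java sketch of the result order such a two-key sort produces. It does not call Lucene: in Lucene itself this would be expressed with a Sort built from two SortField entries passed to the search. The Hit class, the sortedDocs helper, and the sample doc ids, categories, and scores are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TwoKeySort {

    /** Hypothetical stand-in for one search hit: doc id, category value, relevance score. */
    public static final class Hit {
        public final int doc;
        public final String category;
        public final float score;

        public Hit(int doc, String category, float score) {
            this.doc = doc;
            this.category = category;
            this.score = score;
        }
    }

    /** Primary key: category ascending. Secondary key: score descending. */
    private static final Comparator<Hit> BY_CATEGORY_THEN_SCORE = (a, b) -> {
        int byCategory = a.category.compareTo(b.category);
        return byCategory != 0 ? byCategory : Float.compare(b.score, a.score);
    };

    /** Returns the doc ids in sorted order without modifying the input list. */
    public static List<Integer> sortedDocs(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_CATEGORY_THEN_SCORE);
        List<Integer> docs = new ArrayList<>();
        for (Hit h : copy) {
            docs.add(h.doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(7, "books", 0.4f),
                new Hit(2, "audio", 0.9f),
                new Hit(5, "books", 0.8f),
                new Hit(9, "audio", 0.3f));
        System.out.println(sortedDocs(hits)); // prints [2, 9, 5, 7]
    }
}
```

Within each category the higher-scoring document comes first, which is the behavior you would want from a primary category sort with relevance as the tie-breaker.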

Also, look at TopDocs.

Hits objects are optimized for getting the top 100 or so documents.
Every time you cross a boundary, you re-execute the
query for the next chunk. So, for instance, you'd re-execute
the query when you asked for doc 101, 201, 301, 401 etc. (although
I think there's been some work on the chunk size recently).

So, if you don't ever expect very many documents or if your searches
are *very* cheap, go ahead and use a Hits object. Otherwise you have
to do some extra work for efficiency if speed issues arise.

Be aware that loading the document in a HitCollector is expensive, but
you can do the lazy loading trick and/or just go directly to your indexed
Category data via TermDocs/TermEnum.

And here's a way to get by with just the bitset, depending on your
situation. You could create a filter (or a bitset) for each category
over *all* your documents and cache them, i.e. b1 for category 1,
b2 for category 2, etc. You could build these with the TermDocs/TermEnum
classes. Then, to count how many of your current search hits
fall in each category, AND the bitset from the HitCollector you included in
your e-mail with each of your category bitsets and ask for the cardinality
of the result.
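
The AND-and-cardinality step can be sketched with plain java.util.BitSet, as below. This is only an illustration under assumptions: the per-category bitsets are taken as already built and cached (for example via TermDocs), the hits bitset is whatever the HitCollector filled in, and the CategoryCounts class, the countByCategory helper, and the doc ids are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class CategoryCounts {

    /**
     * For each category, ANDs a copy of the hits bitset with the cached
     * category bitset and records how many bits remain set.
     */
    public static Map<String, Integer> countByCategory(BitSet hits,
                                                       Map<String, BitSet> categoryBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : categoryBits.entrySet()) {
            // Clone first: BitSet.and() mutates the receiver, and neither the
            // hits nor the cached category bitsets should change.
            BitSet overlap = (BitSet) hits.clone();
            overlap.and(entry.getValue());
            counts.put(entry.getKey(), overlap.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical cached bitsets: docs 0, 1, 4 are in category1; docs 2, 3, 5 in category2.
        BitSet b1 = new BitSet();
        b1.set(0); b1.set(1); b1.set(4);
        BitSet b2 = new BitSet();
        b2.set(2); b2.set(3); b2.set(5);

        Map<String, BitSet> categoryBits = new LinkedHashMap<>();
        categoryBits.put("category1", b1);
        categoryBits.put("category2", b2);

        // Hypothetical hits collected for the current query: docs 1, 2, 3.
        BitSet hits = new BitSet();
        hits.set(1); hits.set(2); hits.set(3);

        System.out.println(countByCategory(hits, categoryBits)); // prints {category1=1, category2=2}
    }
}
```

The cached category bitsets are built once and reused across queries, so each search only pays for one AND and one cardinality call per category.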

Best
Erick

On Jan 22, 2008 12:37 PM, Cam Bazz <cambazz [at] gmail> wrote:

> Hello,
>
> Could someone show me a concrete example of how to use HitCollector?
> I have documents which have a field category. When I run a query, I need
> to sort results by category as well as count how many hits are there for a
> given category.
>
> I understand:
>
> searcher.search(query, new HitCollector() {
>     public void collect(int docNum, float score) {
>         // java.util.BitSet has set(int), not add(int)
>         bitSet.set(docNum);
>     }
> });
>
>
> So we now have a bitset that contains docnums.
>
> How do we do sorting and filtering over this, and why is it more efficient
> to do it from hits?
>
> Best Regards,
> -C.B.
>
