
Mailing List Archive: kinosearch: discuss

(no subject)


golbin at gmail

Jul 9, 2006, 2:11 PM

Post #1 of 6
(no subject)

Hi.

First of all, thank you to the KinoSearch developers, and sorry for my
poor English; I am Korean.

I have one question.

How do I sort search results in KinoSearch?

I want to sort the search results by a numeric field.

Please teach me.

Thank you.


marvin at rectangular

Jul 10, 2006, 7:43 PM

Post #2 of 6
(no subject) [In reply to]

Hello,

On Jul 9, 2006, at 2:06 PM, golbin [at] gmai wrote:
> How do I sort search results in KinoSearch?
>
> I want to sort the search results by a numeric field.

I'm sorry, but it's not currently possible to sort KinoSearch search
results on anything except relevance score.

Best,

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


tony-kino at kasei

Jul 11, 2006, 2:04 AM

Post #3 of 6
(no subject) [In reply to]

On Mon, Jul 10, 2006 at 07:38:05PM -0700, Marvin Humphrey wrote:
> >How do I sort search results in KinoSearch?
> >I want to sort the search results by a numeric field.
> I'm sorry, but it's not currently possible to sort KinoSearch search
> results on anything except relevance score.

Couldn't this be done the same way as the approach in the 'filtering
search results' thread?

Pre-cache the score you want to sort by, then get the BitVector of
search results, map them to the score array and sort.

I haven't actually done this yet, but it seems like it would be quite
simple (as long as you're able to re-build your scoring cache each time
you update the index). Or am I missing something?
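A minimal, self-contained Perl sketch of the approach described above, assuming the sort values have already been cached in an array indexed by document number. It does not use KinoSearch at all: a plain bit string manipulated with Perl's built-in vec() stands in for the BitVector of matches, and @price_cache and its values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical pre-cached sort values, indexed by document number
# (e.g. each document's price). Must be rebuilt when the index changes.
my @price_cache = (19.99, 5.00, 42.50, 7.25, 13.00);

# A plain Perl bit string standing in for a BitVector of matching docs;
# here documents 0, 2, 3 and 4 matched the query.
my $matches = '';
vec($matches, $_, 1) = 1 for (0, 2, 3, 4);

# Expand the bit string into a list of matching document numbers.
my @doc_nums = grep { vec($matches, $_, 1) } 0 .. $#price_cache;

# Map each match to its cached value and sort, low to high.
my @sorted = sort { $price_cache[$a] <=> $price_cache[$b] } @doc_nums;

print "@sorted\n";
```

With this data it prints "3 4 0 2". Only the matching documents are sorted at query time; the expensive part is keeping @price_cache in sync with the index.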

Tony


gavin at myknobs

Jul 11, 2006, 8:12 AM

Post #4 of 6
(no subject) [In reply to]

Tony Bowden wrote:

>Couldn't this be done the same way as the approach in the 'filtering
>search results' thread?
>
>
I've been playing with this idea.

Sorting is probably the most important feature needed for searching
outside of a "document orientated" search. Searching by price, and
sorting low to high, is a feature most customers look for.

To build the cache:

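use Storable qw(store retrieve);

my %prices = ();

for (my $i = 0; $i < $searcher->max_doc; $i++) {
    my $doc = $searcher->fetch_doc($i);
    $prices{$i} = $doc->get_value('price');
}

# Order doc numbers by price (low to high), then invert that ordering
# so $rank[$doc_num] is the document's position in it.
my @by_price = sort { $prices{$a} <=> $prices{$b} } keys %prices;
my @rank;
$rank[ $by_price[$_] ] = $_ for 0 .. $#by_price;

store(\@rank, "/tmp/price_order");

Searching:

my $rank = retrieve("/tmp/price_order");

# get $bit_vec as before
my $doc_nums = $bit_vec->to_arrayref;

# Visit the matching docs in price order.
foreach my $doc_no (sort { $rank->[$a] <=> $rank->[$b] } @$doc_nums) {
    my $hit = $searcher->fetch_doc($doc_no)->to_hashref;
    # do something with search result
}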

Gavin.


golbin at gmail

Jul 11, 2006, 3:06 PM

Post #5 of 6
(no subject) [In reply to]

On 2006. 07. 12, at 12:06 AM, Gavin Estey wrote:

>> search results' thread?
>>
> I've been playing with this idea.
>
> Sorting is probably the most important feature needed for searching
> outside of a "document orientated" search. Searching by price, and
> sorting low to high, is a feature most customers look for.
>
> To build the cache:
...

Thank you for your help.

But I can't use this approach.

I want to search more than 10,000,000 documents, and about 3,000 new
documents are added every day. Pre-caching and sorting more than
10,000,000 documents would take a very, very long time. T-T;;

If I only wanted to sort by one field, I could use 'set_boost' on the
'Doc' object.

But I want to sort by multiple fields.
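Sorting on several fields at once comes down to chaining comparisons. Below is a minimal Perl sketch. It assumes each field's values are cached in an array indexed by document number, as in the pre-caching approach discussed earlier in the thread. The field names, data, and hit list are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical cached field values, indexed by document number.
my @category = ('books', 'music', 'books', 'music', 'books');
my @price    = (12.00,   9.50,    7.25,    9.50,    12.00);
my @date     = (20060701, 20060705, 20060703, 20060702, 20060709);

# Matching document numbers for some query.
my @hits = (0, 1, 2, 3, 4);

# Sort by category (string, ascending), then price (numeric, ascending),
# then date (newest first). `||` moves on to the next key only when the
# previous comparison returns 0, i.e. on a tie.
my @sorted = sort {
       $category[$a] cmp $category[$b]
    || $price[$a]    <=> $price[$b]
    || $date[$b]     <=> $date[$a]
} @hits;

print "@sorted\n";
```

This prints "2 4 0 1 3". Documents 0 and 4 tie on category and price, so the newer one (4) comes first. The cost of this step grows with the number of hits, not with the size of the index.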

How can I sort search results across a very large number of documents?

Should I use another implementation of Lucene instead?

I like Perl and KinoSearch, so I would like to use KinoSearch for my
search program.

Please help.

Thank you for reading my question.

And sorry for my poor English.



marvin at rectangular

Jul 11, 2006, 8:44 PM

Post #6 of 6
(no subject) [In reply to]

On Jul 11, 2006, at 1:59 AM, Tony Bowden wrote:

> On Mon, Jul 10, 2006 at 07:38:05PM -0700, Marvin Humphrey wrote:
>>> How do I sort search results in KinoSearch?
>>> I want to sort the search results by a numeric field.
>> I'm sorry, but it's not currently possible to sort KinoSearch search
>> results on anything except relevance score.
>
> Couldn't this be done the same way as the approach in the 'filtering
> search results' thread?
>
> Pre-cache the score you want to sort by, then get the BitVector of
> search results, map them to the score array and sort.

In KinoSearch, unlike Plucene, BitVector is not a public class. I
plan to make it public eventually, but with tweaks. Same deal with
the bits() method from QueryFilter, which isn't public either.

The go-slow approach to expanding the API is deliberate and is one of
the reasons that KinoSearch has had relatively few bug reports for a
project of its size and complexity.

Having a few intrepid individuals tinker with non-public stuff helps
to vet potential API changes before they are made public. Also, the
explanations I send to the mailing list serve as first drafts for
documentation.

I have to be up for the task of supporting these experiments and
explaining lightly documented functionality. While KinoSearch is
heavily commented throughout, the private function descriptions are
sometimes rather light.

Last night I was nearing the end of what was a 13-hour workday when I
sent those emails. It was not a good time for me to explain all the
caveats surrounding a hack. However, I didn't want the question to
linger unanswered any longer than it already had.

> I haven't actually done this yet, but it seems like it would be quite
> simple (as long as you're able to re-build your scoring cache each
> time you update the index). Or am I missing something?

You're correct that it would work, though as has been pointed out,
the technique does not scale up as well as other aspects of KinoSearch.

I think the right way to handle the need for matching categories is
to implement an abstract IntSet class, of which BitVector would be
one implementation. One way of replacing the QueryFilter::bits()
hack would be to add a search_intset() method to Searcher. However,
Searcher and InvIndexer, as KinoSearch's two main classes, have a
tendency to accumulate clutter, so a better solution is to make
something like Plucene's search_hc() method available.
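The IntSet idea above can be illustrated with an abstract set-of-integers interface. This sketch has two implementations: a dense bit-string one, playing the role of BitVector, and a sparse sorted-array one. It is a hypothetical pure-Perl sketch. IntSet, BitVectorSet, SortedArraySet, and members_sorted_by are illustrative names, not KinoSearch classes.

```perl
use strict;
use warnings;

# Hypothetical abstract IntSet: a set of document numbers that can
# answer membership queries and list its members in ascending order.
package IntSet;
sub contains { die ref(shift) . " must implement contains()\n" }
sub to_list  { die ref(shift) . " must implement to_list()\n" }

# Dense implementation backed by a bit string (the BitVector role).
package BitVectorSet;
our @ISA = ('IntSet');
sub new {
    my ($class, $capacity) = @_;
    return bless { bits => '', capacity => $capacity }, $class;
}
sub add {
    my ($self, $n) = @_;
    vec($self->{bits}, $n, 1) = 1;
    return $self;
}
sub contains { my ($self, $n) = @_; return vec($self->{bits}, $n, 1) }
sub to_list {
    my ($self) = @_;
    return grep { vec($self->{bits}, $_, 1) } 0 .. $self->{capacity} - 1;
}

# Sparse implementation backed by a sorted array of document numbers.
package SortedArraySet;
our @ISA = ('IntSet');
sub new {
    my ($class, @nums) = @_;
    return bless { nums => [ sort { $a <=> $b } @nums ] }, $class;
}
# Linear scan keeps the sketch short; a binary search would be faster.
sub contains { my ($self, $n) = @_; return scalar grep { $_ == $n } @{ $self->{nums} } }
sub to_list  { my ($self) = @_; return @{ $self->{nums} } }

package main;

# Code written against the IntSet interface works with either class.
sub members_sorted_by {
    my ($set, $values) = @_;
    return sort { $values->[$a] <=> $values->[$b] } $set->to_list;
}

my @price  = (5, 30, 10, 1, 20, 15, 25, 2);   # hypothetical, by doc number
my $dense  = BitVectorSet->new(scalar @price);
$dense->add($_) for (1, 2, 4);
my $sparse = SortedArraySet->new(4, 2, 1);

print join(',', members_sorted_by($dense,  \@price)), "\n";
print join(',', members_sorted_by($sparse, \@price)), "\n";
```

Both lines print "2,4,1". A bit string suits large result sets. A short sorted list is cheaper when only a handful of documents match. An abstract interface lets the search code stay agnostic about which one it receives.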

Unfortunately, the HitCollector class is one of those things that
simply cannot be done well in a dynamic language -- the callbacks
have to be C function pointers or it's just too slow with large
datasets. I haven't yet figured out how to expose a decent API for
it. We might provide a smorgasbord of pre-built HitCollectors
matching the most common use cases, but that's not nearly as good.
I'm inching towards the conclusion that there's nothing for it but to
make HitCollector functionality a permanent alpha, advanced feature
and document a C API.
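Below is a rough pure-Perl sketch of the HitCollector pattern, for illustration only. It is modeled loosely on Lucene's HitCollector.collect(doc, score). The matching loop calls collect($doc_num, $score) once per hit, and this hypothetical collector keeps the N lowest-priced documents. TopByFieldCollector and the stand-in match loop are assumptions, not KinoSearch or Plucene API. The per-hit Perl method call here is exactly the overhead described above.

```perl
use strict;
use warnings;

# Hypothetical collector: keeps the `size` docs with the lowest cached
# field value. The search loop calls collect() once per matching doc.
package TopByFieldCollector;
sub new {
    my ($class, %args) = @_;
    return bless { values => $args{values}, size => $args{size}, hits => [] }, $class;
}
sub collect {
    my ($self, $doc_num, $score) = @_;   # score unused: sorting by field
    my $hits   = $self->{hits};
    my $values = $self->{values};
    push @$hits, $doc_num;
    # Re-sorting on every hit keeps the sketch short; a bounded heap
    # would avoid that cost.
    @$hits = sort { $values->[$a] <=> $values->[$b] } @$hits;
    pop @$hits if @$hits > $self->{size};
}
sub hits { return @{ $_[0]{hits} } }

package main;

# Stand-in for the engine's matching loop: (doc_num, score) pairs.
my @matches = ([0, 1.2], [2, 0.8], [3, 2.5], [5, 0.3], [6, 1.1]);
my @price   = (19.99, 5.00, 42.50, 7.25, 13.00, 3.10, 11.00);

my $collector = TopByFieldCollector->new(values => \@price, size => 3);
$collector->collect(@$_) for @matches;
print join(',', $collector->hits), "\n";
```

It prints "5,3,6": the three cheapest of the matching documents. With millions of matches, the per-hit call cost in a dynamic language dominates. That is the argument for C function pointers.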

Sort will ultimately be implemented using a FieldCache of some kind.
I'd thought I was going to need this for a project, but it turns out
I didn't, so I haven't gotten to it. Not a lot is going to happen for
the next 3 or 4 weeks. Stuff is just too crazy with my main clients.
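In the general sense, a FieldCache loads one field's values for every document once and then serves them by document number. Each search then sorts only its own hits. Below is a hypothetical pure-Perl sketch. FieldCache, its loader callback, and the sample data are assumptions for illustration, not the eventual KinoSearch design.

```perl
use strict;
use warnings;

# Hypothetical FieldCache: loads each field's values for all documents
# the first time that field is requested, then reuses them.
package FieldCache;
sub new {
    my ($class, %args) = @_;
    # `loader` is a hypothetical coderef: field name -> arrayref of
    # values indexed by document number.
    return bless { loader => $args{loader}, cache => {}, loads => 0 }, $class;
}
sub values_for {
    my ($self, $field) = @_;
    unless (exists $self->{cache}{$field}) {
        $self->{cache}{$field} = $self->{loader}->($field);
        $self->{loads}++;
    }
    return $self->{cache}{$field};
}
# Sort a list of doc numbers by a numeric field, using cached values.
sub sort_docs {
    my ($self, $field, @doc_nums) = @_;
    my $values = $self->values_for($field);
    return sort { $values->[$a] <=> $values->[$b] } @doc_nums;
}

package main;

my %stored = (
    price => [19.99, 5.00, 42.50, 7.25],
    year  => [2004, 2006, 2005, 2003],
);
my $cache = FieldCache->new(loader => sub { $stored{ $_[0] } });

my @by_price = $cache->sort_docs(price => 0, 2, 3);
my @by_year  = $cache->sort_docs(year  => 0, 1, 2, 3);
my @again    = $cache->sort_docs(price => 1, 3);

print "@by_price | @by_year | @again | loads=$cache->{loads}\n";
```

This prints "3 0 2 | 3 0 2 1 | 1 3 | loads=2". The second price sort reuses the cached values instead of reloading them. In a real index, the cache would have to be discarded whenever the reader is reopened after the index changes.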

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
