
Mailing List Archive: Bricolage: devel

tsearch or kino or lucene ?

 

 



hiddenharmony at gmail

May 13, 2008, 10:11 AM

Post #1 of 25
tsearch or kino or lucene ?

Hi all,
I am mentoring the GSoC '08 Bricolage project, which aims to add
full-text search capability to Bricolage. We have been looking at the
options available for adding full-text search, and we have the
following three:

1) PostgreSQL tsearch: This works only with PostgreSQL, and only with
PostgreSQL 8.3 onward.
2) KinoSearch: This is a Perl-based library adapted from Lucene. The
KinoSearch documentation[1] says it is "officially" alpha software and
that its file format may change in the future.
3) Lucene: A proven tool used by many sites and applications. In a
personal conversation, David pointed out that it would add another
dependency, on Java.

What do you all suggest?

regards
VK
[1]http://search.cpan.org/~creamyg/KinoSearch-0.162/lib/KinoSearch.pm
--
The hidden harmony is better than the obvious!!


D-Beaudet at NGA

May 13, 2008, 10:13 AM

Post #2 of 25
RE: tsearch or kino or lucene ?

Purposefully poking the hornet's nest:

How about creating "Javaloge" to go with a 100% Java based CMS :)

________________________________

From: Vivek Khurana [mailto:hiddenharmony[at]gmail.com]
Sent: Tue 5/13/2008 1:11 PM
To: devel[at]lists.bricolage.cc
Subject: tsearch or kino or lucene ?





hiddenharmony at gmail

May 13, 2008, 10:22 AM

Post #3 of 25
Re: tsearch or kino or lucene ?

On Tue, May 13, 2008 at 10:43 PM, Beaudet, David <D-Beaudet[at]nga.gov> wrote:
>
> Purposefully poking the hornet's nest:
>
> How about creating "Javaloge" to go with a 100% Java based CMS :)

Nah, too much work... We are simply using a utility written in Java,
not doing much programming in Java. It's like using Postgres, which is
written in C... so why not write a CMS in C called Cloge?

regards
VK
--
The hidden harmony is better than the obvious!!


ktm at rice

May 13, 2008, 12:22 PM

Post #4 of 25
Re: tsearch or kino or lucene ?

I would recommend using tsearch and PostgreSQL. It may
be worth looking at the full-text support in other databases
to allow an implementation that could be extended to support
other backend databases in the future. I am not at all keen
to have either alpha software or a Java dependency added to
Bricolage.

Cheers,
Ken



david at kineticode

May 13, 2008, 2:11 PM

Post #5 of 25
Re: tsearch or kino or lucene ?

On May 13, 2008, at 10:13, Beaudet, David wrote:

> Purposefully poking the hornet's nest:
>
> How about creating "Javaloge" to go with a 100% Java based CMS :)

How about I quit? ;-)

Seriously, Java badly inflames my tendonitis. Too much typing!

David


david at kineticode

May 13, 2008, 2:13 PM

Post #6 of 25
Re: tsearch or kino or lucene ?

On May 13, 2008, at 12:22, Kenneth Marshall wrote:

> I would recommend using tsearch and postgresql. It may
> be worth looking at the full text support from other databases
> to allow an implementation that could be extended to support
> other backend databases in the future. I am not at all keen
> to have either an alpha software or java dependency added to
> Bricolage.

KinoSearch has been around for years, and Marvin, whom I know
personally (he lives down in Eugene, a couple of hours from me), is
always great about providing migration paths. I set up KinoSearch for
my blog a couple of years ago; it took me about an hour and a half. I'm
a big fan.

But I'd be happy with tsearch2, too.

Best,

David


alex at gossamer-threads

May 15, 2008, 5:18 PM

Post #7 of 25
Re: tsearch or kino or lucene ?

Hi,

On Tue May 13 02:13:18, David E. Wheeler wrote:

> On May 13, 2008, at 12:22, Kenneth Marshall wrote:
>
> > I would recommend using tsearch and postgresql. It may
> > be worth looking at the full text support from other databases
> > to allow an implementation that could be extended to support
> > other backend databases in the future. I am not at all keen
> > to have either an alpha software or java dependency added to
> > Bricolage.
>
> KinoSearch has been around for years, and Marvin, whom I know
> personally (he lives down in Eugene, a couple hours from me) is always
> great about providing migration paths. I set up KinoSearch for my blog
> a couple years ago; took me about an hour and a half. I'm a big fan.
>
> But I'd be happy with tsearch2, too.

tsearch limits you to Postgres only, and MySQL's full-text indexing
support is pretty poor, though I'm not sure: do you want to support
MySQL going forward, or was it just a one-shot deal?

KinoSearch is great, and we've used it pretty extensively on a project,
but we have run into a number of bugs. I second the sentiment, though:
Marvin is very responsive and a great resource for getting bugs fixed.
It holds great promise, and it is leaps ahead of Plucene (the Perl port
of Lucene).

You might want to consider Solr, which is a server wrapped around Lucene.
A user could just add the URL of their Solr server, and Bricolage would
then post XML to it to index and retrieve data. That keeps integration
very simple and does not really "depend" on Java, since the Solr server
could be anywhere.

It might also be useful to expose Solr to the front end to provide site
search. I'd highly recommend taking a look.
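To make the Solr suggestion concrete, here is a minimal sketch, in Python rather than Perl, of the XML "add" message that Solr's update handler accepts and of a query URL for its /select handler. The server URL (Solr's example-server default), the field names, and the helper functions are assumptions for illustration, not anything from Bricolage, and the code only prepares requests; it never contacts a server.

```python
"""Sketch: preparing Solr XML update and query requests (not sending them).

Hypothetical: SOLR_URL, the field names, and the helper names. The
<add>/<doc>/<field name="..."> message shape and the /update and /select
handlers are Solr's own.
"""
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SOLR_URL = "http://localhost:8983/solr"  # assumed: Solr's example-server default


def build_add_xml(docs):
    """Return an <add> message with one <doc> per dict of field name -> value."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > for us
    return ET.tostring(add, encoding="unicode")


def update_request(xml_body, base_url=SOLR_URL):
    """Build (but do not send) the POST that hands an XML message to /update."""
    return urllib.request.Request(
        base_url + "/update",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def select_url(query, base_url=SOLR_URL, rows=10):
    """Build a GET URL for Solr's /select handler using Lucene query syntax."""
    return base_url + "/select?" + urllib.parse.urlencode({"q": query, "rows": rows})


if __name__ == "__main__":
    body = build_add_xml([{"id": "story-42", "title": "Hello", "category": "news"}])
    print(body)
    req = update_request(body)
    print(req.get_method(), req.full_url)
    print(select_url("title:hello AND category:news"))
    # Sending would be urllib.request.urlopen(req), followed by
    # update_request("<commit/>") so the new document becomes searchable.
```

As far as we understand Solr's XML update protocol, documents become visible to searches only after a `<commit/>` message, and the field names have to match whatever the Solr instance's schema.xml declares.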

Cheers,

Alex

--
Alex Krohn <alex[at]gossamer-threads.com>


david at kineticode

May 15, 2008, 8:49 PM

Post #8 of 25
Re: tsearch or kino or lucene ?

On May 15, 2008, at 17:18, Alex Krohn wrote:

> tsearch limits it to postgres only, and mysql full text indexing support
> is pretty poor -- though not sure if going forward you want to support
> mysql, or if it was just a one shot deal?

No, we'll support it, but it's InnoDB only, and MySQL's full-text
indexes require MyISAM tables, so no full-text search there.

> You might want to consider Solr which is a server wrapped around Lucene.
> So a user could just add a url to their solr server and then you post
> xml to it to index and retrieve data. That keep integration very simple and
> does not really "depend" on java. The Solr server could be anywhere.

Point taken.

> It might be useful to have Solr exposed to a front end as well to
> provide a site search. I'd highly recommend taking a look.

Not sure about that, but it's worth considering, I agree.

Best,

David


kpnunni at gmail

May 17, 2008, 1:22 PM

Post #9 of 25
Re: tsearch or kino or lucene ?

Hi all,

This is my report on the KinoSearch vs Tsearch problem.
Tell me what you guys think.



--
Cheers,
Unni

(http://unni.chipmonks.co.in)


kpnunni at gmail

May 17, 2008, 1:26 PM

Post #10 of 25
Re: tsearch or kino or lucene ?

Hi,
I think we should go with KinoSearch. Check my report and tell me whether
you agree.

--
Cheers,
Unni

(http://unni.chipmonks.co.in)


kpnunni at gmail

May 17, 2008, 1:45 PM

Post #11 of 25
Re: tsearch or kino or lucene ?

Sorry, guys, I forgot that the list strips attachments.
Please check this link for my first report:

http://perlswirl.wordpress.com/2008/05/17/kinosearch-vs-tsearch2/


--
Cheers,
Unni

(http://unni.chipmonks.co.in)


david at kineticode

May 17, 2008, 3:15 PM

Post #12 of 25
Re: tsearch or kino or lucene ?

On May 17, 2008, at 13:45, Unni wrote:

> sorry guys forgot that the list strips attachments
> please check this link for my first report
>
> http://perlswirl.wordpress.com/2008/05/17/kinosearch-vs-tsearch2/

Nice report, Unni, thanks. Tell me, did you get both installed and do
any experimentation with them?

Have you given any thought as to *how* to integrate tsearch, since
searching it is not integrated with the database?

Best,

David


kpnunni at gmail

May 18, 2008, 9:19 AM

Post #13 of 25
Re: tsearch or kino or lucene ?

Hi all,

I've installed both search engines.
I'll give details of what I tried out in a second report I am
preparing.

Some more considerations have crept into the choice between
KinoSearch and tsearch. Once I finish my second report, I would
appreciate a show of hands on which one you think we should choose.
--
Cheers,
Unni

(http://unni.chipmonks.co.in)


david at kineticode

May 18, 2008, 9:24 AM

Post #14 of 25
Re: tsearch or kino or lucene ?

On May 18, 2008, at 09:19, Unni wrote:

> Hi all,
>
> I've installed both the searches,
> I'll be giving details of what I tried out in a second report I am
> preparing.
>
> Some more considerations have crept into the selection of either
> kino or tsearch. Once I finish my second report I would
> appreciate a show of hands, on what you guys think we should choose.

Fair enough, thanks.

Second report when, tomorrow?

Best,

David


simonw at digitalcraftsmen

May 18, 2008, 9:32 AM

Post #15 of 25
Re: tsearch or kino or lucene ?

Unni wrote:
> sorry guys forgot that the list strips attachments
> please check this link for my first report
>
> http://perlswirl.wordpress.com/2008/05/17/kinosearch-vs-tsearch2/

I'd tend towards KinoSearch, if only because tsearch is
Postgres-specific. IMO, after people have done so much work to support
MySQL, it would be a retrograde step to introduce something that only
works in Postgres.

It is my hope that introducing MySQL in v2 will encourage take-up in
places where people have been wary of Postgres. For that reason, all
features ought to be database-agnostic.

But then I'm not doing the work so I have only a small voice in this
discussion.

S.

--
Digital Craftsmen Ltd
Exmouth House, 3 Pine Street, London. EC1R 0JH
t 020 7183 1410 f 020 7099 5140 m 07951 758698
w http://www.digitalcraftsmen.net/


kpnunni at gmail

May 18, 2008, 12:19 PM

Post #16 of 25
Re: tsearch or kino or lucene ?

Hi,
Second report forthcoming in a couple of hours.
A bit tired.

Cheers,
Unni

(http://unni.chipmonks.co.in)


david at kineticode

May 18, 2008, 3:11 PM

Post #17 of 25
Re: tsearch or kino or lucene ?

On May 18, 2008, at 09:32, Simon Wilcox wrote:

> I'd tend towards Kinosearch if only because Tsearch is Postgres
> specific. IMO, after people have done so much work to support MySQL
> it would be a retrograde step to introduce something that only works
> in Postgres.

I suppose that depends on one's opinion of MySQL. ;-)

> It is my hope that introducing MySQL in v2 will encourage takeup in
> places where people have been wary of Postgres. For that reason, all
> features ought to be database agnostic.
>
> But then I'm not doing the work so I have only a small voice in this
> discussion.

All discussion is welcomed and encouraged. Thanks for chiming in.

Best,

David


kpnunni at gmail

May 19, 2008, 11:24 AM

Post #18 of 25
Re: tsearch or kino or lucene ?

Hi all,

Thanks for your support. I've posted my second report; please check it here:

http://perlswirl.wordpress.com/2008/05/19/kinosearch-vs-tsearch2-contd/

Cheers,
Unni

(http://unni.chipmonks.co.in)


david at kineticode

May 19, 2008, 12:14 PM

Post #19 of 25
Re: tsearch or kino or lucene ?

On May 19, 2008, at 11:24, Unni wrote:

> http://perlswirl.wordpress.com/2008/05/19/kinosearch-vs-tsearch2-contd/

> Please check this report and let me know. The question to be decided
> now is whether or not we intend to support mysql. If not
> implementing tsearch2 would be much easier as detailed in the
> original application.

Well, maybe. Since the data representing a document can be found in
many tables, building the index entries might be tricky. This issue
doesn't exist for the KS solution, as you would just use the Perl API
to build the index entries.
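One way around the many-tables problem would be to gather a story's text in application code and store a single weighted tsvector per story. The sketch below is in Python (the real code would be Perl) and only builds the SQL and its parameters. The story_search table, its tsv column, and build_index_update() are hypothetical; setweight(), to_tsvector(), the tsvector || operator, plainto_tsquery(), ts_rank(), and @@ are standard PostgreSQL 8.3 text-search features. Placeholders are DB-API style (%s); Perl's DBI would use ? instead.

```python
"""Sketch: store one weighted tsvector per story, built from text gathered
from several tables. Table, column, and function names are hypothetical."""

VALID_WEIGHTS = {"A", "B", "C", "D"}  # tsearch weights; A ranks highest

# Hypothetical query against the same hypothetical table; with its default
# weights, ts_rank() scores A-weighted lexemes above D-weighted ones.
SEARCH_SQL = (
    "SELECT story_id FROM story_search "
    "WHERE tsv @@ plainto_tsquery(%s, %s) "
    "ORDER BY ts_rank(tsv, plainto_tsquery(%s, %s)) DESC"
)


def build_index_update(story_id, parts, config="english"):
    """Return (sql, params) for updating one story's tsvector.

    `parts` maps a weight letter to text fragments fetched from different
    tables (title, custom fields, keywords, ...). Fragments with the same
    weight are joined with spaces; each group becomes
    setweight(to_tsvector(config, text), weight), and the groups are
    combined with the tsvector concatenation operator ||.
    """
    pieces, params = [], []
    for weight in sorted(parts):
        if weight not in VALID_WEIGHTS:
            raise ValueError("tsearch weights must be A, B, C or D")
        text = " ".join(fragment for fragment in parts[weight] if fragment)
        if not text:
            continue  # skip empty groups rather than indexing ''
        # The weight is a validated literal; config and text stay parameters.
        pieces.append("setweight(to_tsvector(%s, %s), '" + weight + "')")
        params.extend([config, text])
    if not pieces:
        raise ValueError("nothing to index")
    sql = "UPDATE story_search SET tsv = " + " || ".join(pieces) + " WHERE story_id = %s"
    params.append(story_id)
    return sql, params


if __name__ == "__main__":
    sql, params = build_index_update(
        42, {"A": ["Bricolage 2.0"], "B": ["cms", "perl"], "D": ["Body text"]})
    print(sql)
    print(params)
```

Re-running such an update whenever any contributing table changes would keep the stored tsvector current; deciding which tables feed which weight is exactly the Bricolage-specific part this sketch leaves open.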

> Implementing KinoSearch would mean more changes to the installer
> script and bringing in unwanted dependencies.

Well, I think that the only extra dependency would be KinoSearch and
its dependencies, which I haven't had any trouble installing on any
system for the last few years.

Anyway, I think that my opinion here is clear. What's your
recommendation? What do the rest of yous guys think?

Best,

David


phillip at communitybandwidth

May 19, 2008, 12:18 PM

Post #20 of 25
Re: tsearch or kino or lucene ?

On 19-May-08, at 3:14 PM, David E. Wheeler wrote:

> On May 19, 2008, at 11:24, Unni wrote:
>
>> http://perlswirl.wordpress.com/2008/05/19/kinosearch-vs-tsearch2-contd/
>
>> Please check this report and let me know. The question to be
>> decided now is whether or not we intend to support mysql. If not
>> implementing tsearch2 would be much easier as detailed in the
>> original application.
>
> Well, maybe. Since the data representing a document can be found in
> many tables, building the index entries might be tricky. This issue
> doesn't exist for the KS solution, as you would just use the Perl
> API to build the index entries.
>
>> Implementing KinoSearch would mean more changes to the installer
>> script and bringing in unwanted dependencies.
>
> Well, I think that the only extra dependency would be KinoSearch and
> its dependencies, which I haven't had any trouble installing on any
> system for the last few years.
>
> Anyway, I think that my opinion here is clear. What's your
> recommendation? What do the rest of yous guys think?


Sounds like KinoSearch to me.


--
Phillip Smith,
Simplifier of Technology
Community Bandwidth
http://www.communitybandwidth.ca

Don't miss the Social Tech Training:
www.marsdd.com/socialtechtraining
June 22-24, 2008 in Toronto


alex at gossamer-threads

May 19, 2008, 12:41 PM

Post #21 of 25
Re: tsearch or kino or lucene ?

Hi,

> Anyway, I think that my opinion here is clear. What's your
> recommendation? What do the rest of yous guys think?

Personally I'd like to see Solr:

http://lucene.apache.org/solr/

because it's:

* database independent
* scalable -- if the search index needs to be moved to a separate
server, that can be done easily, whereas it's more difficult to separate
Bricolage from KinoSearch and its associated index.
* powerful -- very flexible search options, very fast, and good-quality
results.
* can integrate with the public site -- adding search has been a
popular request for the Bricolage sites we run, and solutions like
ht://Dig, Swish-e, and others don't come close to what you could do
with Solr with all the metadata available.
* fairly easy to set up -- if you have Java installed (most OSes have
packages for it), download Tomcat, add solr.war, make minor adjustments
to the config, and you are set. We could help with docs here, as we've
done this extensively and have a plugin for our own forum software with
setup and install instructions.
* could be kept separate, like the WYSIWYG editors: just plug in the URL
of your Solr instance to activate it.
* bigger user community around it to continue developing and enhancing it

We've built a plugin using Solr for our forum software, so if you need
any help/assistance for how to setup/use, please feel free to ask.

My concerns with tsearch are that it ties you to Postgres, and the
quality of its results. Since MySQL support (and Windows?) might bring a
lot of new users to Bricolage, we should try to keep features supported
on both databases.

My concern with KinoSearch is stability. Again, we've used it
extensively and have run into a few bugs with it. Also, it is still
considered alpha quality. From the docs:

KinoSearch is officially "alpha" software, mostly because the file
format may be changed in the future. If you have an incremental indexing
app set up when that change arrives, and your sysadmin upgrades
KinoSearch unaware of that, your app will suddenly start crashing. Until
the file format issue is settled, changes to the API are also fair game.

so including this could break sites going forward as new versions are
released.

If you don't go this way, it might be good to make the search
mechanism hookable, so that alternatives could be used, i.e. make the
indexer/searcher either subclassable or pluggable.
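A hookable search layer along these lines might look roughly like the following Python sketch. The class names, the registry, and the index/delete/search signatures are hypothetical, not an existing Bricolage API; the in-memory backend is a toy standing in for real KinoSearch, tsearch, or Solr implementations.

```python
"""Sketch of a pluggable search layer. All names here are hypothetical."""
from abc import ABC, abstractmethod

_BACKENDS = {}  # backend name -> class, filled in by @register_backend


def register_backend(name):
    """Class decorator that makes a backend selectable by name (e.g. from config)."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


def get_backend(name, **options):
    """Instantiate the backend registered under `name`."""
    try:
        cls = _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown search backend: {name!r}") from None
    return cls(**options)


class SearchBackend(ABC):
    """The interface the rest of the application relies on, whatever the engine."""

    @abstractmethod
    def index(self, doc_id, fields):
        """Add or replace one document."""

    @abstractmethod
    def delete(self, doc_id):
        """Remove one document from the index."""

    @abstractmethod
    def search(self, query):
        """Return the ids of matching documents."""


@register_backend("memory")
class InMemoryBackend(SearchBackend):
    """Toy backend: a document matches when it contains every query word."""

    def __init__(self):
        self._docs = {}  # doc_id -> set of lower-cased words

    def index(self, doc_id, fields):
        text = " ".join(str(value) for value in fields.values())
        self._docs[doc_id] = set(text.lower().split())

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)

    def search(self, query):
        terms = set(query.lower().split())
        if not terms:
            return []
        return sorted(d for d, words in self._docs.items() if terms <= words)


if __name__ == "__main__":
    backend = get_backend("memory")
    backend.index("story-1", {"title": "KinoSearch vs tsearch2"})
    backend.index("story-2", {"title": "Solr setup"})
    print(backend.search("solr"))  # ['story-2']
```

With this shape, an official release could register one backend while a site that prefers another engine registers its own subclass and selects it by name in its configuration.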

Just my 0.02 =)

Cheers,

Alex

--
Alex Krohn <alex[at]gossamer-threads.com>


hiddenharmony at gmail

May 19, 2008, 1:03 PM

Post #22 of 25
Re: tsearch or kino or lucene ?

On Tue, May 20, 2008 at 1:11 AM, Alex Krohn <alex[at]gossamer-threads.com> wrote:
>
> If you don't go this way, it might be good to make the search
> mechanisms hookable, so alternate ones could be used. i.e. either the
> indexer/searcher are subclassable, or pluggable.

That is a good suggestion. Let there be freedom to choose an
alternative. We can have the official release use KinoSearch, and those
interested may implement Solr or Lucene...
What do you guys say?

regards
VK
--
The hidden harmony is better than the obvious!!


kpnunni at gmail

May 19, 2008, 1:12 PM

Post #23 of 25
Re: tsearch or kino or lucene ?

> That is a good suggestion. Let there be a freedom for choosing the
> alternative. We can have official release with kino and those
> interested may implement solr or lucene...


I concur.

--
Cheers,
Unni

(http://unni.chipmonks.co.in)


bharder at methodlogic

May 19, 2008, 1:26 PM

Post #24 of 25
Re: tsearch or kino or lucene ?

On Mon, May 19, 2008 at 12:41:44PM -0700, Alex Krohn wrote:
> * can integrate with public site -- adding a search option has been a
> popular requests for the bricolage sites we are running, and solutions
> like htdig, swish-e, and others don't come close to what you could do
> with solr and having all the meta data available.

Alex, can you explain where swish-e falls down?


--

Brad Harder,
Method Digital Logic
http://www.methodlogic.net


alex at gossamer-threads

May 19, 2008, 1:45 PM

Post #25 of 25
Re: tsearch or kino or lucene ?

Hi Brad,

On Mon, 19 May 2008 13:26:53 -0700 <bharder[at]methodlogic.net> wrote:

> Alex, can you explain where swish-e falls down?

It's been a year since I last played with swish-e extensively, but if I
remember correctly:

* lack of incremental indexing support -- which is a big headache on
very large sites that have to reindex everything nightly, or on sites
that publish several times an hour and want the search index always up
to date.

* doesn't have all the metadata available -- this is big, as with it you
can offer something like a search for author x in category y with
keywords z, sorted by post date or by relevance.

* less sophisticated search syntax:

http://lucene.apache.org/java/docs/queryparsersyntax.html

vs

http://swish-e.org/docs/swish-search.html

We've found boosts very helpful in promoting good content (e.g. page
one of stories versus page x).

* doesn't scale well on large sites with very large data sets. We had a
site with 300,000 word files averaging 50-75k in size and were seeing
5-6s response times. We got subsecond results with Lucene. Also, see:

http://lucene.apache.org/solr/features.html

for details on caching, replication, and other high-volume features
natively supported in Solr. You could wrap swish-e and do this yourself,
of course, but Solr takes care of it for you.

* not nearly as actively developed
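Three ideas from the list above (incremental updates, metadata filtering, and per-field boosts) can be illustrated with a toy inverted index. This Python sketch is not how Lucene or Swish-e work internally; the class and its methods are hypothetical, and the score is a plain sum of per-field boosts rather than Lucene's actual relevance formula.

```python
"""Toy inverted index: incremental updates, metadata filters, field boosts.
Hypothetical illustration only; not any real engine's design."""
from collections import defaultdict


class ToyIndex:
    def __init__(self, boosts=None):
        self.boosts = boosts or {}         # field -> boost, e.g. {"title": 4.0}; default 1.0
        self.postings = defaultdict(dict)  # term -> {doc_id: boosted term frequency}
        self.doc_terms = {}                # doc_id -> terms it contributed, for cheap removal
        self.meta = {}                     # doc_id -> metadata such as author or category

    def remove(self, doc_id):
        """Drop one document's postings without rebuilding anything else."""
        for term in self.doc_terms.pop(doc_id, ()):
            self.postings[term].pop(doc_id, None)
            if not self.postings[term]:
                del self.postings[term]
        self.meta.pop(doc_id, None)

    def add(self, doc_id, text_fields, meta):
        """Index or re-index a single document (the incremental case)."""
        self.remove(doc_id)
        terms = set()
        for field, text in text_fields.items():
            boost = self.boosts.get(field, 1.0)
            for term in text.lower().split():
                scores = self.postings[term]
                scores[doc_id] = scores.get(doc_id, 0.0) + boost
                terms.add(term)
        self.doc_terms[doc_id] = terms
        self.meta[doc_id] = dict(meta)

    def search(self, query, **filters):
        """Return ids of documents containing every query term and matching
        every metadata filter exactly, highest boosted score first."""
        terms = query.lower().split()
        if not terms:
            return []
        candidates = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            candidates &= set(self.postings.get(term, {}))
        hits = []
        for doc_id in candidates:
            if all(self.meta[doc_id].get(key) == value for key, value in filters.items()):
                score = sum(self.postings[term][doc_id] for term in terms)
                hits.append((score, doc_id))
        hits.sort(key=lambda hit: (-hit[0], hit[1]))
        return [doc_id for _, doc_id in hits]


if __name__ == "__main__":
    idx = ToyIndex(boosts={"title": 4.0})
    idx.add("p1", {"title": "Bricolage search", "body": "page one"}, {"category": "news"})
    idx.add("p2", {"title": "Other", "body": "bricolage search tips"}, {"category": "docs"})
    print(idx.search("bricolage search"))            # ['p1', 'p2']
    print(idx.search("bricolage", category="docs"))  # ['p2']
```

Re-adding a document touches only that document's postings, which is the property that spares a site from reindexing everything nightly; the title boost is what ranks "p1" above "p2" for the same words.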

And subjectively, we found the quality of results much better with
Lucene (Solr is just a wrapper -- originally we had our own Perl-based
search server using Lucene). We did extensive testing of Lucene, Swish-e,
ht://Dig and a few others (not KinoSearch) a year or two ago on our
mailing list archive, and found Lucene the clear winner.

If I've got anything wrong, please let me know.

Cheers,

Alex

--
Alex Krohn <alex[at]gossamer-threads.com>
