
Mailing List Archive: Lucene: General

Lucene Index file vs. database

 

 



zoran.zoki at gmail

Sep 29, 2008, 6:46 AM

Post #1 of 4 (291 views)
Lucene Index file vs. database

Hi,

First, I apologize if I'm asking something that has already been asked. I
tried searching, but couldn't find what I was looking for (or I simply don't
know how to phrase the search string for my question).

I'm working on a project that has a huge database in the background. We
decided to use Lucene for "faster" search. Our search works like most
searches: you enter a search string and get a list of hits, each with a link
to a details page. But there is a dilemma over whether we should store more
data in the index than is needed.

One side of the development team insists that we should use the Lucene index
as a kind of data store: when you get a hit and go to its details page, you
use Lucene again to find the document that matches the selected ID. In the
end you wind up copying complete database tables into the Lucene index.

The other side insists on storing in the index only the data that is
displayed on the search results list or needed for search criteria. When you
go to the details page, you have the matching ID, so you can pick up that row
from the database by that ID (I also like this better).
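
The two approaches above can be sketched in a few lines. This is only an illustration under loud assumptions: it does not use the Lucene API at all. A plain dict stands in for the database table, a list of dicts stands in for the index, and every name (DATABASE, build_thin_index, build_full_index, search, and the two details functions) is hypothetical.

```python
# Hypothetical stand-in for a database table, keyed by primary key.
DATABASE = {
    1: {"id": 1, "title": "Lucene in Action",
        "summary": "A book about full-text search",
        "body": "Long chapter text stored only in the database."},
    2: {"id": 2, "title": "Database Design",
        "summary": "Relational modelling basics",
        "body": "Normal forms and foreign keys."},
}


def build_full_index(database):
    """Approach 1: copy every column of every row into the index."""
    return [dict(row) for row in database.values()]


def build_thin_index(database):
    """Approach 2: keep only the ID plus the fields that are searched
    or shown on the results list."""
    return [
        {"id": row["id"], "title": row["title"], "summary": row["summary"]}
        for row in database.values()
    ]


def search(index, term):
    """Naive substring match standing in for a real index query."""
    term = term.lower()
    return [
        doc for doc in index
        if term in doc["title"].lower() or term in doc["summary"].lower()
    ]


def details_from_index(index, doc_id):
    """Approach 1 details page: look the document up in the index again."""
    for doc in index:
        if doc["id"] == doc_id:
            return doc
    raise KeyError(doc_id)


def details_from_database(database, doc_id):
    """Approach 2 details page: fetch the full row by primary key."""
    return database[doc_id]
```

With the thin index, a hit carries only id, title and summary, and the details page calls details_from_database with the hit's ID. With the full index, the details page never touches the database, at the price of keeping a second complete copy of the data in sync.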

Can someone please describe the drawbacks and advantages of both approaches?
More generally, can someone explain what the actual benefit of Lucene itself
is, and where and when it applies?

Thank you
--
View this message in context: http://www.nabble.com/Lucene-Index-file-vs.-database-tp19724177p19724177.html
Sent from the Lucene - General mailing list archive at Nabble.com.


otis_gospodnetic at yahoo

Sep 30, 2008, 7:58 AM

Post #2 of 4 (272 views)
Re: Lucene Index file vs. database [In reply to]

Zoki,

A better list to ask this on is java-user[at]lucene.

In short, you can really go either way. Some people feel more comfortable storing everything in the DB because they trust it more (RDBMSs have been around longer than Lucene has), know how to back it up, need data integrity (FKs), etc. Storing relational data in Lucene requires flattening of relations and entities.
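
To make "flattening" concrete, here is a minimal sketch, assuming a hypothetical books/authors schema with a many-to-many join table. None of these names come from the thread, and the code uses plain Python data structures rather than the Lucene API. Each book becomes one self-contained document with its related author names folded in.

```python
# Hypothetical relational data: two tables plus a join table.
AUTHORS = {10: {"name": "Ann"}, 11: {"name": "Bob"}}
BOOKS = [
    {"id": 1, "title": "Search Basics"},
    {"id": 2, "title": "Index Tuning"},
]
BOOK_AUTHORS = [(1, 10), (1, 11), (2, 10)]  # (book_id, author_id)


def flatten(books, authors, book_authors):
    """Denormalize each book into one flat document.

    The join is resolved at indexing time, so the document carries the
    author names directly instead of foreign keys.
    """
    docs = []
    for book in books:
        names = [
            authors[author_id]["name"]
            for book_id, author_id in book_authors
            if book_id == book["id"]
        ]
        docs.append({"id": book["id"], "title": book["title"], "authors": names})
    return docs
```

One consequence of this design: the author name "Ann" is now copied into two documents, so renaming an author means re-indexing every book that refers to them, and nothing like a foreign key enforces consistency.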

Storing everything in Lucene means larger indices, which doesn't necessarily affect search speed, but it does mean things like slower optimization, more IO on the machine running search, etc.

I could go on listing various little advantages/disadvantages, but there is no ultimate "do this, don't do that" answer.

Pozdrav,

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





zoran.zoki at gmail

Sep 30, 2008, 10:23 AM

Post #3 of 4 (272 views)
Re: Lucene Index file vs. database [In reply to]

Thank you for your reply.
I'm just afraid of choosing the wrong option for the project and finding it
hard to change later.

Pozdrav :)





richard.marr at gmail

Oct 20, 2008, 4:12 AM

Post #4 of 4 (156 views)
Re: Lucene Index file vs. database [In reply to]

> Thank you for your reply.
> I'm just afraid of choosing the wrong option for the project and finding it
> hard to change later.

I'm making the same kinds of decisions at the moment. My plan is to
use an RDBMS to store the "master" copy of our data, because that
technology is a ubiquitous skill set, and we know its capabilities
almost instinctively.

For example, we can hire a DBA off-the-shelf to deal with database
replication without having to explain the details of our other systems
to them, but doing the same thing with Lucene would require them to
have experience with it.

This way we can save our scarce brain-power for more interesting problems :)

Rich
