
Mailing List Archive: Lucene: General

Performance: Field.Store.YES vs. Field.Store.NO + DB

 

 



ywlee522 at gmail

Jun 11, 2009, 12:00 PM

Post #1 of 5 (447 views)
Performance: Field.Store.YES vs. Field.Store.NO + DB

My document store has 750K users who wrote 100M reports. The size of a
report ranges from 1k to 2M.
I have read in several places that the actual values (text) can be stored in a DB,
while Lucene only manages the index, using Field.Store.NO.

I wonder whether there is any difference in performance (search and match
retrieval) between Field.Store.YES and Field.Store.NO. For example, if the
actual report contents are stored in a DB (Field.Store.NO), then for a search
that matches 500 reports, one has to send either 500 SELECT queries to the DB,
or one long SELECT with an IN clause in the WHERE condition, or something in
between. Is this faster than retrieving them from an index created with
Field.Store.YES?
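
The "something in between" option can be sketched as follows. This is only an illustration, not a measured recommendation: the table name `reports`, the columns `report_id` and `body`, and the batch size of 100 are all assumptions, and the class only builds the parameterized SQL text and the ID batches. Binding the values and executing the statements (for example through JDBC) is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchedInClause {

    /** Splits ids into consecutive batches of at most batchSize elements. */
    public static List<List<Long>> partition(List<Long> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }

    /** Builds a parameterized SELECT with one '?' placeholder per id. */
    public static String selectSql(int idCount) {
        if (idCount <= 0) {
            throw new IllegalArgumentException("idCount must be positive");
        }
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        // Hypothetical table and column names, chosen for illustration only.
        return "SELECT report_id, body FROM reports WHERE report_id IN ("
                + placeholders + ")";
    }

    public static void main(String[] args) {
        // Pretend the search matched reports 1..500.
        List<Long> matched = new ArrayList<>();
        for (long id = 1; id <= 500; id++) {
            matched.add(id);
        }
        List<List<Long>> batches = partition(matched, 100);
        System.out.println(batches.size() + " queries instead of " + matched.size());
        System.out.println(selectSql(3));
    }
}
```

Whether a few medium-sized IN queries beat 500 single-row lookups or one very long statement depends on the database and driver, so the batch size is something to measure rather than assume.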

Does NOT storing the actual values in the index make the search faster?

Any pointers would be appreciated. Thanks.




--
View this message in context: http://www.nabble.com/Performance%3A-Field.Store.YES-vs.-Field.Store.NO-%2B-DB-tp23987086p23987086.html
Sent from the Lucene - General mailing list archive at Nabble.com.


lucene at mikemccandless

Jun 11, 2009, 1:25 PM

Post #2 of 5 (422 views)
Re: Performance: Field.Store.YES vs. Field.Store.NO + DB

You should try it & see & post back.

When using Lucene, you should sort by docID and then retrieve in that order.
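
A minimal sketch of that retrieval pattern, in plain Java so it runs without Lucene on the classpath. The `loader` argument is a hypothetical stand-in for whatever actually loads a document by its docID (in Lucene, something like `IndexSearcher.doc(int)`). The usual rationale, stated here as an assumption, is that stored fields are laid out in docID order, so ascending access keeps the reads closer to sequential. The results are still returned in the original relevance order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

public class DocIdOrderedFetch {

    /**
     * Loads the text of each hit in ascending docID order, then returns the
     * texts in the original ranking order. docIdsByRank[i] is the docID of
     * the i-th ranked hit.
     */
    public static List<String> fetchInDocIdOrder(int[] docIdsByRank,
                                                 IntFunction<String> loader) {
        int n = docIdsByRank.length;
        Integer[] ranks = new Integer[n];
        for (int i = 0; i < n; i++) {
            ranks[i] = i;
        }
        // Order the rank positions by the docID they point at.
        Arrays.sort(ranks, (a, b) -> Integer.compare(docIdsByRank[a], docIdsByRank[b]));

        String[] texts = new String[n];
        for (int rank : ranks) {
            // Loads happen in ascending docID order; results land at their rank.
            texts[rank] = loader.apply(docIdsByRank[rank]);
        }
        return Arrays.asList(texts);
    }

    public static void main(String[] args) {
        int[] hits = {42, 7, 19}; // docIDs in relevance order
        List<Integer> accessOrder = new ArrayList<>();
        List<String> texts = fetchInDocIdOrder(hits, docId -> {
            accessOrder.add(docId);      // record the order of the loads
            return "report-" + docId;    // placeholder for the stored text
        });
        System.out.println(accessOrder); // [7, 19, 42]
        System.out.println(texts);       // [report-42, report-7, report-19]
    }
}
```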

There's also another open source project (don't remember the name)
that aims to be a store for cases like this. There was an
announcement a while back... would be a 3rd option to try.

Please post back results if you get that far!

Mike



ted.dunning at gmail

Jun 11, 2009, 1:27 PM

Post #3 of 5 (422 views)
Re: Performance: Field.Store.YES vs. Field.Store.NO + DB

A traditional database is not normally used for this. Look at something
like Voldemort <http://simonwillison.net/2009/Jan/17/voldemort/>,
HBase <http://hadoop.apache.org/hbase/>, or even
memcached <http://www.danga.com/memcached/> instead.

Also, your database is moderately large, but not massively so. With a decent
sharding system like Katta, you should be able to store the text in your
index and still get good retrieval performance.

On Thu, Jun 11, 2009 at 12:00 PM, ywlee522 <ywlee522[at]gmail.com> wrote:

> I have read in several places that actual values (text) can be stored in
> DB,
> while lucene only manages index with Field.Store.NO
>



--
Ted Dunning, CTO
DeepDyve


ywlee522 at gmail

Jun 12, 2009, 5:03 AM

Post #4 of 5 (405 views)
Re: Performance: Field.Store.YES vs. Field.Store.NO + DB

Thanks for the pointers. I will certainly explore them as options.






ywlee522 at gmail

Jun 12, 2009, 5:03 AM

Post #5 of 5 (405 views)
Re: Performance: Field.Store.YES vs. Field.Store.NO + DB

I will try several options and post results.
Thanks.



