
Mailing List Archive: Lucene: Java-User

Lucene performance issues..

 

 



mazhar.lateef at cryoserver

Jul 27, 2008, 1:38 PM

Post #1 of 6 (440 views)
Permalink
Lucene performance issues..

Hi,

We have a system that archives mail, and we are having problems with
search and indexing performance. We currently use Lucene 2.2; the
platform is SLES10.1 and the application is written in Java. These are
the issues we are facing:

* Index merging and optimization
Merging and optimizing the index takes too long when a large
number of documents are being added. We have worked around this
by keeping two indexes, one of which is always writable, so that
mail is processed continuously. We would still like a proper
solution that makes index optimization faster and more efficient.
The process also consumes a lot of memory, and the application
sometimes runs out of memory.

* Email searching
We create very large indexes for the emails we process: 150GB
or more for the indexes alone (not including the data content).
We thought this would improve search performance because there
are fewer indexes to open and read from. However, searches take
up to several minutes and sometimes never return results.

We also tried upgrading to Lucene 2.3 in the hope of improving
performance, but the results were quite the opposite. From what I have
read online, Lucene 2.3 should be much faster and better, so why are we
seeing this discrepancy?

We are now adopting a slightly different architecture that will let us
split the indexes and reduce their size. However, I would like some
expert advice on which Lucene version to use and how to use Lucene to
get the best balance between index size, index merging and, most
importantly, search performance.
Any help would be very much appreciated.

Many thanks in advance.

Maz

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


lucenelist2007 at danielnaber

Jul 27, 2008, 1:59 PM

Post #2 of 6 (431 views)
Permalink
Re: Lucene performance issues.. [In reply to]

On Sunday, 27 July 2008, Mazhar Lateef wrote:

> We have also tried upgrading the lucene version to 2.3 in hope to
> improve performance but the results were quite the opposite. but from my
> research on the internet the Lucene version 2.3 is much faster and
> better so why are we seeing such inconsistency.

Have you checked out these pages?

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

Even a large index should be fast, assuming the queries are not
complicated. Also, with a large index the performance depends on the
number of matches, i.e. searching for very common terms might be slow.
Maybe you could post more information about your queries.
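
To illustrate why the number of matches matters, here is a minimal sketch of a toy inverted index. It is not Lucene's data structure, and the class and method names (ToyInvertedIndex, add, matchCount) are made up for this example. The point is only that answering a single-term query means visiting every posting for that term, so a term that appears in almost every mail costs far more than a rare one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy inverted index: each term maps to the list of document ids containing it. */
final class ToyInvertedIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /**
     * Adds a document. The text is lower-cased and split on whitespace.
     * Assumes each document is added exactly once, in a single call.
     */
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Record each document at most once per term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
    }

    /** Number of postings a single-term query would have to visit. */
    int matchCount(String term) {
        List<Integer> list = postings.get(term.toLowerCase());
        return list == null ? 0 : list.size();
    }
}
```

In a mail archive, a token such as "re:" in the subject field would have a posting for a large share of all documents, while a rare name would have only a handful.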

Regards
Daniel

--
http://www.danielnaber.de



stuhood at mailtrust

Jul 27, 2008, 3:19 PM

Post #3 of 6 (402 views)
Permalink
Re: Lucene performance issues.. [In reply to]

Also, keep in mind that optimization is a very disk-intensive process (and therefore slow). It completely rewrites the index, and should only be done when you do not expect the index to change for a while.




nageshblore at gmail

Jul 28, 2008, 12:47 AM

Post #4 of 6 (408 views)
Permalink
Re: Lucene performance issues.. [In reply to]

Not an answer to your question, but have you tried IBM's OmniFind Personal
Email Search? Excerpt from their site:

Simple keyword or text search is not always effective for quickly finding
what you need. IBM(R) has gone beyond keywords by inventing a fast and
accurate semantic search system for personal e-mail.

IBM OmniFind Personal E-mail Search enables semantic searching by extracting
and organizing concepts and relationships from personal e-mail. Any business
e-mail user who must search in order to accomplish a business purpose will
find this tool invaluable. Customization of semantic concepts and the
ability to share these concepts with colleagues make this tool especially
useful for large enterprise customers.

IBM OmniFind Personal E-mail Search is easy to install and configure and
automatically adjusts to desktop load.


More: http://www.alphaworks.ibm.com/tech/emailsearch

Nagesh



te at statsbiblioteket

Jul 28, 2008, 1:47 AM

Post #5 of 6 (400 views)
Permalink
Re: Lucene performance issues.. [In reply to]

On Sun, 2008-07-27 at 21:38 +0100, Mazhar Lateef wrote:
> * email searching
> o We are creating very large indexes for emails we are
> processing, the size is upto +150GB for indexes only (not
> including data content), this we thought would improve
> search performance since less indexes to open and read from,
> however the searching taking upto minutes and sometime never
> returns results

Is this with or without warm-up? How many hits does a query typically
return, and what do you do with those hits?
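
One way to answer the warm-up question is to run the same query several times and compare the first (cold) run with the later ones. The helper below is only a sketch under that assumption; WarmupTimer and its time method are hypothetical names, and the query is passed in as a callback so that nothing here depends on a particular Lucene version.

```java
import java.util.function.IntSupplier;

/** Times repeated runs of the same query so cold and warm latencies can be compared. */
final class WarmupTimer {
    /**
     * Runs the query the given number of times and returns the elapsed time of
     * each run in nanoseconds. Index 0 is the cold run; later entries benefit
     * from whatever the OS and the searcher have cached by then.
     */
    static long[] time(IntSupplier query, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.getAsInt(); // the hit count is ignored; only the time matters here
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }
}
```

The callback would run the real search and return the hit count. If the first run takes minutes and later runs take a fraction of that, the cost is mostly warm-up rather than the query itself.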

> We have also tried upgrading the lucene version to 2.3 in hope to
> improve performance but the results were quite the opposite. but from my
> research on the internet the Lucene version 2.3 is much faster and
> better so why are we seeing such inconsistency.

I encountered the same problem some time ago. It turns out that you get
lower performance if you're using an index from an older version of
Lucene with a newer version. If you haven't done so already, try
converting the old index to the new format.

Another way to go is solid-state drives. While not cheap in themselves,
the performance increase might make the purchase worthwhile. One of the
fine properties of Lucene on SSD, besides the general increase in speed,
is that less warm-up is required. This means that frequent updates of a
large index are attainable without a huge performance hit. For a 150GB+
index, you would probably want to go for 256GB of storage and split the
index into 64GB chunks, to avoid running out of storage during a merge.
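
The sizing argument can be checked with a little arithmetic. The sketch below assumes that merging a chunk temporarily needs extra disk space proportional to the size of that chunk; the tempFactor parameter expresses that assumption (1.0 means one extra copy) and is not a measured Lucene figure. The class and method names are made up for the example.

```java
/** Back-of-the-envelope disk sizing for merging an index that is split into chunks. */
final class MergeSpaceEstimate {
    /** Number of chunks needed to hold indexGb when each chunk is at most chunkGb. */
    static int chunkCount(double indexGb, double chunkGb) {
        return (int) Math.ceil(indexGb / chunkGb);
    }

    /**
     * Peak disk usage while one chunk is being merged: the whole index stays on
     * disk, plus temporary space proportional to the chunk being rewritten.
     * tempFactor is an assumption (1.0 = one extra copy of the chunk).
     */
    static double peakGb(double indexGb, double chunkGb, double tempFactor) {
        double largestChunk = Math.min(indexGb, chunkGb);
        return indexGb + tempFactor * largestChunk;
    }

    /** True if the estimated peak usage fits on a disk of diskGb. */
    static boolean fits(double indexGb, double chunkGb, double tempFactor, double diskGb) {
        return peakGb(indexGb, chunkGb, tempFactor) <= diskGb;
    }
}
```

Under that assumption, a 150GB index in 64GB chunks needs 3 chunks and peaks at 150 + 64 = 214GB, which fits in 256GB. The same index kept as a single piece would peak at 300GB and would not fit.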

We've put some fragmented notes and observations on the subject at
http://wiki.statsbiblioteket.dk/summa/Hardware - I apologize for not
taking the time to polish it, but an important deadline is looming.




lucene at mikemccandless

Jul 28, 2008, 2:33 AM

Post #6 of 6 (397 views)
Permalink
Re: Lucene performance issues.. [In reply to]

Perhaps one thing to try is a partial optimize
(IndexWriter.optimize(int maxNumSegments)). It makes the optimize faster,
but searches may run slower than after a full optimize.

E.g., optimize(5) will reduce the index to <= 5 segments.
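
To give a feel for why a partial optimize is cheaper, here is a toy cost model. It is not Lucene's actual merge policy: it simply assumes that the smallest segments are merged into one until at most maxNumSegments remain, and counts how many bytes that merge would rewrite. PartialOptimizeModel and bytesRewritten are hypothetical names.

```java
import java.util.Arrays;

/** Toy cost model for a partial optimize; not Lucene's real merge selection. */
final class PartialOptimizeModel {
    /**
     * Bytes rewritten when the smallest segments are merged into one so that
     * at most maxNumSegments remain. Returns 0 if there are already few enough.
     */
    static long bytesRewritten(long[] segmentSizes, int maxNumSegments) {
        if (maxNumSegments < 1) {
            throw new IllegalArgumentException("maxNumSegments must be >= 1");
        }
        if (segmentSizes.length <= maxNumSegments) {
            return 0;
        }
        long[] sorted = segmentSizes.clone();
        Arrays.sort(sorted);
        // Merging k segments into one reduces the segment count by k - 1.
        int toMerge = sorted.length - maxNumSegments + 1;
        long total = 0;
        for (int i = 0; i < toMerge; i++) {
            total += sorted[i];
        }
        return total;
    }
}
```

With segment sizes of 100, 50, 10, 5, 5, 2, 1 and 1 (in GB, say), this model rewrites all 174 for a full optimize but only 9 to get down to 5 segments, because the large segments are left untouched.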

Mike

