
Mailing List Archive: Lucene: Java-User

is there a way to control when merges happen?

 

 



doconnor at acquiremedia

May 15, 2009, 1:41 PM

Post #1 of 8 (879 views)
is there a way to control when merges happen?

All:

I would like to be able to control when an index merge happens (by wall clock time) so that merges do not occur in the middle of the business day.

I have a Lucene system based on v2.3.2. We add a couple hundred thousand documents per day and allow searching while documents are being added; we reopen an IndexReader periodically to expose newly arrived content.

At times merging significantly degrades search performance: I've seen merging cause 200% load on a system (dual quad-core x86_64 running CentOS) with a RAID-5 disk subsystem of 15k drives.

I've seen some info on the MergeScheduler and ConcurrentMergeScheduler but not necessarily enough to attempt a coding effort.

Looking through the code for ConcurrentMergeScheduler.java, is it as straightforward as overriding the mergeScheduler.merge() method with a method that checks whether a merge is allowed (by wall clock time)? If a merge is not allowed at that time, can I just return? Or do I have to sleep the thread until the merge is allowed?

Thanks,
Dan


Dan O'Connor
SVP, Engineering
Acquire Media<http://www.acquiremedia.com/>
77 South Bedford Street, Suite 350<http://maps.google.com/maps?f=q&hl=en&geocode=&q=77+S+Bedford+St,+Burlington,+MA+01803&sll=37.0625,-95.677068&sspn=32.472848,80.859375&ie=UTF8&ll=42.485517,-71.197935&spn=0.002287,0.005193&t=h&z=18>
Burlington, MA 01803<http://maps.google.com/maps?f=q&hl=en&geocode=&q=77+S+Bedford+St,+Burlington,+MA+01803&sll=37.0625,-95.677068&sspn=32.472848,80.859375&ie=UTF8&ll=42.485517,-71.197935&spn=0.002287,0.005193&t=h&z=18>
e: doconnor [at] acquiremedia<mailto:doconnor [at] acquiremedia>
o: 781-250-0565
f: 877-861-7724


jason.rutherglen at gmail

May 15, 2009, 1:48 PM

Post #2 of 8 (843 views)
Re: is there a way to control when merges happen? [In reply to]

Hi Dan,

Are you looking to throttle the merging? I'd recommend calling
setMaxThreadCount(1) on your ConcurrentMergeScheduler. That way IW.addDocument
doesn't wait while a merge occurs (as it would with SerialMergeScheduler), but
merging should use less CPU because only one merge runs at a time.

As for overriding the MS.merge method, either way you mentioned would
work.

-J



lucene at mikemccandless

May 15, 2009, 1:50 PM

Post #3 of 8 (841 views)
Re: is there a way to control when merges happen? [In reply to]

I think you could subclass ConcurrentMergeScheduler, overriding
merge() to only call super.merge() if the time is right? (And just
return right away if it's not the right time).

Though you might want to allow small merges to run in real-time, and
big merges to wait until after hours.

Mike
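
A minimal sketch of the wall-clock check suggested above, kept free of Lucene dependencies so it compiles on its own. The class name MergeWindow, the method mergeAllowed, and the 08:00-18:00 business hours are illustrative assumptions, not part of Lucene. The commented-out override assumes the 2.x/3.x signature merge(IndexWriter); later releases changed it.

```java
import java.time.LocalTime;

/** Hypothetical helper: decides whether merges may run at a given wall-clock time. */
final class MergeWindow {
    private final LocalTime businessStart;
    private final LocalTime businessEnd;

    /** Assumes businessStart is earlier than businessEnd on the same day. */
    MergeWindow(LocalTime businessStart, LocalTime businessEnd) {
        if (!businessStart.isBefore(businessEnd)) {
            throw new IllegalArgumentException("businessStart must be before businessEnd");
        }
        this.businessStart = businessStart;
        this.businessEnd = businessEnd;
    }

    /** True when 'now' falls outside the half-open range [businessStart, businessEnd). */
    boolean mergeAllowed(LocalTime now) {
        boolean duringBusinessHours = !now.isBefore(businessStart) && now.isBefore(businessEnd);
        return !duringBusinessHours;
    }
}

// Assumed use inside a ConcurrentMergeScheduler subclass (needs Lucene on the
// classpath, so it is shown as a comment only):
//
//   private final MergeWindow window =
//       new MergeWindow(LocalTime.of(8, 0), LocalTime.of(18, 0));
//
//   public void merge(IndexWriter writer) throws IOException {
//       if (window.mergeAllowed(LocalTime.now())) {
//           super.merge(writer);   // run pending merges as usual
//       }
//       // otherwise return right away; pending merges wait for the next call
//   }
```

Returning early does not discard anything: the skipped merges simply stay pending until merge() is invoked again inside the allowed window.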


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


doconnor at acquiremedia

May 15, 2009, 1:56 PM

Post #4 of 8 (845 views)
Re: is there a way to control when merges happen? [In reply to]

Mike,
Thanks for the reply.

A follow up question.
How can I tell the big merges from the small ones?

Regards,
Dan



doconnor at acquiremedia

May 15, 2009, 1:58 PM

Post #5 of 8 (849 views)
Re: is there a way to control when merges happen? [In reply to]

Jason,

Thanks for the reply.

As I was reading the code, it said that if the concurrent merge scheduler ran out of threads, it ran the merge in the foreground.

Does that mean the foreground of the merge thread or the IndexWriter thread? The former would be good. The latter would seem to be bad.

Regards,
Dan




lucene at mikemccandless

May 15, 2009, 2:00 PM

Post #6 of 8 (845 views)
Re: is there a way to control when merges happen? [In reply to]

You're welcome!

SegmentInfo exposes a sizeInBytes() method, so you can sum up that
result for all segments in the merge.

But NOTE: your merge scheduler must be located in the
org.apache.lucene.index package (this API is currently package
private).

Mike
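
A rough sketch of the size check described above: sum the byte sizes of the segments in a merge and compare the total against a threshold. To stay compilable without Lucene, the interface SizedSegment is a hypothetical stand-in for SegmentInfo, and MergeSizeClassifier and its threshold are assumptions made for illustration.

```java
import java.util.List;

/** Hypothetical stand-in for the single SegmentInfo method this sketch needs. */
interface SizedSegment {
    long sizeInBytes();
}

/** Hypothetical helper that labels a merge as big or small by total input size. */
final class MergeSizeClassifier {
    private final long bigMergeThresholdBytes;

    MergeSizeClassifier(long bigMergeThresholdBytes) {
        this.bigMergeThresholdBytes = bigMergeThresholdBytes;
    }

    /** Sums sizeInBytes() over every segment taking part in the merge. */
    static long totalBytes(List<? extends SizedSegment> segments) {
        long total = 0;
        for (SizedSegment segment : segments) {
            total += segment.sizeInBytes();
        }
        return total;
    }

    /** A merge counts as big when its total input size reaches the threshold. */
    boolean isBigMerge(List<? extends SizedSegment> segments) {
        return totalBytes(segments) >= bigMergeThresholdBytes;
    }
}
```

In a real scheduler the list would come from the segments of the pending merge, and, as noted above, that code would have to live in the org.apache.lucene.index package to reach the package-private API. The threshold itself is a tuning choice; a small merge could run immediately while a big one waits for after hours.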



injecteer at yahoo

Aug 1, 2012, 4:03 AM

Post #7 of 8 (209 views)
Re: is there a way to control when merges happen? [In reply to]

Hi Mike.

I have a LogDocMergePolicy + ConcurrentMergeScheduler in my setup.
I tried adding several new segments in a row, each with 800-5000 documents,
but the scheduler seemed to ignore them at first; only after some
time did it merge some of them.

I have the option of using a quartz-scheduler to trigger my merges, but I
would like to keep that logic where it really belongs: in Lucene's
mergeScheduler.

Is there a way to control merge scheduling now (with 3.6.0)?
When exactly is the scheduler triggered: upon adding a new segment, or does it
run every n hours? Can I configure the scheduler to do both?

TIA








lucene at mikemccandless

Aug 2, 2012, 3:04 PM

Post #8 of 8 (210 views)
Re: is there a way to control when merges happen? [In reply to]

Whenever a new segment is flushed, or a merge completes, then the
MergePolicy and MergeScheduler are invoked.

You can also invoke them at any time by calling
IndexWriter.maybeMerge() yourself.

Mike McCandless

http://blog.mikemccandless.com
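
For the "every n hours" part of the question, one possible approach is to call maybeMerge() from a timer in addition to the automatic triggers described above. The sketch below uses only the JDK; PeriodicMergeTrigger and its methods are hypothetical names, and the Runnable passed in is assumed to wrap a call to IndexWriter.maybeMerge().

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that runs a merge trigger on a fixed schedule. */
final class PeriodicMergeTrigger {
    private PeriodicMergeTrigger() {}

    /**
     * Wraps a task so that a runtime exception in one run is logged instead of
     * propagating; scheduleAtFixedRate cancels all later runs if a run throws.
     */
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                System.err.println("merge trigger failed: " + e);
            }
        };
    }

    /** Starts a single-threaded scheduler that runs the task once per period. */
    static ScheduledExecutorService start(Runnable maybeMerge, long period, TimeUnit unit) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(guarded(maybeMerge), period, period, unit);
        return executor;
    }
}

// Assumed usage with a Lucene IndexWriter named 'writer' (not compiled here).
// maybeMerge() throws a checked IOException, so it has to be wrapped:
//
//   ScheduledExecutorService trigger = PeriodicMergeTrigger.start(() -> {
//       try { writer.maybeMerge(); }
//       catch (IOException e) { throw new UncheckedIOException(e); }
//   }, 6, TimeUnit.HOURS);
```

The returned executor uses a non-daemon thread, so it should be shut down (for example with shutdownNow()) when the writer is closed.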


