
Mailing List Archive: Lucene: Java-User

ConcurrentMergeScheduler, Exception and transaction

 

 



Kuro at basistech

Nov 20, 2009, 4:03 PM

Post #1 of 6
ConcurrentMergeScheduler, Exception and transaction

I was experimenting with how Lucene handles 2-phase commit.
Then I noticed I am not catching all Exceptions
from Lucene. And I think this is because Lucene's
default MergeScheduler is ConcurrentMergeScheduler,
which spawns threads to do its job, and Exceptions thrown
in child threads are never reported to the parent.

Isn't this problematic when Lucene participates
in a 2-phase commit? Because the application
doesn't get an Exception when something bad happens
at merge time, it proceeds as normal and will
ask the other parties in the transaction to commit their
writes. If I changed the MergeScheduler
to SerialMergeScheduler, my code could catch
the Exceptions. I'd like to hear what others
think.
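
The general Java behaviour described above can be reproduced without Lucene. The following is a minimal sketch using only the standard library: a child thread throws, the parent's join() still returns normally, and the failure is only visible to the parent because an uncaught-exception handler stores it. The class name, method name, and the simulated failure message are illustrative assumptions, not anything from Lucene.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ChildThreadExceptionDemo {

    /**
     * Runs a task that always fails on a child thread and returns the
     * exception seen by the child's uncaught-exception handler.
     */
    static Throwable runFailingChild() {
        AtomicReference<Throwable> seen = new AtomicReference<>();

        Thread child = new Thread(() -> {
            // Stand-in for a background job that hits an error.
            throw new IllegalStateException("simulated merge failure");
        });

        // Without a handler, the JVM would only print the stack trace to stderr;
        // nothing would be thrown in the parent thread.
        child.setUncaughtExceptionHandler((thread, exc) -> seen.set(exc));

        child.start();
        try {
            // join() returns normally even though the child died with an exception.
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Throwable failure = runFailingChild();
        System.out.println("parent continued; child failure was: " + failure);
    }
}
```

The parent has to ask for the failure explicitly; nothing forces it to notice.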

By the way, do I need a new instance
of SerialMergeScheduler for each call
to setMergeScheduler on an IndexWriter?
Or can I just share a single instance
of SerialMergeScheduler with multiple
IndexWriters?

-kuro

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe [at] lucene
For additional commands, e-mail: java-user-help [at] lucene


jason.rutherglen at gmail

Nov 20, 2009, 4:13 PM

Post #2 of 6
Re: ConcurrentMergeScheduler, Exception and transaction

Teruhiko,

The index remains consistent even when a background merge fails,
meaning commit truly represents a valid index after it's called.
You can share merge schedulers, though in practice it's not
going to improve anything.

Jason



Kuro at basistech

Nov 20, 2009, 4:49 PM

Post #3 of 6
RE: ConcurrentMergeScheduler, Exception and transaction

Jason,
Even if there is no harm to the index,
I think the error needs to be reported back
to the caller. I am assuming a system error
like a filesystem error (filesystem full etc.)
that prevents Lucene from writing new docs.
If this happens, the error needs to be reported
to the caller so that it can ask Lucene and the other
participants in the transaction to roll back. Am
I missing something?

-kuro



lucene at mikemccandless

Nov 21, 2009, 2:09 AM

Post #4 of 6
Re: ConcurrentMergeScheduler, Exception and transaction

An exception during merging does not affect what's stored in the
index, because merging is a functional no-op. The same docs are
present before and after the merge.

So I would think that an app involving Lucene in a 2-phase commit
does not need to roll back the transaction if CMS hits an exception, as
long as the prepareCommit/commit calls succeed.
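
To make the "as long as the prepareCommit/commit calls succeed" rule concrete, here is a minimal, self-contained sketch of a 2-phase commit coordinator. The Participant interface and the Recording class are hypothetical stand-ins, not Lucene types; in a real application the index participant would presumably map prepare(), commit(), and rollback() onto IndexWriter.prepareCommit(), IndexWriter.commit(), and IndexWriter.rollback(). The point is that only a failure in the prepare phase triggers a rollback; a background merge failure never enters this flow.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseCommitSketch {

    /** Hypothetical transaction participant; not a Lucene type. */
    interface Participant {
        void prepare() throws Exception; // phase 1: do the risky work, may fail
        void commit();                   // phase 2: make the prepared work visible
        void rollback();                 // discard uncommitted work
    }

    /** Illustrative participant that logs each call and can fail in prepare(). */
    static class Recording implements Participant {
        private final String name;
        private final boolean failOnPrepare;
        private final List<String> log;

        Recording(String name, boolean failOnPrepare, List<String> log) {
            this.name = name;
            this.failOnPrepare = failOnPrepare;
            this.log = log;
        }

        @Override
        public void prepare() throws Exception {
            log.add(name + ":prepare");
            if (failOnPrepare) {
                throw new Exception(name + " could not prepare");
            }
        }

        @Override
        public void commit() {
            log.add(name + ":commit");
        }

        @Override
        public void rollback() {
            log.add(name + ":rollback");
        }
    }

    /** Returns true if every participant committed, false if all were rolled back. */
    static boolean run(List<Participant> participants) {
        try {
            for (Participant p : participants) {
                p.prepare();
            }
        } catch (Exception e) {
            // Any prepare failure aborts the whole transaction.
            for (Participant p : participants) {
                p.rollback();
            }
            return false;
        }
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Participant> participants = List.of(
                new Recording("index", false, log),
                new Recording("db", true, log));
        System.out.println("committed=" + run(participants) + " calls=" + log);
    }
}
```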

That said, it's of course no good if your CMS is out throwing
exceptions... so you should in general get to the bottom of it.

If you really want to roll back your transaction when CMS hits an
exception, you can subclass ConcurrentMergeScheduler, override
handleMergeException, and notify your application from there.
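
The subclass-and-override idea can be sketched without depending on Lucene. In the sketch below, BackgroundScheduler is a stand-in written for this example, not Lucene's ConcurrentMergeScheduler, and only the hook name handleMergeException is borrowed from the post; the exact signature of the real method differs between Lucene versions, so check the Javadoc for the release you use. The subclass records the first failure so that the thread coordinating the transaction can check it before deciding to commit.

```java
import java.util.concurrent.atomic.AtomicReference;

public class MergeFailureHookSketch {

    /** Stand-in for a background merge scheduler; NOT Lucene's class. */
    static class BackgroundScheduler {

        /** Runs the task on a new thread and routes any failure to the hook. */
        Thread schedule(Runnable mergeTask) {
            Thread worker = new Thread(() -> {
                try {
                    mergeTask.run();
                } catch (Throwable exc) {
                    handleMergeException(exc);
                }
            });
            worker.start();
            return worker;
        }

        /** Hook for subclasses; the default only logs, so the caller never sees it. */
        protected void handleMergeException(Throwable exc) {
            System.err.println("background merge failed: " + exc);
        }
    }

    /** Subclass that remembers the first failure so the application can poll for it. */
    static class RecordingScheduler extends BackgroundScheduler {
        private final AtomicReference<Throwable> firstFailure = new AtomicReference<>();

        @Override
        protected void handleMergeException(Throwable exc) {
            firstFailure.compareAndSet(null, exc);
        }

        Throwable firstFailure() {
            return firstFailure.get();
        }
    }

    /** Schedules a task that always fails and returns what the hook recorded. */
    static Throwable demo() {
        RecordingScheduler scheduler = new RecordingScheduler();
        Thread worker = scheduler.schedule(() -> {
            throw new IllegalStateException("disk full (simulated)");
        });
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return scheduler.firstFailure();
    }

    public static void main(String[] args) {
        Throwable failure = demo();
        // An application could check this before asking other parties to commit.
        System.out.println("recorded merge failure: " + failure);
    }
}
```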

Mike



Kuro at basistech

Nov 23, 2009, 9:40 AM

Post #5 of 6
RE: ConcurrentMergeScheduler, Exception and transaction

Thank you, Mike, for the explanation.

So I understand that all the data is kept even if any of these
merging threads fails. Once this has happened, will Lucene keep
attempting the merge every time addDocument is called afterwards
(assuming the error is persistent, such as a full filesystem)?
Will IndexWriter.addDocument, commit(), or flush() eventually
throw an IOException?

-kuro



lucene at mikemccandless

Nov 23, 2009, 9:45 AM

Post #6 of 6
Re: ConcurrentMergeScheduler, Exception and transaction

IndexWriter will try the merge again the next time it checks merges
(e.g. after flushing a new segment, but not after adding a new
document).

You'll only get an exception out of addDocument/commit/flush if they
themselves hit the problem, e.g. if flushing a new segment runs out of
space.

But often merges will fail first, since they can require a lot more
temporary free space than flushing.

Mike

