marius at gedmin
Aug 30, 2012, 9:14 AM
Storm/ZEO deadlocks (was Re: [ZODB-Dev] [announce] NEO 1.0 - scalable and redundant storage for ZODB)
On Wed, Aug 29, 2012 at 06:30:50AM -0400, Jim Fulton wrote:
> On Wed, Aug 29, 2012 at 2:29 AM, Marius Gedminas <marius [at] gedmin> wrote:
> > On Tue, Aug 28, 2012 at 06:31:05PM +0200, Vincent Pelletier wrote:
> >> On Tue, 28 Aug 2012 16:31:20 +0200,
> >> Martijn Pieters <mj [at] zopatista> wrote :
> >> > Anything else different? Did you make any performance comparisons
> >> > between RelStorage and NEO?
> >> I believe the main difference compared to all other ZODB Storage
> >> implementations is the finer-grained locking scheme: in all storage
> >> implementations I know, there is a database-level lock during the
> >> entire second phase of 2PC, whereas in NEO transactions are serialised
> >> only when they alter a common set of objects.
> > This could be a compelling point. I've seen deadlocks in an app that
> > tried to use both ZEO and PostgreSQL via the Storm ORM. (The thread
> > holding the ZEO commit lock was blocked waiting for the PostgreSQL
> > commit to finish, while the PostgreSQL server was waiting for some other
> > transaction to either commit or abort -- and that other transaction
> > couldn't proceed because it was waiting for the ZEO lock.)
> This sounds like an application/transaction configuration problem.
Here's the code to reproduce it: http://pastie.org/4617132
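The linked script is not reproduced here. As a rough illustration only, the same cycle can be shown with two ordinary `threading.Lock` objects standing in for the ZEO commit lock and the PostgreSQL row locks held by Thread #1's open transaction. No ZODB or Storm code is involved, and `run_demo()` is a hypothetical helper. The two barriers force the unlucky interleaving, and the lock timeouts let the demo report the deadlock instead of hanging forever.

```python
import threading


def run_demo(timeout: float = 0.5) -> dict:
    """Toy reproduction of the lock-ordering cycle using two plain locks.

    Returns {thread name: whether it managed to get its second lock}."""
    zeo_commit_lock = threading.Lock()  # stands in for the ZEO commit lock
    pg_row_lock = threading.Lock()      # stands in for row locks held by
                                        # Thread #1's open PostgreSQL txn
    holding = threading.Barrier(2)      # both threads hold their first lock
    attempted = threading.Barrier(2)    # both finished trying the second one
    results = {}

    def worker(name, first, second):
        with first:
            holding.wait()
            # acquire(timeout=...) reports the deadlock instead of hanging.
            got = second.acquire(timeout=timeout)
            if got:
                second.release()
            results[name] = got
            # Keep holding `first` until the other thread has given up too.
            attempted.wait()

    threads = [
        # Thread #1: PostgreSQL row locks first, then ZODB's tpc_begin().
        threading.Thread(target=worker,
                         args=("thread1", pg_row_lock, zeo_commit_lock)),
        # Thread #2: ClientStorage's tpc_begin() first, then Storm's.
        threading.Thread(target=worker,
                         args=("thread2", zeo_commit_lock, pg_row_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(sorted(results.items()))


print(run_demo())  # {'thread1': False, 'thread2': False}
```

Neither thread can ever get its second lock, because each one is held by the other until both give up.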
> To avoid this sort of deadlock, you need to always commit in a
> consistent order. You also need to configure ZEO (or NEO)
> to time-out transactions that take too long to finish the second phase.
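Jim's second suggestion, timing out a transaction that sits on the commit lock for too long, can be modelled roughly as follows. This is a toy sketch, not ZEO's implementation: `TimedCommitLock`, `try_acquire()` and the explicit `now` clock are hypothetical, and a real ZEO server has its own server-side setting for this. Time is passed in explicitly so the behaviour is deterministic.

```python
from typing import List, Optional


class TimedCommitLock:
    """Hypothetical storage-wide commit lock whose holder is aborted once
    it has held the lock for longer than `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.holder: Optional[str] = None
        self.acquired_at = 0.0
        self.aborted: List[str] = []

    def expire(self, now: float) -> None:
        # Abort the current holder if it has held the lock too long.
        if self.holder is not None and now - self.acquired_at > self.timeout:
            self.aborted.append(self.holder)
            self.holder = None

    def try_acquire(self, txn: str, now: float) -> bool:
        # Expire a stuck holder first, then grant the lock if it is free.
        self.expire(now)
        if self.holder is None:
            self.holder, self.acquired_at = txn, now
            return True
        return self.holder == txn

    def release(self, txn: str) -> None:
        if self.holder == txn:
            self.holder = None


# txn2 takes the lock at t=0 and then gets stuck somewhere else.
lock = TimedCommitLock(timeout=30.0)
assert lock.try_acquire("txn2", now=0.0)
print(lock.try_acquire("txn1", now=10.0))  # False: txn2 still holds it
print(lock.try_acquire("txn1", now=31.0))  # True: txn2 was aborted
print(lock.aborted)                        # ['txn2']
```

The timeout does not prevent the cycle; it only bounds how long everyone else waits, at the cost of aborting the transaction that was stuck.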
The deadlock happens in tpc_begin() in both threads, which is the first step of two-phase commit.
AFAICS Thread #2 first performs tpc_begin() for ClientStorage and takes
the ZEO commit lock. Then it enters tpc_begin() for Storm's
StoreDataManager and blocks waiting for a response from PostgreSQL --
which is delayed because the PostgreSQL server is waiting to see if
the other thread, Thread #1, will commit or abort _its_ transaction, which
is conflicting with the one from Thread #2.
Meanwhile Thread #1 is blocked in ZODB's tpc_begin(), trying to acquire the
ZEO commit lock held by Thread #2.
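The situation above is a cycle in a wait-for graph: each party is blocked on the next, and the last one is blocked on the first. A minimal sketch, assuming each party waits on exactly one other (true for the two threads here, not in general); `find_cycle()` is a hypothetical helper, not part of ZODB, Storm or PostgreSQL. PostgreSQL's own deadlock detector cannot see this particular cycle, because one of its edges is the ZEO lock, which lives outside the database.

```python
from typing import Dict, List, Optional


def find_cycle(wait_for: Dict[str, str]) -> Optional[List[str]]:
    """Return a cycle in a wait-for graph, or None if there is none.

    `wait_for` maps each party to the single party it is blocked on."""
    for start in wait_for:
        path: List[str] = []
        node = start
        # Follow "is waiting for" edges until we leave the graph or loop.
        while node in wait_for and node not in path:
            path.append(node)
            node = wait_for[node]
        if node in path:
            return path[path.index(node):] + [node]
    return None


wait_for = {
    "Thread #2": "PostgreSQL server",  # blocked in Storm's tpc_begin()
    "PostgreSQL server": "Thread #1",  # waiting for T1 to commit or abort
    "Thread #1": "Thread #2",          # wants the ZEO lock T2 holds
}
print(" -> ".join(find_cycle(wait_for)))
# Thread #2 -> PostgreSQL server -> Thread #1 -> Thread #2
```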
I'm too fried right now to understand who's at fault here.
Workarounds probably exist (use RelStorage instead of ZEO? Configure
Storm to use a lower PostgreSQL transaction isolation level?). Maybe
this problem would go away if Storm always went into tpc_begin() before ClientStorage did.
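Both Jim's consistent-order advice and the idea of having Storm's data manager always enter tpc_begin() first come down to acquiring resources in one global order. The transaction package orders registered data managers by their `sortKey()` before running two-phase commit; the sketch below imitates that idea with plain locks. `Resource`, `commit()`, `run_demo()` and the key strings are hypothetical and are not the transaction or Storm API.

```python
import threading


class Resource:
    """Hypothetical stand-in for a data manager guarded by a single lock."""

    def __init__(self, name: str, sort_key: str) -> None:
        self.name = name
        self.sort_key = sort_key
        self.lock = threading.Lock()


def commit(resources, log, timeout=2.0) -> bool:
    # Enter every resource in one global order, whatever order the
    # transaction happened to register them in.
    ordered = sorted(resources, key=lambda r: r.sort_key)
    held = []
    try:
        for r in ordered:
            if not r.lock.acquire(timeout=timeout):
                return False
            held.append(r)
        log.append([r.name for r in ordered])  # all locks held: "commit"
        return True
    finally:
        for r in reversed(held):
            r.lock.release()


def run_demo():
    # Keys chosen (hypothetically) so that Storm always sorts before ZEO.
    storm = Resource("storm", sort_key="0-storm")
    zeo = Resource("zeo", sort_key="1-zeo")
    log, results = [], {}

    def worker(name, registered):
        results[name] = commit(registered, log)

    threads = [
        threading.Thread(target=worker, args=("thread1", [zeo, storm])),
        threading.Thread(target=worker, args=("thread2", [storm, zeo])),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(sorted(results.items())), log


print(run_demo())
# ({'thread1': True, 'thread2': True}, [['storm', 'zeo'], ['storm', 'zeo']])
```

Because both threads take the locks in the same order, one of them simply waits for the other instead of forming a cycle.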
I've pinged the people in #storm on FreeNode about this, but haven't
filed any bugs yet.