
Mailing List Archive: Linux-HA: Pacemaker

LIO + Pacemaker kernel oops on failover

 

 



phil at macprofessionals

Jul 3, 2012, 11:38 AM


It seems there's something about the iSCSI RAs that hits a bug in LIO:

http://comments.gmane.org/gmane.linux.scsi.target.devel/1568?set_cite=hide

I seem to be hitting the same problem quite reliably whenever I migrate
the iSCSI targets in my cluster. Sounds like the OP was able to reach a
suitable workaround, but I'm not very experienced with LIO or iSCSI so
the discussion is a bit over my head. Anyone have some idea how to
implement the changes described there?


_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


phil at macprofessionals

Jul 13, 2012, 8:41 AM

Re: LIO + Pacemaker kernel oops on failover

On 07/03/2012 02:38 PM, Phil Frost wrote:
> It seems there's something about the iSCSI RAs that hit a bug in LIO:
>
> http://comments.gmane.org/gmane.linux.scsi.target.devel/1568?set_cite=hide
>
>
> I seem to be hitting the same problem quite reliably whenever I
> migrate the iSCSI targets in my cluster. Sounds like the OP was able
> to reach a suitable workaround, but I'm not very experienced with LIO
> or iSCSI so the discussion is a bit over my head. Anyone have some
> idea how to implement the changes described there?

I wasn't able to find a way to modify the existing
iSCSI(Target|LogicalUnit) RAs to stop the target in a way that avoided
this bug in LIO. The problem was largely that with targets and logical
units as separate resources, it was difficult to start the target before
the LUs, and also stop the target before the LUs. I tried using
asymmetric order constraints, but it didn't work so well in testing. I
don't know if it's because the shutdown wasn't working cleanly, or if
the iSCSILogicalUnit resources were upset that the LUs were stopped when
Pacemaker wasn't expecting it.

Anyhow, my solution was to write a new RA (attached) which managed the
target and the LUs together, and thus could control the ordering of
starting and stopping them in detail. It's not as featureful or general
as the existing RAs, but in my testing so far it is stable.
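
To illustrate the ordering described above, here is a minimal sketch in Python (not the attached RA itself). The `run` callback and the command strings are hypothetical placeholders for whatever actually configures LIO; the only point is that a single agent owning both the target and the LUs can bring the target up first on start, and take it down first on stop.

```python
from typing import Callable, List


def start_all(target: str, lus: List[str], run: Callable[[str], None]) -> None:
    """Start the target first, then each LU (hypothetical command strings)."""
    run(f"start-target {target}")
    for lu in lus:
        run(f"start-lu {lu}")


def stop_all(target: str, lus: List[str], run: Callable[[str], None]) -> None:
    """Stop the target first, then each LU in reverse order of creation."""
    run(f"stop-target {target}")
    for lu in reversed(lus):
        run(f"stop-lu {lu}")


if __name__ == "__main__":
    # Print the command sequence instead of touching a real system.
    start_all("iqn.example:t1", ["lun0", "lun1"], print)
    stop_all("iqn.example:t1", ["lun0", "lun1"], print)
```

With separate target and LU resources, symmetric order constraints would reverse the start order on stop, which is exactly the ordering this avoids.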

This is the first RA I have written, so I would appreciate any comments.
One problem in particular relates to the monitor action -- you can see
it only checks that the target is running. I could add monitoring for
the LUs easily enough, but I'm not clear on what should happen if the
target is up, but the LUs are not. In this state the service is neither
"up" nor "down", it's broken, and the right thing to do is probably
attempt to restart it. I'm not sure how I communicate that to Pacemaker
from my RA, though. Should I return OCF_ERR_GENERIC? What will Pacemaker
do in this case?
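
For reference, one possible shape for such a monitor, sketched in Python rather than in the RA's own language. The numeric values are the standard OCF return codes (OCF_SUCCESS=0, OCF_ERR_GENERIC=1, OCF_NOT_RUNNING=7). The function name and its inputs are hypothetical, and how the target and LU state would be probed is left out. The assumption here is that a half-up service is reported as a generic error, so the cluster sees a failed resource rather than a cleanly stopped one.

```python
from typing import Iterable

# Standard OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor_status(target_up: bool, lus_up: Iterable[bool]) -> int:
    """Map target/LU state to an OCF monitor return code (illustrative only)."""
    lus = list(lus_up)
    if target_up and all(lus):
        return OCF_SUCCESS       # everything is up
    if not target_up and not any(lus):
        return OCF_NOT_RUNNING   # cleanly stopped
    return OCF_ERR_GENERIC       # partially up: report as failed


if __name__ == "__main__":
    print(monitor_status(True, [True, True]))
    print(monitor_status(False, [False, False]))
    print(monitor_status(True, [True, False]))
```

As far as I understand, a monitor that returns OCF_ERR_GENERIC is treated as a failure of the resource, and with the default on-fail setting the cluster attempts recovery by stopping and starting it again.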
Attachments: liotarget (6.04 KB)
