Mailing List Archive: Linux-HA: Users

postfix recover failed.

quocdn at gmail

Nov 9, 2009, 1:43 AM

Post #1 of 8
postfix recover failed.

I am trying to set up DRBD + Pacemaker/OpenAIS + Postfix. The services
start OK and failover works, but when I stop the Postfix service manually
(with the "postfix stop" command) or kill it, the Postfix cluster resource
fails completely. I thought it was supposed to be recovered from the dead
state. Please give me some advice on this.
Many thanks,

crm status
============
Last updated: Mon Nov 9 10:59:30 2009
Stack: openais
Current DC: Server1 - partition with quorum
Version: 1.0.6-405fe9a92d827fc627e1d9f18691a9d3d6b2279e
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ Server1 Server2 ]

 Master/Slave Set: ms-DRBD
     Masters: [ Server1 ]
     Slaves: [ Server2 ]
 Resource Group: ClusterResources
     ClusterIP (ocf::heartbeat:IPaddr2): Started Server1
     ClusterFS (ocf::heartbeat:Filesystem): Started Server1
     Postfix (ocf::heartbeat:postfix): Started Server1 (unmanaged) FAILED

Failed actions:
    Postfix_monitor_50000 (node=Server1, call=16, rc=7, status=complete): not running
    Postfix_stop_0 (node=Server1, call=17, rc=1, status=complete): unknown error

LOG: please refer to the attached log file.
Attachments: postfix.log (38.5 KB)


dejanmm at fastmail

Nov 9, 2009, 5:00 AM

Post #2 of 8
Re: postfix recover failed.

Hi,

On Mon, Nov 09, 2009 at 04:43:23PM +0700, Dinh N. Quoc wrote:
> I am trying to set up DRBD + Pacemaker/OpenAIS + Postfix. The services
> start OK and failover works, but when I stop the Postfix service manually
> (with the "postfix stop" command) or kill it, the Postfix cluster resource
> fails completely. I thought it was supposed to be recovered from the dead
> state.

It would, but the stop action failed:

Nov 9 11:07:12 Server1 lrmd: [4682]: info: RA output: (Postfix:stop:stderr) 2009/11/09_11:07:12 ERROR: Postfix returned an error while stopping. 1

This error comes from /usr/sbin/postfix. I'm not an expert on
Postfix, so I'm CCing the author of the RA; perhaps he can take a look.
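For reference, the rc values in the "Failed actions" lines of the original report are the standard OCF exit codes returned by the resource agent. The monitor returned 7 (not running), so the cluster tried to recover by stopping and restarting the resource; the stop then returned 1 (generic error), which is consistent with the resource being left "(unmanaged) FAILED" instead of being restarted. The lookup below is only an illustrative sketch: the function name ocf_rc_name is hypothetical and is not part of the resource-agents shell library.

```shell
#!/bin/sh
# Map a numeric OCF return code to its symbolic name.
# The values are the standard OCF codes; ocf_rc_name itself is a
# hypothetical helper written only for this illustration.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo UNKNOWN ;;
    esac
}

# The two failed actions from the crm status output:
echo "Postfix_monitor_50000 rc=7: $(ocf_rc_name 7)"
echo "Postfix_stop_0        rc=1: $(ocf_rc_name 1)"
```

The number after the action name in an operation key such as Postfix_monitor_50000 is the operation interval in milliseconds (a 50-second monitor here); one-shot actions such as stop have interval 0.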

Thanks,

Dejan


_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


dmaziuk at bmrb

Nov 9, 2009, 9:56 AM

Post #3 of 8
Re: postfix recover failed.

On Monday 09 November 2009 07:00:42 Dejan Muhamedagic wrote:

> It would, but the stop action failed:
>
> Nov 9 11:07:12 Server1 lrmd: [4682]: info: RA output:
> (Postfix:stop:stderr) 2009/11/09_11:07:12 ERROR: Postfix returned an error
> while stopping. 1
>
> This error comes from /usr/sbin/postfix. I'm not an expert on
> Postfix, so I'm CCing the author of the RA; perhaps he can take a look.

Actually, this

> > Failed actions:
> > Postfix_monitor_50000 (node=Server1, call=16, rc=7,
> > status=complete): not running
> > Postfix_stop_0 (node=Server1, call=17, rc=1, status=complete):
> > unknown error

looks like it's using a "restart" action, usually coded as "stop" followed
by "start".

If the daemon isn't running to begin with, "stop" will fail.

Dima
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


TSERONG at novell

Nov 9, 2009, 5:15 PM

Post #4 of 8
Re: postfix recover failed.

On 11/10/2009 at 04:56 AM, Dimitri Maziuk <dmaziuk [at] bmrb> wrote:
> On Monday 09 November 2009 07:00:42 Dejan Muhamedagic wrote:
>
> > It would, but the stop action failed:
> >
> > Nov 9 11:07:12 Server1 lrmd: [4682]: info: RA output:
> > (Postfix:stop:stderr) 2009/11/09_11:07:12 ERROR: Postfix returned an error
> > while stopping. 1
> >
> > This error comes from /usr/sbin/postfix. I'm not an expert on
> > Postfix, so I'm CCing the author of the RA; perhaps he can take a look.
>
> Actually, this
>
> > > Failed actions:
> > > Postfix_monitor_50000 (node=Server1, call=16, rc=7,
> > > status=complete): not running
> > > Postfix_stop_0 (node=Server1, call=17, rc=1, status=complete):
> > > unknown error
>
> looks like it's using a "restart" action, usually coded as "stop" followed
> by "start".
>
> If the daemon isn't running to begin with, "stop" will fail.

That sounds like a bug in the RA. If the resource is already stopped, "stop" is meant to return success.
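The difference can be sketched in a few lines of shell. With default settings, recovery after a failed monitor is a stop followed by a start, so if stop treats "nothing to stop" as an error, the start never happens. The sketch simulates the daemon with a variable instead of calling Postfix; all function names in it (daemon_running, naive_stop, idempotent_stop, start, recover) are hypothetical, and the return values stand in for OCF_SUCCESS (0) and OCF_ERR_GENERIC (1).

```shell
#!/bin/sh
# Simulated daemon state (hypothetical): 1 = running, 0 = stopped,
# e.g. after someone ran "postfix stop" by hand or killed the process.
RUNNING=0

daemon_running() { [ "$RUNNING" -eq 1 ]; }

# Non-idempotent stop: "nothing to stop" is treated as an error (1),
# which matches the behaviour reported in this thread.
naive_stop() {
    daemon_running || return 1
    RUNNING=0
}

# Idempotent stop: "already stopped" is success (0), as the OCF
# stop action requires.
idempotent_stop() {
    daemon_running || return 0
    RUNNING=0
}

start() { RUNNING=1; }

# Recovery after a failed monitor: stop, then start.
# $1 is the name of the stop function to use.
recover() {
    "$1" || return 1
    start
}

for stop_fn in naive_stop idempotent_stop; do
    RUNNING=0   # the daemon has already died
    if recover "$stop_fn"; then
        echo "$stop_fn: recovered"
    else
        echo "$stop_fn: recovery failed"
    fi
done
```

With the naive stop the sequence aborts before start; with the idempotent stop the daemon ends up running again.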

Regards,

Tim


--
Tim Serong <tserong [at] novell>
Senior Clustering Engineer, Novell Inc.



quocdn at gmail

Nov 9, 2009, 6:45 PM

Post #5 of 8
Re: postfix recover failed.

Thank you all for pointing me in the right direction. Here is my
workaround, and it seems to be working: it just adds a few lines that
check the status of Postfix before running the stop command. Are there
any better ideas?

--- Cluster-Resource-Agents-5f09e3bd7e20/heartbeat/postfix 2009-08-14 17:32:33.000000000 +0700
+++ /usr/lib/ocf/resource.d/heartbeat/postfix 2009-11-10 09:32:43.000000000 +0700
@@ -141,6 +141,12 @@

 postfix_stop()
 {
+
+    if ! postfix_status; then
+        ocf_log info "Postfix already stopped."
+        return $OCF_SUCCESS
+    fi
+
     $binary $OPTIONS stop >/dev/null 2>&1
     ret=$?




dejanmm at fastmail

Nov 10, 2009, 1:57 AM

Post #6 of 8
Re: postfix recover failed.

Hi,

On Tue, Nov 10, 2009 at 09:45:52AM +0700, Dinh N. Quoc wrote:
> Thank you all for pointing me in the right direction. Here is my
> workaround, and it seems to be working: it just adds a few lines that
> check the status of Postfix before running the stop command. Are there
> any better ideas?
>
> --- Cluster-Resource-Agents-5f09e3bd7e20/heartbeat/postfix 2009-08-14 17:32:33.000000000 +0700
> +++ /usr/lib/ocf/resource.d/heartbeat/postfix 2009-11-10 09:32:43.000000000 +0700
> @@ -141,6 +141,12 @@
>
>  postfix_stop()
>  {
> +
> +    if ! postfix_status; then
> +        ocf_log info "Postfix already stopped."
> +        return $OCF_SUCCESS
> +    fi
> +
>      $binary $OPTIONS stop >/dev/null 2>&1
>      ret=$?

The patch looks fine to me. It looks like /usr/sbin/postfix can't
handle this case itself. Raoul: is that actually expected?

Thanks,

Dejan



dmaziuk at bmrb

Nov 10, 2009, 9:54 AM

Post #7 of 8
Re: postfix recover failed.

On Tuesday 10 November 2009 03:57:25 Dejan Muhamedagic wrote:
> Hi,
>
> The patch looks fine to me. It looks like /usr/sbin/postfix can't
> handle this case itself. Raoul: is that actually expected?

Why not? If "postfix stop" fails to stop Postfix for whatever reason, it
should return a non-zero value.

Note that "stop" waits for the daemons to shut down gracefully, which
means it could potentially block the failover for a while (in the unlikely
worst case, a deadlock).
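If a slow graceful stop is a concern, one option is to bound the wait and escalate afterwards. The sketch below is an outline under stated assumptions, not a drop-in change to the agent: status_check, graceful_stop and force_stop are hypothetical placeholders (in a real agent they might wrap the agent's own status check, "postfix stop" and "postfix abort"), the daemon is simulated so the code runs without Postfix, and it assumes the graceful stop command returns promptly while the daemon exits later; if the command itself blocks, it would need its own timeout. STOP_TIMEOUT is an assumed budget that would have to stay below the stop operation timeout configured in the cluster.

```shell
#!/bin/sh
# --- Simulation only (hypothetical; an RA would use real checks) ---
# The "daemon" exits only after EXIT_AFTER status polls following a
# stop request, to imitate a slow graceful shutdown.
RUNNING=1
STOP_REQUESTED=0
EXIT_AFTER=3

status_check() {
    # Returns 0 while the daemon is running, 1 once it has stopped.
    if [ "$RUNNING" -eq 1 ] && [ "$STOP_REQUESTED" -eq 1 ]; then
        EXIT_AFTER=$((EXIT_AFTER - 1))
        if [ "$EXIT_AFTER" -le 0 ]; then RUNNING=0; fi
    fi
    [ "$RUNNING" -eq 1 ]
}
graceful_stop() { STOP_REQUESTED=1; }   # stand-in for "postfix stop" (assumed)
force_stop() {                           # stand-in for "postfix abort" (assumed)
    RUNNING=0
    echo "escalated to forced stop"
}

# --- Bounded stop ---
STOP_TIMEOUT=${STOP_TIMEOUT:-5}     # assumed: max polls before escalating
POLL_INTERVAL=${POLL_INTERVAL:-1}   # seconds between polls

bounded_stop() {
    status_check || return 0        # already stopped: success, as in the patch
    graceful_stop
    waited=0
    while status_check; do
        if [ "$waited" -ge "$STOP_TIMEOUT" ]; then
            force_stop
            break
        fi
        sleep "$POLL_INTERVAL"
        waited=$((waited + 1))
    done
    # Report success only if the daemon is really gone.
    if status_check; then return 1; fi
    return 0
}

if bounded_stop; then echo "stopped"; else echo "still running"; fi
```

The final status check matters as much as the timeout: stop should only report success once the daemon is actually gone.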

Dima
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


dejanmm at fastmail

Nov 10, 2009, 10:10 AM

Post #8 of 8
Re: postfix recover failed.

Hi,

On Tue, Nov 10, 2009 at 11:54:40AM -0600, Dimitri Maziuk wrote:
> On Tuesday 10 November 2009 03:57:25 Dejan Muhamedagic wrote:
> > Hi,
> >
> > The patch looks fine to me. It looks like /usr/sbin/postfix can't
> > handle this case itself. Raoul: is that actually expected?
>
> Why not? If "postfix stop" fails to stop Postfix for whatever reason, it
> should return a non-zero value.

It seems that 'postfix stop' fails if Postfix has already been
stopped. I can't test that myself, which is why I asked.
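On a test machine this is straightforward to check: run the stop command twice and look at the second exit status. The helper below takes the command as its arguments so it can first be tried with a stand-in; stop_twice and fake_stop are hypothetical names, and the real check would presumably be "stop_twice postfix stop", run as root on a host where Postfix is installed and not under cluster control.

```shell
#!/bin/sh
# Run the given stop command twice and print both exit codes.
# Output is discarded; only the exit statuses matter here.
stop_twice() {
    first=0;  "$@" >/dev/null 2>&1 || first=$?
    second=0; "$@" >/dev/null 2>&1 || second=$?
    echo "first=$first second=$second"
}

# Hypothetical stand-in for a daemon whose stop fails when nothing
# is running (replace with: stop_twice postfix stop).
STATE=running
fake_stop() {
    [ "$STATE" = running ] || return 1
    STATE=stopped
}

stop_twice fake_stop
```

If the second exit status is non-zero, the command is not idempotent and the agent has to guard it with a status check, as the patch in this thread does.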

> Note that "stop" waits for the daemons to shut down gracefully, which
> means it could potentially block the failover for a while (in the unlikely
> worst case, a deadlock).

Right, though that's not the case here.

Thanks,

Dejan
