
Mailing List Archive: Linux-HA: Dev

postfix ocf ra

 

 



r.bhatia at ipax

May 28, 2009, 10:01 AM

Post #1 of 17 (1870 views)
postfix ocf ra

hi,

please find my postfix ocf ra attached. it is possible to use it
with multiple postfix instances, if the administrator honors all
pre-requirements (queue_directory, data_directory,
alternate_config_directories)
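
The alternate_config_directories prerequisite lends itself to a small check. A minimal sketch, assuming the value has already been read (for example with "postconf -h alternate_config_directories") and is a comma- or whitespace-separated list; the helper name dir_is_listed is made up for illustration and is not part of the attached agent:

```sh
# dir_is_listed LIST DIR
# Succeeds if DIR is an exact entry of LIST (comma/whitespace separated).
# A trailing slash on either side is ignored.
dir_is_listed() {
    _want=${2%/}
    for _d in $(printf '%s' "$1" | tr ',' ' '); do
        if [ "${_d%/}" = "$_want" ]; then
            return 0
        fi
    done
    return 1
}

# Example:
#   list=$(postconf -h alternate_config_directories)
#   dir_is_listed "$list" "$OCF_RESKEY_config_dir" || echo "not listed"
```

Comparing whole entries avoids a substring match, where /etc/postfix would also "match" /etc/postfix-out.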

feedback is appreciated!

cheers,
raoul
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia[at]ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office[at]ipax.at
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________
Attachments: postfix (8.30 KB)


r.bhatia at ipax

May 28, 2009, 11:21 AM

Post #2 of 17 (1796 views)
Re: postfix ocf ra [In reply to]

hi,

Raoul Bhatia [IPAX] wrote:
> hi,
>
> please find my postfix ocf ra attached. it is possible to use it
> with multiple postfix instances, if the administrator honors all
> pre-requirements (queue_directory, data_directory,
> alternate_config_directories)

i found some problems with this ra when you move data/config_dir/queue
to a separate partition that is also mounted by pacemaker.

these are basically the same issues we encountered with the mysql ra,
and therefore i used this ra as a basis for further improvements.

i will resend the updated postfix ra within the next hour.

cheers,
raoul
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev[at]lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


r.bhatia at ipax

May 28, 2009, 11:33 AM

Post #3 of 17 (1797 views)
Re: postfix ocf ra [In reply to]

Raoul Bhatia [IPAX] wrote:
> hi,
>
> Raoul Bhatia [IPAX] wrote:
>> hi,
>>
>> please find my postfix ocf ra attached. it is possible to use it
>> with multiple postfix instances, if the administrator honors all
>> pre-requirements (queue_directory, data_directory,
>> alternate_config_directories)
>
> i found some problems with this ra when you move data/config_dir/queue
> to a separate partition that is also mounted by pacemaker.
>
> these are basically the same issues we encountered with the mysql ra,
> and therefore i used this ra as a basis for further improvements.
>
> i will resend the updated postfix ra within the next hour.

please find my current version attached.

i also added a link to a page where i describe how to configure multiple
postfix instances on *one* server.

cheers,
raoul
Attachments: postfix (8.40 KB)


dejanmm at fastmail

May 28, 2009, 12:10 PM

Post #4 of 17 (1792 views)
Re: postfix ocf ra [In reply to]

Hi,

On Thu, May 28, 2009 at 08:33:15PM +0200, Raoul Bhatia [IPAX] wrote:
> Raoul Bhatia [IPAX] wrote:
> > hi,
> >
> > Raoul Bhatia [IPAX] wrote:
> >> hi,
> >>
> >> please find my postfix ocf ra attached. it is possible to use it
> >> with multiple postfix instances, if the administrator honors all
> >> pre-requirements (queue_directory, data_directory,
> >> alternate_config_directories)
> >
> > i found some problems with this ra when you move data/config_dir/queue
> > to a separate partition that is also mounted by pacemaker.
> >
> > these are basically the same issues we encountered with the mysql ra,
> > and therefore i used this ra as a basis for further improvements.
> >
> > i will resend the updated postfix ra within the next hour.
>
> please find my current version attached.

Darn. I knew I should've waited a while :)

> i also added a link to a page where i describe how to configure multiple
> postfix instances on *one* server.

That's great.

Some comments (in a diff form):

IsRunning()
{
+ # Could the master process become zombie?
kill -0 "$1" 2>/dev/null
}
...
- dir=$(ls -l /proc/$pid/exe 2>/dev/null | sed 's/.* -> //; s/\/[^\/]*$//')
- if [ "X$dir" = "X/usr/lib/postfix" ]; then
+ # Linux specific!
+ #dir=$(ls -l /proc/$pid/exe 2>/dev/null | sed 's/.* -> //; s/\/[^\/]*$//')
+ # Is this 'if' necessary? What if the
+ # installation/package is different?
+ #if [ "X$dir" = "X/usr/lib/postfix" ]; then

Also, most exits in validate_all should be OCF_ERR_INSTALLED. I
think.
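
A minimal sketch of what that could look like for the two simplest checks. The numeric values are the standard OCF return codes that .ocf-shellfuncs normally defines; the function name check_install and the choice of OCF_ERR_INSTALLED for a missing configuration directory are assumptions for illustration, not the agent's actual code:

```sh
# Standard OCF return codes (normally provided by .ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5

# check_install BINARY CONFIG_DIR
check_install() {
    _bin=$1
    _cfg=$2

    # No usable binary on this node: the software is not installed here.
    if [ ! -x "$_bin" ]; then
        return $OCF_ERR_INSTALLED
    fi

    # A configured but missing directory is treated as a node-local
    # problem too, since it may live on storage not mounted on this node.
    if [ -n "$_cfg" ] && [ ! -d "$_cfg" ]; then
        return $OCF_ERR_INSTALLED
    fi

    return $OCF_SUCCESS
}
```

Returning OCF_ERR_INSTALLED rather than OCF_ERR_GENERIC tells the cluster manager that the problem is specific to this node, so the resource can still be tried elsewhere.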

Otherwise, a very fine job.

Cheers,

Dejan

> cheers,
> raoul

> #!/bin/sh
> #
> # Resource script for Postfix
> #
> # Description: Manages Postfix as an OCF resource in
> # a high-availability setup.
> #
> # Tested with postfix 2.5.5 on Debian 5.0.
> # Based on the mysql-proxy OCF resource agent.
> #
> # Author: Raoul Bhatia <r.bhatia[at]ipax.at> : Original Author
> # License: GNU General Public License (GPL)
> # Note: if you want to run multiple postfix instances, please see
> # http://amd.co.at/adminwiki/Postfix#Adding_a_Second_Postfix_Instance_on_one_Server
> # http://www.postfix.org/postconf.5.html
> #
> #
> # usage: $0 {start|stop|reload|status|monitor|validate-all|meta-data}
> #
> # The "start" arg starts a Postfix instance
> #
> # The "stop" arg stops it.
> #
> #
> # Test via
> # * /usr/sbin/ocf-tester -n post1 /usr/lib/ocf/resource.d/heartbeat/postfix
> # * /usr/sbin/ocf-tester -n post1 -o binary="/usr/sbin/postfix"
> # -o config_dir="" /usr/lib/ocf/resource.d/heartbeat/postfix
> # * /usr/sbin/ocf-tester -n post1 -o binary="/usr/sbin/postfix"
> # -o config_dir="/root/postfix/" /usr/lib/ocf/resource.d/heartbeat/postfix
> #
> #
> # OCF parameters:
> # OCF_RESKEY_binary
> # OCF_RESKEY_config_dir
> # OCF_RESKEY_parameters
> #
> ##########################################################################
>
> # Initialization:
>
> . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
>
> : ${OCF_RESKEY_binary="/usr/sbin/postfix"}
> : ${OCF_RESKEY_config_dir=""}
> : ${OCF_RESKEY_parameters=""}
> USAGE="Usage: $0 {start|stop|reload|status|monitor|validate-all|meta-data}";
>
> ##########################################################################
>
> usage() {
> echo $USAGE >&2
> }
>
> meta_data() {
> cat <<END
> <?xml version="1.0"?>
> <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
> <resource-agent name="postfix">
> <version>0.1</version>
> <longdesc lang="en">
> This script manages Postfix as an OCF resource in a high-availability setup.
> Tested with Postfix 2.5.5 on Debian 5.0.
> </longdesc>
> <shortdesc lang="en">OCF Resource Agent compliant Postfix script.</shortdesc>
>
> <parameters>
>
> <parameter name="binary" unique="0" required="0">
> <longdesc lang="en">
> Full path to the Postfix binary.
> For example, "/usr/sbin/postfix".
> </longdesc>
> <shortdesc lang="en">Full path to Postfix binary</shortdesc>
> <content type="string" default="/usr/sbin/postfix" />
> </parameter>
>
> <parameter name="config_dir" unique="1" required="0">
> <longdesc lang="en">
> Full path to a Postfix configuration directory.
> For example, "/etc/postfix".
> </longdesc>
> <shortdesc lang="en">Full path to configuration directory</shortdesc>
> <content type="string" default="" />
> </parameter>
>
> <parameter name="parameters" unique="0" required="0">
> <longdesc lang="en">
> The Postfix daemon may be called with additional parameters.
> Specify any of them here.
> </longdesc>
> <shortdesc lang="en"></shortdesc>
> <content type="string" default="" />
> </parameter>
>
> </parameters>
>
> <actions>
> <action name="start" timeout="90" />
> <action name="stop" timeout="100" />
> <action name="reload" timeout="100" />
> <action name="monitor" depth="10" timeout="20s" interval="60s" start-delay="0" />
> <action name="validate-all" timeout="30s" />
> <action name="meta-data" timeout="5s" />
> </actions>
> </resource-agent>
> END
> exit $OCF_SUCCESS
> }
>
> isRunning()
> {
> kill -0 "$1" 2>/dev/null
> }
>
> # running() has been copied from debian's init script. we enhanced it a bit
> running() {
> queue=$(postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null || echo /var/spool/postfix)
> if [ -f ${queue}/pid/master.pid ]; then
> pid=$(sed 's/ //g' ${queue}/pid/master.pid)
> # what directory does the executable live in. stupid prelink systems.
> dir=$(ls -l /proc/$pid/exe 2>/dev/null | sed 's/.* -> //; s/\/[^\/]*$//')
> if [ "X$dir" = "X/usr/lib/postfix" ]; then
> if isRunning $pid; then
> # @TODO why does "true" not work here?
> #true
> return 0
> fi
> fi
> fi
>
> # Postfix is not running
> false
> }
>
>
> postfix_status()
> {
> running
> }
>
> postfix_start()
> {
> # if Postfix is running return success
> if postfix_status; then
> ocf_log info "Postfix already running."
> exit $OCF_SUCCESS
> fi
>
> # start Postfix
> $binary $OPTIONS start >/dev/null 2>&1
> ret=$?
>
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix returned error." $ret
> exit $OCF_ERR_GENERIC
> fi
>
> exit $OCF_SUCCESS
> }
>
>
> postfix_stop()
> {
> if postfix_status; then
> $binary $OPTIONS stop >/dev/null 2>&1
> ret=$?
>
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix returned an error while stopping." $ret
> exit $OCF_ERR_GENERIC
> fi
>
> # grant some time for shutdown and recheck
> sleep 1
> if postfix_status; then
> ocf_log err "Postfix failed to stop."
> exit $OCF_ERR_GENERIC
> fi
> fi
>
> exit $OCF_SUCCESS
> }
>
> postfix_reload()
> {
> if postfix_status; then
> ocf_log info "Reloading Postfix."
> $binary $OPTIONS reload
> fi
> }
>
> postfix_monitor()
> {
> if postfix_status; then
> return $OCF_SUCCESS
> fi
>
> return $OCF_NOT_RUNNING
> }
>
> postfix_validate_all()
> {
> # check that the Postfix binary exists and can be executed
> if [ ! -x "$binary" ]; then
> ocf_log err "Postfix binary '$binary' does not exist or cannot be executed."
> return $OCF_ERR_GENERIC
> fi
>
> # check config_dir and alternate_config_directories parameter
> if [ "x$config_dir" != "x" ]; then
> if [ ! -d "$config_dir" ]; then
> ocf_log err "Postfix configuration directory '$config_dir' does not exist." $ret
> return $OCF_ERR_GENERIC
> fi
>
> alternate_config_directories=$(postconf -h alternate_config_directories 2>/dev/null | grep $config_dir)
> if [ "x$alternate_config_directories" = "x" ]; then
> ocf_log err "Postfix main configuration must contain correct 'alternate_config_directories' parameter."
> return $OCF_ERR_GENERIC
> fi
> fi
>
> # check spool/queue directory
> queue=$(postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null || echo /var/spool/postfix)
> if [ ! -d "$queue" ]; then
> ocf_log err "Postfix spool/queue directory '$queue' does not exist." $ret
> return $OCF_ERR_GENERIC
> fi
>
> # run postfix internal check
> $binary $OPTIONS check >/dev/null 2>&1
> ret=$?
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix check failed." $ret
> return $OCF_ERR_GENERIC
> fi
>
> return $OCF_SUCCESS
> }
>
> #
> # Main
> #
>
> if [ $# -ne 1 ]; then
> usage
> exit $OCF_ERR_ARGS
> fi
>
> binary=$OCF_RESKEY_binary
> config_dir=$OCF_RESKEY_config_dir
> parameters=$OCF_RESKEY_parameters
>
> # debugging stuff
> #echo OCF_RESKEY_binary=$OCF_RESKEY_binary >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
> #echo OCF_RESKEY_config_dir=$OCF_RESKEY_config_dir >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
> #echo OCF_RESKEY_parameters=$OCF_RESKEY_parameters >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
>
>
> # build postfix options string *outside* to access from each method
> OPTIONS=''
> OPTION_CONFIG_DIR=''
>
> # check if the Postfix config_dir exists
> if [ "x$config_dir" != "x" ]; then
> # save OPTION_CONFIG_DIR separately
> OPTION_CONFIG_DIR="-c $config_dir"
> OPTIONS=$OPTION_CONFIG_DIR
> fi
>
> if [ "x$parameters" != "x" ]; then
> OPTIONS="$OPTIONS $parameters"
> fi
>
> case $1 in
> meta-data) meta_data
> exit $OCF_SUCCESS
> ;;
>
> usage|help) usage
> exit $OCF_SUCCESS
> ;;
> esac
>
> postfix_validate_all
> ret=$?
>
> #echo "[$1:$ret]"
> LSB_STATUS_STOPPED=3
> if [ $ret -ne $OCF_SUCCESS ]; then
> case $1 in
> stop) exit $OCF_SUCCESS ;;
> monitor) exit $OCF_NOT_RUNNING;;
> status) exit $LSB_STATUS_STOPPED;;
> *) exit $ret;;
> esac
> fi
>
> case $1 in
> monitor) postfix_monitor
> exit $?
> ;;
> start) postfix_start
> ;;
>
> stop) postfix_stop
> ;;
>
> reload) postfix_reload
> ;;
>
> status) if postfix_status; then
> ocf_log info "Postfix is running."
> exit $OCF_SUCCESS
> else
> ocf_log info "Postfix is stopped."
> exit $OCF_NOT_RUNNING
> fi
> ;;
>
> monitor) postfix_monitor
> exit $?
> ;;
>
> validate-all) exit $OCF_SUCCESS
> ;;
>
> *) usage
> exit $OCF_ERR_UNIMPLEMENTED
> ;;
> esac



r.bhatia at ipax

May 28, 2009, 12:51 PM

Post #5 of 17 (1795 views)
Re: postfix ocf ra [In reply to]

Dejan Muhamedagic wrote:
> Hi,
>
> On Thu, May 28, 2009 at 08:33:15PM +0200, Raoul Bhatia [IPAX] wrote:
>> Raoul Bhatia [IPAX] wrote:
>>> hi,
>>>
>>> Raoul Bhatia [IPAX] wrote:
>>>> hi,
>>>>
>>>> please find my postfix ocf ra attached. it is possible to use it
>>>> with multiple postfix instances, if the administrator honors all
>>>> pre-requirements (queue_directory, data_directory,
>>>> alternate_config_directories)
>>> i found some problems with this ra when you move data/config_dir/queue
>>> to a separate partition that is also mounted by pacemaker.
>>>
>>> these are basically the same issues we encountered with the mysql ra,
>>> and therefore i used this ra as a basis for further improvements.
>>>
>>> i will resend the updated postfix ra within the next hour.
>> please find my current version attached.
>
> Darn. I knew I should've waited a while :)
oops ;) i currently use the "release fast, release often" paradigm.
i am too much of a perfectionist otherwise ;)

>> i also added a link to a page where i describe how to configure multiple
>> postfix instances on *one* server.
>
> That's great.
>
> Some comments (in a diff form):
>
> IsRunning()
> {
> + # Could the master process become zombie?
> kill -0 "$1" 2>/dev/null
> }

added in a more appropriate place. i do not know about that, though. i'm
just doing what the init script on debian 5.0 does.

> ...
> - dir=$(ls -l /proc/$pid/exe 2>/dev/null | sed 's/.* -> //; s/\/[^\/]*$//')
> - if [ "X$dir" = "X/usr/lib/postfix" ]; then
> + # Linux specific!
> + #dir=$(ls -l /proc/$pid/exe 2>/dev/null | sed 's/.* -> //; s/\/[^\/]*$//')
> + # Is this 'if' necessary? What if the
> + # installation/package is different?
> + #if [ "X$dir" = "X/usr/lib/postfix" ]; then

that was simply copy/paste. i removed this part now, as i do not see what
it is supposed to achieve at this position.

> Also, most exits in validate_all should be OCF_ERR_INSTALLED. I
> think.
please feel free to change these values. i am not sure where (not) to
use OCF_ERR_INSTALLED.

> Otherwise, a very fine job.
thanks. updated version enclosed.

have a nice weekend,
raoul
Attachments: postfix (8.30 KB)


lars.ellenberg at linbit

May 28, 2009, 1:43 PM

Post #6 of 17 (1795 views)
Re: postfix ocf ra [In reply to]

On Thu, May 28, 2009 at 08:33:15PM +0200, Raoul Bhatia [IPAX] wrote:

i have snipped the first part here; i simply skipped over it.

> isRunning()
> {
> kill -0 "$1" 2>/dev/null
> }


what is that supposed to prove?
it is only used after you checked for /proc/$pid/exe.
so $pid is an existing process, otherwise the (linux specific?) /proc/$pid/ would not exist.

whether or not it is "running" (and not STOPPED or ZOMBIE),
kill -0 won't tell you either.
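
To illustrate the difference, here is a rough sketch of a liveness check that also rejects zombies. It assumes a ps that accepts "-o stat= -p PID" (Linux procps and the BSDs do; this is not guaranteed everywhere), and the function name pid_is_live is invented for this example:

```sh
# pid_is_live PID
# Succeeds if PID exists and is not a zombie.
pid_is_live() {
    _pid=$1

    # kill -0 only tests that the PID exists and may be signalled.
    kill -0 "$_pid" 2>/dev/null || return 1

    # Ask ps for the process state; a leading Z marks a zombie.
    # If ps is unavailable the state stays empty and we fall back
    # to the kill -0 result.
    _state=$(ps -o stat= -p "$_pid" 2>/dev/null)
    case $_state in
        Z*) return 1 ;;
        *)  return 0 ;;
    esac
}
```

A stopped process (state T) still counts as live here, on the reasoning that it can be continued at any time.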

> # running() has been copied from debian's init script. we enhanced it a bit
> running() {
> queue=$(postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null || echo /var/spool/postfix)

if postconf -h queue_directory does not work, this is a broken
installation and should IMO not provide any other "default" value.
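
A minimal sketch of that behaviour, without the fallback to /var/spool/postfix. The POSTCONF variable and the function name queue_dir_or_fail are assumptions made so the example can stand alone; OPTION_CONFIG_DIR is the variable the agent already builds:

```sh
OCF_ERR_INSTALLED=5          # normally provided by .ocf-shellfuncs
: ${POSTCONF=postconf}       # hypothetical override, handy for testing

# Prints the queue directory, or fails with OCF_ERR_INSTALLED if
# postconf cannot report one. No default value is substituted.
queue_dir_or_fail() {
    _q=$($POSTCONF $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null)
    if [ $? -ne 0 ] || [ -z "$_q" ]; then
        return $OCF_ERR_INSTALLED
    fi
    echo "$_q"
}

# Example:
#   queue=$(queue_dir_or_fail) || exit $?
```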

> if [ -f ${queue}/pid/master.pid ]; then

just because a pid file does not exist, it is not running?
that is not exactly true.

> pid=$(sed 's/ //g' ${queue}/pid/master.pid)

if you just use $pid (not "$pid"), shell did the whitespace stripping
for you, so this is unnecessary.
if you trust the input from that file (I think you should), just use $pid.
if you don't, you have another problem a cluster manager will not
protect you against.
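
For illustration, a small sketch of the word-splitting behaviour described above, assuming a pid file that holds a single number padded with blanks. The helper name read_pid is made up for this example:

```sh
# read_pid FILE
# Prints the PID stored in FILE with surrounding blanks removed.
read_pid() {
    [ -r "$1" ] || return 1
    _raw=$(cat "$1")

    # Unquoted expansion: the shell splits on whitespace and drops
    # the padding, so no sed is needed.
    set -- $_raw

    # Expect exactly one word in the file.
    [ $# -eq 1 ] || return 1
    echo "$1"
}
```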

> # what directory does the executable live in. stupid prelink systems.
> dir=$(ls -l /proc/$pid/exe 2>/dev/null | sed 's/.* -> //; s/\/[^\/]*$//')
> if [ "X$dir" = "X/usr/lib/postfix" ]; then

what is that supposed to do?
what does it guard against?
stale pidfile pointing to other process with recycled pid?
I don't like that.

and it is not even correct, as you can reconfigure "daemon_directory",
so if you want to do that, you need to postconf -h daemon_directory
first.

> if isRunning $pid; then

see above, kill -0 not necessary, as you just used /proc/pid/ !

> # @TODO why does "true" not work here?

because if you do not return here, you reach the "false" below,
which overrides this true ;)

> #true
> return 0
> fi
> fi
> fi
>
> # Postfix is not running
> false
> }
>
>
> postfix_status()
> {
> running
> }

maybe just do "postqueue -q 2>&1 | head -n1" and see if it reads
"postqueue: warning: Mail system is down -- accessing queue directly"
?
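
A rough sketch of how such a check could be wired up. The function only inspects the first line it reads on stdin, so it can be tried without Postfix; which postqueue option to feed it from is an assumption here (the usage comment uses the queue listing mode, -p), and the name mail_system_down is invented:

```sh
# mail_system_down
# Reads command output on stdin and succeeds if the first line is the
# "Mail system is down" warning.
mail_system_down() {
    _first=
    IFS= read -r _first || :
    case $_first in
        *"Mail system is down"*) return 0 ;;
        *)                       return 1 ;;
    esac
}

# Possible use in a status function (option choice is an assumption):
#   if postqueue -p 2>&1 | mail_system_down; then
#       return $OCF_NOT_RUNNING
#   fi
```

Matching on a message string is fragile across Postfix versions and locales, so this is probably best combined with another check rather than used alone.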

> postfix_start()
> {
> # if Postfix is running return success
> if postfix_status; then
> ocf_log info "Postfix already running."
> exit $OCF_SUCCESS
> fi
>
> # start Postfix
> $binary $OPTIONS start >/dev/null 2>&1
> ret=$?
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix returned error." $ret
> exit $OCF_ERR_GENERIC
> fi
>
> exit $OCF_SUCCESS
> }
>
>
> postfix_stop()
> {
> if postfix_status; then

nope.
just stop it.
always.
do not do status first.

because, crm (or is it only lrm?) will do that for you anyways.

and because postfix stop is idempotent anyways.

as you have it now,
if someone does rm master.pid while postfix is running,
postfix_status says false,
monitor will catch a failure, may attempt a restart on this node
so start is attempted, which fails,
triggers recovery action in crm for failed start, which -- correct me
if I'm wrong -- would try to stop it here, just in case, which would be
mapped to no-op because of the missing pidfile.

occasionally it would do a monitor again, which would tell it "not running",
and because of the failed start (or failed monitor) on this node, it is
then started on another node. depending on whether or not it is bound
to a specific IP, and whether or not that IP is controlled by CRM, this
may result in multiple instances running.
which, again depending on the configuration, may be a very bad thing,
or completely harmless.


> $binary $OPTIONS stop >/dev/null 2>&1
> ret=$?
>
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix returned an error while stopping." $ret
> exit $OCF_ERR_GENERIC
> fi
>
> # grant some time for shutdown and recheck
> sleep 1
> if postfix_status; then
> ocf_log err "Postfix failed to stop."
> exit $OCF_ERR_GENERIC
> fi

if that should be necessary, you could escalate to "postfix abort", btw.
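
Taken together, the stop-related suggestions could look roughly like this. It is only a sketch: it assumes the agent's existing $binary, $OPTIONS and postfix_status, returns instead of exiting, and introduces a hypothetical STOP_TRIES knob for the wait time. The exit code of "postfix stop" is deliberately ignored and the result is verified afterwards instead:

```sh
# Stand-ins for values that .ocf-shellfuncs normally provides.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
: ${STOP_TRIES=5}    # hypothetical: seconds to wait before escalating

postfix_stop_sketch() {
    # No status check first: always ask Postfix to stop.
    $binary $OPTIONS stop >/dev/null 2>&1

    # Wait for the master to go away.
    _n=$STOP_TRIES
    while [ "$_n" -gt 0 ]; do
        postfix_status || return $OCF_SUCCESS
        sleep 1
        _n=$((_n - 1))
    done

    # Still running: escalate to "postfix abort" and check once more.
    $binary $OPTIONS abort >/dev/null 2>&1
    sleep 1
    if postfix_status; then
        return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}
```

Note that this only helps if postfix_status itself does not depend on the pid file alone; otherwise a removed master.pid still makes a running instance look stopped.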


> fi
>
> exit $OCF_SUCCESS
> }
>
> postfix_reload()
> {
> if postfix_status; then
> ocf_log info "Reloading Postfix."
> $binary $OPTIONS reload
> fi

again: why do the "if running"?
crm should not even try to "reload" a "supposedly stopped" resource.
so it is expected to be running, if reload fails because it is actually
NOT running, then that is even better as it probably will trigger a
stop/start instead, which should clean it all up.

> }
>
> postfix_monitor()
> {
> if postfix_status; then
> return $OCF_SUCCESS
> fi
>
> return $OCF_NOT_RUNNING
> }
>
> postfix_validate_all()
> {
> # check that the Postfix binary exists and can be executed
> if [ ! -x "$binary" ]; then
> ocf_log err "Postfix binary '$binary' does not exist or cannot be executed."
> return $OCF_ERR_GENERIC
> fi
>
> # check config_dir and alternate_config_directories parameter
> if [ "x$config_dir" != "x" ]; then
> if [ ! -d "$config_dir" ]; then
> ocf_log err "Postfix configuration directory '$config_dir' does not exist." $ret
> return $OCF_ERR_GENERIC
> fi
>
> alternate_config_directories=$(postconf -h alternate_config_directories 2>/dev/null | grep $config_dir)
> if [ "x$alternate_config_directories" = "x" ]; then
> ocf_log err "Postfix main configuration must contain correct 'alternate_config_directories' parameter."
> return $OCF_ERR_GENERIC
> fi
> fi

Dejan already complained about all the GENERIC here; INSTALLED or
CONFIGURED is probably more appropriate.

> # check spool/queue directory
> queue=$(postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null || echo /var/spool/postfix)
> if [ ! -d "$queue" ]; then
> ocf_log err "Postfix spool/queue directory '$queue' does not exist." $ret
> return $OCF_ERR_GENERIC
> fi
>
> # run postfix internal check
> $binary $OPTIONS check >/dev/null 2>&1
> ret=$?
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix check failed." $ret
> return $OCF_ERR_GENERIC
> fi
>
> return $OCF_SUCCESS
> }
>
> #
> # Main
> #
>
> if [ $# -ne 1 ]; then
> usage
> exit $OCF_ERR_ARGS
> fi
>
> binary=$OCF_RESKEY_binary
> config_dir=$OCF_RESKEY_config_dir
> parameters=$OCF_RESKEY_parameters
>
> # debugging stuff
> #echo OCF_RESKEY_binary=$OCF_RESKEY_binary >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
> #echo OCF_RESKEY_config_dir=$OCF_RESKEY_config_dir >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
> #echo OCF_RESKEY_parameters=$OCF_RESKEY_parameters >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
>
>
> # build postfix options string *outside* to access from each method
> OPTIONS=''
> OPTION_CONFIG_DIR=''
>
> # check if the Postfix config_dir exists
> if [ "x$config_dir" != "x" ]; then
> # save OPTION_CONFIG_DIR separately
> OPTION_CONFIG_DIR="-c $config_dir"
> OPTIONS=$OPTION_CONFIG_DIR
> fi
>
> if [ "x$parameters" != "x" ]; then
> OPTIONS="$OPTIONS $parameters"
> fi
>
> case $1 in
> meta-data) meta_data
> exit $OCF_SUCCESS
> ;;
>
> usage|help) usage
> exit $OCF_SUCCESS
> ;;
> esac
>
> postfix_validate_all
> ret=$?
>
> #echo "[$1:$ret]"
> LSB_STATUS_STOPPED=3
> if [ $ret -ne $OCF_SUCCESS ]; then
> case $1 in
> stop) exit $OCF_SUCCESS ;;
> monitor) exit $OCF_NOT_RUNNING;;
> status) exit $LSB_STATUS_STOPPED;;
> *) exit $ret;;
> esac
> fi
>
> case $1 in
> monitor) postfix_monitor
> exit $?
> ;;
> start) postfix_start

inconsistent.
why hide the exit in the sub function for some functions, but not others?

> ;;
>
> stop) postfix_stop
> ;;
>
> reload) postfix_reload
> ;;
>
> status) if postfix_status; then
> ocf_log info "Postfix is running."
> exit $OCF_SUCCESS
> else
> ocf_log info "Postfix is stopped."
> exit $OCF_NOT_RUNNING
> fi
> ;;
>
> monitor) postfix_monitor
> exit $?
> ;;

oops. double monitor.

> validate-all) exit $OCF_SUCCESS
> ;;
>
> *) usage
> exit $OCF_ERR_UNIMPLEMENTED
> ;;
> esac
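
One way to address both the inconsistent exits and the duplicated monitor branch is a single dispatch function with a single exit point. A sketch only, assuming every postfix_* function is changed to return its OCF code instead of calling exit; the name run_action is invented:

```sh
OCF_ERR_UNIMPLEMENTED=3   # normally provided by .ocf-shellfuncs

# run_action ACTION
# Calls the matching function and passes its return code through.
run_action() {
    case $1 in
        start)        postfix_start ;;
        stop)         postfix_stop ;;
        reload)       postfix_reload ;;
        monitor)      postfix_monitor ;;
        validate-all) postfix_validate_all ;;
        *)            usage
                      return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}

# Single exit point at the bottom of the agent:
#   run_action "$1"
#   exit $?
```

With each action listed exactly once, a second monitor entry like the one above cannot hide unnoticed.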

Still with me?
Thanks for the effort!

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


t.d.lee at durham

May 29, 2009, 1:38 AM

Post #7 of 17 (1779 views)
Re: postfix ocf ra [In reply to]

On Thu, 28 May 2009, Dejan Muhamedagic wrote:

> On Thu, May 28, 2009 at 08:33:15PM +0200, Raoul Bhatia [IPAX] wrote:
>> [...]
>> please find my current version attached.

Just a couple more comments, relating to portability.

>> ____________________________________________________________________
>
>> #!/bin/sh
>> [...]

The script later contains some features that are specific to "bash".
So that header should instead read: "#!/bin/bash".

>> ##########################################################################
>>
>> # Initialization:
>>
>> . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
>>
>> : ${OCF_RESKEY_binary="/usr/sbin/postfix"}

Some UN*X systems might have 'postfix' at other locations. So the
"/usr/sbin/postfix" should be a shell variable.

In general the technique is to use 'configure' to determine the pathname
and set it in a variable, then write lines such as the above using that
variable. Basically, whenever we (as linux-ha developers) find
ourselves typing a pathname, we should use the configure method to feed it
through a shell variable.

(That fixed pathname and a couple of others are also used later in the
script; all such instances ought to use configure-derived variables.)

>> running() {
>> queue=$(postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null || echo /var/spool/postfix)
>> if [ -f ${queue}/pid/master.pid ]; then
>> pid=$(sed 's/ //g' ${queue}/pid/master.pid)
>> # what directory does the executable live in. stupid prelink systems.
>> dir=$(ls -l /proc/$pid/exe 2>/dev/null | sed 's/.* -> //; s/\/[^\/]*$//')
>> if [ "X$dir" = "X/usr/lib/postfix" ]; then
>> [...]

The above contains two non-Bourne features of 'bash' (e.g. "queue=$(...)")
hence my first comment about changing to "#!/bin/bash".

The "/proc/$pid/exe" is Linux-specific. (But if a script is known to be
Linux-specific (i.e. the script itself will not be valid on (say) *BSD,
Darwin, Solaris, etc.) then such things are probably OK.)

The "/usr/lib/postfix" should come through configure-derived variables.


But don't let that discourage you! All the best.


--

: David Lee I.T. Service :
: Senior Systems Programmer Computer Centre :
: UNIX Team Leader Durham University :
: South Road :
: http://www.dur.ac.uk/t.d.lee/ Durham DH1 3LE :
: Phone: +44 191 334 2752 U.K. :


dejanmm at fastmail

May 29, 2009, 2:29 AM

Post #8 of 17 (1786 views)
Re: postfix ocf ra [In reply to]

Hi David,

On Fri, May 29, 2009 at 09:38:19AM +0100, David Lee wrote:
> On Thu, 28 May 2009, Dejan Muhamedagic wrote:
>
> > On Thu, May 28, 2009 at 08:33:15PM +0200, Raoul Bhatia [IPAX] wrote:
> >> [...]
> >> please find my current version attached.
>
> Just a couple more comments, relating to portability.
>
> >> ____________________________________________________________________
> >
> >> #!/bin/sh
> >> [...]
>
> The script later contains some features that are specific to "bash".
> So that header should instead read: "#!/bin/bash".
>
> >> ##########################################################################
> >>
> >> # Initialization:
> >>
> >> . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
> >>
> >> : ${OCF_RESKEY_binary="/usr/sbin/postfix"}
>
> Some UN*X systems might have 'postfix' at other locations. So the
> "/usr/sbin/postfix" should be a shell variable.

You're right in general, but I think not so in this particular
case. postfix is out of scope of our project and this path is
just a guess (though probably good for the vast majority). Also,
it is only the default, so people having different installations
may set the binary attribute.
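
For readers unfamiliar with the idiom, a short sketch of how that default works. The ": ${VAR=value}" form assigns only when the variable is unset, so a value passed in by the cluster manager wins; the /opt path below is purely hypothetical:

```sh
# Case 1: the attribute was not configured, so the default applies.
unset OCF_RESKEY_binary
: ${OCF_RESKEY_binary="/usr/sbin/postfix"}
echo "$OCF_RESKEY_binary"      # /usr/sbin/postfix

# Case 2: the attribute was set (hypothetical path), so it is kept.
OCF_RESKEY_binary=/opt/postfix/sbin/postfix
: ${OCF_RESKEY_binary="/usr/sbin/postfix"}
echo "$OCF_RESKEY_binary"      # /opt/postfix/sbin/postfix
```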

> In general the technique is to use 'configure' to determine the pathname
> and set it in a variable, then write lines such as the above using that
> variable. Basically, whenever we (as linux-ha developers) find
> ourselves typing a pathname, we should use the configure method to feed it
> through a shell variable.
>
> (That fixed pathname and a couple of others are also used later in the
> script; all such instances ought to use configure-derived variables.)
>
> >> running() {
> >> queue=$(postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null || echo /var/spool/postfix)
> >> if [ -f ${queue}/pid/master.pid ]; then
> >> pid=$(sed 's/ //g' ${queue}/pid/master.pid)
> >> # what directory does the executable live in. stupid prelink systems.
> >> dir=$(ls -l /proc/$pid/exe 2>/dev/null | sed 's/.* -> //; s/\/[^\/]*$//')
> >> if [ "X$dir" = "X/usr/lib/postfix" ]; then
> >> [...]
>
> The above contains two non-Bourne features of 'bash' (e.g. "queue=$(...)")
> hence my first comment about changing to "#!/bin/bash".

OK. Didn't know that Bourne shell doesn't support $(...). I think
I may have that in some scripts too. In this case it's easy to
replace it with `...`.
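
A small side-by-side sketch of the two forms, using a fixed example path rather than anything from the agent. The only real difference shows up when substitutions are nested, where the inner backquotes have to be escaped:

```sh
path=/usr/lib/postfix/master

# $(...) form: nests without extra escaping.
dir_posix=$(dirname "$(echo "$path")")

# Backquote form: inner backquotes must be escaped.
dir_bourne=`dirname \`echo $path\``

# Both print /usr/lib/postfix
echo "$dir_posix"
echo "$dir_bourne"
```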

> The "/proc/$pid/exe" is Linux-specific. (But if a script is known to be
> Linux-specific (i.e. the script itself will not be valid on (say) *BSD,
> Darwin, Solaris, etc.) then such things are probably OK.)
>
> The "/usr/lib/postfix" should come through configure-derived variables.

Right. But those parts are removed anyway.

> But don't let that discourage you! All the best.

Cheers,

Dejan



dejanmm at fastmail

May 29, 2009, 2:58 AM

Post #9 of 17 (1783 views)
Re: postfix ocf ra [In reply to]

Hi Lars,

On Thu, May 28, 2009 at 10:43:37PM +0200, Lars Ellenberg wrote:
> On Thu, May 28, 2009 at 08:33:15PM +0200, Raoul Bhatia [IPAX] wrote:
>
> the first part snipped here has just been skipped by me.
>
> > isRunning()
> > {
> > kill -0 "$1" 2>/dev/null
> > }
>
>
> what is that supposed to prove?
> it is only used after you checked for /proc/$pid/exe.
> so $pid is an existing process, otherwise the (linux specific?) /proc/$pid/ would not exist.
>
> whether or not it is "running" (and not STOPPED or ZOMBIE),
> kill -0 won't tell you either.

Right. Though in case of STOPPED we may still consider the
resource running, since it may CONTinue at any time.

> > # running() has been copied from debian's init script. we enhanced it a bit
> > running() {
> > queue=$(postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null || echo /var/spool/postfix)
>
> if postconf -h queue_directory does not work, this is a broken
> installation and should IMO not provide any other "default" value.
>
> > if [ -f ${queue}/pid/master.pid ]; then
>
> just because a pid file does not exist, it is not running?
> that is not exactly true.

Well spotted.

> > pid=$(sed 's/ //g' ${queue}/pid/master.pid)
>
> if you just use $pid (not "$pid"), shell did the whitespace stripping
> for you, so this is unnecessary.
> if you trust the input from that file (I think you should), just use $pid.
> if you don't, you have another problem a cluster manager will not
> protect you against.
>
> > # what directory does the executable live in. stupid prelink systems.
> > dir=$(ls -l /proc/$pid/exe 2>/dev/null | sed 's/.* -> //; s/\/[^\/]*$//')
> > if [ "X$dir" = "X/usr/lib/postfix" ]; then
>
> what is that supposed to do?
> what does it guard against?
> stale pidfile pointing to other process with recycled pid?
> I don't like that.

That part flies anyway.

[...]

> > postfix_status()
> > {
> > running
> > }
>
> maybe just do "postqueue -q 2>&1 | head -n1" and see if it reads
> "postqueue: warning: Mail system is down -- accessing queue directly"
> ?

This sounds like a very good idea.
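
A minimal sketch of how such a check might look. The helper names are
made up, and it assumes that "postqueue -p" (list the queue) is the
intended invocation and that postqueue is in PATH; the warning text is
the one quoted above:

```shell
#!/bin/sh
# mail_system_is_down LINE: hypothetical helper. LINE is the first line
# printed by postqueue; succeeds if it carries the "down" warning.
mail_system_is_down() {
    case "$1" in
        *"Mail system is down"*) return 0 ;;
        *) return 1 ;;
    esac
}

# postfix_status_by_queue: sketch of a status check built on top of it.
# Returns 0 if the mail system looks up, 1 if postqueue says it is down.
# Caveat: if postqueue itself is missing, the error text does not match
# the warning and this would wrongly report "up".
postfix_status_by_queue() {
    first_line=`postqueue -p 2>&1 | head -n 1`
    if mail_system_is_down "$first_line"; then
        return 1
    fi
    return 0
}
```

Matching on message text is fragile across versions and locales, so this
is probably better as one check among several than as the only one.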

> > postfix_start()
> > {
> > # if Postfix is running return success
> > if postfix_status; then
> > ocf_log info "Postfix already running."
> > exit $OCF_SUCCESS
> > fi
> >
> > # start Postfix
> > $binary $OPTIONS start >/dev/null 2>&1
> > ret=$?
> > if [ $ret -ne 0 ]; then
> > ocf_log err "Postfix returned error." $ret
> > exit $OCF_ERR_GENERIC
> > fi
> >
> > exit $OCF_SUCCESS
> > }
> >
> >
> > postfix_stop()
> > {
> > if postfix_status; then
>
> nope.
> just stop it.
> always.
> do not do status first.
>
> because, crm (or is it only lrm?) will do that for you anyways.
>
> and because postfix stop is idempotent anyways.
>
> as you have it now,
> if someone does rm master.pid while postfix is running,
> postfix_status says false,
> monitor will catch a failure, may attempt a restart on this node
> so start is attempted, which fails,
>
> triggers recovery action in crm for failed start, which -- correct me
> if I'm wrong -- would try to stop it here, just in case, which would be
> mapped to no-op because of the missing pidfile.
>
> occasionally it would do a monitor again, which would tell it "not running",
> and because of the failed start (or failed monitor) on this node, it is
> then started on another node. depending on whether or not it is bound
> to a specific IP, and whether or not that IP is controlled by CRM, this
> may result in multiple instances running.
> which, again depending on the configuration, may be a very bad thing,
> or completely harmless.

True, but the monitor action must be fixed to always tell the
truth. No harm in doing monitor before stop. If the RA logs the
case when the resource has already been stopped, perhaps that can
help sometimes with troubleshooting.
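
One way to get both, sketched with stand-ins so it runs without Postfix
(postfix_do_stop, FAKE_RUNNING and postfix_stop_sketch are made-up names;
the OCF values are normally provided by .ocf-shellfuncs): log when monitor
claims the resource is already stopped, but issue the stop regardless.

```shell
#!/bin/sh
# Stubs so the sketch is self-contained; a real RA gets these from
# .ocf-shellfuncs and from its own functions.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
ocf_log() { echo "$@" >&2; }
postfix_status()  { [ "$FAKE_RUNNING" = yes ]; }   # stand-in for the monitor
postfix_do_stop() { FAKE_RUNNING=no; }             # stand-in for "$binary $OPTIONS stop"

postfix_stop_sketch() {
    if ! postfix_status; then
        # Only a log entry for troubleshooting, not a reason to skip the stop.
        ocf_log info "Postfix already stopped."
    fi
    # The stop is always issued, so a wrong monitor result cannot turn it
    # into a no-op.
    postfix_do_stop || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}
```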

> > $binary $OPTIONS stop >/dev/null 2>&1
> > ret=$?
> >
> > if [ $ret -ne 0 ]; then
> > ocf_log err "Postfix returned an error while stopping." $ret
> > exit $OCF_ERR_GENERIC
> > fi
> >
> > # grant some time for shutdown and recheck
> > sleep 1
> > if postfix_status; then
> > ocf_log err "Postfix failed to stop."
> > exit $OCF_ERR_GENERIC
> > fi
>
> if that should be necessary, you could escalate to "postfix abort", btw.

This is also a good point. I think that at times the spool
directory may reside on NFS. I wonder if "abort" would work in
that case too. I forgot to mention that it could be a good idea,
as is often done with start, to put monitor in a loop and wait
until the stop times out, in case there's a bit of housekeeping
to be done.
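
A rough sketch of such a wait loop. The status function here is a
stand-in driven by a marker file (FAKE_PIDFILE is a made-up variable),
so the example runs without Postfix:

```shell
#!/bin/sh
OCF_SUCCESS=0                  # normally set by .ocf-shellfuncs

# Stand-in for the RA's status check: "running" while the marker file exists.
postfix_status() {
    [ -e "$FAKE_PIDFILE" ]
}

# Poll until the status check reports "stopped". There is no retry counter
# on purpose: if the daemon never goes away, the cluster manager's stop
# timeout is what ends the wait.
wait_until_stopped() {
    while postfix_status; do
        sleep 1
    done
    return $OCF_SUCCESS
}
```

In the RA this would be called right after "$binary $OPTIONS stop".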

> > case $1 in
> > monitor) postfix_monitor
> > exit $?
> > ;;
> > start) postfix_start
>
> inconsistent.
> why hide the exit in the sub function for some functions, but not others?

Probably because monitor/status is used elsewhere.

[...]

> Still with me?

Hope so :)

> Thanks for the effort!

Cheers,

Dejan

>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


juventus8666 at sohu

Jun 4, 2009, 12:54 AM

Post #10 of 17 (1650 views)
Re: postfix ocf ra [In reply to]

I would like to ask, Integral strategy on the configuration of heartbeat


2009-06-04



juventus8666



From: Dejan Muhamedagic
Sent: 2009-05-29 17:58:43
To: High-Availability Linux Development List
Cc:
Subject: Re: [Linux-ha-dev] postfix ocf ra



r.bhatia at ipax

Jun 23, 2009, 4:33 AM

Post #11 of 17 (1432 views)
Re: postfix ocf ra [In reply to]

juventus8666 wrote:
> I would like to ask, Integral strategy on the configuration of heartbeat

can you please elaborate on what you mean by that?

cheers,
raoul
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia[at]ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office[at]ipax.at
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________


r.bhatia at ipax

Jun 23, 2009, 5:13 AM

Post #12 of 17 (1432 views)
Re: postfix ocf ra [In reply to]

hi,

i'm reworking my script right now. commenting inline.

Dejan Muhamedagic wrote:
>> the first part snipped here has just been skipped by me.
>>
>>> isRunning()
>>> {
>>> kill -0 "$1" 2>/dev/null
>>> }
>>
>> what is that supposed to prove?
>> it is only used after you checked for /proc/$pid/exe.
>> so $pid is an existing process, otherwise the (linux specific?) /proc/$pid/ would not exist.
>>
>> whether or not it is "running" (and not STOPPED or ZOMBIE),
>> kill -0 won't tell you either.
>
> Right. Though in case of STOPPED we may still consider the
> resource running, since it may CONTinue at any time.

i used this as kill -0/isRunning() is used in several other ocf RAs.
i'm leaving this for now.

>>> # running() has been copied from debian's init script. we enhanced it a bit
>>> running() {
>>> queue=$(postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null || echo /var/spool/postfix)
>> if postconf -h queue_directory does not work, this is a broken
>> installation and should IMO not provide any other "default" value.

i c/p this from the debian init script. what shall we do in this case?

>>> if [ -f ${queue}/pid/master.pid ]; then
>> just because a pid file does not exist, it is not running?
>> that is not exactly true.
>
> Well spotted.

any suggestions on what to do? i think it is also not valid to
check for running master/qmgr/etc. processes as there might be
several postfix instances on the same server (think one local
postfix for e.g. cron and one clustered one)

>>> pid=$(sed 's/ //g' ${queue}/pid/master.pid)
>> if you just use $pid (not "$pid"), shell did the whitespace stripping
>> for you, so this is unnecessary.
>> if you trust the input from that file (I think you should), just use $pid.
>> if you don't, you have another problem a cluster manager will not
>> protect you against.

point taken, thank you!

>>> # what directory does the executable live in. stupid prelink systems.
>>> dir=$(ls -l /proc/$pid/exe 2>/dev/null | sed 's/.* -> //; s/\/[^\/]*$//')
>>> if [ "X$dir" = "X/usr/lib/postfix" ]; then
>> what is that supposed to do?
>> what does it guard against?
>> stale pidfile pointing to other process with recycled pid?
>> I don't like that.
>
> That part flies anyway.

this has been removed.

>>> postfix_status()
>>> {
>>> running
>>> }
>> maybe just do "postqueue -q 2>&1 | head -n1" and see if it reads
>> "postqueue: warning: Mail system is down -- accessing queue directly"
>> ?
>
> This sounds like a very good idea.

mhm - shall the complete running() function be replaced then?
or shall this be an additional check?

i just (think i) verified that "rm master.pid" does not stop
postqueue -p from working.

>>> postfix_stop()
>>> {
>>> if postfix_status; then
>> nope.
>> just stop it.
>> always.
>> do not do status first.
>>
>> because, crm (or is it only lrm?) will do that for you anyways.
>>
>> and because postfix stop is idempotent anyways.
>>
>> as you have it now,
>> if someone does rm master.pid while postfix is running,
>> postfix_status says false,
>> monitor will catch a failure, may attempt a restart on this node
>> so start is attempted, which fails,
>>
>> triggers recovery action in crm for failed start, which -- correct me
>> if I'm wrong -- would try to stop it here, just in case, which would be
>> mapped to no-op because of the missing pidfile.
>>
>> occasionally it would do a monitor again, which would tell it "not running",
>> and because of the failed start (or failed monitor) on this node, it is
>> then started on another node. depending on whether or not it is bound
>> to a specific IP, and whether or not that IP is controlled by CRM, this
>> may result in multiple instances running.
>> which, again depending on the configuration, may be a very bad thing,
>> or completely harmless.
>
> True, but the monitor action must be fixed to always tell the
> truth. No harm in doing monitor before stop. If the RA logs the
> case when the resource has already been stopped, perhaps that can
> help sometimes with troubleshooting.

so, should it stay or should it go?

>>> $binary $OPTIONS stop >/dev/null 2>&1
>>> ret=$?
>>>
>>> if [ $ret -ne 0 ]; then
>>> ocf_log err "Postfix returned an error while stopping." $ret
>>> exit $OCF_ERR_GENERIC
>>> fi
>>>
>>> # grant some time for shutdown and recheck
>>> sleep 1
>>> if postfix_status; then
>>> ocf_log err "Postfix failed to stop."
>>> exit $OCF_ERR_GENERIC
>>> fi
>> if that should be necessary, you could escalate to "postfix abort", btw.
>
> This is also a good point. I think that at times the spool
> directory may reside on NFS. I wonder if "abort" would work in
> that case too. I forgot to mention that it could be a good idea,
> as is often done with start, to put monitor in a loop and wait
> until the stop times out, in case there's a bit of housekeeping
> to be done.

a lot of scripts limit the retries to say 20 times so that we're
not stuck in an infinite loop. i would prefer this method.
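
a sketch of what such a bounded wait could look like. the status function
is a stand-in (FAKE_POLLS_SEEN and FAKE_RUNNING_POLLS are made-up
variables), and the limit is passed as an argument rather than
hard-coded to 20:

```shell
#!/bin/sh
# Stand-in for the RA's status check: reports "running" for the first
# FAKE_RUNNING_POLLS calls and "stopped" afterwards.
FAKE_POLLS_SEEN=0
FAKE_RUNNING_POLLS=${FAKE_RUNNING_POLLS:-0}
postfix_status() {
    FAKE_POLLS_SEEN=`expr $FAKE_POLLS_SEEN + 1`
    [ "$FAKE_POLLS_SEEN" -le "$FAKE_RUNNING_POLLS" ]
}

# wait_stopped_bounded MAX: poll at most MAX times, one second apart.
# Returns 0 as soon as the status check says "stopped",
# 1 if all MAX polls still saw a running instance.
wait_stopped_bounded() {
    max=$1
    tries=0
    while [ "$tries" -lt "$max" ]; do
        postfix_status || return 0
        sleep 1
        tries=`expr $tries + 1`
    done
    return 1
}
```

a non-zero return would then be the point to escalate to "abort".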

anyways, i reworked this part and would like your feedback.
please wait until i finished my internal tests though.

>>> case $1 in
>>> monitor) postfix_monitor
>>> exit $?
>>> ;;
>>> start) postfix_start
>> inconsistent.
>> why hide the exit in the sub function for some functions, but not others?
>
> Probably because monitor/status is used elsewhere.

yes. i am reworking this part though

>> Thanks for the effort!

thank you too!

cheers,
raoul
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia[at]ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office[at]ipax.at
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________


r.bhatia at ipax

Jun 23, 2009, 5:57 AM

Post #13 of 17 (1430 views)
Re: postfix ocf ra [In reply to]

Raoul Bhatia [IPAX] wrote:
> i'm reworking my script right now. commenting inline.

i just finished updating the postfix ocf ra and am summarizing the
changes:

* isRunning() stays as this is also used in other ras
* i left running() as well (where i check the master.pid file)
but am ready to rewrite it to use "postqueue -p" or "postfix status"
in addition or exclusively - waiting for your feedback
* i removed $() bashism
* removed "pid=$(sed 's/ //g' ${queue}/pid/master.pid)"
* as of now, removed the postfix_monitor check on "stop"
* waiting 5 seconds for postfix shutdown, then escalating to "abort"
* removed exits inside the functions and replaced it with return.

did i miss something from your feedback?
do you have any further comments?

thanks,
raoul
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia[at]ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office[at]ipax.at
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________
Attachments: postfix (8.75 KB)


r.bhatia at ipax

Jun 30, 2009, 3:36 AM

Post #14 of 17 (1325 views)
Re: postfix ocf ra [In reply to]

*bump* ;)

Raoul Bhatia [IPAX] wrote:
> Raoul Bhatia [IPAX] wrote:
>> i'm reworking my script right now. commenting inline.
>
> i just finished updating the postfix ocf ra and am summarizing the
> changes:
>
> * isRunning() stays as this is also used in other ras
> * i left running() as well (where i check the master.pid file)
> but am ready to rewrite it to use "postqueue -p" or "postfix status"
> in addition or exclusively - waiting for your feedback
> * i removed $() bashism
> * removed "pid=$(sed 's/ //g' ${queue}/pid/master.pid)"
> * as of now, removed the postfix_monitor check on "stop"
> * waiting 5 seconds for postfix shutdown, then escalating to "abort"
> * removed exits inside the functions and replaced it with return.
>
> did i miss something from your feedback?
> do you have any further comments?
>
> thanks,
> raoul


--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia[at]ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office[at]ipax.at
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________


dejanmm at fastmail

Jul 2, 2009, 7:36 AM

Post #15 of 17 (1259 views)
Re: postfix ocf ra [In reply to]

Hi Raoul,

Sorry for the delay, somehow I missed the last two messages.

On Tue, Jun 23, 2009 at 02:57:52PM +0200, Raoul Bhatia [IPAX] wrote:
> Raoul Bhatia [IPAX] wrote:
> > i'm reworking my script right now. commenting inline.
>
> i just finished updating the postfix ocf ra and am summarizing the
> changes:
>
> * isRunning() stays as this is also used in other ras
> * i left running() as well (where i check the master.pid file)
> but am ready to rewrite it to use "postqueue -p" or "postfix status"
> in addition or exclusively - waiting for your feedback

In addition to testing for the pidfile, you could also check if
there's a process holding the spool directory, sth like:

rondo:~ # postconf -h queue_directory
/var/spool/postfix
rondo:~ # fuser /var/spool/postfix/
/var/spool/postfix: 5332c 5365c 8313c

Perhaps:

rondo:~ # fuser -v /var/spool/postfix/ 2>&1 | grep -w master
/var/spool/postfix: root 5332 ..c.. master
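
Wrapped into functions, that might look roughly like this. The function
names are made up, and the exact "fuser -v" output format is assumed to
match the sample above; it may differ between fuser implementations:

```shell
#!/bin/sh
# master_listed: reads "fuser -v" style output on stdin and succeeds if a
# process named "master" shows up in it.
# Caveat: a directory path that contains "master" as a separate word would
# also match.
master_listed() {
    grep -w master >/dev/null
}

# queue_held_by_master DIR: sketch of the additional check for the
# queue directory DIR.
queue_held_by_master() {
    fuser -v "$1" 2>&1 | master_listed
}
```

It would be called as queue_held_by_master "$queue", with $queue taken
from "postconf -h queue_directory".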

> * i removed $() bashism
> * removed "pid=$(sed 's/ //g' ${queue}/pid/master.pid)"
> * as of now, removed the postfix_monitor check on "stop"
> * waiting 5 seconds for postfix shutdown, then escalating to "abort"
> * removed exits inside the functions and replaced it with return.
>
> did i miss something from your feedback?
> do you have any further comments?

Lars said:

>> if postconf -h queue_directory does not work, this is a broken
>> installation and should IMO not provide any other "default"
>> value.

and I'd agree with this. It's really important that resources are
properly configured.
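
A sketch of a lookup without the fallback. get_queue_dir is a made-up
name, and returning OCF_ERR_INSTALLED is only one possible choice of
error code; the value 5 is what .ocf-shellfuncs normally provides:

```shell
#!/bin/sh
OCF_ERR_INSTALLED=5            # normally set by .ocf-shellfuncs

# get_queue_dir: print the queue_directory reported by postconf, or fail.
# There is deliberately no fallback to /var/spool/postfix.
# OPTION_CONFIG_DIR is left unquoted so that "-c DIR" splits into two words.
get_queue_dir() {
    queue=`postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null` \
        || return $OCF_ERR_INSTALLED
    [ -n "$queue" ] || return $OCF_ERR_INSTALLED
    echo "$queue"
}
```

Both running() and validate_all could then use the same function and
bail out early when it fails.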

Thanks,

Dejan

> thanks,
> raoul
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc. email. r.bhatia[at]ipax.at
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
> Barawitzkagasse 10/2/2/11 email. office[at]ipax.at
> 1190 Wien tel. +43 1 3670030
> FN 277995t HG Wien fax. +43 1 3670030 15
> ____________________________________________________________________

> #!/bin/sh
> #
> # Resource script for Postfix
> #
> # Description: Manages Postfix as an OCF resource in
> # a high-availability setup.
> #
> # Tested with postfix 2.5.5 on Debian 5.0.
> # Based on the mysql-proxy and mysql OCF resource agents.
> #
> # Author: Raoul Bhatia <r.bhatia[at]ipax.at> : Original Author
> # License: GNU General Public License (GPL)
> # Note: if you want to run multiple postfix instances, please see
> # http://amd.co.at/adminwiki/Postfix#Adding_a_Second_Postfix_Instance_on_one_Server
> # http://www.postfix.org/postconf.5.html
> #
> #
> # usage: $0 {start|stop|reload|status|monitor|validate-all|meta-data}
> #
> # The "start" arg starts a Postfix instance
> #
> # The "stop" arg stops it.
> #
> #
> # Test via
> # * /usr/sbin/ocf-tester -n post1 /usr/lib/ocf/resource.d/heartbeat/postfix
> # * /usr/sbin/ocf-tester -n post1 -o binary="/usr/sbin/postfix"
> # -o config_dir="" /usr/lib/ocf/resource.d/heartbeat/postfix
> # * /usr/sbin/ocf-tester -n post1 -o binary="/usr/sbin/postfix"
> # -o config_dir="/root/postfix/" /usr/lib/ocf/resource.d/heartbeat/postfix
> #
> #
> # OCF parameters:
> # OCF_RESKEY_binary
> # OCF_RESKEY_config_dir
> # OCF_RESKEY_parameters
> #
> ##########################################################################
>
> # Initialization:
>
> . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
>
> : ${OCF_RESKEY_binary="/usr/sbin/postfix"}
> : ${OCF_RESKEY_config_dir=""}
> : ${OCF_RESKEY_parameters=""}
> USAGE="Usage: $0 {start|stop|reload|status|monitor|validate-all|meta-data}";
>
> ##########################################################################
>
> usage() {
> echo $USAGE >&2
> }
>
> meta_data() {
> cat <<END
> <?xml version="1.0"?>
> <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
> <resource-agent name="postfix">
> <version>0.1</version>
> <longdesc lang="en">
> This script manages Postfix as an OCF resource in a high-availability setup.
> Tested with Postfix 2.5.5 on Debian 5.0.
> </longdesc>
> <shortdesc lang="en">OCF Resource Agent compliant Postfix script.</shortdesc>
>
> <parameters>
>
> <parameter name="binary" unique="0" required="0">
> <longdesc lang="en">
> Full path to the Postfix binary.
> For example, "/usr/sbin/postfix".
> </longdesc>
> <shortdesc lang="en">Full path to Postfix binary</shortdesc>
> <content type="string" default="/usr/sbin/postfix" />
> </parameter>
>
> <parameter name="config_dir" unique="1" required="0">
> <longdesc lang="en">
> Full path to a Postfix configuration directory.
> For example, "/etc/postfix".
> </longdesc>
> <shortdesc lang="en">Full path to configuration directory</shortdesc>
> <content type="string" default="" />
> </parameter>
>
> <parameter name="parameters" unique="0" required="0">
> <longdesc lang="en">
> The Postfix daemon may be called with additional parameters.
> Specify any of them here.
> </longdesc>
> <shortdesc lang="en"></shortdesc>
> <content type="string" default="" />
> </parameter>
>
> </parameters>
>
> <actions>
> <action name="start" timeout="90" />
> <action name="stop" timeout="100" />
> <action name="reload" timeout="100" />
> <action name="monitor" depth="10" timeout="20s" interval="60s" start-delay="0" />
> <action name="validate-all" timeout="30s" />
> <action name="meta-data" timeout="5s" />
> </actions>
> </resource-agent>
> END
> }
>
> isRunning()
> {
> kill -0 "$1" 2>/dev/null
> }
>
> # running() has been copied from debian's init script. we enhanced it a bit
> # @TODO rb 2009-06-23 maybe try "postqueue -p 2>&1 | head -n1 | grep 'Mail system is down' && false
> # @TODO rb 2009-06-23 maybe try "$binary $OPTIONS status" instead?
> running() {
> queue=`postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null || echo /var/spool/postfix`
> pid_dir=`postconf $OPTION_CONFIG_DIR -h process_id_directory 2>/dev/null`
> pidfile="${queue}/${pid_dir}/master.pid"
>
> if [ -f "${pidfile}" ]; then
> # @TODO Could the master process become zombie?
> pid=`cat ${pidfile}`
> if isRunning $pid; then
> # an explicit return is needed here: a bare "true" would fall
> # through to the "false" at the end of the function
> return $OCF_SUCCESS
> fi
> fi
>
> # Postfix is not running
> false
> }
>
>
> postfix_status()
> {
> running
> }
>
> postfix_start()
> {
> # if Postfix is running return success
> if postfix_status; then
> ocf_log info "Postfix already running."
> return $OCF_SUCCESS
> fi
>
> # start Postfix
> $binary $OPTIONS start >/dev/null 2>&1
> ret=$?
>
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix returned error." $ret
> return $OCF_ERR_GENERIC
> fi
>
> return $OCF_SUCCESS
> }
>
>
> postfix_stop()
> {
> $binary $OPTIONS stop >/dev/null 2>&1
> ret=$?
>
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix returned an error while stopping." $ret
> return $OCF_ERR_GENERIC
> fi
>
> # grant some time for shutdown and recheck 5 times
> for i in 1 2 3 4 5; do
> if postfix_status; then
> sleep 1
> fi
> done
>
> # escalate to abort if we did not stop by now
> # @TODO shall we loop here too?
> if postfix_status; then
> ocf_log err "Postfix failed to stop. Escalating to 'abort'"
>
> $binary $OPTIONS abort >/dev/null 2>&1; ret=$?
> sleep 5
> postfix_status && return $OCF_ERR_GENERIC
> fi
>
> return $OCF_SUCCESS
> }
>
> postfix_reload()
> {
> if postfix_status; then
> ocf_log info "Reloading Postfix."
> $binary $OPTIONS reload
> fi
> }
>
> postfix_monitor()
> {
> if postfix_status; then
> return $OCF_SUCCESS
> fi
>
> return $OCF_NOT_RUNNING
> }
>
> postfix_validate_all()
> {
> # check that the Postfix binary exists and can be executed
> if [ ! -x "$binary" ]; then
> ocf_log err "Postfix binary '$binary' does not exist or cannot be executed."
> return $OCF_ERR_GENERIC
> fi
>
> # check config_dir and alternate_config_directories parameter
> if [ "x$config_dir" != "x" ]; then
> if [ ! -d "$config_dir" ]; then
> ocf_log err "Postfix configuration directory '$config_dir' does not exist." $ret
> return $OCF_ERR_GENERIC
> fi
>
> alternate_config_directories=`postconf -h alternate_config_directories 2>/dev/null | grep $config_dir`
> if [ "x$alternate_config_directories" = "x" ]; then
> ocf_log err "Postfix main configuration must contain correct 'alternate_config_directories' parameter."
> return $OCF_ERR_GENERIC
> fi
> fi
>
> # check spool/queue directory
> queue=`postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null || echo /var/spool/postfix`
> if [ ! -d "$queue" ]; then
> ocf_log err "Postfix spool/queue directory '$queue' does not exist." $ret
> return $OCF_ERR_GENERIC
> fi
>
> # run postfix internal check
> $binary $OPTIONS check >/dev/null 2>&1
> ret=$?
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix 'check' failed." $ret
> return $OCF_ERR_GENERIC
> fi
>
> return $OCF_SUCCESS
> }
>
> #
> # Main
> #
>
> if [ $# -ne 1 ]; then
> usage
> exit $OCF_ERR_ARGS
> fi
>
> binary=$OCF_RESKEY_binary
> config_dir=$OCF_RESKEY_config_dir
> parameters=$OCF_RESKEY_parameters
>
> # debugging stuff
> #echo OCF_RESKEY_binary=$OCF_RESKEY_binary >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
> #echo OCF_RESKEY_config_dir=$OCF_RESKEY_config_dir >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
> #echo OCF_RESKEY_parameters=$OCF_RESKEY_parameters >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
>
>
> # build postfix options string *outside* to access from each method
> OPTIONS=''
> OPTION_CONFIG_DIR=''
>
> # check if the Postfix config_dir exists
> if [ "x$config_dir" != "x" ]; then
> # save OPTION_CONFIG_DIR separately
> OPTION_CONFIG_DIR="-c $config_dir"
> OPTIONS=$OPTION_CONFIG_DIR
> fi
>
> if [ "x$parameters" != "x" ]; then
> OPTIONS="$OPTIONS $parameters"
> fi
>
> case $1 in
> meta-data) meta_data
> exit $OCF_SUCCESS
> ;;
>
> usage|help) usage
> exit $OCF_SUCCESS
> ;;
> esac
>
> postfix_validate_all
> ret=$?
>
> #echo "debug[$1:$ret]"
> LSB_STATUS_STOPPED=3
> if [ $ret -ne $OCF_SUCCESS ]; then
> case $1 in
> stop) exit $OCF_SUCCESS ;;
> monitor) exit $OCF_NOT_RUNNING;;
> status) exit $LSB_STATUS_STOPPED;;
> *) exit $ret;;
> esac
> fi
>
> case $1 in
> monitor) postfix_monitor
> exit $?
> ;;
> start) postfix_start
> exit $?
> ;;
>
> stop) postfix_stop
> exit $?
> ;;
>
> reload) postfix_reload
> exit $?
> ;;
>
> status) if postfix_status; then
> ocf_log info "Postfix is running."
> exit $OCF_SUCCESS
> else
> ocf_log info "Postfix is stopped."
> exit $OCF_NOT_RUNNING
> fi
> ;;
>
> validate-all) exit $OCF_SUCCESS
> ;;
>
> *) usage
> exit $OCF_ERR_UNIMPLEMENTED
> ;;
> esac



r.bhatia at ipax

Jul 15, 2009, 6:42 AM

Post #16 of 17 (1079 views)
Re: postfix ocf ra [In reply to]

hi dejan,

sorry for the late reply, i've been on vacation and am still catching
up at work.

thanks for your feedback, please see below.

Dejan Muhamedagic wrote:
> Hi Raoul,
>
> Sorry for the delay, somehow I missed the last two messages.
>
> On Tue, Jun 23, 2009 at 02:57:52PM +0200, Raoul Bhatia [IPAX] wrote:
>> Raoul Bhatia [IPAX] wrote:
>>> i'm reworking my script right now. commenting inline.
>> i just finished updating the postfix ocf ra and am summarizing the
>> changes:
>>
>> * isRunning() stays as this is also used in other ras
>> * i left running() as well (where i check the master.pid file)
>> but am ready to rewrite it to use "postqueue -p" or "postfix status"
>> in addition or exclusively - waiting for your feedback
>
> In addition to testing for the pidfile, you could also check if
> there's a process holding the spool directory, sth like:
>
> rondo:~ # postconf -h queue_directory
> /var/spool/postfix
> rondo:~ # fuser /var/spool/postfix/
> /var/spool/postfix: 5332c 5365c 8313c
>
> Perhaps:
>
> rondo:~ # fuser -v /var/spool/postfix/ 2>&1 | grep -w master
> /var/spool/postfix: root 5332 ..c.. master

i'm now checking more in depth for:
1. empty queue_directory
2. pidfile
3. "postfix status"
4. postqueue ... | grep 'Mail system is down'
5. fuser -v $queue

is this ok? feel free to remove some checks

>> * i removed $() bashism
>> * removed "pid=$(sed 's/ //g' ${queue}/pid/master.pid)"
>> * as of now, removed the postfix_monitor check on "stop"
>> * waiting 5 seconds for postfix shutdown, then escalating to "abort"
>> * removed exits inside the functions and replaced it with return.
>>
>> did i miss something from your feedback?
>> do you have any further comments?
>
> Lars said:
>
>>> if postconf -h queue_directory does not work, this is a broken
>>> installation and should IMO not provide any other "default"
>>> value.
>
> and I'd agree with this. It's really important that resources are
> properly configured.

i'm catching this now but am not sure if i'm correctly handling this
case in "isRunning()". maybe checking this inside validate_all() is
good enough?

cheers,
raoul
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc. email. r.bhatia[at]ipax.at
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
Barawitzkagasse 10/2/2/11 email. office[at]ipax.at
1190 Wien tel. +43 1 3670030
FN 277995t HG Wien fax. +43 1 3670030 15
____________________________________________________________________
Attachments: postfix (9.26 KB)


dejanmm at fastmail

Jul 16, 2009, 2:45 AM

Post #17 of 17 (1054 views)
Permalink
Re: postfix ocf ra [In reply to]

Hi Raoul,

On Wed, Jul 15, 2009 at 03:42:16PM +0200, Raoul Bhatia [IPAX] wrote:
> hi dejan,
>
> sorry for the late reply, i've been on vacation and am still catching
> up at work.
>
> thanks for your feedback, please see below.
>
> Dejan Muhamedagic wrote:
> > Hi Raoul,
> >
> > Sorry for the delay, somehow I missed the last two messages.
> >
> > On Tue, Jun 23, 2009 at 02:57:52PM +0200, Raoul Bhatia [IPAX] wrote:
> >> Raoul Bhatia [IPAX] wrote:
> >>> i'm reworking my script right now. commenting inline.
> >> i just finished updating the postfix ocf ra and am summarizing the
> >> changes:
> >>
> >> * isRunning() stays as this is also used in other ras
> >> * i left running() as well (where i check the master.pid file)
> >> but am ready to rewrite it to use "postqueue -p" or "postfix status"
> >> in addition or exclusively - waiting for your feedback
> >
> > In addition to testing for the pidfile, you could also check if
> > there's a process holding the spool directory, sth like:
> >
> > rondo:~ # postconf -h queue_directory
> > /var/spool/postfix
> > rondo:~ # fuser /var/spool/postfix/
> > /var/spool/postfix: 5332c 5365c 8313c
> >
> > Perhaps:
> >
> > rondo:~ # fuser -v /var/spool/postfix/ 2>&1 | grep -w master
> > /var/spool/postfix: root 5332 ..c.. master
>
> i'm now checking more in depth for:
> 1. empty queue_directory

For monitor? Why?

> 2. pidfile
> 3. "postfix status"
> 4. postqueue ... | grep 'Mail system is down'
> 5. fuser -v $queue
>
> is this ok? feel free to remove some checks

It should be enough just to check for the process. First using
pidfile and if that doesn't work then with fuser.
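
A minimal sketch of that order of checks, for illustration only. The
function name master_running is hypothetical, and the pid file is assumed
to live at <queue_directory>/pid/master.pid (the default
process_id_directory). It also assumes the agent runs as root, so that
kill -0 can signal the master process.

```shell
#!/bin/sh
# Sketch: is a Postfix master process alive for the given queue directory?
# $1 = queue directory, e.g. the output of "postconf -h queue_directory".
master_running() {
    queue_dir=$1
    pidfile="$queue_dir/pid/master.pid"   # assumed default location

    # 1. pid file: master.pid holds the PID, possibly padded with blanks
    if [ -f "$pidfile" ]; then
        pid=`tr -d ' \n' < "$pidfile"`
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            return 0
        fi
    fi

    # 2. fallback: is a process called "master" holding the queue directory?
    if command -v fuser >/dev/null 2>&1; then
        fuser -v "$queue_dir" 2>&1 | grep -qw master && return 0
    fi

    return 1
}
```

In the agent this could be called as: master_running "$queue" from inside
running(), in place of the longer list of checks.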

> >> * i removed $() bashism
> >> * removed "pid=$(sed 's/ //g' ${queue}/pid/master.pid)"
> >> * as of now, removed the postfix_monitor check on "stop"
> >> * waiting 5 seconds for postfix shutdown, then escalating to "abort"
> >> * removed exits inside the functions and replaced it with return.
> >>
> >> did i miss something from your feedback?
> >> do you have any further comments?
> >
> > Lars said:
> >
> >>> if postconf -h queue_directory does not work, this is a broken
> >>> installation and should IMO not provide any other "default"
> >>> value.
> >
> > and I'd agree with this. It's really important that resources are
> > properly configured.
>
> i'm catching this now but am not sure if i'm correctly handling this
> case in "isRunning()".

Just check if that returns a valid directory?
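
Something along these lines might do, as a rough sketch. The helper name
valid_queue_dir is made up for this example, and the return code in the
usage comment simply mirrors what the posted script already uses.

```shell
#!/bin/sh
# Sketch: accept the postconf answer only if it is non-empty and names
# an existing directory.
# $1 = value printed by "postconf -h queue_directory" (may be empty).
valid_queue_dir() {
    [ -n "$1" ] && [ -d "$1" ]
}

# Possible use inside the agent (variable names as in the posted script):
#   queue=`postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null`
#   valid_queue_dir "$queue" || return $OCF_ERR_GENERIC
```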

> maybe checking this inside validate_all() is
> good enough?

Not sure. Your preference :)

Thanks,

Dejan

> cheers,
> raoul
> --
> ____________________________________________________________________
> DI (FH) Raoul Bhatia M.Sc. email. r.bhatia[at]ipax.at
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OEG web. http://www.ipax.at
> Barawitzkagasse 10/2/2/11 email. office[at]ipax.at
> 1190 Wien tel. +43 1 3670030
> FN 277995t HG Wien fax. +43 1 3670030 15
> ____________________________________________________________________

> #!/bin/sh
> #
> # Resource script for Postfix
> #
> # Description: Manages Postfix as an OCF resource in
> # a high-availability setup.
> #
> # Tested with postfix 2.5.5 on Debian 5.0.
> # Based on the mysql-proxy and mysql OCF resource agents.
> #
> # Author: Raoul Bhatia <r.bhatia[at]ipax.at> : Original Author
> # License: GNU General Public License (GPL)
> # Note: if you want to run multiple postfix instances, please see
> # http://amd.co.at/adminwiki/Postfix#Adding_a_Second_Postfix_Instance_on_one_Server
> # http://www.postfix.org/postconf.5.html
> #
> #
> # usage: $0 {start|stop|reload|status|monitor|validate-all|meta-data}
> #
> # The "start" arg starts a Postfix instance
> #
> # The "stop" arg stops it.
> #
> #
> # Test via
> # * /usr/sbin/ocf-tester -n post1 /usr/lib/ocf/resource.d/heartbeat/postfix
> # * /usr/sbin/ocf-tester -n post1 -o binary="/usr/sbin/postfix"
> # -o config_dir="" /usr/lib/ocf/resource.d/heartbeat/postfix
> # * /usr/sbin/ocf-tester -n post1 -o binary="/usr/sbin/postfix"
> # -o config_dir="/root/postfix/" /usr/lib/ocf/resource.d/heartbeat/postfix
> #
> #
> # OCF parameters:
> # OCF_RESKEY_binary
> # OCF_RESKEY_config_dir
> # OCF_RESKEY_parameters
> #
> ##########################################################################
>
> # Initialization:
>
> . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
>
> : ${OCF_RESKEY_binary="/usr/sbin/postfix"}
> : ${OCF_RESKEY_config_dir=""}
> : ${OCF_RESKEY_parameters=""}
> USAGE="Usage: $0 {start|stop|reload|status|monitor|validate-all|meta-data}";
>
> ##########################################################################
>
> usage() {
> echo $USAGE >&2
> }
>
> meta_data() {
> cat <<END
> <?xml version="1.0"?>
> <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
> <resource-agent name="postfix">
> <version>0.1</version>
> <longdesc lang="en">
> This script manages Postfix as an OCF resource in a high-availability setup.
> Tested with Postfix 2.5.5 on Debian 5.0.
> </longdesc>
> <shortdesc lang="en">OCF Resource Agent compliant Postfix script.</shortdesc>
>
> <parameters>
>
> <parameter name="binary" unique="0" required="0">
> <longdesc lang="en">
> Full path to the Postfix binary.
> For example, "/usr/sbin/postfix".
> </longdesc>
> <shortdesc lang="en">Full path to Postfix binary</shortdesc>
> <content type="string" default="/usr/sbin/postfix" />
> </parameter>
>
> <parameter name="config_dir" unique="1" required="0">
> <longdesc lang="en">
> Full path to a Postfix configuration directory.
> For example, "/etc/postfix".
> </longdesc>
> <shortdesc lang="en">Full path to configuration directory</shortdesc>
> <content type="string" default="" />
> </parameter>
>
> <parameter name="parameters" unique="0" required="0">
> <longdesc lang="en">
> The Postfix daemon may be called with additional parameters.
> Specify any of them here.
> </longdesc>
> <shortdesc lang="en"></shortdesc>
> <content type="string" default="" />
> </parameter>
>
> </parameters>
>
> <actions>
> <action name="start" timeout="90" />
> <action name="stop" timeout="100" />
> <action name="reload" timeout="100" />
> <action name="monitor" depth="10" timeout="20s" interval="60s" start-delay="0" />
> <action name="validate-all" timeout="30s" />
> <action name="meta-data" timeout="5s" />
> </actions>
> </resource-agent>
> END
> }
>
> isRunning()
> {
> kill -0 "$1" 2>/dev/null
> }
>
> # running() has been copied from debian's init script. we enhanced it a bit
> # @TODO rb 2009-06-23 maybe try "postqueue -p 2>&1 | head -n1 | grep 'Mail system is down' && false
> # @TODO rb 2009-06-23 maybe try "$binary $OPTIONS status" instead?
> running() {
> # look up the queue directory first; the pid file path is built from it
> queue=`postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null`
> [ -z "$queue" ] && return 1 # no queue_directory configured: report "not running" @TODO or $OCF_ERR_something?
> pid_dir=`postconf $OPTION_CONFIG_DIR -h process_id_directory 2>/dev/null`
> pidfile="${queue}/${pid_dir}/master.pid"
>
> if [ -f "${pidfile}" ]; then
> # @TODO Could the master process become zombie?
> pid=`cat ${pidfile}`
> if isRunning $pid; then
> # a bare "true" does not leave the function; execution would fall
> # through to the checks below, so return explicitly
> return $OCF_SUCCESS
> fi
> fi
>
> # try some different methods to see if we can find a running postfix/master instance
> # postfix status
> $binary $OPTION_CONFIG_DIR status && return $OCF_SUCCESS
>
> # what does postqueue say?
> postqueue $OPTION_CONFIG_DIR -p 2>&1 | head -n1 | grep -q 'Mail system is down' && return 1
>
> # is there a master process holding the spool directory?
> fuser -v $queue 2>&1 | grep -w master && return $OCF_SUCCESS
>
>
> # Postfix is not running
> false
> }
>
>
> postfix_status()
> {
> running
> }
>
> postfix_start()
> {
> # if Postfix is running return success
> if postfix_status; then
> ocf_log info "Postfix already running."
> return $OCF_SUCCESS
> fi
>
> # start Postfix
> $binary $OPTIONS start >/dev/null 2>&1
> ret=$?
>
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix returned error." $ret
> return $OCF_ERR_GENERIC
> fi
>
> return $OCF_SUCCESS
> }
>
>
> postfix_stop()
> {
> $binary $OPTIONS stop >/dev/null 2>&1
> ret=$?
>
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix returned an error while stopping." $ret
> return $OCF_ERR_GENERIC
> fi
>
> # grant some time for shutdown and recheck 5 times
> for i in 1 2 3 4 5; do
> postfix_status || break
> sleep 1
> done
>
> # escalate to abort if we did not stop by now
> # @TODO shall we loop here too?
> if postfix_status; then
> ocf_log err "Postfix failed to stop. Escalating to 'abort'"
>
> $binary $OPTIONS abort >/dev/null 2>&1; ret=$?
> sleep 5
> postfix_status && return $OCF_ERR_GENERIC
> fi
>
> return $OCF_SUCCESS
> }
>
> postfix_reload()
> {
> if postfix_status; then
> ocf_log info "Reloading Postfix."
> $binary $OPTIONS reload
> fi
> }
>
> postfix_monitor()
> {
> if postfix_status; then
> return $OCF_SUCCESS
> fi
>
> return $OCF_NOT_RUNNING
> }
>
> postfix_validate_all()
> {
> # check that the Postfix binary exists and can be executed
> if [ ! -x "$binary" ]; then
> ocf_log err "Postfix binary '$binary' does not exist or cannot be executed."
> return $OCF_ERR_GENERIC
> fi
>
> # check config_dir and alternate_config_directories parameter
> if [ "x$config_dir" != "x" ]; then
> if [ ! -d "$config_dir" ]; then
> ocf_log err "Postfix configuration directory '$config_dir' does not exist."
> return $OCF_ERR_GENERIC
> fi
>
> alternate_config_directories=`postconf -h alternate_config_directories 2>/dev/null | grep $config_dir`
> if [ "x$alternate_config_directories" = "x" ]; then
> ocf_log err "Postfix main configuration must contain correct 'alternate_config_directories' parameter."
> return $OCF_ERR_GENERIC
> fi
> fi
>
> # check spool/queue directory
> queue=`postconf $OPTION_CONFIG_DIR -h queue_directory 2>/dev/null`
> if [ ! -d "$queue" ]; then
> ocf_log err "Postfix spool/queue directory '$queue' does not exist."
> return $OCF_ERR_GENERIC
> fi
>
> # run postfix internal check
> $binary $OPTIONS check >/dev/null 2>&1
> ret=$?
> if [ $ret -ne 0 ]; then
> ocf_log err "Postfix 'check' failed." $ret
> return $OCF_ERR_GENERIC
> fi
>
> return $OCF_SUCCESS
> }
>
> #
> # Main
> #
>
> if [ $# -ne 1 ]; then
> usage
> exit $OCF_ERR_ARGS
> fi
>
> binary=$OCF_RESKEY_binary
> config_dir=$OCF_RESKEY_config_dir
> parameters=$OCF_RESKEY_parameters
>
> # debugging stuff
> #echo OCF_RESKEY_binary=$OCF_RESKEY_binary >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
> #echo OCF_RESKEY_config_dir=$OCF_RESKEY_config_dir >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
> #echo OCF_RESKEY_parameters=$OCF_RESKEY_parameters >> /tmp/prox_conf_$OCF_RESOURCE_INSTANCE
>
>
> # build postfix options string *outside* to access from each method
> OPTIONS=''
> OPTION_CONFIG_DIR=''
>
> # check if a Postfix config_dir was given
> if [ "x$config_dir" != "x" ]; then
> # save OPTION_CONFIG_DIR separately
> OPTION_CONFIG_DIR="-c $config_dir"
> OPTIONS=$OPTION_CONFIG_DIR
> fi
>
> if [ "x$parameters" != "x" ]; then
> OPTIONS="$OPTIONS $parameters"
> fi
>
> case $1 in
> meta-data) meta_data
> exit $OCF_SUCCESS
> ;;
>
> usage|help) usage
> exit $OCF_SUCCESS
> ;;
> esac
>
> postfix_validate_all
> ret=$?
>
> #echo "debug[$1:$ret]"
> LSB_STATUS_STOPPED=3
> if [ $ret -ne $OCF_SUCCESS ]; then
> case $1 in
> stop) exit $OCF_SUCCESS ;;
> monitor) exit $OCF_NOT_RUNNING;;
> status) exit $LSB_STATUS_STOPPED;;
> *) exit $ret;;
> esac
> fi
>
> case $1 in
> monitor) postfix_monitor
> exit $?
> ;;
> start) postfix_start
> exit $?
> ;;
>
> stop) postfix_stop
> exit $?
> ;;
>
> reload) postfix_reload
> exit $?
> ;;
>
> status) if postfix_status; then
> ocf_log info "Postfix is running."
> exit $OCF_SUCCESS
> else
> ocf_log info "Postfix is stopped."
> exit $OCF_NOT_RUNNING
> fi
> ;;
>
> validate-all) exit $OCF_SUCCESS
> ;;
>
> *) usage
> exit $OCF_ERR_UNIMPLEMENTED
> ;;
> esac

