Mailing List Archive: Linux-HA: Pacemaker

Trouble with ocf:Squid resource agent

 

 



cornuwel at gmail

Jul 18, 2012, 1:06 AM

Post #1 of 9 (1554 views)
Permalink
Trouble with ocf:Squid resource agent

Hi,

I'm setting up a proxy cluster on OpenSuSE 12.1. Squid starts OK on
both servers when called from the lsb script. I stopped it and here is
the configuration I set up :

# crm configure show
node corsen-a
node corsen-b
primitive Proxy ocf:heartbeat:Squid \
        params squid_exe="/usr/sbin/squid" squid_conf="/etc/squid/squid.conf" squid_pidfile="/tmp/squid.pid" squid_port="3128" squid_stop_timeout="30" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="20s" timeout="30s"
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"

# crm_mon -1
============
Last updated: Tue Jul 17 16:51:30 2012
Last change: Tue Jul 17 16:46:32 2012 by root via cibadmin on corsen-a
Stack: openais
Current DC: corsen-a - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ corsen-a corsen-b ]


Failed actions:
Proxy_start_0 (node=corsen-a, call=3, rc=-2, status=Timed Out): unknown exec error
Proxy_start_0 (node=corsen-b, call=3, rc=-2, status=Timed Out): unknown exec error


Not much in the logs. Any idea what I missed ?

Thanks in advance,

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


jsmith at argotec

Jul 18, 2012, 10:25 AM

Post #2 of 9 (1545 views)
Permalink
Re: Trouble with ocf:Squid resource agent [In reply to]

With the "status=Timed Out", I'm thinking that your setting of 60s for the start timeout might be too short. How long does it take to return if you start squid from the LSB script? And how long after starting is squid.pid created (is it even created)?
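
One way to answer both questions at once is to poll for the pidfile and report how long it took to show up. This is only a rough sketch: wait_for_pidfile is a made-up helper name, and the init script path in the usage comment is an assumption that may differ on your system.

```sh
# Hypothetical helper: wait until a non-empty pidfile appears and report
# how many seconds that took. Gives up after the given timeout.
wait_for_pidfile() {
    local pidfile timeout waited
    pidfile=$1
    timeout=$2
    waited=0
    while [ ! -s "$pidfile" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "no pidfile after ${waited}s"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "pidfile appeared after ${waited}s"
}

# Possible usage (init script path is an assumption, pidfile is the one
# from the posted configuration):
#   /etc/init.d/squid start
#   wait_for_pidfile /tmp/squid.pid 60
```

If the reported time is anywhere near the 60s start timeout, raising the timeout is worth a try; if the pidfile never appears, the timeout is probably not the real problem.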

HTH

Jake



cornuwel at gmail

Jul 19, 2012, 7:34 AM

Post #3 of 9 (1520 views)
Permalink
Re: Trouble with ocf:Squid resource agent [In reply to]

2012/7/18 Jake Smith <jsmith [at] argotec>:
> With the "status=Timed Out", I'm thinking that your setting of 60s for the start timeout might be too short. How long does it take to return if you start squid from the LSB script? And how long after starting is squid.pid created (is it even created)?

From the LSB script, it takes about two seconds and the pidfile is
created immediately.
From pacemaker, the pidfile isn't created at all.
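
When an agent behaves differently under the cluster than by hand, it can help to run the agent script directly, outside Pacemaker. OCF agents read their parameters from OCF_RESKEY_<name> environment variables and expect OCF_ROOT to be set. The wrapper below is a sketch only: run_ra is a made-up name, and the default OCF_ROOT is inferred from the agent path mentioned in this thread.

```sh
# Hypothetical wrapper: run one action of an OCF agent by hand, passing
# name=value parameters as OCF_RESKEY_<name> variables. Everything is
# exported inside a subshell so the calling shell stays clean.
run_ra() {
    local agent action kv
    agent=$1
    action=$2
    shift 2
    (
        OCF_ROOT=${OCF_ROOT:-/usr/lib/ocf}   # assumed default location
        export OCF_ROOT
        for kv in "$@"; do
            export "OCF_RESKEY_$kv"
        done
        "$agent" "$action"
    )
}

# Possible usage with the parameters from the posted configuration:
#   run_ra /usr/lib/ocf/resource.d/heartbeat/Squid start \
#       squid_exe=/usr/sbin/squid squid_conf=/etc/squid/squid.conf \
#       squid_pidfile=/tmp/squid.pid squid_port=3128
#   echo "rc=$?"
```

The exit code and anything the agent prints to stderr usually say more than the "unknown exec error" shown by crm_mon.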



cornuwel at gmail

Jul 24, 2012, 9:30 AM

Post #4 of 9 (1495 views)
Permalink
Re: Trouble with ocf:Squid resource agent [In reply to]

Hi,

Fixed! The problem comes from the squid ocf script
(/usr/lib/ocf/resource.d/heartbeat/Squid) that doesn't handle IPv6
addresses correctly.
All you have to do is modify line 198 as follows:
awk '/(tcp.*[0-9]+\.[0-9]+\.+[0-9]+\.[0-9]+:'$SQUID_PORT' |tcp.*:::'$SQUID_PORT' )/{

Source: http://www.n3oxid.fr/index.php?post/2012/04/07/Installation-et-configuration-d-un-cluster-Pacemaker/CoroSync-sous-GNU/Linux-Debian-6-%28Squeeze%29
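
To see what the modified pattern does, it can be tried against a few socket lines. The sketch below assumes the agent parses netstat-style output, which the "tcp ... address:port" pattern suggests. match_listen is a made-up name and the sample lines are invented for illustration.

```sh
# Print the input lines that show a TCP socket on the given port, using
# the pattern from the post: an IPv4 dotted quad, or the IPv6 ":::"
# wildcard, followed by the port and a space.
match_listen() {
    local port
    port=$1
    awk '/(tcp.*[0-9]+\.[0-9]+\.+[0-9]+\.[0-9]+:'"$port"' |tcp.*:::'"$port"' )/'
}

# Invented sample in the layout of "netstat -apn" output.
sample='tcp        0      0 0.0.0.0:3128            0.0.0.0:*               LISTEN      1234/(squid)
tcp        0      0 :::3128                 :::*                    LISTEN      1234/(squid)
tcp        0      0 :::31280                :::*                    LISTEN      999/other'

# Prints the first two lines only: the trailing space in the pattern
# keeps port 31280 from matching a search for 3128.
printf '%s\n' "$sample" | match_listen 3128
```

Without the second alternative, a squid that listens only on the IPv6 wildcard address would never be found, which would explain a start that times out even though squid is running.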

Cheers,



cornuwel at gmail

Jul 25, 2012, 2:51 AM

Post #5 of 9 (1488 views)
Permalink
Re: Trouble with ocf:Squid resource agent [In reply to]

Oops! Spoke too soon. The fix below allows squid to start, but the
script also has problems in the 'stop' part. It gets stuck in an
infinite loop; here are the logs (repeating every second):

Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output: (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line 320: kill: -: arguments must be process or job IDs
Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output: (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line 320: kill: -: arguments must be process or job IDs
Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO: squid:stop_squid:318: try to stop by SIGKILL: -
Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO: squid:stop_squid:318: try to stop by SIGKILL: -

Being on a deadline, I'll use the lsb script for the moment. If
someone figures out how to use this ocf script, I'm very interested.

Regards


2012/7/24 Julien Cornuwel <cornuwel [at] gmail>:
> Hi,
>
> Fixed! The problem comes from the squid ocf script
> (/usr/lib/ocf/resource.d/heartbeat/Squid) that doesn't handle IPv6
> addresses correctly.
> All you have to do is modify line 198 as follows:
> awk '/(tcp.*[0-9]+\.[0-9]+\.+[0-9]+\.[0-9]+:'$SQUID_PORT' |tcp.*:::'$SQUID_PORT' )/{
>
> Source: http://www.n3oxid.fr/index.php?post/2012/04/07/Installation-et-configuration-d-un-cluster-Pacemaker/CoroSync-sous-GNU/Linux-Debian-6-%28Squeeze%29
>
> Cheers,



jsmith at argotec

Jul 30, 2012, 9:09 AM

Post #6 of 9 (1473 views)
Permalink
Re: Trouble with ocf:Squid resource agent [In reply to]

I took a quick look at the OCF... here's the stop section with inline comments from me (###)

stop_squid()
{
    typeset lapse_sec

    if ocf_run $SQUID_EXE -f $SQUID_CONF -k shutdown; then
        lapse_sec=0
        while true; do
            get_pids
            if is_squid_dead; then
                rm -f $SQUID_PIDFILE
                return $OCF_SUCCESS
            fi
            (( lapse_sec = lapse_sec + 1 ))
            if (( lapse_sec > SQUID_STOP_TIMEOUT )); then

### It looks to me like you're hitting the line above, which then breaks out and drops down to the "while true" 8 lines down. I would time a manual stop of squid (I know it takes quite a while) and make sure your primitive's "op stop interval="0" timeout="120s"" is set high enough (definitely more than 120s, I would assume) that the elapsed time to stop squid doesn't normally exceed the timeout value.

                break
            fi
            sleep 1
            ocf_log info "$SQUID_NAME:$FUNCNAME:$LINENO: " \
                "stop NORM $lapse_sec/$SQUID_STOP_TIMEOUT"
        done
    fi

    while true; do
        get_pids
        ocf_log info "$SQUID_NAME:$FUNCNAME:$LINENO: " \
            "try to stop by SIGKILL:${SQUID_PIDS[0]} ${SQUID_PIDS[2]}"
        kill -KILL ${SQUID_PIDS[0]} ${SQUID_PIDS[2]}

### Have you tried manually running the above line to see what you get (inserting the correct PIDs, of course)? Maybe the kill -KILL syntax is invalid for your flavor of Linux and the OCF needs to be updated to take that into account when running the kill command? Even if you increase the timeout above to a normally reasonable value, you still want it to be able to kill squid if it is unresponsive!

        sleep 1
        if is_squid_dead; then
            rm -f $SQUID_PIDFILE
            return $OCF_SUCCESS
        fi
    done

    return $OCF_ERR_GENERIC
}
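
The log lines quoted earlier in the thread show kill being handed a literal "-" instead of a PID, and the second loop above has no exit other than squid being dead. A defensive version of that step could drop anything that is not a number and cap the number of attempts. This is a sketch of the idea, not the upstream fix; filter_pids and kill_with_limit are made-up names.

```sh
# Hypothetical helper: print only the arguments that look like PIDs.
filter_pids() {
    local pid out
    out=""
    for pid in "$@"; do
        case "$pid" in
            ''|*[!0-9]*) ;;                   # skip empty or non-numeric, e.g. "-"
            *) out="${out:+$out }$pid" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Hypothetical bounded SIGKILL loop: fails right away when there is no
# valid PID, and gives up after max_tries rounds instead of looping forever.
kill_with_limit() {
    local max_tries pids try
    max_tries=$1
    shift
    pids=$(filter_pids "$@")
    [ -n "$pids" ] || return 1
    try=0
    while [ "$try" -lt "$max_tries" ]; do
        kill -KILL $pids 2>/dev/null          # unquoted on purpose: one word per PID
        sleep 1
        kill -0 $pids 2>/dev/null || return 0 # nothing left to signal
        try=$((try + 1))
    done
    return 1
}
```

With something like this, a failed PID lookup turns into an error return that the cluster can act on, rather than a stop operation that spins until its own timeout.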


> Regards
>
>
> 2012/7/24 Julien Cornuwel <cornuwel [at] gmail>:
> > Hi,
> >
> > Fixed! The problem comes from the squid ocf script
> > (/usr/lib/ocf/resource.d/heartbeat/Squid) that doesn't handle IPv6
> > addresses correctly.
> > All you have to do is modify the line 198 as such :
> > awk '/(tcp.*[0-9]+\.[0-9]+\.+[0-9]+\.[0-9]+:'$SQUID_PORT'
> > |tcp.*:::'$SQUID_PORT' )/{
> >
> > Source:
> > http://www.n3oxid.fr/index.php?post/2012/04/07/Installation-et-configuration-d-un-cluster-Pacemaker/CoroSync-sous-GNU/Linux-Debian-6-%28Squeeze%29
> >

Not sure whether the above fully patches the squid OCF agent for both IPv4 and IPv6, but I would recommend submitting a patch against the resource agent so that in the future it just works ;-)

HTH
Jake



dejanmm at fastmail

Aug 13, 2012, 5:07 AM

Post #7 of 9 (1433 views)
Permalink
Re: Trouble with ocf:Squid resource agent [In reply to]

Hi,

On Mon, Jul 30, 2012 at 12:09:10PM -0400, Jake Smith wrote:
> Not sure if the above fully patches the OCF for squid ipv4 and ipv6 but I would recommend submitting a patch against the resource agent so in the future it just works ;-)

Yes. If somebody opens a bugzilla at LF
(https://developerbugs.linuxfoundation.org/) or an issue at
https://github.com/ClusterLabs/resource-agents somebody
(hopefully the author) will take care of it.

Thanks,

Dejan



lars.ellenberg at linbit

Feb 8, 2013, 2:21 AM

Post #8 of 9 (1051 views)
Permalink
Re: Trouble with ocf:Squid resource agent [In reply to]

On Mon, Aug 13, 2012 at 02:07:46PM +0200, Dejan Muhamedagic wrote:
> Hi,
>
> On Mon, Jul 30, 2012 at 12:09:10PM -0400, Jake Smith wrote:
> >
> > ----- Original Message -----
> > > From: "Julien Cornuwel" <cornuwel [at] gmail>
> > > To: pacemaker [at] oss
> > > Sent: Wednesday, July 25, 2012 5:51:28 AM
> > > Subject: Re: [Pacemaker] Trouble with ocf:Squid resource agent
> > >
> > > Oops! Spoke too fast. The fix below allows squid to start. But the
> > > script also has problems in the 'stop' part. It is stuck in an
> > > infinite loop and here are the logs (repeats every second) :
> > >
> > > Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output:
> > > (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line
> > > 320: kill: -: arguments must be process or job IDs
> > > Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output:
> > > (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line
> > > 320: kill: -: arguments must be process or job IDs
> > > Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO:
> > > squid:stop_squid:318: try to stop by SIGKILL: -
> > > Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO:
> > > squid:stop_squid:318: try to stop by SIGKILL: -
> > >
> > > Being on a deadline, I'll use the lsb script for the moment. If
> > > someone figures out how to use this ocf script, I'm very interrested.
> > >

Did you try to use the current version of the script?

It very much looks like you are missing this fix:

commit cbf70945f162aa296dacfc07817f1764a76e412e
Author: Dejan Muhamedagic <dejan [at] suse>
Date: Mon Oct 1 12:43:29 2012 +0200

Medium: Squid: fix getting PIDs of squid processes (lf#2653)

See
https://github.com/ClusterLabs/resource-agents/commit/cbf70945f162aa296dacfc07817f1764a76e412e

(and some other fixes that come later!)

> > > > Fixed! The problem comes from the squid ocf script
> > > > (/usr/lib/ocf/resource.d/heartbeat/Squid) that doesn't handle IPv6
> > > > addresses correctly.
> > > > All you have to do is modify the line 198 as such :
> > > > awk '/(tcp.*[0-9]+\.[0-9]+\.+[0-9]+\.[0-9]+:'$SQUID_PORT'
> > > > |tcp.*:::'$SQUID_PORT' )/{

This is supposed to be fixed as well
in the current version of that script...

> Yes. If somebody opens a bugzilla at LF
> (https://developerbugs.linuxfoundation.org/) or an issue at
> https://github.com/ClusterLabs/resource-agents somebody
> (hopefully the author) will take care of it.

As I wrote, I think both of these are already fixed.

Please use resource-agents v3.9.5.
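
To check whether a node already has at least that release, the installed version can be compared against 3.9.5. A small sketch, assuming GNU sort (for -V) and an rpm-based system where the package is called resource-agents; version_ge is a made-up helper name.

```sh
# True when version $1 is at least version $2. Relies on "sort -V"
# (version sort), which is a GNU coreutils extension.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Possible usage (package name is an assumption):
#   installed=$(rpm -q --qf '%{VERSION}' resource-agents)
#   if version_ge "$installed" 3.9.5; then
#       echo "resource-agents $installed is recent enough"
#   else
#       echo "resource-agents $installed is older than 3.9.5"
#   fi
```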

Lars

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



lars.ellenberg at linbit

Feb 8, 2013, 2:26 AM

Post #9 of 9 (1038 views)
Permalink
Re: Trouble with ocf:Squid resource agent [In reply to]

On Fri, Feb 08, 2013 at 11:21:15AM +0100, Lars Ellenberg wrote:
> On Mon, Aug 13, 2012 at 02:07:46PM +0200, Dejan Muhamedagic wrote:

Apologies, I did not look at the date of the post.
For some reason it appeared as "first unread", and I assumed it was
recent. D'oh.

:-)

> Please use resource-agents v3.9.5.

Lars

