
Mailing List Archive: Linux-HA: Users

Strange HB Status displayed for root vs. unprivileged users; bug or feature?

 

 



Ralph.Grothe at itdz-berlin

Jul 1, 2008, 7:04 AM

Post #1 of 2 (128 views)
Strange HB Status displayed for root vs. unprivileged users; bug or feature?

Dear linux-ha subscribers,

We run a two-node active-standby HB cluster, still using the legacy HB1 configuration.

Both nodes are RHEL 5.1 x86_64 systems.

The HB version in use is

# /usr/lib64/heartbeat/heartbeat -V
2.1.3

Before the OS upgrade, the nodes ran Fedora 3 i386 (I cannot remember which HB release).

I have been running a simple Nagios plugin script of my own,
which merely invokes the "heartbeat -s" command and is executed through nrpe.
Its only purpose is to alert me if heartbeat, for whatever reason, is not running on one of the nodes
(which has happened in the past, when a failover did not take place as it should),
e.g.

# grep check_heartbeat /etc/nagios/nrpe.cfg
command[check_heartbeat]=/usr/lib64/nagios/plugins/custom/check_heartbeat.sh

This worked fine on the old OS (and with the probably older HB version used then).

After I had successfully upgraded this cluster to the new OS,
I wondered why my Nagios plugin always returned CRITICAL
even though heartbeat was running on the node at the time.
Then I discovered that the output of my check command differs markedly depending on who executes the check.

e.g. as root I get

# /usr/lib64/nagios/plugins/custom/check_heartbeat.sh
OK - heartbeat is running on nodeA

or rather, what the plugin really executes, and whose output it merely parses, is

# /usr/lib64/heartbeat/heartbeat -s
heartbeat OK [pid 31017 et al] is running on nodeA [nodeA]...


# pgrep -P1 -fl heartbeat
31017 heartbeat: master control process


But when the check is run as an unprivileged user, as is the case when the nrpe daemon executes it,
I get this strange result:

# /usr/lib64/nagios/plugins/check_nrpe -n -H localhost -c check_heartbeat
CRITICAL - heartbeat is stopped on nodeA

# echo $?
2

or, run directly as the user that nrpe actually runs as

# runuser -s /bin/sh -l -c '/usr/lib64/heartbeat/heartbeat -s' munin
heartbeat is stopped. No process


Why is that? Is this a bug or intended behavior?

I wonder whether it would be wiser for my simple Nagios plugin to just do something like this instead:

# pid=$(pgrep -P1 heartbeat) && printf "OK - Heartbeat (PID=%s) running\n" $pid
OK - Heartbeat (PID=31017) running
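
The one-liner above could be fleshed out into a plugin that also reports the failure case with a Nagios exit code. This is only a minimal sketch: the function names report_status and check_heartbeat are made up for illustration, and it assumes the master process is a direct child of init (PPID 1), as in the pgrep output shown earlier.

```shell
#!/bin/sh
# Sketch of a Nagios-style check that relies only on the process table.
# Assumption: the heartbeat master process has PPID 1 and a process name
# matching "heartbeat". Function names are hypothetical.

# Map a PID string (possibly empty) to a status line and a Nagios exit code.
report_status() {
    pid=$1
    if [ -n "$pid" ]; then
        printf 'OK - Heartbeat (PID=%s) running\n' "$pid"
        return 0
    fi
    printf 'CRITICAL - no heartbeat master process found\n'
    return 2
}

check_heartbeat() {
    # pgrep prints nothing when no process matches; keep only the first PID.
    pid=$(pgrep -P1 heartbeat | head -n 1)
    report_status "$pid"
}
```

A plugin file built on this would end with `check_heartbeat; exit $?`. Since pgrep only reads the process table, it needs no access to /proc/PID/exe and so should give the same answer for root and for the nrpe user.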


Regards


_______________________________________________
Linux-HA mailing list
Linux-HA[at]lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


linux-ha at mm

Jul 1, 2008, 8:55 AM

Post #2 of 2 (119 views)
Re: Strange HB Status displayed for root vs. unprivileged users; bug or feature?

On Tue, Jul 01, 2008 at 04:04:54PM +0200, Ralph.Grothe[at]itdz-berlin.de wrote:
> After I had successfully upgraded this cluster to the new OS I was
> wondering, why my Nagios plugin always returned CRITICAL states
> though heartbeat was running on the node at the time.
> Then I discovered that the output of my check command differed
> decisively depending on who executed the check.
>
> e.g. as root I get
>
> # /usr/lib64/nagios/plugins/custom/check_heartbeat.sh
> OK - heartbeat is running on nodeA
>
> or rather what really gets executed in that plugin and whose
> output merely gets parsed is
>
> # /usr/lib64/heartbeat/heartbeat -s
> heartbeat OK [pid 31017 et al] is running on nodeA [nodeA]...
>
> # pgrep -P1 -fl heartbeat
> 31017 heartbeat: master control process
>
> But when run as an unprivileged user, as is the case when the nrpe
> daemon is executing the check, oops, I get this strange result
>
> # /usr/lib64/nagios/plugins/check_nrpe -n -H localhost -c check_heartbeat
> CRITICAL - heartbeat is stopped on nodeA
>
> How come, is this a bug or intended behavior?

I've just had a quick look through the source to see what the -s
flag actually does (I'll need to set up monitoring of heartbeat in
Nagios shortly, as it happens). It reads the PID file and then
checks if the process is running, and that the process with the PID
it's checking is actually heartbeat (by checking that its
/proc/.../exe is a link to the heartbeat binary).

On my system, even though the process directory and the symlinks
therein appear to be world-readable, they're not:

$ ls -la /proc/`sed 's/ *//' /var/run/heartbeat.pid`
ls: cannot read symbolic link /proc/18467/cwd: Permission denied
ls: cannot read symbolic link /proc/18467/root: Permission denied
ls: cannot read symbolic link /proc/18467/exe: Permission denied

When heartbeat tries to ascertain that the process running with that
particular PID is actually heartbeat, it hits this permission error
and therefore reports heartbeat as stopped.
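
To make the failure mode concrete, here is a rough shell approximation of the check as I described it above. It is not taken from the heartbeat source; the function name pid_is_binary and its arguments are made up for illustration.

```shell
#!/bin/sh
# Rough approximation of "read the PID file, then verify /proc/PID/exe".
# NOT the heartbeat implementation; names are hypothetical.
# Usage: pid_is_binary PIDFILE EXPECTED_BINARY
pid_is_binary() {
    pidfile=$1
    binary=$2

    [ -r "$pidfile" ] || return 1

    # The PID file may contain padding spaces, so strip whitespace.
    pid=$(tr -d ' \n' < "$pidfile") || return 1
    [ -n "$pid" ] || return 1

    # For a process owned by another user, readlink on /proc/PID/exe
    # fails with "Permission denied", and the whole check fails with it.
    exe=$(readlink "/proc/$pid/exe" 2>/dev/null) || return 1

    [ "$exe" = "$binary" ]
}
```

Called as `pid_is_binary /var/run/heartbeat.pid /usr/lib64/heartbeat/heartbeat`, this would succeed for root but return non-zero for an unprivileged user even while heartbeat is running, which matches the symptom reported.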

I'm not sure if this aspect of the proc filesystem's behaviour can
be adjusted, or if it's desirable to adjust it. So, I would suggest
one of:

1. Go with your approach of just checking the process listing
2. Set up sudo or similar so Nagios can do the check
3. Set up a scheduled job to do a check as root, and write the result
   status code and a line of output to a file somewhere. Then the
   Nagios check command can check that the status file was
   updated recently, and if so use that for its own response.
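
Option 3 could look roughly like the sketch below. The function names, the status file location and the maximum age are all assumptions for illustration, and `find -mmin` is not POSIX (it is available in GNU find, which RHEL ships).

```shell
#!/bin/sh
# Sketch of option 3; all names and paths are hypothetical.

# Root side (run from cron): execute a check command and record its exit
# code plus the first line of its output.
# Usage: write_status STATUS_FILE COMMAND [ARGS...]
write_status() {
    status_file=$1
    shift
    rc=0
    out=$("$@" 2>&1) || rc=$?
    first_line=$(printf '%s\n' "$out" | head -n 1)
    # Write to a temp file and rename, so readers never see a partial file.
    printf '%s\n%s\n' "$rc" "$first_line" > "$status_file.tmp" &&
        mv "$status_file.tmp" "$status_file"
}

# Nagios side (unprivileged): relay the stored result if it is fresh.
# Usage: read_status STATUS_FILE MAX_AGE_MINUTES
read_status() {
    status_file=$1
    max_age_min=$2
    # find prints the path only if it was modified less than N minutes ago.
    if [ -z "$(find "$status_file" -mmin "-$max_age_min" 2>/dev/null)" ]; then
        echo "UNKNOWN - status file missing or stale"
        return 3
    fi
    rc=$(sed -n 1p "$status_file")
    case $rc in
        0|1|2|3) ;;
        *) echo "UNKNOWN - unexpected status code in file"; return 3 ;;
    esac
    sed -n 2p "$status_file"
    return "$rc"
}
```

Root's cron job would call write_status with the existing plugin (which already returns Nagios exit codes) as the command, and the nrpe command would call read_status on the same file with a maximum age somewhat larger than the cron interval.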

I'll probably go with option #2 or #3, but I haven't really looked
into how exactly I'm going to ascertain that heartbeat is up and
running. Possibly I'll use crm_mon -1 and check that the expected
nodes are both online, and set a warning status if either is
offline (and critical if I can't work out their status at all).
