
Mailing List Archive: Linux-HA: Users

RE: Strange HB Status displayed for root vs. unprivileged users; bug or feature?

 

 



Ralph.Grothe at itdz-berlin

Jul 1, 2008, 10:51 PM


Hi Michael,

many thanks for the quick reply.

>
> I've just had a quick look through the source to see what the -s
> flag actually does (I'll need to set up monitoring of heartbeat in
> Nagios shortly, as it happens). It reads the PID file and then
> checks if the process is running, and that the process with the PID
> it's checking is actually heartbeat (by checking that its
> /proc/.../exe is a link to the heartbeat binary).
>

Yes, of course, that's one of the advantages of open source:
you can always look at the code, which I had forgotten.
But an strace restricted to the open and kill syscalls also reveals which files need to be readable
(note the kill with SIG_0, which appears to be the check whether the process is still alive).

# strace -e trace=open,kill /usr/lib64/heartbeat/heartbeat -s 2>&1|grep -A3 \.pid
open("/var/run/heartbeat.pid", O_RDONLY) = 3
kill(31017, SIG_0) = 0
open("/usr/lib64/pils/plugins/InterfaceMgr/generic.so", O_RDONLY) = 3
open("/etc/ha.d/nodeinfo", O_RDONLY) = -1 ENOENT (No such file or directory)
heartbeat OK [pid 31017 et al] is running on nodeA [nodeA]...
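
The sequence visible in this trace, together with the /proc/.../exe comparison described in the quoted reply, can be approximated in shell. This is only a sketch of the idea, not heartbeat's actual code: the function name pid_is_binary and its return codes are made up for illustration, and it assumes a Linux /proc and a PID file that may be padded with spaces.

```shell
# Hypothetical helper, not part of heartbeat.
# Usage: pid_is_binary PIDFILE EXPECTED_BINARY
# Returns 0 if the PID in PIDFILE runs EXPECTED_BINARY,
#         1 if it runs something else,
#         2 if the PID file or /proc/PID/exe cannot be read.
pid_is_binary() {
    pidfile=$1
    expected=$2
    [ -r "$pidfile" ] || return 2
    # The PID file may be space-padded, so strip blanks and newlines.
    pid=$(tr -d ' \n' < "$pidfile")
    [ -n "$pid" ] || return 2
    # readlink fails if the process is gone, and also with "Permission
    # denied" when an unprivileged caller inspects another user's process.
    exe=$(readlink "/proc/$pid/exe" 2>/dev/null) || return 2
    [ "$exe" = "$expected" ] || return 1
    return 0
}
```

Calling pid_is_binary /var/run/heartbeat.pid /usr/lib64/heartbeat/heartbeat as root and then as an unprivileged user should show whether the exe link is the step that fails on a given system.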


>
> On my system, even though the process directory and the symlinks
> therein appear to be world-readable, they're not:
>

This seems to be similar on my system.

While the PID file is world-readable

# ls -l /var/run/heartbeat.pid
-rw-r--r-- 1 root root 11 Jun 27 10:36 /var/run/heartbeat.pid

some of the symlinks and other files in the process's procfs directory are not (output restricted to the first directory level for brevity):


# tr -d \\040 < /var/run/heartbeat.pid|xargs -iPID find /proc/PID -maxdepth 1 -follow ! -perm -004 -ls
2032730121 0 dr-x------ 2 root root 0 Jul 1 08:13 /proc/31017/fd
2032730122 0 -r-------- 1 root root 0 Jul 2 07:09 /proc/31017/environ
2032730123 0 -r-------- 1 root root 0 Jul 2 07:09 /proc/31017/auxv
2032730117 0 -rw------- 1 root root 0 Jul 2 07:09 /proc/31017/mem
10247 1 drwx------ 2 root root 1024 Feb 6 13:18 /proc/31017/cwd
2032730130 0 -r-------- 1 root root 0 Jul 2 07:09 /proc/31017/mountstats
2032730132 0 -r-------- 1 root root 0 Jul 2 07:09 /proc/31017/smaps



But I don't believe that missing permissions on /proc/31017 are causing the problem here;
I suspect the kill() syscall seen in the strace output instead.
AFAIK, only root or the process owner may signal the process,
even when only the harmless SIG_0 is involved.
I think the developers considered this a much terser way of validating a possibly stale PID read from the PID file
than fumbling with pstat() structures, or whatever the Linux equivalent may be.
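
For reference, the signal-0 probe can be tried from a shell. The sketch below is an illustration with a hypothetical function name (pid_state), not what heartbeat does internally. It relies on two facts: kill -0 delivers no signal and only performs the existence and permission checks, and on Linux a live process has a /proc/PID directory. A failed kill -0 on its own therefore does not prove that the process is gone; it may simply mean the caller lacks permission (EPERM rather than ESRCH).

```shell
# Hypothetical helper: classify a PID using signal 0.
# Prints "running" if we may signal it, "no-permission" if it exists but
# we are not allowed to signal it, and "absent" if there is no such process.
# Assumes /proc is mounted without the hidepid option.
pid_state() {
    if kill -0 "$1" 2>/dev/null; then
        echo running
    elif [ -d "/proc/$1" ]; then
        # kill failed although the process exists: permission was denied.
        echo no-permission
    else
        echo absent
    fi
}
```

Running pid_state "$(tr -d ' ' < /var/run/heartbeat.pid)" as the unprivileged user would show which of the three cases applies to the heartbeat master process.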

> 2. Set up sudo or similar so Nagios can do the check

This is what I did, as it was the most straightforward method,
especially since I had already set up a sudo ruleset for the user munin
so that it can run a few Munin plugins which require elevated privileges as well.
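
A sudo-based setup along these lines might look like the following sketch. The sudoers rule is shown as a comment; the account name nagios is an assumption (use whichever user runs the nrpe daemon), and the function hb_to_nagios is a hypothetical stand-in for the parsing done in check_heartbeat.sh, whose actual contents are not shown in this thread. The return codes 0 and 2 are the usual Nagios plugin codes for OK and CRITICAL.

```shell
# /etc/sudoers fragment (edit with visudo); "nagios" is an assumed user name:
#   nagios ALL=(root) NOPASSWD: /usr/lib64/heartbeat/heartbeat -s

# Hypothetical parser: map the text printed by "heartbeat -s" to a
# Nagios status line and return code.
hb_to_nagios() {
    case $1 in
        *"is running"*)
            echo "OK - $1"
            return 0 ;;
        *)
            echo "CRITICAL - ${1:-no output from heartbeat -s}"
            return 2 ;;
    esac
}

# Intended use inside the plugin (sudo -n never prompts for a password):
#   hb_to_nagios "$(sudo -n /usr/lib64/heartbeat/heartbeat -s 2>&1)"
#   exit $?
```

Restricting the rule to the exact command and argument keeps the extra privilege limited to the status query.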


The only thing that puzzles me is that my check_heartbeat.sh "plugin" worked with the former Heartbeat installation
without requiring any such workarounds.

Regards
Ralph

> -----Original Message-----
> From: linux-ha-bounces[at]lists.linux-ha.org
> [mailto:linux-ha-bounces[at]lists.linux-ha.org]On Behalf Of Michael Alger
> Sent: Tuesday, July 01, 2008 5:56 PM
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Strange HB Status displayed for root vs.
> unprivileged users; bug or feature?
>
>
> On Tue, Jul 01, 2008 at 04:04:54PM +0200,
> Ralph.Grothe[at]itdz-berlin.de wrote:
> > After I had successfully upgraded this cluster to the new OS I was
> > wondering, why my Nagios plugin always returned CRITICAL states
> > though heartbeat was running on the node at the time.
> > Then I discovered that the output of my check command differed
> > decisively depending on who executed the check.
> >
> > e.g. as root I get
> >
> > # /usr/lib64/nagios/plugins/custom/check_heartbeat.sh
> > OK - heartbeat is running on nodeA
> >
> > or rather what really gets executed in that plugin and whose
> > output merely gets parsed is
> >
> > # /usr/lib64/heartbeat/heartbeat -s
> > heartbeat OK [pid 31017 et al] is running on nodeA [nodeA]...
> >
> > # pgrep -P1 -fl heartbeat
> > 31017 heartbeat: master control process
> >
> > But when run as an unprivileged user, as is the case when the nrpe
> > daemon is executing the check, oops, I get this strange result
> >
> > # /usr/lib64/nagios/plugins/check_nrpe -n -H localhost -c check_heartbeat
> > CRITICAL - heartbeat is stopped on nodeA
> >
> > How come, is this a bug or intended behavior?
>
> I've just had a quick look through the source to see what the -s
> flag actually does (I'll need to set up monitoring of heartbeat in
> Nagios shortly, as it happens). It reads the PID file and then
> checks if the process is running, and that the process with the PID
> it's checking is actually heartbeat (by checking that its
> /proc/.../exe is a link to the heartbeat binary).
>
> On my system, even though the process directory and the symlinks
> therein appear to be world-readable, they're not:
>
> $ ls -la /proc/`sed 's/ *//' /var/run/heartbeat.pid`
> ls: cannot read symbolic link /proc/18467/cwd: Permission denied
> ls: cannot read symbolic link /proc/18467/root: Permission denied
> ls: cannot read symbolic link /proc/18467/exe: Permission denied
>
> When heartbeat tries to ascertain that the process running with that
> particular PID is actually heartbeat, it encounters an error and
> therefore fails.
>
> I'm not sure if this aspect of the proc filesystem's behaviour can
> be adjusted, or if it's desirable to adjust it. So, I would suggest
> one of:
>
> 1. Go with your approach of just checking the process listing
> 2. Set up sudo or similar so Nagios can do the check
> 3. Set up a scheduled job to do a check as root, and write the result
> status code and a line of output to a file somewhere. Then the
> Nagios check command can check that the status file was
> updated recently, and if so use that for its own response.
>
> I'll probably go with option #2 or #3, but I haven't really looked
> into how exactly I'm going to ascertain that heartbeat is up and
> running. Possibly I'll use crm_mon -1 and check that the expected
> nodes are both online, and set a warning status if either is
> offline (and critical if I can't work out their status at all).
> _______________________________________________
> Linux-HA mailing list
> Linux-HA[at]lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
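
Option 3 from the quoted list (a root job writes the result, the unprivileged check only reads it) could be sketched as below. Everything specific here is an assumption: the status file path, its two-line layout (exit code on the first line, message on the second), the freshness limit, and the function name read_status. stat -c %Y is the GNU coreutils form for printing a file's modification time.

```shell
# Root cron job (assumed path and layout), run e.g. once a minute:
#   out=$(/usr/lib64/nagios/plugins/custom/check_heartbeat.sh); rc=$?
#   printf '%s\n%s\n' "$rc" "$out" > /var/run/hb_status.new &&
#       mv /var/run/hb_status.new /var/run/hb_status

# Hypothetical reader for the unprivileged Nagios check.
# Usage: read_status FILE MAX_AGE_SECONDS
read_status() {
    file=$1
    max_age=$2
    [ -r "$file" ] || { echo "UNKNOWN - cannot read $file"; return 3; }
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")
    # Refuse to trust a result the root job has not refreshed recently.
    if [ $((now - mtime)) -gt "$max_age" ]; then
        echo "UNKNOWN - $file is stale"
        return 3
    fi
    rc=$(sed -n 1p "$file")
    sed -n 2p "$file"
    case $rc in
        0|1|2|3) return "$rc" ;;
        *)       return 3 ;;
    esac
}
```

The staleness check matters because a dead cron job would otherwise leave the last good result in place indefinitely.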


misch at multinet

Jul 2, 2008, 1:31 AM

Re: Strange HB Status displayed for root vs. unprivileged users; bug or feature?

On Wednesday, 2 July 2008 07:51, Ralph.Grothe[at]itdz-berlin.de wrote:
(...) [Long discussion, shortened to save bandwidth]

Why don't you folks use plain SNMP? Heartbeat has a wonderful subagent!
SNMP is an Internet standard (RFC), implemented everywhere, and platform
independent, unlike your own Nagios installation.

Greetings,

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: misch[at]multinet.de
web: www.multinet.de

Registered office: 85630 Grasbrunn
Court of registration: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42