Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux-HA: Dev

[PATCH] Low: ldirectord: don't leave open pipes behinds and kill child on timeout

 

 

Linux-HA dev RSS feed   Index | Next | Previous | View Threaded


heder at google

Jul 22, 2009, 6:33 AM

Post #1 of 3 (770 views)
Permalink
[PATCH] Low: ldirectord: don't leave open pipes behinds and kill child on timeout

This patch solves two problems:

- When external checks timeout frequently ldirectord runs out of
file handles, because when alarm()-ing out of Perls system() an
internal pipe is not reclaimed. This is at least true for Perl
5.8.8 on Linux, but likely to affect other versions of Perl and
systems as well. The pipe is used to send errno(3) from the
child to the parent, in case internal exec(3) fails.

- Do not leave child behind on timeout, TERMinate it.

How are they solved?

- Emulate Perls system() via Perls fork(), exec(), and waitpid() and
by TERMinating the child on timeout.

Signed-off-by: Hannes Eder <heder [at] google>
---

diff -r 5cd1d56b330e ldirectord/ldirectord.in
--- a/ldirectord/ldirectord.in Tue Jul 21 19:58:42 2009 +0200
+++ b/ldirectord/ldirectord.in Wed Jul 22 14:35:51 2009 +0200
@@ -1095,6 +1095,7 @@
sub ld_handler_chld
{
$DAEMON_CHLD=1;
+ # NOTE: calling waitpid here would mess up $?
}

sub ld_process_chld
@@ -3185,26 +3186,19 @@
sub check_external
{
my ($v, $r) = @_;
- my $result;
my $v_server;
- my $timed_out = 1;
-
- eval {
- local $SIG{'__DIE__'} = "DEFAULT";
- local $SIG{'ALRM'} = sub { die "Timeout Alarm" };
- &ld_debug(4, "Timeout is $$v{checktimeout}");
- alarm $$v{checktimeout};
- if (defined $$v{server}) {
- $v_server = $$v{server};
- } else {
- $v_server = $$v{fwm};
- }
- $result = system_wrapper($$v{checkcommand}, $v_server,
- $$v{port}, $$r{server}, $$r{port});
- alarm 0;
- $result >>= 8;
- };
- if ($@ or $result != 0) {
+
+ if (defined $$v{server}) {
+ $v_server = $$v{server};
+ } else {
+ $v_server = $$v{fwm};
+ }
+
+ my $result = system_timeout($$v{checktimeout},
+ $$v{checkcommand}, $v_server, $$v{port},
+ $$r{server}, $$r{port});
+
+ if ($result) {
&service_set($v, $r, "down", {do_log => 1});
&ld_debug(3, "Deactivated service $$r{server}:$$r{port}: " .
"$@ after calling $$v{checkcommand} with result " .
@@ -4395,6 +4389,11 @@

# system_wrapper
# Wrapper around system() to log errors
+#
+# WARNING: Do not use alarm() together with this function. A internal
+# pipe will not be reclaimed (at least with Perl 5.8.8). This can
+# cause ldirectord to run out of file handles.
+#
# pre: LIST: arguments to pass to system()
# post: system() is called and if it returns non-zero a failure
# message is logged
@@ -4414,6 +4413,70 @@
return($status)
}

+# system_timeout
+# Emulate system() with timeout via fork(), exec(), and waitpid() and
+# TERMinate the child on timeout. Set an alarm() for the timeout.
+#
+# This function does not suffer the deficiencies of system_wrapper()
+# of leaving pipes unreclaimed. Zombies are reaped by ld_handler_chld
+# and the related code.
+#
+# pre: timeout: timeout in seconds
+# LIST: arguments to pass to exec()
+# return: >= 0 exit status of the child process
+# 127 exec failed
+# -1 timeout
+# -2 fork failed
+sub system_timeout
+{
+ my $timeout = shift;
+ my (@args) = (@_);
+ my $status;
+
+ &ld_log("Running system_timeout($timeout, @args)") if $DEBUG>2;
+
+ my $childpid = fork();
+ if (!defined($childpid)) {
+ &ld_log("fork failed: $!");
+ return(-2);
+ }
+ elsif ($childpid) {
+ # parent
+ eval {
+ local $SIG{'ALRM'} = sub { die "timeout\n"; };
+ alarm $timeout;
+ waitpid($childpid, 0);
+ $status = $? >> 8;
+ # When die()-ing in the SIGALRM handler we
+ # will never reach this point. Child/Zombie
+ # is left behind. The grim reaper
+ # (ld_handler_chld + ld_process_chld) will
+ # take care of the zombie.
+ };
+ alarm 0;
+ if ($@) {
+ # timeout
+ if ($@ ne "timeout\n") {
+ # propagate unexpected errors
+ &ld_log("system_timeout unexpected error: $!");
+ }
+
+ # TERMinate child
+ kill 15, $childpid;
+ return(-1);
+ }
+ else {
+ # did not timeout
+ return($status);
+ }
+ }
+ else {
+ # child
+ exec(@args) or &ld_exit(127, "exec(@args) failed: $!");
+ die "ld_exit() broken?, stopped";
+ }
+}
+
# exec_wrapper
# Wrapper around exec() to log errors
# pre: LIST: arguments to pass to exec()
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


horms at verge

Aug 3, 2009, 4:45 PM

Post #2 of 3 (658 views)
Permalink
Re: [PATCH] Low: ldirectord: don't leave open pipes behinds and kill child on timeout [In reply to]

On Wed, Jul 22, 2009 at 03:33:16PM +0200, Hannes Eder wrote:
> This patch solves two problems:
>
> - When external checks timeout frequently ldirectord runs out of
> file handles, because when alarm()-ing out of Perls system() an
> internal pipe is not reclaimed. This is at least true for Perl
> 5.8.8 on Linux, but likely to affect other versions of Perl and
> systems as well. The pipe is used to send errno(3) from the
> child to the parent, in case internal exec(3) fails.
>
> - Do not leave child behind on timeout, TERMinate it.
>
> How are they solved?
>
> - Emulate Perls system() via Perls fork(), exec(), and waitpid() and
> by TERMinating the child on timeout.
>
> Signed-off-by: Hannes Eder <heder [at] google>


Hi Hannes,

Thanks, and sorry for the delay in getting back to yuo.

The change looks good to me - though its kind of
disturbing that Perl needs such workarounds.
I've applied the change to the dev tree on hg.linux-ha.org

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


lars.ellenberg at linbit

Aug 4, 2009, 2:51 AM

Post #3 of 3 (658 views)
Permalink
Re: [PATCH] Low: ldirectord: don't leave open pipes behinds and kill child on timeout [In reply to]

On Wed, Jul 22, 2009 at 03:33:16PM +0200, Hannes Eder wrote:
> This patch solves two problems:
>
> - When external checks timeout frequently ldirectord runs out of
> file handles, because when alarm()-ing out of Perls system() an
> internal pipe is not reclaimed. This is at least true for Perl
> 5.8.8 on Linux, but likely to affect other versions of Perl and
> systems as well. The pipe is used to send errno(3) from the
> child to the parent, in case internal exec(3) fails.
>
> - Do not leave child behind on timeout, TERMinate it.
>
> How are they solved?
>
> - Emulate Perls system() via Perls fork(), exec(), and waitpid() and
> by TERMinating the child on timeout.

nice catch.

But:

> +sub system_timeout
> +{

...
> + alarm 0;
> + if ($@) {
> + # timeout
> + if ($@ ne "timeout\n") {
> + # propagate unexpected errors

you don't propagate, you only log. maybe you want to log the @arg
along, so the error can be matched to the invocation?



BUG (typo):
this should be $@, not $!, right?
> + &ld_log("system_timeout unexpected error: $!");



> + }

maybe you want to log "timed out" as well?
else {
&ld_log("system_timeout(@args) timed out, kill -TERM child.");
}
(or something like that)

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

Linux-HA dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.