
Mailing List Archive: Linux-HA: Dev

cl_make_realtime() used by too few processes?

 

 



lmb at suse

Feb 18, 2007, 9:42 AM

Post #1 of 10
cl_make_realtime() used by too few processes?

F   UID   PID  PPID PRI  NI   VSZ   RSS WCHAN  STAT TTY  TIME COMMAND
5     0  1781     1  -2   - 12108 12108 -      SLs  ?    0:00 heartbeat: master control process
5 65534  1784  1781  -2   -  5512  5512 pipe_w SL   ?    0:00 heartbeat: FIFO reader
5 65534  1785  1781  -2   -  5508  5508 -      SL   ?    0:00 heartbeat: write: mcast eth0
5 65534  1786  1781  -2   -  5508  5508 334996 SL   ?    0:00 heartbeat: read: mcast eth0
4    90  1808  1781  16   0  4804  1424 -      S    ?    0:00 /usr/lib/heartbeat/ccm
4    90  1809  1781  16   0  7596  3616 -      S    ?    0:20 /usr/lib/heartbeat/cib
4 65534  1810  1781  16   0  4828  1928 -      S    ?    0:00 /usr/lib/heartbeat/lrmd -r
4 65534  1811  1781  16   0  4592  4592 -      SL   ?    0:00 /usr/lib/heartbeat/stonithd
4    90  1812  1781  16   0  4504  1496 -      S    ?    0:00 /usr/lib/heartbeat/attrd
4    90  1813  1781  16   0  5692  2840 -      S    ?    0:01 /usr/lib/heartbeat/crmd
4     0  1814  1781  16   0  6068  2316 -      S    ?    0:00 /usr/lib/heartbeat/mgmtd -v
0    90  2873  1813  16   0  5684  2740 -      S    ?    0:00 /usr/lib/heartbeat/tengine
0    90  2874  1813  16   0  6180  3196 -      S    ?    0:01 /usr/lib/heartbeat/pengine


I understand why mgmtd, lrmd, pengine and tengine do not use
cl_make_realtime().

ccm, crmd, probably cib should, no?
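For readers unfamiliar with what "making a process realtime" involves: on Linux it generally combines a real-time scheduling class with locking the process's pages into RAM (the `L` in the STAT column of the listing marks locked pages). The sketch below shows those two POSIX calls in isolation. `make_realtime_sketch()` and `is_realtime()` are hypothetical names, and this is not the clplumbing implementation of `cl_make_realtime()`, whose exact behaviour is not shown here. Both calls normally need root or the matching capabilities.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper, NOT clplumbing's cl_make_realtime().
 * Shows the two generic ingredients: an RT scheduling class plus
 * memory locking. Returns 0 on success, -1 on the first failure. */
int make_realtime_sketch(int prio_offset)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof(sp));

    int min = sched_get_priority_min(SCHED_RR);
    int max = sched_get_priority_max(SCHED_RR);
    if (min < 0 || max < 0)
        return -1;
    sp.sched_priority = min + prio_offset;
    if (sp.sched_priority > max)
        sp.sched_priority = max;

    /* Round-robin real-time class: this process now preempts every
     * ordinary (SCHED_OTHER) task, so an endless loop here can starve
     * them on that CPU (the lockup risk discussed in this thread). */
    if (sched_setscheduler(0, SCHED_RR, &sp) != 0) {
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(errno));
        return -1;
    }

    /* Lock current and future pages so they are never paged out. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        fprintf(stderr, "mlockall: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

/* Returns 1 if the calling process runs in a real-time class. */
int is_realtime(void)
{
    int policy = sched_getscheduler(0);
    return policy == SCHED_RR || policy == SCHED_FIFO;
}
```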


Sincerely,
Lars

--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


alanr at unix

Feb 20, 2007, 3:19 PM

Post #2 of 10
Re: cl_make_realtime() used by too few processes? [In reply to]

Lars Marowsky-Bree wrote:
> 5 0 1781 1 -2 - 12108 12108 - SLs ? 0:00 heartbeat: master control process
> 5 65534 1784 1781 -2 - 5512 5512 pipe_w SL ? 0:00 heartbeat: FIFO reader
> 5 65534 1785 1781 -2 - 5508 5508 - SL ? 0:00 heartbeat: write: mcast eth0
> 5 65534 1786 1781 -2 - 5508 5508 334996 SL ? 0:00 heartbeat: read: mcast eth0
> 4 90 1808 1781 16 0 4804 1424 - S ? 0:00 /usr/lib/heartbeat/ccm
> 4 90 1809 1781 16 0 7596 3616 - S ? 0:20 /usr/lib/heartbeat/cib
> 4 65534 1810 1781 16 0 4828 1928 - S ? 0:00 /usr/lib/heartbeat/lrmd -r
> 4 65534 1811 1781 16 0 4592 4592 - SL ? 0:00 /usr/lib/heartbeat/stonithd
> 4 90 1812 1781 16 0 4504 1496 - S ? 0:00 /usr/lib/heartbeat/attrd
> 4 90 1813 1781 16 0 5692 2840 - S ? 0:01 /usr/lib/heartbeat/crmd
> 4 0 1814 1781 16 0 6068 2316 - S ? 0:00 /usr/lib/heartbeat/mgmtd -v
> 0 90 2873 1813 16 0 5684 2740 - S ? 0:00 /usr/lib/heartbeat/tengine
> 0 90 2874 1813 16 0 6180 3196 - S ? 0:01 /usr/lib/heartbeat/pengine
>
>
> I understand why mgmtd, lrmd, pengine and tengine do not use
> cl_make_realtime().
>
> ccm, crmd, probably cib should, no?


Probably not. Using cl_make_realtime() requires that the programs be
EXTREMELY well-behaved. I'm not criticizing that software, but you
REALLY want to minimize the number of processes that use it - just
because bugs in the code become system lockups. Really bad news...

--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


lmb at suse

Feb 22, 2007, 2:46 AM

Post #3 of 10
Re: cl_make_realtime() used by too few processes? [In reply to]

On 2007-02-20T16:19:55, Alan Robertson <alanr [at] unix> wrote:

> Probably not. Using cl_make_realtime() requires that the programs be
> EXTREMELY well-behaved. I'm not criticizing that software, but you
> REALLY want to minimize the number of processes that use it - just
> because bugs in the code become system lockups. Really bad news...

I know, but if we do not have the full stack required to fail-over
locked, we might as well disable stonithd as well, for example.

I think ccmd and crmd are both timing-critical as well; they need to
respond to voting decisions.


--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



alanr at unix

Feb 22, 2007, 5:41 AM

Post #4 of 10
Re: cl_make_realtime() used by too few processes? [In reply to]

Lars Marowsky-Bree wrote:
> On 2007-02-20T16:19:55, Alan Robertson <alanr [at] unix> wrote:
>
>> Probably not. Using cl_make_realtime() requires that the programs be
>> EXTREMELY well-behaved. I'm not criticizing that software, but you
>> REALLY want to minimize the number of processes that use it - just
>> because bugs in the code become system lockups. Really bad news...
>
> I know, but if we do not have the full stack required to fail-over
> locked, we might as well disable stonithd as well, for example.
>
> I think ccmd and crmd are both timing-critical as well; they need to
> respond to voting decisions.

But, they are MUCH less timing critical.

One should be able to trust that they in turn are being monitored, and
will be restarted if they misbehave. [If that isn't true, then it
should be ;-)].

As a result, although failovers might be _slow_ in a serious memory
overload condition, they should still occur. One could also lock a
process into memory without setting realtime priority - which would fix
the problem in a much less dangerous way (cl_make_realtime can do that).
Unfortunately, it ALSO means these processes would have to be started as
root :-(
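The lower-risk alternative Alan mentions, locking the pages into memory while leaving the process in the normal time-sharing class, can be sketched with `mlockall()` alone. `lock_without_realtime()` and `still_timesharing()` are hypothetical helpers, not Heartbeat code. One nuance on the root requirement: since Linux 2.6.9 an unprivileged process may lock memory up to its `RLIMIT_MEMLOCK` soft limit, but that limit is usually far too small for a whole daemon, so in practice root, `CAP_IPC_LOCK` or a raised limit is still needed.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: pin all current and future pages without
 * touching the scheduling policy. Returns 0 on success, otherwise the
 * errno value (EPERM or ENOMEM are typical for unprivileged callers). */
int lock_without_realtime(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        int err = errno;
        fprintf(stderr, "mlockall: %s\n", strerror(err));
        return err;
    }
    return 0;
}

/* 1 if the caller is still in the ordinary SCHED_OTHER class. */
int still_timesharing(void)
{
    return sched_getscheduler(0) == SCHED_OTHER;
}
```

Leaving the scheduler alone means a runaway loop in such a daemon still competes for the CPU like any other process instead of starving it, while its memory can no longer be paged out under pressure.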

The timeouts at these higher levels should be MUCH larger than deadtime
- at least double or triple, maybe even an order of magnitude higher --
for exactly this reason.
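The rule of thumb in the paragraph above (upper-layer timeouts at least two to three times deadtime, possibly ten times) can be expressed as a simple configuration check. The 2x and 10x thresholds are taken from Alan's wording; the function and enum names are hypothetical and not part of any Heartbeat configuration tool.

```c
/* Hypothetical sanity check for an upper-layer (CCM/CRM-style)
 * timeout relative to heartbeat's deadtime. Values in milliseconds. */
typedef enum {
    TIMEOUT_TOO_TIGHT, /* < 2 x deadtime: too little margin under load */
    TIMEOUT_OK,        /* 2 x .. < 10 x deadtime */
    TIMEOUT_LAZY       /* >= 10 x deadtime: the "several minutes" backstop */
} timeout_class;

timeout_class classify_upper_timeout(long deadtime_ms, long timeout_ms)
{
    if (timeout_ms < 2 * deadtime_ms)
        return TIMEOUT_TOO_TIGHT;
    if (timeout_ms < 10 * deadtime_ms)
        return TIMEOUT_OK;
    return TIMEOUT_LAZY;
}
```

With a 30 s deadtime, for example, a 45 s upper-layer timeout is classed as too tight, 60 s as acceptable and 300 s (five minutes) as lazy.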

[[I know about (for example) the bug in the CCM - where it isn't
treating timeouts correctly - but that's a bug, not a broken policy -
and one shouldn't change policy to fix a bug. You're talking policy
here, if I understood correctly.]]



--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


lmb at suse

Feb 22, 2007, 12:50 PM

Post #5 of 10
Re: cl_make_realtime() used by too few processes? [In reply to]

On 2007-02-22T06:41:14, Alan Robertson <alanr [at] unix> wrote:

> But, they are MUCH less timing critical.
>
> One should be able to trust that they in turn are being monitored, and
> will be restarted if they misbehave. [If that isn't true, then it
> should be ;-)].

crmd and ccmd both have timeouts on the network which are critical.

> As a result, although failovers might be _slow_ in a serious memory
> overload condition, they should still occur. One could also lock a
> process into memory without setting realtime priority - which would fix
> the problem in a much less dangerous way (cl_make_realtime can do that).
> Unfortunately, it ALSO means these processes would have to be started as
> root :-(

Yes, they probably need to be locked into memory at least. That was
mostly what I wanted to refer to, actually. Write-out path blocking and
all that.


> [[I know about (for example) the bug in the CCM - where it isn't
> treating timeouts correctly - but that's a bug, not a broken policy -
> and one shouldn't change policy to fix a bug. You're talking policy
> here, if I understood correctly.]]

Uhm, the CCM timeout issue has nothing to do with this, I don't see the
connection at all.





alanr at unix

Feb 22, 2007, 3:13 PM

Post #6 of 10
Re: cl_make_realtime() used by too few processes? [In reply to]

Lars Marowsky-Bree wrote:
> On 2007-02-22T06:41:14, Alan Robertson <alanr [at] unix> wrote:
>
>> But, they are MUCH less timing critical.
>>
>> One should be able to trust that they in turn are being monitored, and
>> will be restarted if they misbehave. [If that isn't true, then it
>> should be ;-)].
>
> crmd and ccmd both have timeouts on the network which are critical.

But could be very lazy in practice with no ill effect. Heartbeat is
still monitoring that the machine and its client processes don't go
away. The upper layer of software shouldn't be using timeouts to do the
job of the lower layer software. It should just be a very lazy "in case
of really weird shit" timeout. Could be several minutes.
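The division of labour described here, where heartbeat's membership layer detects dead nodes and the upper layer keeps only a lazy backstop timer, can be sketched as a small decision function. All names are hypothetical; this illustrates the policy and is not code from crmd or the CCM.

```c
#include <time.h>

/* Hypothetical outcome of an upper-layer peer check. */
typedef enum {
    PEER_WAIT,     /* nothing to do yet */
    PEER_GONE,     /* lower layer (heartbeat) reported the node lost */
    PEER_BACKSTOP  /* no answer for a very long time: "really weird" case */
} peer_state;

/* membership_lost: set when a membership event says the peer left.
 * last_reply/now: wall-clock seconds; backstop_s: the lazy timeout,
 * e.g. several minutes. */
peer_state check_peer(int membership_lost, time_t last_reply,
                      time_t now, time_t backstop_s)
{
    if (membership_lost)
        return PEER_GONE;          /* trust the lower layer's detection */
    if (now - last_reply >= backstop_s)
        return PEER_BACKSTOP;      /* only reached if something is very wrong */
    return PEER_WAIT;              /* otherwise just keep waiting */
}
```

The upper layer never races heartbeat here: a slow reply alone never triggers action until the backstop, which is meant to be far longer than deadtime.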

>> As a result, although failovers might be _slow_ in a serious memory
>> overload condition, they should still occur. One could also lock a
>> process into memory without setting realtime priority - which would fix
>> the problem in a much less dangerous way (cl_make_realtime can do that).
>> Unfortunately, it ALSO means these processes would have to be started as
>> root :-(
>
> Yes, they probably need to be locked into memory at least. That was
> mostly what I wanted to refer to, actually. Write-out path blocking and
> all that.

If the CRM is in the write path, something very bad is going on. Do you
think it is? If so, then I certainly can see the argument for locking
them into memory. Would you mind explaining the sequence of events
you mean by the "write path" that involves the CRM?

>> [[I know about (for example) the bug in the CCM - where it isn't
>> treating timeouts correctly - but that's a bug, not a broken policy -
>> and one shouldn't change policy to fix a bug. You're talking policy
>> here, if I understood correctly.]]
>
> Uhm, the CCM timeout issue has nothing to do with this, I don't see the
> connection at all.

Just trying to keep you from making a connection.


--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


lmb at suse

Feb 22, 2007, 4:04 PM

Post #7 of 10
Re: cl_make_realtime() used by too few processes? [In reply to]

On 2007-02-22T16:13:18, Alan Robertson <alanr [at] unix> wrote:

> If the CRM is in the write path, something very bad is going on. Do you
> think it is? If so, then I certainly can see the argument for locking
> them into memory. Would you mind explaining the sequence of events
> you mean by the "write path" that involves the CRM?

Heartbeat driven OCFS2, for example.

Of course, that'd effectively mean needing to lock everything into
memory, which is clearly infeasible and there's more work here to fix
the theoretical deadlock issue.

But, by the same argument, stonithd, which is no more timing-critical
than the LRM, probably shouldn't be using this then.

> >> [[I know about (for example) the bug in the CCM - where it isn't
> >> treating timeouts correctly - but that's a bug, not a broken policy -
> >> and one shouldn't change policy to fix a bug. You're talking policy
> >> here, if I understood correctly.]]
> > Uhm, the CCM timeout issue has nothing to do with this, I don't see the
> > connection at all.
> Just trying to keep you from making a connection.

So you bring up something which I didn't mention to keep me from making
a connection? That's an interesting strategy, I need to remember it ;-)


Regards,
Lars

--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



alanr at unix

Feb 22, 2007, 7:30 PM

Post #8 of 10
Re: cl_make_realtime() used by too few processes? [In reply to]

Lars Marowsky-Bree wrote:
> On 2007-02-22T16:13:18, Alan Robertson <alanr [at] unix> wrote:
>
>> If the CRM is in the write path, something very bad is going on. Do you
>> think it is? If so, then I certainly can see the argument for locking
>> them into memory. Would you mind explaining the sequence of events
>> you mean by the "write path" that involves the CRM?
>
> Heartbeat driven OCFS2, for example.
>
> Of course, that'd effectively mean needing to lock everything into
> memory, which is clearly infeasible and there's more work here to fix
> the theoretical deadlock issue.
>
> But, by the same argument, stonithd, which is no more timing-critical
> than the LRM, probably shouldn't be using this then.

Stonithd is clearly in the execution path for rebooting. The operation
for rebooting a node does NOT go through the LRM.

--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


lmb at suse

Feb 22, 2007, 11:23 PM

Post #9 of 10
Re: cl_make_realtime() used by too few processes? [In reply to]

On 2007-02-22T20:30:10, Alan Robertson <alanr [at] unix> wrote:

> > Of course, that'd effectively mean needing to lock everything into
> > memory, which is clearly infeasible and there's more work here to fix
> > the theoretical deadlock issue.
> >
> > But, by the same argument, stonithd, which is no more timing-critical
> > than the LRM, probably shouldn't be using this then.
>
> Stonithd is clearly in the execution path for rebooting. The operation
> for rebooting a node does NOT go through the LRM.

Uhm. Everybody who _tells_ stonithd to perform a reboot is not in the
protected part of the stack, so it's not really useful to have stonithd
in there.
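Lars's point is essentially a weakest-link argument: a fencing request is only as protected as the least-protected process it passes through. A toy model makes that concrete. The request path used in the usage note (crmd, then tengine, then stonithd) is an assumption for illustration, with the memory-lock flags read off the `L` in the STAT column of the process listing at the top of the thread; it is not a statement of how the CRM actually routes fencing requests, and `hop`/`first_unprotected()` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical model of one process a request passes through. */
typedef struct {
    const char *name;
    int locked;   /* 1 if locked into memory ("L" in ps STAT) */
} hop;

/* A path is protected only if every hop is. Returns the index of the
 * first unprotected hop, or -1 if the whole path is protected. */
int first_unprotected(const hop *path, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!path[i].locked)
            return (int)i;
    return -1;
}
```

With `{"crmd", 0}, {"tengine", 0}, {"stonithd", 1}` as the path, the check stops at the very first hop (index 0, crmd), however well stonithd itself is protected.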




alanr at unix

Feb 23, 2007, 10:20 AM

Post #10 of 10
Re: cl_make_realtime() used by too few processes? [In reply to]

Lars Marowsky-Bree wrote:
> On 2007-02-22T20:30:10, Alan Robertson <alanr [at] unix> wrote:
>
>>> Of course, that'd effectively mean needing to lock everything into
>>> memory, which is clearly infeasible and there's more work here to fix
>>> the theoretical deadlock issue.
>>>
>>> But, by the same argument, stonithd, which is no more timing-critical
>>> than the LRM, probably shouldn't be using this then.
>> Stonithd is clearly in the execution path for rebooting. The operation
>> for rebooting a node does NOT go through the LRM.
>
> Uhm. Everybody who _tells_ stonithd to perform a reboot is not in the
> protected part of the stack, so it's not really useful to have stonithd
> in there.

Not in the current architecture, no.

Stonithd already has to run as root for other reasons. So, it's not a
new security hole to make it run as root and perform this.


--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
