Mailing List Archive: Linux-HA: Dev

make --enable-libc-alloc the default?

 

 



lmb at suse

Feb 12, 2007, 11:16 AM

Post #1 of 10 (1044 views)
make --enable-libc-alloc the default?

Andrew's measured 10% performance increase suggests that we should make
this the default, IMHO, at least on Linux: apparently, our glibc
allocators are better than heartbeat's.

I've been running tests with it and not found any issues.

I'll probably make this the default on SLES, but think it'd make sense
for upstream as well.


Sincerely,
Lars

--
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge." -- Charles Darwin

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


alanr at unix

Feb 20, 2007, 3:26 PM

Post #2 of 10 (1003 views)
Re: make --enable-libc-alloc the default? [In reply to]

Lars Marowsky-Bree wrote:
> Andrew's measured 10% performance increase suggests that we should make
> this the default, IMHO, at least on Linux: apparently, our glibc
> allocators are better than heartbeat's.
>
> I've been running tests with it and not found any issues.
>
> I'll probably make this the default on SLES, but think it'd make sense
> for upstream as well.

Heartbeat, on average, on a real running system should take 1% or
less of system resources. Dropping that to .9% really doesn't excite me
at all. And, my guess is that it makes the most difference for the CRM
-- which does LOTS more mallocs than anything else (as it must), and
usually does nothing most of the time on a real running system. So, my
guess is that it drops it more like 5% or so rather than 10%.

Andrew has been having some trouble with use-after-frees recently. Our
code catches that -- if it's enabled ;-). If it's disabled, it doesn't
help anything.


--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


beekhof at gmail

Feb 21, 2007, 4:23 AM

Post #3 of 10 (1009 views)
Re: make --enable-libc-alloc the default? [In reply to]

On 2/21/07, Alan Robertson <alanr [at] unix> wrote:
> Lars Marowsky-Bree wrote:
> > Andrew's measured 10% performance increase suggests that we should make
> > this the default, IMHO, at least on Linux: apparently, our glibc
> > allocators are better than heartbeat's.
> >
> > I've been running tests with it and not found any issues.
> >
> > I'll probably make this the default on SLES, but think it'd make sense
> > for upstream as well.
>
> Heartbeat, on average, on a real running system should take 1% or
> less of system resources.

yeah, when it's not doing anything.

you know full well that we use significantly more during failover.

how about we also factor in that cluster nodes only account for, say,
5% of a site's resources?
then it only changes the percentages from 0.05% to 0.049%... yeah,
totally not worth it.

A 10% overhead for something the user never benefits from is
significant, but if you want to fiddle the numbers so you can feel
better - go ahead.

> Dropping that to .9% really doesn't excite me
> at all. And, my guess is that it makes the most difference for the CRM
> -- which does LOTS more mallocs than anything else (as it must), and
> usually does nothing most of the time on a real running system. So, my
> guess is that it drops it more like 5% or so rather than 10%.
>
> Andrew has been having some trouble with use-after-frees recently. Our
> code catches that -- if it's enabled ;-).

apparently not all the time

> If it's disabled, it doesn't
> help anything.

If you're referring to what Coverity caught last week, those have been
in existence for a long time and cl_malloc never caught them because
they were in error paths.

The others were found after running Valgrind which gives you both a
stack-trace of where it happened and a complete stack-trace of who
free'd it originally. Slightly more useful than "that memory is
already free'd, bye". I don't know why cl_malloc didn't complain in the
past.


alanr at unix

Feb 21, 2007, 7:09 AM

Post #4 of 10 (993 views)
Re: make --enable-libc-alloc the default? [In reply to]

Andrew Beekhof wrote:
> On 2/21/07, Alan Robertson <alanr [at] unix> wrote:
>> Lars Marowsky-Bree wrote:
>> > Andrew's measured 10% performance increase suggests that we should make
>> > this the default, IMHO, at least on Linux: apparently, our glibc
>> > allocators are better than heartbeat's.
>> >
>> > I've been running tests with it and not found any issues.
>> >
>> > I'll probably make this the default on SLES, but think it'd make sense
>> > for upstream as well.
>>
>> Heartbeat, on average, on a real running system should take 1% or
>> less of system resources.
>
> yeah, when it's not doing anything.


Which is EXACTLY what it's supposed to do for months or years at a time
- and in fact what it does do for months or years at a time.

The fact that _under test conditions_ it consumes a bit more is a bit
artificial, IMHO.

_Of course_ it consumes more when running CTS. We're doing our level
best to beat the crap out of it. I should _hope_ it consumes more when
running CTS.

What's important is how much difference it makes _in real life_ on real
systems running the code - not how bad you can make it look on test systems.

Anything unlike the real experience of real users is, as you said,
"fiddling the numbers".



--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


lmb at suse

Feb 22, 2007, 2:39 AM

Post #5 of 10 (997 views)
Re: make --enable-libc-alloc the default? [In reply to]

On 2007-02-21T08:09:43, Alan Robertson <alanr [at] unix> wrote:

> Which is EXACTLY what it's supposed to do for months or years at a time
> - and in fact what it does do for months or years at a time.
>
> The fact that _under test conditions_ it consumes a bit more is a bit
> artificial, IMHO.

This is incomplete. It neglects that a 10% speedup means 10% faster
fail-over, give or take. People don't care whether we take 1% or even 2%
at runtime, but they'll care if we take 10% less time to fail-over.

(Given of course that the CCM implementation right now slows us down
tons as well, this may not be significant.)

And, it also means that the code does not provide anything we need in
practice; during testing, valgrind/Coverity/BEAM(?) provide more useful
feedback. Occam's razor suggests the other half of the reason for
disabling the code.


Sincerely,
Lars

--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



alanr at unix

Feb 22, 2007, 5:55 AM

Post #6 of 10 (993 views)
Re: make --enable-libc-alloc the default? [In reply to]

Lars Marowsky-Bree wrote:
> On 2007-02-21T08:09:43, Alan Robertson <alanr [at] unix> wrote:
>
>> Which is EXACTLY what it's supposed to do for months or years at a time
>> - and in fact what it does do for months or years at a time.
>>
>> The fact that _under test conditions_ it consumes a bit more is a bit
>> artificial, IMHO.
>
> This is incomplete. It neglects that a 10% speedup means 10% faster
> fail-over, give or take. People don't care whether we take 1% or even 2%
> at runtime, but they'll care if we take 10% less time to fail-over.
>
> (Given of course that the CCM implementation right now slows us down
> tons as well, this may not be significant.)

It doesn't mean that AT ALL. Failover time is normally 90% dominated by
resource agent time. And increasing CPU time in a multi-process,
multi-processor situation where networking delays and scheduling delays
are typically higher than CPU time means that even the wall-clock delay
that isn't due to resource agents probably isn't more than 5% at most.
If resource agents are 90% of that time, then the delay is probably more
like .5%.
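
As a rough check of the arithmetic above, here is a minimal C sketch. The
10% and 5% inputs are the estimates quoted in this message, not
measurements, and the function names are made up for illustration.

```c
#include <stdio.h>

/* Fraction of total failover wall-clock time saved, given:
 *   non_ra_share  - share of failover time NOT spent in resource agents
 *   gain_in_share - relative speedup achieved within that share
 * Both inputs are estimates from the thread, not measured values. */
double failover_gain(double non_ra_share, double gain_in_share)
{
    return non_ra_share * gain_in_share;
}

/* Print the estimate as a percentage of total failover time. */
void print_failover_gain(double non_ra_share, double gain_in_share)
{
    printf("%.2f%% of failover time\n",
           100.0 * failover_gain(non_ra_share, gain_in_share));
}
```

Calling print_failover_gain(0.10, 0.05) prints "0.50% of failover time",
which matches the ".5%" figure.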

> And, it also means that the code does not provide anything we need in
> practice; during testing, valgrind/Coverity/BEAM(?) provide more useful
> feedback. Occam's razor suggests the other half of the reason for
> disabling the code.

Really? It has caught dozens of bugs. Different tools find different
bugs. None are perfect. Somehow you're saying that we should take
weapons for finding bugs out of our arsenal because when Andrew disables
them, they don't find any bugs for him.

Hmmm... I wonder why that could be...

I think there's a little circular reasoning going on here...

In addition, Occam's razor does not apply in this case. There is no
single explanation being sought here (the context for applying the
Occam's razor pattern). If we were to apply your principle here, then
we would get rid of valgrind because we run Coverity -- or vice versa.

And, while Andrew's code is very important, it's still not by any means
all the code in the system. Maybe you'll recall his discussion of why
valgrind (the only runtime tool besides the malloc library) wouldn't
work for the rest of the system.

--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


lmb at suse

Feb 22, 2007, 12:58 PM

Post #7 of 10 (991 views)
Re: make --enable-libc-alloc the default? [In reply to]

On 2007-02-22T06:55:37, Alan Robertson <alanr [at] unix> wrote:

> It doesn't mean that AT ALL. Failover time is normally 90% dominated by
> resource agent time. And increasing CPU time in a multi-process,
> multi-processor situation where networking delays and scheduling delays
> are typically higher than CPU time means that even the wall-clock delay
> that isn't due to resource agents probably isn't more than 5% at most.
> If resource agents are 90% of that time, then the delay is probably more
> like .5%.

That much is certainly true.

> Really? It has caught dozens of bugs. Different tools find different
> bugs. None are perfect. Somehow you're saying that we should take
> weapons for finding bugs out of our arsenal because when Andrew disables
> them, they don't find any bugs for him.

Has it caught bugs recently?

And no, I'm not saying to rip it out. I'm saying to disable it for
production shipments, so the question becomes: Has it caught any bugs on
production systems?

For CTS runs and stuff, the malloc safeguards are good. One might even
consider making/leaving them as the default.

I totally don't see the point of our own allocator, given that glibc's
one is obviously - that much the numbers clearly show - significantly
faster, even if the overall impact may be small. So, why maintain it?
It's pointless by now - it was useful once, but the system libraries
have become better.

I don't object to the safeguards code. I object to the allocator. That
bit no longer makes sense.

And, for production systems, even the safeguards are questionable,
because those bugs have all been caught during debugging.

I know this is your code, and you're attached to it, but please at least
address the matter of the safeguards and having our own allocator
separately.


Sincerely,
Lars

--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



alanr at unix

Feb 22, 2007, 3:05 PM

Post #8 of 10 (987 views)
Re: make --enable-libc-alloc the default? [In reply to]

Lars Marowsky-Bree wrote:
> On 2007-02-22T06:55:37, Alan Robertson <alanr [at] unix> wrote:
>
>> It doesn't mean that AT ALL. Failover time is normally 90% dominated by
>> resource agent time. And increasing CPU time in a multi-process,
>> multi-processor situation where networking delays and scheduling delays
>> are typically higher than CPU time means that even the wall-clock delay
>> that isn't due to resource agents probably isn't more than 5% at most.
>> If resource agents are 90% of that time, then the delay is probably more
>> like .5%.
>
> That much is certainly true.
>
>> Really? It has caught dozens of bugs. Different tools find different
>> bugs. None are perfect. Somehow you're saying that we should take
>> weapons for finding bugs out of our arsenal because when Andrew disables
>> them, they don't find any bugs for him.
>
> Has it caught bugs recently?

Andrew's been writing most of the newer code. Newer code has more bugs
than older code. Andrew has it disabled. What a surprise that it isn't
finding any bugs in his code.

> And no, I'm not saying to rip it out. I'm saying to disable it for
> production shipments, so the question becomes: Has it caught any bugs on
> production systems?
>
> For CTS runs and stuff, the malloc safeguards are good. One might even
> consider making/leaving them as the default.
>
> I totally don't see the point of our own allocator, given that glibc's
> one is obviously - that much the numbers clearly show - significantly
> faster, even if the overall impact may be small. So, why maintain it?
> It's pointless by now - it was useful once, but the system libraries
> have become better.
>
> I don't object to the safeguards code. I object to the allocator. That
> bit no longer makes sense.
>
> And, for production systems, even the safeguards are questionable,
> because those bugs have all been caught during debugging.
>
> I know this is your code, and you're attached to it, but please at least
> address the matter of the safeguards and having our own allocator
> separately.

They are intimately tied together - and probably the cause of the
inefficiency you're complaining about.

Any time you're talking about turning off a safeguard for what is at
best a very small improvement in performance, I don't see the value. I
like to be able to debug things when they go wrong.

The relevant acronym is RAS:
Reliability - aided by using this during debugging
Availability - (improving R improves A)
Serviceability - the ability to debug things in the field

So, the patches help all three letters of RAS - for a small performance
penalty.

Let's see what the web site says:
The basic goal of the High Availability Linux project is to:

*Provide a high-availability (clustering) solution for Linux
which promotes reliability, availability, and serviceability
(RAS) through a community development effort.

So, the goal listed on the web site seems to indicate that RAS is
important. Much more important than a small performance hit.

If Linux-HA consumed lots of CPU, and the first paragraph of the web
site said "enhance performance" or something similar, I'd certainly
agree. This isn't about it being my code. It's about something much
simpler - the reason the project exists.

It's about the right perspective for the task at hand.


--
Alan Robertson <alanr [at] unix>

"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce


lmb at suse

Feb 22, 2007, 4:02 PM

Post #9 of 10 (993 views)
Re: make --enable-libc-alloc the default? [In reply to]

On 2007-02-22T16:05:51, Alan Robertson <alanr [at] unix> wrote:

> > Has it caught bugs recently?
> Andrew's been writing most of the newer code. Newer code has more bugs
> than older code. Andrew has it disabled. What a surprise that it isn't
> finding any bugs in his code.

That comment is not appropriate. Andrew has only been testing with it
disabled for a few weeks, because disabling it greatly increased the
value of valgrind for him. Coverity, too, has better models of
malloc/free, and promptly spotted a bunch of more issues.

And it turned out to have a performance benefit, as well as using
memory more efficiently.

So, it seems to be a win to have tested with it disabled. And I've NEVER
had a report where it caught any issue in the field. Have you? Any case
where it _would_ have helped, but was disabled?

> They are intimately tied together - and probably the cause of the
> inefficiency you're complaining about.

The bucket allocator is not tied to the safeguards. It is tied to them in
the entangled current implementation, but that could be restructured
readily using inline functions calling either the bucket allocator or
libc free.
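
A minimal sketch of that restructuring, in C. This is not heartbeat's
actual code: SKETCH_USE_LIBC_ALLOC, sketch_bucket_alloc/sketch_bucket_free,
guarded_alloc and guarded_free are hypothetical names, and the macro only
stands in for whatever --enable-libc-alloc does in the real build. The
guard layer here only validates a header on live blocks; it does not
attempt double-free detection, which would require reading memory already
handed back to the backend.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical compile-time switch (assumption, not the real build flag). */
#ifndef SKETCH_USE_LIBC_ALLOC
#define SKETCH_USE_LIBC_ALLOC 1
#endif

#if !SKETCH_USE_LIBC_ALLOC
/* Hypothetical bucket allocator entry points, defined elsewhere. */
void *sketch_bucket_alloc(size_t n);
void sketch_bucket_free(void *p);
#endif

#define GUARD_MAGIC 0xC1A110CUL

/* Header placed in front of every block; the union keeps the user
 * pointer suitably aligned for any object type. */
typedef union {
    struct { unsigned long magic; size_t size; } g;
    max_align_t align;
} guard_hdr;

/* Backend selection lives in two small inline functions only. */
static inline void *backend_alloc(size_t n)
{
#if SKETCH_USE_LIBC_ALLOC
    return malloc(n);
#else
    return sketch_bucket_alloc(n);
#endif
}

static inline void backend_free(void *p)
{
#if SKETCH_USE_LIBC_ALLOC
    free(p);
#else
    sketch_bucket_free(p);
#endif
}

/* Safeguard layer: knows nothing about which backend is in use. */
void *guarded_alloc(size_t n)
{
    guard_hdr *h;

    if (n > SIZE_MAX - sizeof *h)
        return NULL;                 /* size would overflow */
    h = backend_alloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->g.magic = GUARD_MAGIC;
    h->g.size = n;
    return h + 1;                    /* user memory starts after the header */
}

/* Returns 0 on success, -1 if the header does not look like ours
 * (clobbered by an underrun, or a pointer we never handed out). */
int guarded_free(void *p)
{
    guard_hdr *h;

    if (p == NULL)
        return 0;
    h = (guard_hdr *)p - 1;
    if (h->g.magic != GUARD_MAGIC)
        return -1;
    h->g.magic = 0;
    backend_free(h);
    return 0;
}
```

With this split, switching backends touches only backend_alloc and
backend_free; the checks in guarded_alloc and guarded_free stay the same
either way.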

Yes, MARK_PRISTINE verification on alloc doesn't really work too well.
But, strangely enough, that's disabled because of the horrible
performance penalty anyway ...

For production systems, I do not see the value of this allocator. I do
not even see the value of the checks, because they simply don't catch
anything in practice; so I conclude they are nice for debugging, but for
production systems, I'd rather use the much better tested libc
allocators.

> The relevant acronym is RAS:
> Reliability - aided by using this during debugging
> Availability - (improving R improves A)
> Serviceability - the ability to debug things in the field
>
> So, the patches help all three letters of RAS - for a small performance
> penalty.

This is your reading. I disagree. They do not help RAS in a way which
offsets the runtime penalty and the code complexity.

The coredump feature is great. That helps a lot.

> If Linux-HA consumed lots of CPU, and the first paragraph of the web
> site said "enhance performance" or something similar, I'd certainly
> agree. This isn't about it being my code. It's about something much
> simpler - the reason the project exists.
>
> It's about the right perspective for the task at hand.

Yes. The right perspective, of course, being subjective. And, you having
started the project, of course get to comment on why it exists.

If this was the right perspective, for only a few additional percent of
runtime overhead, one might consider linking against a debugging version
of glibc, or a debugging kernel. Strangely enough, people don't do that
for production systems.

If reliability and code quality were your utmost concerns, you'd have a
much stronger point. Your personal bugzilla history doesn't reflect
that, nor does the amount of review you do before accepting a patch or
while proofreading other people's code. So, I can but conclude you're
arguing in this case because it's your code, and probably I bother
arguing with you for the same reason instead of just doing my thing on
SLES ;-)


However, design arguments are _impossible_ to win by rational argument,
otherwise there'd be only one kind of art. So I will stop this futile
argument and apologize for wasting our time: there are better ways to
solve such disagreement.


Regards,
Lars

--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



beekhof at gmail

Feb 23, 2007, 2:57 AM

Post #10 of 10 (1001 views)
Re: make --enable-libc-alloc the default? [In reply to]

On 2/23/07, Lars Marowsky-Bree <lmb [at] suse> wrote:
> On 2007-02-22T16:05:51, Alan Robertson <alanr [at] unix> wrote:
>
> > > Has it caught bugs recently?
> > Andrew's been writing most of the newer code. Newer code has more bugs
> > than older code. Andrew has it disabled. What a surprise that it isn't
> > finding any bugs in his code.
>
> That comment is not appropriate.

Not to mention a stupid thing to say as it obviously proves that we
have enough mechanisms in place to catch such errors _without_
cl_malloc.
