Mailing List Archive: Linux: Kernel

[RFC][PATCH 0/6] Critical Page Pool


colpatch at us

Dec 13, 2005, 11:50 PM

Post #1 of 13
[RFC][PATCH 0/6] Critical Page Pool

Here is the latest version of the Critical Page Pool patches. Besides
bugfixes, I've removed all the slab cleanup work from the series. Also,
since one of the main questions about the patch series seems to revolve
around how to appropriately size the pool, I've added some basic statistics
about the critical page pool, viewable by reading
/proc/sys/vm/critical_pages. The code now exports how many pages were
requested, how many pages are currently in use, and the maximum number of
pages that were ever in use.
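
For illustration only, the three statistics described above can be modelled in a few lines of userspace C. This is a rough sketch, not code from the patches: the struct and function names are hypothetical, and the actual output format of /proc/sys/vm/critical_pages is not shown here.

```c
#include <stddef.h>

/* Hypothetical model of the three exported statistics. */
struct crit_pool_stats {
    size_t requested; /* pages the administrator asked to reserve */
    size_t in_use;    /* pages currently handed out from the pool */
    size_t max_used;  /* high-water mark of in_use */
};

/* Take one page from the pool; returns 0 on success, -1 if the pool is empty. */
int crit_pool_take(struct crit_pool_stats *s)
{
    if (s->in_use >= s->requested)
        return -1;
    s->in_use++;
    if (s->in_use > s->max_used)
        s->max_used = s->in_use;
    return 0;
}

/* Return one page to the pool; the high-water mark is left untouched. */
void crit_pool_give(struct crit_pool_stats *s)
{
    if (s->in_use > 0)
        s->in_use--;
}
```

Comparing the high-water mark against the requested size after a representative workload is one way such numbers could guide pool sizing.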

The overall purpose of this patch series is to allow a system administrator
to reserve a number of pages in a 'critical pool' that is set aside for
situations when the system is 'in emergency'. It is up to the individual
administrator to determine when his/her system is 'in emergency'. This is
not meant to (necessarily) anticipate OOM situations, though that is
certainly one possible use. The purpose this was originally designed for
is to allow the networking code to keep functioning despite the system
losing its (potentially networked) swap device, and thus temporarily
putting the system under extreme memory pressure.

Any comments about the code or the overall design are very welcome.
Patches against 2.6.15-rc5.

-Matt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


pavel at suse

Dec 14, 2005, 2:08 AM

Post #2 of 13
Re: [RFC][PATCH 0/6] Critical Page Pool [In reply to]

Hi!

> The overall purpose of this patch series is to allow a system administrator
> to reserve a number of pages in a 'critical pool' that is set aside for
> situations when the system is 'in emergency'. It is up to the individual
> administrator to determine when his/her system is 'in emergency'. This is
> not meant to (necessarily) anticipate OOM situations, though that is
> certainly one possible use. The purpose this was originally designed for
> is to allow the networking code to keep functioning despite the system
> losing its (potentially networked) swap device, and thus temporarily
> putting the system under extreme memory pressure.

I don't see how this can ever work.

How can _userspace_ know about what allocations are critical to the
kernel?!

And as you noticed, it does not work for your original usage case,
because reserved memory pool would have to be "sum of all network
interface bandwidths * amount of time expected to survive without
network" which is way too much.
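
To put a number on that formula, here is a small worked sketch in C. The function name and the example figures (two 1 Gbit/s interfaces, 10 seconds, 4096-byte pages) are assumptions chosen for illustration, not values from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Worst-case number of pages needed to buffer everything that could
 * arrive on all interfaces for `seconds` seconds.
 */
uint64_t worst_case_pages(const uint64_t *bits_per_sec, size_t n_ifaces,
                          uint64_t seconds, uint64_t page_size)
{
    uint64_t total_bits_per_sec = 0;

    /* Sum of all network interface bandwidths. */
    for (size_t i = 0; i < n_ifaces; i++)
        total_bits_per_sec += bits_per_sec[i];

    /* Convert to bytes (truncating any sub-byte remainder), times duration. */
    uint64_t bytes = total_bits_per_sec / 8 * seconds;

    /* Round up to whole pages. */
    return (bytes + page_size - 1) / page_size;
}
```

With two 1 Gbit/s interfaces and a 10 second outage this yields 610352 pages of 4096 bytes, about 2.3 GiB, which gives a sense of why a worst-case guarantee is considered too expensive.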

If you want few emergency pages for some strange hack you are doing
(swapping over network?), just put swap into ramdisk and swapon() it
when you are in emergency, or use memory hotplug and plug few more
gigabytes into your machine. But don't go introducing infrastructure
that _can't_ be used right.
Pavel
--
Thanks, Sharp!


andrea at suse

Dec 14, 2005, 4:01 AM

Post #4 of 13
Re: [RFC][PATCH 0/6] Critical Page Pool [In reply to]

On Wed, Dec 14, 2005 at 11:08:41AM +0100, Pavel Machek wrote:
> because reserved memory pool would have to be "sum of all network
> interface bandwidths * amount of time expected to survive without
> network" which is way too much.

Yes, a global pool isn't really useful. A per-subsystem pool would be
more reasonable...

> gigabytes into your machine. But don't go introducing infrastructure
> that _can't_ be used right.

Agreed, the current design of the patch can't be used right.


alan at lxorguk

Dec 14, 2005, 5:03 AM

Post #5 of 13
Re: [RFC][PATCH 0/6] Critical Page Pool [In reply to]

On Mer, 2005-12-14 at 13:01 +0100, Andrea Arcangeli wrote:
> On Wed, Dec 14, 2005 at 11:08:41AM +0100, Pavel Machek wrote:
> > because reserved memory pool would have to be "sum of all network
> > interface bandwidths * amount of time expected to survive without
> > network" which is way too much.
>
> Yes, a global pool isn't really useful. A per-subsystem pool would be
> more reasonable...


The whole extra critical level seems dubious in itself. In 2.0/2.2 days
there were a set of patches that just dropped incoming memory on sockets
when the memory was tight unless they were marked as critical (ie NFS
swap). It worked rather well. The rest of the changes beyond that seem
excessive.



colpatch at us

Dec 14, 2005, 7:55 AM

Post #6 of 13
Re: [RFC][PATCH 0/6] Critical Page Pool [In reply to]

Pavel Machek wrote:
> Hi!
>
>
>>The overall purpose of this patch series is to allow a system administrator
>>to reserve a number of pages in a 'critical pool' that is set aside for
>>situations when the system is 'in emergency'. It is up to the individual
>>administrator to determine when his/her system is 'in emergency'. This is
>>not meant to (necessarily) anticipate OOM situations, though that is
>>certainly one possible use. The purpose this was originally designed for
>>is to allow the networking code to keep functioning despite the system
>>losing its (potentially networked) swap device, and thus temporarily
>>putting the system under extreme memory pressure.
>
>
> I don't see how this can ever work.
>
> How can _userspace_ know about what allocations are critical to the
> kernel?!

Well, it isn't userspace that is determining *which* allocations are
critical to the kernel. That is statically determined at compile time by
using the flag __GFP_CRITICAL on specific *kernel* allocations. Sridhar,
cc'd on this mail, has a set of patches that sprinkle the __GFP_CRITICAL
flag throughout the networking code to take advantage of this pool.
Userspace is in charge of determining *when* we're in an emergency
situation, and should thus use the critical pool, but not *which*
allocations are critical to surviving this emergency situation.
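
The split described above (the kernel marks *which* allocations are critical, userspace says *when* the emergency applies) can be sketched as a tiny userspace model in C. This is an illustration under stated assumptions, not the patch itself: MODEL_GFP_CRITICAL, set_emergency() and may_use_critical_pool() are hypothetical names, and the real value of __GFP_CRITICAL is defined by the patches.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the __GFP_CRITICAL flag bit. */
#define MODEL_GFP_CRITICAL 0x1u

/* Toggled by the administrator (userspace): "we are in an emergency now". */
static bool system_in_emergency;

void set_emergency(bool on)
{
    system_in_emergency = on;
}

/*
 * An allocation may dip into the critical pool only if BOTH hold:
 * the call site was tagged critical at compile time, and the
 * administrator has declared an emergency.
 */
bool may_use_critical_pool(unsigned int gfp_flags)
{
    return system_in_emergency && (gfp_flags & MODEL_GFP_CRITICAL) != 0;
}
```

In this model a tagged allocation behaves like any other until set_emergency(true) is called, and an untagged allocation never touches the pool.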


> And as you noticed, it does not work for your original usage case,
> because reserved memory pool would have to be "sum of all network
> interface bandwidths * amount of time expected to survive without
> network" which is way too much.

Well, I never suggested it didn't work for my original usage case. The
discussion we had is that it would be incredibly difficult to 100%
iron-clad guarantee that the pool would NEVER run out of pages. But we can
size the pool, especially given a decent workload approximation, so as to
make failure far less likely.


> If you want few emergency pages for some strange hack you are doing
> (swapping over network?), just put swap into ramdisk and swapon() it
> when you are in emergency, or use memory hotplug and plug few more
> gigabytes into your machine. But don't go introducing infrastructure
> that _can't_ be used right.

Well, that's basically the point of posting these patches as an RFC. I'm
not quite so delusional as to think they're going to get picked up right
now. I was, however, hoping for feedback to figure out how to design
infrastructure that *can* be used right, as well as trying to find other
potential users of such a feature.

Thanks!

-Matt


colpatch at us

Dec 14, 2005, 8:03 AM

Post #7 of 13
Re: [RFC][PATCH 0/6] Critical Page Pool [In reply to]

Andrea Arcangeli wrote:
> On Wed, Dec 14, 2005 at 11:08:41AM +0100, Pavel Machek wrote:
>
>>because reserved memory pool would have to be "sum of all network
>>interface bandwidths * amount of time expected to survive without
>>network" which is way too much.
>
>
> Yes, a global pool isn't really useful. A per-subsystem pool would be
> more reasonable...

Which is an idea that I toyed with, as well. The problem that I ran into
is how to tag an allocation as belonging to a specific subsystem. For
example, in our code we need networking to use the critical pool. How do
we let __alloc_pages() know what allocations belong to networking?
Networking needs named slab allocations, kmalloc allocations, and whole
page allocations to function. Should each subsystem get its own GFP flag
(GFP_NETWORKING, GFP_SCSI, GFP_SOUND, GFP_TERMINAL, ad nauseam)? Should we
create these pools dynamically and pass a reference to which pool each
specific allocation uses (thus adding a parameter to all memory allocation
functions in the kernel)? I realize that per-subsystem pools would be
better, but I thought about this for a while and couldn't come up with a
reasonable way to do it.
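
The two alternatives weighed above can be contrasted in a short userspace C sketch. All names here (MODEL_GFP_*, struct model_pool, and both functions) are hypothetical and exist only to show the shape of each interface; none of them come from the patches or the kernel.

```c
#include <stddef.h>

/* Option 1: one flag bit per subsystem. Every new subsystem costs a bit. */
enum {
    MODEL_GFP_NETWORKING = 1u << 0,
    MODEL_GFP_SCSI       = 1u << 1,
};

/* Option 2: an explicit pool object handed to every allocation call. */
struct model_pool {
    const char *name;
    size_t free_pages;
};

/* Option 1 needs a central mapping from flag bits to pools. */
struct model_pool *model_pool_for_flags(unsigned int flags,
                                        struct model_pool *net,
                                        struct model_pool *scsi)
{
    if (flags & MODEL_GFP_NETWORKING)
        return net;
    if (flags & MODEL_GFP_SCSI)
        return scsi;
    return NULL; /* not tagged for any subsystem pool */
}

/* Option 2: the caller names its pool directly; 0 on success, -1 if empty. */
int model_pool_alloc(struct model_pool *pool)
{
    if (pool == NULL || pool->free_pages == 0)
        return -1;
    pool->free_pages--;
    return 0;
}
```

The first form keeps existing allocator signatures but consumes scarce flag bits; the second scales to any number of pools but requires an extra parameter at every call site.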


>>gigabytes into your machine. But don't go introducing infrastructure
>>that _can't_ be used right.
>
>
> Agreed, the current design of the patch can't be used right.

Well, it can for our use, but I recognize that isn't going to be a huge
selling point! :) As I mentioned in my reply to Pavel, I'd really like to
find a way to design something that WOULD be generally useful.

Thanks!

-Matt


colpatch at us

Dec 14, 2005, 8:37 AM

Post #8 of 13
Re: [RFC][PATCH 0/6] Critical Page Pool [In reply to]

Alan Cox wrote:
> On Mer, 2005-12-14 at 13:01 +0100, Andrea Arcangeli wrote:
>
>>On Wed, Dec 14, 2005 at 11:08:41AM +0100, Pavel Machek wrote:
>>
>>>because reserved memory pool would have to be "sum of all network
>>>interface bandwidths * amount of time expected to survive without
>>>network" which is way too much.
>>
>>Yes, a global pool isn't really useful. A per-subsystem pool would be
>>more reasonable...
>
>
>
> The whole extra critical level seems dubious in itself. In 2.0/2.2 days
> there were a set of patches that just dropped incoming memory on sockets
> when the memory was tight unless they were marked as critical (ie NFS
> swap). It worked rather well. The rest of the changes beyond that seem
> excessive.

Actually, Sridhar's code (mentioned earlier in this thread) *does* drop
incoming packets that are not 'critical', but unfortunately you need to
completely copy the packet into kernel memory before you can do any
processing on it to determine whether or not it's 'critical', and thus
accept or reject it. If network traffic is coming in at a good clip and
the system is already under memory pressure, it's going to be difficult to
receive all these packets, which was the inspiration for this patchset.

Thanks!

-Matt


alan at lxorguk

Dec 14, 2005, 11:17 AM

Post #9 of 13
Re: [RFC][PATCH 0/6] Critical Page Pool [In reply to]

On Mer, 2005-12-14 at 08:37 -0800, Matthew Dobson wrote:
> Actually, Sridhar's code (mentioned earlier in this thread) *does* drop
> incoming packets that are not 'critical', but unfortunately you need to

I realise that but if you look at the previous history in 2.0 and 2.2
this was all that was ever needed. It thus begs the question why all the
extra support and logic this time around?



pavel at suse

Dec 15, 2005, 8:26 AM

Post #10 of 13
Re: [RFC][PATCH 0/6] Critical Page Pool [In reply to]

Hi!

> > I don't see how this can ever work.
> >
> > How can _userspace_ know about what allocations are critical to the
> > kernel?!
>
> Well, it isn't userspace that is determining *which* allocations are
> critical to the kernel. That is statically determined at compile time by
> using the flag __GFP_CRITICAL on specific *kernel* allocations. Sridhar,
> cc'd on this mail, has a set of patches that sprinkle the __GFP_CRITICAL
> flag throughout the networking code to take advantage of this pool.
> Userspace is in charge of determining *when* we're in an emergency
> situation, and should thus use the critical pool, but not *which*

It still is not too reliable. If your userspace tool is swapped out
(etc), it may not get a chance to wake up.

> > And as you noticed, it does not work for your original usage case,
> > because reserved memory pool would have to be "sum of all network
> > interface bandwidths * amount of time expected to survive without
> > network" which is way too much.
>
> Well, I never suggested it didn't work for my original usage case. The
> discussion we had is that it would be incredibly difficult to 100%
> iron-clad guarantee that the pool would NEVER run out of pages. But we can
> size the pool, especially given a decent workload approximation, so as to
> make failure far less likely.

Perhaps you should add file in Documentation/ explaining it is not
reliable?

> > If you want few emergency pages for some strange hack you are doing
> > (swapping over network?), just put swap into ramdisk and swapon() it
> > when you are in emergency, or use memory hotplug and plug few more
> > gigabytes into your machine. But don't go introducing infrastructure
> > that _can't_ be used right.
>
> Well, that's basically the point of posting these patches as an RFC. I'm
> not quite so delusional as to think they're going to get picked up right
> now. I was, however, hoping for feedback to figure out how to design
> infrastructure that *can* be used right, as well as trying to find other
> potential users of such a feature.

Well, we don't usually take infrastructure that has no in-kernel
users, and example user would indeed be nice.
Pavel
--
Thanks, Sharp!


pavel at suse

Dec 15, 2005, 8:27 AM

Post #11 of 13
Re: [RFC][PATCH 0/6] Critical Page Pool [In reply to]

Hi!

> > The whole extra critical level seems dubious in itself. In 2.0/2.2 days
> > there were a set of patches that just dropped incoming memory on sockets
> > when the memory was tight unless they were marked as critical (ie NFS
> > swap). It worked rather well. The rest of the changes beyond that seem
> > excessive.
>
> Actually, Sridhar's code (mentioned earlier in this thread) *does* drop
> incoming packets that are not 'critical', but unfortunately you need to
> completely copy the packet into kernel memory before you can do any
> processing on it to determine whether or not it's 'critical', and thus
> accept or reject it. If network traffic is coming in at a good clip and
> the system is already under memory pressure, it's going to be difficult to
> receive all these packets, which was the inspiration for this patchset.

You should be able to do all this with single, MTU-sized buffer.

Receive packet into buffer. If it is nice, pass it up, otherwise drop
it. Yes, it may drop some "important" packets, but that's okay, packet
loss is expected on networks.
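
The single-buffer scheme described above could look roughly like the following userspace C model. It is a sketch under assumptions, not driver code: MODEL_MTU, the function names, and especially the classifier (which treats a first byte of 0xC0 as "critical") are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MTU 1500

/* The one preallocated, MTU-sized receive buffer. */
static unsigned char rx_buf[MODEL_MTU];

/* Hypothetical classifier: a packet is "critical" if its first byte is 0xC0. */
static bool model_is_critical(const unsigned char *pkt, size_t len)
{
    return len > 0 && pkt[0] == 0xC0;
}

/*
 * Receive one packet into the shared buffer, then either pass it up
 * (return true) or drop it (return false). No further memory is
 * allocated. Packets larger than the buffer are dropped outright.
 */
bool model_receive(const unsigned char *wire, size_t len)
{
    if (len > sizeof(rx_buf))
        return false;
    memcpy(rx_buf, wire, len);
    return model_is_critical(rx_buf, len);
}
```

Because the buffer is reused for every packet, memory use stays constant regardless of the incoming rate; the cost is that anything arriving while the buffer is busy is lost.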
Pavel
--
Thanks, Sharp!


colpatch at us

Dec 15, 2005, 1:51 PM

Post #12 of 13
Re: [RFC][PATCH 0/6] Critical Page Pool [In reply to]

Pavel Machek wrote:
>>>And as you noticed, it does not work for your original usage case,
>>>because reserved memory pool would have to be "sum of all network
>>>interface bandwidths * amount of time expected to survive without
>>>network" which is way too much.
>>
>>Well, I never suggested it didn't work for my original usage case. The
>>discussion we had is that it would be incredibly difficult to 100%
>>iron-clad guarantee that the pool would NEVER run out of pages. But we can
>>size the pool, especially given a decent workload approximation, so as to
>>make failure far less likely.
>
>
> Perhaps you should add file in Documentation/ explaining it is not
> reliable?

That's a good suggestion. I will rework the patch's additions to
Documentation/sysctl/vm.txt to be more clear about exactly what we're
providing.


>>>If you want few emergency pages for some strange hack you are doing
>>>(swapping over network?), just put swap into ramdisk and swapon() it
>>>when you are in emergency, or use memory hotplug and plug few more
>>>gigabytes into your machine. But don't go introducing infrastructure
>>>that _can't_ be used right.
>>
>>Well, that's basically the point of posting these patches as an RFC. I'm
>>not quite so delusional as to think they're going to get picked up right
>>now. I was, however, hoping for feedback to figure out how to design
>>infrastructure that *can* be used right, as well as trying to find other
>>potential users of such a feature.
>
>
> Well, we don't usually take infrastructure that has no in-kernel
> users, and example user would indeed be nice.
> Pavel

Understood. I certainly wouldn't expect otherwise. I'll see if I can get
Sridhar to post his networking changes that take advantage of this.

Thanks!

-Matt


sri at us

Dec 15, 2005, 9:02 PM

Post #13 of 13
Re: [RFC][PATCH 0/6] Critical Page Pool [In reply to]

Matthew Dobson wrote:

>Pavel Machek wrote:
>
>
>>>>And as you noticed, it does not work for your original usage case,
>>>>because reserved memory pool would have to be "sum of all network
>>>>interface bandwidths * amount of time expected to survive without
>>>>network" which is way too much.
>>>>
>>>>
>>>Well, I never suggested it didn't work for my original usage case. The
>>>discussion we had is that it would be incredibly difficult to 100%
>>>iron-clad guarantee that the pool would NEVER run out of pages. But we can
>>>size the pool, especially given a decent workload approximation, so as to
>>>make failure far less likely.
>>>
>>>
>>Perhaps you should add file in Documentation/ explaining it is not
>>reliable?
>>
>>
>
>That's a good suggestion. I will rework the patch's additions to
>Documentation/sysctl/vm.txt to be more clear about exactly what we're
>providing.
>
>
>
>
>>>>If you want few emergency pages for some strange hack you are doing
>>>>(swapping over network?), just put swap into ramdisk and swapon() it
>>>>when you are in emergency, or use memory hotplug and plug few more
>>>>gigabytes into your machine. But don't go introducing infrastructure
>>>>that _can't_ be used right.
>>>>
>>>>
>>>Well, that's basically the point of posting these patches as an RFC. I'm
>>>not quite so delusional as to think they're going to get picked up right
>>>now. I was, however, hoping for feedback to figure out how to design
>>>infrastructure that *can* be used right, as well as trying to find other
>>>potential users of such a feature.
>>>
>>>
>>Well, we don't usually take infrastructure that has no in-kernel
>>users, and example user would indeed be nice.
>> Pavel
>>
>>
>
>Understood. I certainly wouldn't expect otherwise. I'll see if I can get
>Sridhar to post his networking changes that take advantage of this.
>
>
I posted these patches yesterday on lkml and netdev; here is a
link to the thread.
http://thread.gmane.org/gmane.linux.kernel/357835

Thanks
Sridhar

