
Mailing List Archive: Linux: Kernel

[PATCH 0/7] Generic Process Containers (+ ResGroups/BeanCounters)

 

 



menage at google

Nov 23, 2006, 4:08 AM

Post #1 of 4
[PATCH 0/7] Generic Process Containers (+ ResGroups/BeanCounters)

This is an update to my multi-hierarchy generic containers patch (against
2.6.19-rc6). Changes include:

- an example patch implementing the BeanCounters core and numfiles
counters over generic containers. The addition of the
BeanCounters code unifies the three main process grouping
abstractions (Cpusets, ResGroups and BeanCounters).

- a patch splitting Cpusets into two independently groupable
subsystems, Cpusets and Memsets.

- support for a subsystem to keep a container alive via refcounts
(e.g. the BeanCounters numfiles counter has a reference to the
beancounter object from each file charged to that beancounter, so
needs to be able to keep the beancounter alive until the file is
destroyed)
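
As a rough illustration of the refcount point above, here is a toy Python model. It is not code from the patch set: the class names, the `get`/`put` methods and the rule that removal is refused while references remain are all assumptions made for the sketch, and the real patch may handle a pinned container differently.

```python
class Container:
    """Toy model of a container that subsystems can pin with references."""

    def __init__(self, name):
        self.name = name
        self.tasks = set()
        self.refcount = 0   # references held by subsystems (hypothetical field)
        self.alive = True

    def get(self):
        self.refcount += 1

    def put(self):
        if self.refcount == 0:
            raise RuntimeError("refcount underflow")
        self.refcount -= 1

    def try_remove(self):
        """Mimic rmdir: refuse while tasks or subsystem references remain."""
        if self.tasks or self.refcount:
            return False
        self.alive = False
        return True


class NumFilesCounter:
    """Toy 'numfiles' counter: every charged file pins its container."""

    def __init__(self):
        self.owner = {}  # file id -> Container the file was charged to

    def charge(self, file_id, container):
        container.get()
        self.owner[file_id] = container

    def uncharge(self, file_id):
        # Called when the file is destroyed; drops the pin it held.
        self.owner.pop(file_id).put()


box = Container("bc0")
files = NumFilesCounter()
files.charge("fd3", box)
removed_while_charged = box.try_remove()   # a file still holds a reference
files.uncharge("fd3")
removed_after_close = box.try_remove()     # nothing pins the container now
```

The point of the sketch is only the ordering: the container outlives its last task for as long as a charged file still refers to it.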

-------------------------------------

There have recently been various proposals floating around for
resource management/accounting subsystems in the kernel, including
Res Groups, User BeanCounters and others. These all need the basic
abstraction of being able to group together multiple processes in an
aggregate, in order to track/limit the resources permitted to those
processes, and all implement this grouping in different ways.

Already existing in the kernel is the cpuset subsystem; this has a
process grouping mechanism that is mature, tested, and well documented
(particularly with regard to synchronization rules).

This patchset extracts the process grouping code from cpusets into a
generic container system, and makes the cpusets code a client of
the container system.

It also provides several example clients of the container system,
including ResGroups and BeanCounters.

The change is implemented in five stages plus two additional example patches:

1) extract the process grouping code from cpusets into a standalone system

2) remove the process grouping code from cpusets and hook into the
container system

3) convert the container system to present a generic multi-hierarchy
API, and make cpusets a client of that API

4) add a simple CPU accounting container subsystem as an example

5) example of implementing ResGroups and its numtasks controller over
generic containers - not intended to be applied with this patch set

6) split cpusets into two subsystems, cpusets and memsets

7) example of implementing BeanCounters and its numfiles counter over
generic containers - not intended to be applied with this patch set
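
To make the multi-hierarchy idea concrete, here is a small Python model of what the stages above build towards. It is a sketch under assumptions, not the patch's API: the `Hierarchy` class, its methods, and the particular assignment of subsystems to hierarchies are invented for illustration. The property it shows is that a task sits in exactly one container per hierarchy, so cpusets and memsets can group the same task differently.

```python
class Hierarchy:
    """A tree of containers with a fixed set of attached subsystems."""

    def __init__(self, name, subsystems):
        self.name = name
        self.subsystems = frozenset(subsystems)
        self.children = {"/": set()}   # container path -> child paths
        self.members = {"/": set()}    # container path -> task ids

    def mkdir(self, parent, leaf):
        path = parent.rstrip("/") + "/" + leaf
        self.children[parent].add(path)
        self.children[path] = set()
        self.members[path] = set()
        return path

    def add_task(self, pid):
        # New tasks start in the root container of the hierarchy.
        self.members["/"].add(pid)

    def attach(self, pid, path):
        # A task belongs to exactly one container in this hierarchy.
        for tasks in self.members.values():
            tasks.discard(pid)
        self.members[path].add(pid)

    def container_of(self, pid):
        for path, tasks in self.members.items():
            if pid in tasks:
                return path
        raise KeyError(pid)


# Hypothetical layout: two independent hierarchies over the same tasks.
cpu = Hierarchy("cpu", ["cpusets", "cpuacct"])
mem = Hierarchy("mem", ["memsets"])
for h in (cpu, mem):
    h.add_task(1234)

batch = cpu.mkdir("/", "batch")
big = mem.mkdir("/", "bigmem")
cpu.attach(1234, batch)
mem.attach(1234, big)

placement = {h.name: h.container_of(1234) for h in (cpu, mem)}
```

Here task 1234 ends up in `/batch` for the CPU-side subsystems and in `/bigmem` for memsets, which a single shared tree could not express.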


The intention is that the various resource management efforts can also
become container clients, with the result that:

- the userspace APIs are (somewhat) normalised

- it's easier to test out e.g. the ResGroups CPU controller in
conjunction with the BeanCounters memory controller

- the additional kernel footprint of any of the competing resource
management systems is substantially reduced, since it doesn't need
to provide process grouping/containment, hence improving their
chances of getting into the kernel

Signed-off-by: Paul Menage <menage[at]google.com>

--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo[at]vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


pj at sgi

Nov 29, 2006, 11:32 PM

Post #2 of 4
Re: [PATCH 0/7] Generic Process Containers (+ ResGroups/BeanCounters)

I got a chance to build and test this patch set, to see if it behaved
like I expected cpusets to behave, on an ia64 SN2 Altix system.

Two details - otherwise looked good. I continue to like this
approach.

The two details are (1) /proc/<pid>/cpuset not configured by
default if CPUSETS configured, and (2) a locking bug wedging
tasks trying to rmdir a cpuset off the notify_on_release hook.


1) I had to enable CONFIG_PROC_PID_CPUSET. I used the following
one line change to do this. I am willing to consider, in due
time, phasing out such legacy cpuset support. But so long as it
is small stuff that is not getting in anyone's way, I think we
should take our sweet time about doing so -- as in a year or two
after marking it deprecated or some such. No sense deciding that
matter now; keep the current cpuset API working throughout any
transition to container based cpusets, then revisit the question
of whether to deprecate and eventually remove these kernel API
details, later on, after the major reconstruction dust settles.
In general, we try to avoid removing kernel APIs, especially if
they are happily being used and working and not causing anyone
grief.

============================ begin ============================
--- 2.6.19-rc5.orig/init/Kconfig	2006-11-29 21:14:48.071114833 -0800
+++ 2.6.19-rc5/init/Kconfig	2006-11-29 22:19:02.015166048 -0800
@@ -268,6 +268,7 @@ config CPUSETS
 config PROC_PID_CPUSET
 	bool "Include legacy /proc/<pid>/cpuset file"
 	depends on CPUSETS
+	default y if CPUSETS
 
 config CONTAINER_CPUACCT
 	bool "Simple CPU accounting container subsystem"
============================= end =============================


2) I wedged the kernel on the container_lock, doing a removal of a cpuset
using notify_on_release.

Right now, that test system has the following two tasks, wedged:

============================ begin ============================
F S UID    PID  PPID  C PRI  NI ADDR  SZ WCHAN  STIME TTY          TIME CMD
0 S root  4992    34  0  71  -5 -    380 wait   22:51 ?        00:00:00 /bin/sh /sbin/cpuset_release_agent /cpuset_test_tree
0 D root  4994  4992  0  72  -5 -    200 contai 22:51 ?        00:00:00 rmdir /dev/cpuset//cpuset_test_tree
============================= end =============================

I had a cpuset called /cpuset_test_tree, and some sub-cpusets
below it. I marked it 'notify_on_release' and then removed all
tasks from it, and then removed the child cpusets that it had.
Removing that last child cpuset presumably triggered the above
callout to /sbin/cpuset_release_agent, which called rmdir.
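
The sequence just described can be modelled in a few lines of Python. This is only a sketch of the documented notify_on_release rule (the agent runs once a flagged cpuset has no tasks and no child cpusets left); the class, the helper functions and the child name `a` are hypothetical, and nothing here touches the real cpuset filesystem or its locking.

```python
class Cpuset:
    """Toy cpuset node: tracks tasks, children and the notify flag."""

    def __init__(self, path, parent=None):
        self.path = path
        self.parent = parent
        self.children = set()
        self.tasks = set()
        self.notify_on_release = False
        if parent is not None:
            parent.children.add(self)


def check_release(cs, agent_calls):
    """Record an agent call if cs is flagged and has no tasks or children."""
    if cs.notify_on_release and not cs.tasks and not cs.children:
        agent_calls.append(cs.path)


def remove_task(cs, pid, agent_calls):
    cs.tasks.discard(pid)
    check_release(cs, agent_calls)


def rmdir(cs, agent_calls):
    if cs.tasks or cs.children:
        raise OSError("cpuset busy")
    parent = cs.parent
    if parent is not None:
        parent.children.discard(cs)
        check_release(parent, agent_calls)


calls = []   # stands in for invocations of /sbin/cpuset_release_agent
top = Cpuset("/cpuset_test_tree")
child = Cpuset("/cpuset_test_tree/a", parent=top)   # hypothetical child name
top.tasks.add(4242)
top.notify_on_release = True

remove_task(top, 4242, calls)   # a child remains, so no callout yet
calls_before = list(calls)
rmdir(child, calls)             # last child removed: the agent is called
```

In this model the callout is made from inside the `rmdir` of the last child, which mirrors the presumption above that removing the last child cpuset triggered the release agent.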

That wait address (from /proc/4994/stat) in hex is a0000001000f1060,
and my System.map has the two lines:

a0000001000f1040 T container_lock
a0000001000f1360 T container_manage_unlock

So it is wedged in container_lock.
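
The lookup is simply "last symbol whose address is at or below the wait address". A minimal Python sketch of it, assuming the usual three-column `address type name` System.map layout; the `resolve` helper is made up for this example, and only the two map lines quoted above are used as input:

```python
import bisect


def resolve(addr, system_map_text):
    """Return (symbol, offset) for the last symbol at or below addr."""
    symbols = []
    for line in system_map_text.splitlines():
        fields = line.split()
        if len(fields) == 3:          # address, type, name
            symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        raise LookupError("address precedes every symbol")
    start, name = symbols[i]
    return name, addr - start


SYSTEM_MAP = """\
a0000001000f1040 T container_lock
a0000001000f1360 T container_manage_unlock
"""

symbol, offset = resolve(0xa0000001000f1060, SYSTEM_MAP)
print(f"{symbol}+{offset:#x}")   # container_lock+0x20
```

The wait address falls 0x20 bytes into container_lock, well short of the next symbol at ...1360.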

I have subsequently also wedged an 'ls' command trying to scan this
/dev/cpuset directory, waiting in the kernel routine vfs_readdir
(not surprising, given that I'm in the middle of doing a rmdir on
that directory.)

If you don't immediately see the problem, I can go back and get a
kernel stack trace or whatever else you need.

This lockup occurred the first, and thus far only, time that I tried
to use notify_on_release to rmdir a cpuset. So I presume it is an
easy failure for me to reproduce.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj[at]sgi.com> 1.925.600.0401


menage at google

Nov 30, 2006, 12:01 AM

Post #3 of 4
Re: [PATCH 0/7] Generic Process Containers (+ ResGroups/BeanCounters)

On 11/29/06, Paul Jackson <pj[at]sgi.com> wrote:
> config PROC_PID_CPUSET
> bool "Include legacy /proc/<pid>/cpuset file"
> depends on CPUSETS
> + default y if CPUSETS
>

Sounds very reasonable.

> 2) I wedged the kernel on the container_lock, doing a removal of a cpuset
> using notify_on_release.
>

I guess I've not really tested doing interesting things from the
notify_on_release code, just checked that it successfully executed a
simple command. I'll look into it.

Thanks for the feedback.

Paul


menage at google

Nov 30, 2006, 12:51 AM

Post #4 of 4
Re: [PATCH 0/7] Generic Process Containers (+ ResGroups/BeanCounters)

On 11/29/06, Paul Jackson <pj[at]sgi.com> wrote:
>
> 2) I wedged the kernel on the container_lock, doing a removal of a cpuset
> using notify_on_release.

I couldn't reproduce this, with a /sbin/cpuset_release_agent that does:

#!/bin/bash
# $1 is the released cpuset's path, relative to the cpuset mount point
logger cpuset_release_agent "$1"
rmdir "/dev/cpuset/$1"

and running the commands:

while true; do
    mkdir -p /dev/cpuset/bar/foo
    echo 1 > /dev/cpuset/bar/notify_on_release
    rmdir /dev/cpuset/bar/foo
    usleep 1000
done

Is it actually reproducible for you? If so, could you get a fuller backtrace?

Thanks,

Paul
