Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

Re: cgroup: status-quo and userland efforts

 

 

First page Previous page 1 2 3 Next page Last page  View All Linux kernel RSS feed   Index | Next | Previous | View Threaded


tj at kernel

Jun 27, 2013, 10:01 PM

Post #26 of 54 (2456 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

Hello, Mike.

On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> I always thought that was a very cool feature, mkdir+echo, poof done.
> Now maybe that interface is suboptimal for serious usage, but it makes
> the things usable via dirt simple scripts, very flexible, nice.

Oh, that in itself is not bad. I mean, if you're root, it's pretty
easy to play with and that part is fine. But combined with the
hierarchical nature of cgroup and file permissions, it encourages
people to "deligate" subdirectories to less previledged domains, which
in turn leads to normal binaries to manipulate them directly, which is
where the horror begins. We end up exposing control knobs which are
tightly coupled to kernel implementation details right into lay
binaries and scripts directly used by end users.

I think this is the first time this happened, which is probably why
nobody really noticed the mess earlier.

Anyways, if you're root, you can keep doing whatever you want. You
could be stepping on the centralized agent's toes a bit and vice-versa
but I don't think that's gonna be disastrous. What I'm trying to
stamp out is direct usages from !root domains and !system-management
binaries / scripts. They absolutely have to go. There's no question
about it and I'll take totalitarian userland agent anyday over the
current mess.

Eventually, I think we'll be able to reach an equilibrium where most
things are reasonable and we'll be exploring the acceptable limits of
flexibility again, but right now, please bear with the brutality.
We're way over the line and I can't see a way back which isn't gonna
sting a bit. I'm and will keep trying to make it as painless as
possible.

Thanks!

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


bitbucket at online

Jun 27, 2013, 11:00 PM

Post #27 of 54 (2457 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

On Thu, 2013-06-27 at 22:01 -0700, Tejun Heo wrote:

> Anyways, if you're root, you can keep doing whatever you want. You
> could be stepping on the centralized agent's toes a bit and vice-versa

Keep on truckn' sounds good, that vice-versa toe stomping not so good,
but yeah, until systemd or ilk grows the ability to shut me down, I
shouldn't feel any burning need to introduce it to my machete.

> but I don't think that's gonna be disastrous. What I'm trying to
> stamp out is direct usages from !root domains and !system-management
> binaries / scripts. They absolutely have to go. There's no question
> about it and I'll take totalitarian userland agent anyday over the
> current mess.

I get some of the why.. and yeah, it's the dirt simple usage that I care
about most, not the big hairy problem cases you're trying to address.

> Eventually, I think we'll be able to reach an equilibrium where most
> things are reasonable and we'll be exploring the acceptable limits of
> flexibility again, but right now, please bear with the brutality.
> We're way over the line and I can't see a way back which isn't gonna
> sting a bit. I'm and will keep trying to make it as painless as
> possible.

Keep on driving, and thanks for listening. Aaaooooo ;-)

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


berrange at redhat

Jun 28, 2013, 2:09 AM

Post #28 of 54 (2462 views)
Permalink
Re: [Workman-devel] cgroup: status-quo and userland efforts [In reply to]

On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
> FWIW, the code is too embarassing yet to see daylight, but I'm playing
> with a very lowlevel cgroup manager which supports nesting itself.
> Access in this POC is low-level ("set freezer.state to THAWED for cgroup
> /c1/c2", "Create /c3"), but the key feature is that it can run in two
> modes - native mode in which it uses cgroupfs, and child mode where it
> talks to a parent manager to make the changes.
>
> So then the idea would be that userspace (like libvirt and lxc) would
> talk over /dev/cgroup to its manager. Userspace inside a container
> (which can't actually mount cgroups itself) would talk to its own
> manager which is talking over a passed-in socket to the host manager,
> which in turn runs natively (uses cgroupfs, and nests "create /c1" under
> the requestor's cgroup).
>
> At some point (probably soon) we might want to talk about a standard API
> for these things. However I think it will have to come in the form of
> a standard library, which knows to either send requests over dbus to
> systemd, or over /dev/cgroup sock to the manager.

Are you also planning to actually write a new cgroup parent manager
daemon too ? Currently my plan for libvirt is to just talk directly
to systemd's new DBus APIs for all management of cgroups, and then
fall back to writing to cgroupfs directly for cases where systemd
is not around. Having a library to abstract these two possible
alternatives isn't all that compelling unless we think there will
be multiple cgroups manager daemons. I've been somewhat assuming that
even Ubuntu will eventually see the benefits & switch to systemd,
then the issue of multiple manager daemons wouldn't really exist.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


mhocko at suse

Jun 28, 2013, 8:05 AM

Post #29 of 54 (2457 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

On Thu 27-06-13 22:01:38, Tejun Heo wrote:
> Hello, Mike.
>
> On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> > I always thought that was a very cool feature, mkdir+echo, poof done.
> > Now maybe that interface is suboptimal for serious usage, but it makes
> > the things usable via dirt simple scripts, very flexible, nice.
>
> Oh, that in itself is not bad. I mean, if you're root, it's pretty
> easy to play with and that part is fine. But combined with the
> hierarchical nature of cgroup and file permissions, it encourages
> people to "deligate" subdirectories to less previledged domains,

OK, this really depends on what you expose to non-root users. I have
seen use cases where admin prepares top-level which is root-only but
it allows creating sub-groups which are under _full_ control of the
subdomain. This worked nicely for memcg for example because hard limit,
oom handling and other knobs are hierarchical so the subdomain cannot
overwrite what admin has said.

> which
> in turn leads to normal binaries to manipulate them directly, which is
> where the horror begins. We end up exposing control knobs which are
> tightly coupled to kernel implementation details right into lay
> binaries and scripts directly used by end users.
>
> I think this is the first time this happened, which is probably why
> nobody really noticed the mess earlier.
>
> Anyways, if you're root, you can keep doing whatever you want.

OK, so libcgroup's rules daemon will still work and place my tasks in
appropriate cgroups?

This is not quite in par with "libcgroup is dead and others have to
migrate to systemd as well" statements from the link posted earlier.
I really do not think that _any_ central agent will understand my
requirements and needs so I need a way to talk to cgroupfs somehow - I
have used libcgroups so far but touching cgroupfs is quite convinient
as well.

And the systemd, with its history of eating projects and not caring much
about their previous users who are not willing to jump in to the systemd
car, doesn't sound like a good place where to place the new interface to
me.

[...]
--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


serge.hallyn at ubuntu

Jun 28, 2013, 8:53 AM

Post #30 of 54 (2455 views)
Permalink
Re: [Workman-devel] cgroup: status-quo and userland efforts [In reply to]

Quoting Daniel P. Berrange (berrange [at] redhat):
> On Thu, Jun 27, 2013 at 08:22:06AM -0500, Serge Hallyn wrote:
> > FWIW, the code is too embarassing yet to see daylight, but I'm playing
> > with a very lowlevel cgroup manager which supports nesting itself.
> > Access in this POC is low-level ("set freezer.state to THAWED for cgroup
> > /c1/c2", "Create /c3"), but the key feature is that it can run in two
> > modes - native mode in which it uses cgroupfs, and child mode where it
> > talks to a parent manager to make the changes.
> >
> > So then the idea would be that userspace (like libvirt and lxc) would
> > talk over /dev/cgroup to its manager. Userspace inside a container
> > (which can't actually mount cgroups itself) would talk to its own
> > manager which is talking over a passed-in socket to the host manager,
> > which in turn runs natively (uses cgroupfs, and nests "create /c1" under
> > the requestor's cgroup).
> >
> > At some point (probably soon) we might want to talk about a standard API
> > for these things. However I think it will have to come in the form of
> > a standard library, which knows to either send requests over dbus to
> > systemd, or over /dev/cgroup sock to the manager.
>
> Are you also planning to actually write a new cgroup parent manager
> daemon too ? Currently my plan for libvirt is to just talk directly

I'm toying with the idea, yes. (Right now my toy runs in either native
mode, using cgroupfs, or child mode, talking to a parent manager) I'd
love if someone else does it, but it needs to be done.

As I've said elsewhere in the thread, I see 2 problems to be addressed:

1. The ability to nest the cgroup manager daemons, so that a daemon
running in a container can talk to a daemon running on the host. This
is the problem my current toy is aiming to address. But the API it
exports is just a thin layer over cgroupfs.

2. Abstract away the kernel/cgroupfs details so that userspace can
explain its cgroup needs generically. This is IIUC what systemd is
addressing with slices and scopes.

(2) is where I'd really like to have a well thought out, community
designed API that everyone can agree on, and it might be worth getting
together (with Tejun) at plumbers or something to lay something out.

In the end, something like libvirt or lxc should not need to care
what is running underneat it. It should be able to make its requests
the same way regardless of whether it running in fedora or ubuntu,
and whether it is running on the host or in a tightly bound container.
That's my goal anyway :)

> to systemd's new DBus APIs for all management of cgroups, and then
> fall back to writing to cgroupfs directly for cases where systemd
> is not around. Having a library to abstract these two possible
> alternatives isn't all that compelling unless we think there will
> be multiple cgroups manager daemons. I've been somewhat assuming that
> even Ubuntu will eventually see the benefits & switch to systemd,

So far I've seen no indication of that :)

If the systemd code to manage slices could be made separately
compileable as a standalone library or daemon, then I'd advocate
using that. But I don't see a lot of incentive for systemd to do
that, so I'd feel like a heel even asking.

> then the issue of multiple manager daemons wouldn't really exist.

True. But I'm running under the assumption that Ubuntu will stick with
upstart, and therefore yes I'll need a separate (perhaps pair of)
management daemons.

Even if we were to switch to systemd, I'd like the API for userspace
programs to configure and use cgroups to be as generic as possible,
so that anyone who wanted to write their own daemon could do so.

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


vgoyal at redhat

Jun 28, 2013, 11:01 AM

Post #31 of 54 (2463 views)
Permalink
Re: [Workman-devel] cgroup: status-quo and userland efforts [In reply to]

On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> On Thu 27-06-13 22:01:38, Tejun Heo wrote:
> > Hello, Mike.
> >
> > On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> > > I always thought that was a very cool feature, mkdir+echo, poof done.
> > > Now maybe that interface is suboptimal for serious usage, but it makes
> > > the things usable via dirt simple scripts, very flexible, nice.
> >
> > Oh, that in itself is not bad. I mean, if you're root, it's pretty
> > easy to play with and that part is fine. But combined with the
> > hierarchical nature of cgroup and file permissions, it encourages
> > people to "deligate" subdirectories to less previledged domains,
>
> OK, this really depends on what you expose to non-root users. I have
> seen use cases where admin prepares top-level which is root-only but
> it allows creating sub-groups which are under _full_ control of the
> subdomain. This worked nicely for memcg for example because hard limit,
> oom handling and other knobs are hierarchical so the subdomain cannot
> overwrite what admin has said.
>
> > which
> > in turn leads to normal binaries to manipulate them directly, which is
> > where the horror begins. We end up exposing control knobs which are
> > tightly coupled to kernel implementation details right into lay
> > binaries and scripts directly used by end users.
> >
> > I think this is the first time this happened, which is probably why
> > nobody really noticed the mess earlier.
> >
> > Anyways, if you're root, you can keep doing whatever you want.
>
> OK, so libcgroup's rules daemon will still work and place my tasks in
> appropriate cgroups?

Do you use that daemon in practice? For user session logins, I think
systemd has plans to put user sessions in a cgroup (kind of making
pam_cgroup redundant).

Other functionality rulesengined was providing moving tasks automatically
in a cgroup based on executable name. I think that was racy and not
many people had liked it.

IIUC, systemd can't disable access to cgroupfs from other utilities.
So most likely rulesengined should contine to work. But having both
systemd and libcgroup might not make much sense though.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tj at kernel

Jun 28, 2013, 11:30 AM

Post #32 of 54 (2455 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

Hello, Michal.

On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> OK, this really depends on what you expose to non-root users. I have
> seen use cases where admin prepares top-level which is root-only but
> it allows creating sub-groups which are under _full_ control of the
> subdomain. This worked nicely for memcg for example because hard limit,
> oom handling and other knobs are hierarchical so the subdomain cannot
> overwrite what admin has said.

Some knobs are safer than others and memcg probably has it easy as it
doesn't implement proportional control. But, even then, there's a
huge chasm between cgroup knobs and proper kernel API visible to
normal programs. Just imagine exposing memcg features by extending
rlimits. It'll take months if not a couple years ironing out the API
details and going through review process, and rightfully so, these
things, once published and made widely available, can't be taken back.
Now compare that to how we decide what knobs to expose in cgroup. I
mean, you even recently suggested flipping the default polarity of
soft limit knob.

cgroup's interface standard is very low. It's probably a notch higher
than boot params but about at the same level as sysctl knobs. It
isn't necessarily a bad thing as it allows us to rapidly explore
various options and expose useable things in a very agile manner, but
we should be very aware of how widely the interface is exposed;
otherwise, we'd be exposing features and leaking kernel implementation
details directly into userland programs without going through proper
review process or buliding consensus, which, in the long term, is
gonna be much worse than not having the feature exposed at all.

"It works for special cases XXX and YYY" is a very poor and extremely
short-sighted argument when the whole approach is breaching the very
fundamentals of kernel API conventions.

In addition, I really don't think cgroup is the right interface to
directly expose to individual programs. As a management thing, it
does make some sense but kernel API already has its, at times ancient
but, generally working hierarchy and inheritance rules and conventions
and primitive resource control contructs - nice, ionice, rlimits and
so on. If exposing cgroup-level resource control directly to
individual applications proves to be beneficial enough, what we should
do is extending those things. The backend sure can be supported by
cgroups but this mkdiring and echoing things with separate hierarchy
from the usual process hierarchy isn't something which should be
visible to individual applications.

Currently, I'm not convinced that this is something which should be
exposed to individual applications, but I sure can be wrong. But,
right now, let's first get the existing part settled. We can worry
about the rest later.

Also, in light of the rather sneaky subversion happened with cgroup
filesystem interface, I wonder whether we need to add some sort of
generic warning mechanism which warns when permissions of pseudo file
systems like cgroupfs are delegated to lesser security domains. In
itself, it could be harmless but it can serves as a useful beacon.
Not sure to what extent or how tho.

> OK, so libcgroup's rules daemon will still work and place my tasks in
> appropriate cgroups?

You have two competing managers of the same hierarchy. There are ways
to make them not interfere with each other too much but ultimately
it's gonna be something clunky. That said, libcgroup itself is pretty
clunky, so maybe you'll be okay with it. I don't know.

> This is not quite in par with "libcgroup is dead and others have to
> migrate to systemd as well" statements from the link posted earlier.
> I really do not think that _any_ central agent will understand my
> requirements and needs so I need a way to talk to cgroupfs somehow - I
> have used libcgroups so far but touching cgroupfs is quite convinient
> as well.

As a developer who knows what's going on, I don't think it'd be too
difficult to meddle with things manually with or without the central
manager. It'll complain that someone else is meddling with the cgroup
hierarchy and some functionalities might not work as expected, but I
don't think it'll lock you out.

At the same time, while us, the developers, having the level of
latitude required to do our work is necessary, that shouldn't be the
overruling focal point of the design of the whole system. It's
something to be used and supporting the actual use cases should be the
priority. I'm not saying developer convenience is not important but
that it's not the only thing which matters. The way I see it, cgroup
has basically been a playground for devs going wild without too much,
if any, thought on how it'll actually be useable and useful to wider
audience, so let's please adjust our priorities a bit.

And, no, I don't believe that the use cases are so wildly different
that we can't have a capable enough central manager. That's usually a
symptom of not understanding the problem space well enough and how one
ends up with mess like e.g. grub2 configuration. There sure are and
will be outliers but it should be possible to come up with something
which can serve most of the use cases reasonably well, and right now,
I believe that should be the focus.

> And the systemd, with its history of eating projects and not caring much
> about their previous users who are not willing to jump in to the systemd
> car, doesn't sound like a good place where to place the new interface to
> me.

That part I don't know. I really don't care whether it's systemd or
something else but it sure seems there are people who dislike it with
passion. To me, it seems rather silly but to each his/her own. Maybe
ubuntu will come up with their own manager paired with upstart and
people can use that one instead? Who knows.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


thockin at hockin

Jun 28, 2013, 11:44 AM

Post #33 of 54 (2464 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

On Thu, Jun 27, 2013 at 2:04 PM, Tejun Heo <tj [at] kernel> wrote:
> Hello,
>
> On Thu, Jun 27, 2013 at 01:46:18PM -0700, Tim Hockin wrote:
>> So what you're saying is that you don't care that this new thing is
>> less capable than the old thing, despite it having real impact.
>
> Sort of. I'm saying, at least up until now, moving away from
> orthogonal hierarchy support seems to be the right trade-off. It all
> depends on how you measure how much things are simplified and how
> heavy the "real impacts" are. It's not like these things can be
> determined white and black. Given the current situation, I think it's
> the right call.

I totally understand where you're coming from - trying to get back to
a stable feature set. But it sucks to be on the losing end of that
battle - you're cutting things that REALLY matter to us, and without a
really viable alternative. So we'll keep fighting.

>> If controller C is enabled at level X but disabled at level X/Y, does
>> that mean that X/Y uses the limits set in X? How about X/Y/Z?
>
> Y and Y/Z wouldn't make any difference. Tasks belonging to them would
> behave as if they belong to X as far as C is concerened.

OK, that *sounds* sane. It doesn't solve all our problems, but it
alleviates some of them.

>> So take away some of the flexibility that has minimal impact and
>> maximum return. Splitting threads across cgroups - we use it, but we
>> could get off that. Force all-or-nothing joining of an aggregate
>
> Please do so.

Splitting threads is sort of important for some cgroups, like CPU. I
wonder if pjt is paying attention to this thread.

>> construct (a container vs N cgroups).
>>
>> But perform surgery with a scalpel, not a hatchet.
>
> As anything else, it's drawing a line in a continuous spectrum of
> grey. Right now, given that maintaining multiple orthogonal
> hierarchies while introducing a proper concept of resource container
> involves addition of completely new constructs and complexity, I don't
> think that's a good option. If there are problems which can't be
> resolved / worked around in a reasonable manner, please bring them up
> along with their contexts. Let's examine them and see whether there
> are other ways to accomodate them.

You're arguing that the abstraction you want is that of a "container"
but that it's easier to remove options than to actually build a better
API.

I think this is wrong. Take the opportunity to define the RIGHT
interface that you WANT - a container. Implement it in terms of
cgroups (and maybe other stuff!). Make that API so compelling that
people want to use it, and your war of attrition on direct cgroup
madness will be won, but with net progress rather than regress.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


thockin at hockin

Jun 28, 2013, 11:53 AM

Post #34 of 54 (2458 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

On Fri, Jun 28, 2013 at 8:05 AM, Michal Hocko <mhocko [at] suse> wrote:
> On Thu 27-06-13 22:01:38, Tejun Heo wrote:

>> Oh, that in itself is not bad. I mean, if you're root, it's pretty
>> easy to play with and that part is fine. But combined with the
>> hierarchical nature of cgroup and file permissions, it encourages
>> people to "deligate" subdirectories to less previledged domains,
>
> OK, this really depends on what you expose to non-root users. I have
> seen use cases where admin prepares top-level which is root-only but
> it allows creating sub-groups which are under _full_ control of the
> subdomain. This worked nicely for memcg for example because hard limit,
> oom handling and other knobs are hierarchical so the subdomain cannot
> overwrite what admin has said.

bingo

> And the systemd, with its history of eating projects and not caring much
> about their previous users who are not willing to jump in to the systemd
> car, doesn't sound like a good place where to place the new interface to
> me.

+1

If systemd is the only upstream implementation of this single-agent
idea, we will have to invent our own, and continue to diverge rather
than converge. I think that, if we are going to pursue this model of
a single-agent, we should make a kick-ass implementation that is
flexible and scalable, and full-featured enough to not require
divergence at the lowest layer of the stack. Then build systemd on
top of that. Let systemd offer more features and policies and
"semantic" APIs.

We will build our own semantic APIs that are, necessarily, different
from systemd. But we can all use the same low-level mechanism.

Tim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


thockin at hockin

Jun 28, 2013, 11:58 AM

Post #35 of 54 (2456 views)
Permalink
Re: [Workman-devel] cgroup: status-quo and userland efforts [In reply to]

On Fri, Jun 28, 2013 at 8:53 AM, Serge Hallyn <serge.hallyn [at] ubuntu> wrote:
> Quoting Daniel P. Berrange (berrange [at] redhat):

>> Are you also planning to actually write a new cgroup parent manager
>> daemon too ? Currently my plan for libvirt is to just talk directly
>
> I'm toying with the idea, yes. (Right now my toy runs in either native
> mode, using cgroupfs, or child mode, talking to a parent manager) I'd
> love if someone else does it, but it needs to be done.
>
> As I've said elsewhere in the thread, I see 2 problems to be addressed:
>
> 1. The ability to nest the cgroup manager daemons, so that a daemon
> running in a container can talk to a daemon running on the host. This
> is the problem my current toy is aiming to address. But the API it
> exports is just a thin layer over cgroupfs.
>
> 2. Abstract away the kernel/cgroupfs details so that userspace can
> explain its cgroup needs generically. This is IIUC what systemd is
> addressing with slices and scopes.
>
> (2) is where I'd really like to have a well thought out, community
> designed API that everyone can agree on, and it might be worth getting
> together (with Tejun) at plumbers or something to lay something out.

We're also working on (2) (well, we HAVE it, but we're dis-integrating
it so we can hopefully publish more widely). But our (2) depends on
direct cgroupfs access. If that is to change, we need a really robust
(1). It's OK (desireable, in fact) that (1) be a very thin layer of
abstraction.

> In the end, something like libvirt or lxc should not need to care
> what is running underneat it. It should be able to make its requests
> the same way regardless of whether it running in fedora or ubuntu,
> and whether it is running on the host or in a tightly bound container.
> That's my goal anyway :)
>
>> to systemd's new DBus APIs for all management of cgroups, and then
>> fall back to writing to cgroupfs directly for cases where systemd
>> is not around. Having a library to abstract these two possible
>> alternatives isn't all that compelling unless we think there will
>> be multiple cgroups manager daemons. I've been somewhat assuming that
>> even Ubuntu will eventually see the benefits & switch to systemd,
>
> So far I've seen no indication of that :)
>
> If the systemd code to manage slices could be made separately
> compileable as a standalone library or daemon, then I'd advocate
> using that. But I don't see a lot of incentive for systemd to do
> that, so I'd feel like a heel even asking.

I want to say "let the best API win", but I know that systemd is a
giant katamari ball, and it's absorbing subsystems so it may win by
default. That isn't going to stop us from trying to do what we do,
and share that with the world.

>> then the issue of multiple manager daemons wouldn't really exist.
>
> True. But I'm running under the assumption that Ubuntu will stick with
> upstart, and therefore yes I'll need a separate (perhaps pair of)
> management daemons.
>
> Even if we were to switch to systemd, I'd like the API for userspace
> programs to configure and use cgroups to be as generic as possible,
> so that anyone who wanted to write their own daemon could do so.
>
> -serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


luto at amacapital

Jun 28, 2013, 12:18 PM

Post #36 of 54 (2455 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

On 06/27/2013 11:01 AM, Tejun Heo wrote:
> AFAICS, having a userland agent which has overall knowledge of the
> hierarchy and enforcesf structure and limiations is a requirement to
> make cgroup generally useable and useful. For systemd based systems,
> systemd serving that role isn't too crazy. It's sure gonna have
> teeting issues at the beginning but it has all the necessary
> information to manage workloads on the system.
>
> A valid issue is interoperability between systemd and non-systemd
> systems. I don't have an immediately good answer for that. I wrote
> in another reply but making cgroup generally available is a pretty new
> effort and we're still in the process of figuring out what the right
> constructs and abstractions are. Hopefully, we'll be able to reach a
> common set of abstractions to base things on top in itme.
>

The systemd stuff will break my code, too (although the single hierarchy
by itself won't, I think). I think that the kernel should make whatever
simple changes are needed so that systemd can function without using
cgroups at all. That way users of a different cgroup scheme can turn
off systemd's.

Here was my proposal, which hasn't gotten a clear reply:

http://article.gmane.org/gmane.comp.sysutils.systemd.devel/11424

I've already sent a patch to make /proc/<pid>/task/<tid>/children
available regardless of configuration.

--Andy


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


serge.hallyn at ubuntu

Jun 28, 2013, 12:36 PM

Post #37 of 54 (2456 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

Quoting Andy Lutomirski (luto [at] amacapital):
> On 06/27/2013 11:01 AM, Tejun Heo wrote:
> > AFAICS, having a userland agent which has overall knowledge of the
> > hierarchy and enforcesf structure and limiations is a requirement to
> > make cgroup generally useable and useful. For systemd based systems,
> > systemd serving that role isn't too crazy. It's sure gonna have
> > teeting issues at the beginning but it has all the necessary
> > information to manage workloads on the system.
> >
> > A valid issue is interoperability between systemd and non-systemd
> > systems. I don't have an immediately good answer for that. I wrote
> > in another reply but making cgroup generally available is a pretty new
> > effort and we're still in the process of figuring out what the right
> > constructs and abstractions are. Hopefully, we'll be able to reach a
> > common set of abstractions to base things on top in itme.
> >
>
> The systemd stuff will break my code, too (although the single hierarchy
> by itself won't, I think). I think that the kernel should make whatever
> simple changes are needed so that systemd can function without using
> cgroups at all. That way users of a different cgroup scheme can turn
> off systemd's.
>
> Here was my proposal, which hasn't gotten a clear reply:
>
> http://article.gmane.org/gmane.comp.sysutils.systemd.devel/11424

Neat. I like that proposal.

> I've already sent a patch to make /proc/<pid>/task/<tid>/children
> available regardless of configuration.

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


berrange at redhat

Jun 28, 2013, 12:59 PM

Post #38 of 54 (2459 views)
Permalink
Re: [Workman-devel] cgroup: status-quo and userland efforts [In reply to]

On Fri, Jun 28, 2013 at 02:01:55PM -0400, Vivek Goyal wrote:
> On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> > On Thu 27-06-13 22:01:38, Tejun Heo wrote:
> > > Hello, Mike.
> > >
> > > On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> > > > I always thought that was a very cool feature, mkdir+echo, poof done.
> > > > Now maybe that interface is suboptimal for serious usage, but it makes
> > > > the things usable via dirt simple scripts, very flexible, nice.
> > >
> > > Oh, that in itself is not bad. I mean, if you're root, it's pretty
> > > easy to play with and that part is fine. But combined with the
> > > hierarchical nature of cgroup and file permissions, it encourages
> > > people to "deligate" subdirectories to less previledged domains,
> >
> > OK, this really depends on what you expose to non-root users. I have
> > seen use cases where admin prepares top-level which is root-only but
> > it allows creating sub-groups which are under _full_ control of the
> > subdomain. This worked nicely for memcg for example because hard limit,
> > oom handling and other knobs are hierarchical so the subdomain cannot
> > overwrite what admin has said.
> >
> > > which
> > > in turn leads to normal binaries to manipulate them directly, which is
> > > where the horror begins. We end up exposing control knobs which are
> > > tightly coupled to kernel implementation details right into lay
> > > binaries and scripts directly used by end users.
> > >
> > > I think this is the first time this happened, which is probably why
> > > nobody really noticed the mess earlier.
> > >
> > > Anyways, if you're root, you can keep doing whatever you want.
> >
> > OK, so libcgroup's rules daemon will still work and place my tasks in
> > appropriate cgroups?
>
> Do you use that daemon in practice? For user session logins, I think
> systemd has plans to put user sessions in a cgroup (kind of making
> pam_cgroup redundant).
>
> Other functionality rulesengined was providing moving tasks automatically
> in a cgroup based on executable name. I think that was racy and not
> many people had liked it.

Regardless of the changes being proposed, IMHO, the cgrulesd should
never be used. It is just outright dangerous for a daemon to be
arbitrarily re-arranging what cgroups a process is placed in without
the applications being aware of it. It can only be safely used in a
scenario where cgroups are exclusively used by the administrator,
and never used by applications for their own needs.

> IIUC, systemd can't disable access to cgroupfs from other utilities.

The kernel can exposed a knob that would allow systemd to lock that
down

> So most likely rulesengined should contine to work. But having both
> systemd and libcgroup might not make much sense though.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


serge.hallyn at ubuntu

Jun 28, 2013, 3:40 PM

Post #39 of 54 (2461 views)
Permalink
Re: [Workman-devel] cgroup: status-quo and userland efforts [In reply to]

Quoting Daniel P. Berrange (berrange [at] redhat):
> On Fri, Jun 28, 2013 at 02:01:55PM -0400, Vivek Goyal wrote:
> > On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
> > > On Thu 27-06-13 22:01:38, Tejun Heo wrote:
> > > > Hello, Mike.
> > > >
> > > > On Fri, Jun 28, 2013 at 06:49:10AM +0200, Mike Galbraith wrote:
> > > > > I always thought that was a very cool feature, mkdir+echo, poof done.
> > > > > Now maybe that interface is suboptimal for serious usage, but it makes
> > > > > the things usable via dirt simple scripts, very flexible, nice.
> > > >
> > > > Oh, that in itself is not bad. I mean, if you're root, it's pretty
> > > > easy to play with and that part is fine. But combined with the
> > > > hierarchical nature of cgroup and file permissions, it encourages
> > > > people to "deligate" subdirectories to less previledged domains,
> > >
> > > OK, this really depends on what you expose to non-root users. I have
> > > seen use cases where admin prepares top-level which is root-only but
> > > it allows creating sub-groups which are under _full_ control of the
> > > subdomain. This worked nicely for memcg for example because hard limit,
> > > oom handling and other knobs are hierarchical so the subdomain cannot
> > > overwrite what admin has said.
> > >
> > > > which
> > > > in turn leads to normal binaries to manipulate them directly, which is
> > > > where the horror begins. We end up exposing control knobs which are
> > > > tightly coupled to kernel implementation details right into lay
> > > > binaries and scripts directly used by end users.
> > > >
> > > > I think this is the first time this happened, which is probably why
> > > > nobody really noticed the mess earlier.
> > > >
> > > > Anyways, if you're root, you can keep doing whatever you want.
> > >
> > > OK, so libcgroup's rules daemon will still work and place my tasks in
> > > appropriate cgroups?
> >
> > Do you use that daemon in practice? For user session logins, I think
> > systemd has plans to put user sessions in a cgroup (kind of making
> > pam_cgroup redundant).
> >
> > Other functionality rulesengined was providing moving tasks automatically
> > in a cgroup based on executable name. I think that was racy and not
> > many people had liked it.
>
> Regardless of the changes being proposed, IMHO, the cgrulesd should
> never be used. It is just outright dangerous for a daemon to be
> arbitrarily re-arranging what cgroups a process is placed in without
> the applications being aware of it. It can only be safely used in a
> scenario where cgroups are exclusively used by the administrator,
> and never used by applications for their own needs.

Even then it's not safe, since if the program quickly forks or clones a
few times, you can end up with some of the tasks being reclassified
and some not.

> > IIUC, systemd can't disable access to cgroupfs from other utilities.
>
> The kernel can exposed a knob that would allow systemd to lock that
> down

Gah - why would you give him that idea? :)

But yes, I'd sort of assume that was coming, eventually.

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tj at kernel

Jun 28, 2013, 3:43 PM

Post #40 of 54 (2460 views)
Permalink
Re: [Workman-devel] cgroup: status-quo and userland efforts [In reply to]

On Fri, Jun 28, 2013 at 05:40:53PM -0500, Serge Hallyn wrote:
> > The kernel can exposed a knob that would allow systemd to lock that
> > down
>
> Gah - why would you give him that idea? :)

That's one of the ideas I had from the beginning.

> But yes, I'd sort of assume that was coming, eventually.

But I think we'll probably settle with a mechanism to find out whether
someone else is touching the hierarchy, which will be generally useful
for other consumers of cgroup too.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


lpoetter at redhat

Jun 28, 2013, 6:48 PM

Post #41 of 54 (2454 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

On 28.06.2013 20:53, Tim Hockin wrote:

> a single-agent, we should make a kick-ass implementation that is
> flexible and scalable, and full-featured enough to not require
> divergence at the lowest layer of the stack. Then build systemd on
> top of that. Let systemd offer more features and policies and
> "semantic" APIs.

Well, what if systemd is already kick-ass? I mean, if you have a problem
with systemd, then that's your own problem, but I really don't think why
I should bother?

I for sure am not going to make the PID 1 a client of another daemon.
That's just wrong. If you have a daemon that is both conceptually the
manager of another service and the client of that other service, then
that's bad design and you will easily run into deadlocks and such. Just
think about it: if you have some external daemon for managing cgroups,
and you need cgroups for running external daemons, how are you going to
start the external daemon for managing cgroups? Sure, you can hack
around this, make that daemon special, and magic, and stuff -- or you
can just not do such nonsense. There's no reason to repeat the fuckup
that cgroup became in kernelspace a second time, but this time in
userspace, with multiple manager daemons all with different and slightly
incompatible definitions what a unit to manage actualy is...

We want to run fewer, simpler things on our systems, we want to reuse as
much of the code as we can. You don't achieve that by running yet
another daemon that does worse what systemd can anyway do simpler,
easier and better.

The least you could grant us is to have a look at the final APIs we will
have to offer before you already imply that systemd cannot be a valid
implementation of any API people could ever agree on.

Lennart
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


thockin at hockin

Jun 28, 2013, 8:05 PM

Post #42 of 54 (2459 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

Come on, now, Lennart. You put a lot of words in my mouth.

On Fri, Jun 28, 2013 at 6:48 PM, Lennart Poettering <lpoetter [at] redhat> wrote:
> On 28.06.2013 20:53, Tim Hockin wrote:
>
>> a single-agent, we should make a kick-ass implementation that is
>> flexible and scalable, and full-featured enough to not require
>> divergence at the lowest layer of the stack. Then build systemd on
>> top of that. Let systemd offer more features and policies and
>> "semantic" APIs.
>
>
> Well, what if systemd is already kick-ass? I mean, if you have a problem
> with systemd, then that's your own problem, but I really don't think why I
> should bother?

I didn't say it wasn't. I said that we can build a common substrate
that systemd can build on *and* non-systemd systems can use *and*
Google can participate in.

> I for sure am not going to make the PID 1 a client of another daemon. That's
> just wrong. If you have a daemon that is both conceptually the manager of
> another service and the client of that other service, then that's bad design
> and you will easily run into deadlocks and such. Just think about it: if you
> have some external daemon for managing cgroups, and you need cgroups for
> running external daemons, how are you going to start the external daemon for
> managing cgroups? Sure, you can hack around this, make that daemon special,
> and magic, and stuff -- or you can just not do such nonsense. There's no
> reason to repeat the fuckup that cgroup became in kernelspace a second time,
> but this time in userspace, with multiple manager daemons all with different
> and slightly incompatible definitions what a unit to manage actualy is...

I forgot about the tautology of systemd. systemd is monolithic.
Therefore it can not have any external dependencies. Therefore it
must absorb anything it depends on. Therefore systemd continues to
grow in size and scope. Up next: systemd manages your X sessions!

But that's not my point. It seems pretty easy to make this cgroup
management (in "native mode") a library that can have either a thin
veneer of a main() function, while also being usable by systemd. The
point is to solve all of the problems ONCE. I'm trying to make the
case that systemd itself should be focusing on features and policies
and awesome APIs.

> We want to run fewer, simpler things on our systems, we want to reuse as

Fewer and simpler are not compatible, unless you are losing
functionality. Systemd is fewer, but NOT simpler.

> much of the code as we can. You don't achieve that by running yet another
> daemon that does worse what systemd can anyway do simpler, easier and
> better.

Considering this is all hypothetical, I find this to be a funny
debate. My hypothetical idea is better than your hypothetical idea.

> The least you could grant us is to have a look at the final APIs we will
> have to offer before you already imply that systemd cannot be a valid
> implementation of any API people could ever agree on.

Whoah, don't get defensive. I said nothing of the sort. The fact of
the matter is that we do not run systemd, at least in part because of
the monolithic nature. That's unlikely to change in this timescale.
What I said was that it would be a shame if we had to invent our own
low-level cgroup daemon just because the "upstream" daemons was too
tightly coupled with systemd.

I think we have a lot of experience to offer to this project, and a
vested interest in seeing it done well. But if it is purely
targetting systemd, we have little incentive to devote resources to
it.

Please note that I am strictly talking about the lowest layer of the
API. Just the thing that guards cgroupfs against mere mortals. The
higher layers - where abstractions exist, that are actually USEFUL to
end users - are not really in scope right now. We already have our
own higher level APIs.

This is supposed to be collaborative, not combative.

Tim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tj at kernel

Jun 29, 2013, 9:40 AM

Post #43 of 54 (2439 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

Hello, Tim.

On Fri, Jun 28, 2013 at 11:44:23AM -0700, Tim Hockin wrote:
> I totally understand where you're coming from - trying to get back to
> a stable feature set. But it sucks to be on the losing end of that

Oh, it has been sucking and will continue to suck like hell for me too
for the foreseeable future. Trust me, this side ain't any greener.

> battle - you're cutting things that REALLY matter to us, and without a
> really viable alternative. So we'll keep fighting.

Yeah, that's understandable. More on this later.

> Splitting threads is sort of important for some cgroups, like CPU. I
> wonder if pjt is paying attention to this thread.

Paul?

> I think this is wrong. Take the opportunity to define the RIGHT
> interface that you WANT - a container. Implement it in terms of
> cgroups (and maybe other stuff!). Make that API so compelling that
> people want to use it, and your war of attrition on direct cgroup
> madness will be won, but with net progress rather than regress.

The goal is to reach sane and widely useable / useful state with
minimum amount of complexity. Maintaining backward compatibility for
some period - likely quite a few years - while still allowing future
development is a pretty important consideration. Another factor is
that the general situation has been more or less atrocious and cgroup
as a whole has been failing in the very basic places, which also
reinforces the drive for simplicity.

I probably am forgetting some, but anyways, from my POV, there are
fairly strong by-default factors which push for simplicity even if
that means some loss of functionalities as long as those aren't
something catastrophic. I've been going over the decisions past few
days and unified hierarchy still seems the best, or rather, most
acceptable solution.

That said, I stil don't know very well the scope and severity of the
problems you guys might face from the loss of multiple orthogonal
hierarchies. The cpuset one wasn't very convincing especially given
that most of expressibility problems can be mitigated if you presume
the central managing facility which can adapt the configurations as
the workload changes. Dynamic execution of configuration of course is
the job of cgroup proper but larger cadence changes doesn't have to be
statically encoded in the hierarchy itself and as I wrote before some
just can't be whether multiple hierarchy or not.

While the bar to overcome is pretty high, I do want to learn about the
problems you guys are foreseeing, so that I can at least evaulate the
graveness properly and hopefully compromises which can mitigate the
most sore ones can be made wherever necessary.

So, can you please explain the issues that you've experienced and are
foreseeing in detail with their contexts? ie. if you have certain
requirement, please give at least brief explanation on where such
requirement is coming from and how important the requirement is.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


mhocko at suse

Jun 30, 2013, 11:38 AM

Post #44 of 54 (2447 views)
Permalink
Re: [Workman-devel] cgroup: status-quo and userland efforts [In reply to]

On Fri 28-06-13 14:01:55, Vivek Goyal wrote:
> On Fri, Jun 28, 2013 at 05:05:13PM +0200, Michal Hocko wrote:
[...]
> > OK, so libcgroup's rules daemon will still work and place my tasks in
> > appropriate cgroups?
>
> Do you use that daemon in practice?

I am not but my users do. And that is why I care.

> For user session logins, I think systemd has plans to put user
> sessions in a cgroup (kind of making pam_cgroup redundant).
>
> Other functionality rulesengined was providing moving tasks automatically
> in a cgroup based on executable name. I think that was racy and not
> many people had liked it.

It doesn't make sense for short lived processes, all right, but it can
be useful for those that live for a long time.

> IIUC, systemd can't disable access to cgroupfs from other utilities.

The previous messages read otherwise. And that is why this rised the red
flag at many fronts.

> So most likely rulesengined should contine to work. But having both
> systemd and libcgroup might not make much sense though.
>
> Thanks
> Vivek

--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


lpoetter at redhat

Jun 30, 2013, 12:39 PM

Post #45 of 54 (2441 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

Heya,

On 29.06.2013 05:05, Tim Hockin wrote:
> Come on, now, Lennart. You put a lot of words in my mouth.

>> I for sure am not going to make the PID 1 a client of another daemon. That's
>> just wrong. If you have a daemon that is both conceptually the manager of
>> another service and the client of that other service, then that's bad design
>> and you will easily run into deadlocks and such. Just think about it: if you
>> have some external daemon for managing cgroups, and you need cgroups for
>> running external daemons, how are you going to start the external daemon for
>> managing cgroups? Sure, you can hack around this, make that daemon special,
>> and magic, and stuff -- or you can just not do such nonsense. There's no
>> reason to repeat the fuckup that cgroup became in kernelspace a second time,
>> but this time in userspace, with multiple manager daemons all with different
>> and slightly incompatible definitions what a unit to manage actualy is...
>
> I forgot about the tautology of systemd. systemd is monolithic.

systemd is certainly not monolithic for almost any definition of that
term. I am not sure where you are taking that from, and I am not sure I
want to discuss on that level. This just sounds like FUD you picked up
somewhere and are repeating carelessly...

> But that's not my point. It seems pretty easy to make this cgroup
> management (in "native mode") a library that can have either a thin
> veneer of a main() function, while also being usable by systemd. The
> point is to solve all of the problems ONCE. I'm trying to make the
> case that systemd itself should be focusing on features and policies
> and awesome APIs.

You know, getting this all right isn't easy. If you want to do things
properly, then you need to propagate attribute changes between the units
you manage. You also need something like a scheduler, since a number of
controllers can only be configured under certain external conditions
(for example: the blkio or devices controller use major/minor parameters
for configuring per-device limits. Since major/minor assignments are
pretty much unpredictable these days -- and users probably want to
configure things with friendly and stable /dev/disk/by-id/* symlinks
anyway -- this requires us to wait for devices to show up before we can
configure the parameters.) Soo... you need a graph of units, where you
can propagate things, and schedule things based on some execution/event
queue. And the propagation and scheduling are closely intermingled.

Now, that's pretty much exactly what systemd actually *is*. It
implements a graph of units with a scheduler. And if you rip that part
out of systemd to make this an "easy cgroup management library", then
you simply turn what systemd is into a library without leaving anything.
Which is just bogus.

So no, if you say "seems pretty easy to make this cgroup management a
library" then well, I have to disagree with you.

>> We want to run fewer, simpler things on our systems, we want to reuse as
>
> Fewer and simpler are not compatible, unless you are losing
> functionality. Systemd is fewer, but NOT simpler.

Oh, certainly it is. If we'd split up the cgroup fs access into
separate daemon of some kind, then we'd need some kind of IPC for that,
and so you have more daemons and you have some complex IPC between the
processes. So yeah, the systemd approach is certainly both simpler and
uses fewer daemons then your hypothetical one.

>> much of the code as we can. You don't achieve that by running yet another
>> daemon that does worse what systemd can anyway do simpler, easier and
>> better.
>
> Considering this is all hypothetical, I find this to be a funny
> debate. My hypothetical idea is better than your hypothetical idea.

Well, systemd is pretty real, and the code to do the unified cgroup
management within systemd is pretty complete. systemd is certainly not
hypothetical.

>> The least you could grant us is to have a look at the final APIs we will
>> have to offer before you already imply that systemd cannot be a valid
>> implementation of any API people could ever agree on.
>
> Whoah, don't get defensive. I said nothing of the sort. The fact of
> the matter is that we do not run systemd, at least in part because of
> the monolithic nature. That's unlikely to change in this timescale.

Oh, my. I am not sure what makes you think it is monolithic.

> What I said was that it would be a shame if we had to invent our own
> low-level cgroup daemon just because the "upstream" daemons was too
> tightly coupled with systemd.

I have no interest to reimplement systemd as a library, just to make you
happy... I am quite happy with what we already have....

> This is supposed to be collaborative, not combative.

It certainly sounds *very* differently in what you are writing.

Lennart
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


thockin at hockin

Jun 30, 2013, 11:06 PM

Post #46 of 54 (2424 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

On Sun, Jun 30, 2013 at 12:39 PM, Lennart Poettering
<lpoetter [at] redhat> wrote:
> Heya,
>
>
> On 29.06.2013 05:05, Tim Hockin wrote:
>>
>> Come on, now, Lennart. You put a lot of words in my mouth.
>
>
>>> I for sure am not going to make the PID 1 a client of another daemon.
>>> That's
>>> just wrong. If you have a daemon that is both conceptually the manager of
>>> another service and the client of that other service, then that's bad
>>> design
>>> and you will easily run into deadlocks and such. Just think about it: if
>>> you
>>> have some external daemon for managing cgroups, and you need cgroups for
>>> running external daemons, how are you going to start the external daemon
>>> for
>>> managing cgroups? Sure, you can hack around this, make that daemon
>>> special,
>>> and magic, and stuff -- or you can just not do such nonsense. There's no
>>> reason to repeat the fuckup that cgroup became in kernelspace a second
>>> time,
>>> but this time in userspace, with multiple manager daemons all with
>>> different
>>> and slightly incompatible definitions what a unit to manage actualy is...
>>
>>
>> I forgot about the tautology of systemd. systemd is monolithic.
>
>
> systemd is certainly not monolithic for almost any definition of that term.
> I am not sure where you are taking that from, and I am not sure I want to
> discuss on that level. This just sounds like FUD you picked up somewhere and
> are repeating carelessly...

It does a number of sort-of-related things. Maybe it does them better
by doing them together. I can't say, really. We don't use it at
work, and I am on Ubuntu elsewhere, for now.

>> But that's not my point. It seems pretty easy to make this cgroup
>> management (in "native mode") a library that can have either a thin
>> veneer of a main() function, while also being usable by systemd. The
>> point is to solve all of the problems ONCE. I'm trying to make the
>> case that systemd itself should be focusing on features and policies
>> and awesome APIs.
>
> You know, getting this all right isn't easy. If you want to do things
> properly, then you need to propagate attribute changes between the units you
> manage. You also need something like a scheduler, since a number of
> controllers can only be configured under certain external conditions (for
> example: the blkio or devices controller use major/minor parameters for
> configuring per-device limits. Since major/minor assignments are pretty much
> unpredictable these days -- and users probably want to configure things with
> friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
> wait for devices to show up before we can configure the parameters.) Soo...
> you need a graph of units, where you can propagate things, and schedule
> things based on some execution/event queue. And the propagation and
> scheduling are closely intermingled.

I'm really just talking about the most basic low-level substrate of
writing to cgroupfs. Again, we don't use udev (yet?) so we don't have
these problems. It seems to me that it's possible to formulate a
bottom layer that is usable by both systemd and non-systemd systems.
But, you know, maybe I am wrong and our internal universe is so much
simpler (and behind the times) than the rest of the world that
layering can work for us and not you.

> Now, that's pretty much exactly what systemd actually *is*. It implements a
> graph of units with a scheduler. And if you rip that part out of systemd to
> make this an "easy cgroup management library", then you simply turn what
> systemd is into a library without leaving anything. Which is just bogus.
>
> So no, if you say "seems pretty easy to make this cgroup management a
> library" then well, I have to disagree with you.
>
>
>>> We want to run fewer, simpler things on our systems, we want to reuse as
>>
>>
>> Fewer and simpler are not compatible, unless you are losing
>> functionality. Systemd is fewer, but NOT simpler.
>
>
> Oh, certainly it is. If we'd split up the cgroup fs access into separate
> daemon of some kind, then we'd need some kind of IPC for that, and so you
> have more daemons and you have some complex IPC between the processes. So
> yeah, the systemd approach is certainly both simpler and uses fewer daemons
> then your hypothetical one.

Well, it SOUNDS like Serge is trying to develop this to demonstrate
that a standalone daemon works. That's what I am keen to help with
(or else we have to invent ourselves). I am not really afraid of IPC
or of "more daemons". I much prefer simple agents doing one thing and
interacting with each other in simple ways. But that's me.

>>> much of the code as we can. You don't achieve that by running yet another
>>> daemon that does worse what systemd can anyway do simpler, easier and
>>> better.
>>
>>
>> Considering this is all hypothetical, I find this to be a funny
>> debate. My hypothetical idea is better than your hypothetical idea.
>
>
> Well, systemd is pretty real, and the code to do the unified cgroup
> management within systemd is pretty complete. systemd is certainly not
> hypothetical.

Fair enough - I did not realize you had already done all the work that
Serge is just starting out on.

>>> The least you could grant us is to have a look at the final APIs we will
>>> have to offer before you already imply that systemd cannot be a valid
>>> implementation of any API people could ever agree on.
>>
>>
>> Whoah, don't get defensive. I said nothing of the sort. The fact of
>> the matter is that we do not run systemd, at least in part because of
>> the monolithic nature. That's unlikely to change in this timescale.
>
>
> Oh, my. I am not sure what makes you think it is monolithic.

It is not a replacement for any one thing. It is a replacement for a
handful of things that we are not keen to change all at once. That's
all. I have not personally looked at what subsystems are able to be
compiled-out so we could do an incremental changeover, though, so
maybe it can work in different modes? I don't know. I am not
pursuing this anyway, so I am not the person to convince, regardless.

>> What I said was that it would be a shame if we had to invent our own
>> low-level cgroup daemon just because the "upstream" daemons was too
>> tightly coupled with systemd.
>
>
> I have no interest to reimplement systemd as a library, just to make you
> happy... I am quite happy with what we already have....
>
>
>> This is supposed to be collaborative, not combative.
>
>
> It certainly sounds *very* differently in what you are writing.

Sorry, then. No offense intended. I'm just looking for opportunities
to not-replicate work, if this whole model is going to be thrust upon
me.

Tim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tglx at linutronix

Jul 2, 2013, 4:57 PM

Post #47 of 54 (2429 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

Lennart,

On Sun, 30 Jun 2013, Lennart Poettering wrote:
> On 29.06.2013 05:05, Tim Hockin wrote:
> > But that's not my point. It seems pretty easy to make this cgroup
> > management (in "native mode") a library that can have either a thin
> > veneer of a main() function, while also being usable by systemd. The
> > point is to solve all of the problems ONCE. I'm trying to make the
> > case that systemd itself should be focusing on features and policies
> > and awesome APIs.
>
> You know, getting this all right isn't easy. If you want to do things
> properly, then you need to propagate attribute changes between the units you
> manage. You also need something like a scheduler, since a number of
> controllers can only be configured under certain external conditions (for
> example: the blkio or devices controller use major/minor parameters for
> configuring per-device limits. Since major/minor assignments are pretty much
> unpredictable these days -- and users probably want to configure things with
> friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
> wait for devices to show up before we can configure the parameters.) Soo...
> you need a graph of units, where you can propagate things, and schedule things
> based on some execution/event queue. And the propagation and scheduling are
> closely intermingled.

you are confusing policy and mechanisms.

The access to cgroupfs is mechanism.

The propagation of changes, the scheduling of cgroupfs access and
the correlation to external conditions are policy.

What Tim is asking for is to have a common interface, i.e. a library
which implements the low level access to the cgroupfs mechanism
without imposing systemd defined policies to it (It might implement a
set of common useful policies, but that's a different discussion).

That's definitely not an unreasonable request, because he wants to
implement his own set of policies which are not necessarily the same
as those which are implemented by systemd.

You are simply ignoring the fact, that Linux is used in other ways
than those which you are focussed on. That's true for Google's way to
manage its gazillion machines and that's equally true for the other
end of the spectrum which is deep embedded or any other specialized
use case. Just face it: running Linux on your laptop and on some RHT
lab machines is covering about 1% of the use cases.

Nevertheless you repeatedly claim, that systemd is the only way to
deal with system startup and system management, is covering _ALL_ use
cases and the interfaces you expose are sufficient.

Did you ever work on specialized embedded or big data use cases? I
really doubt that, but I might be wrong as usual.

So I invite you to prove that you can beat an existing setup for an
automotive use case with your magic systemd foo. I refund you fully,
if you can beat the mark of a functional system less than 800ms after
reset release on a 200MHz ARM machine. Functional is defined by the
use case requirements and means:

- Basic cgroups management working
- GUI up and running
- Main communication interface (CAN bus) up and running

The rest of the system is starting up after that including a more
complex cgroup management.

According to your claim that systemd is covering everything and some
more, this should take you a few hours. I grant you a full week to
work on that.

The use case Tim is talking about is different, but has similar
constraints which are completely driven by his particular use case
scenario. I'm sure, that Tim can persuade his management to setup a
similar contest to prove your expertise on the other extreme of the
Linux world.

Before answering please think about the relevance of your statements
"getting this all right isn't easy", "something like a scheduler",
"users probably want ..." and "stable /dev/disk/by-id/* symlinks" in
those contexts.

Thanks,

tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


kay at vrfy

Jul 2, 2013, 5:44 PM

Post #48 of 54 (2425 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

On Wed, Jul 3, 2013 at 1:57 AM, Thomas Gleixner <tglx [at] linutronix> wrote:
> On Sun, 30 Jun 2013, Lennart Poettering wrote:
>> On 29.06.2013 05:05, Tim Hockin wrote:
>> > But that's not my point. It seems pretty easy to make this cgroup
>> > management (in "native mode") a library that can have either a thin
>> > veneer of a main() function, while also being usable by systemd. The
>> > point is to solve all of the problems ONCE. I'm trying to make the
>> > case that systemd itself should be focusing on features and policies
>> > and awesome APIs.
>>
>> You know, getting this all right isn't easy. If you want to do things
>> properly, then you need to propagate attribute changes between the units you
>> manage. You also need something like a scheduler, since a number of
>> controllers can only be configured under certain external conditions (for
>> example: the blkio or devices controller use major/minor parameters for
>> configuring per-device limits. Since major/minor assignments are pretty much
>> unpredictable these days -- and users probably want to configure things with
>> friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
>> wait for devices to show up before we can configure the parameters.) Soo...
>> you need a graph of units, where you can propagate things, and schedule things
>> based on some execution/event queue. And the propagation and scheduling are
>> closely intermingled.
>
> you are confusing policy and mechanisms.
>
> The access to cgroupfs is mechanism.
>
> The propagation of changes, the scheduling of cgroupfs access and
> the correlation to external conditions are policy.
>
> What Tim is asking for is to have a common interface, i.e. a library
> which implements the low level access to the cgroupfs mechanism
> without imposing systemd defined policies to it (It might implement a
> set of common useful policies, but that's a different discussion).
>
> That's definitely not an unreasonable request, because he wants to
> implement his own set of policies which are not necessarily the same
> as those which are implemented by systemd.
>
> You are simply ignoring the fact, that Linux is used in other ways
> than those which you are focussed on. That's true for Google's way to
> manage its gazillion machines and that's equally true for the other
> end of the spectrum which is deep embedded or any other specialized
> use case. Just face it: running Linux on your laptop and on some RHT
> lab machines is covering about 1% of the use cases.
>
> Nevertheless you repeatedly claim, that systemd is the only way to
> deal with system startup and system management, is covering _ALL_ use
> cases and the interfaces you expose are sufficient.
>
> Did you ever work on specialized embedded or big data use cases? I
> really doubt that, but I might be wrong as usual.
>
> So I invite you to prove that you can beat an existing setup for an
> automotive use case with your magic systemd foo. I refund you fully,
> if you can beat the mark of a functional system less than 800ms after
> reset release on a 200MHz ARM machine. Functional is defined by the
> use case requirements and means:
>
> - Basic cgroups management working
> - GUI up and running
> - Main communication interface (CAN bus) up and running
>
> The rest of the system is starting up after that including a more
> complex cgroup management.
>
> According to your claim that systemd is covering everything and some
> more, this should take you a few hours. I grant you a full week to
> work on that.
>
> The use case Tim is talking about is different, but has similar
> constraints which are completely driven by his particular use case
> scenario. I'm sure, that Tim can persuade his management to setup a
> similar contest to prove your expertise on the other extreme of the
> Linux world.
>
> Before answering please think about the relevance of your statements
> "getting this all right isn't easy", "something like a scheduler",
> "users probably want ..." and "stable /dev/disk/by-id/* symlinks" in
> those contexts.

I don't think anybody needs your money.

But it's sure an improvement over last time when you wanted to use a
"Kantholz" to make your statement.

Thanks,
Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


bp at alien8

Jul 3, 2013, 12:37 AM

Post #49 of 54 (2424 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

On Wed, Jul 03, 2013 at 02:44:31AM +0200, Kay Sievers wrote:
> I don't think anybody needs your money.
>
> But it's sure an improvement over last time when you wanted to use a
> "Kantholz" to make your statement.

Kantholz, frozen sharks, whatever helps get the real point across. Hint:
this is not at all about the money.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


tglx at linutronix

Jul 3, 2013, 2:30 AM

Post #50 of 54 (2418 views)
Permalink
Re: cgroup: status-quo and userland efforts [In reply to]

On Wed, 3 Jul 2013, Kay Sievers wrote:
> On Wed, Jul 3, 2013 at 1:57 AM, Thomas Gleixner <tglx [at] linutronix> wrote:
> > Before answering please think about the relevance of your statements
> > "getting this all right isn't easy", "something like a scheduler",
> > "users probably want ..." and "stable /dev/disk/by-id/* symlinks" in
> > those contexts.
>
> I don't think anybody needs your money.

Thanks for your well thought out technical argument.

> But it's sure an improvement over last time when you wanted to use a
> "Kantholz" to make your statement.

Using an out of context snippet from a private conversation at the bar
to answer a technical argument is definitely proving your point.

Thanks,

tglx



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First page Previous page 1 2 3 Next page Last page  View All Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.