Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

Attaching a process to cgroups

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


renton at renton

Jun 19, 2012, 11:58 AM

Post #1 of 20 (176 views)
Permalink
Attaching a process to cgroups

Hi.

Is it possible to somehow fasten a process of pid attaching to cgroup?
The problem is the pid attaches to a task-file with some strange delay:

22:28:00.788224 open("/sys/fs/cgroup/memory/virtwww/w_test-l24-apache1_4bdf3d13/apache/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000035>
22:28:00.788289 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
22:28:00.788326 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e78074000 <0.000005>
22:28:00.788355 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
22:28:00.788389 lseek(3, 0, SEEK_SET) = 0 <0.000004>
22:28:00.788426 write(3, "16317\n", 6) = 6 <0.128094>
22:28:00.916578 close(3) = 0 <0.000006>

For a comparison here's a test attaching pid-file in placed tmpfs:

22:24:41.892562 open("/tmp/w_test-l24-apache1_4bdf3d13/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000010>
22:24:41.892597 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
22:24:41.892631 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5685b6f000 <0.000006>
22:24:41.892664 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
22:24:41.892701 lseek(3, 6, SEEK_SET) = 6 <0.000004>
22:24:41.892738 write(3, "25966\n", 6) = 6 <0.000008>
22:24:41.892767 close(3) = 0 <0.000005>

Here goes it immediately.

--
BRGDS. Alexey Vlasov.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


nishimura at mxp

Jun 19, 2012, 8:34 PM

Post #2 of 20 (170 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

Hi.

What does "cat /sys/fs/cgroup/.../apache/memory.move_charge_at_immigrate" show ?

If it shows non-zero value, you can make the pid attachment faster by writing "0" to
memory.move_charge_at_immigrate before attaching the process.
But note that if you disable the feature, current memory usage of the process is not
moved to the new cgroup.

Thanks,
Daisuke Nishimura.

On Tue, 19 Jun 2012 22:58:56 +0400
Alexey Vlasov <renton [at] renton> wrote:

> Hi.
>
> Is it possible to somehow fasten a process of pid attaching to cgroup?
> The problem is the pid attaches to a task-file with some strange delay:
>
> 22:28:00.788224 open("/sys/fs/cgroup/memory/virtwww/w_test-l24-apache1_4bdf3d13/apache/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000035>
> 22:28:00.788289 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788326 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e78074000 <0.000005>
> 22:28:00.788355 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788389 lseek(3, 0, SEEK_SET) = 0 <0.000004>
> 22:28:00.788426 write(3, "16317\n", 6) = 6 <0.128094>
> 22:28:00.916578 close(3) = 0 <0.000006>
>
> For a comparison here's a test attaching pid-file in placed tmpfs:
>
> 22:24:41.892562 open("/tmp/w_test-l24-apache1_4bdf3d13/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000010>
> 22:24:41.892597 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
> 22:24:41.892631 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5685b6f000 <0.000006>
> 22:24:41.892664 fstat(3, {st_mode=S_IFREG|0644, st_size=6, ...}) = 0 <0.000004>
> 22:24:41.892701 lseek(3, 6, SEEK_SET) = 6 <0.000004>
> 22:24:41.892738 write(3, "25966\n", 6) = 6 <0.000008>
> 22:24:41.892767 close(3) = 0 <0.000005>
>
> Here goes it immediately.
>
> --
> BRGDS. Alexey Vlasov.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo [at] vger
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


renton at renton

Jun 20, 2012, 4:08 AM

Post #3 of 20 (170 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Wed, Jun 20, 2012 at 12:34:31PM +0900, Daisuke Nishimura wrote:
>
> What does "cat /sys/fs/cgroup/.../apache/memory.move_charge_at_immigrate" show ?

Here I have zero.

By the way such a delay is not only in memory controller but also in
others - in cpu, blkio for example.

--
BRGDS. Alexey Vlasov.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


efault at gmx

Jun 20, 2012, 5:28 AM

Post #4 of 20 (171 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Tue, 2012-06-19 at 22:58 +0400, Alexey Vlasov wrote:
> Hi.
>
> Is it possible to somehow fasten a process of pid attaching to cgroup?
> The problem is the pid attaches to a task-file with some strange delay:
>
> 22:28:00.788224 open("/sys/fs/cgroup/memory/virtwww/w_test-l24-apache1_4bdf3d13/apache/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000035>
> 22:28:00.788289 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788326 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e78074000 <0.000005>
> 22:28:00.788355 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> 22:28:00.788389 lseek(3, 0, SEEK_SET) = 0 <0.000004>
> 22:28:00.788426 write(3, "16317\n", 6) = 6 <0.128094>
> 22:28:00.916578 close(3) = 0 <0.000006>


kernel/cgroup.c::cgroup_attach_task()
{
...
synchronize_rcu();
...
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


renton at renton

Jun 21, 2012, 12:54 AM

Post #5 of 20 (164 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
>
> kernel/cgroup.c::cgroup_attach_task()
> {
> ...
> synchronize_rcu();
> ...
> }

So nothing can be done here? (I mean if only I knew how to fix it I
wouldn't ask about it ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


efault at gmx

Jun 21, 2012, 1:23 AM

Post #6 of 20 (163 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> >
> > kernel/cgroup.c::cgroup_attach_task()
> > {
> > ...
> > synchronize_rcu();
> > ...
> > }
>
> So nothing can be done here? (I mean if only I knew how to fix it I
> wouldn't ask about it ;)

Sure, kill the obnoxious thing, it's sitting right in the middle of the
userspace interface.

I banged on it a while back (wrt explosive android patches), extracted
RCU from the userspace interface. It seemed to work great, much faster,
couldn't make it explode. I wouldn't bet anything I wasn't willing to
immediately part with that the result was really really safe though ;-)

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


efault at gmx

Jun 21, 2012, 1:26 AM

Post #7 of 20 (164 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Thu, 2012-06-21 at 10:23 +0200, Mike Galbraith wrote:
> On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > >
> > > kernel/cgroup.c::cgroup_attach_task()
> > > {
> > > ...
> > > synchronize_rcu();
> > > ...
> > > }
> >
> > So nothing can be done here? (I mean if only I knew how to fix it I
> > wouldn't ask about it ;)
>
> Sure, kill the obnoxious thing, it's sitting right in the middle of the
> userspace interface.

Um, lest anyone misunderstand, no, I don't mean by whacking instances of
synchronize_rcu() without further ado :)

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


paulmck at linux

Jun 26, 2012, 11:06 AM

Post #8 of 20 (159 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > >
> > > kernel/cgroup.c::cgroup_attach_task()
> > > {
> > > ...
> > > synchronize_rcu();
> > > ...
> > > }
> >
> > So nothing can be done here? (I mean if only I knew how to fix it I
> > wouldn't ask about it ;)
>
> Sure, kill the obnoxious thing, it's sitting right in the middle of the
> userspace interface.
>
> I banged on it a while back (wrt explosive android patches), extracted
> RCU from the userspace interface. It seemed to work great, much faster,
> couldn't make it explode. I wouldn't bet anything I wasn't willing to
> immediately part with that the result was really really safe though ;-)

Or replace it with synchronize_rcu_expedited(). You can "get lucky"
for quite some time removing synchronize_rcu() calls!

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


efault at gmx

Jun 27, 2012, 12:23 AM

Post #9 of 20 (159 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Tue, 2012-06-26 at 11:06 -0700, Paul E. McKenney wrote:
> On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> > On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> > > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > > >
> > > > kernel/cgroup.c::cgroup_attach_task()
> > > > {
> > > > ...
> > > > synchronize_rcu();
> > > > ...
> > > > }
> > >
> > > So nothing can be done here? (I mean if only I knew how to fix it I
> > > wouldn't ask about it ;)
> >
> > Sure, kill the obnoxious thing, it's sitting right in the middle of the
> > userspace interface.
> >
> > I banged on it a while back (wrt explosive android patches), extracted
> > RCU from the userspace interface. It seemed to work great, much faster,
> > couldn't make it explode. I wouldn't bet anything I wasn't willing to
> > immediately part with that the result was really really safe though ;-)
>
> Or replace it with synchronize_rcu_expedited(). You can "get lucky"
> for quite some time removing synchronize_rcu() calls!

s/remove/replace, but yup. A company that wanted to use the android
patches plus my tinkering showed a fix they needed on top to close a
race discovered in their testing. So yeah, even when all seems fine,
extracting synchronize_rcu() may expose evils you couldn't encounter
before, and didn't happen to encounter afterward.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


paulmck at linux

Jun 27, 2012, 10:10 AM

Post #10 of 20 (155 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Wed, Jun 27, 2012 at 09:23:31AM +0200, Mike Galbraith wrote:
> On Tue, 2012-06-26 at 11:06 -0700, Paul E. McKenney wrote:
> > On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> > > On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> > > > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > > > >
> > > > > kernel/cgroup.c::cgroup_attach_task()
> > > > > {
> > > > > ...
> > > > > synchronize_rcu();
> > > > > ...
> > > > > }
> > > >
> > > > So nothing can be done here? (I mean if only I knew how to fix it I
> > > > wouldn't ask about it ;)
> > >
> > > Sure, kill the obnoxious thing, it's sitting right in the middle of the
> > > userspace interface.
> > >
> > > I banged on it a while back (wrt explosive android patches), extracted
> > > RCU from the userspace interface. It seemed to work great, much faster,
> > > couldn't make it explode. I wouldn't bet anything I wasn't willing to
> > > immediately part with that the result was really really safe though ;-)
> >
> > Or replace it with synchronize_rcu_expedited(). You can "get lucky"
> > for quite some time removing synchronize_rcu() calls!
>
> s/remove/replace, but yup. A company that wanted to use the android
> patches plus my tinkering showed a fix they needed on top to close a
> race discovered in their testing. So yeah, even when all seems fine,
> extracting synchronize_rcu() may expose evils you couldn't encounter
> before, and didn't happen to encounter afterward.

I really did mean "remove". Removing a synchronize_rcu() does result
in a race, but often an extremely low-probability race. So you can
remove a synchronize_rcu() and get lucky for a long time, but sooner
or later, something will explode.

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


efault at gmx

Jun 27, 2012, 7:40 PM

Post #11 of 20 (152 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Wed, 2012-06-27 at 10:10 -0700, Paul E. McKenney wrote:

> I really did mean "remove". Removing a synchronize_rcu() does result
> in a race, but often an extremely low-probability race. So you can
> remove a synchronize_rcu() and get lucky for a long time, but sooner
> or later, something will explode.

Ah. Android patches did "replace", and it still didn't take long at all
for bad luck to happen.. so "remove" for that particular instance would
probably result in "sooner" variety explosions :)

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


righi.andrea at gmail

Jul 23, 2012, 1:41 PM

Post #12 of 20 (96 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > >
> > > kernel/cgroup.c::cgroup_attach_task()
> > > {
> > > ...
> > > synchronize_rcu();
> > > ...
> > > }
> >
> > So nothing can be done here? (I mean if only I knew how to fix it I
> > wouldn't ask about it ;)
>
> Sure, kill the obnoxious thing, it's sitting right in the middle of the
> userspace interface.
>
> I banged on it a while back (wrt explosive android patches), extracted
> RCU from the userspace interface. It seemed to work great, much faster,
> couldn't make it explode. I wouldn't bet anything I wasn't willing to
> immediately part with that the result was really really safe though ;-)
>
> -Mike

JFYI,

I'm testing the following patch in a bunch of hosts and I wasn't able to
make any of them to explode, even running a multi-threaded
cgroup-intensive workload, but probably I was just lucky (or unlucky,
depending on the point of view).

It is basically the same Not-signed-off-by work posted by Mike a while
ago: https://lkml.org/lkml/2011/4/12/599.

In addition, I totally removed the synchronize_rcu() call from
cgroup_attach_task() and added the call_rcu -> schedule_work removal
also for css_set. The latter looks unnecessary to me from a logical
point of view, or maybe I'm missing something, because I can't explain
why with it I can't trigger any BUG / oops.

Mike, did you make any progress from your old patch?

Thanks,
-Andrea

---
include/linux/cgroup.h | 2 ++
kernel/cgroup.c | 91 +++++++++++++++++++++++++++++-------------------
2 files changed, 58 insertions(+), 35 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index d3f5fba..2bab2d6 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -212,6 +212,7 @@ struct cgroup {

/* For RCU-protected deletion */
struct rcu_head rcu_head;
+ struct work_struct work;

/* List of events which userspace want to receive */
struct list_head event_list;
@@ -260,6 +261,7 @@ struct css_set {

/* For RCU-protected deletion */
struct rcu_head rcu_head;
+ struct work_struct work;
};

/*
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index b303dfc..aa7cc06 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -396,6 +396,21 @@ static struct hlist_head *css_set_hash(struct cgroup_subsys_state *css[])
* compiled into their kernel but not actually in use */
static int use_task_css_set_links __read_mostly;

+static void free_css_set_work(struct work_struct *work)
+{
+ struct css_set *cg = container_of(work, struct css_set, work);
+
+ kfree(cg);
+}
+
+static void free_css_set_rcu(struct rcu_head *obj)
+{
+ struct css_set *cg = container_of(obj, struct css_set, rcu_head);
+
+ INIT_WORK(&cg->work, free_css_set_work);
+ schedule_work(&cg->work);
+}
+
static void __put_css_set(struct css_set *cg, int taskexit)
{
struct cg_cgroup_link *link;
@@ -433,7 +448,7 @@ static void __put_css_set(struct css_set *cg, int taskexit)
}

write_unlock(&css_set_lock);
- kfree_rcu(cg, rcu_head);
+ call_rcu(&cg->rcu_head, free_css_set_rcu);
}

/*
@@ -875,44 +890,52 @@ static int cgroup_call_pre_destroy(struct cgroup *cgrp)
return ret;
}

-static void cgroup_diput(struct dentry *dentry, struct inode *inode)
+static void free_cgroup_work(struct work_struct *work)
{
- /* is dentry a directory ? if so, kfree() associated cgroup */
- if (S_ISDIR(inode->i_mode)) {
- struct cgroup *cgrp = dentry->d_fsdata;
- struct cgroup_subsys *ss;
- BUG_ON(!(cgroup_is_removed(cgrp)));
- /* It's possible for external users to be holding css
- * reference counts on a cgroup; css_put() needs to
- * be able to access the cgroup after decrementing
- * the reference count in order to know if it needs to
- * queue the cgroup to be handled by the release
- * agent */
- synchronize_rcu();
+ struct cgroup *cgrp = container_of(work, struct cgroup, work);
+ struct cgroup_subsys *ss;

- mutex_lock(&cgroup_mutex);
- /*
- * Release the subsystem state objects.
- */
- for_each_subsys(cgrp->root, ss)
- ss->destroy(cgrp);
+ mutex_lock(&cgroup_mutex);
+ /*
+ * Release the subsystem state objects.
+ */
+ for_each_subsys(cgrp->root, ss)
+ ss->destroy(cgrp);

- cgrp->root->number_of_cgroups--;
- mutex_unlock(&cgroup_mutex);
+ cgrp->root->number_of_cgroups--;
+ mutex_unlock(&cgroup_mutex);

- /*
- * Drop the active superblock reference that we took when we
- * created the cgroup
- */
- deactivate_super(cgrp->root->sb);
+ /*
+ * Drop the active superblock reference that we took when we
+ * created the cgroup
+ */
+ deactivate_super(cgrp->root->sb);

- /*
- * if we're getting rid of the cgroup, refcount should ensure
- * that there are no pidlists left.
- */
- BUG_ON(!list_empty(&cgrp->pidlists));
+ /*
+ * if we're getting rid of the cgroup, refcount should ensure
+ * that there are no pidlists left.
+ */
+ BUG_ON(!list_empty(&cgrp->pidlists));
+
+ kfree(cgrp);
+}
+
+static void free_cgroup_rcu(struct rcu_head *obj)
+{
+ struct cgroup *cgrp = container_of(obj, struct cgroup, rcu_head);
+
+ INIT_WORK(&cgrp->work, free_cgroup_work);
+ schedule_work(&cgrp->work);
+}
+
+static void cgroup_diput(struct dentry *dentry, struct inode *inode)
+{
+ /* is dentry a directory ? if so, kfree() associated cgroup */
+ if (S_ISDIR(inode->i_mode)) {
+ struct cgroup *cgrp = dentry->d_fsdata;

- kfree_rcu(cgrp, rcu_head);
+ BUG_ON(!(cgroup_is_removed(cgrp)));
+ call_rcu(&cgrp->rcu_head, free_cgroup_rcu);
} else {
struct cfent *cfe = __d_cfe(dentry);
struct cgroup *cgrp = dentry->d_parent->d_fsdata;
@@ -1990,8 +2013,6 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
ss->attach(cgrp, &tset);
}

- synchronize_rcu();
-
/*
* wake up rmdir() waiter. the rmdir should fail since the cgroup
* is no longer empty.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


efault at gmx

Jul 23, 2012, 6:19 PM

Post #13 of 20 (96 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Mon, 2012-07-23 at 22:41 +0200, Andrea Righi wrote:
> On Thu, Jun 21, 2012 at 10:23:02AM +0200, Mike Galbraith wrote:
> > On Thu, 2012-06-21 at 11:54 +0400, Alexey Vlasov wrote:
> > > On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> > > >
> > > > kernel/cgroup.c::cgroup_attach_task()
> > > > {
> > > > ...
> > > > synchronize_rcu();
> > > > ...
> > > > }
> > >
> > > So nothing can be done here? (I mean if only I knew how to fix it I
> > > wouldn't ask about it ;)
> >
> > Sure, kill the obnoxious thing, it's sitting right in the middle of the
> > userspace interface.
> >
> > I banged on it a while back (wrt explosive android patches), extracted
> > RCU from the userspace interface. It seemed to work great, much faster,
> > couldn't make it explode. I wouldn't bet anything I wasn't willing to
> > immediately part with that the result was really really safe though ;-)
> >
> > -Mike
>
> JFYI,
>
> I'm testing the following patch in a bunch of hosts and I wasn't able to
> make any of them to explode, even running a multi-threaded
> cgroup-intensive workload, but probably I was just lucky (or unlucky,
> depending on the point of view).
>
> It is basically the same Not-signed-off-by work posted by Mike a while
> ago: https://lkml.org/lkml/2011/4/12/599.
>
> In addition, I totally removed the synchronize_rcu() call from
> cgroup_attach_task() and added the call_rcu -> schedule_work removal
> also for css_set. The latter looks unnecessary to me from a logical
> point of view, or maybe I'm missing something, because I can't explain
> why with it I can't trigger any BUG / oops.
>
> Mike, did you make any progress from your old patch?

No, it worked, but I couldn't prove it was really safe, so let it drop.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


renton at renton

Jul 25, 2012, 6:36 AM

Post #14 of 20 (96 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

Hi.

Now I've got almost 5k used groups and it got even worse. Now I've got
almost 5k used groups and it got even worse.

If only write was working slower, now everything connected with cgroups
is hardly working.

Could it be connected with synchronize_rcu()?

Hanging on read():

# strace -ttT cat /proc/cgroups

17:30:43.825005 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0 <0.000005>
17:30:43.825048 open("/proc/cgroups", O_RDONLY) = 3 <0.000014>
17:30:43.825085 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000004>
17:30:43.825125 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000005>
17:30:43.825161 read(3, "#subsys_name\thierarchy\tnum_cgrou"..., 32768) = 112 <7.447084>
17:30:51.272302 write(1, "#subsys_name\thierarchy\tnum_cgrou"..., 112#subsys_name hierarchy num_cgroups enabled
cpuacct 1 16844 1
memory 2 16844 1
devices 3 16844 1
blkio 4 16844 1
) = 112 <0.000018>
17:30:51.272363 read(3, "", 32768) = 0 <0.000007>
17:30:51.272405 brk(0x62d000) = 0x62d000 <0.000011>
17:30:51.272444 close(3) = 0 <0.000011>
17:30:51.272488 close(1) = 0 <0.000007>
17:30:51.272518 close(2) = 0 <0.000005>

On Wed, Jun 20, 2012 at 02:28:18PM +0200, Mike Galbraith wrote:
> On Tue, 2012-06-19 at 22:58 +0400, Alexey Vlasov wrote:

> > Is it possible to somehow fasten a process of pid attaching to cgroup?
> > The problem is the pid attaches to a task-file with some strange delay:
> >
> > 22:28:00.788224 open("/sys/fs/cgroup/memory/virtwww/w_test-l24-apache1_4bdf3d13/apache/tasks", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3 <0.000035>
> > 22:28:00.788289 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> > 22:28:00.788326 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5e78074000 <0.000005>
> > 22:28:00.788355 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 <0.000004>
> > 22:28:00.788389 lseek(3, 0, SEEK_SET) = 0 <0.000004>
> > 22:28:00.788426 write(3, "16317\n", 6) = 6 <0.128094>
> > 22:28:00.916578 close(3) = 0 <0.000006>
>
>
> kernel/cgroup.c::cgroup_attach_task()
> {
> ...
> synchronize_rcu();
> ...
> }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


efault at gmx

Jul 25, 2012, 6:57 AM

Post #15 of 20 (96 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Wed, 2012-07-25 at 17:36 +0400, Alexey Vlasov wrote:
> Hi.
>
> Now I've got almost 5k used groups and it got even worse. Now I've got
> almost 5k used groups and it got even worse.
>
> If only write was working slower, now everything connected with cgroups
> is hardly working.
>
> Could it be connected with synchronize_rcu()?

I'd profile it with perf, and expect to find a large pile of cycles.

> Hanging on read():
>
> # strace -ttT cat /proc/cgroups
>
> 17:30:43.825005 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0 <0.000005>
> 17:30:43.825048 open("/proc/cgroups", O_RDONLY) = 3 <0.000014>
> 17:30:43.825085 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000004>
> 17:30:43.825125 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000005>
> 17:30:43.825161 read(3, "#subsys_name\thierarchy\tnum_cgrou"..., 32768) = 112 <7.447084>

Ew.. zillion cgroups is definitely a bad idea.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


renton at renton

Jul 26, 2012, 6:02 AM

Post #16 of 20 (96 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Wed, Jul 25, 2012 at 03:57:47PM +0200, Mike Galbraith wrote:
>
> I'd profile it with perf, and expect to find a large pile of cycles.

I did it the as following:
# perf stat cat /proc/self/cgroup

4:blkio:/
3:devices:/
2:memory:/
1:cpuacct:/

Performance counter stats for 'cat /proc/self/cgroup':

0.472513 task-clock # 0.000 CPUs utilized
1 context-switches # 0.002 M/sec
1 CPU-migrations # 0.002 M/sec
169 page-faults # 0.358 M/sec
1111521 cycles # 2.352 GHz
784737 stalled-cycles-frontend # 70.60% frontend cycles idle
445520 stalled-cycles-backend # 40.08% backend cycles idle
576622 instructions # 0.52 insns per cycle
# 1.36 stalled cycles per insn
120032 branches # 254.029 M/sec
6577 branch-misses # 5.48% of all branches

9.114631804 seconds time elapsed

# perf report --sort comm,dso
Kernel address maps (/proc/{kallsyms,modules}) were restricted.

Check /proc/sys/kernel/kptr_restrict before running 'perf record'.

If some relocation was applied (e.g. kexec) symbols may be misresolved.

Samples in kernel modules can't be resolved as well.

# ========
# captured on: Thu Jul 26 16:23:06 2012
# hostname : l24
# os release : 3.3.3-1gb-c-s-m
# perf version : 3.2
# arch : x86_64
# nrcpus online : 24
# nrcpus avail : 24
# cpudesc : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
# total memory : 74181032 kB
# cmdline : /usr/sbin/perf record cat /proc/self/cgroup
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1758, 1759, 1760, 1761, 1762, 1763, 1764, 1765, 1766, 1767, 1768, 1769, 1770, 1771, 1772, 1773, 1774, 1775, 1776, 1777, 1778, 1779, 1780, 1781 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Events: 21 cycles
#
# Overhead Command Shared Object
# ........ ....... .................
#
100.00% cat [kernel.kallsyms]

but I don't know what next unfortunately.

I also checked the same thing on the other server with the 2.6.37 kernel,
there' some thousands cgroups too, and it somehow works there immediately.

--
Alexey Vlasov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


efault at gmx

Jul 26, 2012, 7:44 AM

Post #17 of 20 (97 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Thu, 2012-07-26 at 17:02 +0400, Alexey Vlasov wrote:
> On Wed, Jul 25, 2012 at 03:57:47PM +0200, Mike Galbraith wrote:
> >
> > I'd profile it with perf, and expect to find a large pile of cycles.
>
> I did it the as following:
> # perf stat cat /proc/self/cgroup
>
> 4:blkio:/
> 3:devices:/
> 2:memory:/
> 1:cpuacct:/
>
> Performance counter stats for 'cat /proc/self/cgroup':
>
> 0.472513 task-clock # 0.000 CPUs utilized
> 1 context-switches # 0.002 M/sec
> 1 CPU-migrations # 0.002 M/sec
> 169 page-faults # 0.358 M/sec
> 1111521 cycles # 2.352 GHz
> 784737 stalled-cycles-frontend # 70.60% frontend cycles idle
> 445520 stalled-cycles-backend # 40.08% backend cycles idle
> 576622 instructions # 0.52 insns per cycle
> # 1.36 stalled cycles per insn
> 120032 branches # 254.029 M/sec
> 6577 branch-misses # 5.48% of all branches
>
> 9.114631804 seconds time elapsed

Sleepy box.

> # perf report --sort comm,dso

perf report --sort symbol,dso won't toss everything in one basket.

> Kernel address maps (/proc/{kallsyms,modules}) were restricted.
>
> Check /proc/sys/kernel/kptr_restrict before running 'perf record'.
>
> If some relocation was applied (e.g. kexec) symbols may be misresolved.
>
> Samples in kernel modules can't be resolved as well.
>
> # ========
> # captured on: Thu Jul 26 16:23:06 2012
> # hostname : l24
> # os release : 3.3.3-1gb-c-s-m
> # perf version : 3.2
> # arch : x86_64
> # nrcpus online : 24
> # nrcpus avail : 24
> # cpudesc : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
> # total memory : 74181032 kB
> # cmdline : /usr/sbin/perf record cat /proc/self/cgroup
>
> # event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 1758, 1759, 1760, 1761, 1762, 1763, 1764, 1765, 1766, 1767, 1768, 1769, 1770, 1771, 1772, 1773, 1774, 1775, 1776, 1777, 1778, 1779, 1780, 1781 }
> # HEADER_CPU_TOPOLOGY info available, use -I to display
> # HEADER_NUMA_TOPOLOGY info available, use -I to display
> # ========
> #
> # Events: 21 cycles
> #
> # Overhead Command Shared Object
> # ........ ....... .................
> #
> 100.00% cat [kernel.kallsyms]
>
> but I don't know what next unfortunately.

I'll have to pass. I would just stop creating thousands of cgroups ;-)

They've become a lot more scalable fairly recently, but if you populate
those thousands with frequently runnable tasks, I suspect you'll still
see huge truckloads of scheduler in profiles.. maybe even nothing else.

> I also checked the same thing on the other server with the 2.6.37 kernel,
> there' some thousands cgroups too, and it somehow works there immediately.




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


renton at renton

Aug 8, 2012, 9:40 AM

Post #18 of 20 (95 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Wed, Jul 25, 2012 at 03:57:47PM +0200, Mike Galbraith wrote:
> > Hanging on read():
> >
> > # strace -ttT cat /proc/cgroups
> >
> > 17:30:43.825005 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0 <0.000005>
> > 17:30:43.825048 open("/proc/cgroups", O_RDONLY) = 3 <0.000014>
> > 17:30:43.825085 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000004>
> > 17:30:43.825125 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000005>
> > 17:30:43.825161 read(3, "#subsys_name\thierarchy\tnum_cgrou"..., 32768) = 112 <7.447084>

In general I've changed it to synchronize_rcu_expedited () and all the
delays have gone both on writing and reading files from cgroups.

--
BRGDS. Alexey Vlasov.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


paulmck at linux

Aug 8, 2012, 9:51 AM

Post #19 of 20 (95 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Wed, Aug 08, 2012 at 08:40:33PM +0400, Alexey Vlasov wrote:
> On Wed, Jul 25, 2012 at 03:57:47PM +0200, Mike Galbraith wrote:
> > > Hanging on read():
> > >
> > > # strace -ttT cat /proc/cgroups
> > >
> > > 17:30:43.825005 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 13), ...}) = 0 <0.000005>
> > > 17:30:43.825048 open("/proc/cgroups", O_RDONLY) = 3 <0.000014>
> > > 17:30:43.825085 fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 <0.000004>
> > > 17:30:43.825125 fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 <0.000005>
> > > 17:30:43.825161 read(3, "#subsys_name\thierarchy\tnum_cgrou"..., 32768) = 112 <7.447084>
>
> In general I've changed it to synchronize_rcu_expedited () and all the
> delays have gone both on writing and reading files from cgroups.

Is the writing and reading from cgroups something that your workload
does all the time, or is it something that happens only on occasional
updates to your cgroup configuration?

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


renton at renton

Aug 10, 2012, 2:53 AM

Post #20 of 20 (94 views)
Permalink
Re: Attaching a process to cgroups [In reply to]

On Wed, Aug 08, 2012 at 09:51:29AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 08, 2012 at 08:40:33PM +0400, Alexey Vlasov wrote:
> >
> > In general I've changed it to synchronize_rcu_expedited () and all the
> > delays have gone both on writing and reading files from cgroups.
>
> Is the writing and reading from cgroups something that your workload
> does all the time, or is it something that happens only on occasional
> updates to your cgroup configuration?

There always were some delay on writing. It reproduces easily, you have
to create some 1000 groups (may be it can be enough to create 1 group, I
didn't check it actually) and write pid to a task file of the group. I
described it in my first message.

Delays on reading appeared when there began an active rotation of
proccesses in task files and may be by renewing of counters
(cpuacct.stat, memory.stat) due to the cgroups hierarchy. LA has grown
from 10 to 500 and all the programms that read cgroups files in /proc
(htop for example) practically stopped working.

--
BRGDS. Alexey Vlasov.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.