Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


nishimura at mxp

Nov 23, 2009, 9:57 PM

Post #1 of 8 (141 views)
Permalink
[BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy

task_in_mem_cgroup(), which is called by select_bad_process() to check whether
a task can be a candidate for being oom-killed from memcg's limit, checks
"curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).

But this check return true(it's false positive) when:

<some path>/00 use_hierarchy == 0 <- hitting limit
<some path>/00/aa use_hierarchy == 1 <- "curr"

This leads to killing an innocent task in 00/aa. This patch is a fix for this
bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
should print information of mem_cgroup which the task being killed, not current,
belongs to.

Signed-off-by: Daisuke Nishimura <nishimura [at] mxp>
---
The bug exists and should be fixed in 2.6.31.y too.
I'll post a patch for -stable later.

mm/memcontrol.c | 4 ++--
mm/oom_kill.c | 13 +++++++------
2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ea00a93..d02f9f8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -783,7 +783,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
task_unlock(task);
if (!curr)
return 0;
- if (curr->use_hierarchy)
+ if (mem->use_hierarchy)
ret = css_is_ancestor(&curr->css, &mem->css);
else
ret = (curr == mem);
@@ -1032,7 +1032,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
static char memcg_name[PATH_MAX];
int ret;

- if (!memcg)
+ if (!memcg || !p)
return;


diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ab04537..be56461 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -356,7 +356,8 @@ static void dump_tasks(const struct mem_cgroup *mem)
} while_each_thread(g, p);
}

-static void dump_header(gfp_t gfp_mask, int order, struct mem_cgroup *mem)
+static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
+ struct mem_cgroup *mem)
{
pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
"oom_adj=%d\n",
@@ -365,7 +366,7 @@ static void dump_header(gfp_t gfp_mask, int order, struct mem_cgroup *mem)
cpuset_print_task_mems_allowed(current);
task_unlock(current);
dump_stack();
- mem_cgroup_print_oom_info(mem, current);
+ mem_cgroup_print_oom_info(mem, p);
show_mem();
if (sysctl_oom_dump_tasks)
dump_tasks(mem);
@@ -440,7 +441,7 @@ static int oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
struct task_struct *c;

if (printk_ratelimit())
- dump_header(gfp_mask, order, mem);
+ dump_header(p, gfp_mask, order, mem);

/*
* If the task is already exiting, don't alarm the sysadmin or kill
@@ -576,7 +577,7 @@ retry:
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!p) {
read_unlock(&tasklist_lock);
- dump_header(gfp_mask, order, NULL);
+ dump_header(NULL, gfp_mask, order, NULL);
panic("Out of memory and no killable processes...\n");
}

@@ -644,7 +645,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
return;

if (sysctl_panic_on_oom == 2) {
- dump_header(gfp_mask, order, NULL);
+ dump_header(NULL, gfp_mask, order, NULL);
panic("out of memory. Compulsory panic_on_oom is selected.\n");
}

@@ -663,7 +664,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,

case CONSTRAINT_NONE:
if (sysctl_panic_on_oom) {
- dump_header(gfp_mask, order, NULL);
+ dump_header(NULL, gfp_mask, order, NULL);
panic("out of memory. panic_on_oom is selected\n");
}
/* Fall-through */
--
1.5.6.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


balbir at linux

Nov 24, 2009, 5:31 AM

Post #2 of 8 (120 views)
Permalink
Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy [In reply to]

On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
<nishimura [at] mxp> wrote:
> task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> a task can be a candidate for being oom-killed from memcg's limit, checks
> "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
>
> But this check return true(it's false positive) when:
>
>        <some path>/00          use_hierarchy == 0      <- hitting limit
>          <some path>/00/aa     use_hierarchy == 1      <- "curr"
>
> This leads to killing an innocent task in 00/aa. This patch is a fix for this
> bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> should print information of mem_cgroup which the task being killed, not current,
> belongs to.
>

Quick Question: What happens if <some path>/00 has no tasks in it
after your patches?

Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


d-nishimura at mtf

Nov 24, 2009, 6:00 AM

Post #3 of 8 (119 views)
Permalink
Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy [In reply to]

On Tue, 24 Nov 2009 19:01:54 +0530
Balbir Singh <balbir [at] linux> wrote:

> On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> <nishimura [at] mxp> wrote:
> > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > a task can be a candidate for being oom-killed from memcg's limit, checks
> > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> >
> > But this check return true(it's false positive) when:
> >
> >        <some path>/00          use_hierarchy == 0      <- hitting limit
> >          <some path>/00/aa     use_hierarchy == 1      <- "curr"
> >
> > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > should print information of mem_cgroup which the task being killed, not current,
> > belongs to.
> >
>
> Quick Question: What happens if <some path>/00 has no tasks in it
> after your patches?
>
Nothing would happen because <some path>/00 never hit its limit.

The bug that this patch fixes is:

- create a dir <some path>/00 and set some limits.
- create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
- run some programs in both in 00 and 00/aa. programs in 00 should be
big enough to cause oom by its limit.
- when oom happens by 00's limit, tasks in 00/aa can also be killed.


Regards,
Daisuke Nishimura.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


balbir at linux

Nov 24, 2009, 9:04 AM

Post #4 of 8 (119 views)
Permalink
Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy [In reply to]

* Daisuke Nishimura <d-nishimura [at] mtf> [2009-11-24 23:00:29]:

> On Tue, 24 Nov 2009 19:01:54 +0530
> Balbir Singh <balbir [at] linux> wrote:
>
> > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > <nishimura [at] mxp> wrote:
> > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > >
> > > But this check return true(it's false positive) when:
> > >
> > >        <some path>/00          use_hierarchy == 0      <- hitting limit
> > >          <some path>/00/aa     use_hierarchy == 1      <- "curr"
> > >
> > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > should print information of mem_cgroup which the task being killed, not current,
> > > belongs to.
> > >
> >
> > Quick Question: What happens if <some path>/00 has no tasks in it
> > after your patches?
> >
> Nothing would happen because <some path>/00 never hit its limit.

Why not? I am talking of a scenario where <some path>/00 is set to a
limit (similar to your example) and hits its limit, but the groups
under it have no limits, but tasks. Shouldn't we be scanning
<some path>/00/aa as well?

>
> The bug that this patch fixes is:
>
> - create a dir <some path>/00 and set some limits.
> - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> - run some programs in both in 00 and 00/aa. programs in 00 should be
> big enough to cause oom by its limit.
> - when oom happens by 00's limit, tasks in 00/aa can also be killed.
>

To be honest, the last part is fair, specifically if 00/aa has a task
that is really the heaviest task as per the oom logic. no? Are you
suggesting that only tasks in <some path>/00 should be selected by the
oom logic?

--
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


nishimura at mxp

Nov 24, 2009, 3:49 PM

Post #5 of 8 (119 views)
Permalink
Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy [In reply to]

On Tue, 24 Nov 2009 22:34:02 +0530, Balbir Singh <balbir [at] linux> wrote:
> * Daisuke Nishimura <d-nishimura [at] mtf> [2009-11-24 23:00:29]:
>
> > On Tue, 24 Nov 2009 19:01:54 +0530
> > Balbir Singh <balbir [at] linux> wrote:
> >
> > > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > > <nishimura [at] mxp> wrote:
> > > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > > >
> > > > But this check return true(it's false positive) when:
> > > >
> > > >        <some path>/00          use_hierarchy == 0      <- hitting limit
> > > >          <some path>/00/aa     use_hierarchy == 1      <- "curr"
> > > >
> > > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > > should print information of mem_cgroup which the task being killed, not current,
> > > > belongs to.
> > > >
> > >
> > > Quick Question: What happens if <some path>/00 has no tasks in it
> > > after your patches?
> > >
> > Nothing would happen because <some path>/00 never hit its limit.
>
> Why not? I am talking of a scenario where <some path>/00 is set to a
> limit (similar to your example) and hits its limit, but the groups
> under it have no limits, but tasks. Shouldn't we be scanning
> <some path>/00/aa as well?
>
> >
> > The bug that this patch fixes is:
> >
> > - create a dir <some path>/00 and set some limits.
> > - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> > - run some programs in both in 00 and 00/aa. programs in 00 should be
> > big enough to cause oom by its limit.
> > - when oom happens by 00's limit, tasks in 00/aa can also be killed.
> >
>
> To be honest, the last part is fair, specifically if 00/aa has a task
> that is really the heaviest task as per the oom logic. no? Are you
> suggesting that only tasks in <some path>/00 should be selected by the
> oom logic?
>
All of your comments would be rational if hierarchy is enabled in 00(it's
also enabled in 00/aa automatically in this case).
I'm saying about the case where it's disabled in 00 but enabled in 00/aa.

In this scenario, charges by tasks in 00/aa is(and should not be) charged to 00.
And oom caused by 00's limit should not affect the task in 00/aa.


Regards,
Daisuke Nishimura.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


kamezawa.hiroyu at jp

Nov 24, 2009, 4:07 PM

Post #6 of 8 (118 views)
Permalink
Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy [In reply to]

On Tue, 24 Nov 2009 22:34:02 +0530
Balbir Singh <balbir [at] linux> wrote:

> * Daisuke Nishimura <d-nishimura [at] mtf> [2009-11-24 23:00:29]:
>
> > On Tue, 24 Nov 2009 19:01:54 +0530
> > Balbir Singh <balbir [at] linux> wrote:
> >
> > > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > > <nishimura [at] mxp> wrote:
> > > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > > >
> > > > But this check return true(it's false positive) when:
> > > >
> > > >        <some path>/00          use_hierarchy == 0      <- hitting limit
> > > >          <some path>/00/aa     use_hierarchy == 1      <- "curr"
> > > >
> > > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > > should print information of mem_cgroup which the task being killed, not current,
> > > > belongs to.
> > > >
> > >
> > > Quick Question: What happens if <some path>/00 has no tasks in it
> > > after your patches?
> > >
> > Nothing would happen because <some path>/00 never hit its limit.
>
> Why not? I am talking of a scenario where <some path>/00 is set to a
> limit (similar to your example) and hits its limit, but the groups
> under it have no limits, but tasks. Shouldn't we be scanning
> <some path>/00/aa as well?
>
No. <some path>/00 == use_hierarchy=0 means _all_ children's accounting
information is never added up to <some path>/00.

If there is no task in <some path>/00, it means <some path>/00 contains only
file cache and not-migrated rss. To hit limit, the admin has to make
memory.(memsw).limit_in_bytes smaller. But in this case, oom is not called.
-ENOMEM is returned to users. IIUC.




> >
> > The bug that this patch fixes is:
> >
> > - create a dir <some path>/00 and set some limits.
> > - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> > - run some programs in both in 00 and 00/aa. programs in 00 should be
> > big enough to cause oom by its limit.
> > - when oom happens by 00's limit, tasks in 00/aa can also be killed.
> >
>
> To be honest, the last part is fair, specifically if 00/aa has a task
> that is really the heaviest task as per the oom logic. no? Are you
> suggesting that only tasks in <some path>/00 should be selected by the
> oom logic?
>
<some path>/00 and <some path>/00/aa has completely different accounting set.
There are no hierarchy relationship. The directory tree shows "virtual"
hierarchy but in reality, their relationship is horizontal rather than hierarchycal.

So, killing tasks only in <some path>/00 is better.

Thanks,
-Kame



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


balbir at linux

Nov 24, 2009, 7:29 PM

Post #7 of 8 (119 views)
Permalink
Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy [In reply to]

* nishimura [at] mxp <nishimura [at] mxp> [2009-11-25 08:49:10]:

> On Tue, 24 Nov 2009 22:34:02 +0530, Balbir Singh <balbir [at] linux> wrote:
> > * Daisuke Nishimura <d-nishimura [at] mtf> [2009-11-24 23:00:29]:
> >
> > > On Tue, 24 Nov 2009 19:01:54 +0530
> > > Balbir Singh <balbir [at] linux> wrote:
> > >
> > > > On Tue, Nov 24, 2009 at 11:27 AM, Daisuke Nishimura
> > > > <nishimura [at] mxp> wrote:
> > > > > task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> > > > > a task can be a candidate for being oom-killed from memcg's limit, checks
> > > > > "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
> > > > >
> > > > > But this check return true(it's false positive) when:
> > > > >
> > > > >        <some path>/00          use_hierarchy == 0      <- hitting limit
> > > > >          <some path>/00/aa     use_hierarchy == 1      <- "curr"
> > > > >
> > > > > This leads to killing an innocent task in 00/aa. This patch is a fix for this
> > > > > bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> > > > > should print information of mem_cgroup which the task being killed, not current,
> > > > > belongs to.
> > > > >
> > > >
> > > > Quick Question: What happens if <some path>/00 has no tasks in it
> > > > after your patches?
> > > >
> > > Nothing would happen because <some path>/00 never hit its limit.
> >
> > Why not? I am talking of a scenario where <some path>/00 is set to a
> > limit (similar to your example) and hits its limit, but the groups
> > under it have no limits, but tasks. Shouldn't we be scanning
> > <some path>/00/aa as well?
> >
> > >
> > > The bug that this patch fixes is:
> > >
> > > - create a dir <some path>/00 and set some limits.
> > > - create a sub dir <some path>/00/aa w/o any limits, and enable hierarchy.
> > > - run some programs in both in 00 and 00/aa. programs in 00 should be
> > > big enough to cause oom by its limit.
> > > - when oom happens by 00's limit, tasks in 00/aa can also be killed.
> > >
> >
> > To be honest, the last part is fair, specifically if 00/aa has a task
> > that is really the heaviest task as per the oom logic. no? Are you
> > suggesting that only tasks in <some path>/00 should be selected by the
> > oom logic?
> >
> All of your comments would be rational if hierarchy is enabled in 00(it's
> also enabled in 00/aa automatically in this case).
> I'm saying about the case where it's disabled in 00 but enabled in 00/aa.
>

OK, I misunderstood the example then, so even though hierarchy is
disabled, we kill a task in 00/aa when 00 hits the limit. Thanks for
clarifying.

> In this scenario, charges by tasks in 00/aa is(and should not be) charged to 00.
> And oom caused by 00's limit should not affect the task in 00/aa.
>
>
> Regards,
> Daisuke Nishimura.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo [at] kvack For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont [at] kvack"> email [at] kvack </a>
>

--
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


balbir at linux

Nov 24, 2009, 8:09 PM

Post #8 of 8 (119 views)
Permalink
Re: [BUGFIX][PATCH -mmotm] memcg: avoid oom-killing innocent task in case of use_hierarchy [In reply to]

* nishimura [at] mxp <nishimura [at] mxp> [2009-11-24 14:57:59]:

> task_in_mem_cgroup(), which is called by select_bad_process() to check whether
> a task can be a candidate for being oom-killed from memcg's limit, checks
> "curr->use_hierarchy"("curr" is the mem_cgroup the task belongs to).
>
> But this check return true(it's false positive) when:
>
> <some path>/00 use_hierarchy == 0 <- hitting limit
> <some path>/00/aa use_hierarchy == 1 <- "curr"
>
> This leads to killing an innocent task in 00/aa. This patch is a fix for this
> bug. And this patch also fixes the arg for mem_cgroup_print_oom_info(). We
> should print information of mem_cgroup which the task being killed, not current,
> belongs to.
>
> Signed-off-by: Daisuke Nishimura <nishimura [at] mxp>


Acked-by: Balbir Singh <balbir [at] linux>

--
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.