
Mailing List Archive: Linux: Kernel

[PATCH] mm: check zone->all_unreclaimable in all_unreclaimable()

 

 



avagin at openvz

Mar 5, 2011, 3:44 AM

Post #1 of 27 (2344 views)
Permalink
[PATCH] mm: check zone->all_unreclaimable in all_unreclaimable()

Check zone->all_unreclaimable in all_unreclaimable(); otherwise the
kernel may hang, because shrink_zones() will do nothing while
all_unreclaimable() reports that the zone still has reclaimable pages.

do_try_to_free_pages()
        shrink_zones()
                for_each_zone
                        if (zone->all_unreclaimable)
                                continue
        if !all_unreclaimable(zonelist, sc)
                return 1

__alloc_pages_slowpath()
retry:
        did_some_progress = do_try_to_free_pages(page)
        ...
        if (!page && did_some_progress)
                retry;

Signed-off-by: Andrey Vagin <avagin [at] openvz>
---
 mm/vmscan.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6771ea7..1c056f7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2002,6 +2002,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 			gfp_zone(sc->gfp_mask), sc->nodemask) {
+		if (zone->all_unreclaimable)
+			continue;
 		if (!populated_zone(zone))
 			continue;
 		if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
--
1.7.1
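
The call flow in the patch description can be modelled in user space. What follows is a minimal Python 3 sketch, not kernel code. The Zone class, the check_flag switch and the retry cap are hypothetical names introduced for illustration, the page allocation is assumed to keep failing, and zone_reclaimable() uses the (pages_scanned < reclaimable pages * 6) heuristic quoted later in the thread. It only shows why a zone with a stale all_unreclaimable flag can keep the slow path retrying, and how checking the flag ends the loop.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Toy stand-in for the few zone fields discussed in this thread."""
    all_unreclaimable: bool  # sticky flag, set by balance_pgdat()
    pages_scanned: int
    reclaimable_pages: int


def zone_reclaimable(zone):
    # Heuristic quoted later in the thread.
    return zone.pages_scanned < zone.reclaimable_pages * 6


def shrink_zones(zones):
    """Return the number of pages reclaimed; flagged zones are skipped."""
    reclaimed = 0
    for zone in zones:
        if zone.all_unreclaimable:
            continue
        reclaimed += 1  # pretend one page was freed from this zone
    return reclaimed


def all_unreclaimable(zones, check_flag):
    for zone in zones:
        if check_flag and zone.all_unreclaimable:
            continue  # models the two lines added by the patch
        if zone_reclaimable(zone):
            return False
    return True


def do_try_to_free_pages(zones, check_flag):
    reclaimed = shrink_zones(zones)
    if reclaimed:
        return reclaimed
    # Nothing was freed, yet progress is reported if any zone looks reclaimable.
    return 0 if all_unreclaimable(zones, check_flag) else 1


def slowpath_retries(zones, check_flag, limit=1000):
    """Count slow-path retries, capped at limit (hypothetical safety cap)."""
    retries = 0
    while retries < limit:
        if not do_try_to_free_pages(zones, check_flag):
            break  # no progress reported: the caller stops retrying
        retries += 1  # assumption: the allocation itself still fails
    return retries
```

With a single zone such as Zone(True, 10, 100), slowpath_retries(..., check_flag=False) runs until the cap, while check_flag=True stops immediately.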

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


minchan.kim at gmail

Mar 5, 2011, 7:20 AM

Post #2 of 27 (2285 views)
Permalink
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Sat, Mar 05, 2011 at 02:44:16PM +0300, Andrey Vagin wrote:
> Check zone->all_unreclaimable in all_unreclaimable(), otherwise the
> kernel may hang up, because shrink_zones() will do nothing, but
> all_unreclaimable() will say, that zone has reclaimable pages.
>
> do_try_to_free_pages()
> shrink_zones()
> for_each_zone
> if (zone->all_unreclaimable)
> continue
> if !all_unreclaimable(zonelist, sc)
> return 1
>
> __alloc_pages_slowpath()
> retry:
> did_some_progress = do_try_to_free_pages(page)
> ...
> if (!page && did_some_progress)
> retry;
>
> Signed-off-by: Andrey Vagin <avagin [at] openvz>
> ---
> mm/vmscan.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6771ea7..1c056f7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2002,6 +2002,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
>
> for_each_zone_zonelist_nodemask(zone, z, zonelist,
> gfp_zone(sc->gfp_mask), sc->nodemask) {
> + if (zone->all_unreclaimable)
> + continue;
> if (!populated_zone(zone))
> continue;
> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))


zone_reclaimable() already checks it. Isn't that enough?
Does the hang really happen, or did you find it by code review?


--
Kind regards,
Minchan Kim


avagin at gmail

Mar 5, 2011, 7:34 AM

Post #3 of 27 (2264 views)
Permalink
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On 03/05/2011 06:20 PM, Minchan Kim wrote:
> On Sat, Mar 05, 2011 at 02:44:16PM +0300, Andrey Vagin wrote:
>> Check zone->all_unreclaimable in all_unreclaimable(), otherwise the
>> kernel may hang up, because shrink_zones() will do nothing, but
>> all_unreclaimable() will say, that zone has reclaimable pages.
>>
>> do_try_to_free_pages()
>> shrink_zones()
>> for_each_zone
>> if (zone->all_unreclaimable)
>> continue
>> if !all_unreclaimable(zonelist, sc)
>> return 1
>>
>> __alloc_pages_slowpath()
>> retry:
>> did_some_progress = do_try_to_free_pages(page)
>> ...
>> if (!page&& did_some_progress)
>> retry;
>>
>> Signed-off-by: Andrey Vagin<avagin [at] openvz>
>> ---
>> mm/vmscan.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 6771ea7..1c056f7 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -2002,6 +2002,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
>>
>> for_each_zone_zonelist_nodemask(zone, z, zonelist,
>> gfp_zone(sc->gfp_mask), sc->nodemask) {
>> + if (zone->all_unreclaimable)
>> + continue;
>> if (!populated_zone(zone))
>> continue;
>> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
>
> zone_reclaimable checks it. Isn't it enough?
I sent one more patch, "[PATCH] mm: skip zombie in OOM-killer".
These two patches together are enough.
> Does the hang up really happen or see it by code review?
Yes. You can reproduce it with the help of the attached Python program.
It's not very clever :)
It performs the following actions in a loop:
1. fork
2. mmap
3. touch memory
4. read memory
5. munmap
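
Steps 2 to 5 of one loop iteration can be sketched safely, without the fork. This is a minimal Python 3 sketch under stated assumptions, not the attached memeater.py: the function name touch_and_check and the PAGE_SIZE constant are hypothetical, the mapping uses the mmap module's default flags rather than MAP_PRIVATE, and num_pages must be greater than zero. Step 1 (fork) is deliberately left out so that running the sketch does not turn into a fork bomb.

```python
import mmap

PAGE_SIZE = 4096  # assumed page size; the attached script hardcodes a shift of 12


def touch_and_check(num_pages):
    """Map anonymous memory, write one byte per page, read it back, unmap.

    Returns the number of pages whose contents did not match.
    """
    buf = mmap.mmap(-1, num_pages * PAGE_SIZE)  # step 2: anonymous mmap
    try:
        for page in range(num_pages):
            buf[page * PAGE_SIZE] = page % 255  # step 3: touch memory
        mismatches = sum(
            1
            for page in range(num_pages)
            if buf[page * PAGE_SIZE] != page % 255  # step 4: read memory
        )
    finally:
        buf.close()  # step 5: unmap
    return mismatches
```

Calling touch_and_check(16) touches 16 pages and should report no mismatches; the reproducer gets its memory pressure from running many such loops in forked processes at once.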

Attachments: memeater.py (0.64 KB)


minchan.kim at gmail

Mar 5, 2011, 7:53 AM

Post #4 of 27 (2305 views)
Permalink
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Sat, Mar 05, 2011 at 06:34:37PM +0300, Andrew Vagin wrote:
> On 03/05/2011 06:20 PM, Minchan Kim wrote:
> >On Sat, Mar 05, 2011 at 02:44:16PM +0300, Andrey Vagin wrote:
> >>Check zone->all_unreclaimable in all_unreclaimable(), otherwise the
> >>kernel may hang up, because shrink_zones() will do nothing, but
> >>all_unreclaimable() will say, that zone has reclaimable pages.
> >>
> >>do_try_to_free_pages()
> >> shrink_zones()
> >> for_each_zone
> >> if (zone->all_unreclaimable)
> >> continue
> >> if !all_unreclaimable(zonelist, sc)
> >> return 1
> >>
> >>__alloc_pages_slowpath()
> >>retry:
> >> did_some_progress = do_try_to_free_pages(page)
> >> ...
> >> if (!page&& did_some_progress)
> >> retry;
> >>
> >>Signed-off-by: Andrey Vagin<avagin [at] openvz>
> >>---
> >> mm/vmscan.c | 2 ++
> >> 1 files changed, 2 insertions(+), 0 deletions(-)
> >>
> >>diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>index 6771ea7..1c056f7 100644
> >>--- a/mm/vmscan.c
> >>+++ b/mm/vmscan.c
> >>@@ -2002,6 +2002,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
> >>
> >> for_each_zone_zonelist_nodemask(zone, z, zonelist,
> >> gfp_zone(sc->gfp_mask), sc->nodemask) {
> >>+ if (zone->all_unreclaimable)
> >>+ continue;
> >> if (!populated_zone(zone))
> >> continue;
> >> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
> >
> >zone_reclaimable checks it. Isn't it enough?
> I sent one more patch [PATCH] mm: skip zombie in OOM-killer.
> This two patches are enough.

Sorry if I confused you.
I mean that zone->all_unreclaimable becomes true in balance_pgdat() when !zone_reclaimable().
zone_reclaimable() compares the recent pages_scanned count with the number of LRU pages in the zone.
So scanning too many pages in a zone with few LRU pages marks the zone as unreclaimable.

In all_unreclaimable(), we call zone_reclaimable() to detect that.
It's the same thing as your patch.

> >Does the hang up really happen or see it by code review?
> Yes. You can reproduce it for help the attached python program. It's
> not very clever:)
> It make the following actions in loop:
> 1. fork
> 2. mmap
> 3. touch memory
> 4. read memory
> 5. munmmap

It seems the test program is a fork bomb combined with a memory hog.
Does the problem go away if you apply this patch?


> import sys, time, mmap, os
> from subprocess import Popen, PIPE
> import random
>
> global mem_size
>
> def info(msg):
>     pid = os.getpid()
>     print >> sys.stderr, "%s: %s" % (pid, msg)
>     sys.stderr.flush()
>
>
>
> def memory_loop(cmd = "a"):
>     """
>     cmd may be:
>         c: check memory
>         else: touch memory
>     """
>     c = 0
>     for j in xrange(0, mem_size):
>         if cmd == "c":
>             if f[j<<12] != chr(j % 255):
>                 info("Data corruption")
>                 sys.exit(1)
>         else:
>             f[j<<12] = chr(j % 255)
>
> while True:
>     pid = os.fork()
>     if (pid != 0):
>         mem_size = random.randint(0, 56 * 4096)
>         f = mmap.mmap(-1, mem_size << 12, mmap.MAP_ANONYMOUS|mmap.MAP_PRIVATE)
>         memory_loop()
>         memory_loop("c")
>         f.close()


--
Kind regards,
Minchan Kim


avagin at gmail

Mar 5, 2011, 8:41 AM

Post #5 of 27 (2272 views)
Permalink
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On 03/05/2011 06:53 PM, Minchan Kim wrote:
> On Sat, Mar 05, 2011 at 06:34:37PM +0300, Andrew Vagin wrote:
>> On 03/05/2011 06:20 PM, Minchan Kim wrote:
>>> On Sat, Mar 05, 2011 at 02:44:16PM +0300, Andrey Vagin wrote:
>>>> Check zone->all_unreclaimable in all_unreclaimable(), otherwise the
>>>> kernel may hang up, because shrink_zones() will do nothing, but
>>>> all_unreclaimable() will say, that zone has reclaimable pages.
>>>>
>>>> do_try_to_free_pages()
>>>> shrink_zones()
>>>> for_each_zone
>>>> if (zone->all_unreclaimable)
>>>> continue
>>>> if !all_unreclaimable(zonelist, sc)
>>>> return 1
>>>>
>>>> __alloc_pages_slowpath()
>>>> retry:
>>>> did_some_progress = do_try_to_free_pages(page)
>>>> ...
>>>> if (!page&& did_some_progress)
>>>> retry;
>>>>
>>>> Signed-off-by: Andrey Vagin<avagin [at] openvz>
>>>> ---
>>>> mm/vmscan.c | 2 ++
>>>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index 6771ea7..1c056f7 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -2002,6 +2002,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
>>>>
>>>> for_each_zone_zonelist_nodemask(zone, z, zonelist,
>>>> gfp_zone(sc->gfp_mask), sc->nodemask) {
>>>> + if (zone->all_unreclaimable)
>>>> + continue;
>>>> if (!populated_zone(zone))
>>>> continue;
>>>> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
>>> zone_reclaimable checks it. Isn't it enough?
>> I sent one more patch [PATCH] mm: skip zombie in OOM-killer.
>> This two patches are enough.
> Sorry if I confused you.
> I mean zone->all_unreclaimable become true if !zone_reclaimable in balance_pgdat.
> zone_reclaimable compares recent pages_scanned with the number of zone lru pages.
> So too many page scanning in small lru pages makes the zone to unreclaimable zone.
>
> In all_unreclaimable, we calls zone_reclaimable to detect it.
> It's the same thing with your patch.
balance_pgdat() sets zone->all_unreclaimable, but the problem is that it is
cleared late.

The problem is that zone->all_unreclaimable = True, but
zone_reclaimable() returns True too.

zone->all_unreclaimable will be cleared in free_*_pages, but this may be
too late. It is enough to allocate one page from the page cache for
zone_reclaimable() to return True while zone->all_unreclaimable is True.
>>> Does the hang up really happen or see it by code review?
>> Yes. You can reproduce it for help the attached python program. It's
>> not very clever:)
>> It make the following actions in loop:
>> 1. fork
>> 2. mmap
>> 3. touch memory
>> 4. read memory
>> 5. munmmap
> It seems the test program makes fork bombs and memory hogging.
> If you applied this patch, the problem is gone?
Yes.


minchan.kim at gmail

Mar 5, 2011, 9:07 AM

Post #6 of 27 (2270 views)
Permalink
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Sat, Mar 05, 2011 at 07:41:26PM +0300, Andrew Vagin wrote:
> On 03/05/2011 06:53 PM, Minchan Kim wrote:
> >On Sat, Mar 05, 2011 at 06:34:37PM +0300, Andrew Vagin wrote:
> >>On 03/05/2011 06:20 PM, Minchan Kim wrote:
> >>>On Sat, Mar 05, 2011 at 02:44:16PM +0300, Andrey Vagin wrote:
> >>>>Check zone->all_unreclaimable in all_unreclaimable(), otherwise the
> >>>>kernel may hang up, because shrink_zones() will do nothing, but
> >>>>all_unreclaimable() will say, that zone has reclaimable pages.
> >>>>
> >>>>do_try_to_free_pages()
> >>>> shrink_zones()
> >>>> for_each_zone
> >>>> if (zone->all_unreclaimable)
> >>>> continue
> >>>> if !all_unreclaimable(zonelist, sc)
> >>>> return 1
> >>>>
> >>>>__alloc_pages_slowpath()
> >>>>retry:
> >>>> did_some_progress = do_try_to_free_pages(page)
> >>>> ...
> >>>> if (!page&& did_some_progress)
> >>>> retry;
> >>>>
> >>>>Signed-off-by: Andrey Vagin<avagin [at] openvz>
> >>>>---
> >>>> mm/vmscan.c | 2 ++
> >>>> 1 files changed, 2 insertions(+), 0 deletions(-)
> >>>>
> >>>>diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>>>index 6771ea7..1c056f7 100644
> >>>>--- a/mm/vmscan.c
> >>>>+++ b/mm/vmscan.c
> >>>>@@ -2002,6 +2002,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
> >>>>
> >>>> for_each_zone_zonelist_nodemask(zone, z, zonelist,
> >>>> gfp_zone(sc->gfp_mask), sc->nodemask) {
> >>>>+ if (zone->all_unreclaimable)
> >>>>+ continue;
> >>>> if (!populated_zone(zone))
> >>>> continue;
> >>>> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
> >>>zone_reclaimable checks it. Isn't it enough?
> >>I sent one more patch [PATCH] mm: skip zombie in OOM-killer.
> >>This two patches are enough.
> >Sorry if I confused you.
> >I mean zone->all_unreclaimable become true if !zone_reclaimable in balance_pgdat.
> >zone_reclaimable compares recent pages_scanned with the number of zone lru pages.
> >So too many page scanning in small lru pages makes the zone to unreclaimable zone.
> >
> >In all_unreclaimable, we calls zone_reclaimable to detect it.
> >It's the same thing with your patch.
> balance_pgdat set zone->all_unreclaimable, but the problem is that
> it is cleaned late.

Yes. Clearing can be delayed by the pcp (per-cpu pages) lists, so
(zone->all_unreclaimable = true) is a false alarm: the zone has a free page,
and it can be returned to the free list by drain_all_pages on the next pass.

>
> The problem is that zone->all_unreclaimable = True, but
> zone_reclaimable() returns True too.

Why is it a problem?
If zone->all_unreclaimable gives a false alarm, we do need to check
it again by calling zone_reclaimable().

If we believe a false alarm and give up the reclaim, we may end up doing
an unnecessary OOM kill.

>
> zone->all_unreclaimable will be cleaned in free_*_pages, but this
> may be late. It is enough allocate one page from page cache, that
> zone_reclaimable() returns True and zone->all_unreclaimable becomes
> True.
> >>>Does the hang up really happen or see it by code review?
> >>Yes. You can reproduce it for help the attached python program. It's
> >>not very clever:)
> >>It make the following actions in loop:
> >>1. fork
> >>2. mmap
> >>3. touch memory
> >>4. read memory
> >>5. munmmap
> >It seems the test program makes fork bombs and memory hogging.
> >If you applied this patch, the problem is gone?
> Yes.

Hmm.. Although it solves the problem, I don't think it's a good idea to
rely on a false alarm and give up the retry.



--
Kind regards,
Minchan Kim


akpm at linux-foundation

Mar 7, 2011, 1:58 PM

Post #7 of 27 (2255 views)
Permalink
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Sun, 6 Mar 2011 02:07:59 +0900
Minchan Kim <minchan.kim [at] gmail> wrote:

> Hmm.. Although it solves the problem, I think it's not a good idea that
> depends on false alram and give up the retry.

Any alternative proposals? We should get the livelock fixed if possible..


minchan.kim at gmail

Mar 7, 2011, 3:45 PM

Post #8 of 27 (2258 views)
Permalink
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Tue, Mar 8, 2011 at 6:58 AM, Andrew Morton <akpm [at] linux-foundation> wrote:
> On Sun, 6 Mar 2011 02:07:59 +0900
> Minchan Kim <minchan.kim [at] gmail> wrote:
>
>> Hmm.. Although it solves the problem, I think it's not a good idea that
>> depends on false alram and give up the retry.
>
> Any alternative proposals?  We should get the livelock fixed if possible..
>

And we should avoid unnecessary OOM kills if possible.

I think the problem is caused by (zone->pages_scanned <
zone_reclaimable_pages(zone) * 6). I am not sure (* 6) is the best value. It
may be rather big on recent machines with large DRAM.

I think it is a trade-off between latency and OOM kills.
If we decrease the magic value, we may prevent the near-livelock, but
unnecessary OOM kills may happen.

And I think zone_reclaimable() is not fair.
For example, it takes a lot of scanning to move a zone from the reclaimable
state to the unreclaimable state, and that may take a very long time. But
freeing just a few pages moves it back from unreclaimable to reclaimable very
easily. So we need a lot of painful reclaim work to change the state from
reclaimable to unreclaimable, which would hurt latency a lot.

Maybe we need a smarter zone_reclaimable() that adapts to memory pressure.
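
The asymmetry described above can be made concrete with a small model. This is a minimal Python 3 sketch, not kernel code: the function names and the RECLAIM_FACTOR constant are hypothetical, and it assumes that freeing a page resets pages_scanned to zero, which is how the thread describes the state being cleared in free_*_pages. It counts how many pages must be scanned before the check fails, and shows that a single free flips the check back.

```python
RECLAIM_FACTOR = 6  # the "magic value" questioned above


def zone_reclaimable(pages_scanned, reclaimable_pages, factor=RECLAIM_FACTOR):
    """The heuristic from the thread: pages_scanned < reclaimable_pages * factor."""
    return pages_scanned < reclaimable_pages * factor


def scans_until_unreclaimable(reclaimable_pages, factor=RECLAIM_FACTOR):
    """Pages that must be scanned, with nothing freed, before the check fails."""
    pages_scanned = 0
    while zone_reclaimable(pages_scanned, reclaimable_pages, factor):
        pages_scanned += 1
    return pages_scanned


def reclaimable_after_one_free(reclaimable_pages, factor=RECLAIM_FACTOR):
    """Assumption: a single page free resets pages_scanned to zero."""
    pages_scanned = 0
    return zone_reclaimable(pages_scanned, reclaimable_pages, factor)
```

In this model, a zone with 1000 reclaimable pages needs 6000 scanned pages to be judged unreclaimable, and one free makes it reclaimable again. Lowering the factor shortens the scan phase proportionally, which is the latency versus OOM-kill trade-off mentioned above.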

--
Kind regards,
Minchan Kim


kamezawa.hiroyu at jp

Mar 7, 2011, 4:44 PM

Post #9 of 27 (2250 views)
Permalink
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Mon, 7 Mar 2011 13:58:31 -0800
Andrew Morton <akpm [at] linux-foundation> wrote:

> On Sun, 6 Mar 2011 02:07:59 +0900
> Minchan Kim <minchan.kim [at] gmail> wrote:
>
> > Hmm.. Although it solves the problem, I think it's not a good idea that
> > depends on false alram and give up the retry.
>
> Any alternative proposals? We should get the livelock fixed if possible..

I agree with Minchan; I don't think this is a real fix.
Andrey, I'm trying your patches now, and your oom-killer fix,
'skip-zombie-process', seems to work well enough in my environment.

What is your environment? Number of CPUs? Architecture? Size of memory?



Thanks,
-Kame


kosaki.motohiro at jp

Mar 7, 2011, 7:06 PM

Post #10 of 27 (2247 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

> > > Hmm.. Although it solves the problem, I think it's not a good idea that
> > > depends on false alram and give up the retry.
> >
> > Any alternative proposals? We should get the livelock fixed if possible..
>
> I agree with Minchan and can't think this is a real fix....
> Andrey, I'm now trying your fix and it seems your fix for oom-killer,
> 'skip-zombie-process' works enough good for my environ.
>
> What is your enviroment ? number of cpus ? architecture ? size of memory ?

Me too. 'skip-zombie-process V1' works fine, and I haven't seen this patch
improve the OOM situation.

Also, the test program is purely a fork bomb. Our oom-killer has never been
a silver bullet against fork bombs. The oom-killer sends SIGKILL and starts
killing the victim process, but that doesn't prevent new memory-hogging
tasks from being created. So we have no guarantee of winning the race
between exiting processes and newly created ones.

*IF* we really need to care about the fork bomb issue, we need to write a
completely new VM feature.




avagin at gmail

Mar 8, 2011, 12:12 AM

Post #11 of 27 (2250 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

Hi, All
> I agree with Minchan and can't think this is a real fix....
> Andrey, I'm now trying your fix and it seems your fix for oom-killer,
> 'skip-zombie-process' works enough good for my environ.
>
> What is your enviroment ? number of cpus ? architecture ? size of memory ?
Processor: AMD Phenom(tm) II X6 1055T Processor (six-core)
RAM: 8 GB
RHEL6, x86_64. This host doesn't have swap.

It hangs quickly. Tomorrow I can send the state of the processes, if that
is of interest to you. With my patch the kernel works fine. I added
debug output and found that it hangs in the case I described.
My patch may be incorrect, but the problem exists and we
should do something about it.
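
The hang described in this thread can be modelled in user space. The sketch below is only a toy: the struct, the function names and the retry cap are invented for illustration and are not kernel code. It assumes a zone whose all_unreclaimable flag is set while the pages_scanned heuristic still reports it as reclaimable, and it assumes the allocation itself never succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct zone. */
struct zone_model {
	bool all_unreclaimable;		/* flag set by balance_pgdat() */
	unsigned long pages_scanned;
	unsigned long reclaimable_pages;
};

/* Same comparison as zone_reclaimable() in mm/vmscan.c. */
static bool zone_reclaimable_model(const struct zone_model *z)
{
	return z->pages_scanned < z->reclaimable_pages * 6;
}

/* Direct reclaim skips flagged zones, so it frees nothing from them. */
static unsigned long shrink_zones_model(struct zone_model *zones, size_t n)
{
	unsigned long reclaimed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (zones[i].all_unreclaimable)
			continue;
		/* Toy assumption: an unflagged zone yields one page per pass. */
		if (zones[i].reclaimable_pages) {
			zones[i].reclaimable_pages--;
			reclaimed++;
		}
	}
	return reclaimed;
}

/* check_flag == true models the proposed patch. */
static bool all_unreclaimable_model(const struct zone_model *zones, size_t n,
				    bool check_flag)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (check_flag && zones[i].all_unreclaimable)
			continue;
		if (zone_reclaimable_model(&zones[i]))
			return false;
	}
	return true;
}

/* Models the "did some progress" result of do_try_to_free_pages(). */
static bool try_to_free_pages_model(struct zone_model *zones, size_t n,
				    bool check_flag)
{
	if (shrink_zones_model(zones, n) > 0)
		return true;
	return !all_unreclaimable_model(zones, n, check_flag);
}

/*
 * Models the retry loop of the allocator slow path. Returns the number of
 * retries; a result equal to max_retries means the caller never gave up.
 */
static unsigned int slowpath_retries(struct zone_model *zones, size_t n,
				     bool check_flag, unsigned int max_retries)
{
	unsigned int retries = 0;

	assert(zones != NULL || n == 0);
	while (retries < max_retries &&
	       try_to_free_pages_model(zones, n, check_flag))
		retries++;
	return retries;
}
```

With a single flagged zone that still has one reclaimable page, the unpatched variant (check_flag false) keeps reporting progress until the cap is reached, while the patched variant stops at once and leaves the decision to the OOM path.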



avagin at gmail

Mar 8, 2011, 11:02 AM

Post #12 of 27 (2255 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On 03/08/2011 06:06 AM, KOSAKI Motohiro wrote:
>>>> Hmm.. Although it solves the problem, I think it's not a good idea that
>>>> depends on false alram and give up the retry.
>>>
>>> Any alternative proposals? We should get the livelock fixed if possible..
>>
>> I agree with Minchan and can't think this is a real fix....
>> Andrey, I'm now trying your fix and it seems your fix for oom-killer,
>> 'skip-zombie-process' works enough good for my environ.
>>
>> What is your enviroment ? number of cpus ? architecture ? size of memory ?
>
> me too. 'skip-zombie-process V1' work fine. and I didn't seen this patch
> improve oom situation.
>
> And, The test program is purely fork bomb. Our oom-killer is not silver
> bullet for fork bomb from very long time ago. That said, oom-killer send
> SIGKILL and start to kill the victim process. But, it doesn't prevent
> to be created new memory hogging tasks. Therefore we have no gurantee
> to win process exiting and creating race.

I think a livelock is a bug, even if it is provoked by a fork bomb.

Now I want to say a few words about zone->all_unreclaimable. I think
this flag is "conservative": it is set when the situation is bad and
cleared when the situation gets better. If we have only a small number of
reclaimable pages, the situation is still bad. What do you mean when you
say that the kernel is alive? If we have one reclaimable page, is the kernel
alive? Yes, it can work; it will generate many page faults and do
something, but anyone would say that it is more dead than alive.

Try to look at it from my point of view: then the patch is correct and
the kernel is more alive.

Excuse me if I'm mistaken...


>
> *IF* we really need to care fork bomb issue, we need to write completely
> new VM feature.
>
>



kamezawa.hiroyu at jp

Mar 8, 2011, 9:37 PM

Post #13 of 27 (2242 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Tue, 8 Mar 2011 08:45:51 +0900
Minchan Kim <minchan.kim [at] gmail> wrote:

> On Tue, Mar 8, 2011 at 6:58 AM, Andrew Morton <akpm [at] linux-foundation> wrote:
> > On Sun, 6 Mar 2011 02:07:59 +0900
> > Minchan Kim <minchan.kim [at] gmail> wrote:
> > Any alternative proposals?  We should get the livelock fixed if possible..
> >
>
> And we should avoid unnecessary OOM kill if possible.
>
> I think the problem is caused by (zone->pages_scanned <
> zone_reclaimable_pages(zone) * 6). I am not sure (* 6) is a best. It
> would be rather big on recent big DRAM machines.
>

It means three full scans, from the highest priority to the lowest,
that could not free any pages. I think big-memory machines tend to have
more CPUs, so I don't think it's too big.
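
The "three full scans" reading of the (* 6) factor can be checked with a little arithmetic. The sketch below is a user-space approximation, not kernel code: it assumes a reclaim pass starts at priority 12 (DEF_PRIORITY) and scans lru_pages >> priority at each step, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed starting priority of a reclaim pass (DEF_PRIORITY). */
#define DEF_PRIORITY 12

/*
 * Pages scanned by one full priority loop over an LRU of lru_pages pages,
 * assuming each step scans lru_pages >> priority.
 */
static unsigned long pages_scanned_per_loop(unsigned long lru_pages)
{
	unsigned long total = 0;
	int priority;

	for (priority = DEF_PRIORITY; priority >= 0; priority--)
		total += lru_pages >> priority;
	return total;
}

/* The heuristic under discussion: give up after scanning 6x the LRU. */
static bool zone_reclaimable_model(unsigned long pages_scanned,
				   unsigned long reclaimable_pages)
{
	return pages_scanned < reclaimable_pages * 6;
}

/* How many whole priority loops fit into the 6x threshold. */
static unsigned long full_loops_covered_by_threshold(unsigned long lru_pages)
{
	assert(lru_pages > 0);
	return (lru_pages * 6) / pages_scanned_per_loop(lru_pages);
}
```

Under these assumptions one loop scans just under twice the LRU size (8191 pages for a 4096-page LRU), so the 6x threshold corresponds to roughly three complete loops that freed nothing.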

> I think it is a trade-off between latency and OOM kill.
> If we decrease the magic value, maybe we should prevent the almost
> livelock but happens unnecessary OOM kill.
>

Hmm, should I support a 'sacrifice' feature in cgroup, where the kernel sends
some signal (SIGINT?) when it detects that system memory is running short?
(For example, if a full LRU scan is done in a zone, a notifier
fires and SIGINT is sent.)

> And I think zone_reclaimable not fair.
> For example, too many scanning makes reclaimable state to
> unreclaimable state. Maybe it takes a very long time. But just some
> page free makes unreclaimable state to reclaimabe with very easy. So
> we need much painful reclaiming for changing reclaimable state with
> unreclaimabe state. it would affect latency very much.
>
> Maybe we need more smart zone_reclaimabe which is adaptive with memory pressure.
>
I agree.

Thanks,
-Kame



kamezawa.hiroyu at jp

Mar 8, 2011, 9:43 PM

Post #14 of 27 (2240 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Wed, 9 Mar 2011 14:37:04 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu [at] jp> wrote:

> On Tue, 8 Mar 2011 08:45:51 +0900
> Minchan Kim <minchan.kim [at] gmail> wrote:
> Hmm, should I support a sacrifice feature 'some signal(SIGINT?) will be sent by
> the kernel when it detects system memory is in short' in cgroup ?
> (For example, if full LRU scan is done in a zone, notifier
> works and SIGINT will be sent.)
>

Sorry, this sounds like "mem_notify" ;), Kosaki-san's old work.

I think nobody objects to the functionality of "mem_notify"; the
implementation details are the problem. Shouldn't we try it again?

Thanks,
-Kame



kamezawa.hiroyu at jp

Mar 8, 2011, 9:52 PM

Post #15 of 27 (2241 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Tue, 08 Mar 2011 22:02:27 +0300
"avagin [at] gmail" <avagin [at] gmail> wrote:

> On 03/08/2011 06:06 AM, KOSAKI Motohiro wrote:
> >>>> Hmm.. Although it solves the problem, I think it's not a good idea that
> >>>> depends on false alram and give up the retry.
> >>>
> >>> Any alternative proposals? We should get the livelock fixed if possible..
> >>
> >> I agree with Minchan and can't think this is a real fix....
> >> Andrey, I'm now trying your fix and it seems your fix for oom-killer,
> >> 'skip-zombie-process' works enough good for my environ.
> >>
> >> What is your enviroment ? number of cpus ? architecture ? size of memory ?
> >
> > me too. 'skip-zombie-process V1' work fine. and I didn't seen this patch
> > improve oom situation.
> >
> > And, The test program is purely fork bomb. Our oom-killer is not silver
> > bullet for fork bomb from very long time ago. That said, oom-killer send
> > SIGKILL and start to kill the victim process. But, it doesn't prevent
> > to be created new memory hogging tasks. Therefore we have no gurantee
> > to win process exiting and creating race.
>
> I think a live-lock is a bug, even if it's provoked by fork bomds.
>

I tried to write a fork-bomb detector in the oom-kill layer, but I now think
it has to cooperate with do_fork().
IOW, some fork() calls should return -ENOMEM under OOM conditions.

I'd like to try something, but if you have an idea, please go ahead.


> And now I want say some words about zone->all_unreclaimable. I think
> this flag is "conservative". It is set when situation is bad and it's
> unset when situation get better. If we have a small number of
> reclaimable pages, the situation is still bad. What do you mean, when
> say that kernel is alive? If we have one reclaimable page, is the kernel
> alive? Yes, it can work, it will generate many page faults and do
> something, but anyone say that it is more dead than alive.
>
> Try to look at it from my point of view. The patch will be correct and
> the kernel will be more alive.
>
> Excuse me, If I'm mistaken...
>

Maybe some more casual interface than oom-kill should be provided.
I wonder whether I can add a memory-reclaim priority to the memory cgroup and
allow control of page fault latency for applications...
Maybe "soft_limit" for memcg, which is already implemented, works to some extent.

Thanks,
-Kame



kamezawa.hiroyu at jp

Mar 8, 2011, 10:06 PM

Post #16 of 27 (2248 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Tue, 08 Mar 2011 11:12:22 +0300
Andrew Vagin <avagin [at] gmail> wrote:

> Hi, All
> > I agree with Minchan and can't think this is a real fix....
> > Andrey, I'm now trying your fix and it seems your fix for oom-killer,
> > 'skip-zombie-process' works enough good for my environ.
> >
> > What is your enviroment ? number of cpus ? architecture ? size of memory ?
> Processort: AMD Phenom(tm) II X6 1055T Processor (six-core)
> Ram: 8Gb
> RHEL6, x86_64. This host doesn't have swap.
>
OK, thanks. "NO SWAP" is important information ;)

> It hangs up fast. Tomorrow I will have to send a processes state, if it
> will be interesting for you. With my patch the kernel work fine. I added
> debug and found that it hangs up in the described case.
> I suppose that my patch may be incorrect, but the problem exists and we
> should do something.
>

Thanks,
-Kame



kosaki.motohiro at jp

Mar 8, 2011, 10:17 PM

Post #17 of 27 (2255 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

> On 03/08/2011 06:06 AM, KOSAKI Motohiro wrote:
> >>>> Hmm.. Although it solves the problem, I think it's not a good idea that
> >>>> depends on false alram and give up the retry.
> >>>
> >>> Any alternative proposals? We should get the livelock fixed if possible..
> >>
> >> I agree with Minchan and can't think this is a real fix....
> >> Andrey, I'm now trying your fix and it seems your fix for oom-killer,
> >> 'skip-zombie-process' works enough good for my environ.
> >>
> >> What is your enviroment ? number of cpus ? architecture ? size of memory ?
> >
> > me too. 'skip-zombie-process V1' work fine. and I didn't seen this patch
> > improve oom situation.
> >
> > And, The test program is purely fork bomb. Our oom-killer is not silver
> > bullet for fork bomb from very long time ago. That said, oom-killer send
> > SIGKILL and start to kill the victim process. But, it doesn't prevent
> > to be created new memory hogging tasks. Therefore we have no gurantee
> > to win process exiting and creating race.
>
> I think a live-lock is a bug, even if it's provoked by fork bomds.
>
> And now I want say some words about zone->all_unreclaimable. I think
> this flag is "conservative". It is set when situation is bad and it's
> unset when situation get better. If we have a small number of
> reclaimable pages, the situation is still bad. What do you mean, when
> say that kernel is alive? If we have one reclaimable page, is the kernel
> alive? Yes, it can work, it will generate many page faults and do
> something, but anyone say that it is more dead than alive.
>
> Try to look at it from my point of view. The patch will be correct and
> the kernel will be more alive.
>
> Excuse me, If I'm mistaken...

Hi,

Hmmm...
If I had observed an improvement from your patch, I would support your opinion, but I
didn't. So now I'm curious why we reached different conclusions. Tomorrow I'll try to
build a test environment that reproduces your system.

Unfortunately, zone->all_unreclaimable is an unreliable value during hibernation,
so I doubt your current patch is acceptable as is. But I'm not against working on an
alternative if we can observe the same phenomenon.

At the very least, I also dislike the kernel hang.

Thanks.



minchan.kim at gmail

Mar 9, 2011, 10:58 PM

Post #18 of 27 (2245 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

Hi Kame,

Sorry for the late response.
I only had time to test this issue briefly because I am very busy these days.
This issue is interesting to me,
so I hope to take time for proper testing when I can.
I should find the root cause of the livelock.

I will answer your comment after that. :)
Thanks!

On Wed, Mar 9, 2011 at 2:37 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu [at] jp> wrote:
> On Tue, 8 Mar 2011 08:45:51 +0900
> Minchan Kim <minchan.kim [at] gmail> wrote:
>
>> On Tue, Mar 8, 2011 at 6:58 AM, Andrew Morton <akpm [at] linux-foundation> wrote:
>> > On Sun, 6 Mar 2011 02:07:59 +0900
>> > Minchan Kim <minchan.kim [at] gmail> wrote:
>> > Any alternative proposals?  We should get the livelock fixed if possible..
>> >
>>
>> And we should avoid unnecessary OOM kill if possible.
>>
>> I think the problem is caused by (zone->pages_scanned <
>> zone_reclaimable_pages(zone) * 6). I am not sure (* 6) is a best. It
>> would be rather big on recent big DRAM machines.
>>
>
> It means 3 times full-scan from the highest priority to the lowest
> and cannot freed any pages. I think big memory machine tend to have
> more cpus, so don't think it's big.
>
>> I think it is a trade-off between latency and OOM kill.
>> If we decrease the magic value, maybe we should prevent the almost
>> livelock but happens unnecessary OOM kill.
>>
>
> Hmm, should I support a sacrifice feature 'some signal(SIGINT?) will be sent by
> the kernel when it detects system memory is in short' in cgroup ?
> (For example, if full LRU scan is done in a zone, notifier
>  works and SIGINT will be sent.)
>
>> And I think zone_reclaimable not fair.
>> For example, too many scanning makes reclaimable state to
>> unreclaimable state. Maybe it takes a very long time. But just some
>> page free makes unreclaimable state to reclaimabe with very easy. So
>> we need much painful reclaiming for changing reclaimable state with
>> unreclaimabe state. it would affect latency very much.
>>
>> Maybe we need more smart zone_reclaimabe which is adaptive with memory pressure.
>>
> I agree.
>
> Thanks,
> -Kame
>
>



--
Kind regards,
Minchan Kim


kosaki.motohiro at jp

Mar 10, 2011, 6:08 AM

Post #19 of 27 (2233 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

> Hi,
>
> Hmmm...
> If I could observed your patch, I did support your opinion. but I didn't. so, now I'm
> curious why we got the different conclusion. tommorow, I'll try to construct a test
> environment to reproduce your system.

Hm,

The following two patches seem to interact badly. The former makes the
dying task SCHED_FIFO on OOM; the latter causes a busy loop that occupies
100% of the CPU if the LRU is really tight.
Of course, I need to run many more tests. I'll dig into it more this
weekend (maybe).


commit 93b43fa55088fe977503a156d1097cc2055449a2
Author: Luis Claudio R. Goncalves <lclaudio [at] uudg>
Date: Mon Aug 9 17:19:41 2010 -0700

oom: give the dying task a higher priority


commit 0e093d99763eb4cea09f8ca4f1d01f34e121d10b
Author: Mel Gorman <mel [at] csn>
Date: Tue Oct 26 14:21:45 2010 -0700

writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant conge





kamezawa.hiroyu at jp

Mar 10, 2011, 3:58 PM

Post #20 of 27 (2230 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Thu, 10 Mar 2011 15:58:29 +0900
Minchan Kim <minchan.kim [at] gmail> wrote:

> Hi Kame,
>
> Sorry for late response.
> I had a time to test this issue shortly because these day I am very busy.
> This issue was interesting to me.
> So I hope taking a time for enough testing when I have a time.
> I should find out root cause of livelock.
>

Thanks. Kosaki-san and I reproduced the bug on a swapless system.
Kosaki-san is now digging into it and has found some issues with the scheduler
boost at OOM and the lack of a sufficient "wait" in vmscan.c.

I made a patch like the one attached. It works well in making
all_unreclaimable() return TRUE, but the livelock (deadlock?) still happens.
I suspect vmscan itself is not the key to fixing the issue.
So I'd like to wait for Kosaki-san's answer ;)

I'm now wondering how to catch a fork bomb and stop it (without using cgroup).
I think the problem is that the fork bomb is faster than killall...

Thanks,
-Kame
==

This is just a debug patch.

---
mm/vmscan.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 54 insertions(+), 4 deletions(-)

Index: mmotm-0303/mm/vmscan.c
===================================================================
--- mmotm-0303.orig/mm/vmscan.c
+++ mmotm-0303/mm/vmscan.c
@@ -1983,9 +1983,55 @@ static void shrink_zones(int priority, s
 	}
 }

-static bool zone_reclaimable(struct zone *zone)
+static bool zone_seems_empty(struct zone *zone, struct scan_control *sc)
 {
-	return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
+	unsigned long nr, wmark, free, isolated, lru;
+
+	/*
+	 * If scanned, zone->pages_scanned is incremented and this can
+	 * trigger OOM.
+	 */
+	if (sc->nr_scanned)
+		return false;
+
+	free = zone_page_state(zone, NR_FREE_PAGES);
+	isolated = zone_page_state(zone, NR_ISOLATED_FILE);
+	if (nr_swap_pages)
+		isolated += zone_page_state(zone, NR_ISOLATED_ANON);
+
+	/* In we cannot do scan, don't count LRU pages. */
+	if (!zone->all_unreclaimable) {
+		lru = zone_page_state(zone, NR_ACTIVE_FILE);
+		lru += zone_page_state(zone, NR_INACTIVE_FILE);
+		if (nr_swap_pages) {
+			lru += zone_page_state(zone, NR_ACTIVE_ANON);
+			lru += zone_page_state(zone, NR_INACTIVE_ANON);
+		}
+	} else
+		lru = 0;
+	nr = free + isolated + lru;
+	wmark = min_wmark_pages(zone);
+	wmark += zone->lowmem_reserve[gfp_zone(sc->gfp_mask)];
+	wmark += 1 << sc->order;
+	printk("thread %d/%ld all %d scanned %ld pages %ld/%ld/%ld/%ld/%ld/%ld\n",
+		current->pid, sc->nr_scanned, zone->all_unreclaimable,
+		zone->pages_scanned,
+		nr,free,isolated,lru,
+		zone_reclaimable_pages(zone), wmark);
+	/*
+	 * In some case (especially noswap), almost all page cache are paged out
+	 * and we'll see the amount of reclaimable+free pages is smaller than
+	 * zone->min. In this case, we canoot expect any recovery other
+	 * than OOM-KILL. We can't reclaim memory enough for usual tasks.
+	 */
+
+	return nr <= wmark;
+}
+
+static bool zone_reclaimable(struct zone *zone, struct scan_control *sc)
+{
+	/* zone_reclaimable_pages() can return 0, we need <= */
+	return zone->pages_scanned <= zone_reclaimable_pages(zone) * 6;
 }

 /*
@@ -2006,11 +2052,15 @@ static bool all_unreclaimable(struct zon
 			continue;
 		if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
 			continue;
-		if (zone_reclaimable(zone)) {
+		if (zone_seems_empty(zone, sc))
+			continue;
+		if (zone_reclaimable(zone, sc)) {
 			all_unreclaimable = false;
 			break;
 		}
 	}
+	if (all_unreclaimable)
+		printk("all_unreclaimable() returns TRUE\n");

 	return all_unreclaimable;
 }
@@ -2456,7 +2506,7 @@ loop_again:
 			if (zone->all_unreclaimable)
 				continue;
 			if (!compaction && nr_slab == 0 &&
-			    !zone_reclaimable(zone))
+			    !zone_reclaimable(zone, &sc))
 				zone->all_unreclaimable = 1;
 			/*
 			 * If we've done a decent amount of scanning and
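
The check that zone_seems_empty() performs in the debug patch above can be tried out in user space. The following is only a sketch of that arithmetic: struct zone_counts, its fields and the have_swap parameter are invented stand-ins for the zone counters and nr_swap_pages, and the printk is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical snapshot of the per-zone counters the patch reads. */
struct zone_counts {
	unsigned long free;
	unsigned long isolated_file, isolated_anon;
	unsigned long lru_file, lru_anon;	/* active + inactive */
	unsigned long min_wmark, lowmem_reserve;
	bool all_unreclaimable;
};

/*
 * Mirrors the decision in zone_seems_empty(): the zone counts as empty
 * when free + isolated + scannable LRU pages cannot cover the min
 * watermark, the lowmem reserve and the requested allocation.
 */
static bool zone_seems_empty_model(const struct zone_counts *z, bool have_swap,
				   unsigned int order, unsigned long nr_scanned)
{
	unsigned long nr, wmark;

	assert(order < sizeof(unsigned long) * CHAR_BIT);

	/* Something was scanned in this pass, so do not call it empty. */
	if (nr_scanned)
		return false;

	nr = z->free + z->isolated_file;
	if (have_swap)
		nr += z->isolated_anon;

	/* LRU pages only count while the zone can still be scanned. */
	if (!z->all_unreclaimable) {
		nr += z->lru_file;
		if (have_swap)
			nr += z->lru_anon;	/* anon is reclaimable only with swap */
	}

	wmark = z->min_wmark + z->lowmem_reserve + (1UL << order);
	return nr <= wmark;
}
```

On a swapless host the anonymous LRU pages drop out of the sum, so a zone that is almost entirely anonymous memory looks empty even though its LRU lists are long, which matches the swapless configuration in which the hang was reproduced.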



minchan.kim at gmail

Mar 10, 2011, 4:18 PM

Post #21 of 27 (2233 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Fri, Mar 11, 2011 at 8:58 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu [at] jp> wrote:
> On Thu, 10 Mar 2011 15:58:29 +0900
> Minchan Kim <minchan.kim [at] gmail> wrote:
>
>> Hi Kame,
>>
>> Sorry for late response.
>> I had a time to test this issue shortly because these day I am very busy.
>> This issue was interesting to me.
>> So I hope taking a time for enough testing when I have a time.
>> I should find out root cause of livelock.
>>
>
> Thanks. I and Kosaki-san reproduced the bug with swapless system.
> Now, Kosaki-san is digging and found some issue with scheduler boost at OOM
> and lack of enough "wait" in vmscan.c.
>
> I myself made patch like attached one. This works well for returning TRUE at
> all_unreclaimable() but livelock(deadlock?) still happens.

I saw the deadlock.
From my quick debugging it seems to be caused by the following code, but I am
not sure. I need to investigate further but don't have time now. :(


	 * Note: this may have a chance of deadlock if it gets
	 * blocked waiting for another task which itself is waiting
	 * for memory. Is there a better alternative?
	 */
	if (test_tsk_thread_flag(p, TIF_MEMDIE))
		return ERR_PTR(-1UL);

It would wait forever for that task to die, without selecting another victim.
If that's right, it's a known bug and so far we have no fix for it. Hmm.

> I wonder vmscan itself isn't a key for fixing issue.

I agree.

> Then, I'd like to wait for Kosaki-san's answer ;)

Me, too. :)

>
> I'm now wondering how to catch fork-bomb and stop it (without using cgroup).

Yes. Fork throttling without cgroup is very important.
And, off-topic, the mem_notify without memcontrol that you mentioned is
important to embedded people, I guess.

> I think the problem is that fork-bomb is faster than killall...

And the deadlock problem I mentioned.

>
> Thanks,
> -Kame

Thanks for the investigation, Kame.

> ==
>
> This is just a debug patch.
>
> ---
>  mm/vmscan.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 54 insertions(+), 4 deletions(-)
>
> Index: mmotm-0303/mm/vmscan.c
> ===================================================================
> --- mmotm-0303.orig/mm/vmscan.c
> +++ mmotm-0303/mm/vmscan.c
> @@ -1983,9 +1983,55 @@ static void shrink_zones(int priority, s
>        }
>  }
>
> -static bool zone_reclaimable(struct zone *zone)
> +static bool zone_seems_empty(struct zone *zone, struct scan_control *sc)
>  {
> -       return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
> +       unsigned long nr, wmark, free, isolated, lru;
> +
> +       /*
> +        * If scanned, zone->pages_scanned is incremented and this can
> +        * trigger OOM.
> +        */
> +       if (sc->nr_scanned)
> +               return false;
> +
> +       free = zone_page_state(zone, NR_FREE_PAGES);
> +       isolated = zone_page_state(zone, NR_ISOLATED_FILE);
> +       if (nr_swap_pages)
> +               isolated += zone_page_state(zone, NR_ISOLATED_ANON);
> +
> +       /* In we cannot do scan, don't count LRU pages. */
> +       if (!zone->all_unreclaimable) {
> +               lru = zone_page_state(zone, NR_ACTIVE_FILE);
> +               lru += zone_page_state(zone, NR_INACTIVE_FILE);
> +               if (nr_swap_pages) {
> +                       lru += zone_page_state(zone, NR_ACTIVE_ANON);
> +                       lru += zone_page_state(zone, NR_INACTIVE_ANON);
> +               }
> +       } else
> +               lru = 0;
> +       nr = free + isolated + lru;
> +       wmark = min_wmark_pages(zone);
> +       wmark += zone->lowmem_reserve[gfp_zone(sc->gfp_mask)];
> +       wmark += 1 << sc->order;
> +       printk("thread %d/%ld all %d scanned %ld pages %ld/%ld/%ld/%ld/%ld/%ld\n",
> +               current->pid, sc->nr_scanned, zone->all_unreclaimable,
> +               zone->pages_scanned,
> +               nr,free,isolated,lru,
> +               zone_reclaimable_pages(zone), wmark);
> +       /*
> +        * In some case (especially noswap), almost all page cache are paged out
> +        * and we'll see the amount of reclaimable+free pages is smaller than
> +        * zone->min. In this case, we canoot expect any recovery other
> +        * than OOM-KILL. We can't reclaim memory enough for usual tasks.
> +        */
> +
> +       return nr <= wmark;
> +}
> +
> +static bool zone_reclaimable(struct zone *zone, struct scan_control *sc)
> +{
> +       /* zone_reclaimable_pages() can return 0, we need <= */
> +       return zone->pages_scanned <= zone_reclaimable_pages(zone) * 6;
>  }
>
>  /*
> @@ -2006,11 +2052,15 @@ static bool all_unreclaimable(struct zon
>                        continue;
>                if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
>                        continue;
> -               if (zone_reclaimable(zone)) {
> +               if (zone_seems_empty(zone, sc))
> +                       continue;
> +               if (zone_reclaimable(zone, sc)) {
>                        all_unreclaimable = false;
>                        break;
>                }
>        }
> +       if (all_unreclaimable)
> +               printk("all_unreclaimable() returns TRUE\n");
>
>        return all_unreclaimable;
>  }
> @@ -2456,7 +2506,7 @@ loop_again:
>                        if (zone->all_unreclaimable)
>                                continue;
>                        if (!compaction && nr_slab == 0 &&
> -                           !zone_reclaimable(zone))
> +                           !zone_reclaimable(zone, &sc))
>                                zone->all_unreclaimable = 1;
>                        /*
>                         * If we've done a decent amount of scanning and
>
>



--
Kind regards,
Minchan Kim


avagin at gmail

Mar 10, 2011, 10:08 PM

Post #22 of 27 (2231 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On 03/11/2011 03:18 AM, Minchan Kim wrote:
> On Fri, Mar 11, 2011 at 8:58 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu [at] jp> wrote:
>> On Thu, 10 Mar 2011 15:58:29 +0900
>> Minchan Kim<minchan.kim [at] gmail> wrote:
>>
>>> Hi Kame,
>>>
>>> Sorry for late response.
>>> I had a time to test this issue shortly because these day I am very busy.
>>> This issue was interesting to me.
>>> So I hope taking a time for enough testing when I have a time.
>>> I should find out root cause of livelock.
>>>
>>
>> Thanks. I and Kosaki-san reproduced the bug with swapless system.
>> Now, Kosaki-san is digging and found some issue with scheduler boost at OOM
>> and lack of enough "wait" in vmscan.c.
>>
>> I myself made patch like attached one. This works well for returning TRUE at
>> all_unreclaimable() but livelock(deadlock?) still happens.
>
> I saw the deadlock.
> It seems to happen by following code by my quick debug but not sure. I
> need to investigate further but don't have a time now. :(
>
>
> * Note: this may have a chance of deadlock if it gets
> * blocked waiting for another task which itself is waiting
> * for memory. Is there a better alternative?
> */
> if (test_tsk_thread_flag(p, TIF_MEMDIE))
> return ERR_PTR(-1UL);
> It would be wait to die the task forever without another victim selection.
> If it's right, It's a known BUG and we have no choice until now. Hmm.


I fixed this bug too and sent the patch "mm: skip zombie in OOM-killer".

http://groups.google.com/group/linux.kernel/browse_thread/thread/b9c6ddf34d1671ab/2941e1877ca4f626?lnk=raot&pli=1

-	if (test_tsk_thread_flag(p, TIF_MEMDIE))
+	if (test_tsk_thread_flag(p, TIF_MEMDIE) && p->mm)
 		return ERR_PTR(-1UL);

It has not been committed yet, because David Rientjes and others are still
deciding what to do with "[patch] oom: prevent unnecessary oom kills or kernel panics.".
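
The effect of the extra p->mm test can be illustrated with a small user-space model of victim selection. This is a sketch under assumptions: struct task_model, the return codes and the points field are invented for the example, and the real select_bad_process() does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a task as the OOM killer sees it. */
struct task_model {
	bool memdie;		/* TIF_MEMDIE: already chosen as OOM victim */
	bool has_mm;		/* false once the address space is released */
	unsigned long points;	/* badness score */
};

enum { NO_VICTIM = -1, WAIT_FOR_DYING = -2 };

/*
 * Returns the index of the chosen victim, NO_VICTIM, or WAIT_FOR_DYING
 * when selection is aborted because an earlier victim is still exiting.
 * skip_mmless_memdie == true models the "&& p->mm" change.
 */
static int select_victim_model(const struct task_model *tasks, size_t n,
			       bool skip_mmless_memdie)
{
	unsigned long best_points = 0;
	int best = NO_VICTIM;
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tasks[i].memdie &&
		    (!skip_mmless_memdie || tasks[i].has_mm))
			return WAIT_FOR_DYING;
		if (!tasks[i].has_mm)
			continue;	/* no memory left to free */
		if (tasks[i].points > best_points) {
			best_points = tasks[i].points;
			best = (int)i;
		}
	}
	return best;
}
```

With a zombie that still carries TIF_MEMDIE in the task list, the unpatched variant keeps returning WAIT_FOR_DYING and never picks a new victim, while the patched variant ignores the zombie and selects the task with the highest score.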
>
>> I wonder vmscan itself isn't a key for fixing issue.
>
> I agree.
>
>> Then, I'd like to wait for Kosaki-san's answer ;)
>
> Me, too. :)
>
>>
>> I'm now wondering how to catch fork-bomb and stop it (without using cgroup).
>
> Yes. Fork throttling without cgroup is very important.
> And as off-topic, mem_notify without memcontrol you mentioned is
> important to embedded people, I gues.
>
>> I think the problem is that fork-bomb is faster than killall...
>
> And deadlock problem I mentioned.
>
>>
>> Thanks,
>> -Kame
>
> Thanks for the investigation, Kame.
>
>> ==
>>
>> This is just a debug patch.
>>
>> ---
>> mm/vmscan.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 54 insertions(+), 4 deletions(-)
>>
>> Index: mmotm-0303/mm/vmscan.c
>> ===================================================================
>> --- mmotm-0303.orig/mm/vmscan.c
>> +++ mmotm-0303/mm/vmscan.c
>> @@ -1983,9 +1983,55 @@ static void shrink_zones(int priority, s
>> }
>> }
>>
>> -static bool zone_reclaimable(struct zone *zone)
>> +static bool zone_seems_empty(struct zone *zone, struct scan_control *sc)
>> {
>> - return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
>> + unsigned long nr, wmark, free, isolated, lru;
>> +
>> + /*
>> + * If scanned, zone->pages_scanned is incremented and this can
>> + * trigger OOM.
>> + */
>> + if (sc->nr_scanned)
>> + return false;
>> +
>> + free = zone_page_state(zone, NR_FREE_PAGES);
>> + isolated = zone_page_state(zone, NR_ISOLATED_FILE);
>> + if (nr_swap_pages)
>> + isolated += zone_page_state(zone, NR_ISOLATED_ANON);
>> +
>> + /* If we cannot scan, don't count LRU pages. */
>> + if (!zone->all_unreclaimable) {
>> + lru = zone_page_state(zone, NR_ACTIVE_FILE);
>> + lru += zone_page_state(zone, NR_INACTIVE_FILE);
>> + if (nr_swap_pages) {
>> + lru += zone_page_state(zone, NR_ACTIVE_ANON);
>> + lru += zone_page_state(zone, NR_INACTIVE_ANON);
>> + }
>> + } else
>> + lru = 0;
>> + nr = free + isolated + lru;
>> + wmark = min_wmark_pages(zone);
>> + wmark += zone->lowmem_reserve[gfp_zone(sc->gfp_mask)];
>> + wmark += 1 << sc->order;
>> + printk("thread %d/%ld all %d scanned %ld pages %ld/%ld/%ld/%ld/%ld/%ld\n",
>> + current->pid, sc->nr_scanned, zone->all_unreclaimable,
>> + zone->pages_scanned,
>> + nr,free,isolated,lru,
>> + zone_reclaimable_pages(zone), wmark);
>> + /*
>> + * In some cases (especially noswap), almost all of the page cache is paged out
>> + * and we'll see that the amount of reclaimable+free pages is smaller than
>> + * zone->min. In this case, we cannot expect any recovery other
>> + * than OOM-KILL. We can't reclaim enough memory for usual tasks.
>> + */
>> +
>> + return nr <= wmark;
>> +}
>> +
>> +static bool zone_reclaimable(struct zone *zone, struct scan_control *sc)
>> +{
>> + /* zone_reclaimable_pages() can return 0, so we need <= */
>> + return zone->pages_scanned <= zone_reclaimable_pages(zone) * 6;
>> }
>>
>> /*
>> @@ -2006,11 +2052,15 @@ static bool all_unreclaimable(struct zon
>> continue;
>> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
>> continue;
>> - if (zone_reclaimable(zone)) {
>> + if (zone_seems_empty(zone, sc))
>> + continue;
>> + if (zone_reclaimable(zone, sc)) {
>> all_unreclaimable = false;
>> break;
>> }
>> }
>> + if (all_unreclaimable)
>> + printk("all_unreclaimable() returns TRUE\n");
>>
>> return all_unreclaimable;
>> }
>> @@ -2456,7 +2506,7 @@ loop_again:
>> if (zone->all_unreclaimable)
>> continue;
>> if (!compaction && nr_slab == 0 &&
>> - !zone_reclaimable(zone))
>> + !zone_reclaimable(zone, &sc))
>> zone->all_unreclaimable = 1;
>> /*
>> * If we've done a decent amount of scanning and
>>
>>
>
>
>



minchan.kim at gmail

Mar 13, 2011, 6:03 PM

Post #23 of 27 (2214 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

On Fri, Mar 11, 2011 at 3:08 PM, avagin [at] gmail <avagin [at] gmail> wrote:
> On 03/11/2011 03:18 AM, Minchan Kim wrote:
>>
>> On Fri, Mar 11, 2011 at 8:58 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu [at] jp>  wrote:
>>>
>>> On Thu, 10 Mar 2011 15:58:29 +0900
>>> Minchan Kim<minchan.kim [at] gmail>  wrote:
>>>
>>>> Hi Kame,
>>>>
>>>> Sorry for late response.
>>>> I had a time to test this issue shortly because these day I am very
>>>> busy.
>>>> This issue was interesting to me.
>>>> So I hope taking a time for enough testing when I have a time.
>>>> I should find out root cause of livelock.
>>>>
>>>
>>> Thanks. I and Kosaki-san reproduced the bug with swapless system.
>>> Now, Kosaki-san is digging and found some issue with scheduler boost at
>>> OOM
>>> and lack of enough "wait" in vmscan.c.
>>>
>>> I myself made patch like attached one. This works well for returning TRUE
>>> at
>>> all_unreclaimable() but livelock(deadlock?) still happens.
>>
>> I saw the deadlock.
>> It seems to happen by following code by my quick debug but not sure. I
>> need to investigate further but don't have a time now. :(
>>
>>
>>                  * Note: this may have a chance of deadlock if it gets
>>                  * blocked waiting for another task which itself is
>> waiting
>>                  * for memory. Is there a better alternative?
>>                  */
>>                 if (test_tsk_thread_flag(p, TIF_MEMDIE))
>>                         return ERR_PTR(-1UL);
>> It would wait forever for that task to die, without selecting another victim.
>> If that's right, it's a known bug and we have no choice for now. Hmm.
>
>
> I fixed this bug too and sent patch "mm: skip zombie in OOM-killer".
>
> http://groups.google.com/group/linux.kernel/browse_thread/thread/b9c6ddf34d1671ab/2941e1877ca4f626?lnk=raot&pli=1
>
> -               if (test_tsk_thread_flag(p, TIF_MEMDIE))
> +               if (test_tsk_thread_flag(p, TIF_MEMDIE) && p->mm)
>                        return ERR_PTR(-1UL);
>
> It is not committed yet, because David Rientjes and others are still deciding
> what to do with "[patch] oom: prevent unnecessary oom kills or kernel panics.".

Thanks, Andrey.
The patch "mm: skip zombie in OOM-killer" solves my livelock issue,
but I haven't yet looked at the effectiveness of "mm: check
zone->all_unreclaimable in all_unreclaimable". I have to look further.

But your patch "mm: skip zombie in OOM-killer" is very controversial
because it breaks the multi-threaded case.
Since find_lock_task_mm was introduced, we have considered multi-threaded
cases, but I don't think it completely covers all of them, such as the
TIF_MEMDIE handling being discussed now.
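
One possible reading of the multi-threaded concern, sketched in userspace C: if the thread group leader has already exited and dropped its mm while sibling threads still hold the address space, a test on the leader's p->mm alone reports "no mm" even though the process still pins memory. The types and function names below (model_thread, model_process, model_leader_has_mm, model_find_task_mm) are hypothetical and only mimic the idea of walking the threads to find one with an mm; the locking that the kernel helper performs is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one entry per thread; threads[0] is the group leader. */
struct model_thread {
	bool has_mm;   /* models t->mm != NULL */
};

struct model_process {
	const struct model_thread *threads;
	size_t nr_threads;
};

/* Leader-only test, as in a bare "p->mm" check on the process. */
bool model_leader_has_mm(const struct model_process *p)
{
	assert(p != NULL);
	return p->nr_threads > 0 && p->threads[0].has_mm;
}

/*
 * Walk every thread and return the first one that still has an mm,
 * or NULL if none does. Locking is intentionally left out of the model.
 */
const struct model_thread *model_find_task_mm(const struct model_process *p)
{
	assert(p != NULL);
	for (size_t i = 0; i < p->nr_threads; i++)
		if (p->threads[i].has_mm)
			return &p->threads[i];
	return NULL;
}
```

For a process whose leader has exited but whose second thread is still running, model_leader_has_mm() is false while model_find_task_mm() still finds a thread that holds memory, which is the kind of case a leader-only check would misclassify.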

I will watch the discussion.
--
Kind regards,
Minchan Kim


caiqian at redhat

May 3, 2011, 6:38 PM

Post #24 of 27 (2096 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

----- Original Message -----
> On 03/05/2011 06:20 PM, Minchan Kim wrote:
> > On Sat, Mar 05, 2011 at 02:44:16PM +0300, Andrey Vagin wrote:
> >> Check zone->all_unreclaimable in all_unreclaimable(), otherwise the
> >> kernel may hang up, because shrink_zones() will do nothing, but
> >> all_unreclaimable() will say, that zone has reclaimable pages.
> >>
> >> do_try_to_free_pages()
> >> shrink_zones()
> >> for_each_zone
> >> if (zone->all_unreclaimable)
> >> continue
> >> if !all_unreclaimable(zonelist, sc)
> >> return 1
> >>
> >> __alloc_pages_slowpath()
> >> retry:
> >> did_some_progress = do_try_to_free_pages(page)
> >> ...
> >> if (!page && did_some_progress)
> >> retry;
> >>
> >> Signed-off-by: Andrey Vagin<avagin [at] openvz>
> >> ---
> >> mm/vmscan.c | 2 ++
> >> 1 files changed, 2 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index 6771ea7..1c056f7 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -2002,6 +2002,8 @@ static bool all_unreclaimable(struct zonelist
> >> *zonelist,
> >>
> >> for_each_zone_zonelist_nodemask(zone, z, zonelist,
> >> gfp_zone(sc->gfp_mask), sc->nodemask) {
> >> + if (zone->all_unreclaimable)
> >> + continue;
> >> if (!populated_zone(zone))
> >> continue;
> >> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
> >
> > zone_reclaimable checks it. Isn't it enough?
> I sent one more patch [PATCH] mm: skip zombie in OOM-killer.
> These two patches are enough.
> > Does the hang up really happen or see it by code review?
> Yes. You can reproduce it with the help of the attached Python program.
> It's not very clever:)
> It performs the following actions in a loop:
> 1. fork
> 2. mmap
> 3. touch memory
> 4. read memory
> 5. munmap
>
> >> --
> >> 1.7.1
I tested this on the latest mainline kernel using the attached
reproducer; the system still hung or deadlocked after OOM. The full OOM
trace is here:
http://people.redhat.com/qcai/oom.log

Did I miss anything?
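
For readers without the attachment, the five steps quoted above (fork, mmap, touch memory, read memory, munmap) can be sketched in C roughly as follows. This is not the attached memeater.py: the function names touch_pages() and run_children() are made up here, and the sketch is deliberately bounded (a fixed number of children, each doing one pass) so that it does not behave like a fork bomb. Raising the child count and mapping size toward the machine's memory limit is what would create the pressure discussed in this thread.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Steps 2-5: map anonymous memory, write one byte per page, read the
 * bytes back, and unmap. Returns the number of pages read back as
 * non-zero, or -1 if the mapping could not be created.
 */
long touch_pages(size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	long seen = 0;
	unsigned char *buf;

	assert(len > 0);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	for (size_t off = 0; off < len; off += page)
		buf[off] = 1;                 /* touch: forces a real page */
	for (size_t off = 0; off < len; off += page)
		seen += buf[off] != 0;        /* read the page back */

	munmap(buf, len);
	return seen;
}

/*
 * Step 1: fork nr_children workers that each run touch_pages() once.
 * Returns how many children exited successfully.
 */
int run_children(int nr_children, size_t len)
{
	int ok = 0, status;

	assert(nr_children >= 0);

	for (int i = 0; i < nr_children; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;                /* fork failed: stop spawning */
		if (pid == 0)
			_exit(touch_pages(len) < 0 ? 1 : 0);
	}
	while (wait(&status) > 0)
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			ok++;
	return ok;
}
```

A call such as run_children(4, 64UL << 20) runs four workers that each touch 64 MB once; the values are arbitrary examples.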
Attachments: memeater.py (0.68 KB)


kosaki.motohiro at jp

May 8, 2011, 11:54 PM

Post #25 of 27 (2105 views)
Re: [PATCH] mm: check zone->all_unreclaimable in all_unreclaimable() [In reply to]

>
>
> ----- Original Message -----
> > On 03/05/2011 06:20 PM, Minchan Kim wrote:
> > > On Sat, Mar 05, 2011 at 02:44:16PM +0300, Andrey Vagin wrote:
> > >> Check zone->all_unreclaimable in all_unreclaimable(), otherwise the
> > >> kernel may hang up, because shrink_zones() will do nothing, but
> > >> all_unreclaimable() will say, that zone has reclaimable pages.
> > >>
> > >> do_try_to_free_pages()
> > >> shrink_zones()
> > >> for_each_zone
> > >> if (zone->all_unreclaimable)
> > >> continue
> > >> if !all_unreclaimable(zonelist, sc)
> > >> return 1
> > >>
> > >> __alloc_pages_slowpath()
> > >> retry:
> > >> did_some_progress = do_try_to_free_pages(page)
> > >> ...
> > >> if (!page && did_some_progress)
> > >> retry;
> > >>
> > >> Signed-off-by: Andrey Vagin<avagin [at] openvz>
> > >> ---
> > >> mm/vmscan.c | 2 ++
> > >> 1 files changed, 2 insertions(+), 0 deletions(-)
> > >>
> > >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> > >> index 6771ea7..1c056f7 100644
> > >> --- a/mm/vmscan.c
> > >> +++ b/mm/vmscan.c
> > >> @@ -2002,6 +2002,8 @@ static bool all_unreclaimable(struct zonelist
> > >> *zonelist,
> > >>
> > >> for_each_zone_zonelist_nodemask(zone, z, zonelist,
> > >> gfp_zone(sc->gfp_mask), sc->nodemask) {
> > >> + if (zone->all_unreclaimable)
> > >> + continue;
> > >> if (!populated_zone(zone))
> > >> continue;
> > >> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
> > >
> > > zone_reclaimable checks it. Isn't it enough?
> > I sent one more patch [PATCH] mm: skip zombie in OOM-killer.
> > These two patches are enough.
> > > Does the hang up really happen or see it by code review?
> > Yes. You can reproduce it with the help of the attached Python program.
> > It's not very clever:)
> > It performs the following actions in a loop:
> > 1. fork
> > 2. mmap
> > 3. touch memory
> > 4. read memory
> > 5. munmap
> >
> > >> --
> > >> 1.7.1
> I have tested this for the latest mainline kernel using the reproducer
> attached, the system just hung or deadlock after oom. The whole oom
> trace is here.
> http://people.redhat.com/qcai/oom.log
>
> Did I miss anything?

Can you please try commit 929bea7c714220fc76ce3f75bef9056477c28e74?
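
As background for anyone testing, the retry loop described in the quoted patch description can be modelled in userspace as below. This is only a sketch under stated assumptions: struct model_zone and the model_* functions are hypothetical, the scan step of 100 pages per pass is arbitrary, nothing is ever actually reclaimed, and the "pages_scanned < reclaimable * 6" test is taken from the zone_reclaimable() lines quoted earlier in the thread. It shows how a zone that is flagged all_unreclaimable is skipped by the shrink pass, yet still counts as reclaimable in the final check, so the allocator keeps retrying.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-zone state involved in the loop. */
struct model_zone {
	bool all_unreclaimable;          /* models zone->all_unreclaimable */
	unsigned long pages_scanned;     /* models zone->pages_scanned */
	unsigned long reclaimable_pages; /* models zone_reclaimable_pages() */
};

/* Models shrink_zones(): flagged zones are skipped, others are scanned. */
static unsigned long model_shrink_zones(struct model_zone *z, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (z[i].all_unreclaimable)
			continue;
		z[i].pages_scanned += 100;   /* arbitrary scan step */
	}
	return 0;                            /* nothing is reclaimed in this model */
}

/* Models all_unreclaimable(); check_flag selects the patched behaviour. */
static bool model_all_unreclaimable(const struct model_zone *z, size_t n,
				    bool check_flag)
{
	for (size_t i = 0; i < n; i++) {
		if (check_flag && z[i].all_unreclaimable)
			continue;
		if (z[i].pages_scanned < z[i].reclaimable_pages * 6)
			return false;
	}
	return true;
}

/*
 * Models the retry loop in the allocator slow path. Returns the number of
 * retries before reclaim reports no progress (the point where the OOM
 * killer would run), or max_retries if it is still looping.
 */
unsigned int model_alloc_slowpath(struct model_zone *z, size_t n,
				  bool check_flag, unsigned int max_retries)
{
	assert(z != NULL || n == 0);

	for (unsigned int tries = 0; tries < max_retries; tries++) {
		unsigned long reclaimed = model_shrink_zones(z, n);
		bool progress = reclaimed != 0 ||
				!model_all_unreclaimable(z, n, check_flag);

		if (!progress)
			return tries;
	}
	return max_retries;
}
```

With a single flagged zone that has zero pages_scanned, the unpatched variant (check_flag == false) never leaves the loop, while the variant that honours the flag reports no progress on the first pass.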




