Mailing List Archive: Linux: Kernel

Found the commit that causes the OOMs


dhowells at redhat

Jun 27, 2009, 12:12 AM

Post #1 of 65

I've managed to bisect things to find the commit that causes the OOMs. It's:

commit 69c854817566db82c362797b4a6521d0b00fe1d8
Author: MinChan Kim <minchan.kim [at] gmail>
Date: Tue Jun 16 15:32:44 2009 -0700

vmscan: prevent shrinking of active anon lru list in case of no swap space V3

shrink_zone() can deactivate active anon pages even if we don't have a
swap device. Many embedded products don't have a swap device. So the
deactivation of anon pages is unnecessary.

This patch prevents unnecessary deactivation of anon lru pages. But, it
don't prevent aging of anon pages to swap out.

Signed-off-by: Minchan Kim <minchan.kim [at] gmail>
Acked-by: KOSAKI Motohiro <kosaki.motohiro [at] jp>
Cc: Johannes Weiner <hannes [at] cmpxchg>
Acked-by: Rik van Riel <riel [at] redhat>
Signed-off-by: Andrew Morton <akpm [at] linux-foundation>
Signed-off-by: Linus Torvalds <torvalds [at] linux-foundation>

This exhibits the problem. The previous commit:

commit 35282a2de4e5e4e173ab61aa9d7015886021a821
Author: Brice Goglin <Brice.Goglin [at] ens-lyon>
Date: Tue Jun 16 15:32:43 2009 -0700

migration: only migrate_prep() once per move_pages()

survives 16 iterations of the LTP syscall testsuite without exhibiting the
problem.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


minchan.kim at gmail

Jun 27, 2009, 5:07 AM

Post #2 of 65
Re: Found the commit that causes the OOMs

Hi, David.

First of all, thanks for your effort in tracking down the cause.

Unfortunately, I haven't been following your problem.
I guess you hit the OOM problem with no swap device, right?

My patch shouldn't have affected your case.
The motivation for the patch is as follows:

"If our system has no swap device, we can't reclaim anon pages.
So moving anon pages around in the anon LRU list is unnecessary."

If we don't call shrink_active_list() at the tail of shrink_zone(),
it can affect reclaim_stat->recent_[rotated|scanned].

That in turn can affect the number of pages to scan in the anon LRU list.
But look at shrink_zone():

If we don't have a swap device, we never scan the anon LRU list anyway
(the anon LRU's percent is always zero).

Nonetheless, the OOM happens.

Hmm..
Could you show me your oops and show_mem information, please?

Rik, Kosaki, what do you think?
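
The point about the anon percentage being zero can be illustrated with a small user-space model. This is only a sketch: the function name scan_target() and the exact formula are simplifying assumptions loosely based on how shrink_zone() of that era scaled each LRU list by priority and by a per-type percentage. It is not the kernel code itself.

```c
#include <assert.h>

/*
 * Hypothetical model of the per-LRU scan target.
 *   lru_pages: number of pages currently on the list
 *   priority:  reclaim priority; a larger value means a smaller scan window
 *   percent:   share (0..100) given to this list type (anon or file)
 */
static unsigned long scan_target(unsigned long lru_pages, int priority,
				 unsigned int percent)
{
	assert(priority >= 0 && priority < 32 && percent <= 100);
	return ((lru_pages >> priority) * percent) / 100;
}
```

Under this model, a list whose percentage is 0 gets a scan target of 0 regardless of its size, which is why the anon lists would not be scanned for eviction on a swapless system. For instance, scan_target(40960, 2, 0) yields 0 while scan_target(40960, 2, 100) yields 10240.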

On Sat, Jun 27, 2009 at 4:12 PM, David Howells<dhowells [at] redhat> wrote:
>
> I've managed to bisect things to find the commit that causes the OOMs.  It's:
>
>        commit 69c854817566db82c362797b4a6521d0b00fe1d8
>        Author: MinChan Kim <minchan.kim [at] gmail>
>        Date:   Tue Jun 16 15:32:44 2009 -0700
>
>            vmscan: prevent shrinking of active anon lru list in case of no swap space V3
>
>            shrink_zone() can deactivate active anon pages even if we don't have a
>            swap device.  Many embedded products don't have a swap device.  So the
>            deactivation of anon pages is unnecessary.
>
>            This patch prevents unnecessary deactivation of anon lru pages.  But, it
>            don't prevent aging of anon pages to swap out.
>
>            Signed-off-by: Minchan Kim <minchan.kim [at] gmail>
>            Acked-by: KOSAKI Motohiro <kosaki.motohiro [at] jp>
>            Cc: Johannes Weiner <hannes [at] cmpxchg>
>            Acked-by: Rik van Riel <riel [at] redhat>
>            Signed-off-by: Andrew Morton <akpm [at] linux-foundation>
>            Signed-off-by: Linus Torvalds <torvalds [at] linux-foundation>
>
> This exhibits the problem.  The previous commit:
>
>        commit 35282a2de4e5e4e173ab61aa9d7015886021a821
>        Author: Brice Goglin <Brice.Goglin [at] ens-lyon>
>        Date:   Tue Jun 16 15:32:43 2009 -0700
>
>            migration: only migrate_prep() once per move_pages()
>
> survives 16 iterations of the LTP syscall testsuite without exhibiting the
> problem.
>
> David
>



--
Kind regards,
Minchan Kim


hannes at cmpxchg

Jun 27, 2009, 5:54 AM

Post #3 of 65
Re: Found the commit that causes the OOMs

On Sat, Jun 27, 2009 at 08:12:49AM +0100, David Howells wrote:
>
> I've managed to bisect things to find the commit that causes the OOMs. It's:
>
> commit 69c854817566db82c362797b4a6521d0b00fe1d8
> Author: MinChan Kim <minchan.kim [at] gmail>
> Date: Tue Jun 16 15:32:44 2009 -0700
>
> vmscan: prevent shrinking of active anon lru list in case of no swap space V3
>
> shrink_zone() can deactivate active anon pages even if we don't have a
> swap device. Many embedded products don't have a swap device. So the
> deactivation of anon pages is unnecessary.
>
> This patch prevents unnecessary deactivation of anon lru pages. But, it
> don't prevent aging of anon pages to swap out.
>
> Signed-off-by: Minchan Kim <minchan.kim [at] gmail>
> Acked-by: KOSAKI Motohiro <kosaki.motohiro [at] jp>
> Cc: Johannes Weiner <hannes [at] cmpxchg>
> Acked-by: Rik van Riel <riel [at] redhat>
> Signed-off-by: Andrew Morton <akpm [at] linux-foundation>
> Signed-off-by: Linus Torvalds <torvalds [at] linux-foundation>
>
> This exhibits the problem. The previous commit:
>
> commit 35282a2de4e5e4e173ab61aa9d7015886021a821
> Author: Brice Goglin <Brice.Goglin [at] ens-lyon>
> Date: Tue Jun 16 15:32:43 2009 -0700
>
> migration: only migrate_prep() once per move_pages()
>
> survives 16 iterations of the LTP syscall testsuite without exhibiting the
> problem.

Here is the patch in question:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7592d8e..879d034 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1570,7 +1570,7 @@ static void shrink_zone(int priority, struct zone *zone,
 	 * Even if we did not try to evict anon pages at all, we want to
 	 * rebalance the anon lru active/inactive ratio.
 	 */
-	if (inactive_anon_is_low(zone, sc))
+	if (inactive_anon_is_low(zone, sc) && nr_swap_pages > 0)
 		shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);

 	throttle_vm_writeout(sc->gfp_mask);

When this was discussed, I think we missed that nr_swap_pages can
actually get zero on swap systems as well and this should have been
total_swap_pages - otherwise we also stop balancing the two anon lists
when swap is _full_ which was not the intention of this change at all.

[ There is another one hiding in shrink_zone() that does the same - it
was moved from get_scan_ratio() and is pretty old but we still kept
the inactive/active ratio halfway sane without MinChan's patch. ]
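
The difference between the two variables can be shown with a minimal user-space sketch. The struct and function names here are hypothetical; only the two comparisons are taken from the diff above and from the suggested total_swap_pages replacement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative snapshot of swap accounting (not a kernel structure). */
struct swap_state {
	long total_swap_pages;	/* pages in all configured swap areas */
	long nr_swap_pages;	/* swap pages that are currently free */
};

/* Condition as merged: rebalance only while some swap is still free. */
static bool balance_if_swap_free(const struct swap_state *s,
				 bool inactive_anon_low)
{
	assert(s != NULL);
	return inactive_anon_low && s->nr_swap_pages > 0;
}

/* Suggested condition: rebalance whenever any swap is configured. */
static bool balance_if_swap_configured(const struct swap_state *s,
				       bool inactive_anon_low)
{
	assert(s != NULL);
	return inactive_anon_low && s->total_swap_pages > 0;
}
```

With swap configured but completely full (for example total_swap_pages = 1024, nr_swap_pages = 0), the first check returns false and the second returns true. On a machine with no swap at all, both return false.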

This is from your OOM-run dmesg, David:

Adding 32k swap on swapfile22. Priority:-21 extents:1 across:32k
Adding 32k swap on swapfile23. Priority:-22 extents:1 across:32k
Adding 32k swap on swapfile24. Priority:-23 extents:3 across:44k
Adding 32k swap on swapfile25. Priority:-24 extents:1 across:32k

So we actually have swap? Or are those removed again before the OOM?

If not, I think we let the anon lists rot while swap is full and when
some swap space gets freed up and we should be able to evict anon
pages again, we don't find any candidates. The following patch should
improve on that.

If it's not true for your particular situation, I think we still need
it for the scenario described above.

---
From: Johannes Weiner <hannes [at] cmpxchg>
Subject: vmscan: keep balancing anon lists on swap-full conditions

Page reclaim doesn't scan and balance the anon LRU lists when
nr_swap_pages is zero to save the scan overhead for swapless systems.

Unfortunately, this variable can reach zero when all present swap
space is occupied as well and we don't want to stop balancing in that
case or we encounter an unreclaimable mess of anon lists when swap
space gets freed up and we are theoretically in the position to page
out again.

Use the total_swap_pages variable to have a better indicator when to
scan the anon LRU lists.

We still might have unbalanced anon lists when swap space is added
during run time but it is a less dynamic change in state and we
still save the scanning overhead for CONFIG_SWAP systems that never
actually set up swap space.

Signed-off-by: Johannes Weiner <hannes [at] cmpxchg>
---

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5415526..5ea7fc3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1524,7 +1524,7 @@ static void shrink_zone(int priority, struct zone *zone,
 	int noswap = 0;

 	/* If we have no swap space, do not bother scanning anon pages. */
-	if (!sc->may_swap || (nr_swap_pages <= 0)) {
+	if (!sc->may_swap || (total_swap_pages <= 0)) {
 		noswap = 1;
 		percent[0] = 0;
 		percent[1] = 100;
@@ -1578,7 +1578,7 @@ static void shrink_zone(int priority, struct zone *zone,
 	 * Even if we did not try to evict anon pages at all, we want to
 	 * rebalance the anon lru active/inactive ratio.
 	 */
-	if (inactive_anon_is_low(zone, sc) && nr_swap_pages > 0)
+	if (inactive_anon_is_low(zone, sc) && total_swap_pages > 0)
 		shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);

 	throttle_vm_writeout(sc->gfp_mask);


minchan.kim at gmail

Jun 27, 2009, 6:50 AM

Post #4 of 65
Re: Found the commit that causes the OOMs

Hi, Hannes.

On Sat, Jun 27, 2009 at 9:54 PM, Johannes Weiner<hannes [at] cmpxchg> wrote:
> On Sat, Jun 27, 2009 at 08:12:49AM +0100, David Howells wrote:
>>
>> I've managed to bisect things to find the commit that causes the OOMs.  It's:
>>
>>       commit 69c854817566db82c362797b4a6521d0b00fe1d8
>>       Author: MinChan Kim <minchan.kim [at] gmail>
>>       Date:   Tue Jun 16 15:32:44 2009 -0700
>>
>>           vmscan: prevent shrinking of active anon lru list in case of no swap space V3
>>
>>           shrink_zone() can deactivate active anon pages even if we don't have a
>>           swap device.  Many embedded products don't have a swap device.  So the
>>           deactivation of anon pages is unnecessary.
>>
>>           This patch prevents unnecessary deactivation of anon lru pages.  But, it
>>           don't prevent aging of anon pages to swap out.
>>
>>           Signed-off-by: Minchan Kim <minchan.kim [at] gmail>
>>           Acked-by: KOSAKI Motohiro <kosaki.motohiro [at] jp>
>>           Cc: Johannes Weiner <hannes [at] cmpxchg>
>>           Acked-by: Rik van Riel <riel [at] redhat>
>>           Signed-off-by: Andrew Morton <akpm [at] linux-foundation>
>>           Signed-off-by: Linus Torvalds <torvalds [at] linux-foundation>
>>
>> This exhibits the problem.  The previous commit:
>>
>>       commit 35282a2de4e5e4e173ab61aa9d7015886021a821
>>       Author: Brice Goglin <Brice.Goglin [at] ens-lyon>
>>       Date:   Tue Jun 16 15:32:43 2009 -0700
>>
>>           migration: only migrate_prep() once per move_pages()
>>
>> survives 16 iterations of the LTP syscall testsuite without exhibiting the
>> problem.
>
> Here is the patch in question:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7592d8e..879d034 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1570,7 +1570,7 @@ static void shrink_zone(int priority, struct zone *zone,
>         * Even if we did not try to evict anon pages at all, we want to
>         * rebalance the anon lru active/inactive ratio.
>         */
> -       if (inactive_anon_is_low(zone, sc))
> +       if (inactive_anon_is_low(zone, sc) && nr_swap_pages > 0)
>                shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
>
>        throttle_vm_writeout(sc->gfp_mask);
>
> When this was discussed, I think we missed that nr_swap_pages can
> actually get zero on swap systems as well and this should have been
> total_swap_pages - otherwise we also stop balancing the two anon lists
> when swap is _full_ which was not the intention of this change at all.

At that time we took this into account by not preventing anon list
aging in background reclaim.
Do you think that is not enough?



--
Kind regards,
Minchan Kim


hannes at cmpxchg

Jun 27, 2009, 8:36 AM

Post #5 of 65
Re: Found the commit that causes the OOMs

On Sat, Jun 27, 2009 at 10:50:25PM +0900, Minchan Kim wrote:
> Hi, Hannes.
>
> On Sat, Jun 27, 2009 at 9:54 PM, Johannes Weiner<hannes [at] cmpxchg> wrote:
> > On Sat, Jun 27, 2009 at 08:12:49AM +0100, David Howells wrote:
> >>
> >> I've managed to bisect things to find the commit that causes the OOMs.  It's:
> >>
> >>       commit 69c854817566db82c362797b4a6521d0b00fe1d8
> >>       Author: MinChan Kim <minchan.kim [at] gmail>
> >>       Date:   Tue Jun 16 15:32:44 2009 -0700
> >>
> >>           vmscan: prevent shrinking of active anon lru list in case of no swap space V3
> >>
> >>           shrink_zone() can deactivate active anon pages even if we don't have a
> >>           swap device.  Many embedded products don't have a swap device.  So the
> >>           deactivation of anon pages is unnecessary.
> >>
> >>           This patch prevents unnecessary deactivation of anon lru pages.  But, it
> >>           don't prevent aging of anon pages to swap out.
> >>
> >>           Signed-off-by: Minchan Kim <minchan.kim [at] gmail>
> >>           Acked-by: KOSAKI Motohiro <kosaki.motohiro [at] jp>
> >>           Cc: Johannes Weiner <hannes [at] cmpxchg>
> >>           Acked-by: Rik van Riel <riel [at] redhat>
> >>           Signed-off-by: Andrew Morton <akpm [at] linux-foundation>
> >>           Signed-off-by: Linus Torvalds <torvalds [at] linux-foundation>
> >>
> >> This exhibits the problem.  The previous commit:
> >>
> >>       commit 35282a2de4e5e4e173ab61aa9d7015886021a821
> >>       Author: Brice Goglin <Brice.Goglin [at] ens-lyon>
> >>       Date:   Tue Jun 16 15:32:43 2009 -0700
> >>
> >>           migration: only migrate_prep() once per move_pages()
> >>
> >> survives 16 iterations of the LTP syscall testsuite without exhibiting the
> >> problem.
> >
> > Here is the patch in question:
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 7592d8e..879d034 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1570,7 +1570,7 @@ static void shrink_zone(int priority, struct zone *zone,
> >         * Even if we did not try to evict anon pages at all, we want to
> >         * rebalance the anon lru active/inactive ratio.
> >         */
> > -       if (inactive_anon_is_low(zone, sc))
> > +       if (inactive_anon_is_low(zone, sc) && nr_swap_pages > 0)
> >                shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
> >
> >        throttle_vm_writeout(sc->gfp_mask);
> >
> > When this was discussed, I think we missed that nr_swap_pages can
> > actually get zero on swap systems as well and this should have been
> > total_swap_pages - otherwise we also stop balancing the two anon lists
> > when swap is _full_ which was not the intention of this change at all.
>
> At that time we considered it so that we didn't prevent anon list
> aging for background reclaim.
> Do you think it is not enough ?

With a heavy multiprocess anon load, direct reclaimers will likely
reuse the reclaimed pages for anon mappings, so you have a handful of
processes shuffling pages on the active list and only one thread that
tries to balance. I can imagine that it can not keep up for long.


kosaki.motohiro at jp

Jun 27, 2009, 8:52 AM

Post #6 of 65
Re: Found the commit that causes the OOMs

> Here is the patch in question:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7592d8e..879d034 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1570,7 +1570,7 @@ static void shrink_zone(int priority, struct zone *zone,
>         * Even if we did not try to evict anon pages at all, we want to
>         * rebalance the anon lru active/inactive ratio.
>         */
> -       if (inactive_anon_is_low(zone, sc))
> +       if (inactive_anon_is_low(zone, sc) && nr_swap_pages > 0)
>                shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
>
>        throttle_vm_writeout(sc->gfp_mask);
>
> When this was discussed, I think we missed that nr_swap_pages can
> actually get zero on swap systems as well and this should have been
> total_swap_pages - otherwise we also stop balancing the two anon lists
> when swap is _full_ which was not the intention of this change at all.
>
> [. There is another one hiding in shrink_zone() that does the same - it
> was moved from get_scan_ratio() and is pretty old but we still kept
> the inactive/active ratio halfway sane without MinChan's patch. ]
>
> This is from your OOM-run dmesg, David:
>
>  Adding 32k swap on swapfile22.  Priority:-21 extents:1 across:32k
>  Adding 32k swap on swapfile23.  Priority:-22 extents:1 across:32k
>  Adding 32k swap on swapfile24.  Priority:-23 extents:3 across:44k
>  Adding 32k swap on swapfile25.  Priority:-24 extents:1 across:32k
>
> So we actually have swap?  Or are those removed again before the OOM?

[grepping the LTP source]

ltp/testcases/kernel/syscalls/swapon/swapon03.c creates a lot of swap areas,
but they are removed when the test exits.

So when the OOM happened, David's system did not have any swap. Unfortunately,
I don't think your patch hits the target.


dhowells at redhat

Jun 27, 2009, 11:35 AM

Post #7 of 65
Re: Found the commit that causes the OOMs

Johannes Weiner <hannes [at] cmpxchg> wrote:

> This is from your OOM-run dmesg, David:
>
> Adding 32k swap on swapfile22. Priority:-21 extents:1 across:32k
> Adding 32k swap on swapfile23. Priority:-22 extents:1 across:32k
> Adding 32k swap on swapfile24. Priority:-23 extents:3 across:44k
> Adding 32k swap on swapfile25. Priority:-24 extents:1 across:32k
>
> So we actually have swap? Or are those removed again before the OOM?

That's merely a transient situation caused by the LTP swapfile tests.
Ordinarily, my test machine does not have swap. At the time the OOMs occur
there is no swapspace and the msgctl9 or msgctl11 tests are usually being run.
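
Whether swap is configured at a given moment can be read from the SwapTotal and SwapFree fields of /proc/meminfo. The following is a minimal sketch of extracting such a field from meminfo-style text; the helper name meminfo_kb() is an assumption made for illustration and is not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB for `key` (e.g. "SwapTotal:") in a
 * /proc/meminfo-style buffer, or -1 if the key is missing or malformed.
 * The key is matched only at the start of a line.
 */
static long meminfo_kb(const char *buf, const char *key)
{
	size_t klen;
	const char *line = buf;

	assert(buf != NULL && key != NULL);
	klen = strlen(key);

	while (line && *line) {
		if (strncmp(line, key, klen) == 0) {
			long val;

			if (sscanf(line + klen, "%ld", &val) == 1)
				return val;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return -1;
}
```

Given text containing the line "SwapTotal: 0 kB", meminfo_kb(text, "SwapTotal:") returns 0, which indicates that no swap is configured at the time of the reading.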

> The following patch should improve on that.

I can give it a spin when I get home later.

David


dhowells at redhat

Jun 27, 2009, 11:58 AM

Post #8 of 65
Re: Found the commit that causes the OOMs

Minchan Kim <minchan.kim [at] gmail> wrote:

> Unfortunately, I haven't been following your problem.
> I guess you hit the OOM problem with no swap device, right?

That's correct. There seems to be a little bit of confusion stemming from my
report on the OOM. LTP briefly adds swap devices - which is what's appearing
in the log.

> Could you show me your oops and show_mem information, please?

There wasn't an oops per se, only a couple of OOMs, and then the systems
mostly hung (it was still accessible over the serial link to do SysRq things),
but the network was dead, and the VT logins were unusable.

I put information on the OOM in my initial report (which I'll attach here).
If you want more information I can get it for you when I get back home.

David

> Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley
> Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United
> Kingdom.
> Registered in England and Wales under Company Registration No. 3798903
> From: David Howells <dhowells [at] redhat>
> To: Wu Fengguang <fengguang.wu [at] intel>
> Cc: dhowells [at] redhat, Andrew Morton <akpm [at] linux-foundation>,
> LKML <linux-kernel [at] vger>,
> Christoph Lameter <cl [at] linux-foundation>,
> KOSAKI Motohiro <kosaki.motohiro [at] jp>,
> "hannes [at] cmpxchg" <hannes [at] cmpxchg>,
> "peterz [at] infradead" <peterz [at] infradead>,
> "riel [at] redhat" <riel [at] redhat>, "tytso [at] mit" <tytso [at] mit>,
> "linux-mm [at] kvack" <linux-mm [at] kvack>,
> "elladan [at] eskimo" <elladan [at] eskimo>,
> "npiggin [at] suse" <npiggin [at] suse>,
> "minchan.kim [at] gmail" <minchan.kim [at] gmail>
> Subject: Re: [PATCH 0/3] make mapped executable pages the first class citizen
> Date: Thu, 18 Jun 2009 15:46:52 +0100
> Sender: dhowells [at] redhat
>
>
> Hmmm.... It's possible that this makes my test box implode horribly when
> running LTP.
>
> I'm going to bisect it to see if this is actually due to your patches.
>
> Note that I don't have any swap space. This after a fresh reboot:
>
> [root [at] andromed ~]# cat /proc/meminfo
> MemTotal: 1000624 kB
> MemFree: 797328 kB
> Buffers: 13272 kB
> Cached: 121744 kB
> SwapCached: 0 kB
> Active: 36240 kB
> Inactive: 115856 kB
> Active(anon): 17448 kB
> Inactive(anon): 0 kB
> Active(file): 18792 kB
> Inactive(file): 115856 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> SwapTotal: 0 kB
> SwapFree: 0 kB
> Dirty: 28 kB
> Writeback: 0 kB
> AnonPages: 17280 kB
> Mapped: 5376 kB
> Slab: 42984 kB
> SReclaimable: 6956 kB
> SUnreclaim: 36028 kB
> PageTables: 1304 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 500312 kB
> Committed_AS: 52596 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 190044 kB
> VmallocChunk: 34359546363 kB
> DirectMap4k: 13312 kB
> DirectMap2M: 1009664 kB
>
> David
> ---
> Initializing cgroup subsys cpuset
> Linux version 2.6.30-cachefs (dhowells [at] warthog) (gcc version 4.4.0 20090506 (Red Hat 4.4.0-4) (GCC) ) #106 SMP Wed Jun 17 22:10:31 BST 2009
> Command line: initrd=andromeda-initrd console=tty0 console=ttyS0,115200 ro root=/dev/sda2 enforcing=1 debug BOOT_IMAGE=andromeda-vmlinuz
> KERNEL supported cpus:
> Intel GenuineIntel
> AMD AuthenticAMD
> Centaur CentaurHauls
> BIOS-provided physical RAM map:
> BIOS-e820: 0000000000000000 - 000000000009ec00 (usable)
> BIOS-e820: 000000000009ec00 - 00000000000a0000 (reserved)
> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> BIOS-e820: 0000000000100000 - 000000003e59a000 (usable)
> BIOS-e820: 000000003e59a000 - 000000003e5a6000 (reserved)
> BIOS-e820: 000000003e5a6000 - 000000003e644000 (usable)
> BIOS-e820: 000000003e644000 - 000000003e6a9000 (ACPI NVS)
> BIOS-e820: 000000003e6a9000 - 000000003e6ac000 (ACPI data)
> BIOS-e820: 000000003e6ac000 - 000000003e6f2000 (ACPI NVS)
> BIOS-e820: 000000003e6f2000 - 000000003e6ff000 (ACPI data)
> BIOS-e820: 000000003e6ff000 - 000000003e700000 (usable)
> BIOS-e820: 000000003e700000 - 000000003f000000 (reserved)
> BIOS-e820: 00000000fff00000 - 0000000100000000 (reserved)
> DMI 2.4 present.
> last_pfn = 0x3e700 max_arch_pfn = 0x400000000
> MTRR default type: uncachable
> MTRR fixed ranges enabled:
> 00000-9FFFF write-back
> A0000-FFFFF uncachable
> MTRR variable ranges enabled:
> 0 base 000000000 mask FC0000000 write-back
> 1 base 03F000000 mask FFF000000 uncachable
> 2 base 03E800000 mask FFF800000 uncachable
> 3 base 03E700000 mask FFFF00000 uncachable
> 4 disabled
> 5 disabled
> 6 disabled
> 7 disabled
> x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
> initial memory mapped : 0 - 20000000
> init_memory_mapping: 0000000000000000-000000003e700000
> 0000000000 - 003e600000 page 2M
> 003e600000 - 003e700000 page 4k
> kernel direct mapping tables up to 3e700000 @ 8000-b000
> RAMDISK: 3e2ee000 - 3e57991c
> ACPI: RSDP 00000000000fe020 00014 (v00 INTEL )
> ACPI: RSDT 000000003e6fd038 0004C (v01 INTEL DG965RY 00000330 01000013)
> ACPI: FACP 000000003e6fc000 00074 (v01 INTEL DG965RY 00000330 MSFT 01000013)
> ACPI: DSDT 000000003e6f8000 03EDA (v01 INTEL DG965RY 00000330 MSFT 01000013)
> ACPI: FACS 000000003e6ac000 00040
> ACPI: APIC 000000003e6f7000 00078 (v01 INTEL DG965RY 00000330 MSFT 01000013)
> ACPI: WDDT 000000003e6f6000 00040 (v01 INTEL DG965RY 00000330 MSFT 01000013)
> ACPI: MCFG 000000003e6f5000 0003C (v01 INTEL DG965RY 00000330 MSFT 01000013)
> ACPI: ASF! 000000003e6f4000 000A6 (v32 INTEL DG965RY 00000330 MSFT 01000013)
> ACPI: SSDT 000000003e6f3000 001BC (v01 INTEL CpuPm 00000330 MSFT 01000013)
> ACPI: SSDT 000000003e6f2000 00175 (v01 INTEL Cpu0Ist 00000330 MSFT 01000013)
> ACPI: SSDT 000000003e6ab000 00175 (v01 INTEL Cpu1Ist 00000330 MSFT 01000013)
> ACPI: SSDT 000000003e6aa000 00175 (v01 INTEL Cpu2Ist 00000330 MSFT 01000013)
> ACPI: SSDT 000000003e6a9000 00175 (v01 INTEL Cpu3Ist 00000330 MSFT 01000013)
> ACPI: Local APIC address 0xfee00000
> (7 early reservations) ==> bootmem [0000000000 - 003e700000]
> #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
> #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
> #2 [0001000000 - 0001535d90] TEXT DATA BSS ==> [0001000000 - 0001535d90]
> #3 [003e2ee000 - 003e57991c] RAMDISK ==> [003e2ee000 - 003e57991c]
> #4 [000009e800 - 0000100000] BIOS reserved ==> [000009e800 - 0000100000]
> #5 [0001536000 - 0001536199] BRK ==> [0001536000 - 0001536199]
> #6 [0000008000 - 0000009000] PGTABLE ==> [0000008000 - 0000009000]
> found SMP MP-table at [ffff8800000fe200] fe200
> [ffffea0000000000-ffffea0000dfffff] PMD -> [ffff880001a00000-ffff8800027fffff] on node 0
> Zone PFN ranges:
> DMA 0x00000000 -> 0x00001000
> DMA32 0x00001000 -> 0x00100000
> Normal 0x00100000 -> 0x00100000
> Movable zone start PFN for each node
> early_node_map[4] active PFN ranges
> 0: 0x00000000 -> 0x0000009e
> 0: 0x00000100 -> 0x0003e59a
> 0: 0x0003e5a6 -> 0x0003e644
> 0: 0x0003e6ff -> 0x0003e700
> On node 0 totalpages: 255447
> DMA zone: 56 pages used for memmap
> DMA zone: 101 pages reserved
> DMA zone: 3841 pages, LIFO batch:0
> DMA32 zone: 3441 pages used for memmap
> DMA32 zone: 248008 pages, LIFO batch:31
> ACPI: PM-Timer IO Port: 0x408
> ACPI: Local APIC address 0xfee00000
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
> ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
> ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
> ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
> ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
> ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
> IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> ACPI: IRQ0 used by override.
> ACPI: IRQ2 used by override.
> ACPI: IRQ9 used by override.
> Using ACPI (MADT) for SMP configuration information
> 4 Processors exceeds NR_CPUS limit of 2
> SMP: Allowing 2 CPUs, 0 hotplug CPUs
> nr_irqs_gsi: 24
> PM: Registered nosave memory: 000000000009e000 - 000000000009f000
> PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
> PM: Registered nosave memory: 00000000000a0000 - 00000000000e0000
> PM: Registered nosave memory: 00000000000e0000 - 0000000000100000
> PM: Registered nosave memory: 000000003e59a000 - 000000003e5a6000
> PM: Registered nosave memory: 000000003e644000 - 000000003e6a9000
> PM: Registered nosave memory: 000000003e6a9000 - 000000003e6ac000
> PM: Registered nosave memory: 000000003e6ac000 - 000000003e6f2000
> PM: Registered nosave memory: 000000003e6f2000 - 000000003e6ff000
> Allocating PCI resources starting at 3f000000 (gap: 3f000000:c0f00000)
> NR_CPUS:2 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:1
> PERCPU: Embedded 24 pages at ffff880001541000, static data 67296 bytes
> Built 1 zonelists in Zone order, mobility grouping on. Total pages: 251849
> Kernel command line: initrd=andromeda-initrd console=tty0 console=ttyS0,115200 ro root=/dev/sda2 enforcing=1 debug BOOT_IMAGE=andromeda-vmlinuz
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
> Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
> Initializing CPU#0
> Checking aperture...
> No AGP bridge found
> Memory: 996952k/1022976k available (2953k kernel code, 1188k absent, 24132k reserved, 1678k data, 360k init)
> NR_IRQS:320
> Fast TSC calibration using PIT
> Detected 1864.978 MHz processor.
> Console: colour VGA+ 80x25
> console [tty0] enabled
> console [ttyS0] enabled
> Calibrating delay loop (skipped), value calculated using timer frequency.. 3729.95 BogoMIPS (lpj=7459912)
> Security Framework initialized
> SELinux: Initializing.
> SELinux: Starting in enforcing mode
> Mount-cache hash table entries: 256
> Initializing cgroup subsys debug
> Initializing cgroup subsys ns
> Initializing cgroup subsys devices
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 2048K
> CPU: Physical Processor ID: 0
> CPU: Processor Core ID: 0
> mce: CPU supports 6 MCE banks
> CPU0: Thermal monitoring enabled (TM2)
> using mwait in idle threads.
> ACPI: Core revision 20090521
> Setting APIC routing to flat
> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> CPU0: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz stepping 06
> Booting processor 1 APIC 0x1 ip 0x6000
> Initializing CPU#1
> Calibrating delay using timer specific routine.. 3525.06 BogoMIPS (lpj=7050122)
> CPU: L1 I cache: 32K, L1 D cache: 32K
> CPU: L2 cache: 2048K
> CPU: Physical Processor ID: 0
> CPU: Processor Core ID: 1
> mce: CPU supports 6 MCE banks
> CPU1: Thermal monitoring enabled (TM2)
> x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
> CPU1: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz stepping 06
> checking TSC synchronization [CPU#0 -> CPU#1]: passed.
> Brought up 2 CPUs
> Total of 2 processors activated (7255.01 BogoMIPS).
> NET: Registered protocol family 16
> ACPI: bus type pci registered
> PCI: MCFG configuration 0: base f0000000 segment 0 buses 0 - 127
> PCI: Not using MMCONFIG.
> PCI: Using configuration type 1 for base access
> bio: create slab <bio-0> at 0
> ACPI: EC: Look up EC in DSDT
> ACPI: Interpreter enabled
> ACPI: (supports S0 S3 S4 S5)
> ACPI: Using IOAPIC for interrupt routing
> PCI: MCFG configuration 0: base f0000000 segment 0 buses 0 - 127
> PCI: MCFG area at f0000000 reserved in ACPI motherboard resources
> PCI: Using MMCONFIG at f0000000 - f7ffffff
> ACPI: No dock devices found.
> ACPI: PCI Root Bridge [PCI0] (0000:00)
> pci 0000:00:02.0: reg 10 32bit mmio: [0x50200000-0x502fffff]
> pci 0000:00:02.0: reg 18 64bit mmio: [0x40000000-0x4fffffff]
> pci 0000:00:02.0: reg 20 io port: [0x2110-0x2117]
> pci 0000:00:03.0: reg 10 64bit mmio: [0x50326100-0x5032610f]
> pci 0000:00:03.0: PME# supported from D0 D3hot D3cold
> pci 0000:00:03.0: PME# disabled
> pci 0000:00:19.0: reg 10 32bit mmio: [0x50300000-0x5031ffff]
> pci 0000:00:19.0: reg 14 32bit mmio: [0x50324000-0x50324fff]
> pci 0000:00:19.0: reg 18 io port: [0x20e0-0x20ff]
> pci 0000:00:19.0: PME# supported from D0 D3hot D3cold
> pci 0000:00:19.0: PME# disabled
> pci 0000:00:1a.0: reg 20 io port: [0x20c0-0x20df]
> pci 0000:00:1a.1: reg 20 io port: [0x20a0-0x20bf]
> pci 0000:00:1a.7: reg 10 32bit mmio: [0x50325c00-0x50325fff]
> pci 0000:00:1a.7: PME# supported from D0 D3hot D3cold
> pci 0000:00:1a.7: PME# disabled
> pci 0000:00:1b.0: reg 10 64bit mmio: [0x50320000-0x50323fff]
> pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
> pci 0000:00:1b.0: PME# disabled
> pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
> pci 0000:00:1c.0: PME# disabled
> pci 0000:00:1c.1: PME# supported from D0 D3hot D3cold
> pci 0000:00:1c.1: PME# disabled
> pci 0000:00:1c.2: PME# supported from D0 D3hot D3cold
> pci 0000:00:1c.2: PME# disabled
> pci 0000:00:1c.3: PME# supported from D0 D3hot D3cold
> pci 0000:00:1c.3: PME# disabled
> pci 0000:00:1c.4: PME# supported from D0 D3hot D3cold
> pci 0000:00:1c.4: PME# disabled
> pci 0000:00:1d.0: reg 20 io port: [0x2080-0x209f]
> pci 0000:00:1d.1: reg 20 io port: [0x2060-0x207f]
> pci 0000:00:1d.2: reg 20 io port: [0x2040-0x205f]
> pci 0000:00:1d.7: reg 10 32bit mmio: [0x50325800-0x50325bff]
> pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold
> pci 0000:00:1d.7: PME# disabled
> pci 0000:00:1f.0: quirk: region 0400-047f claimed by ICH6 ACPI/GPIO/TCO
> pci 0000:00:1f.0: quirk: region 0500-053f claimed by ICH6 GPIO
> pci 0000:00:1f.0: ICH7 LPC Generic IO decode 1 PIO at 0680 (mask 007f)
> pci 0000:00:1f.2: reg 10 io port: [0x2108-0x210f]
> pci 0000:00:1f.2: reg 14 io port: [0x211c-0x211f]
> pci 0000:00:1f.2: reg 18 io port: [0x2100-0x2107]
> pci 0000:00:1f.2: reg 1c io port: [0x2118-0x211b]
> pci 0000:00:1f.2: reg 20 io port: [0x2020-0x203f]
> pci 0000:00:1f.2: reg 24 32bit mmio: [0x50325000-0x503257ff]
> pci 0000:00:1f.2: PME# supported from D3hot
> pci 0000:00:1f.2: PME# disabled
> pci 0000:00:1f.3: reg 10 32bit mmio: [0x50326000-0x503260ff]
> pci 0000:00:1f.3: reg 20 io port: [0x2000-0x201f]
> pci 0000:00:1c.0: bridge 32bit mmio: [0x50400000-0x504fffff]
> pci 0000:02:00.0: reg 10 io port: [0x1018-0x101f]
> pci 0000:02:00.0: reg 14 io port: [0x1024-0x1027]
> pci 0000:02:00.0: reg 18 io port: [0x1010-0x1017]
> pci 0000:02:00.0: reg 1c io port: [0x1020-0x1023]
> pci 0000:02:00.0: reg 20 io port: [0x1000-0x100f]
> pci 0000:02:00.0: reg 24 32bit mmio: [0x50100000-0x501001ff]
> pci 0000:02:00.0: supports D1
> pci 0000:02:00.0: PME# supported from D0 D1 D3hot
> pci 0000:02:00.0: PME# disabled
> pci 0000:00:1c.1: bridge io port: [0x1000-0x1fff]
> pci 0000:00:1c.1: bridge 32bit mmio: [0x50100000-0x501fffff]
> pci 0000:00:1c.2: bridge 32bit mmio: [0x50500000-0x505fffff]
> pci 0000:00:1c.3: bridge 32bit mmio: [0x50600000-0x506fffff]
> pci 0000:00:1c.4: bridge 32bit mmio: [0x50700000-0x507fffff]
> pci 0000:06:03.0: reg 10 32bit mmio: [0x50004000-0x500047ff]
> pci 0000:06:03.0: reg 14 32bit mmio: [0x50000000-0x50003fff]
> pci 0000:06:03.0: supports D1 D2
> pci 0000:06:03.0: PME# supported from D0 D1 D2 D3hot
> pci 0000:06:03.0: PME# disabled
> pci 0000:00:1e.0: transparent bridge
> pci 0000:00:1e.0: bridge 32bit mmio: [0x50000000-0x500fffff]
> pci_bus 0000:00: on NUMA node 0
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P32_._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX0._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX1._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX2._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX3._PRT]
> ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX4._PRT]
> ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 7 9 10 *11 12)
> ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 7 9 *10 11 12)
> ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 7 9 10 *11 12)
> ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 7 9 10 *11 12)
> ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 7 *9 10 11 12)
> ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 7 9 *10 11 12)
> ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 7 *9 10 11 12)
> ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 7 9 10 *11 12)
> SCSI subsystem initialized
> libata version 3.00 loaded.
> PCI: Using ACPI for IRQ routing
> NetLabel: Initializing
> NetLabel: domain hash size = 128
> NetLabel: protocols = UNLABELED CIPSOv4
> NetLabel: unlabeled traffic allowed by default
> pnp: PnP ACPI init
> ACPI: bus type pnp registered
> pnp: PnP ACPI: found 12 devices
> ACPI: ACPI bus type pnp unregistered
> system 00:01: iomem range 0xf0000000-0xf7ffffff has been reserved
> system 00:01: iomem range 0xfed13000-0xfed13fff has been reserved
> system 00:01: iomem range 0xfed14000-0xfed17fff has been reserved
> system 00:01: iomem range 0xfed18000-0xfed18fff has been reserved
> system 00:01: iomem range 0xfed19000-0xfed19fff has been reserved
> system 00:01: iomem range 0xfed1c000-0xfed1ffff has been reserved
> system 00:01: iomem range 0xfed20000-0xfed3ffff has been reserved
> system 00:01: iomem range 0xfed45000-0xfed99fff has been reserved
> system 00:01: iomem range 0xc0000-0xdffff has been reserved
> system 00:01: iomem range 0xe0000-0xfffff could not be reserved
> system 00:06: ioport range 0x500-0x53f has been reserved
> system 00:06: ioport range 0x400-0x47f has been reserved
> system 00:06: ioport range 0x680-0x6ff has been reserved
> pci 0000:00:1c.0: PCI bridge, secondary bus 0000:01
> pci 0000:00:1c.0: IO window: disabled
> pci 0000:00:1c.0: MEM window: 0x50400000-0x504fffff
> pci 0000:00:1c.0: PREFETCH window: disabled
> pci 0000:00:1c.1: PCI bridge, secondary bus 0000:02
> pci 0000:00:1c.1: IO window: 0x1000-0x1fff
> pci 0000:00:1c.1: MEM window: 0x50100000-0x501fffff
> pci 0000:00:1c.1: PREFETCH window: disabled
> pci 0000:00:1c.2: PCI bridge, secondary bus 0000:03
> pci 0000:00:1c.2: IO window: disabled
> pci 0000:00:1c.2: MEM window: 0x50500000-0x505fffff
> pci 0000:00:1c.2: PREFETCH window: disabled
> pci 0000:00:1c.3: PCI bridge, secondary bus 0000:04
> pci 0000:00:1c.3: IO window: disabled
> pci 0000:00:1c.3: MEM window: 0x50600000-0x506fffff
> pci 0000:00:1c.3: PREFETCH window: disabled
> pci 0000:00:1c.4: PCI bridge, secondary bus 0000:05
> pci 0000:00:1c.4: IO window: disabled
> pci 0000:00:1c.4: MEM window: 0x50700000-0x507fffff
> pci 0000:00:1c.4: PREFETCH window: disabled
> pci 0000:00:1e.0: PCI bridge, secondary bus 0000:06
> pci 0000:00:1e.0: IO window: disabled
> pci 0000:00:1e.0: MEM window: 0x50000000-0x500fffff
> pci 0000:00:1e.0: PREFETCH window: disabled
> pci 0000:00:1c.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> pci 0000:00:1c.0: setting latency timer to 64
> pci 0000:00:1c.1: PCI INT B -> GSI 16 (level, low) -> IRQ 16
> pci 0000:00:1c.1: setting latency timer to 64
> pci 0000:00:1c.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
> pci 0000:00:1c.2: setting latency timer to 64
> pci 0000:00:1c.3: PCI INT D -> GSI 19 (level, low) -> IRQ 19
> pci 0000:00:1c.3: setting latency timer to 64
> pci 0000:00:1c.4: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> pci 0000:00:1c.4: setting latency timer to 64
> pci 0000:00:1e.0: setting latency timer to 64
> pci_bus 0000:00: resource 0 io: [0x00-0xffff]
> pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
> pci_bus 0000:01: resource 1 mem: [0x50400000-0x504fffff]
> pci_bus 0000:02: resource 0 io: [0x1000-0x1fff]
> pci_bus 0000:02: resource 1 mem: [0x50100000-0x501fffff]
> pci_bus 0000:03: resource 1 mem: [0x50500000-0x505fffff]
> pci_bus 0000:04: resource 1 mem: [0x50600000-0x506fffff]
> pci_bus 0000:05: resource 1 mem: [0x50700000-0x507fffff]
> pci_bus 0000:06: resource 1 mem: [0x50000000-0x500fffff]
> pci_bus 0000:06: resource 3 io: [0x00-0xffff]
> pci_bus 0000:06: resource 4 mem: [0x000000-0xffffffffffffffff]
> NET: Registered protocol family 2
> IP route cache hash table entries: 32768 (order: 6, 262144 bytes)
> TCP established hash table entries: 131072 (order: 9, 2097152 bytes)
> TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
> TCP: Hash tables configured (established 131072 bind 65536)
> TCP reno registered
> NET: Registered protocol family 1
> Unpacking initramfs...
> Freeing initrd memory: 2606k freed
> audit: initializing netlink socket (disabled)
> type=2000 audit(1245320564.157:1): initialized
> VFS: Disk quotas dquot_6.5.2
> Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
> SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
> msgmni has been set to 1953
> SELinux: Registering netfilter hooks
> alg: No test for fcrypt (fcrypt-generic)
> alg: No test for stdrng (krng)
> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
> io scheduler noop registered
> io scheduler anticipatory registered (default)
> io scheduler deadline registered
> io scheduler cfq registered
> pci 0000:00:02.0: Boot video device
> pcieport-driver 0000:00:1c.0: irq 24 for MSI/MSI-X
> pcieport-driver 0000:00:1c.0: setting latency timer to 64
> pcieport-driver 0000:00:1c.1: irq 25 for MSI/MSI-X
> pcieport-driver 0000:00:1c.1: setting latency timer to 64
> pcieport-driver 0000:00:1c.2: irq 26 for MSI/MSI-X
> pcieport-driver 0000:00:1c.2: setting latency timer to 64
> pcieport-driver 0000:00:1c.3: irq 27 for MSI/MSI-X
> pcieport-driver 0000:00:1c.3: setting latency timer to 64
> pcieport-driver 0000:00:1c.4: irq 28 for MSI/MSI-X
> pcieport-driver 0000:00:1c.4: setting latency timer to 64
> input: Power Button as /class/input/input0
> ACPI: Power Button [PWRF]
> input: Sleep Button as /class/input/input1
> ACPI: Sleep Button [SLPB]
> processor ACPI_CPU:00: registered as cooling_device0
> ACPI: Processor [CPU0] (supports 8 throttling states)
> processor ACPI_CPU:01: registered as cooling_device1
> ACPI: Processor [CPU1] (supports 8 throttling states)
> Linux agpgart interface v0.103
> agpgart-intel 0000:00:00.0: Intel 965G Chipset
> agpgart-intel 0000:00:00.0: detected 7676K stolen memory
> agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0x40000000
> intelfb: Framebuffer driver for Intel(R) 830M/845G/852GM/855GM/865G/915G/915GM/945G/945GM/945GME/965G/965GM chipsets
> intelfb: Version 0.9.6
> intelfb 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> intelfb: 00:02.0: Intel(R) 965G, aperture size 256MB, stolen memory 7932kB
> intelfb: Initial video mode is 1024x768-32@70
> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> Platform driver 'serial8250' needs updating - please use dev_pm_ops
> 00:0a: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> loop: module loaded
> Driver 'sd' needs updating - please use bus_type methods
> ahci 0000:00:1f.2: version 3.0
> ahci 0000:00:1f.2: PCI INT A -> GSI 19 (level, low) -> IRQ 19
> ahci 0000:00:1f.2: irq 29 for MSI/MSI-X
> ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0x33 impl SATA mode
> ahci 0000:00:1f.2: flags: 64bit ncq sntf led clo pio slum part ems
> ahci 0000:00:1f.2: setting latency timer to 64
> scsi0 : ahci
> scsi1 : ahci
> scsi2 : ahci
> scsi3 : ahci
> scsi4 : ahci
> scsi5 : ahci
> ata1: SATA max UDMA/133 abar m2048@0x5032500 port 0x50325100 irq 29
> ata2: SATA max UDMA/133 abar m2048@0x5032500 port 0x50325180 irq 29
> ata3: DUMMY
> ata4: DUMMY
> ata5: SATA max UDMA/133 abar m2048@0x5032500 port 0x50325300 irq 29
> ata6: SATA max UDMA/133 abar m2048@0x5032500 port 0x50325380 irq 29
> e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
> e1000e: Copyright (c) 1999-2008 Intel Corporation.
> e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
> e1000e 0000:00:19.0: setting latency timer to 64
> e1000e 0000:00:19.0: irq 30 for MSI/MSI-X
> 0000:00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) 00:16:76:ce:3a:3c
> 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
> 0000:00:19.0: eth0: MAC: 6, PHY: 6, PBA No: ffffff-0ff
> PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
> Platform driver 'i8042' needs updating - please use dev_pm_ops
> serio: i8042 KBD port at 0x60,0x64 irq 1
> serio: i8042 AUX port at 0x60,0x64 irq 12
> mice: PS/2 mouse device common for all mice
> rtc_cmos 00:03: RTC can wake from S4
> rtc_cmos 00:03: rtc core: registered rtc_cmos as rtc0
> rtc0: alarms up to one month, 114 bytes nvram
> i2c /dev entries driver
> i801_smbus 0000:00:1f.3: PCI INT B -> GSI 21 (level, low) -> IRQ 21
> coretemp coretemp.0: Using relative temperature scale!
> coretemp coretemp.1: Using relative temperature scale!
> cpuidle: using governor ladder
> ip_tables: (C) 2000-2006 Netfilter Core Team
> TCP cubic registered
> input: AT Translated Set 2 keyboard as /class/input/input2
> NET: Registered protocol family 17
> ata2: SATA link down (SStatus 0 SControl 300)
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> registered taskstats version 1
> ata6: SATA link down (SStatus 0 SControl 300)
> ata5: SATA link down (SStatus 0 SControl 300)
> rtc_cmos 00:03: setting system clock to 2009-06-18 10:22:46 UTC (1245320566)
> ata1.00: ATA-7: ST380211AS, 3.AAE, max UDMA/133
> ata1.00: 156301488 sectors, multi 0: LBA48 NCQ (depth 31/32)
> ata1.00: configured for UDMA/133
> scsi 0:0:0:0: Direct-Access ATA ST380211AS 3.AA PQ: 0 ANSI: 5
> sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors: (80.0 GB/74.5 GiB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 >
> sd 0:0:0:0: [sda] Attached SCSI disk
> Freeing unused kernel memory: 360k freed
> Write protecting the kernel read-only data: 4324k
> Red Hat nash version 6.0.52 starting
> Mounting proc filesystem
> Mounting sysfs filesystem
> Creating /dev
> Creating initial device nodes
> Setting up hotplug.
> input: ImPS/2 Generic Wheel Mouse as /class/input/input3
> Creating block device nodes.
> mount: could not find filesystem '/proc/bus/usb'
> Waiting for driver initialization.
> Waiting for driver initialization.
> Creating root device.
> Mounting root filesystem.
> EXT3-fs: INFO: recovery required on readonly filesystem.
> EXT3-fs: write access will be enabled during recovery.
> kjournald starting. Commit interval 5 seconds
> Setting up otherEXT3-fs: recovery complete.
> filesystems.
> EXT3-fs: mounted filesystem with writeback data mode.
> Setting up new root fs
> no fstab.sys, mounting internal defaults
> SELinux: 8192 avtab hash slots, 177803 rules.
> SELinux: 8192 avtab hash slots, 177803 rules.
> SELinux: 6 users, 12 roles, 2431 types, 118 bools, 1 sens, 1024 cats
> SELinux: 73 classes, 177803 rules
> SELinux: class kernel_service not defined in policy
> SELinux: permission open in class sock_file not defined in policy
> SELinux: permission nlmsg_tty_audit in class netlink_audit_socket not defined in policy
> SELinux: the above unknown classes and permissions will be allowed
> SELinux: Completing initialization.
> SELinux: Setting up existing superblocks.
> SELinux: initialized (dev sda2, type ext3), uses xattr
> SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
> SELinux: initialized (dev selinuxfs, type selinuxfs), uses genfs_contexts
> SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
> SELinux: initialized (dev devpts, type devpts), uses transition SIDs
> SELinux: initialized (dev inotifyfs, type inotifyfs), uses genfs_contexts
> SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
> SELinux: initialized (dev anon_inodefs, type anon_inodefs), uses genfs_contexts
> SELinux: initialized (dev pipefs, type pipefs), uses task SIDs
> SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts
> SELinux: initialized (dev sockfs, type sockfs), uses task SIDs
> SELinux: initialized (dev proc, type proc), uses genfs_contexts
> SELinux: initialized (dev bdev, type bdev), uses genfs_contexts
> SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts
> SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
> type=1403 audit(1245320574.561:2): policy loaded auid=4294967295 ses=4294967295
> Switching to new root and running init.
> unmounting old /dev
> unmounting old /proc
> unmounting old /sys
> Welcome to Fedora
> Press 'I' to enter interactive startup.
> Starting udev: [ OK ]
> Setting hostname andromeda.procyon.org.uk: [ OK ]
> Checking filesystems
> Checking all file systems.
> [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/sda2
> /1: clean, 330515/2621440 files, 1528849/2620603 blocks
> [/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a /dev/sda1
> /boot1: recovering journal
> /boot1: clean, 79/50200 files, 72187/200780 blocks
> [ OK ]
> Remounting root filesystem in read-write mode: [ OK ]
> Mounting local filesystems: [ OK ]
> Enabling local filesystem quotas: [ OK ]
> Enabling /etc/fstab swaps: [ OK ]
> Entering non-interactive startup
> Starting background readahead (early, fast mode): [ OK ]
> FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
> Bringing up loopback interface: [ OK ]
> Bringing up interface eth0:
> Determining IP information for eth0... done.
> [ OK ]
> FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
> Starting restorecond: [ OK ]
> Starting auditd: [ OK ]
> Starting irqbalance: [ OK ]
> Starting mcstransd: [ OK ]
> Starting rpcbind: modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> rpcbind: cannot create socket for udp6
> modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> rpcbind: cannot create socket for tcp6
> [ OK ]
> modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> Starting NFS statd: [ OK ]
> Starting system message bus: [ OK ]
> Starting lm_sensors: not configured, run sensors-detect[WARNING]
> Starting sshd: modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> [ OK ]
> modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> Starting ntpd: [ OK ]
> modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> SysRq : Changing Loglevel
> Loglevel set to 8
> Now booted
> Starting smartd: modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> [ OK ]
>
> Fedora release 9 (Sulphur)
> Kernel 2.6.30-cachefs on an x86_64 (/dev/ttyS0)
>
> andromeda.procyon.org.uk login: modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> warning: `capget01' uses 32-bit capabilities (legacy support in use)
> modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> modprobe: FATAL: Could not load /lib/modules/2.6.30-cachefs/modules.dep: No such file or directory
>
> msgctl11 invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0
> msgctl11 cpuset=/ mems_allowed=0
> Pid: 30549, comm: msgctl11 Not tainted 2.6.30-cachefs #106
> Call Trace:
> [<ffffffff81071dae>] ? oom_kill_process.clone.0+0xa9/0x245
> [<ffffffff81072075>] ? __out_of_memory+0x12b/0x142
> [<ffffffff810720f6>] ? out_of_memory+0x6a/0x94
> [<ffffffff8107479e>] ? __alloc_pages_nodemask+0x422/0x50b
> [<ffffffff81031110>] ? copy_process+0x93/0x113f
> [<ffffffff810748f1>] ? __get_free_pages+0x12/0x50
> [<ffffffff81031130>] ? copy_process+0xb3/0x113f
> [<ffffffff81081ae2>] ? handle_mm_fault+0x2d5/0x645
> [<ffffffff810322fb>] ? do_fork+0x13f/0x2ba
> [<ffffffff81022a0b>] ? do_page_fault+0x1f1/0x206
> [<ffffffff8100b0d3>] ? stub_clone+0x13/0x20
> [<ffffffff8100ad6b>] ? system_call_fastpath+0x16/0x1b
> Mem-Info:
> DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> CPU 1: hi: 0, btch: 1 usd: 0
> DMA32 per-cpu:
> CPU 0: hi: 186, btch: 31 usd: 0
> CPU 1: hi: 186, btch: 31 usd: 47
> Active_anon:80388 active_file:0 inactive_anon:822
> inactive_file:2 unevictable:0 dirty:0 writeback:0 unstable:0
> free:2053 slab:38793 mapped:357 pagetables:60476 bounce:0
> DMA free:3916kB min:60kB low:72kB high:88kB active_anon:3608kB inactive_anon:128kB active_file:0kB inactive_file:0kB unevictable:0kB present:15364kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 968 968 968
> DMA32 free:4296kB min:3948kB low:4932kB high:5920kB active_anon:317944kB inactive_anon:3160kB active_file:0kB inactive_file:8kB unevictable:0kB present:992032kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> DMA: 1*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3916kB
> DMA32: 576*4kB 15*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 4296kB
> 1854 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap = 0kB
> Total swap = 0kB
> 255744 pages RAM
> 5588 pages reserved
> 230698 pages shared
> 217103 pages non-shared
> Out of memory: kill process 25166 (msgctl11) score 133496 or a child
> Killed process 28855 (msgctl11)
> msgctl11 invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0
> msgctl11 cpuset=/ mems_allowed=0
> Pid: 30312, comm: msgctl11 Not tainted 2.6.30-cachefs #106
> Call Trace:
> [<ffffffff81071dae>] ? oom_kill_process.clone.0+0xa9/0x245
> [<ffffffff81072075>] ? __out_of_memory+0x12b/0x142
> [<ffffffff810720f6>] ? out_of_memory+0x6a/0x94
> [<ffffffff8107479e>] ? __alloc_pages_nodemask+0x422/0x50b
> [<ffffffff81031110>] ? copy_process+0x93/0x113f
> [<ffffffff810748f1>] ? __get_free_pages+0x12/0x50
> [<ffffffff81031130>] ? copy_process+0xb3/0x113f
> [<ffffffff81029a83>] ? update_curr+0x53/0xdf
> [<ffffffff81081e00>] ? handle_mm_fault+0x5f3/0x645
> [<ffffffff810322fb>] ? do_fork+0x13f/0x2ba
> [<ffffffff81022a0b>] ? do_page_fault+0x1f1/0x206
> [<ffffffff8100b0d3>] ? stub_clone+0x13/0x20
> [<ffffffff8100ad6b>] ? system_call_fastpath+0x16/0x1b
> Mem-Info:
> DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> CPU 1: hi: 0, btch: 1 usd: 0
> DMA32 per-cpu:
> CPU 0: hi: 186, btch: 31 usd: 0
> CPU 1: hi: 186, btch: 31 usd: 0
> Active_anon:79646 active_file:2 inactive_anon:4113
> inactive_file:0 unevictable:0 dirty:0 writeback:0 unstable:0
> free:1966 slab:38417 mapped:2 pagetables:61720 bounce:0
> DMA free:3916kB min:60kB low:72kB high:88kB active_anon:3608kB inactive_anon:256kB active_file:0kB inactive_file:0kB unevictable:0kB present:15364kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 968 968 968
> DMA32 free:3948kB min:3948kB low:4932kB high:5920kB active_anon:314976kB inactive_anon:16196kB active_file:8kB inactive_file:0kB unevictable:0kB present:992032kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> DMA: 1*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3916kB
> DMA32: 443*4kB 20*8kB 10*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3948kB
> 36 total pagecache pages
> 0 pages in swap cache
> Swap cache stats: add 0, delete 0, find 0/0
> Free swap = 0kB
> Total swap = 0kB
> 255744 pages RAM
> 5588 pages reserved
> 151665 pages shared
> 220702 pages non-shared
> Out of memory: kill process 25166 (msgctl11) score 133404 or a child
> Killed process 28860 (msgctl11)
>


dhowells at redhat

Jun 28, 2009, 12:55 AM

Post #9 of 65 (711 views)
Permalink
Re: Found the commit that causes the OOMs [In reply to]

Johannes Weiner <hannes [at] cmpxchg> wrote:

> From: Johannes Weiner <hannes [at] cmpxchg>
> Subject: vmscan: keep balancing anon lists on swap-full conditions
>
> Page reclaim doesn't scan and balance the anon LRU lists when
> nr_swap_pages is zero to save the scan overhead for swapless systems.
>
> Unfortunately, this variable can reach zero when all present swap
> space is occupied as well and we don't want to stop balancing in that
> case or we encounter an unreclaimable mess of anon lists when swap
> space gets freed up and we are theoretically in the position to page
> out again.
>
> Use the total_swap_pages variable to have a better indicator when to
> scan the anon LRU lists.
>
> We still might have unbalanced anon lists when swap space is added
> during run time but it is a less dynamic change in state and we
> still save the scanning overhead for CONFIG_SWAP systems that never
> actually set up swap space.
>
> Signed-off-by: Johannes Weiner <hannes [at] cmpxchg>

This doesn't help.

It may change the behaviour though: rather than locking up after a couple of
OOMs, it generated 42MB of OOM messages.

It didn't go wrong until its 5th pass through the LTP syscalls testsuite this
time. Attached is the first part of the log where OOM messages were generated.
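When reading reports like this one, the per-zone buddy lines ("DMA32: 2367*4kB 71*8kB ... = 12020kB") can be cross-checked by hand or with a few lines of C. The sketch below is only an illustration, not kernel code: buddy_total_kb() is a hypothetical helper, it assumes 4kB pages as on this x86_64 box, and it expects the text after the zone name. It sums the line and also reports how much of the free memory sits in blocks larger than a single page, which is what an order:1 request needs.

```c
#include <stdio.h>

/*
 * Sum a buddy free-list line such as
 *   "2367*4kB 71*8kB 10*16kB ... = 12020kB"
 * (zone name already stripped). Returns the total in kB and stores in
 * *multi_page_kb the part held in blocks larger than one 4kB page.
 * Hypothetical helper for reading OOM reports; assumes 4kB pages.
 */
static unsigned long buddy_total_kb(const char *line,
                                    unsigned long *multi_page_kb)
{
    unsigned long count, size, total = 0, multi = 0;

    for (;;) {
        int used = 0;

        /* Stops at the "= NNNNkB" trailer, where %lu fails to match. */
        if (sscanf(line, " %lu*%lukB%n", &count, &size, &used) != 2 ||
            used == 0)
            break;

        total += count * size;
        if (size > 4)
            multi += count * size;
        line += used;
    }

    *multi_page_kb = multi;
    return total;
}
```

Fed the first DMA32 line from the log below, this gives 12020kB in total, matching the figure the kernel printed, of which 2552kB is in blocks of 8kB or larger.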

David
---
msgctl11 invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0
msgctl11 cpuset=/ mems_allowed=0
Pid: 689, comm: msgctl11 Not tainted 2.6.31-rc1-cachefs #143
Call Trace:
[<ffffffff810718a2>] ? oom_kill_process.clone.0+0xa9/0x245
[<ffffffff81071b69>] ? __out_of_memory+0x12b/0x142
[<ffffffff81071bea>] ? out_of_memory+0x6a/0x94
[<ffffffff810742b4>] ? __alloc_pages_nodemask+0x42e/0x51d
[<ffffffff81090d86>] ? cache_alloc_refill+0x353/0x69c
[<ffffffff8106f20f>] ? find_get_page+0x1a/0x72
[<ffffffff810313e6>] ? copy_process+0x95/0x114f
[<ffffffff81091364>] ? kmem_cache_alloc+0x83/0xc5
[<ffffffff810313e6>] ? copy_process+0x95/0x114f
[<ffffffff810815da>] ? handle_mm_fault+0x2b9/0x62f
[<ffffffff810325df>] ? do_fork+0x13f/0x2ba
[<ffffffff81022c02>] ? do_page_fault+0x1f8/0x20d
[<ffffffff8100b0d3>] ? stub_clone+0x13/0x20
[<ffffffff8100ad6b>] ? system_call_fastpath+0x16/0x1b
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 62
CPU 1: hi: 186, btch: 31 usd: 0
Active_anon:71393 active_file:1 inactive_anon:4670
inactive_file:0 unevictable:0 dirty:11 writeback:0 unstable:0
free:3987 slab:38927 mapped:451 pagetables:58190 bounce:0
DMA free:3928kB min:60kB low:72kB high:88kB active_anon:3176kB inactive_anon:256kB active_file:0kB inactive_file:0kB unevictable:0kB present:15364kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 968 968 968
DMA32 free:12020kB min:3948kB low:4932kB high:5920kB active_anon:282396kB inactive_anon:18424kB active_file:4kB inactive_file:0kB unevictable:0kB present:992000kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 8*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3928kB
DMA32: 2367*4kB 71*8kB 10*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 12020kB
2342 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
255744 pages RAM
5597 pages reserved
230753 pages shared
216782 pages non-shared
Out of memory: kill process 30280 (msgctl11) score 161571 or a child
Killed process 31149 (msgctl11)
msgctl11 invoked oom-killer: gfp_mask=0xd0, order=1, oom_adj=0
msgctl11 cpuset=/ mems_allowed=0
Pid: 689, comm: msgctl11 Not tainted 2.6.31-rc1-cachefs #143
Call Trace:
[<ffffffff810718a2>] ? oom_kill_process.clone.0+0xa9/0x245
[<ffffffff81071b69>] ? __out_of_memory+0x12b/0x142
[<ffffffff81071bea>] ? out_of_memory+0x6a/0x94
[<ffffffff810742b4>] ? __alloc_pages_nodemask+0x42e/0x51d
[<ffffffff81090d86>] ? cache_alloc_refill+0x353/0x69c
[<ffffffff8106f20f>] ? find_get_page+0x1a/0x72
[<ffffffff810313e6>] ? copy_process+0x95/0x114f
[<ffffffff81091364>] ? kmem_cache_alloc+0x83/0xc5
[<ffffffff810313e6>] ? copy_process+0x95/0x114f
[<ffffffff810815da>] ? handle_mm_fault+0x2b9/0x62f
[<ffffffff810325df>] ? do_fork+0x13f/0x2ba
[<ffffffff81022c02>] ? do_page_fault+0x1f8/0x20d
[<ffffffff8100b0d3>] ? stub_clone+0x13/0x20
[<ffffffff8100ad6b>] ? system_call_fastpath+0x16/0x1b
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
CPU 1: hi: 186, btch: 31 usd: 0
Active_anon:75955 active_file:0 inactive_anon:4990
inactive_file:2 unevictable:0 dirty:0 writeback:0 unstable:0
free:1970 slab:38326 mapped:5 pagetables:59166 bounce:0
DMA free:3932kB min:60kB low:72kB high:88kB active_anon:3172kB inactive_anon:256kB active_file:0kB inactive_file:0kB unevictable:0kB present:15364kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 968 968 968
DMA32 free:3948kB min:3948kB low:4932kB high:5920kB active_anon:300648kB inactive_anon:19704kB active_file:0kB inactive_file:8kB unevictable:0kB present:992000kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 9*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3932kB
DMA32: 457*4kB 39*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3948kB
36 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
255744 pages RAM
5597 pages reserved
162238 pages shared
220698 pages non-shared
Out of memory: kill process 30280 (msgctl11) score 160654 or a child
Killed process 31155 (msgctl11)
msgctl11: page allocation failure. order:1, mode:0x20
Pid: 3095, comm: msgctl11 Not tainted 2.6.31-rc1-cachefs #143
Call Trace:
<IRQ> [<ffffffff8107435a>] ? __alloc_pages_nodemask+0x4d4/0x51d
[<ffffffff81090d86>] ? cache_alloc_refill+0x353/0x69c
[<ffffffff810734a4>] ? free_pages_bulk.clone.1+0x4d/0x20d
[<ffffffff81265935>] ? __alloc_skb+0x38/0x148
[<ffffffff81266512>] ? __netdev_alloc_skb+0x15/0x2f
[<ffffffff81091195>] ? __kmalloc_track_caller+0xc6/0x108
[<ffffffff8126595e>] ? __alloc_skb+0x61/0x148
[<ffffffff81266512>] ? __netdev_alloc_skb+0x15/0x2f
[<ffffffff8123f092>] ? e1000_clean_rx_irq+0x1ab/0x2de
[<ffffffff8124072f>] ? e1000_clean+0x71/0x20f
[<ffffffff81269cab>] ? net_rx_action+0x64/0x129
[<ffffffff8103b47d>] ? process_timeout+0x0/0xb
[<ffffffff810375d1>] ? __do_softirq+0x92/0x129
[<ffffffff8100be7c>] ? call_softirq+0x1c/0x28
[<ffffffff8100d824>] ? do_softirq+0x2c/0x68
[<ffffffff8100cf3b>] ? do_IRQ+0x9c/0xb2
[<ffffffff8100b713>] ? ret_from_intr+0x0/0xa
<EOI> [<ffffffff810791e9>] ? shrink_zone+0x1d6/0x30f
[<ffffffff810cec7d>] ? mb_cache_shrink_fn+0x26/0x115
[<ffffffff8118b977>] ? __up_read+0x13/0x90
[<ffffffff81079460>] ? shrink_slab+0x13e/0x150
[<ffffffff8107a004>] ? try_to_free_pages+0x20d/0x362
[<ffffffff8107760f>] ? isolate_pages_global+0x0/0x219
[<ffffffff810741d3>] ? __alloc_pages_nodemask+0x34d/0x51d
[<ffffffff81075f05>] ? __do_page_cache_readahead+0x9e/0x1a1
[<ffffffff81076024>] ? ra_submit+0x1c/0x20
[<ffffffff8106f9f4>] ? filemap_fault+0x18a/0x316
[<ffffffff8107f7cb>] ? __do_fault+0x54/0x3d6
[<ffffffff810815da>] ? handle_mm_fault+0x2b9/0x62f
[<ffffffff81022c02>] ? do_page_fault+0x1f8/0x20d
[<ffffffff812dfb7f>] ? page_fault+0x1f/0x30


fengguang.wu at intel

Jun 28, 2009, 4:32 AM

Post #10 of 65 (707 views)
Permalink
Re: Found the commit that causes the OOMs [In reply to]

On Sat, Jun 27, 2009 at 08:54:12PM +0800, Johannes Weiner wrote:
> Here is the patch in question:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 7592d8e..879d034 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1570,7 +1570,7 @@ static void shrink_zone(int priority, struct zone *zone,
> * Even if we did not try to evict anon pages at all, we want to
> * rebalance the anon lru active/inactive ratio.
> */
> - if (inactive_anon_is_low(zone, sc))
> + if (inactive_anon_is_low(zone, sc) && nr_swap_pages > 0)
> shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
>
> throttle_vm_writeout(sc->gfp_mask);
>
> When this was discussed, I think we missed that nr_swap_pages can
> actually get zero on swap systems as well and this should have been
> total_swap_pages - otherwise we also stop balancing the two anon lists
> when swap is _full_ which was not the intention of this change at all.

Exactly. In Jesse's OOM case, the swap is exhausted.
total_swap_pages is the better choice in this situation.

Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426766] Active_anon:290797 active_file:28 inactive_anon:97034
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426767] inactive_file:61 unevictable:11322 dirty:0 writeback:0 unstable:0
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426768] free:3341 slab:13776 mapped:5880 pagetables:6851 bounce:0
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426772] DMA free:7776kB min:40kB low:48kB high:60kB active_anon:556kB inactive_anon:524kB
+active_file:16kB inactive_file:0kB unevictable:0kB present:15340kB pages_scanned:30 all_unreclaimable? no
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426775] lowmem_reserve[]: 0 1935 1935 1935
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426781] DMA32 free:5588kB min:5608kB low:7008kB high:8412kB active_anon:1162632kB
+inactive_anon:387612kB active_file:96kB inactive_file:256kB unevictable:45288kB present:1982128kB pages_scanned:980
+all_unreclaimable? no
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426784] lowmem_reserve[]: 0 0 0 0
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426787] DMA: 64*4kB 77*8kB 45*16kB 18*32kB 4*64kB 2*128kB 2*256kB 3*512kB 1*1024kB
+1*2048kB 0*4096kB = 7800kB
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426796] DMA32: 871*4kB 149*8kB 1*16kB 2*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB
+0*2048kB 0*4096kB = 5588kB
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426804] 151250 total pagecache pages
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426806] 18973 pages in swap cache
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426808] Swap cache stats: add 610640, delete 591667, find 144356/181468
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426810] Free swap = 0kB
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.426811] Total swap = 979956kB
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.434828] 507136 pages RAM
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.434831] 23325 pages reserved
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.434832] 190892 pages shared
Jun 18 07:44:53 jbarnes-g45 kernel: [64377.434833] 248816 pages non-shared
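
To make the difference between the two counters concrete, here is a minimal sketch in plain userspace C, not kernel code. struct swap_state, kb_to_pages() and the two balance_by_*() helpers are hypothetical names made up for illustration; only nr_swap_pages and total_swap_pages come from the patches in this thread. The 4kB page size is an assumption based on the buddy list lines in the log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two kernel counters discussed here. */
struct swap_state {
	long nr_swap_pages;    /* swap pages currently free */
	long total_swap_pages; /* swap pages configured in total */
};

/* Convert a kB figure from the log to pages, assuming 4kB pages. */
long kb_to_pages(long kb)
{
	assert(kb >= 0);
	return kb / 4;
}

/* Check added by commit 69c85481: false whenever free swap is zero. */
bool balance_by_free_swap(const struct swap_state *s, bool inactive_anon_low)
{
	return inactive_anon_low && s->nr_swap_pages > 0;
}

/* Check suggested by Johannes: false only when no swap is configured. */
bool balance_by_total_swap(const struct swap_state *s, bool inactive_anon_low)
{
	return inactive_anon_low && s->total_swap_pages > 0;
}
```

Fed with the figures from Jesse's log (Free swap = 0kB, Total swap = 979956kB), the first check refuses to rebalance the anon lists while the second still allows it. On a machine with no swap configured at all, both checks skip the rebalancing.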


In David's OOM case, there are two symptoms:
1) 70000 unaccounted/leaked pages, as found by Andrew
(plus a rather large number of PG_buddy and pagetable pages)
2) almost zero active_file/inactive_file; small inactive_anon;
many slab and active_anon pages.

In situation (2), the slab cache is _under_ scanned. So David
got an OOM when vmscan should have squeezed some free pages from the slab
cache. Is this one important side effect of MinChan's patch?

Thanks,
Fengguang

> [ There is another one hiding in shrink_zone() that does the same - it
> was moved from get_scan_ratio() and is pretty old but we still kept
> the inactive/active ratio halfway sane without MinChan's patch. ]
>
> This is from your OOM-run dmesg, David:
>
> Adding 32k swap on swapfile22. Priority:-21 extents:1 across:32k
> Adding 32k swap on swapfile23. Priority:-22 extents:1 across:32k
> Adding 32k swap on swapfile24. Priority:-23 extents:3 across:44k
> Adding 32k swap on swapfile25. Priority:-24 extents:1 across:32k
>
> So we actually have swap? Or are those removed again before the OOM?
>
> If not, I think we let the anon lists rot while swap is full and when
> some swap space gets freed up and we should be able to evict anon
> pages again, we don't find any candidates. The following patch should
> improve on that.
>
> If it's not true for your particular situation, I think we still need
> it for the scenario described above.
>
> ---
> From: Johannes Weiner <hannes [at] cmpxchg>
> Subject: vmscan: keep balancing anon lists on swap-full conditions
>
> Page reclaim doesn't scan and balance the anon LRU lists when
> nr_swap_pages is zero to save the scan overhead for swapless systems.
>
> Unfortunately, this variable can reach zero when all present swap
> space is occupied as well and we don't want to stop balancing in that
> case or we encounter an unreclaimable mess of anon lists when swap
> space gets freed up and we are theoretically in the position to page
> out again.
>
> Use the total_swap_pages variable to have a better indicator when to
> scan the anon LRU lists.
>
> We still might have unbalanced anon lists when swap space is added
> during run time but it is a less dynamic change in state and we
> still save the scanning overhead for CONFIG_SWAP systems that never
> actually set up swap space.
>
> Signed-off-by: Johannes Weiner <hannes [at] cmpxchg>
> ---
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 5415526..5ea7fc3 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1524,7 +1524,7 @@ static void shrink_zone(int priority, struct zone *zone,
> int noswap = 0;
>
> /* If we have no swap space, do not bother scanning anon pages. */
> - if (!sc->may_swap || (nr_swap_pages <= 0)) {
> + if (!sc->may_swap || (total_swap_pages <= 0)) {
> noswap = 1;
> percent[0] = 0;
> percent[1] = 100;
> @@ -1578,7 +1578,7 @@ static void shrink_zone(int priority, struct zone *zone,
> * Even if we did not try to evict anon pages at all, we want to
> * rebalance the anon lru active/inactive ratio.
> */
> - if (inactive_anon_is_low(zone, sc) && nr_swap_pages > 0)
> + if (inactive_anon_is_low(zone, sc) && total_swap_pages > 0)
> shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
>
> throttle_vm_writeout(sc->gfp_mask);


minchan.kim at gmail

Jun 28, 2009, 6:30 AM

Post #11 of 65
Re: Found the commit that causes the OOMs

Hi, Wu.

On Sun, Jun 28, 2009 at 8:32 PM, Wu Fengguang<fengguang.wu [at] intel> wrote:
> In David's OOM case, there are two symptoms:
> 1) 70000 unaccounted/leaked pages as found by Andrew
>   (plus rather big number of PG_buddy and pagetable pages)
> 2) almost zero active_file/inactive_file; small inactive_anon;
>   many slab and active_anon pages.
>
> In the situation of (2), the slab cache is _under_ scanned. So David
> got OOM when vmscan should have squeezed some free pages from the slab
> cache. Which is one important side effect of MinChan's patch?

My patch's side effect is (2).

My guess is as follows.

1. The scanned-page count that shrink_slab works from (sc->nr_scanned)
is incremented in shrink_page_list, and it is doubled for mapped or
swapcache pages.
2. shrink_page_list is called by shrink_inactive_list.
3. shrink_inactive_list is called by shrink_list.

Look at shrink_list.
If the inactive anon LRU list is low, it always calls shrink_active_list
rather than shrink_inactive_list for anon pages.
That means sc->nr_scanned is not increased.
Then shrink_slab can't shrink enough slab pages.
So David's OOM shows a lot of slab pages and active anon pages.

Does that make sense?
If it does, we have to change how shrink_slab's pressure is calculated.
What do you think?
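
The chain described in this post can be sketched as a small userspace C model. This is only an illustration under stated assumptions: struct model_page, count_scanned() and slab_objects_to_scan() are hypothetical names, and the proportional rule in the second function is a simplification. The kernel's shrink_slab() applies further per-shrinker factors that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor; not the kernel's struct page. */
struct model_page {
	bool mapped;
	bool swapcache;
};

/*
 * Models the accounting described for shrink_page_list(): one count per
 * page looked at, plus one more for mapped or swapcache pages.
 */
unsigned long count_scanned(const struct model_page *pages, size_t n)
{
	unsigned long nr_scanned = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		nr_scanned++;
		if (pages[i].mapped || pages[i].swapcache)
			nr_scanned++; /* doubled slab pressure */
	}
	return nr_scanned;
}

/*
 * Assumed proportional rule: scan the same fraction of slab objects as
 * the fraction of LRU pages that was scanned.
 */
unsigned long slab_objects_to_scan(unsigned long nr_scanned,
				   unsigned long lru_pages,
				   unsigned long slab_objects)
{
	return (unsigned long)((unsigned long long)nr_scanned * slab_objects /
			       ((unsigned long long)lru_pages + 1));
}
```

If only shrink_active_list runs, nothing passes through the counting step, nr_scanned stays at zero, and this model asks the slab caches for zero objects no matter how many they hold.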


--
Kind regards,
Minchan Kim


minchan.kim at gmail

Jun 28, 2009, 6:36 AM

Post #12 of 65
Re: Found the commit that causes the OOMs

On Sun, Jun 28, 2009 at 10:30 PM, Minchan Kim<minchan.kim [at] gmail> wrote:
> My patch's side effect is (2).
>
> My guessing is following as.
>
> 1. The number of page scanned in shrink_slab is increased in shrink_page_list.
> And it is doubled for mapped page or swapcache.
> 2. shrink_page_list is called by shrink_inactive_list
> 3. shrink_inactive_list is called by shrink_list
>
> Look at the shrink_list.
> If inactive lru list is low, it always call shrink_active_list not
> shrink_inactive_list in case of anon.

I missed the most important point.
My patch's side effect is that it keeps the inactive anon LRU list low.
So I think the problem is caused by that side effect.

> It means it doesn't increased sc->nr_scanned.
> Then shrink_slab can't shrink enough slab pages.
> So, David OOM have a lot of slab pages and active anon pages.
>
> Does it make sense ?
> If it make sense, we have to change shrink_slab's pressure method.
> What do you think ?
>
>
> --
> Kinds regards,
> Minchan Kim
>



--
Kind regards,
Minchan Kim


fengguang.wu at intel

Jun 28, 2009, 7:22 AM

Post #13 of 65
Re: Found the commit that causes the OOMs

On Sun, Jun 28, 2009 at 09:36:49PM +0800, Minchan Kim wrote:
> On Sun, Jun 28, 2009 at 10:30 PM, Minchan Kim<minchan.kim [at] gmail> wrote:
> > My patch's side effect is (2).
> >
> > My guessing is following as.
> >
> > 1. The number of page scanned in shrink_slab is increased in shrink_page_list.
> > And it is doubled for mapped page or swapcache.
> > 2. shrink_page_list is called by shrink_inactive_list
> > 3. shrink_inactive_list is called by shrink_list
> >
> > Look at the shrink_list.
> > If inactive lru list is low, it always call shrink_active_list not
> > shrink_inactive_list in case of anon.
>
> I missed most important point.
> My patch's side effect is that it keeps inactive anon's lru low.
> So I think it is caused by my patch's side effect.

Yes, a smaller inactive_anon list means a smaller (pointless) nr_scanned,
and therefore fewer slab scans. Strictly speaking, it's not the fault
of your patch. It indicates that the slab scan ratio algorithm should
be updated too :)

We could refine the estimation of "reclaimable" pages like this:

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 416f748..e9c5b0e 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -167,14 +167,7 @@ static inline unsigned long zone_page_state(struct zone *zone,
 }

 extern unsigned long global_lru_pages(void);
-
-static inline unsigned long zone_lru_pages(struct zone *zone)
-{
-	return (zone_page_state(zone, NR_ACTIVE_ANON)
-		+ zone_page_state(zone, NR_ACTIVE_FILE)
-		+ zone_page_state(zone, NR_INACTIVE_ANON)
-		+ zone_page_state(zone, NR_INACTIVE_FILE));
-}
+extern unsigned long zone_lru_pages(struct zone *zone);

 #ifdef CONFIG_NUMA
 /*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 026f452..4281c6f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2123,10 +2123,31 @@ void wakeup_kswapd(struct zone *zone, int order)

 unsigned long global_lru_pages(void)
 {
-	return global_page_state(NR_ACTIVE_ANON)
-		+ global_page_state(NR_ACTIVE_FILE)
-		+ global_page_state(NR_INACTIVE_ANON)
-		+ global_page_state(NR_INACTIVE_FILE);
+	unsigned long nr;
+
+	nr = global_page_state(NR_ACTIVE_FILE) +
+	     global_page_state(NR_INACTIVE_FILE);
+
+	if (total_swap_pages)
+		nr += global_page_state(NR_ACTIVE_ANON) +
+		      global_page_state(NR_INACTIVE_ANON);
+
+	return nr;
+}
+
+
+unsigned long zone_lru_pages(struct zone *zone)
+{
+	unsigned long nr;
+
+	nr = zone_page_state(zone, NR_ACTIVE_FILE) +
+	     zone_page_state(zone, NR_INACTIVE_FILE);
+
+	if (total_swap_pages)
+		nr += zone_page_state(zone, NR_ACTIVE_ANON) +
+		      zone_page_state(zone, NR_INACTIVE_ANON);
+
+	return nr;
 }

 #ifdef CONFIG_HIBERNATION


kosaki.motohiro at jp

Jun 28, 2009, 7:49 AM

Post #14 of 65
Re: Found the commit that causes the OOMs

>> In David's OOM case, there are two symptoms:
>> 1) 70000 unaccounted/leaked pages as found by Andrew
>>   (plus rather big number of PG_buddy and pagetable pages)
>> 2) almost zero active_file/inactive_file; small inactive_anon;
>>   many slab and active_anon pages.
>>
>> In the situation of (2), the slab cache is _under_ scanned. So David
>> got OOM when vmscan should have squeezed some free pages from the slab
>> cache. Which is one important side effect of MinChan's patch?
>
> My patch's side effect is (2).
>
> My guessing is following as.
>
> 1. The number of page scanned in shrink_slab is increased in shrink_page_list.
> And it is doubled for mapped page or swapcache.
> 2. shrink_page_list is called by shrink_inactive_list
> 3. shrink_inactive_list is called by shrink_list
>
> Look at the shrink_list.
> If inactive lru list is low, it always call shrink_active_list not
> shrink_inactive_list in case of anon.
> It means it doesn't increased sc->nr_scanned.
> Then shrink_slab can't shrink enough slab pages.
> So, David OOM have a lot of slab pages and active anon pages.
>
> Does it make sense ?
> If it make sense, we have to change shrink_slab's pressure method.
> What do you think ?

I'm confused.

If the system has no swap, get_scan_ratio() always returns anon=0%.
Then the number of inactive_anon pages has no effect on sc.nr_scanned.
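
A rough userspace C model of that objection, assuming the no-swap branch looks like the shrink_zone() hunk quoted earlier in the thread (percent[0] = 0 for anon, percent[1] = 100 for file). model_scan_percent() and model_scan_count() are hypothetical names, and the 50/50 split in the swap case is a placeholder, not the kernel's real ratio calculation.

```c
#include <assert.h>
#include <stdbool.h>

/* percent[0] is the anon share, percent[1] the file share. */
void model_scan_percent(bool may_swap, long nr_swap_pages,
			unsigned long percent[2])
{
	if (!may_swap || nr_swap_pages <= 0) {
		percent[0] = 0;
		percent[1] = 100;
		return;
	}
	/* Placeholder split; the kernel derives this from reclaim statistics. */
	percent[0] = 50;
	percent[1] = 50;
}

/* Pages to scan on one list: list size scaled by priority and share. */
unsigned long model_scan_count(unsigned long lru_size, int priority,
			       unsigned long percent)
{
	assert(priority >= 0 && priority < 32); /* keep the shift defined */
	return (lru_size >> priority) * percent / 100;
}
```

With an anon share of 0%, the anon scan count comes out as zero whatever the size of the inactive_anon list, which is why its size should not change sc.nr_scanned in the swapless case.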


kosaki.motohiro at jp

Jun 28, 2009, 8:01 AM

Post #15 of 65
Re: Found the commit that causes the OOMs

> Yes, smaller inactive_anon means smaller (pointless) nr_scanned,
> and therefore less slab scans. Strictly speaking, it's not the fault
> of your patch. It indicates that the slab scan ratio algorithm should
> be updated too :)

I don't think this patch is related to Minchan's patch,
but I think this patch is good.


> We could refine the estimation of "reclaimable" pages like this:

hmhm, reasonable idea.

>
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 416f748..e9c5b0e 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -167,14 +167,7 @@ static inline unsigned long zone_page_state(struct zone *zone,
>  }
>
>  extern unsigned long global_lru_pages(void);
> -
> -static inline unsigned long zone_lru_pages(struct zone *zone)
> -{
> -       return (zone_page_state(zone, NR_ACTIVE_ANON)
> -               + zone_page_state(zone, NR_ACTIVE_FILE)
> -               + zone_page_state(zone, NR_INACTIVE_ANON)
> -               + zone_page_state(zone, NR_INACTIVE_FILE));
> -}
> +extern unsigned long zone_lru_pages(void);
>
>  #ifdef CONFIG_NUMA
>  /*
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 026f452..4281c6f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2123,10 +2123,31 @@ void wakeup_kswapd(struct zone *zone, int order)
>
>  unsigned long global_lru_pages(void)
>  {
> -       return global_page_state(NR_ACTIVE_ANON)
> -               + global_page_state(NR_ACTIVE_FILE)
> -               + global_page_state(NR_INACTIVE_ANON)
> -               + global_page_state(NR_INACTIVE_FILE);
> +       int nr;
> +
> +       nr = global_page_state(zone, NR_ACTIVE_FILE) +
> +            global_page_state(zone, NR_INACTIVE_FILE);
> +
> +       if (total_swap_pages)
> +               nr += global_page_state(zone, NR_ACTIVE_ANON) +
> +                     global_page_state(zone, NR_INACTIVE_ANON);
> +
> +       return nr;
> +}

Please change the function name too.
Now this function only accounts for reclaimable pages.

Plus, total_swap_pages is the wrong check. If we care about "reclaimable
pages", we should use nr_swap_pages.
I mean, a full swap also makes anon pages unreclaimable, even though the
system has some swap device.
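
A small user-space sketch of the distinction, assuming total_swap_pages counts all configured swap slots and nr_swap_pages counts the slots that are still free. The struct and function names (swap_model, lru_model, model_reclaimable_pages) are hypothetical and only mirror the shape of the proposed helper.

```c
#include <assert.h>

/* Hypothetical model of the two global swap counters. */
struct swap_model {
	long total_swap_pages;	/* all configured swap slots */
	long nr_swap_pages;	/* swap slots that are still free */
};

/* Hypothetical per-zone (or global) LRU list sizes, in pages. */
struct lru_model {
	unsigned long active_anon;
	unsigned long inactive_anon;
	unsigned long active_file;
	unsigned long inactive_file;
};

/*
 * Estimate reclaimable pages: file pages always count, anon pages
 * count only while there is a free swap slot to write them to.
 */
static unsigned long model_reclaimable_pages(const struct lru_model *lru,
					     const struct swap_model *swap)
{
	unsigned long nr;

	assert(lru != 0 && swap != 0);

	nr = lru->active_file + lru->inactive_file;
	if (swap->nr_swap_pages > 0)
		nr += lru->active_anon + lru->inactive_anon;

	return nr;
}
```

With a full swap device (total_swap_pages > 0, nr_swap_pages == 0) a total_swap_pages test would still count anon pages as reclaimable, while the nr_swap_pages test leaves them out.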



> +
> +
> +unsigned long zone_lru_pages(struct zone *zone)
> +{
> +       int nr;
> +
> +       nr = zone_page_state(zone, NR_ACTIVE_FILE) +
> +            zone_page_state(zone, NR_INACTIVE_FILE);
> +
> +       if (total_swap_pages)
> +               nr += zone_page_state(zone, NR_ACTIVE_ANON) +
> +                     zone_page_state(zone, NR_INACTIVE_ANON);
> +
> +       return nr;
>  }
>
>  #ifdef CONFIG_HIBERNATION
>


fengguang.wu at intel

Jun 28, 2009, 8:04 AM

Post #16 of 65 (704 views)
Re: Found the commit that causes the OOMs [In reply to]

On Sun, Jun 28, 2009 at 10:49:52PM +0800, KOSAKI Motohiro wrote:
> >> In David's OOM case, there are two symptoms:
> >> 1) 70000 unaccounted/leaked pages as found by Andrew
> >>   (plus rather big number of PG_buddy and pagetable pages)
> >> 2) almost zero active_file/inactive_file; small inactive_anon;
> >>   many slab and active_anon pages.
> >>
> >> In the situation of (2), the slab cache is _under_ scanned. So David
> >> got OOM when vmscan should have squeezed some free pages from the slab
> >> cache. Which is one important side effect of MinChan's patch?
> >
> > My patch's side effect is (2).
> >
> > My guessing is following as.
> >
> > 1. The number of page scanned in shrink_slab is increased in shrink_page_list.
> > And it is doubled for mapped page or swapcache.
> > 2. shrink_page_list is called by shrink_inactive_list
> > 3. shrink_inactive_list is called by shrink_list
> >
> > Look at the shrink_list.
> > If inactive lru list is low, it always call shrink_active_list not
> > shrink_inactive_list in case of anon.
> > It means it doesn't increased sc->nr_scanned.
> > Then shrink_slab can't shrink enough slab pages.
> > So, David OOM have a lot of slab pages and active anon pages.
> >
> > Does it make sense ?
> > If it make sense, we have to change shrink_slab's pressure method.
> > What do you think ?
>
> I'm confused.
>
> if system have no swap, get_scan_ratio() always return anon=0%.
> Then, the numver of inactive_anon is not effect to sc.nr_scanned.

You are right. Hehe, so that's not a real side effect.


fengguang.wu at intel

Jun 28, 2009, 8:10 AM

Post #17 of 65 (705 views)
Re: Found the commit that causes the OOMs [In reply to]

On Sun, Jun 28, 2009 at 11:01:40PM +0800, KOSAKI Motohiro wrote:
> > Yes, smaller inactive_anon means smaller (pointless) nr_scanned,
> > and therefore less slab scans. Strictly speaking, it's not the fault
> > of your patch. It indicates that the slab scan ratio algorithm should
> > be updated too :)
>
> I don't think this patch is related to minchan's patch.
> but I think this patch is good.

OK.

>
> > We could refine the estimation of "reclaimable" pages like this:
>
> hmhm, reasonable idea.

Thank you.

> >
> > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> > index 416f748..e9c5b0e 100644
> > --- a/include/linux/vmstat.h
> > +++ b/include/linux/vmstat.h
> > @@ -167,14 +167,7 @@ static inline unsigned long zone_page_state(struct zone *zone,
> >  }
> >
> >  extern unsigned long global_lru_pages(void);
> > -
> > -static inline unsigned long zone_lru_pages(struct zone *zone)
> > -{
> > -       return (zone_page_state(zone, NR_ACTIVE_ANON)
> > -               + zone_page_state(zone, NR_ACTIVE_FILE)
> > -               + zone_page_state(zone, NR_INACTIVE_ANON)
> > -               + zone_page_state(zone, NR_INACTIVE_FILE));
> > -}
> > +extern unsigned long zone_lru_pages(void);
> >
> >  #ifdef CONFIG_NUMA
> >  /*
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 026f452..4281c6f 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2123,10 +2123,31 @@ void wakeup_kswapd(struct zone *zone, int order)
> >
> >  unsigned long global_lru_pages(void)
> >  {
> > -       return global_page_state(NR_ACTIVE_ANON)
> > -               + global_page_state(NR_ACTIVE_FILE)
> > -               + global_page_state(NR_INACTIVE_ANON)
> > -               + global_page_state(NR_INACTIVE_FILE);
> > +       int nr;
> > +
> > +       nr = global_page_state(zone, NR_ACTIVE_FILE) +
> > +            global_page_state(zone, NR_INACTIVE_FILE);
> > +
> > +       if (total_swap_pages)
> > +               nr += global_page_state(zone, NR_ACTIVE_ANON) +
> > +                     global_page_state(zone, NR_INACTIVE_ANON);
> > +
> > +       return nr;
> > +}
>
> Please change function name too.
> Now, this function only account reclaimable pages.

Good suggestion - I did consider renaming them to *_reclaimable_pages.

> Plus, total_swap_pages is bad. if we need to concern "reclaimable
> pages", we should use nr_swap_pages.

> I mean, swap-full also makes anon is unreclaimable althouth system
> have sone swap device.

Right, changed to (nr_swap_pages > 0).

Thanks,
Fengguang
---

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 416f748..8d8aa20 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -166,15 +166,8 @@ static inline unsigned long zone_page_state(struct zone *zone,
        return x;
 }

-extern unsigned long global_lru_pages(void);
-
-static inline unsigned long zone_lru_pages(struct zone *zone)
-{
-       return (zone_page_state(zone, NR_ACTIVE_ANON)
-               + zone_page_state(zone, NR_ACTIVE_FILE)
-               + zone_page_state(zone, NR_INACTIVE_ANON)
-               + zone_page_state(zone, NR_INACTIVE_FILE));
-}
+extern unsigned long global_reclaimable_pages(void);
+extern unsigned long zone_reclaimable_pages(void);

 #ifdef CONFIG_NUMA
 /*
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index a91b870..74c3067 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -394,7 +394,8 @@ static unsigned long highmem_dirtyable_memory(unsigned long total)
                struct zone *z =
                        &NODE_DATA(node)->node_zones[ZONE_HIGHMEM];

-               x += zone_page_state(z, NR_FREE_PAGES) + zone_lru_pages(z);
+               x += zone_page_state(z, NR_FREE_PAGES) +
+                    zone_reclaimable_pages(z);
        }
        /*
         * Make sure that the number of highmem pages is never larger
@@ -418,7 +419,7 @@ unsigned long determine_dirtyable_memory(void)
 {
        unsigned long x;

-       x = global_page_state(NR_FREE_PAGES) + global_lru_pages();
+       x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();

        if (!vm_highmem_is_dirtyable)
                x -= highmem_dirtyable_memory(x);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 026f452..3768332 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1693,7 +1693,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
                        if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
                                continue;

-                       lru_pages += zone_lru_pages(zone);
+                       lru_pages += zone_reclaimable_pages(zone);
                }
        }

@@ -1910,7 +1910,7 @@ loop_again:
                for (i = 0; i <= end_zone; i++) {
                        struct zone *zone = pgdat->node_zones + i;

-                       lru_pages += zone_lru_pages(zone);
+                       lru_pages += zone_reclaimable_pages(zone);
                }

                /*
@@ -1954,7 +1954,7 @@ loop_again:
                        if (zone_is_all_unreclaimable(zone))
                                continue;
                        if (nr_slab == 0 && zone->pages_scanned >=
-                                               (zone_lru_pages(zone) * 6))
+                                       (zone_reclaimable_pages(zone) * 6))
                                        zone_set_flag(zone,
                                                      ZONE_ALL_UNRECLAIMABLE);
                        /*
@@ -2121,12 +2121,33 @@ void wakeup_kswapd(struct zone *zone, int order)
        wake_up_interruptible(&pgdat->kswapd_wait);
 }

-unsigned long global_lru_pages(void)
+unsigned long global_reclaimable_pages(void)
 {
-       return global_page_state(NR_ACTIVE_ANON)
-               + global_page_state(NR_ACTIVE_FILE)
-               + global_page_state(NR_INACTIVE_ANON)
-               + global_page_state(NR_INACTIVE_FILE);
+       int nr;
+
+       nr = global_page_state(zone, NR_ACTIVE_FILE) +
+            global_page_state(zone, NR_INACTIVE_FILE);
+
+       if (total_swap_pages)
+               nr += global_page_state(zone, NR_ACTIVE_ANON) +
+                     global_page_state(zone, NR_INACTIVE_ANON);
+
+       return nr;
+}
+
+
+unsigned long zone_reclaimable_pages(struct zone *zone)
+{
+       int nr;
+
+       nr = zone_page_state(zone, NR_ACTIVE_FILE) +
+            zone_page_state(zone, NR_INACTIVE_FILE);
+
+       if (nr_swap_pages > 0)
+               nr += zone_page_state(zone, NR_ACTIVE_ANON) +
+                     zone_page_state(zone, NR_INACTIVE_ANON);
+
+       return nr;
 }

 #ifdef CONFIG_HIBERNATION
@@ -2198,7 +2219,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)

        current->reclaim_state = &reclaim_state;

-       lru_pages = global_lru_pages();
+       lru_pages = global_reclaimable_pages();
        nr_slab = global_page_state(NR_SLAB_RECLAIMABLE);
        /* If slab caches are huge, it's better to hit them first */
        while (nr_slab >= lru_pages) {
@@ -2240,7 +2261,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)

                        reclaim_state.reclaimed_slab = 0;
                        shrink_slab(sc.nr_scanned, sc.gfp_mask,
-                                       global_lru_pages());
+                                   global_reclaimable_pages());
                        sc.nr_reclaimed += reclaim_state.reclaimed_slab;
                        if (sc.nr_reclaimed >= nr_pages)
                                goto out;
@@ -2257,7 +2278,8 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
        if (!sc.nr_reclaimed) {
                do {
                        reclaim_state.reclaimed_slab = 0;
-                       shrink_slab(nr_pages, sc.gfp_mask, global_lru_pages());
+                       shrink_slab(nr_pages, sc.gfp_mask,
+                                   global_reclaimable_pages());
                        sc.nr_reclaimed += reclaim_state.reclaimed_slab;
                } while (sc.nr_reclaimed < nr_pages &&
                                reclaim_state.reclaimed_slab > 0);


minchan.kim at gmail

Jun 28, 2009, 9:47 AM

Post #18 of 65 (711 views)
Re: Found the commit that causes the OOMs [In reply to]

On Sun, Jun 28, 2009 at 11:49 PM, KOSAKI
Motohiro<kosaki.motohiro [at] jp> wrote:
>>> In David's OOM case, there are two symptoms:
>>> 1) 70000 unaccounted/leaked pages as found by Andrew
>>>   (plus rather big number of PG_buddy and pagetable pages)
>>> 2) almost zero active_file/inactive_file; small inactive_anon;
>>>   many slab and active_anon pages.
>>>
>>> In the situation of (2), the slab cache is _under_ scanned. So David
>>> got OOM when vmscan should have squeezed some free pages from the slab
>>> cache. Which is one important side effect of MinChan's patch?
>>
>> My patch's side effect is (2).
>>
>> My guessing is following as.
>>
>> 1. The number of page scanned in shrink_slab is increased in shrink_page_list.
>> And it is doubled for mapped page or swapcache.
>> 2. shrink_page_list is called by shrink_inactive_list
>> 3. shrink_inactive_list is called by shrink_list
>>
>> Look at the shrink_list.
>> If inactive lru list is low, it always call shrink_active_list not
>> shrink_inactive_list in case of anon.
>> It means it doesn't increased sc->nr_scanned.
>> Then shrink_slab can't shrink enough slab pages.
>> So, David OOM have a lot of slab pages and active anon pages.
>>
>> Does it make sense ?
>> If it make sense, we have to change shrink_slab's pressure method.
>> What do you think ?
>
> I'm confused.
>
> if system have no swap, get_scan_ratio() always return anon=0%.
> Then, the numver of inactive_anon is not effect to sc.nr_scanned.
>

My patch shouldn't matter for lru_pages, since the total size of the anon
lru lists (active + inactive) is always the same. I mean shrink_slab's
lru_pages is the same with or without my patch. Whether the test hits OOM
or passes depends on sc->nr_scanned, I think.

Here is why I think it is a side effect of my patch.

Compared to the old behavior, my patch can change the balancing of the anon
lru lists when the "swap file" is full, as Hannes already pointed out to me.

That can affect the reclaimable anon pages while David is running the swap
test in LTP. When the swap file test ends, the pages that were in the swap
file are inserted into the anon lru list again.

My patch can change the physical location of anon pages in RAM compared to
the old behavior.

From then on, we have no swap file, so we can reclaim only file pages.
But we have missed one thing: lumpy reclaim! (In fact, we should not
reclaim anon pages when there is no swap space. A few days ago, I sent a
patch about this problem: http://patchwork.kernel.org/patch/32651/)

Lumpy reclaim can pick up anon pages although we have no swap file.
In the end, shrink_page_list can't reclaim those anon pages, but it still
increases sc->nr_scanned.

So I think whether shrink_slab can reclaim enough or not depends on
sc->nr_scanned.
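
As a rough illustration of that dependency, here is a user-space sketch of how slab pressure scales with the number of scanned LRU pages. It assumes shrink_slab() computes something like delta = (4 * scanned / seeks) * max_pass / (lru_pages + 1), which is how I remember the 2.6.30-era code; the function name model_slab_scan_delta and the constant MODEL_DEFAULT_SEEKS are made up for this example, so treat the formula and its constants as assumptions rather than the actual implementation.

```c
#include <assert.h>

/* Assumed cost of recreating a slab object, in "seeks". */
#define MODEL_DEFAULT_SEEKS 2

/*
 * Number of slab objects a shrinker would be asked to scan.
 *   scanned   - LRU pages scanned by vmscan (sc->nr_scanned)
 *   max_pass  - freeable objects reported by the shrinker
 *   lru_pages - size of the LRU the scan is measured against
 */
static unsigned long long model_slab_scan_delta(unsigned long scanned,
						unsigned long max_pass,
						unsigned long lru_pages,
						int seeks)
{
	unsigned long long delta;

	assert(seeks > 0);

	delta = (4ULL * scanned) / (unsigned long long)seeks;
	delta *= max_pass;
	delta /= (unsigned long long)lru_pages + 1;

	return delta;
}
```

Under this model a near-zero nr_scanned yields near-zero slab pressure no matter how many slab objects are freeable, and a smaller lru_pages (for example, counting only reclaimable pages) raises the pressure for the same nr_scanned.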

David's problem is very subtle.

1. If lumpy reclaim picks up anon pages, the run can pass LTP, since
sc->nr_scanned is increased.
2. If lumpy reclaim doesn't pick up anon pages, the run can hit OOM, since
sc->nr_scanned is almost zero or very small.

Unfortunately, my patch seems to change the physical location of pages in
RAM compared to the old behavior, so that case 2 is selected.

This is only my speculation.

Okay. I believe Wu's patch will solve David's problem.
David, could you test with Wu's patch?

--
Kind regards,
Minchan Kim


minchan.kim at gmail

Jun 28, 2009, 9:50 AM

Post #19 of 65 (701 views)
Re: Found the commit that causes the OOMs [In reply to]

Looks good.

David, can you test with this patch?

On Mon, Jun 29, 2009 at 12:10 AM, Wu Fengguang<fengguang.wu [at] intel> wrote:
> On Sun, Jun 28, 2009 at 11:01:40PM +0800, KOSAKI Motohiro wrote:
>> > Yes, smaller inactive_anon means smaller (pointless) nr_scanned,
>> > and therefore less slab scans. Strictly speaking, it's not the fault
>> > of your patch. It indicates that the slab scan ratio algorithm should
>> > be updated too :)
>>
>> I don't think this patch is related to minchan's patch.
>> but I think this patch is good.
>
> OK.
>
>>
>> > We could refine the estimation of "reclaimable" pages like this:
>>
>> hmhm, reasonable idea.
>
> Thank you.
>
>> >
>> > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
>> > index 416f748..e9c5b0e 100644
>> > --- a/include/linux/vmstat.h
>> > +++ b/include/linux/vmstat.h
>> > @@ -167,14 +167,7 @@ static inline unsigned long zone_page_state(struct zone *zone,
>> >  }
>> >
>> >  extern unsigned long global_lru_pages(void);
>> > -
>> > -static inline unsigned long zone_lru_pages(struct zone *zone)
>> > -{
>> > -       return (zone_page_state(zone, NR_ACTIVE_ANON)
>> > -               + zone_page_state(zone, NR_ACTIVE_FILE)
>> > -               + zone_page_state(zone, NR_INACTIVE_ANON)
>> > -               + zone_page_state(zone, NR_INACTIVE_FILE));
>> > -}
>> > +extern unsigned long zone_lru_pages(void);
>> >
>> >  #ifdef CONFIG_NUMA
>> >  /*
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index 026f452..4281c6f 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -2123,10 +2123,31 @@ void wakeup_kswapd(struct zone *zone, int order)
>> >
>> >  unsigned long global_lru_pages(void)
>> >  {
>> > -       return global_page_state(NR_ACTIVE_ANON)
>> > -               + global_page_state(NR_ACTIVE_FILE)
>> > -               + global_page_state(NR_INACTIVE_ANON)
>> > -               + global_page_state(NR_INACTIVE_FILE);
>> > +       int nr;
>> > +
>> > +       nr = global_page_state(zone, NR_ACTIVE_FILE) +
>> > +            global_page_state(zone, NR_INACTIVE_FILE);
>> > +
>> > +       if (total_swap_pages)
>> > +               nr += global_page_state(zone, NR_ACTIVE_ANON) +
>> > +                     global_page_state(zone, NR_INACTIVE_ANON);
>> > +
>> > +       return nr;
>> > +}
>>
>> Please change function name too.
>> Now, this function only account reclaimable pages.
>
> Good suggestion - I did considered renaming them to *_relaimable_pages.
>
>> Plus, total_swap_pages is bad. if we need to concern "reclaimable
>> pages", we should use nr_swap_pages.
>
>> I mean, swap-full also makes anon is unreclaimable althouth system
>> have sone swap device.
>
> Right, changed to (nr_swap_pages > 0).
>
> Thanks,
> Fengguang
> ---
>
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 416f748..8d8aa20 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -166,15 +166,8 @@ static inline unsigned long zone_page_state(struct zone *zone,
>        return x;
>  }
>
> -extern unsigned long global_lru_pages(void);
> -
> -static inline unsigned long zone_lru_pages(struct zone *zone)
> -{
> -       return (zone_page_state(zone, NR_ACTIVE_ANON)
> -               + zone_page_state(zone, NR_ACTIVE_FILE)
> -               + zone_page_state(zone, NR_INACTIVE_ANON)
> -               + zone_page_state(zone, NR_INACTIVE_FILE));
> -}
> +extern unsigned long global_reclaimable_pages(void);
> +extern unsigned long zone_reclaimable_pages(void);
>
>  #ifdef CONFIG_NUMA
>  /*
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index a91b870..74c3067 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -394,7 +394,8 @@ static unsigned long highmem_dirtyable_memory(unsigned long total)
>                struct zone *z =
>                        &NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
>
> -               x += zone_page_state(z, NR_FREE_PAGES) + zone_lru_pages(z);
> +               x += zone_page_state(z, NR_FREE_PAGES) +
> +                    zone_reclaimable_pages(z);
>        }
>        /*
>         * Make sure that the number of highmem pages is never larger
> @@ -418,7 +419,7 @@ unsigned long determine_dirtyable_memory(void)
>  {
>        unsigned long x;
>
> -       x = global_page_state(NR_FREE_PAGES) + global_lru_pages();
> +       x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
>
>        if (!vm_highmem_is_dirtyable)
>                x -= highmem_dirtyable_memory(x);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 026f452..3768332 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1693,7 +1693,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
>                        if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
>                                continue;
>
> -                       lru_pages += zone_lru_pages(zone);
> +                       lru_pages += zone_reclaimable_pages(zone);
>                }
>        }
>
> @@ -1910,7 +1910,7 @@ loop_again:
>                for (i = 0; i <= end_zone; i++) {
>                        struct zone *zone = pgdat->node_zones + i;
>
> -                       lru_pages += zone_lru_pages(zone);
> +                       lru_pages += zone_reclaimable_pages(zone);
>                }
>
>                /*
> @@ -1954,7 +1954,7 @@ loop_again:
>                        if (zone_is_all_unreclaimable(zone))
>                                continue;
>                        if (nr_slab == 0 && zone->pages_scanned >=
> -                                               (zone_lru_pages(zone) * 6))
> +                                       (zone_reclaimable_pages(zone) * 6))
>                                        zone_set_flag(zone,
>                                                      ZONE_ALL_UNRECLAIMABLE);
>                        /*
> @@ -2121,12 +2121,33 @@ void wakeup_kswapd(struct zone *zone, int order)
>        wake_up_interruptible(&pgdat->kswapd_wait);
>  }
>
> -unsigned long global_lru_pages(void)
> +unsigned long global_reclaimable_pages(void)
>  {
> -       return global_page_state(NR_ACTIVE_ANON)
> -               + global_page_state(NR_ACTIVE_FILE)
> -               + global_page_state(NR_INACTIVE_ANON)
> -               + global_page_state(NR_INACTIVE_FILE);
> +       int nr;
> +
> +       nr = global_page_state(zone, NR_ACTIVE_FILE) +
> +            global_page_state(zone, NR_INACTIVE_FILE);
> +
> +       if (total_swap_pages)
> +               nr += global_page_state(zone, NR_ACTIVE_ANON) +
> +                     global_page_state(zone, NR_INACTIVE_ANON);
> +
> +       return nr;
> +}
> +
> +
> +unsigned long zone_reclaimable_pages(struct zone *zone)
> +{
> +       int nr;
> +
> +       nr = zone_page_state(zone, NR_ACTIVE_FILE) +
> +            zone_page_state(zone, NR_INACTIVE_FILE);
> +
> +       if (nr_swap_pages > 0)
> +               nr += zone_page_state(zone, NR_ACTIVE_ANON) +
> +                     zone_page_state(zone, NR_INACTIVE_ANON);
> +
> +       return nr;
>  }
>
>  #ifdef CONFIG_HIBERNATION
> @@ -2198,7 +2219,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>
>        current->reclaim_state = &reclaim_state;
>
> -       lru_pages = global_lru_pages();
> +       lru_pages = global_reclaimable_pages();
>        nr_slab = global_page_state(NR_SLAB_RECLAIMABLE);
>        /* If slab caches are huge, it's better to hit them first */
>        while (nr_slab >= lru_pages) {
> @@ -2240,7 +2261,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>
>                        reclaim_state.reclaimed_slab = 0;
>                        shrink_slab(sc.nr_scanned, sc.gfp_mask,
> -                                       global_lru_pages());
> +                                   global_reclaimable_pages());
>                        sc.nr_reclaimed += reclaim_state.reclaimed_slab;
>                        if (sc.nr_reclaimed >= nr_pages)
>                                goto out;
> @@ -2257,7 +2278,8 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>        if (!sc.nr_reclaimed) {
>                do {
>                        reclaim_state.reclaimed_slab = 0;
> -                       shrink_slab(nr_pages, sc.gfp_mask, global_lru_pages());
> +                       shrink_slab(nr_pages, sc.gfp_mask,
> +                                   global_reclaimable_pages());
>                        sc.nr_reclaimed += reclaim_state.reclaimed_slab;
>                } while (sc.nr_reclaimed < nr_pages &&
>                                reclaim_state.reclaimed_slab > 0);
>



--
Kind regards,
Minchan Kim


minchan.kim at gmail

Jun 28, 2009, 9:53 AM

Post #20 of 65 (699 views)
Re: Found the commit that causes the OOMs [In reply to]

On Sun, Jun 28, 2009 at 12:36 AM, Johannes Weiner<hannes [at] cmpxchg> wrote:
> On Sat, Jun 27, 2009 at 10:50:25PM +0900, Minchan Kim wrote:
>> Hi, Hannes.
>>
>> On Sat, Jun 27, 2009 at 9:54 PM, Johannes Weiner<hannes [at] cmpxchg> wrote:
>> > On Sat, Jun 27, 2009 at 08:12:49AM +0100, David Howells wrote:
>> >>
>> >> I've managed to bisect things to find the commit that causes the OOMs.  It's:
>> >>
>> >>       commit 69c854817566db82c362797b4a6521d0b00fe1d8
>> >>       Author: MinChan Kim <minchan.kim [at] gmail>
>> >>       Date:   Tue Jun 16 15:32:44 2009 -0700
>> >>
>> >>           vmscan: prevent shrinking of active anon lru list in case of no swap space V3
>> >>
>> >>           shrink_zone() can deactivate active anon pages even if we don't have a
>> >>           swap device.  Many embedded products don't have a swap device.  So the
>> >>           deactivation of anon pages is unnecessary.
>> >>
>> >>           This patch prevents unnecessary deactivation of anon lru pages.  But, it
>> >>           don't prevent aging of anon pages to swap out.
>> >>
>> >>           Signed-off-by: Minchan Kim <minchan.kim [at] gmail>
>> >>           Acked-by: KOSAKI Motohiro <kosaki.motohiro [at] jp>
>> >>           Cc: Johannes Weiner <hannes [at] cmpxchg>
>> >>           Acked-by: Rik van Riel <riel [at] redhat>
>> >>           Signed-off-by: Andrew Morton <akpm [at] linux-foundation>
>> >>           Signed-off-by: Linus Torvalds <torvalds [at] linux-foundation>
>> >>
>> >> This exhibits the problem.  The previous commit:
>> >>
>> >>       commit 35282a2de4e5e4e173ab61aa9d7015886021a821
>> >>       Author: Brice Goglin <Brice.Goglin [at] ens-lyon>
>> >>       Date:   Tue Jun 16 15:32:43 2009 -0700
>> >>
>> >>           migration: only migrate_prep() once per move_pages()
>> >>
>> >> survives 16 iterations of the LTP syscall testsuite without exhibiting the
>> >> problem.
>> >
>> > Here is the patch in question:
>> >
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index 7592d8e..879d034 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -1570,7 +1570,7 @@ static void shrink_zone(int priority, struct zone *zone,
>> >         * Even if we did not try to evict anon pages at all, we want to
>> >         * rebalance the anon lru active/inactive ratio.
>> >         */
>> > -       if (inactive_anon_is_low(zone, sc))
>> > +       if (inactive_anon_is_low(zone, sc) && nr_swap_pages > 0)
>> >                shrink_active_list(SWAP_CLUSTER_MAX, zone, sc, priority, 0);
>> >
>> >        throttle_vm_writeout(sc->gfp_mask);
>> >
>> > When this was discussed, I think we missed that nr_swap_pages can
>> > actually get zero on swap systems as well and this should have been
>> > total_swap_pages - otherwise we also stop balancing the two anon lists
>> > when swap is _full_ which was not the intention of this change at all.
>>
>> At that time we considered it so that we didn't prevent anon list
>> aging for background reclaim.
>> Do you think it is not enough ?
>
> With a heavy multiprocess anon load, direct reclaimers will likely
> reuse the reclaimed pages for anon mappings, so you have a handful of
> processes shuffling pages on the active list and only one thread that
> tries to balance.  I can imagine that it can not keep up for long.

I agree. :)
total_swap_pages is better than nr_swap_pages there, although it isn't
related to this problem.
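
For reference, a user-space sketch of the two variants of the rebalancing condition in shrink_zone(): the one from the merged patch (nr_swap_pages > 0) and the one Hannes suggests (total_swap_pages). The struct swap_counters and both helper names are hypothetical; only the two conditions themselves come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two global swap counters. */
struct swap_counters {
	long total_swap_pages;	/* all configured swap slots */
	long nr_swap_pages;	/* swap slots that are still free */
};

/* Condition as merged: rebalance only while free swap slots remain. */
static bool rebalance_anon_merged(bool inactive_anon_is_low,
				  const struct swap_counters *s)
{
	assert(s != 0);
	return inactive_anon_is_low && s->nr_swap_pages > 0;
}

/* Suggested condition: rebalance whenever any swap is configured. */
static bool rebalance_anon_suggested(bool inactive_anon_is_low,
				     const struct swap_counters *s)
{
	assert(s != 0);
	return inactive_anon_is_low && s->total_swap_pages > 0;
}
```

The two only differ when swap exists but is full: the merged check stops balancing the anon lists there, while the suggested one keeps balancing them. On a system with no swap at all, both skip the rebalancing.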


>



--
Kind regards,
Minchan Kim


minchan.kim at gmail

Jun 28, 2009, 5:17 PM

Post #21 of 65 (691 views)
Re: Found the commit that causes the OOMs [In reply to]

On Sun, 28 Jun 2009 23:10:26 +0800
Wu Fengguang <fengguang.wu [at] intel> wrote:

> On Sun, Jun 28, 2009 at 11:01:40PM +0800, KOSAKI Motohiro wrote:
> > > Yes, smaller inactive_anon means smaller (pointless) nr_scanned,
> > > and therefore less slab scans. Strictly speaking, it's not the fault
> > > of your patch. It indicates that the slab scan ratio algorithm should
> > > be updated too :)
> >
> > I don't think this patch is related to minchan's patch.
> > but I think this patch is good.
>
> OK.
>
> >
> > > We could refine the estimation of "reclaimable" pages like this:
> >
> > hmhm, reasonable idea.
>
> Thank you.
>
> > >
> > > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> > > index 416f748..e9c5b0e 100644
> > > --- a/include/linux/vmstat.h
> > > +++ b/include/linux/vmstat.h
> > > @@ -167,14 +167,7 @@ static inline unsigned long zone_page_state(struct zone *zone,
> > >  }
> > >
> > >  extern unsigned long global_lru_pages(void);
> > > -
> > > -static inline unsigned long zone_lru_pages(struct zone *zone)
> > > -{
> > > -       return (zone_page_state(zone, NR_ACTIVE_ANON)
> > > -               + zone_page_state(zone, NR_ACTIVE_FILE)
> > > -               + zone_page_state(zone, NR_INACTIVE_ANON)
> > > -               + zone_page_state(zone, NR_INACTIVE_FILE));
> > > -}
> > > +extern unsigned long zone_lru_pages(void);
> > >
> > >  #ifdef CONFIG_NUMA
> > >  /*
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 026f452..4281c6f 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -2123,10 +2123,31 @@ void wakeup_kswapd(struct zone *zone, int order)
> > >
> > >  unsigned long global_lru_pages(void)
> > >  {
> > > -       return global_page_state(NR_ACTIVE_ANON)
> > > -               + global_page_state(NR_ACTIVE_FILE)
> > > -               + global_page_state(NR_INACTIVE_ANON)
> > > -               + global_page_state(NR_INACTIVE_FILE);
> > > +       int nr;
> > > +
> > > +       nr = global_page_state(zone, NR_ACTIVE_FILE) +
> > > +            global_page_state(zone, NR_INACTIVE_FILE);
> > > +
> > > +       if (total_swap_pages)
> > > +               nr += global_page_state(zone, NR_ACTIVE_ANON) +
> > > +                     global_page_state(zone, NR_INACTIVE_ANON);
> > > +
> > > +       return nr;
> > > +}
> >
> > Please change function name too.
> > Now, this function only account reclaimable pages.
>
> Good suggestion - I did considered renaming them to *_relaimable_pages.
>
> > Plus, total_swap_pages is bad. if we need to concern "reclaimable
> > pages", we should use nr_swap_pages.
>
> > I mean, swap-full also makes anon is unreclaimable althouth system
> > have sone swap device.
>
> Right, changed to (nr_swap_pages > 0).
>
> Thanks,
> Fengguang
> ---
>
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 416f748..8d8aa20 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -166,15 +166,8 @@ static inline unsigned long zone_page_state(struct zone *zone,
> return x;
> }
>
> -extern unsigned long global_lru_pages(void);
> -
> -static inline unsigned long zone_lru_pages(struct zone *zone)
> -{
> - return (zone_page_state(zone, NR_ACTIVE_ANON)
> - + zone_page_state(zone, NR_ACTIVE_FILE)
> - + zone_page_state(zone, NR_INACTIVE_ANON)
> - + zone_page_state(zone, NR_INACTIVE_FILE));
> -}
> +extern unsigned long global_reclaimable_pages(void);
> +extern unsigned long zone_reclaimable_pages(void);
>
> #ifdef CONFIG_NUMA
> /*
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index a91b870..74c3067 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -394,7 +394,8 @@ static unsigned long highmem_dirtyable_memory(unsigned long total)
> struct zone *z =
> &NODE_DATA(node)->node_zones[ZONE_HIGHMEM];
>
> - x += zone_page_state(z, NR_FREE_PAGES) + zone_lru_pages(z);
> + x += zone_page_state(z, NR_FREE_PAGES) +
> + zone_reclaimable_pages(z);
> }
> /*
> * Make sure that the number of highmem pages is never larger
> @@ -418,7 +419,7 @@ unsigned long determine_dirtyable_memory(void)
> {
> unsigned long x;
>
> - x = global_page_state(NR_FREE_PAGES) + global_lru_pages();
> + x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
>
> if (!vm_highmem_is_dirtyable)
> x -= highmem_dirtyable_memory(x);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 026f452..3768332 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1693,7 +1693,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
> continue;
>
> - lru_pages += zone_lru_pages(zone);
> + lru_pages += zone_reclaimable_pages(zone);
> }
> }
>
> @@ -1910,7 +1910,7 @@ loop_again:
> for (i = 0; i <= end_zone; i++) {
> struct zone *zone = pgdat->node_zones + i;
>
> - lru_pages += zone_lru_pages(zone);
> + lru_pages += zone_reclaimable_pages(zone);
> }
>
> /*
> @@ -1954,7 +1954,7 @@ loop_again:
> if (zone_is_all_unreclaimable(zone))
> continue;
> if (nr_slab == 0 && zone->pages_scanned >=
> - (zone_lru_pages(zone) * 6))
> + (zone_reclaimable_pages(zone) * 6))
> zone_set_flag(zone,
> ZONE_ALL_UNRECLAIMABLE);
> /*
> @@ -2121,12 +2121,33 @@ void wakeup_kswapd(struct zone *zone, int order)
> wake_up_interruptible(&pgdat->kswapd_wait);
> }
>
> -unsigned long global_lru_pages(void)
> +unsigned long global_reclaimable_pages(void)
> {
> - return global_page_state(NR_ACTIVE_ANON)
> - + global_page_state(NR_ACTIVE_FILE)
> - + global_page_state(NR_INACTIVE_ANON)
> - + global_page_state(NR_INACTIVE_FILE);
> + int nr;
> +
> + nr = global_page_state(zone, NR_ACTIVE_FILE) +
> + global_page_state(zone, NR_INACTIVE_FILE);
> +
> + if (total_swap_pages)


Don't we have to change total_swap_pages to nr_swap_pages here, too?

> + nr += global_page_state(zone, NR_ACTIVE_ANON) +
> + global_page_state(zone, NR_INACTIVE_ANON);
> +
> + return nr;
> +}
> +
> +
> +unsigned long zone_reclaimable_pages(struct zone *zone)
> +{
> + int nr;
> +
> + nr = zone_page_state(zone, NR_ACTIVE_FILE) +
> + zone_page_state(zone, NR_INACTIVE_FILE);
> +
> + if (nr_swap_pages > 0)
> + nr += zone_page_state(zone, NR_ACTIVE_ANON) +
> + zone_page_state(zone, NR_INACTIVE_ANON);
> +
> + return nr;
> }
>
> #ifdef CONFIG_HIBERNATION
> @@ -2198,7 +2219,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>
> current->reclaim_state = &reclaim_state;
>
> - lru_pages = global_lru_pages();
> + lru_pages = global_reclaimable_pages();
> nr_slab = global_page_state(NR_SLAB_RECLAIMABLE);
> /* If slab caches are huge, it's better to hit them first */
> while (nr_slab >= lru_pages) {
> @@ -2240,7 +2261,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>
> reclaim_state.reclaimed_slab = 0;
> shrink_slab(sc.nr_scanned, sc.gfp_mask,
> - global_lru_pages());
> + global_reclaimable_pages());
> sc.nr_reclaimed += reclaim_state.reclaimed_slab;
> if (sc.nr_reclaimed >= nr_pages)
> goto out;
> @@ -2257,7 +2278,8 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
> if (!sc.nr_reclaimed) {
> do {
> reclaim_state.reclaimed_slab = 0;
> - shrink_slab(nr_pages, sc.gfp_mask, global_lru_pages());
> + shrink_slab(nr_pages, sc.gfp_mask,
> + global_reclaimable_pages());
> sc.nr_reclaimed += reclaim_state.reclaimed_slab;
> } while (sc.nr_reclaimed < nr_pages &&
> reclaim_state.reclaimed_slab > 0);


--
Kind Regards
Minchan Kim


fengguang.wu at intel

Jun 29, 2009, 12:34 AM

Post #22 of 65 (689 views)
Permalink
Re: Found the commit that causes the OOMs [In reply to]

On Mon, Jun 29, 2009 at 08:17:41AM +0800, Minchan Kim wrote:
> On Sun, 28 Jun 2009 23:10:26 +0800
> Wu Fengguang <fengguang.wu [at] intel> wrote:
> > +unsigned long global_reclaimable_pages(void)
> > {
> > - return global_page_state(NR_ACTIVE_ANON)
> > - + global_page_state(NR_ACTIVE_FILE)
> > - + global_page_state(NR_INACTIVE_ANON)
> > - + global_page_state(NR_INACTIVE_FILE);
> > + int nr;
> > +
> > + nr = global_page_state(zone, NR_ACTIVE_FILE) +
> > + global_page_state(zone, NR_INACTIVE_FILE);
> > +
> > + if (total_swap_pages)
>
>
> Don't we have to change total_swap_pages to nr_swap_pages here, too?

Yes, good catch! (sorry I was in a hurry at the time..)

Thanks,
Fengguang

---

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 416f748..8d8aa20 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -166,15 +166,8 @@ static inline unsigned long zone_page_state(struct zone *zone,
return x;
}

-extern unsigned long global_lru_pages(void);
-
-static inline unsigned long zone_lru_pages(struct zone *zone)
-{
- return (zone_page_state(zone, NR_ACTIVE_ANON)
- + zone_page_state(zone, NR_ACTIVE_FILE)
- + zone_page_state(zone, NR_INACTIVE_ANON)
- + zone_page_state(zone, NR_INACTIVE_FILE));
-}
+extern unsigned long global_reclaimable_pages(void);
+extern unsigned long zone_reclaimable_pages(void);

#ifdef CONFIG_NUMA
/*
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index a91b870..74c3067 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -394,7 +394,8 @@ static unsigned long highmem_dirtyable_memory(unsigned long total)
struct zone *z =
&NODE_DATA(node)->node_zones[ZONE_HIGHMEM];

- x += zone_page_state(z, NR_FREE_PAGES) + zone_lru_pages(z);
+ x += zone_page_state(z, NR_FREE_PAGES) +
+ zone_reclaimable_pages(z);
}
/*
* Make sure that the number of highmem pages is never larger
@@ -418,7 +419,7 @@ unsigned long determine_dirtyable_memory(void)
{
unsigned long x;

- x = global_page_state(NR_FREE_PAGES) + global_lru_pages();
+ x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();

if (!vm_highmem_is_dirtyable)
x -= highmem_dirtyable_memory(x);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 026f452..09976da 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1693,7 +1693,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
continue;

- lru_pages += zone_lru_pages(zone);
+ lru_pages += zone_reclaimable_pages(zone);
}
}

@@ -1910,7 +1910,7 @@ loop_again:
for (i = 0; i <= end_zone; i++) {
struct zone *zone = pgdat->node_zones + i;

- lru_pages += zone_lru_pages(zone);
+ lru_pages += zone_reclaimable_pages(zone);
}

/*
@@ -1954,7 +1954,7 @@ loop_again:
if (zone_is_all_unreclaimable(zone))
continue;
if (nr_slab == 0 && zone->pages_scanned >=
- (zone_lru_pages(zone) * 6))
+ (zone_reclaimable_pages(zone) * 6))
zone_set_flag(zone,
ZONE_ALL_UNRECLAIMABLE);
/*
@@ -2121,12 +2121,33 @@ void wakeup_kswapd(struct zone *zone, int order)
wake_up_interruptible(&pgdat->kswapd_wait);
}

-unsigned long global_lru_pages(void)
+unsigned long global_reclaimable_pages(void)
{
- return global_page_state(NR_ACTIVE_ANON)
- + global_page_state(NR_ACTIVE_FILE)
- + global_page_state(NR_INACTIVE_ANON)
- + global_page_state(NR_INACTIVE_FILE);
+ int nr;
+
+ nr = global_page_state(zone, NR_ACTIVE_FILE) +
+ global_page_state(zone, NR_INACTIVE_FILE);
+
+ if (nr_swap_pages > 0)
+ nr += global_page_state(zone, NR_ACTIVE_ANON) +
+ global_page_state(zone, NR_INACTIVE_ANON);
+
+ return nr;
+}
+
+
+unsigned long zone_reclaimable_pages(struct zone *zone)
+{
+ int nr;
+
+ nr = zone_page_state(zone, NR_ACTIVE_FILE) +
+ zone_page_state(zone, NR_INACTIVE_FILE);
+
+ if (nr_swap_pages > 0)
+ nr += zone_page_state(zone, NR_ACTIVE_ANON) +
+ zone_page_state(zone, NR_INACTIVE_ANON);
+
+ return nr;
}

#ifdef CONFIG_HIBERNATION
@@ -2198,7 +2219,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)

current->reclaim_state = &reclaim_state;

- lru_pages = global_lru_pages();
+ lru_pages = global_reclaimable_pages();
nr_slab = global_page_state(NR_SLAB_RECLAIMABLE);
/* If slab caches are huge, it's better to hit them first */
while (nr_slab >= lru_pages) {
@@ -2240,7 +2261,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)

reclaim_state.reclaimed_slab = 0;
shrink_slab(sc.nr_scanned, sc.gfp_mask,
- global_lru_pages());
+ global_reclaimable_pages());
sc.nr_reclaimed += reclaim_state.reclaimed_slab;
if (sc.nr_reclaimed >= nr_pages)
goto out;
@@ -2257,7 +2278,8 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
if (!sc.nr_reclaimed) {
do {
reclaim_state.reclaimed_slab = 0;
- shrink_slab(nr_pages, sc.gfp_mask, global_lru_pages());
+ shrink_slab(nr_pages, sc.gfp_mask,
+ global_reclaimable_pages());
sc.nr_reclaimed += reclaim_state.reclaimed_slab;
} while (sc.nr_reclaimed < nr_pages &&
reclaim_state.reclaimed_slab > 0);
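
The patch above changes the lru_pages value that is handed to shrink_slab(). As a rough illustration of why that matters, here is a minimal userspace sketch of the proportional pressure calculation, assuming the shrinker arithmetic of kernels from this period (scan target scaled by scanned / lru_pages). The function name slab_scan_delta() and its parameters are hypothetical and used only for this example; the kernel's handling of scanned == 0 and its batching are left out.

```c
#include <assert.h>

/*
 * Hypothetical helper, not a kernel function: approximates how many slab
 * objects one shrinker would be asked to scan for a given amount of LRU
 * scanning.
 *
 *   scanned   - LRU pages scanned in this pass (sc->nr_scanned in the thread)
 *   lru_pages - the "lru_pages" argument discussed in the thread
 *   max_pass  - number of freeable objects the shrinker reports
 *   seeks     - relative cost of recreating an object (assumed > 0)
 */
unsigned long long slab_scan_delta(unsigned long scanned,
                                   unsigned long lru_pages,
                                   unsigned long max_pass,
                                   int seeks)
{
    unsigned long long delta;

    assert(seeks > 0);

    delta = (4ULL * scanned) / (unsigned long long)seeks;
    delta *= max_pass;
    /* +1 avoids dividing by zero when no LRU pages are counted. */
    delta /= (unsigned long long)lru_pages + 1ULL;

    return delta;
}
```

Under these assumptions, two things fall out of the arithmetic: a small scanned count yields little slab pressure, and counting anon pages that cannot be reclaimed in lru_pages dilutes the pressure further, because the divisor grows while the numerator does not.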


kosaki.motohiro at jp

Jun 29, 2009, 12:48 AM

Post #23 of 65 (688 views)
Permalink
Re: Found the commit that causes the OOMs [In reply to]

2009/6/29 Minchan Kim <minchan.kim [at] gmail>:
> On Sun, Jun 28, 2009 at 11:49 PM, KOSAKI
> Motohiro<kosaki.motohiro [at] jp> wrote:
>>>> In David's OOM case, there are two symptoms:
>>>> 1) 70000 unaccounted/leaked pages as found by Andrew
>>>>   (plus rather big number of PG_buddy and pagetable pages)
>>>> 2) almost zero active_file/inactive_file; small inactive_anon;
>>>>   many slab and active_anon pages.
>>>>
>>>> In the situation of (2), the slab cache is _under_ scanned. So David
>>>> got OOM when vmscan should have squeezed some free pages from the slab
>>>> cache. Which is one important side effect of MinChan's patch?
>>>
>>> My patch's side effect is (2).
>>>
>>> My guess is as follows.
>>>
>>> 1. The scanned-page count passed to shrink_slab is incremented in
>>> shrink_page_list, and it is doubled for mapped or swapcache pages.
>>> 2. shrink_page_list is called by shrink_inactive_list.
>>> 3. shrink_inactive_list is called by shrink_list.
>>>
>>> Look at shrink_list.
>>> If the inactive lru list is low, for anon it always calls
>>> shrink_active_list, not shrink_inactive_list.
>>> That means sc->nr_scanned is not increased.
>>> Then shrink_slab can't shrink enough slab pages.
>>> So David's OOM shows a lot of slab pages and active anon pages.
>>>
>>> Does that make sense?
>>> If it does, we have to change shrink_slab's pressure method.
>>> What do you think?
>>
>> I'm confused.
>>
>> If the system has no swap, get_scan_ratio() always returns anon=0%.
>> So the number of inactive_anon pages has no effect on sc.nr_scanned.
>>
>
> My patch isn't the concern here, since the size of the anon lru lists
> (active + inactive) is always the same. I mean shrink_slab's lru_pages is
> the same with or without my patch. OOM or pass depends on sc->nr_scanned,
> I think.
>
> Here is why I think it is a side effect of my patch.
>
> Compared to the old behavior, my patch can change the balancing of the
> anon lru lists when the "swap file" is full, as Hannes already pointed out
> to me.
>
> It can affect reclaimable anon pages while David is running the swap test
> in LTP. When the swap file test ends, the pages in the swap file are
> inserted into the anon lru list again.
>
> My patch can change the physical location of anon pages in RAM compared to
> the old behavior.

No.
shrink_active_list() doesn't change a page's physical address.


> From then on, we have no swap file, so we can reclaim only file pages.
> But we have missed one thing: lumpy reclaim! (In fact, we should not
> reclaim anon pages when there is no swap space. A few days ago, I sent a
> patch about this problem: http://patchwork.kernel.org/patch/32651/)
>
> Lumpy reclaim can pick up anon pages even though we have no swap file.
> In the end, shrink_page_list can't reclaim those anon pages, but it
> still increases sc->nr_scanned.
>
> So I think whether shrink_slab can reclaim enough depends on
> sc->nr_scanned.
>
> David's problem is very subtle.
>
> 1. If lumpy reclaim picks up anon pages, the LTP run can pass, since
> sc->nr_scanned is increased.
> 2. If lumpy reclaim doesn't pick up anon pages, it can hit OOM, since
> sc->nr_scanned is almost zero or very small.
>
> Unfortunately, my patch seems to change the physical location of pages
> in RAM compared to the old behavior, so that case 2 is selected.
>
> This is just my speculation.
>
> Okay. I believe Wu's patch will solve David's problem.
> David, could you test with Wu's patch?

However, lumpy reclaim is a good viewpoint.
KAMEZAWA-san recently fixed a serious lumpy reclaim problem: since
2.6.28, lumpy reclaim can insert file-mapped pages into the anon lru list.
Such a page then becomes unreclaimable.

David, can you please try the following patch? It was posted to LKML
about 1-2 weeks ago.

Subject "[BUGFIX][PATCH] fix lumpy reclaim lru handiling at
isolate_lru_pages v2"


minchan.kim at gmail

Jun 29, 2009, 2:32 AM

Post #24 of 65 (685 views)
Permalink
Re: Found the commit that causes the OOMs [In reply to]

On Mon, 29 Jun 2009 16:48:13 +0900
KOSAKI Motohiro <kosaki.motohiro [at] jp> wrote:

> 2009/6/29 Minchan Kim <minchan.kim [at] gmail>:
> > On Sun, Jun 28, 2009 at 11:49 PM, KOSAKI
> > Motohiro<kosaki.motohiro [at] jp> wrote:
> >>>> In David's OOM case, there are two symptoms:
> >>>> 1) 70000 unaccounted/leaked pages as found by Andrew
> >>>>   (plus rather big number of PG_buddy and pagetable pages)
> >>>> 2) almost zero active_file/inactive_file; small inactive_anon;
> >>>>   many slab and active_anon pages.
> >>>>
> >>>> In the situation of (2), the slab cache is _under_ scanned. So David
> >>>> got OOM when vmscan should have squeezed some free pages from the slab
> >>>> cache. Which is one important side effect of MinChan's patch?
> >>>
> >>> My patch's side effect is (2).
> >>>
> >>> My guess is as follows.
> >>>
> >>> 1. The scanned-page count passed to shrink_slab is incremented in
> >>> shrink_page_list, and it is doubled for mapped or swapcache pages.
> >>> 2. shrink_page_list is called by shrink_inactive_list.
> >>> 3. shrink_inactive_list is called by shrink_list.
> >>>
> >>> Look at shrink_list.
> >>> If the inactive lru list is low, for anon it always calls
> >>> shrink_active_list, not shrink_inactive_list.
> >>> That means sc->nr_scanned is not increased.
> >>> Then shrink_slab can't shrink enough slab pages.
> >>> So David's OOM shows a lot of slab pages and active anon pages.
> >>>
> >>> Does that make sense?
> >>> If it does, we have to change shrink_slab's pressure method.
> >>> What do you think?
> >>
> >> I'm confused.
> >>
> >> If the system has no swap, get_scan_ratio() always returns anon=0%.
> >> So the number of inactive_anon pages has no effect on sc.nr_scanned.
> >>
> >
> > My patch isn't the concern here, since the size of the anon lru lists
> > (active + inactive) is always the same. I mean shrink_slab's lru_pages is
> > the same with or without my patch. OOM or pass depends on sc->nr_scanned,
> > I think.
> >
> > Here is why I think it is a side effect of my patch.
> >
> > Compared to the old behavior, my patch can change the balancing of the
> > anon lru lists when the "swap file" is full, as Hannes already pointed out
> > to me.
> >
> > It can affect reclaimable anon pages while David is running the swap test
> > in LTP. When the swap file test ends, the pages in the swap file are
> > inserted into the anon lru list again.
> >
> > My patch can change the physical location of anon pages in RAM compared to
> > the old behavior.
>
> No.
> shrink_active_list() doesn't change a page's physical address.

Sorry for the misunderstanding.
What I mean is the following.

1. David runs the swapfile test in LTP.
2. While it is running, the swap file becomes full.
(My patch didn't consider this case: it doesn't age anon pages then,
so my patch can change the pattern of which pages get swapped out.)
3. The swapfile test ends successfully.
4. The anon pages in the swap file are reloaded into DRAM from the HDD or whatever swap device is in use.

In 4), when anon pages are swapped in, we have to allocate new pages to copy the swapped data into.
So it could change the pages' physical location.
Then it can affect lumpy reclaim. :)

>
> > From then on, we have no swap file, so we can reclaim only file pages.
> > But we have missed one thing: lumpy reclaim! (In fact, we should not
> > reclaim anon pages when there is no swap space. A few days ago, I sent a
> > patch about this problem: http://patchwork.kernel.org/patch/32651/)
> >
> > Lumpy reclaim can pick up anon pages even though we have no swap file.
> > In the end, shrink_page_list can't reclaim those anon pages, but it
> > still increases sc->nr_scanned.
> >
> > So I think whether shrink_slab can reclaim enough depends on
> > sc->nr_scanned.
> >
> > David's problem is very subtle.
> >
> > 1. If lumpy reclaim picks up anon pages, the LTP run can pass, since
> > sc->nr_scanned is increased.
> > 2. If lumpy reclaim doesn't pick up anon pages, it can hit OOM, since
> > sc->nr_scanned is almost zero or very small.
> >
> > Unfortunately, my patch seems to change the physical location of pages
> > in RAM compared to the old behavior, so that case 2 is selected.
> >
> > This is just my speculation.
> >
> > Okay. I believe Wu's patch will solve David's problem.
> > David, could you test with Wu's patch?
>
> However, lumpy reclaim is a good viewpoint.
> KAMEZAWA-san recently fixed a serious lumpy reclaim problem: since
> 2.6.28, lumpy reclaim can insert file-mapped pages into the anon lru list.
> Such a page then becomes unreclaimable.

Yes. That is another possibility.
But I wonder why it didn't happen without my patch.
My question is: why does my patch trigger the OOM with such high probability?

> David, can you please try the following patch? It was posted to LKML
> about 1-2 weeks ago.
>
> Subject "[BUGFIX][PATCH] fix lumpy reclaim lru handiling at
> isolate_lru_pages v2"


--
Kind Regards
Minchan Kim


dhowells at redhat

Jun 29, 2009, 3:10 AM

Post #25 of 65 (685 views)
Permalink
Re: Found the commit that causes the OOMs [In reply to]

Wu Fengguang <fengguang.wu [at] intel> wrote:

> Yes, good catch! (sorry I was in a hurry at the time..)

That doesn't compile:

mm/vmscan.c: In function 'do_try_to_free_pages':
mm/vmscan.c:1683: error: too many arguments to function 'zone_reclaimable_pages'
mm/vmscan.c: In function 'balance_pgdat':
mm/vmscan.c:1900: error: too many arguments to function 'zone_reclaimable_pages'
mm/vmscan.c:1944: error: too many arguments to function 'zone_reclaimable_pages'
mm/vmscan.c: In function 'global_reclaimable_pages':
mm/vmscan.c:2115: error: 'zone' undeclared (first use in this function)
mm/vmscan.c:2115: error: (Each undeclared identifier is reported only once
mm/vmscan.c:2115: error: for each function it appears in.)
mm/vmscan.c:2115: error: too many arguments to function 'global_page_state'
mm/vmscan.c:2116: error: too many arguments to function 'global_page_state'
mm/vmscan.c:2119: error: too many arguments to function 'global_page_state'
mm/vmscan.c:2120: error: too many arguments to function 'global_page_state'
mm/vmscan.c: At top level:
mm/vmscan.c:2126: error: conflicting types for 'zone_reclaimable_pages'
include/linux/vmstat.h:170: note: previous declaration of 'zone_reclaimable_pages' was here
make[1]: *** [mm/vmscan.o] Error 1

David
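
The errors above come down to two mismatches in the posted patch: the header declares zone_reclaimable_pages(void) while the callers and the definition pass a struct zone *, and global_reclaimable_pages() hands a zone argument to global_page_state(), which takes only the counter item. A minimal userspace sketch of consistent signatures follows. The struct zone layout, the counter arrays and the stubbed zone_page_state()/global_page_state() are stand-ins invented for illustration, not the kernel's definitions, and this is not necessarily the form that was later merged. The sketch also uses unsigned long for nr where the posted patch used int, to match the return type.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in LRU counters; the real kernel enum has many more items. */
enum zone_stat_item {
    NR_INACTIVE_ANON,
    NR_ACTIVE_ANON,
    NR_INACTIVE_FILE,
    NR_ACTIVE_FILE,
    NR_STAT_ITEMS
};

/* Hypothetical, simplified zone: only per-zone counters. */
struct zone {
    unsigned long vm_stat[NR_STAT_ITEMS];
};

/* Mock globals standing in for the kernel's global counters. */
static unsigned long vm_stat[NR_STAT_ITEMS];
static long nr_swap_pages;  /* free swap pages, not the total swap size */

static unsigned long zone_page_state(struct zone *zone,
                                     enum zone_stat_item item)
{
    assert(zone != NULL);
    return zone->vm_stat[item];
}

/* Takes only the item, so passing "zone" as well cannot compile. */
static unsigned long global_page_state(enum zone_stat_item item)
{
    return vm_stat[item];
}

/* Prototypes that agree with both the callers and the definitions. */
unsigned long global_reclaimable_pages(void);
unsigned long zone_reclaimable_pages(struct zone *zone);

unsigned long global_reclaimable_pages(void)
{
    unsigned long nr;

    nr = global_page_state(NR_ACTIVE_FILE) +
         global_page_state(NR_INACTIVE_FILE);

    /* Anon pages count as reclaimable only while swap has free space. */
    if (nr_swap_pages > 0)
        nr += global_page_state(NR_ACTIVE_ANON) +
              global_page_state(NR_INACTIVE_ANON);

    return nr;
}

unsigned long zone_reclaimable_pages(struct zone *zone)
{
    unsigned long nr;

    nr = zone_page_state(zone, NR_ACTIVE_FILE) +
         zone_page_state(zone, NR_INACTIVE_FILE);

    if (nr_swap_pages > 0)
        nr += zone_page_state(zone, NR_ACTIVE_ANON) +
              zone_page_state(zone, NR_INACTIVE_ANON);

    return nr;
}
```

With these stubs, a zone holding 30 + 40 file pages and 10 + 20 anon pages reports 70 reclaimable pages when nr_swap_pages is 0 and 100 once swap has free space, which is the behavior the nr_swap_pages check is meant to give.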
