Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[PATCH 1/9] ksm: fix mlockfreed to munlocked

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


hugh.dickins at tiscali

Nov 24, 2009, 8:40 AM

Post #1 of 8 (229 views)
Permalink
[PATCH 1/9] ksm: fix mlockfreed to munlocked

When KSM merges an mlocked page, it has been forgetting to munlock it:
that's been left to free_page_mlock(), which reports it in /proc/vmstat
as unevictable_pgs_mlockfreed instead of unevictable_pgs_munlocked (and
whinges "Page flag mlocked set for process" in mmotm, whereas mainline
is silently forgiving). Call munlock_vma_page() to fix that.

Signed-off-by: Hugh Dickins <hugh.dickins [at] tiscali>
---
Is this a fix that I ought to backport to 2.6.32? It does rely on part of
an earlier patch (moved unlock_page down), so does not apply cleanly as is.

mm/internal.h | 3 ++-
mm/ksm.c | 4 ++++
mm/mlock.c | 4 ++--
3 files changed, 8 insertions(+), 3 deletions(-)

--- ksm0/mm/internal.h 2009-11-14 10:17:02.000000000 +0000
+++ ksm1/mm/internal.h 2009-11-22 20:39:56.000000000 +0000
@@ -105,9 +105,10 @@ static inline int is_mlocked_vma(struct
}

/*
- * must be called with vma's mmap_sem held for read, and page locked.
+ * must be called with vma's mmap_sem held for read or write, and page locked.
*/
extern void mlock_vma_page(struct page *page);
+extern void munlock_vma_page(struct page *page);

/*
* Clear the page's PageMlocked(). This can be useful in a situation where
--- ksm0/mm/ksm.c 2009-11-14 10:17:02.000000000 +0000
+++ ksm1/mm/ksm.c 2009-11-22 20:39:56.000000000 +0000
@@ -34,6 +34,7 @@
#include <linux/ksm.h>

#include <asm/tlbflush.h>
+#include "internal.h"

/*
* A few notes about the KSM scanning process,
@@ -762,6 +763,9 @@ static int try_to_merge_one_page(struct
pages_identical(page, kpage))
err = replace_page(vma, page, kpage, orig_pte);

+ if ((vma->vm_flags & VM_LOCKED) && !err)
+ munlock_vma_page(page);
+
unlock_page(page);
out:
return err;
--- ksm0/mm/mlock.c 2009-11-14 10:17:02.000000000 +0000
+++ ksm1/mm/mlock.c 2009-11-22 20:39:56.000000000 +0000
@@ -99,14 +99,14 @@ void mlock_vma_page(struct page *page)
* not get another chance to clear PageMlocked. If we successfully
* isolate the page and try_to_munlock() detects other VM_LOCKED vmas
* mapping the page, it will restore the PageMlocked state, unless the page
- * is mapped in a non-linear vma. So, we go ahead and SetPageMlocked(),
+ * is mapped in a non-linear vma. So, we go ahead and ClearPageMlocked(),
* perhaps redundantly.
* If we lose the isolation race, and the page is mapped by other VM_LOCKED
* vmas, we'll detect this in vmscan--via try_to_munlock() or try_to_unmap()
* either of which will restore the PageMlocked state by calling
* mlock_vma_page() above, if it can grab the vma's mmap sem.
*/
-static void munlock_vma_page(struct page *page)
+void munlock_vma_page(struct page *page)
{
BUG_ON(!PageLocked(page));

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


riel at redhat

Nov 24, 2009, 3:53 PM

Post #2 of 8 (214 views)
Permalink
Re: [PATCH 1/9] ksm: fix mlockfreed to munlocked [In reply to]

On 11/24/2009 11:40 AM, Hugh Dickins wrote:
> When KSM merges an mlocked page, it has been forgetting to munlock it:
> that's been left to free_page_mlock(), which reports it in /proc/vmstat
> as unevictable_pgs_mlockfreed instead of unevictable_pgs_munlocked (and
> whinges "Page flag mlocked set for process" in mmotm, whereas mainline
> is silently forgiving). Call munlock_vma_page() to fix that.
>
> Signed-off-by: Hugh Dickins<hugh.dickins [at] tiscali>
>
>
Acked-by: Rik van Riel <riel [at] redhat>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


mel at csn

Nov 26, 2009, 8:20 AM

Post #3 of 8 (205 views)
Permalink
Re: [PATCH 1/9] ksm: fix mlockfreed to munlocked [In reply to]

On Tue, Nov 24, 2009 at 04:40:55PM +0000, Hugh Dickins wrote:
> When KSM merges an mlocked page, it has been forgetting to munlock it:
> that's been left to free_page_mlock(), which reports it in /proc/vmstat
> as unevictable_pgs_mlockfreed instead of unevictable_pgs_munlocked (and
> whinges "Page flag mlocked set for process" in mmotm, whereas mainline
> is silently forgiving). Call munlock_vma_page() to fix that.
>
> Signed-off-by: Hugh Dickins <hugh.dickins [at] tiscali>

Acked-by: Mel Gorman <mel [at] csn>

> ---
> Is this a fix that I ought to backport to 2.6.32? It does rely on part of
> an earlier patch (moved unlock_page down), so does not apply cleanly as is.
>
> mm/internal.h | 3 ++-
> mm/ksm.c | 4 ++++
> mm/mlock.c | 4 ++--
> 3 files changed, 8 insertions(+), 3 deletions(-)
>
> --- ksm0/mm/internal.h 2009-11-14 10:17:02.000000000 +0000
> +++ ksm1/mm/internal.h 2009-11-22 20:39:56.000000000 +0000
> @@ -105,9 +105,10 @@ static inline int is_mlocked_vma(struct
> }
>
> /*
> - * must be called with vma's mmap_sem held for read, and page locked.
> + * must be called with vma's mmap_sem held for read or write, and page locked.
> */
> extern void mlock_vma_page(struct page *page);
> +extern void munlock_vma_page(struct page *page);
>
> /*
> * Clear the page's PageMlocked(). This can be useful in a situation where
> --- ksm0/mm/ksm.c 2009-11-14 10:17:02.000000000 +0000
> +++ ksm1/mm/ksm.c 2009-11-22 20:39:56.000000000 +0000
> @@ -34,6 +34,7 @@
> #include <linux/ksm.h>
>
> #include <asm/tlbflush.h>
> +#include "internal.h"
>
> /*
> * A few notes about the KSM scanning process,
> @@ -762,6 +763,9 @@ static int try_to_merge_one_page(struct
> pages_identical(page, kpage))
> err = replace_page(vma, page, kpage, orig_pte);
>
> + if ((vma->vm_flags & VM_LOCKED) && !err)
> + munlock_vma_page(page);
> +
> unlock_page(page);
> out:
> return err;
> --- ksm0/mm/mlock.c 2009-11-14 10:17:02.000000000 +0000
> +++ ksm1/mm/mlock.c 2009-11-22 20:39:56.000000000 +0000
> @@ -99,14 +99,14 @@ void mlock_vma_page(struct page *page)
> * not get another chance to clear PageMlocked. If we successfully
> * isolate the page and try_to_munlock() detects other VM_LOCKED vmas
> * mapping the page, it will restore the PageMlocked state, unless the page
> - * is mapped in a non-linear vma. So, we go ahead and SetPageMlocked(),
> + * is mapped in a non-linear vma. So, we go ahead and ClearPageMlocked(),
> * perhaps redundantly.
> * If we lose the isolation race, and the page is mapped by other VM_LOCKED
> * vmas, we'll detect this in vmscan--via try_to_munlock() or try_to_unmap()
> * either of which will restore the PageMlocked state by calling
> * mlock_vma_page() above, if it can grab the vma's mmap sem.
> */
> -static void munlock_vma_page(struct page *page)
> +void munlock_vma_page(struct page *page)
> {
> BUG_ON(!PageLocked(page));
>
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hugh.dickins at tiscali

Nov 27, 2009, 4:45 AM

Post #4 of 8 (202 views)
Permalink
Re: [PATCH 1/9] ksm: fix mlockfreed to munlocked [In reply to]

On Thu, 26 Nov 2009, Mel Gorman wrote:
> On Tue, Nov 24, 2009 at 04:40:55PM +0000, Hugh Dickins wrote:
> > When KSM merges an mlocked page, it has been forgetting to munlock it:
> > that's been left to free_page_mlock(), which reports it in /proc/vmstat
> > as unevictable_pgs_mlockfreed instead of unevictable_pgs_munlocked (and
> > whinges "Page flag mlocked set for process" in mmotm, whereas mainline
> > is silently forgiving). Call munlock_vma_page() to fix that.
> >
> > Signed-off-by: Hugh Dickins <hugh.dickins [at] tiscali>
>
> Acked-by: Mel Gorman <mel [at] csn>

Rik & Mel, thanks for the Acks.

But please clarify: that patch was for mmotm and hopefully 2.6.33,
but the vmstat issue (minus warning message) is there in 2.6.32-rc.
Should I

(a) forget it for 2.6.32
(b) rush Linus a patch for 2.6.32 final
(c) send a patch for 2.6.32.stable later on

? I just don't have a feel for how important this is.

Typically, these pages are immediately freed, and the only issue is
which stats they get added to; but if fork has copied them into other
mms, then such pages might stay unevictable indefinitely, despite no
longer being in any mlocked vma.

There's a remark in munlock_vma_page(), apropos a different issue,
/*
* We lost the race. let try_to_unmap() deal
* with it. At least we get the page state and
* mlock stats right. However, page is still on
* the noreclaim list. We'll fix that up when
* the page is eventually freed or we scan the
* noreclaim list.
*/
which implies that sometimes we scan the unevictable list and resolve
such cases. But I wonder if that's nowadays the case?

>
> > ---
> > Is this a fix that I ought to backport to 2.6.32? It does rely on part of
> > an earlier patch (moved unlock_page down), so does not apply cleanly as is.

Thanks,
Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


kosaki.motohiro at jp

Nov 29, 2009, 10:01 PM

Post #5 of 8 (199 views)
Permalink
Re: [PATCH 1/9] ksm: fix mlockfreed to munlocked [In reply to]

> On Thu, 26 Nov 2009, Mel Gorman wrote:
> > On Tue, Nov 24, 2009 at 04:40:55PM +0000, Hugh Dickins wrote:
> > > When KSM merges an mlocked page, it has been forgetting to munlock it:
> > > that's been left to free_page_mlock(), which reports it in /proc/vmstat
> > > as unevictable_pgs_mlockfreed instead of unevictable_pgs_munlocked (and
> > > whinges "Page flag mlocked set for process" in mmotm, whereas mainline
> > > is silently forgiving). Call munlock_vma_page() to fix that.
> > >
> > > Signed-off-by: Hugh Dickins <hugh.dickins [at] tiscali>
> >
> > Acked-by: Mel Gorman <mel [at] csn>
>
> Rik & Mel, thanks for the Acks.
>
> But please clarify: that patch was for mmotm and hopefully 2.6.33,
> but the vmstat issue (minus warning message) is there in 2.6.32-rc.
> Should I
>
> (a) forget it for 2.6.32
> (b) rush Linus a patch for 2.6.32 final
> (c) send a patch for 2.6.32.stable later on

I personally prefer (3). though I don't know ksm so detail.


>
> ? I just don't have a feel for how important this is.
>
> Typically, these pages are immediately freed, and the only issue is
> which stats they get added to; but if fork has copied them into other
> mms, then such pages might stay unevictable indefinitely, despite no
> longer being in any mlocked vma.
>
> There's a remark in munlock_vma_page(), apropos a different issue,
> /*
> * We lost the race. let try_to_unmap() deal
> * with it. At least we get the page state and
> * mlock stats right. However, page is still on
> * the noreclaim list. We'll fix that up when
> * the page is eventually freed or we scan the
> * noreclaim list.
> */
> which implies that sometimes we scan the unevictable list and resolve
> such cases. But I wonder if that's nowadays the case?

We don't scan unevictable list at all. munlock_vma_page() logic is.

1) clear PG_mlock always anyway
2) isolate page
3) scan related vma and remark PG_mlock if necessary

So, as far as I understand, the above comment describe the case when (2) is
failed. it mean another task already isolated the page. it makes the task
putback the page to evictable list and vmscan's try_to_unmap() move
the page to unevictable list again.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


hugh.dickins at tiscali

Nov 30, 2009, 4:26 AM

Post #6 of 8 (197 views)
Permalink
Re: [PATCH 1/9] ksm: fix mlockfreed to munlocked [In reply to]

On Mon, 30 Nov 2009, KOSAKI Motohiro wrote:
> >
> > But please clarify: that patch was for mmotm and hopefully 2.6.33,
> > but the vmstat issue (minus warning message) is there in 2.6.32-rc.
> > Should I
> >
> > (a) forget it for 2.6.32
> > (b) rush Linus a patch for 2.6.32 final
> > (c) send a patch for 2.6.32.stable later on
>
> I personally prefer (3). though I don't know ksm so detail.

Thanks, I think that would be my preference by now too.

> > There's a remark in munlock_vma_page(), apropos a different issue,
> > /*
> > * We lost the race. let try_to_unmap() deal
> > * with it. At least we get the page state and
> > * mlock stats right. However, page is still on
> > * the noreclaim list. We'll fix that up when
> > * the page is eventually freed or we scan the
> > * noreclaim list.
> > */
> > which implies that sometimes we scan the unevictable list and resolve
> > such cases. But I wonder if that's nowadays the case?
>
> We don't scan unevictable list at all. munlock_vma_page() logic is.
>
> 1) clear PG_mlock always anyway
> 2) isolate page
> 3) scan related vma and remark PG_mlock if necessary
>
> So, as far as I understand, the above comment describe the case when (2) is
> failed. it mean another task already isolated the page. it makes the task
> putback the page to evictable list and vmscan's try_to_unmap() move
> the page to unevictable list again.

That is the case it's addressing, yes; but both references to
"the noreclaim list" are untrue and misleading (now: they may well
have been accurate when the comment went in). I'd like to correct
it, but cannot do so without spending the time to make sure that
what I'm saying instead isn't equally misleading...

Even "We lost the race" is worrying: which race? there might be several.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


Lee.Schermerhorn at hp

Nov 30, 2009, 1:27 PM

Post #7 of 8 (190 views)
Permalink
Re: [PATCH 1/9] ksm: fix mlockfreed to munlocked [In reply to]

On Mon, 2009-11-30 at 12:26 +0000, Hugh Dickins wrote:
> On Mon, 30 Nov 2009, KOSAKI Motohiro wrote:
> > >
> > > But please clarify: that patch was for mmotm and hopefully 2.6.33,
> > > but the vmstat issue (minus warning message) is there in 2.6.32-rc.
> > > Should I
> > >
> > > (a) forget it for 2.6.32
> > > (b) rush Linus a patch for 2.6.32 final
> > > (c) send a patch for 2.6.32.stable later on
> >
> > I personally prefer (3). though I don't know ksm so detail.
>
> Thanks, I think that would be my preference by now too.
>
> > > There's a remark in munlock_vma_page(), apropos a different issue,
> > > /*
> > > * We lost the race. let try_to_unmap() deal
> > > * with it. At least we get the page state and
> > > * mlock stats right. However, page is still on
> > > * the noreclaim list. We'll fix that up when
> > > * the page is eventually freed or we scan the
> > > * noreclaim list.
> > > */
> > > which implies that sometimes we scan the unevictable list and resolve
> > > such cases. But I wonder if that's nowadays the case?
> >
> > We don't scan unevictable list at all. munlock_vma_page() logic is.
> >
> > 1) clear PG_mlock always anyway
> > 2) isolate page
> > 3) scan related vma and remark PG_mlock if necessary
> >
> > So, as far as I understand, the above comment describe the case when (2) is
> > failed. it mean another task already isolated the page. it makes the task
> > putback the page to evictable list and vmscan's try_to_unmap() move
> > the page to unevictable list again.
>
> That is the case it's addressing, yes; but both references to
> "the noreclaim list" are untrue and misleading (now: they may well
> have been accurate when the comment went in). I'd like to correct
> it, but cannot do so without spending the time to make sure that
> what I'm saying instead isn't equally misleading...
>
> Even "We lost the race" is worrying: which race? there might be several.

I agree that this is likely a stale comment. At the time I wrote it,
putback_lru_page() didn't recheck whether the page was reclaimable [now
"evictable"]. isolate_lru_page() preserves the lru state flags Active
and Unevictable; at that time putback' just put the page back to the
list indicated.

"The race" referred to the "isolation race" discussed in the comment
block on munlock_vma_page().

Had we been munlock()ing or munmap()ing the last VMA holding the page
locked, we should take it off the unevictable list. But, we need to
isolate the page to move it between lists or even to call
try_to_unlock() to check whether there are other vmas holding the page
mlocked. If we were unable to isolate the page in munlock_vma_page()
and it were "putback" by whatever was holding it [page migration
maybe?], it would go back onto the unevictable list where it would be
stranded.

Now that we recheck the page state in putback_lru_page(), this shouldn't
be an issue. We've already cleared the Mlock page flag, so that
condition won't force it onto the unevictable list.

Even the part about try_to_unmap() dealing with it is stale. Now,
vmscan detects VM_LOCKED pages in page_referenced() before it gets to
try_to_unmap(). The function comment block needs updating as well. If
no one beats me to it, I'll post a cleanup patch for consideration
shortly.

Lee


>
> Hugh
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo [at] kvack For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont [at] kvack"> email [at] kvack </a>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


mel at csn

Dec 1, 2009, 3:14 AM

Post #8 of 8 (184 views)
Permalink
Re: [PATCH 1/9] ksm: fix mlockfreed to munlocked [In reply to]

On Fri, Nov 27, 2009 at 12:45:04PM +0000, Hugh Dickins wrote:
> On Thu, 26 Nov 2009, Mel Gorman wrote:
> > On Tue, Nov 24, 2009 at 04:40:55PM +0000, Hugh Dickins wrote:
> > > When KSM merges an mlocked page, it has been forgetting to munlock it:
> > > that's been left to free_page_mlock(), which reports it in /proc/vmstat
> > > as unevictable_pgs_mlockfreed instead of unevictable_pgs_munlocked (and
> > > whinges "Page flag mlocked set for process" in mmotm, whereas mainline
> > > is silently forgiving). Call munlock_vma_page() to fix that.
> > >
> > > Signed-off-by: Hugh Dickins <hugh.dickins [at] tiscali>
> >
> > Acked-by: Mel Gorman <mel [at] csn>
>
> Rik & Mel, thanks for the Acks.
>
> But please clarify: that patch was for mmotm and hopefully 2.6.33,
> but the vmstat issue (minus warning message) is there in 2.6.32-rc.
> Should I
>
> (a) forget it for 2.6.32
> (b) rush Linus a patch for 2.6.32 final
> (c) send a patch for 2.6.32.stable later on
>
> ? I just don't have a feel for how important this is.
>

My ack was based on the view that pages should not be getting to the buddy
allocator with the mlocked bit set. It only warns in -mm because it's meant
to be harmless-if-incorrect in all cases. Based on my reading of your
patch, it looked like a reasonable way of clearing the locked bit that
deal with the same type of isolation races typically faced by reclaim.

I felt it would be a case that either the isolation failed and it would
end up back on the LRU list or it would remain on whatever unevitable
LRU list it previously existed on where it would be found there.

> Typically, these pages are immediately freed, and the only issue is
> which stats they get added to; but if fork has copied them into other
> mms, then such pages might stay unevictable indefinitely, despite no
> longer being in any mlocked vma.
>
> There's a remark in munlock_vma_page(), apropos a different issue,
> /*
> * We lost the race. let try_to_unmap() deal
> * with it. At least we get the page state and
> * mlock stats right. However, page is still on
> * the noreclaim list. We'll fix that up when
> * the page is eventually freed or we scan the
> * noreclaim list.
> */
> which implies that sometimes we scan the unevictable list and resolve
> such cases. But I wonder if that's nowadays the case?
>

My understanding was that if it failed to isolate then another process had
already done the necessary work and dropped the reference. The page would
then get properly freed at the last put_page. I did not double check this
assumption.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.