Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[early RFC][PATCH 8/7] vmscan: Don't deactivate many touched page

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


kosaki.motohiro at jp

Dec 7, 2009, 3:36 AM

Post #1 of 3 (97 views)
Permalink
[early RFC][PATCH 8/7] vmscan: Don't deactivate many touched page

Andrea, Can you please try following patch on your workload?


From a7758c66d36a136d5fbbcf0b042839445f0ca522 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro [at] jp>
Date: Mon, 7 Dec 2009 18:37:20 +0900
Subject: [PATCH] [RFC] vmscan: Don't deactivate many touched page

Changelog
o from andrea's original patch
- Rebase topon my patches.
- Use list_cut_position/list_splice_tail pair instead
list_del/list_add to make pte scan fairness.
- Only use max young threshold when soft_try is true.
It avoid wrong OOM sideeffect.
- Return SWAP_AGAIN instead successful result if max
young threshold exceed. It prevent the pages without clear
pte young bit will be deactivated wrongly.
- Add to treat ksm page logic

Many shared and frequently used page don't need deactivate and
try_to_unamp(). It's pointless while VM pressure is low, the page
might reactivate soon. it's only makes cpu wasting.

Then, This patch makes to stop pte scan if wipe_page_reference()
found lots young pte bit.

Originally-Signed-off-by: Andrea Arcangeli <aarcange [at] redhat>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro [at] jp>
---
include/linux/rmap.h | 17 +++++++++++++++++
mm/ksm.c | 4 ++++
mm/rmap.c | 19 +++++++++++++++++++
3 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 499972e..9ad69b5 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -128,6 +128,23 @@ int wipe_page_reference_one(struct page *page,
struct page_reference_context *refctx,
struct vm_area_struct *vma, unsigned long address);

+#define MAX_YOUNG_BIT_CLEARED 64
+/*
+ * if VM pressure is low and the page have too many active mappings, there isn't
+ * any reason to continue clear young bit of other ptes. Otherwise,
+ * - Makes meaningless cpu wasting, many touched page sholdn't be reclaimed.
+ * - Makes lots IPI for pte change and it might cause another sadly lock
+ * contention.
+ */
+static inline
+int too_many_young_bit_found(struct page_reference_context *refctx)
+{
+ if (refctx->soft_try &&
+ refctx->referenced >= MAX_YOUNG_BIT_CLEARED)
+ return 1;
+ return 0;
+}
+
enum ttu_flags {
TTU_UNMAP = 0, /* unmap mode */
TTU_MIGRATION = 1, /* migration mode */
diff --git a/mm/ksm.c b/mm/ksm.c
index 3c121c8..46ea519 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1586,6 +1586,10 @@ again:
rmap_item->address);
if (ret != SWAP_SUCCESS)
goto out;
+ if (too_many_young_bit_found(refctx)) {
+ ret = SWAP_AGAIN;
+ goto out;
+ }
mapcount--;
if (!search_new_forks || !mapcount)
break;
diff --git a/mm/rmap.c b/mm/rmap.c
index cfda0a0..f4517f3 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -473,6 +473,21 @@ static int wipe_page_reference_anon(struct page *page,
ret = wipe_page_reference_one(page, refctx, vma, address);
if (ret != SWAP_SUCCESS)
break;
+ if (too_many_young_bit_found(refctx)) {
+ LIST_HEAD(tmp_list);
+
+ /*
+ * The scanned ptes move to list tail. it help every ptes
+ * on this page will be tested by ptep_clear_young().
+ * Otherwise, this shortcut makes unfair thing.
+ */
+ list_cut_position(&tmp_list,
+ &vma->anon_vma_node,
+ &anon_vma->head);
+ list_splice_tail(&tmp_list, &vma->anon_vma_node);
+ ret = SWAP_AGAIN;
+ break;
+ }
mapcount--;
if (!mapcount || refctx->maybe_mlocked)
break;
@@ -543,6 +558,10 @@ static int wipe_page_reference_file(struct page *page,
ret = wipe_page_reference_one(page, refctx, vma, address);
if (ret != SWAP_SUCCESS)
break;
+ if (too_many_young_bit_found(refctx)) {
+ ret = SWAP_AGAIN;
+ break;
+ }
mapcount--;
if (!mapcount || refctx->maybe_mlocked)
break;
--
1.6.5.2



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


riel at redhat

Dec 7, 2009, 10:10 AM

Post #2 of 3 (82 views)
Permalink
Re: [early RFC][PATCH 8/7] vmscan: Don't deactivate many touched page [In reply to]

On 12/07/2009 06:36 AM, KOSAKI Motohiro wrote:
>
> Andrea, Can you please try following patch on your workload?
>
>
> From a7758c66d36a136d5fbbcf0b042839445f0ca522 Mon Sep 17 00:00:00 2001
> From: KOSAKI Motohiro<kosaki.motohiro [at] jp>
> Date: Mon, 7 Dec 2009 18:37:20 +0900
> Subject: [PATCH] [RFC] vmscan: Don't deactivate many touched page
>
> Changelog
> o from andrea's original patch
> - Rebase topon my patches.
> - Use list_cut_position/list_splice_tail pair instead
> list_del/list_add to make pte scan fairness.
> - Only use max young threshold when soft_try is true.
> It avoid wrong OOM sideeffect.
> - Return SWAP_AGAIN instead successful result if max
> young threshold exceed. It prevent the pages without clear
> pte young bit will be deactivated wrongly.
> - Add to treat ksm page logic

I like the concept and your changes, and really only
have a few small nitpicks :)

First, the VM uses a mix of "referenced", "accessed" and
"young". We should probably avoid adding "active" to that
mix, and may even want to think about moving to just one
or two terms :)

> +#define MAX_YOUNG_BIT_CLEARED 64
> +/*
> + * if VM pressure is low and the page have too many active mappings, there isn't
> + * any reason to continue clear young bit of other ptes. Otherwise,
> + * - Makes meaningless cpu wasting, many touched page sholdn't be reclaimed.
> + * - Makes lots IPI for pte change and it might cause another sadly lock
> + * contention.
> + */

If VM pressure is low and the page has lots of active users, we only
clear up to MAX_YOUNG_BIT_CLEARED accessed bits at a time. Clearing
accessed bits takes CPU time, needs TLB invalidate IPIs and could
cause lock contention. Since a heavily shared page is very likely
to be used again soon, the cost outweighs the benefit of making such
a heavily shared page a candidate for eviction.

> diff --git a/mm/rmap.c b/mm/rmap.c
> index cfda0a0..f4517f3 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -473,6 +473,21 @@ static int wipe_page_reference_anon(struct page *page,
> ret = wipe_page_reference_one(page, refctx, vma, address);
> if (ret != SWAP_SUCCESS)
> break;
> + if (too_many_young_bit_found(refctx)) {
> + LIST_HEAD(tmp_list);
> +
> + /*
> + * The scanned ptes move to list tail. it help every ptes
> + * on this page will be tested by ptep_clear_young().
> + * Otherwise, this shortcut makes unfair thing.
> + */
> + list_cut_position(&tmp_list,
> + &vma->anon_vma_node,
> + &anon_vma->head);
> + list_splice_tail(&tmp_list,&vma->anon_vma_node);
> + ret = SWAP_AGAIN;
> + break;
> + }

I do not understand the unfairness here, since all a page needs
to stay on the active list is >64 referenced PTEs. It does not
matter which of the PTEs mapping the page were recently referenced.

However, rotating the anon vmas around may help spread out lock
pressure in the VM and help things that way, so the code looks
useful to me.

In short, you can give the next version of this patch my

Reviewed-by: Rik van Riel <riel [at] redhat>

All I have are comment nitpicks :)

--
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


kosaki.motohiro at jp

Dec 7, 2009, 10:27 PM

Post #3 of 3 (84 views)
Permalink
Re: [early RFC][PATCH 8/7] vmscan: Don't deactivate many touched page [In reply to]

> On 12/07/2009 06:36 AM, KOSAKI Motohiro wrote:
> >
> > Andrea, Can you please try following patch on your workload?
> >
> >
> > From a7758c66d36a136d5fbbcf0b042839445f0ca522 Mon Sep 17 00:00:00 2001
> > From: KOSAKI Motohiro<kosaki.motohiro [at] jp>
> > Date: Mon, 7 Dec 2009 18:37:20 +0900
> > Subject: [PATCH] [RFC] vmscan: Don't deactivate many touched page
> >
> > Changelog
> > o from andrea's original patch
> > - Rebase topon my patches.
> > - Use list_cut_position/list_splice_tail pair instead
> > list_del/list_add to make pte scan fairness.
> > - Only use max young threshold when soft_try is true.
> > It avoid wrong OOM sideeffect.
> > - Return SWAP_AGAIN instead successful result if max
> > young threshold exceed. It prevent the pages without clear
> > pte young bit will be deactivated wrongly.
> > - Add to treat ksm page logic
>
> I like the concept and your changes, and really only
> have a few small nitpicks :)
>
> First, the VM uses a mix of "referenced", "accessed" and
> "young". We should probably avoid adding "active" to that
> mix, and may even want to think about moving to just one
> or two terms :)

Ah yes, certainly.


> > +#define MAX_YOUNG_BIT_CLEARED 64
> > +/*
> > + * if VM pressure is low and the page have too many active mappings, there isn't
> > + * any reason to continue clear young bit of other ptes. Otherwise,
> > + * - Makes meaningless cpu wasting, many touched page sholdn't be reclaimed.
> > + * - Makes lots IPI for pte change and it might cause another sadly lock
> > + * contention.
> > + */
>
> If VM pressure is low and the page has lots of active users, we only
> clear up to MAX_YOUNG_BIT_CLEARED accessed bits at a time. Clearing
> accessed bits takes CPU time, needs TLB invalidate IPIs and could
> cause lock contention. Since a heavily shared page is very likely
> to be used again soon, the cost outweighs the benefit of making such
> a heavily shared page a candidate for eviction.

Thanks. Will fix.


> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index cfda0a0..f4517f3 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -473,6 +473,21 @@ static int wipe_page_reference_anon(struct page *page,
> > ret = wipe_page_reference_one(page, refctx, vma, address);
> > if (ret != SWAP_SUCCESS)
> > break;
> > + if (too_many_young_bit_found(refctx)) {
> > + LIST_HEAD(tmp_list);
> > +
> > + /*
> > + * The scanned ptes move to list tail. it help every ptes
> > + * on this page will be tested by ptep_clear_young().
> > + * Otherwise, this shortcut makes unfair thing.
> > + */
> > + list_cut_position(&tmp_list,
> > + &vma->anon_vma_node,
> > + &anon_vma->head);
> > + list_splice_tail(&tmp_list,&vma->anon_vma_node);
> > + ret = SWAP_AGAIN;
> > + break;
> > + }
>
> I do not understand the unfairness here, since all a page needs
> to stay on the active list is >64 referenced PTEs. It does not
> matter which of the PTEs mapping the page were recently referenced.
>
> However, rotating the anon vmas around may help spread out lock
> pressure in the VM and help things that way, so the code looks
> useful to me.

agreed. I have to rewrite the comment.


> In short, you can give the next version of this patch my
>
> Reviewed-by: Rik van Riel <riel [at] redhat>
>
> All I have are comment nitpicks :)

No. It's really worth.

Thank you.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.