
[RFC PATCH v2 0/4] mm: reclaim zbud pages on migration and compaction

 

 



k.kozlowski at samsung

Aug 9, 2013, 3:22 AM

Post #1 of 5
[RFC PATCH v2 0/4] mm: reclaim zbud pages on migration and compaction

Hi,

Currently zbud pages are not movable and they cannot be allocated from CMA
region. These patches try to address the problem by:
1. Adding a new form of reclaim of zbud pages.
2. Reclaiming zbud pages during migration and compaction.
3. Allocating zbud pages with __GFP_RECLAIMABLE flag.

This reclaim process is different from zbud_reclaim_page(). It acts more
like swapoff() in that it tries to unuse the pages stored in a zbud page and
bring them back into memory. The standard zbud_reclaim_page(), on the other
hand, tries to write them back.
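
Roughly, the new path looks like the sketch below. All helper names here
(zbud_evacuate_page, zbud_for_each_handle, zswap_handle_to_swp_entry,
unuse_swap_entry) are placeholders to illustrate the flow, not the
interfaces added by these patches:

/*
 * Conceptual sketch only -- every helper named here is a placeholder.
 * Instead of writing the compressed objects back (what zbud_reclaim_page()
 * does through the pool's ->evict callback), this path faults them back
 * into memory the way swapoff does, after which the zbud page is empty
 * and can be freed.
 */
static int zbud_evacuate_page(struct zbud_pool *pool, struct page *page)
{
	unsigned long handle;
	int err;

	/* A zbud page holds at most two compressed objects ("buddies"). */
	zbud_for_each_handle(pool, page, handle) {
		swp_entry_t entry = zswap_handle_to_swp_entry(handle);

		/* Bring the swapped-out page back in, as swapoff would. */
		err = unuse_swap_entry(entry);
		if (err)
			return err;
	}

	return 0;
}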

One of the patches introduces a PageZbud() function which identifies zbud
pages by page->_mapcount. Dave Hansen proposed aliasing PG_zbud=PG_slab, but
in that case the patch would be more intrusive.
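
For illustration, the idea follows the PageBuddy() trick of using a
_mapcount sentinel (PAGE_BUDDY_MAPCOUNT_VALUE); the exact helpers in the
patch may be named or structured differently:

/*
 * Illustration only; a sentinel _mapcount value marks a page as a zbud
 * page, the same way PAGE_BUDDY_MAPCOUNT_VALUE marks buddy pages.
 */
#define ZBUD_MAPCOUNT_VALUE	(-127)

static inline int PageZbud(struct page *page)
{
	return atomic_read(&page->_mapcount) == ZBUD_MAPCOUNT_VALUE;
}

static inline void SetPageZbud(struct page *page)
{
	VM_BUG_ON(atomic_read(&page->_mapcount) != -1);
	atomic_set(&page->_mapcount, ZBUD_MAPCOUNT_VALUE);
}

static inline void ClearPageZbud(struct page *page)
{
	VM_BUG_ON(!PageZbud(page));
	atomic_set(&page->_mapcount, -1);
}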

Any ideas for a better solution are welcome.

TODOs:
1. Migrate zbud pages directly instead of reclaiming.

Changes since v1:
1. Rebased against v3.11-rc4-103-g6c2580c.
2. Remove rebalance_lists() to fix reinserting a zbud page after zbud_free().
   This function was added because similar code was present in
   zbud_free()/zbud_alloc()/zbud_reclaim_page(), but it turns out that there
   is no benefit in generalizing this code.
   (suggested by Seth Jennings)
3. Remove BUG_ON checks for first/last chunks during free and reclaim.
   (suggested by Seth Jennings)
4. Use page->_mapcount == -127 instead of a new PG_zbud flag.
   (suggested by Dave Hansen)
5. Fix an invalid dereference of a pointer to compact_control in page_alloc.c.
6. Fix a lost return value in try_to_unuse() in swapfile.c (this fixes a
   hang when swapoff was interrupted, e.g. by CTRL+C).


Best regards,
Krzysztof Kozlowski


Krzysztof Kozlowski (4):
zbud: use page ref counter for zbud pages
mm: split code for unusing swap entries from try_to_unuse
mm: use mapcount for identifying zbud pages
mm: reclaim zbud pages on migration and compaction

 include/linux/mm.h       |  23 +++
 include/linux/swapfile.h |   2 +
 include/linux/zbud.h     |  11 +-
 mm/compaction.c          |  20 ++-
 mm/internal.h            |   1 +
 mm/page_alloc.c          |   6 +
 mm/swapfile.c            | 356 ++++++++++++++++++++++++----------------
 mm/zbud.c                | 247 +++++++++++++++++++++++---------
 mm/zswap.c               |  57 +++++++-
 9 files changed, 476 insertions(+), 247 deletions(-)

--
1.7.9.5



minchan at kernel

Aug 11, 2013, 7:25 PM

Post #2 of 5
Re: [RFC PATCH v2 0/4] mm: reclaim zbud pages on migration and compaction

Hello,

On Fri, Aug 09, 2013 at 12:22:16PM +0200, Krzysztof Kozlowski wrote:
> Hi,
>
> Currently zbud pages are not movable and they cannot be allocated from CMA
> region. These patches try to address the problem by:

The zcache, zram and GUP pages for memory hotplug and/or CMA are in the
same situation.

> 1. Adding a new form of reclaim of zbud pages.
> 2. Reclaiming zbud pages during migration and compaction.
> 3. Allocating zbud pages with __GFP_RECLAIMABLE flag.

So I'd like to solve it with a general approach.

Each subsystem or GUP caller that wants to pin pages for a long time should
create its own migration handler and register the page with the pin-page
control subsystem, like this:

driver/foo.c

	int foo_migrate(struct page *page, void *private);

	static struct pin_page_owner foo_pin_ops = {
		.migrate = foo_migrate,
	};

	int foo_allocate(void)
	{
		struct page *newpage = alloc_page(GFP_KERNEL);

		set_pinned_page(&foo_pin_ops, newpage, NULL);
		return 0;
	}

And in compaction.c, or wherever we want to move/reclaim a page, the generic
VM can ask the owner when it finds a pinned page:

mm/compaction.c

	if (PagePinned(page)) {
		struct pin_page_info *info = get_pin_page_info(page);

		info->owner->migrate(page, info->private);
	}

The only hurdle is that we would need to introduce a new page flag, but I
believe that if we all agree on this approach, we can find a solution.

What do you think?

From 9a4f652006b7d0c750933d738e1bd6f53754bcf6 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan [at] kernel>
Date: Sun, 11 Aug 2013 00:31:57 +0900
Subject: [RFC] pin page control subsystem


Signed-off-by: Minchan Kim <minchan [at] kernel>
---
 mm/Makefile   |   2 +-
 mm/pin-page.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 1 deletion(-)
create mode 100644 mm/pin-page.c

diff --git a/mm/Makefile b/mm/Makefile
index f008033..245c2f7 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -5,7 +5,7 @@
 mmu-y			:= nommu.o
 mmu-$(CONFIG_MMU)	:= fremap.o highmem.o madvise.o memory.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-			   vmalloc.o pagewalk.o pgtable-generic.o
+			   vmalloc.o pagewalk.o pgtable-generic.o pin-page.o
 
 ifdef CONFIG_CROSS_MEMORY_ATTACH
 mmu-$(CONFIG_MMU)	+= process_vm_access.o
diff --git a/mm/pin-page.c b/mm/pin-page.c
new file mode 100644
index 0000000..74b07f8
--- /dev/null
+++ b/mm/pin-page.c
@@ -0,0 +1,101 @@
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/hashtable.h>
+
+#define PPAGE_HASH_BITS 10
+
+static DEFINE_SPINLOCK(hash_lock);
+/*
+ * Should consider which data structure we should use.
+ * A radix tree would be better if we pin a lot of contiguous
+ * pages, but if we pin scattered pages it wouldn't be a good idea.
+ */
+static DEFINE_HASHTABLE(pin_page_hash, PPAGE_HASH_BITS);
+
+/*
+ * Each subsystem should provide its own page migration handler.
+ */
+struct pin_page_owner {
+	int (*migrate)(struct page *page, void *private);
+};
+
+struct pin_page_info {
+	struct pin_page_owner *owner;
+	struct hlist_node hlist;
+
+	unsigned long pfn;
+	void *private;
+};
+
+/* TODO : Introduce new page flags */
+void SetPinnedPage(struct page *page)
+{
+
+}
+
+int PinnedPage(struct page *page)
+{
+	return 0;
+}
+
+/*
+ * GUP callers or subsystems which pin a page should call this function
+ * to register @page in the pin-page control subsystem so that the VM can
+ * ask us when it wants to migrate @page.
+ *
+ * Each pinned page may carry some private key to identify itself,
+ * like a custom-allocator-returned handle.
+ */
+int set_pinned_page(struct pin_page_owner *owner,
+		    struct page *page, void *private)
+{
+	struct pin_page_info *pinfo = kmalloc(sizeof(*pinfo), GFP_KERNEL);
+
+	INIT_HLIST_NODE(&pinfo->hlist);
+	pinfo->owner = owner;
+
+	pinfo->pfn = page_to_pfn(page);
+	pinfo->private = private;
+
+	spin_lock(&hash_lock);
+	hash_add(pin_page_hash, &pinfo->hlist, pinfo->pfn);
+	spin_unlock(&hash_lock);
+
+	SetPinnedPage(page);
+	return 0;
+}
+
+struct pin_page_info *get_pin_page_info(struct page *page)
+{
+	struct pin_page_info *tmp;
+	unsigned long pfn = page_to_pfn(page);
+
+	spin_lock(&hash_lock);
+	hash_for_each_possible(pin_page_hash, tmp, hlist, pfn) {
+		if (tmp->pfn == pfn) {
+			spin_unlock(&hash_lock);
+			return tmp;
+		}
+	}
+	spin_unlock(&hash_lock);
+	return NULL;
+}
+
+/* Used in compaction.c */
+int migrate_pinned_page(struct page *page)
+{
+	int ret = 1;
+	struct pin_page_info *pinfo = NULL;
+
+	if (PinnedPage(page)) {
+		while ((pinfo = get_pin_page_info(page))) {
+			/* If one of the owners fails, bail out */
+			if (pinfo->owner->migrate(page, pinfo->private))
+				break;
+		}
+
+		ret = 0;
+	}
+	return ret;
+}
--
1.7.9.5

--
Kind regards,
Minchan Kim


bcrl at kvack

Aug 11, 2013, 8:16 PM

Post #3 of 5
Re: [RFC PATCH v2 0/4] mm: reclaim zbud pages on migration and compaction

Hello Minchan,

On Mon, Aug 12, 2013 at 11:25:35AM +0900, Minchan Kim wrote:
> [...]
>
> So I'd like to solve it with a general approach.
>
> Each subsystem or GUP caller that wants to pin pages for a long time should
> create its own migration handler and register the page with the pin-page
> control subsystem, like this:
>
> [...]
>
> The only hurdle is that we would need to introduce a new page flag, but I
> believe that if we all agree on this approach, we can find a solution.
>
> What do you think?

I don't like this approach. There will be too many collisions in the
hash that's been implemented (read: I don't think you can get away with
a naive implementation for core infrastructure that has to suit all
users), you've got a global spin lock, and it doesn't take into account
NUMA issues. The address space migratepage method doesn't have those
issues (at least where it is usable as in aio's use-case).
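
For comparison, the migratepage route looks roughly like this (the foo_*
names are made up for illustration; only the aops callback and its
signature are taken from the kernel):

/*
 * Sketch of the address_space ->migratepage alternative; foo_* names are
 * hypothetical.  A subsystem that owns the pages' mapping supplies its own
 * migration callback instead of registering with a separate tracker.
 */
static int foo_migratepage(struct address_space *mapping,
			   struct page *newpage, struct page *oldpage,
			   enum migrate_mode mode)
{
	/*
	 * Fix up any subsystem-private references to the old page here,
	 * then let the generic helper move the mapping and copy contents.
	 */
	return migrate_page(mapping, newpage, oldpage, mode);
}

static const struct address_space_operations foo_aops = {
	.migratepage	= foo_migratepage,
};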

If you're going to go down this path, you'll have to decide if *all* users
of pinned pages are going to have to subscribe to supporting the un-pinning
of pages, and that means taking a real hard look at how O_DIRECT pins pages.
Once you start thinking about that, you'll find that addressing the
performance concerns is going to be an essential part of any design work to
be done in this area.

-ben
--
"Thought is the essence of where you are now."


minchan at kernel

Aug 11, 2013, 8:49 PM

Post #4 of 5
Re: [RFC PATCH v2 0/4] mm: reclaim zbud pages on migration and compaction

Hello Benjamin,

On Sun, Aug 11, 2013 at 11:16:47PM -0400, Benjamin LaHaise wrote:
> Hello Minchan,
>
> [...]
>
> I don't like this approach. There will be too many collisions in the
> hash that's been implemented (read: I don't think you can get away with

Yeb. That's why I'd like to change it to a radix tree indexed by pfn, as I
mentioned in the comment (I just used a hash for fast prototyping, without
much consideration).
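
Something along these lines (an illustrative sketch only, reusing hash_lock
and struct pin_page_info from the RFC patch above):

/*
 * Radix-tree variant: index pin_page_info by pfn instead of hashing it.
 */
static RADIX_TREE(pin_page_tree, GFP_ATOMIC);

static int pin_page_insert(struct pin_page_info *pinfo)
{
	int err;

	spin_lock(&hash_lock);
	err = radix_tree_insert(&pin_page_tree, pinfo->pfn, pinfo);
	spin_unlock(&hash_lock);

	return err;
}

static struct pin_page_info *pin_page_lookup(unsigned long pfn)
{
	struct pin_page_info *pinfo;

	spin_lock(&hash_lock);
	pinfo = radix_tree_lookup(&pin_page_tree, pfn);
	spin_unlock(&hash_lock);

	return pinfo;
}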

> a naive implementation for core infrastructure that has to suit all
> users), you've got a global spin lock, and it doesn't take into account

I think batch-draining of pinned pages would be sufficient to avoid the
global spinlock problem, because we already use that approach in the page
allocator, which is one of the most critical hotpaths.
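
For example, something in this direction (an illustrative sketch only, not
part of the patch; lookups would also need to drain or scan the pending
per-cpu entries):

#define PIN_PAGE_BATCH	16

struct pin_page_batch {
	int count;
	struct pin_page_info *info[PIN_PAGE_BATCH];
};

static DEFINE_PER_CPU(struct pin_page_batch, pin_page_batches);

/* Take the global lock once per batch, not once per registration. */
static void pin_page_flush(struct pin_page_batch *batch)
{
	int i;

	spin_lock(&hash_lock);
	for (i = 0; i < batch->count; i++)
		hash_add(pin_page_hash, &batch->info[i]->hlist,
			 batch->info[i]->pfn);
	spin_unlock(&hash_lock);
	batch->count = 0;
}

static void pin_page_queue(struct pin_page_info *pinfo)
{
	struct pin_page_batch *batch = &get_cpu_var(pin_page_batches);

	batch->info[batch->count++] = pinfo;
	if (batch->count == PIN_PAGE_BATCH)
		pin_page_flush(batch);
	put_cpu_var(pin_page_batches);
}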

> NUMA issues. The address space migratepage method doesn't have those

NUMA issues? Could you elaborate on that a bit?

> issues (at least where it is usable as in aio's use-case).
>
> If you're going to go down this path, you'll have to decide if *all* users
> of pinned pages are going to have to subscribe to supporting the un-pinning
> of pages, and that means taking a real hard look at how O_DIRECT pins pages.
> Once you start thinking about that, you'll find that addressing the
> performance concerns is going to be an essential part of any design work to
> be done in this area.

True. The patch I included just shows the concept, so I didn't consider the
performance-critical parts; but if we all agree that this approach makes
sense and that it can be implemented with little overhead, I will step into
the next phase and work on performance.

Thanks for the input, Ben!

>
> -ben
> --
> "Thought is the essence of where you are now."

--
Kind regards,
Minchan Kim


dave.hansen at intel

Aug 12, 2013, 9:48 AM

Post #5 of 5
Re: [RFC PATCH v2 0/4] mm: reclaim zbud pages on migration and compaction

On 08/11/2013 07:25 PM, Minchan Kim wrote:
> +int set_pinned_page(struct pin_page_owner *owner,
> +		    struct page *page, void *private)
> +{
> +	struct pin_page_info *pinfo = kmalloc(sizeof(*pinfo), GFP_KERNEL);
> +
> +	INIT_HLIST_NODE(&pinfo->hlist);
> +	pinfo->owner = owner;
> +
> +	pinfo->pfn = page_to_pfn(page);
> +	pinfo->private = private;
> +
> +	spin_lock(&hash_lock);
> +	hash_add(pin_page_hash, &pinfo->hlist, pinfo->pfn);
> +	spin_unlock(&hash_lock);
> +
> +	SetPinnedPage(page);
> +	return 0;
> +}

I definitely agree that we're getting to the point where we need to look
at this more generically. We've got at least four use-cases that have a
need for deterministically relocating memory:

1. CMA (many sub use cases)
2. Memory hot-remove
3. Memory power management
4. Runtime hugetlb-GB page allocations

Whatever we do, it _should_ be good enough to largely let us replace
PG_slab with this new bit.
