Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[PATCH v10 1/5] mm: introduce a common interface for balloon pages mobility

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


aquini at redhat

Sep 17, 2012, 9:38 AM

Post #1 of 7 (75 views)
Permalink
[PATCH v10 1/5] mm: introduce a common interface for balloon pages mobility

Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.

This patch introduces a common interface to help a balloon driver on
making its page set movable to compaction, and thus allowing the system
to better leverage the compation efforts on memory defragmentation.

Signed-off-by: Rafael Aquini <aquini [at] redhat>
---
include/linux/balloon_compaction.h | 147 +++++++++++++++++++++++++++++++++++
include/linux/pagemap.h | 18 +++++
mm/Kconfig | 15 ++++
mm/Makefile | 1 +
mm/balloon_compaction.c | 152 +++++++++++++++++++++++++++++++++++++
5 files changed, 333 insertions(+)
create mode 100644 include/linux/balloon_compaction.h
create mode 100644 mm/balloon_compaction.c

diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
new file mode 100644
index 0000000..1c27a93
--- /dev/null
+++ b/include/linux/balloon_compaction.h
@@ -0,0 +1,147 @@
+/*
+ * include/linux/balloon_compaction.h
+ *
+ * Common interface definitions for making balloon pages movable to compaction.
+ *
+ * Copyright (C) 2012, Red Hat, Inc. Rafael Aquini <aquini [at] redhat>
+ */
+#ifndef _LINUX_BALLOON_COMPACTION_H
+#define _LINUX_BALLOON_COMPACTION_H
+
+#include <linux/rcupdate.h>
+#include <linux/pagemap.h>
+#include <linux/gfp.h>
+#include <linux/err.h>
+
+/* return code to identify when a ballooned page has been migrated */
+#define BALLOON_MIGRATION_RETURN 0xba1100
+
+#ifdef CONFIG_BALLOON_COMPACTION
+#define count_balloon_event(e) count_vm_event(e)
+#define free_balloon_mapping(m) kfree(m)
+
+extern bool isolate_balloon_page(struct page *);
+extern void putback_balloon_page(struct page *);
+extern int migrate_balloon_page(struct page *newpage,
+ struct page *page, enum migrate_mode mode);
+extern struct address_space *alloc_balloon_mapping(void *balloon_device,
+ const struct address_space_operations *a_ops);
+
+static inline void assign_balloon_mapping(struct page *page,
+ struct address_space *mapping)
+{
+ page->mapping = mapping;
+ smp_wmb();
+}
+
+static inline void clear_balloon_mapping(struct page *page)
+{
+ page->mapping = NULL;
+ smp_wmb();
+}
+
+static inline gfp_t balloon_mapping_gfp_mask(void)
+{
+ return GFP_HIGHUSER_MOVABLE;
+}
+
+static inline bool __is_movable_balloon_page(struct page *page)
+{
+ struct address_space *mapping = ACCESS_ONCE(page->mapping);
+ smp_read_barrier_depends();
+ return mapping_balloon(mapping);
+}
+
+/*
+ * movable_balloon_page - test page->mapping->flags to identify balloon pages
+ * that can be moved by compaction/migration.
+ *
+ * This function is used at core compaction's page isolation scheme and so it's
+ * exposed to several system pages which may, or may not, be part of a memory
+ * balloon, and thus we cannot afford to hold a page locked to perform tests.
+ *
+ * Therefore, as we might return false positives in the case a balloon page
+ * is just released under us, the page->mapping->flags need to be retested
+ * with the proper page lock held, on the functions that will cope with the
+ * balloon page later.
+ */
+static inline bool movable_balloon_page(struct page *page)
+{
+ /*
+ * Before dereferencing and testing mapping->flags, lets make sure
+ * this is not a page that uses ->mapping in a different way
+ */
+ if (!PageSlab(page) && !PageSwapCache(page) && !PageAnon(page) &&
+ !page_mapped(page))
+ return __is_movable_balloon_page(page);
+
+ return false;
+}
+
+/*
+ * __page_balloon_device - get the balloon device that owns the given page.
+ *
+ * This shall only be used at driver callbacks under proper page lock,
+ * to get access to the balloon device which @page belongs.
+ */
+static inline void *__page_balloon_device(struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+ if (mapping)
+ mapping = mapping->assoc_mapping;
+
+ return mapping;
+}
+
+/*
+ * DEFINE_BALLOON_MAPPING_AOPS - declare and instantiate a callback descriptor
+ * to be used as balloon page->mapping->a_ops.
+ *
+ * @label : declaration identifier (var name)
+ * @isolatepg : callback symbol name for performing the page isolation step
+ * @migratepg : callback symbol name for performing the page migration step
+ * @putbackpg : callback symbol name for performing the page putback step
+ *
+ * address_space_operations utilized methods for ballooned pages:
+ * .migratepage - used to perform balloon's page migration (as is)
+ * .invalidatepage - used to isolate a page from balloon's page list
+ * .freepage - used to reinsert an isolated page to balloon's page list
+ */
+#define DEFINE_BALLOON_MAPPING_AOPS(label, isolatepg, migratepg, putbackpg) \
+ const struct address_space_operations (label) = { \
+ .migratepage = (migratepg), \
+ .invalidatepage = (isolatepg), \
+ .freepage = (putbackpg), \
+ }
+
+#else
+#define assign_balloon_mapping(p, m) do { } while (0)
+#define clear_balloon_mapping(p) do { } while (0)
+#define free_balloon_mapping(m) do { } while (0)
+#define count_balloon_event(e) do { } while (0)
+#define DEFINE_BALLOON_MAPPING_AOPS(label, isolatepg, migratepg, putbackpg) \
+ const struct {} (label) = {}
+
+static inline bool movable_balloon_page(struct page *page) { return false; }
+static inline bool isolate_balloon_page(struct page *page) { return false; }
+static inline void putback_balloon_page(struct page *page) { return; }
+
+static inline int migrate_balloon_page(struct page *newpage,
+ struct page *page, enum migrate_mode mode)
+{
+ return 0;
+}
+
+static inline gfp_t balloon_mapping_gfp_mask(void)
+{
+ return GFP_HIGHUSER;
+}
+
+static inline void *alloc_balloon_mapping(void *balloon_device,
+ const struct address_space_operations *a_ops)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
+#endif /* CONFIG_BALLOON_COMPACTION */
+
+#endif /* _LINUX_BALLOON_COMPACTION_H */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e42c762..6df0664 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -24,6 +24,7 @@ enum mapping_flags {
AS_ENOSPC = __GFP_BITS_SHIFT + 1, /* ENOSPC on async write */
AS_MM_ALL_LOCKS = __GFP_BITS_SHIFT + 2, /* under mm_take_all_locks() */
AS_UNEVICTABLE = __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK */
+ AS_BALLOON_MAP = __GFP_BITS_SHIFT + 4, /* balloon page special map */
};

static inline void mapping_set_error(struct address_space *mapping, int error)
@@ -53,6 +54,23 @@ static inline int mapping_unevictable(struct address_space *mapping)
return !!mapping;
}

+static inline void mapping_set_balloon(struct address_space *mapping)
+{
+ set_bit(AS_BALLOON_MAP, &mapping->flags);
+}
+
+static inline void mapping_clear_balloon(struct address_space *mapping)
+{
+ clear_bit(AS_BALLOON_MAP, &mapping->flags);
+}
+
+static inline int mapping_balloon(struct address_space *mapping)
+{
+ if (mapping)
+ return test_bit(AS_BALLOON_MAP, &mapping->flags);
+ return !!mapping;
+}
+
static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
{
return (__force gfp_t)mapping->flags & __GFP_BITS_MASK;
diff --git a/mm/Kconfig b/mm/Kconfig
index d5c8019..0bd783b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -188,6 +188,21 @@ config SPLIT_PTLOCK_CPUS
default "4"

#
+# support for memory balloon compaction
+config BALLOON_COMPACTION
+ bool "Allow for balloon memory compaction/migration"
+ select COMPACTION
+ depends on VIRTIO_BALLOON
+ help
+ Memory fragmentation introduced by ballooning might reduce
+ significantly the number of 2MB contiguous memory blocks that can be
+ used within a guest, thus imposing performance penalties associated
+ with the reduced number of transparent huge pages that could be used
+ by the guest workload. Allowing the compaction & migration for memory
+ pages enlisted as being part of memory balloon devices avoids the
+ scenario aforementioned and helps improving memory defragmentation.
+
+#
# support for memory compaction
config COMPACTION
bool "Allow for memory compaction"
diff --git a/mm/Makefile b/mm/Makefile
index 92753e2..23e54c5 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -57,3 +57,4 @@ obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
obj-$(CONFIG_CLEANCACHE) += cleancache.o
obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
+obj-$(CONFIG_BALLOON_COMPACTION) += balloon_compaction.o
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
new file mode 100644
index 0000000..74c09ab
--- /dev/null
+++ b/mm/balloon_compaction.c
@@ -0,0 +1,152 @@
+/*
+ * mm/balloon_compaction.c
+ *
+ * Common interface for making balloon pages movable to compaction.
+ *
+ * Copyright (C) 2012, Red Hat, Inc. Rafael Aquini <aquini [at] redhat>
+ */
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/export.h>
+#include <linux/balloon_compaction.h>
+
+#ifdef CONFIG_BALLOON_COMPACTION
+/*
+ * alloc_balloon_mapping - allocates a special ->mapping for ballooned pages.
+ * @balloon_device: pointer address that references the balloon device which
+ * owns pages bearing this ->mapping.
+ * @a_ops: balloon_mapping address_space_operations descriptor.
+ *
+ * Users must call it to properly allocate and initialize an instance of
+ * struct address_space which will be used as the special page->mapping for
+ * balloon devices enlisted page instances.
+ */
+struct address_space *alloc_balloon_mapping(void *balloon_device,
+ const struct address_space_operations *a_ops)
+{
+ struct address_space *mapping;
+
+ mapping = kmalloc(sizeof(*mapping), GFP_KERNEL);
+ if (!mapping)
+ return ERR_PTR(-ENOMEM);
+
+ /*
+ * Give a clean 'zeroed' status to all elements of this special
+ * balloon page->mapping struct address_space instance.
+ */
+ address_space_init_once(mapping);
+
+ /*
+ * Set mapping->flags appropriately, to allow balloon ->mapping
+ * identification, as well as give a proper hint to the balloon
+ * driver on what GFP allocation mask shall be used.
+ */
+ mapping_set_balloon(mapping);
+ mapping_set_gfp_mask(mapping, balloon_mapping_gfp_mask());
+
+ /* balloon's page->mapping->a_ops callback descriptor */
+ mapping->a_ops = a_ops;
+
+ /*
+ * balloon special page->mapping overloads ->assoc_mapping
+ * to held a reference back to the balloon device wich 'owns'
+ * a given page. This is the way we can cope with multiple
+ * balloon devices without losing reference of several
+ * ballooned pagesets.
+ */
+ mapping->assoc_mapping = balloon_device;
+
+ return mapping;
+}
+EXPORT_SYMBOL_GPL(alloc_balloon_mapping);
+
+static inline void __isolate_balloon_page(struct page *page)
+{
+ page->mapping->a_ops->invalidatepage(page, 0);
+}
+
+static inline void __putback_balloon_page(struct page *page)
+{
+ page->mapping->a_ops->freepage(page);
+}
+
+static inline int __migrate_balloon_page(struct address_space *mapping,
+ struct page *newpage, struct page *page, enum migrate_mode mode)
+{
+ return page->mapping->a_ops->migratepage(mapping, newpage, page, mode);
+}
+
+/* __isolate_lru_page() counterpart for a ballooned page */
+bool isolate_balloon_page(struct page *page)
+{
+ if (likely(get_page_unless_zero(page))) {
+ /*
+ * As balloon pages are not isolated from LRU lists, concurrent
+ * compaction threads can race against page migration functions
+ * move_to_new_page() & __unmap_and_move().
+ * In order to avoid having an already isolated balloon page
+ * being (wrongly) re-isolated while it is under migration,
+ * lets be sure we have the page lock before proceeding with
+ * the balloon page isolation steps.
+ */
+ if (likely(trylock_page(page))) {
+ /*
+ * A ballooned page, by default, has just one refcount.
+ * Prevent concurrent compaction threads from isolating
+ * an already isolated balloon page by refcount check.
+ */
+ if (__is_movable_balloon_page(page) &&
+ page_count(page) == 2) {
+ __isolate_balloon_page(page);
+ unlock_page(page);
+ return true;
+ }
+ unlock_page(page);
+ }
+ put_page(page);
+ }
+ return false;
+}
+
+/* putback_lru_page() counterpart for a ballooned page */
+void putback_balloon_page(struct page *page)
+{
+ /*
+ * 'lock_page()' stabilizes the page and prevents races against
+ * concurrent isolation threads attempting to re-isolate it.
+ */
+ lock_page(page);
+
+ if (__is_movable_balloon_page(page)) {
+ __putback_balloon_page(page);
+ put_page(page);
+ } else {
+ __WARN();
+ dump_page(page);
+ }
+ unlock_page(page);
+}
+
+/* move_to_new_page() counterpart for a ballooned page */
+int migrate_balloon_page(struct page *newpage,
+ struct page *page, enum migrate_mode mode)
+{
+ struct address_space *mapping;
+ int rc = -EAGAIN;
+
+ BUG_ON(!trylock_page(newpage));
+
+ if (WARN_ON(!__is_movable_balloon_page(page))) {
+ dump_page(page);
+ unlock_page(newpage);
+ return rc;
+ }
+
+ mapping = page->mapping;
+ if (mapping)
+ rc = __migrate_balloon_page(mapping, newpage, page, mode);
+
+ unlock_page(newpage);
+ return rc;
+}
+#endif /* CONFIG_BALLOON_COMPACTION */
--
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


akpm at linux-foundation

Sep 17, 2012, 3:15 PM

Post #2 of 7 (72 views)
Permalink
Re: [PATCH v10 1/5] mm: introduce a common interface for balloon pages mobility [In reply to]

On Mon, 17 Sep 2012 13:38:16 -0300
Rafael Aquini <aquini [at] redhat> wrote:

> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
>
> This patch introduces a common interface to help a balloon driver on
> making its page set movable to compaction, and thus allowing the system
> to better leverage the compation efforts on memory defragmentation.
>
>
> ...
>
> +#ifndef _LINUX_BALLOON_COMPACTION_H
> +#define _LINUX_BALLOON_COMPACTION_H
> +
> +#include <linux/rcupdate.h>
> +#include <linux/pagemap.h>
> +#include <linux/gfp.h>
> +#include <linux/err.h>
> +
> +/* return code to identify when a ballooned page has been migrated */
> +#define BALLOON_MIGRATION_RETURN 0xba1100

I didn't really spend enough time to work out why this was done this
way, but I know a hack when I see one!

We forgot to document the a_ops.migratepage() return value. Perhaps
it's time to work out what it should be.

> +#ifdef CONFIG_BALLOON_COMPACTION
> +#define count_balloon_event(e) count_vm_event(e)
> +#define free_balloon_mapping(m) kfree(m)

It would be better to write these in C please. That way we get
typechecking, even when CONFIG_BALLOON_COMPACTION=n.

> +extern bool isolate_balloon_page(struct page *);
> +extern void putback_balloon_page(struct page *);
> +extern int migrate_balloon_page(struct page *newpage,
> + struct page *page, enum migrate_mode mode);
> +extern struct address_space *alloc_balloon_mapping(void *balloon_device,
> + const struct address_space_operations *a_ops);

There's a useful convention that interface identifiers are prefixed by
their interface's name. IOW, everything in this file would start with
"balloon_". balloon_page_isolate, balloon_page_putback, etc. I think
we could follow that convention here?

> +static inline void assign_balloon_mapping(struct page *page,
> + struct address_space *mapping)
> +{
> + page->mapping = mapping;
> + smp_wmb();
> +}
> +
> +static inline void clear_balloon_mapping(struct page *page)
> +{
> + page->mapping = NULL;
> + smp_wmb();
> +}
> +
> +static inline gfp_t balloon_mapping_gfp_mask(void)
> +{
> + return GFP_HIGHUSER_MOVABLE;
> +}
> +
> +static inline bool __is_movable_balloon_page(struct page *page)
> +{
> + struct address_space *mapping = ACCESS_ONCE(page->mapping);
> + smp_read_barrier_depends();
> + return mapping_balloon(mapping);
> +}

hm. Are these barrier tricks copied from somewhere else, or home-made?

> +/*
> + * movable_balloon_page - test page->mapping->flags to identify balloon pages
> + * that can be moved by compaction/migration.
> + *
> + * This function is used at core compaction's page isolation scheme and so it's
> + * exposed to several system pages which may, or may not, be part of a memory
> + * balloon, and thus we cannot afford to hold a page locked to perform tests.

I don't understand this. What is a "system page"? If I knew that, I
migth perhaps understand why we cannot lock such a page.

> + * Therefore, as we might return false positives in the case a balloon page
> + * is just released under us, the page->mapping->flags need to be retested
> + * with the proper page lock held, on the functions that will cope with the
> + * balloon page later.
> + */
> +static inline bool movable_balloon_page(struct page *page)
> +{
> + /*
> + * Before dereferencing and testing mapping->flags, lets make sure
> + * this is not a page that uses ->mapping in a different way
> + */
> + if (!PageSlab(page) && !PageSwapCache(page) && !PageAnon(page) &&
> + !page_mapped(page))
> + return __is_movable_balloon_page(page);
> +
> + return false;
> +}
> +
> +/*
> + * __page_balloon_device - get the balloon device that owns the given page.
> + *
> + * This shall only be used at driver callbacks under proper page lock,
> + * to get access to the balloon device which @page belongs.
> + */
> +static inline void *__page_balloon_device(struct page *page)
> +{
> + struct address_space *mapping = page->mapping;
> + if (mapping)
> + mapping = mapping->assoc_mapping;
> +
> + return mapping;
> +}

So you've repurposed address_space.assoc_mapping in new and unexpected
ways.

I don't immediately see a problem with doing this, but we should do it
properly. Something like:

- rename address_space.assoc_mapping to private_data
- it has type void*
- document its ownership rules
- convert fs/buffer.c

all done as a standalone preparatory patch.

Also, your usage of ->private_data should minimise its use of void* -
use more specific types wherever possible. So this function should
return a "struct virtio_balloon *".

It is unobvious why this interface function is prefixed with __.

> +/*
> + * DEFINE_BALLOON_MAPPING_AOPS - declare and instantiate a callback descriptor
> + * to be used as balloon page->mapping->a_ops.
> + *
> + * @label : declaration identifier (var name)
> + * @isolatepg : callback symbol name for performing the page isolation step
> + * @migratepg : callback symbol name for performing the page migration step
> + * @putbackpg : callback symbol name for performing the page putback step
> + *
> + * address_space_operations utilized methods for ballooned pages:
> + * .migratepage - used to perform balloon's page migration (as is)
> + * .invalidatepage - used to isolate a page from balloon's page list
> + * .freepage - used to reinsert an isolated page to balloon's page list
> + */
> +#define DEFINE_BALLOON_MAPPING_AOPS(label, isolatepg, migratepg, putbackpg) \
> + const struct address_space_operations (label) = { \
> + .migratepage = (migratepg), \
> + .invalidatepage = (isolatepg), \
> + .freepage = (putbackpg), \
> + }

erp. Can we avoid doing this? afaict it would be pretty simple to
avoid instantiating virtio_balloon_aops at all if
CONFIG_BALLOON_COMPACTION=n?

> +#else
> +#define assign_balloon_mapping(p, m) do { } while (0)
> +#define clear_balloon_mapping(p) do { } while (0)
> +#define free_balloon_mapping(m) do { } while (0)
> +#define count_balloon_event(e) do { } while (0)

Written in C with proper types if possible, please.

> +#define DEFINE_BALLOON_MAPPING_AOPS(label, isolatepg, migratepg, putbackpg) \
> + const struct {} (label) = {}
> +
> +static inline bool movable_balloon_page(struct page *page) { return false; }
> +static inline bool isolate_balloon_page(struct page *page) { return false; }
> +static inline void putback_balloon_page(struct page *page) { return; }
> +
> +static inline int migrate_balloon_page(struct page *newpage,
> + struct page *page, enum migrate_mode mode)
> +{
> + return 0;
> +}
> +
>
> ...
>
> @@ -53,6 +54,23 @@ static inline int mapping_unevictable(struct address_space *mapping)
> return !!mapping;
> }
>
> +static inline void mapping_set_balloon(struct address_space *mapping)
> +{
> + set_bit(AS_BALLOON_MAP, &mapping->flags);
> +}
> +
> +static inline void mapping_clear_balloon(struct address_space *mapping)
> +{
> + clear_bit(AS_BALLOON_MAP, &mapping->flags);
> +}
> +
> +static inline int mapping_balloon(struct address_space *mapping)
> +{
> + if (mapping)
> + return test_bit(AS_BALLOON_MAP, &mapping->flags);
> + return !!mapping;

Why not "return 0"?

Or

return mapping && test_bit(AS_BALLOON_MAP, &mapping->flags);

> +}
> +
>
> ...
>
> +struct address_space *alloc_balloon_mapping(void *balloon_device,
> + const struct address_space_operations *a_ops)
> +{
> + struct address_space *mapping;
> +
> + mapping = kmalloc(sizeof(*mapping), GFP_KERNEL);
> + if (!mapping)
> + return ERR_PTR(-ENOMEM);
> +
> + /*
> + * Give a clean 'zeroed' status to all elements of this special
> + * balloon page->mapping struct address_space instance.
> + */
> + address_space_init_once(mapping);
> +
> + /*
> + * Set mapping->flags appropriately, to allow balloon ->mapping
> + * identification, as well as give a proper hint to the balloon
> + * driver on what GFP allocation mask shall be used.
> + */
> + mapping_set_balloon(mapping);
> + mapping_set_gfp_mask(mapping, balloon_mapping_gfp_mask());
> +
> + /* balloon's page->mapping->a_ops callback descriptor */
> + mapping->a_ops = a_ops;
> +
> + /*
> + * balloon special page->mapping overloads ->assoc_mapping
> + * to held a reference back to the balloon device wich 'owns'
> + * a given page. This is the way we can cope with multiple
> + * balloon devices without losing reference of several
> + * ballooned pagesets.

I don't really understand the final part of this comment. Can you
expand more fully on the problem which this code is solving?

> + */
> + mapping->assoc_mapping = balloon_device;
> +
> + return mapping;
> +}
> +EXPORT_SYMBOL_GPL(alloc_balloon_mapping);

balloon_mapping_alloc() :)

> +static inline void __isolate_balloon_page(struct page *page)
> +{
> + page->mapping->a_ops->invalidatepage(page, 0);
> +}
> +
>
> ...
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


aquini at redhat

Sep 18, 2012, 9:24 AM

Post #3 of 7 (74 views)
Permalink
Re: [PATCH v10 1/5] mm: introduce a common interface for balloon pages mobility [In reply to]

On Mon, Sep 17, 2012 at 03:15:43PM -0700, Andrew Morton wrote:
> > +/* return code to identify when a ballooned page has been migrated */
> > +#define BALLOON_MIGRATION_RETURN 0xba1100
>
> I didn't really spend enough time to work out why this was done this
> way, but I know a hack when I see one!
>
Yes, I'm afraid it's a hack, but, unfortunately, it's a necessary one (IMHO).

This 'distinct' return code is used to flag a sucessful balloon page migration
at the following unmap_and_move() snippet (patch 2).
If by any reason we fail to identify a sucessfull balloon page migration, we
will cause a page leak, as the old 'page' won't be properly released.
.....
rc = __unmap_and_move(page, newpage, force, offlining, mode);
+
+ if (unlikely(rc == BALLOON_MIGRATION_RETURN)) {
+ /*
+ * A ballooned page has been migrated already.
+ * Now, it's the time to remove the old page from the isolated
+ * pageset list and handle it back to Buddy, wrap-up counters
+ * and return.
+ */
......

By reaching that point in code, we cannot rely on testing page->mapping flags
anymore for both 'page' and 'newpage' because:
a) migration has already finished and 'page'->mapping is wiped out;
b) balloon might have started to deflate, and 'newpage' might be released
already;

If the return code approach is unnaceptable, we might defer the 'page'->mapping
wipe-out step to that point in code for the balloon page case.
That, however, tends to be a little bit heavier, IMHO, as it will require us to
acquire the page lock once more to proceed the mapping wipe out, thus
potentially introducing overhead by lock contention (specially when several
parallel compaction threads are scanning pages for isolation)


> We forgot to document the a_ops.migratepage() return value. Perhaps
> it's time to work out what it should be.
>
> > +#ifdef CONFIG_BALLOON_COMPACTION
> > +#define count_balloon_event(e) count_vm_event(e)
> > +#define free_balloon_mapping(m) kfree(m)
>
> It would be better to write these in C please. That way we get
> typechecking, even when CONFIG_BALLOON_COMPACTION=n.
>

Consider it done, sir.


> > +extern bool isolate_balloon_page(struct page *);
> > +extern void putback_balloon_page(struct page *);
> > +extern int migrate_balloon_page(struct page *newpage,
> > + struct page *page, enum migrate_mode mode);
> > +extern struct address_space *alloc_balloon_mapping(void *balloon_device,
> > + const struct address_space_operations *a_ops);
>
> There's a useful convention that interface identifiers are prefixed by
> their interface's name. IOW, everything in this file would start with
> "balloon_". balloon_page_isolate, balloon_page_putback, etc. I think
> we could follow that convention here?
>

Consider it done, sir.


> > +static inline void assign_balloon_mapping(struct page *page,
> > + struct address_space *mapping)
> > +{
> > + page->mapping = mapping;
> > + smp_wmb();
> > +}
> > +
> > +static inline void clear_balloon_mapping(struct page *page)
> > +{
> > + page->mapping = NULL;
> > + smp_wmb();
> > +}
> > +
> > +static inline gfp_t balloon_mapping_gfp_mask(void)
> > +{
> > + return GFP_HIGHUSER_MOVABLE;
> > +}
> > +
> > +static inline bool __is_movable_balloon_page(struct page *page)
> > +{
> > + struct address_space *mapping = ACCESS_ONCE(page->mapping);
> > + smp_read_barrier_depends();
> > + return mapping_balloon(mapping);
> > +}
>
> hm. Are these barrier tricks copied from somewhere else, or home-made?
>

They were introduced by a reviewer request to assure the proper ordering when
inserting or deleting pages to/from a balloon device, so a given page won't get
elected as being a balloon page before it gets inserted into the balloon's page
list, just as it will only be deleted from the balloon's page list after it is
decomissioned of its balloon page status (page->mapping wipe-out).

Despite the mentioned operations only take place under proper locking, I thought
it wouldn't hurt enforcing such order, thus I kept the barrier stuff. Btw,
considering the aforementioned usage case, I just realized the
assign_balloon_mapping() barrier is misplaced. I'll fix that and introduce
comments on those function's usage.


> > +/*
> > + * movable_balloon_page - test page->mapping->flags to identify balloon pages
> > + * that can be moved by compaction/migration.
> > + *
> > + * This function is used at core compaction's page isolation scheme and so it's
> > + * exposed to several system pages which may, or may not, be part of a memory
> > + * balloon, and thus we cannot afford to hold a page locked to perform tests.
>
> I don't understand this. What is a "system page"? If I knew that, I
> migth perhaps understand why we cannot lock such a page.
>

I've attempted to mean compaction threads scan through all memory pages in the
system to check whether a page can be isolated or not, thus we cannot held the
page locked to perform the mapping->flags test as it can cause undesired effects
on other subsystems where the weŕe about to test here page is on use.

I'll try to re-arrange the comment to make it clear.


> > + * Therefore, as we might return false positives in the case a balloon page
> > + * is just released under us, the page->mapping->flags need to be retested
> > + * with the proper page lock held, on the functions that will cope with the
> > + * balloon page later.
> > + */
> > +static inline bool movable_balloon_page(struct page *page)
> > +{
> > + /*
> > + * Before dereferencing and testing mapping->flags, lets make sure
> > + * this is not a page that uses ->mapping in a different way
> > + */
> > + if (!PageSlab(page) && !PageSwapCache(page) && !PageAnon(page) &&
> > + !page_mapped(page))
> > + return __is_movable_balloon_page(page);
> > +
> > + return false;
> > +}
> > +
> > +/*
> > + * __page_balloon_device - get the balloon device that owns the given page.
> > + *
> > + * This shall only be used at driver callbacks under proper page lock,
> > + * to get access to the balloon device which @page belongs.
> > + */
> > +static inline void *__page_balloon_device(struct page *page)
> > +{
> > + struct address_space *mapping = page->mapping;
> > + if (mapping)
> > + mapping = mapping->assoc_mapping;
> > +
> > + return mapping;
> > +}
>
> So you've repurposed address_space.assoc_mapping in new and unexpected
> ways.
>
> I don't immediately see a problem with doing this, but we should do it
> properly. Something like:
>
> - rename address_space.assoc_mapping to private_data
> - it has type void*
> - document its ownership rules
> - convert fs/buffer.c
>
> all done as a standalone preparatory patch.
>

I'll do it as you requested/suggested.


> Also, your usage of ->private_data should minimise its use of void* -
> use more specific types wherever possible. So this function should
> return a "struct virtio_balloon *".
>

I believe we can keep it returning the opaque type, as it will allow us to expand
its usage to other balloon drivers which might have different balloon descriptors
in the future. I didn't looked at all balloon drivers, but in a glance seems that
vmware's balloon is using a fairly similar scheme to virtio's and it could
leverage all this interfaces, as well.


> It is unobvious why this interface function is prefixed with __.
>
> > +/*
> > + * DEFINE_BALLOON_MAPPING_AOPS - declare and instantiate a callback descriptor
> > + * to be used as balloon page->mapping->a_ops.
> > + *
> > + * @label : declaration identifier (var name)
> > + * @isolatepg : callback symbol name for performing the page isolation step
> > + * @migratepg : callback symbol name for performing the page migration step
> > + * @putbackpg : callback symbol name for performing the page putback step
> > + *
> > + * address_space_operations utilized methods for ballooned pages:
> > + * .migratepage - used to perform balloon's page migration (as is)
> > + * .invalidatepage - used to isolate a page from balloon's page list
> > + * .freepage - used to reinsert an isolated page to balloon's page list
> > + */
> > +#define DEFINE_BALLOON_MAPPING_AOPS(label, isolatepg, migratepg, putbackpg) \
> > + const struct address_space_operations (label) = { \
> > + .migratepage = (migratepg), \
> > + .invalidatepage = (isolatepg), \
> > + .freepage = (putbackpg), \
> > + }
>
> erp. Can we avoid doing this? afaict it would be pretty simple to
> avoid instantiating virtio_balloon_aops at all if
> CONFIG_BALLOON_COMPACTION=n?
>

That was being instantiated at driver's level directly,
and driver folks have requested this change.

I'll look a way around of it, though. (Just to make sure we're on the same page
here: Are you against the preprocessor macro, or just the way it's being used?)


> > +#else
> > +#define assign_balloon_mapping(p, m) do { } while (0)
> > +#define clear_balloon_mapping(p) do { } while (0)
> > +#define free_balloon_mapping(m) do { } while (0)
> > +#define count_balloon_event(e) do { } while (0)
>
> Written in C with proper types if possible, please.
>

Consider it done, sir.


> > +#define DEFINE_BALLOON_MAPPING_AOPS(label, isolatepg, migratepg, putbackpg) \
> > + const struct {} (label) = {}
> > +
> > +static inline bool movable_balloon_page(struct page *page) { return false; }
> > +static inline bool isolate_balloon_page(struct page *page) { return false; }
> > +static inline void putback_balloon_page(struct page *page) { return; }
> > +
> > +static inline int migrate_balloon_page(struct page *newpage,
> > + struct page *page, enum migrate_mode mode)
> > +{
> > + return 0;
> > +}
> > +
> >
> > ...
> >
> > @@ -53,6 +54,23 @@ static inline int mapping_unevictable(struct address_space *mapping)
> > return !!mapping;
> > }
> >
> > +static inline void mapping_set_balloon(struct address_space *mapping)
> > +{
> > + set_bit(AS_BALLOON_MAP, &mapping->flags);
> > +}
> > +
> > +static inline void mapping_clear_balloon(struct address_space *mapping)
> > +{
> > + clear_bit(AS_BALLOON_MAP, &mapping->flags);
> > +}
> > +
> > +static inline int mapping_balloon(struct address_space *mapping)
> > +{
> > + if (mapping)
> > + return test_bit(AS_BALLOON_MAP, &mapping->flags);
> > + return !!mapping;
>
> Why not "return 0"?
>
> Or
>
> return mapping && test_bit(AS_BALLOON_MAP, &mapping->flags);

Consider it done, sir.


>
> > +}
> > +
> >
> > ...
> >
> > +struct address_space *alloc_balloon_mapping(void *balloon_device,
> > + const struct address_space_operations *a_ops)
> > +{
> > + struct address_space *mapping;
> > +
> > + mapping = kmalloc(sizeof(*mapping), GFP_KERNEL);
> > + if (!mapping)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + /*
> > + * Give a clean 'zeroed' status to all elements of this special
> > + * balloon page->mapping struct address_space instance.
> > + */
> > + address_space_init_once(mapping);
> > +
> > + /*
> > + * Set mapping->flags appropriately, to allow balloon ->mapping
> > + * identification, as well as give a proper hint to the balloon
> > + * driver on what GFP allocation mask shall be used.
> > + */
> > + mapping_set_balloon(mapping);
> > + mapping_set_gfp_mask(mapping, balloon_mapping_gfp_mask());
> > +
> > + /* balloon's page->mapping->a_ops callback descriptor */
> > + mapping->a_ops = a_ops;
> > +
> > + /*
> > + * balloon special page->mapping overloads ->assoc_mapping
> > + * to held a reference back to the balloon device wich 'owns'
> > + * a given page. This is the way we can cope with multiple
> > + * balloon devices without losing reference of several
> > + * ballooned pagesets.
>
> I don't really understand the final part of this comment. Can you
> expand more fully on the problem which this code is solving?
>

I was trying to state we can have several virtio_balloon devices instantiated
for a single guest (virtio folks have told me that's a quite easy scenario), and
the only way we can safely get the right balloon page lists to isolate pages
from, or put pages back in, is maintaining a pointer back to the device descriptor
at every balloon's page->mapping instance.

I'll try to reprhase the commentary, to make it clearer.


> > + */
> > + mapping->assoc_mapping = balloon_device;
> > +
> > + return mapping;
> > +}
> > +EXPORT_SYMBOL_GPL(alloc_balloon_mapping);
>
> balloon_mapping_alloc() :)

Consider it done, sir.


>
> > +static inline void __isolate_balloon_page(struct page *page)
> > +{
> > + page->mapping->a_ops->invalidatepage(page, 0);
> > +}
> > +
> >
> > ...
> >
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


akpm at linux-foundation

Sep 18, 2012, 3:09 PM

Post #4 of 7 (73 views)
Permalink
Re: [PATCH v10 1/5] mm: introduce a common interface for balloon pages mobility [In reply to]

On Tue, 18 Sep 2012 13:24:21 -0300
Rafael Aquini <aquini [at] redhat> wrote:

> On Mon, Sep 17, 2012 at 03:15:43PM -0700, Andrew Morton wrote:
> > > +/* return code to identify when a ballooned page has been migrated */
> > > +#define BALLOON_MIGRATION_RETURN 0xba1100
> >
> > I didn't really spend enough time to work out why this was done this
> > way, but I know a hack when I see one!
> >
> Yes, I'm afraid it's a hack, but, unfortunately, it's a necessary one (IMHO).
>
> This 'distinct' return code is used to flag a sucessful balloon page migration
> at the following unmap_and_move() snippet (patch 2).
> If by any reason we fail to identify a sucessfull balloon page migration, we
> will cause a page leak, as the old 'page' won't be properly released.
> .....
> rc = __unmap_and_move(page, newpage, force, offlining, mode);
> +
> + if (unlikely(rc == BALLOON_MIGRATION_RETURN)) {
> + /*
> + * A ballooned page has been migrated already.
> + * Now, it's the time to remove the old page from the isolated
> + * pageset list and handle it back to Buddy, wrap-up counters
> + * and return.
> + */
> ......
>
> By reaching that point in code, we cannot rely on testing page->mapping flags
> anymore for both 'page' and 'newpage' because:
> a) migration has already finished and 'page'->mapping is wiped out;
> b) balloon might have started to deflate, and 'newpage' might be released
> already;
>
> If the return code approach is unnaceptable, we might defer the 'page'->mapping
> wipe-out step to that point in code for the balloon page case.
> That, however, tends to be a little bit heavier, IMHO, as it will require us to
> acquire the page lock once more to proceed the mapping wipe out, thus
> potentially introducing overhead by lock contention (specially when several
> parallel compaction threads are scanning pages for isolation)

I think the return code approach _is_ acceptable, but the
implementation could be improved.

As it stands, a naive caller could be expecting either 0 (success) or a
negative errno. A large positive return value could trigger havoc. We
can defend against such programming mistakes with code commentary, but
a better approach would be to enumerate the return values. Along the
lines of

/*
* Return values from addresss_space_operations.migratepage(). Returns a
* negative errno on failure.
*/
#define MIGRATEPAGE_SUCCESS 0
#define MIGRATEPAGE_BALLOON_THINGY 1 /* nice comment goes here */

and convert all callers to explicitly check for MIGRATEPAGE_SUCCESS,
not literal zero. We should be particularly careful to look for
codesites which are unprepared for positive return values, such as

ret = migratepage();
if (ret < 0)
return ret;
...
return ret; /* success!! */



If we wanted to be really vigilant about this, we could do

#define MIGRATEPAGE_SUCCESS 1
#define MIGRATEPAGE_BALLOON_THINGY 2

so any naive code which tests for literal zero will nicely explode early
in testing.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peterz at infradead

Sep 24, 2012, 5:44 AM

Post #5 of 7 (66 views)
Permalink
Re: [PATCH v10 1/5] mm: introduce a common interface for balloon pages mobility [In reply to]

On Mon, 2012-09-17 at 13:38 -0300, Rafael Aquini wrote:
> +static inline void assign_balloon_mapping(struct page *page,
> + struct address_space
> *mapping)
> +{
> + page->mapping = mapping;
> + smp_wmb();
> +}
> +
> +static inline void clear_balloon_mapping(struct page *page)
> +{
> + page->mapping = NULL;
> + smp_wmb();
> +}

barriers without a comment describing the data race are a mortal sin.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


mst at redhat

Sep 24, 2012, 6:05 PM

Post #6 of 7 (66 views)
Permalink
Re: [PATCH v10 1/5] mm: introduce a common interface for balloon pages mobility [In reply to]

On Tue, Sep 18, 2012 at 01:24:21PM -0300, Rafael Aquini wrote:
> > > +static inline void assign_balloon_mapping(struct page *page,
> > > + struct address_space *mapping)
> > > +{
> > > + page->mapping = mapping;
> > > + smp_wmb();
> > > +}
> > > +
> > > +static inline void clear_balloon_mapping(struct page *page)
> > > +{
> > > + page->mapping = NULL;
> > > + smp_wmb();
> > > +}
> > > +
> > > +static inline gfp_t balloon_mapping_gfp_mask(void)
> > > +{
> > > + return GFP_HIGHUSER_MOVABLE;
> > > +}
> > > +
> > > +static inline bool __is_movable_balloon_page(struct page *page)
> > > +{
> > > + struct address_space *mapping = ACCESS_ONCE(page->mapping);
> > > + smp_read_barrier_depends();
> > > + return mapping_balloon(mapping);
> > > +}
> >
> > hm. Are these barrier tricks copied from somewhere else, or home-made?
> >
>
> They were introduced by a reviewer request to assure the proper ordering when
> inserting or deleting pages to/from a balloon device, so a given page won't get
> elected as being a balloon page before it gets inserted into the balloon's page
> list, just as it will only be deleted from the balloon's page list after it is
> decomissioned of its balloon page status (page->mapping wipe-out).
>
> Despite the mentioned operations only take place under proper locking, I thought
> it wouldn't hurt enforcing such order, thus I kept the barrier stuff. Btw,
> considering the aforementioned usage case, I just realized the
> assign_balloon_mapping() barrier is misplaced. I'll fix that and introduce
> comments on those function's usage.

If these are all under page lock these barriers just confuse things,
because they are almost never enough by themselves.
So in that case it would be better to drop them and document
usage as you are going to.

Even better would be lockdep check but unfortunately it
does not seem to be possible for page lock.

--
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


aquini at redhat

Sep 25, 2012, 7:00 AM

Post #7 of 7 (62 views)
Permalink
Re: [PATCH v10 1/5] mm: introduce a common interface for balloon pages mobility [In reply to]

On Tue, Sep 25, 2012 at 03:05:49AM +0200, Michael S. Tsirkin wrote:
> If these are all under page lock these barriers just confuse things,
> because they are almost never enough by themselves.
> So in that case it would be better to drop them and document
> usage as you are going to.
>

Would the following make more sense (with the proprer comments, as well) ?

---8<---
+static inline void balloon_page_set(struct page *page,
+ struct address_space *mapping,
+ struct list_head *head)
+{
+ list_add(&page->lru, head);
+ smp_wmb();
+ page->mapping = mapping;
+}
+
+static inline void balloon_page_del(struct page *page)
+{
+ page->mapping = NULL;
+ smp_wmb();
+ list_del(&page->lru);
+}
+
+static inline bool __is_movable_balloon_page(struct page *page)
+{
+ struct address_space *mapping = ACCESS_ONCE(page->mapping);
+ smp_read_barrier_depends();
+ return mapping_balloon(mapping);
+}
+
---8<---

There's still a case where we have to test page->mapping->flags and we cannot
afford to wait for, or grab, the page lock @ isolate_migratepages_range().
The barriers won't avoid leak_ballon() racing against isolate_migratepages_range(),
but they surely will make tests for page->mapping more consistent. And for those
cases where leak_balloon() races against
isolate_migratepages_range->isolate_balloon_page(), we solve the conflict of
interest through page refcounting and page lock. I'm preparing a more extensive
doc to include at Documentation/ to explain the interfaces and how we cope with
these mentioned races, as well.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.