Mailing List Archive: Linux: Kernel

[RFC][2/3] Account and control virtual address space allocations

 

 


balbir at linux

Mar 16, 2008, 10:30 AM

Post #1 of 17 (2675 views)
Permalink
[RFC][2/3] Account and control virtual address space allocations

This patch implements accounting and control of virtual address space.
Accounting is done when the virtual address space of any task/mm_struct
belonging to the cgroup is incremented or decremented. This patch
fails the expansion if the cgroup goes over its limit. A new function
mem_cgroup_update_as() is added to deal with the accounting of the virtual
address space usage of cgroups.
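
As a rough userspace illustration of the accounting described above: a positive page count charges the counter and fails once the limit would be exceeded, while a negative count uncharges. This is a sketch under stated assumptions, not the kernel code. `struct as_counter`, `as_update()` and the 4096-byte `MODEL_PAGE_SIZE` are hypothetical stand-ins for `res_counter`, `mem_cgroup_update_as()` and `PAGE_SIZE`.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for this model */

/* Hypothetical stand-in for the kernel's res_counter. */
struct as_counter {
	unsigned long usage;	/* bytes currently charged */
	unsigned long limit;	/* maximum bytes allowed */
};

/*
 * Model of the update call: nr_pages > 0 charges, nr_pages <= 0 uncharges.
 * Returns 0 on success and 1 if the charge would exceed the limit.
 */
static int as_update(struct as_counter *c, long nr_pages)
{
	unsigned long bytes;

	assert(c->usage <= c->limit);	/* invariant kept by this model */

	if (nr_pages > 0) {
		bytes = (unsigned long)nr_pages * MODEL_PAGE_SIZE;
		if (bytes > c->limit - c->usage)
			return 1;	/* over limit: refuse the expansion */
		c->usage += bytes;
	} else {
		bytes = (unsigned long)(-nr_pages) * MODEL_PAGE_SIZE;
		/* clamp so the model's usage can never underflow */
		c->usage -= (bytes > c->usage) ? c->usage : bytes;
	}
	return 0;
}
```

A caller in this model would treat a non-zero return the way the patch treats it in mmap and brk: abandon the expansion and return -ENOMEM.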

TODOs

1. IA64 has code in perfmon.c, in pfm_smpl_buffer_alloc(), which increments
the total_vm of the mm_struct. This code has not yet been brought under
virtual address space control.
2. Virtual address space control is enabled only when CONFIG_MMU is
enabled. Should we do this for nommu cases as well? My suspicion is
that we don't have to.

Signed-off-by: Balbir Singh <balbir [at] linux>
---

arch/x86/kernel/ptrace.c | 10 +++++++++-
include/linux/memcontrol.h | 7 +++++++
init/Kconfig | 4 +++-
kernel/fork.c | 9 +++++++--
mm/memcontrol.c | 37 +++++++++++++++++++++++++++++++++++++
mm/memory.c | 5 +++++
mm/mmap.c | 22 ++++++++++++++++++++--
mm/mremap.c | 21 ++++++++++++++++++---
8 files changed, 106 insertions(+), 9 deletions(-)

diff -puN mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control mm/memcontrol.c
--- linux-2.6.25-rc5/mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/mm/memcontrol.c 2008-03-16 22:57:40.000000000 +0530
@@ -525,6 +525,32 @@ unsigned long mem_cgroup_isolate_pages(u
 }

 /*
+ * Check if the current cgroup exceeds its address space limit.
+ * Returns 0 on success and 1 on failure.
+ */
+int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
+{
+	int ret = 0;
+	struct mem_cgroup *mem;
+	if (mem_cgroup_subsys.disabled)
+		return ret;
+
+	rcu_read_lock();
+	mem = rcu_dereference(mm->mem_cgroup);
+	css_get(&mem->css);
+	rcu_read_unlock();
+
+	if (nr_pages > 0) {
+		if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
+			ret = 1;
+	} else
+		res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
+
+	css_put(&mem->css);
+	return ret;
+}
+
+/*
  * Charge the memory controller for page usage.
  * Return
  * 0 if the charge was successful
@@ -1103,6 +1129,17 @@ static void mem_cgroup_move_task(struct
goto out;

css_get(&mem->css);
+ /*
+ * For address space accounting, the charges are migrated.
+ * We need to migrate it since all the future uncharge/charge will
+ * now happen to the new cgroup. For consistency, we need to migrate
+ * all charges, otherwise we could end up dropping charges from
+ * the new cgroup (even though they were incurred in the current
+ * group).
+ */
+ if (res_counter_charge(&mem->as_res, mm->total_vm))
+ goto out;
+ res_counter_uncharge(&old_mem->as_res, mm->total_vm);
rcu_assign_pointer(mm->mem_cgroup, mem);
css_put(&old_mem->css);

diff -puN include/linux/memcontrol.h~memory-controller-virtual-address-space-accounting-and-control include/linux/memcontrol.h
--- linux-2.6.25-rc5/include/linux/memcontrol.h~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/include/linux/memcontrol.h 2008-03-16 22:57:40.000000000 +0530
@@ -54,6 +54,7 @@ int task_in_mem_cgroup(struct task_struc
extern int mem_cgroup_prepare_migration(struct page *page);
extern void mem_cgroup_end_migration(struct page *page);
extern void mem_cgroup_page_migration(struct page *page, struct page *newpage);
+extern int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages);

/*
* For memory reclaim.
@@ -172,6 +173,12 @@ static inline long mem_cgroup_calc_recla
{
return 0;
}
+
+static inline int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
+{
+ return 0;
+}
+
#endif /* CONFIG_CGROUP_MEM_CONT */

#endif /* _LINUX_MEMCONTROL_H */
diff -puN mm/mmap.c~memory-controller-virtual-address-space-accounting-and-control mm/mmap.c
--- linux-2.6.25-rc5/mm/mmap.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/mm/mmap.c 2008-03-16 22:57:40.000000000 +0530
@@ -1117,6 +1117,9 @@ munmap_back:
}
}

+ if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
+ return -ENOMEM;
+
/*
* Can we just expand an old private anonymous mapping?
* The VM_SHARED test is necessary because shmem_zero_setup
@@ -1226,8 +1229,11 @@ unmap_and_free_vma:
free_vma:
kmem_cache_free(vm_area_cachep, vma);
unacct_error:
- if (charged)
+ if (charged) {
+ mem_cgroup_update_as(mm, -charged);
vm_unacct_memory(charged);
+ }
+unacct_as_error:
return error;
}

@@ -1555,6 +1561,9 @@ static int acct_stack_growth(struct vm_a
if (security_vm_enough_memory(grow))
return -ENOMEM;

+ if (mem_cgroup_update_as(mm, grow))
+ return -ENOMEM;
+
/* Ok, everything looks good - let it rip */
mm->total_vm += grow;
if (vma->vm_flags & VM_LOCKED)
@@ -2003,9 +2012,14 @@ unsigned long do_brk(unsigned long addr,
if (mm->map_count > sysctl_max_map_count)
return -ENOMEM;

- if (security_vm_enough_memory(len >> PAGE_SHIFT))
+ if (mem_cgroup_update_as(mm, (len >> PAGE_SHIFT)))
return -ENOMEM;

+ if (security_vm_enough_memory(len >> PAGE_SHIFT)) {
+ mem_cgroup_update_as(mm, -(len >> PAGE_SHIFT));
+ return -ENOMEM;
+ }
+
/* Can we just expand an old private anonymous mapping? */
if (vma_merge(mm, prev, addr, addr + len, flags,
NULL, NULL, pgoff, NULL))
@@ -2236,6 +2250,9 @@ int install_special_mapping(struct mm_st
if (unlikely(vma == NULL))
return -ENOMEM;

+ if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
+ return -ENOMEM;
+
vma->vm_mm = mm;
vma->vm_start = addr;
vma->vm_end = addr + len;
@@ -2248,6 +2265,7 @@ int install_special_mapping(struct mm_st

if (unlikely(insert_vm_struct(mm, vma))) {
kmem_cache_free(vm_area_cachep, vma);
+ mem_cgroup_update_as(mm, -(len >> PAGE_SHIFT));
return -ENOMEM;
}

diff -puN arch/x86/kernel/ptrace.c~memory-controller-virtual-address-space-accounting-and-control arch/x86/kernel/ptrace.c
--- linux-2.6.25-rc5/arch/x86/kernel/ptrace.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/arch/x86/kernel/ptrace.c 2008-03-16 22:57:40.000000000 +0530
@@ -20,6 +20,7 @@
#include <linux/audit.h>
#include <linux/seccomp.h>
#include <linux/signal.h>
+#include <linux/memcontrol.h>

#include <asm/uaccess.h>
#include <asm/pgtable.h>
@@ -787,6 +788,8 @@ static int ptrace_bts_realloc(struct tas
current->mm->total_vm -= old_size;
current->mm->locked_vm -= old_size;

+ mem_cgroup_update_as(current->mm, -old_size);
+
if (size == 0)
goto out;

@@ -816,10 +819,15 @@ static int ptrace_bts_realloc(struct tas
goto out;
}

+ if (mem_cgroup_update_as(current->mm, size))
+ goto out;
+
ret = ds_allocate((void **)&child->thread.ds_area_msr,
size << PAGE_SHIFT);
- if (ret < 0)
+ if (ret < 0) {
+ mem_cgroup_update_as(current->mm, -size);
goto out;
+ }

current->mm->total_vm += size;
current->mm->locked_vm += size;
diff -puN kernel/fork.c~memory-controller-virtual-address-space-accounting-and-control kernel/fork.c
--- linux-2.6.25-rc5/kernel/fork.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/kernel/fork.c 2008-03-16 22:57:40.000000000 +0530
@@ -53,6 +53,7 @@
#include <linux/tty.h>
#include <linux/proc_fs.h>
#include <linux/blkdev.h>
+#include <linux/memcontrol.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -237,6 +238,7 @@ static int dup_mmap(struct mm_struct *mm

for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
struct file *file;
+ unsigned int len = vma_pages(mpnt);

if (mpnt->vm_flags & VM_DONTCOPY) {
long pages = vma_pages(mpnt);
@@ -247,11 +249,12 @@ static int dup_mmap(struct mm_struct *mm
}
charge = 0;
if (mpnt->vm_flags & VM_ACCOUNT) {
- unsigned int len = (mpnt->vm_end - mpnt->vm_start) >> PAGE_SHIFT;
if (security_vm_enough_memory(len))
goto fail_nomem;
charge = len;
}
+ if (mem_cgroup_update_as(mm, len))
+ goto fail_nomem_as;
tmp = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
if (!tmp)
goto fail_nomem;
@@ -311,8 +314,10 @@ out:
fail_nomem_policy:
kmem_cache_free(vm_area_cachep, tmp);
fail_nomem:
- retval = -ENOMEM;
+ mem_cgroup_update_as(mm, -charge);
vm_unacct_memory(charge);
+fail_nomem_as:
+ retval = -ENOMEM;
goto out;
}

diff -puN mm/mremap.c~memory-controller-virtual-address-space-accounting-and-control mm/mremap.c
--- linux-2.6.25-rc5/mm/mremap.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/mm/mremap.c 2008-03-16 22:57:40.000000000 +0530
@@ -174,10 +174,15 @@ static unsigned long move_vma(struct vm_
if (mm->map_count >= sysctl_max_map_count - 3)
return -ENOMEM;

+ if (mem_cgroup_update_as(mm, new_len >> PAGE_SHIFT))
+ return -ENOMEM;
+
new_pgoff = vma->vm_pgoff + ((old_addr - vma->vm_start) >> PAGE_SHIFT);
new_vma = copy_vma(&vma, new_addr, new_len, new_pgoff);
- if (!new_vma)
+ if (!new_vma) {
+ mem_cgroup_update_as(mm, -(new_len >> PAGE_SHIFT));
return -ENOMEM;
+ }

moved_len = move_page_tables(vma, old_addr, new_vma, new_addr, old_len);
if (moved_len < old_len) {
@@ -187,6 +192,7 @@ static unsigned long move_vma(struct vm_
* and then proceed to unmap new area instead of old.
*/
move_page_tables(new_vma, new_addr, vma, old_addr, moved_len);
+ mem_cgroup_update_as(mm, -(new_len >> PAGE_SHIFT));
vma = new_vma;
old_len = new_len;
old_addr = new_addr;
@@ -347,10 +353,17 @@ unsigned long do_mremap(unsigned long ad
goto out;
}

+ if (mem_cgroup_update_as(mm, (new_len - old_len) >> PAGE_SHIFT)) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
if (vma->vm_flags & VM_ACCOUNT) {
charged = (new_len - old_len) >> PAGE_SHIFT;
- if (security_vm_enough_memory(charged))
+ if (security_vm_enough_memory(charged)) {
+ mem_cgroup_update_as(mm, -charged);
goto out_nc;
+ }
}

/* old_len exactly to the end of the area..
@@ -406,8 +419,10 @@ unsigned long do_mremap(unsigned long ad
ret = move_vma(vma, addr, old_len, new_len, new_addr);
}
out:
- if (ret & ~PAGE_MASK)
+ if (ret & ~PAGE_MASK) {
vm_unacct_memory(charged);
+ mem_cgroup_update_as(mm, -charged);
+ }
out_nc:
return ret;
}
diff -puN init/Kconfig~memory-controller-virtual-address-space-accounting-and-control init/Kconfig
--- linux-2.6.25-rc5/init/Kconfig~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/init/Kconfig 2008-03-16 22:57:40.000000000 +0530
@@ -369,7 +369,9 @@ config CGROUP_MEM_RES_CTLR
depends on CGROUPS && RESOURCE_COUNTERS
help
Provides a memory resource controller that manages both page cache and
- RSS memory.
+ RSS memory. It also provides accounting and control of address
+ space allocations (along the lines of RLIMIT_AS) for cgroups
+ when CONFIG_MMU is enabled.

Note that setting this option increases fixed memory overhead
associated with each page of memory in the system by 4/8 bytes
diff -puN mm/swapfile.c~memory-controller-virtual-address-space-accounting-and-control mm/swapfile.c
diff -puN mm/memory.c~memory-controller-virtual-address-space-accounting-and-control mm/memory.c
--- linux-2.6.25-rc5/mm/memory.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530
+++ linux-2.6.25-rc5-balbir/mm/memory.c 2008-03-16 22:57:40.000000000 +0530
@@ -838,6 +838,11 @@ unsigned long unmap_vmas(struct mmu_gath

if (vma->vm_flags & VM_ACCOUNT)
*nr_accounted += (end - start) >> PAGE_SHIFT;
+ /*
+ * Unaccount used virtual memory for cgroups
+ */
+ mem_cgroup_update_as(vma->vm_mm,
+ ((long)(start - end)) >> PAGE_SHIFT);

while (start != end) {
if (!tlb_start_valid) {
_

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


menage at google

Mar 16, 2008, 7:02 PM

Post #2 of 17 (2623 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

On Mon, Mar 17, 2008 at 1:30 AM, Balbir Singh <balbir [at] linux> wrote:
> /*
> + * Check if the current cgroup exceeds its address space limit.
> + * Returns 0 on success and 1 on failure.
> + */
> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
> +{
> + int ret = 0;
> + struct mem_cgroup *mem;
> + if (mem_cgroup_subsys.disabled)
> + return ret;
> +
> + rcu_read_lock();
> + mem = rcu_dereference(mm->mem_cgroup);
> + css_get(&mem->css);
> + rcu_read_unlock();
> +

How about if this function avoided charging the root cgroup? You'd
save 4 atomic operations on a global data structure on every
mmap/munmap when the virtual address limit cgroup wasn't in use, which
could be significant on a large system. And I don't see situations
where you really need to limit the address space of the root cgroup.

Paul


balbir at linux

Mar 16, 2008, 7:57 PM

Post #3 of 17 (2623 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

Paul Menage wrote:
> On Mon, Mar 17, 2008 at 1:30 AM, Balbir Singh <balbir [at] linux> wrote:
>> /*
>> + * Check if the current cgroup exceeds its address space limit.
>> + * Returns 0 on success and 1 on failure.
>> + */
>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>> +{
>> + int ret = 0;
>> + struct mem_cgroup *mem;
>> + if (mem_cgroup_subsys.disabled)
>> + return ret;
>> +
>> + rcu_read_lock();
>> + mem = rcu_dereference(mm->mem_cgroup);
>> + css_get(&mem->css);
>> + rcu_read_unlock();
>> +
>
> How about if this function avoided charging the root cgroup? You'd
> save 4 atomic operations on a global data structure on every
> mmap/munmap when the virtual address limit cgroup wasn't in use, which
> could be significant on a large system. And I don't see situations
> where you really need to limit the address space of the root cgroup.

Saving 4 atomic operations is very tempting, but we want to account for root
usage for the following reasons:

1. We want to be able to support hierarchical accounting and control.
2. We want to track usage of the root cgroup and report it back to the user.
3. We don't want to treat the root cgroup as a special case.
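
One way hierarchical accounting could work is sketched below as a userspace model: a charge is applied to a group and to every ancestor up to the root, and is undone if any level is over its limit. This is an assumption for illustration only, since the patch as posted does not implement a hierarchy. `struct model_group` and `model_charge_hier()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical group with a link to its parent in the hierarchy. */
struct model_group {
	unsigned long usage;		/* pages charged at this level */
	unsigned long limit;		/* pages allowed at this level */
	struct model_group *parent;	/* NULL for the root group */
};

/*
 * Charge pages to grp and every ancestor up to the root. If any level
 * would exceed its limit, undo the levels already charged and return 1.
 */
static int model_charge_hier(struct model_group *grp, unsigned long pages)
{
	struct model_group *g, *undo;

	for (g = grp; g != NULL; g = g->parent) {
		assert(g->usage <= g->limit);
		if (pages > g->limit - g->usage)
			goto rollback;
		g->usage += pages;
	}
	return 0;

rollback:
	/* g is the level that refused; everything below it was charged */
	for (undo = grp; undo != g; undo = undo->parent)
		undo->usage -= pages;
	return 1;
}
```

In such a scheme the root group's counter is the top of every chain, which is one reason it cannot simply be skipped.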



--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


menage at google

Mar 16, 2008, 8:03 PM

Post #4 of 17 (2622 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

On Mon, Mar 17, 2008 at 10:57 AM, Balbir Singh
<balbir [at] linux> wrote:
>
> 1. We want to be able to support hierarchial accounting and control

> 2. We want to track usage of the root cgroup and report it back to the user

What use cases do you have for that?

> 3. We don't want to treat the root cgroup as a special case.

Why? It is a special case, in that in a lot of machines there's only
going to be the root cgroup, and the subsystem won't be mounted. So in
those cases, paying any overhead is a cost without a benefit.

Alternatively, how about you skip tracking virtual address space
changes if the virtual address cgroup isn't mounted on any hierarchy?
When you mount it, you can do a pass across all mms and set the root
cgroup usage to their total.
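
The alternative described here can be sketched in userspace as follows. All names (`struct model_mm`, `struct model_root`, `model_account()`, `model_mount()`) are hypothetical, and the sketch ignores the locking a real implementation would need against concurrent mmap/munmap during the mount-time pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mm_struct: only total_vm matters here. */
struct model_mm {
	unsigned long total_vm;	/* pages mapped by this address space */
};

struct model_root {
	bool mounted;		/* is the controller mounted anywhere? */
	unsigned long usage;	/* pages charged to the root group */
};

/* Fast path: skip cgroup accounting while the controller is unmounted. */
static void model_account(struct model_root *root, struct model_mm *mm,
			  long nr_pages)
{
	/* unsigned arithmetic wraps, so a negative count subtracts */
	mm->total_vm += (unsigned long)nr_pages;
	if (root->mounted)
		root->usage += (unsigned long)nr_pages;
}

/* On mount, one pass over all mms sets the root usage to their total. */
static void model_mount(struct model_root *root,
			const struct model_mm *mms, size_t n)
{
	size_t i;

	assert(mms != NULL || n == 0);
	root->usage = 0;
	for (i = 0; i < n; i++)
		root->usage += mms[i].total_vm;
	root->mounted = true;
}
```

The trade-off in this model is a one-time cost proportional to the number of address spaces at mount time, in exchange for no per-mmap cost on machines that never mount the controller.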

Paul


xemul at openvz

Mar 17, 2008, 4:36 AM

Post #5 of 17 (2619 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

[snip]

> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
> +{
> + int ret = 0;
> + struct mem_cgroup *mem;
> + if (mem_cgroup_subsys.disabled)
> + return ret;
> +
> + rcu_read_lock();
> + mem = rcu_dereference(mm->mem_cgroup);
> + css_get(&mem->css);
> + rcu_read_unlock();
> +
> + if (nr_pages > 0) {
> + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
> + ret = 1;
> + } else
> + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));

No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.
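
A sketch of what such a split might look like, again as a userspace model. The names `model_charge_as()` and `model_uncharge_as()`, `struct as_counter` and the 4096-byte page size are illustrative assumptions, not the interface that was eventually posted.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for this model */

/* Hypothetical stand-in for the kernel's res_counter. */
struct as_counter {
	unsigned long usage;	/* bytes currently charged */
	unsigned long limit;	/* maximum bytes allowed */
};

/* Charge nr_pages; returns 0, or -ENOMEM if the limit would be exceeded. */
static int model_charge_as(struct as_counter *c, unsigned long nr_pages)
{
	unsigned long bytes = nr_pages * MODEL_PAGE_SIZE;

	if (bytes > c->limit - c->usage)
		return -ENOMEM;
	c->usage += bytes;
	return 0;
}

/* Uncharge nr_pages; this direction cannot fail. */
static void model_uncharge_as(struct as_counter *c, unsigned long nr_pages)
{
	unsigned long bytes = nr_pages * MODEL_PAGE_SIZE;

	assert(bytes <= c->usage);	/* uncharging more than charged is a bug */
	c->usage -= bytes;
}
```

With unsigned counts in both directions, callers no longer negate a length to signal an uncharge, and only the charge path has a return value to check.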

[snip]

> @@ -1117,6 +1117,9 @@ munmap_back:
> }
> }
>
> + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
> + return -ENOMEM;
> +

Why not use the existing cap_vm_enough_memory() and co?

[snip]


balbir at linux

Mar 17, 2008, 5:29 AM

Post #6 of 17 (2641 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

Pavel Emelyanov wrote:
> [snip]
>
>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>> +{
>> + int ret = 0;
>> + struct mem_cgroup *mem;
>> + if (mem_cgroup_subsys.disabled)
>> + return ret;
>> +
>> + rcu_read_lock();
>> + mem = rcu_dereference(mm->mem_cgroup);
>> + css_get(&mem->css);
>> + rcu_read_unlock();
>> +
>> + if (nr_pages > 0) {
>> + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
>> + ret = 1;
>> + } else
>> + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
>
> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.
>
> [snip]
>

Yes, sure :)

>> @@ -1117,6 +1117,9 @@ munmap_back:
>> }
>> }
>>
>> + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
>> + return -ENOMEM;
>> +
>
> Why not use existintg cap_vm_enough_memory and co?
>

I thought about it and almost used may_expand_vm(), but there is a slight catch
there. cap_vm_enough_memory() and security_vm_enough_memory() are called
after total_vm has been calculated. In our case we need to keep the cgroup's
equivalent of total_vm up to date, and we do this in mem_cgroup_update_as().

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


xemul at openvz

Mar 17, 2008, 5:40 AM

Post #7 of 17 (2613 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

Balbir Singh wrote:
> Pavel Emelyanov wrote:
>> [snip]
>>
>>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>>> +{
>>> + int ret = 0;
>>> + struct mem_cgroup *mem;
>>> + if (mem_cgroup_subsys.disabled)
>>> + return ret;
>>> +
>>> + rcu_read_lock();
>>> + mem = rcu_dereference(mm->mem_cgroup);
>>> + css_get(&mem->css);
>>> + rcu_read_unlock();
>>> +
>>> + if (nr_pages > 0) {
>>> + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
>>> + ret = 1;
>>> + } else
>>> + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
>> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.
>>
>> [snip]
>>
>
> Yes, sure :)

Thanks :)

>>> @@ -1117,6 +1117,9 @@ munmap_back:
>>> }
>>> }
>>>
>>> + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
>>> + return -ENOMEM;
>>> +
>> Why not use existintg cap_vm_enough_memory and co?
>>
>
> I thought about it and almost used may_expand_vm(), but there is a slight catch
> there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are
> called after total_vm has been calculated. In our case we need to keep the
> cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as.

So? What prevents us from using these hooks? :)


balbir at linux

Mar 17, 2008, 5:51 AM

Post #8 of 17 (2625 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

Pavel Emelyanov wrote:
> Balbir Singh wrote:
>> Pavel Emelyanov wrote:
>>> [snip]
>>>
>>>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>>>> +{
>>>> + int ret = 0;
>>>> + struct mem_cgroup *mem;
>>>> + if (mem_cgroup_subsys.disabled)
>>>> + return ret;
>>>> +
>>>> + rcu_read_lock();
>>>> + mem = rcu_dereference(mm->mem_cgroup);
>>>> + css_get(&mem->css);
>>>> + rcu_read_unlock();
>>>> +
>>>> + if (nr_pages > 0) {
>>>> + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
>>>> + ret = 1;
>>>> + } else
>>>> + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
>>> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.
>>>
>>> [snip]
>>>
>> Yes, sure :)
>
> Thanks :)
>
>>>> @@ -1117,6 +1117,9 @@ munmap_back:
>>>> }
>>>> }
>>>>
>>>> + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
>>>> + return -ENOMEM;
>>>> +
>>> Why not use existintg cap_vm_enough_memory and co?
>>>
>> I thought about it and almost used may_expand_vm(), but there is a slight catch
>> there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are
>> called after total_vm has been calculated. In our case we need to keep the
>> cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as.
>
> So? What prevents us from using these hooks? :)

1. We need to account total_vm usage of the task anyway. So why have two places,
one for accounting and a second for control?
2. These hooks are invoked conditionally, only for vmas with VM_ACCOUNT
set.


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


xemul at openvz

Mar 17, 2008, 6:01 AM

Post #9 of 17 (2624 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

Balbir Singh wrote:
> Pavel Emelyanov wrote:
>> Balbir Singh wrote:
>>> Pavel Emelyanov wrote:
>>>> [snip]
>>>>
>>>>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>>>>> +{
>>>>> + int ret = 0;
>>>>> + struct mem_cgroup *mem;
>>>>> + if (mem_cgroup_subsys.disabled)
>>>>> + return ret;
>>>>> +
>>>>> + rcu_read_lock();
>>>>> + mem = rcu_dereference(mm->mem_cgroup);
>>>>> + css_get(&mem->css);
>>>>> + rcu_read_unlock();
>>>>> +
>>>>> + if (nr_pages > 0) {
>>>>> + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
>>>>> + ret = 1;
>>>>> + } else
>>>>> + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
>>>> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.
>>>>
>>>> [snip]
>>>>
>>> Yes, sure :)
>> Thanks :)
>>
>>>>> @@ -1117,6 +1117,9 @@ munmap_back:
>>>>> }
>>>>> }
>>>>>
>>>>> + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
>>>>> + return -ENOMEM;
>>>>> +
>>>> Why not use existintg cap_vm_enough_memory and co?
>>>>
>>> I thought about it and almost used may_expand_vm(), but there is a slight catch
>>> there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are
>>> called after total_vm has been calculated. In our case we need to keep the
>>> cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as.
>> So? What prevents us from using these hooks? :)
>
> 1. We need to account total_vm usage of the task anyway. So why have two places,
> one for accounting and second for control?

We still have two of them even when placing hooks in each place manually.

Besides, putting the mem_cgroup_(un)charge_as() calls in these vm hooks will
1. reduce the number of places to patch
2. help keep the memcgroup consistent in case someone adds more places
that expand a task's vm (arches, drivers) - if our hooks are
called from inside the vm ones, we won't have to patch more.

> 2. These hooks are activated for conditionally invoked for vma's with VM_ACCOUNT
> set.

This is a good point against. But, wrt my previous comment, can we handle
this somehow?


balbir at linux

Mar 17, 2008, 7:39 AM

Post #10 of 17 (2609 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

Pavel Emelyanov wrote:
> Balbir Singh wrote:
>> Pavel Emelyanov wrote:
>>> Balbir Singh wrote:
>>>> Pavel Emelyanov wrote:
>>>>> [snip]
>>>>>
>>>>>> +int mem_cgroup_update_as(struct mm_struct *mm, long nr_pages)
>>>>>> +{
>>>>>> + int ret = 0;
>>>>>> + struct mem_cgroup *mem;
>>>>>> + if (mem_cgroup_subsys.disabled)
>>>>>> + return ret;
>>>>>> +
>>>>>> + rcu_read_lock();
>>>>>> + mem = rcu_dereference(mm->mem_cgroup);
>>>>>> + css_get(&mem->css);
>>>>>> + rcu_read_unlock();
>>>>>> +
>>>>>> + if (nr_pages > 0) {
>>>>>> + if (res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE)))
>>>>>> + ret = 1;
>>>>>> + } else
>>>>>> + res_counter_uncharge(&mem->as_res, (-nr_pages * PAGE_SIZE));
>>>>> No, please, no. Let's make two calls - mem_cgroup_charge_as and mem_cgroup_uncharge_as.
>>>>>
>>>>> [snip]
>>>>>
>>>> Yes, sure :)
>>> Thanks :)
>>>
>>>>>> @@ -1117,6 +1117,9 @@ munmap_back:
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> + if (mem_cgroup_update_as(mm, len >> PAGE_SHIFT))
>>>>>> + return -ENOMEM;
>>>>>> +
>>>>> Why not use existintg cap_vm_enough_memory and co?
>>>>>
>>>> I thought about it and almost used may_expand_vm(), but there is a slight catch
>>>> there. With cap_vm_enough_memory() or security_vm_enough_memory(), they are
>>>> called after total_vm has been calculated. In our case we need to keep the
>>>> cgroups equivalent of total_vm up to date, and we do this in mem_cgorup_update_as.
>>> So? What prevents us from using these hooks? :)
>> 1. We need to account total_vm usage of the task anyway. So why have two places,
>> one for accounting and second for control?
>
> We still have two of them even placing hooks in each place manually.
>
> Besides, putting the mem_cgroup_(un)charge_as() in these vm hooks will
> 1. save the number of places to patch
> 2. help keeping memcgroup consistent in case someone adds more places
> that expand tasks vm (arches, drivers) - in case we have our hooks
> celled from inside vm ones, we won't have to patch more.
>

I am not sure I understand your proposal. Without manually placing these hooks,
how do we

1. Track when the vm size has increased/decreased?
2. Undo the charge if, for some reason, the call following these hooks fails?
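
The undo problem raised in the second question follows the pattern the patch uses in do_brk(): charge first, then roll the charge back if a later step fails. A minimal userspace sketch of that pattern, with hypothetical names (`struct model_counter`, `model_charge()`, `model_uncharge()`, `model_expand()`) and a flag standing in for the later step:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical counter, in pages. */
struct model_counter {
	unsigned long usage;
	unsigned long limit;
};

static int model_charge(struct model_counter *c, unsigned long pages)
{
	if (pages > c->limit - c->usage)
		return -ENOMEM;
	c->usage += pages;
	return 0;
}

static void model_uncharge(struct model_counter *c, unsigned long pages)
{
	assert(pages <= c->usage);
	c->usage -= pages;
}

/*
 * Charge the group first, then run a later step that may fail
 * (later_step_ok models e.g. a security check or an allocation).
 * On failure the charge is rolled back so the counter stays accurate.
 */
static int model_expand(struct model_counter *c, unsigned long pages,
			int later_step_ok)
{
	if (model_charge(c, pages))
		return -ENOMEM;
	if (!later_step_ok) {
		model_uncharge(c, pages);
		return -ENOMEM;
	}
	return 0;
}
```

Every call site that can fail after the charge needs its own rollback in this pattern, which is why the number of hook placements matters.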


>> 2. These hooks are activated for conditionally invoked for vma's with VM_ACCOUNT
>> set.
>
> This is a good point against. But, wrt my previous comment, can we handle
> this somehow?

I am not sure I understand.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


haveblue at us

Mar 17, 2008, 9:53 AM

Post #11 of 17 (2618 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

On Sun, 2008-03-16 at 23:00 +0530, Balbir Singh wrote:
> @@ -787,6 +788,8 @@ static int ptrace_bts_realloc(struct tas
> current->mm->total_vm -= old_size;
> current->mm->locked_vm -= old_size;
>
> + mem_cgroup_update_as(current->mm, -old_size);
> +
> if (size == 0)
> goto out;

I think splattering these things all over is probably a bad idea.

If you're going to do this, I think you need a couple of phases.

1. update the vm_(un)acct_memory() functions to take an mm
2. start using them (or some other abstracted functions in place)
3. update the new functions for cgroups

It's a bit non-obvious why you do the mem_cgroup_update_as() calls in
the places that you do from context.

Having some other vm-abstracted functions will also keep you from
splattering mem_cgroup_update_as() across the tree. That's a pretty bad
name. :) ...update_mapped() or ...update_vm() might be a wee bit
better.
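
A rough userspace sketch of the kind of vm-abstracted helpers suggested here: one pair of functions that takes the mm and updates both total_vm and the group counter, so callers never touch the cgroup code directly. `struct model_cgroup`, `struct model_mm`, `model_vm_expand()` and `model_vm_shrink()` are hypothetical names, not existing kernel functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical group counter, in pages. */
struct model_cgroup {
	unsigned long usage;
	unsigned long limit;
};

/* Hypothetical stand-in for an mm_struct. */
struct model_mm {
	unsigned long total_vm;		/* pages mapped */
	struct model_cgroup *cg;	/* group that owns this mm */
};

/* Single entry point for growing an address space. */
static int model_vm_expand(struct model_mm *mm, unsigned long pages)
{
	if (pages > mm->cg->limit - mm->cg->usage)
		return -ENOMEM;
	mm->cg->usage += pages;
	mm->total_vm += pages;
	return 0;
}

/* Single entry point for shrinking it again. */
static void model_vm_shrink(struct model_mm *mm, unsigned long pages)
{
	assert(pages <= mm->total_vm && pages <= mm->cg->usage);
	mm->cg->usage -= pages;
	mm->total_vm -= pages;
}
```

Because total_vm and the group counter change together in this model, a new call site that grows an address space picks up the cgroup accounting without a separate hook.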

-- Dave



yamamoto at valinux

Mar 17, 2008, 4:35 PM

Post #12 of 17 (2615 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

> diff -puN mm/swapfile.c~memory-controller-virtual-address-space-accounting-and-control mm/swapfile.c
> diff -puN mm/memory.c~memory-controller-virtual-address-space-accounting-and-control mm/memory.c
> --- linux-2.6.25-rc5/mm/memory.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530
> +++ linux-2.6.25-rc5-balbir/mm/memory.c 2008-03-16 22:57:40.000000000 +0530
> @@ -838,6 +838,11 @@ unsigned long unmap_vmas(struct mmu_gath
>
> if (vma->vm_flags & VM_ACCOUNT)
> *nr_accounted += (end - start) >> PAGE_SHIFT;
> + /*
> + * Unaccount used virtual memory for cgroups
> + */
> + mem_cgroup_update_as(vma->vm_mm,
> + ((long)(start - end)) >> PAGE_SHIFT);
>
> while (start != end) {
> if (!tlb_start_valid) {

I think you can sum the lengths and uncharge them with a single call.

YAMAMOTO Takashi


balbir at linux

Mar 17, 2008, 6:10 PM

Post #13 of 17 (2617 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

YAMAMOTO Takashi wrote:
>> diff -puN mm/swapfile.c~memory-controller-virtual-address-space-accounting-and-control mm/swapfile.c
>> diff -puN mm/memory.c~memory-controller-virtual-address-space-accounting-and-control mm/memory.c
>> --- linux-2.6.25-rc5/mm/memory.c~memory-controller-virtual-address-space-accounting-and-control 2008-03-16 22:57:40.000000000 +0530
>> +++ linux-2.6.25-rc5-balbir/mm/memory.c 2008-03-16 22:57:40.000000000 +0530
>> @@ -838,6 +838,11 @@ unsigned long unmap_vmas(struct mmu_gath
>>
>> if (vma->vm_flags & VM_ACCOUNT)
>> *nr_accounted += (end - start) >> PAGE_SHIFT;
>> + /*
>> + * Unaccount used virtual memory for cgroups
>> + */
>> + mem_cgroup_update_as(vma->vm_mm,
>> + ((long)(start - end)) >> PAGE_SHIFT);
>>
>> while (start != end) {
>> if (!tlb_start_valid) {
>
> i think you can sum and uncharge it with a single call.
>

Like nr_accounted? I'll have to duplicate nr_accounted, since it is updated
only for vmas with VM_ACCOUNT set.
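
The duplication mentioned here could look roughly like the following userspace sketch: one walk over the vmas keeps two running totals, so the address space counter can be uncharged once at the end. `struct model_vma`, `model_unmap_vmas()`, `MODEL_VM_ACCOUNT` and the 12-bit page shift are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4096-byte pages */
#define MODEL_VM_ACCOUNT 0x1UL	/* stand-in for VM_ACCOUNT */

/* Hypothetical stand-in for a vm_area_struct. */
struct model_vma {
	unsigned long start, end;	/* page-aligned byte addresses */
	unsigned long flags;
};

/*
 * Walk the vmas once, keeping two running totals: *nr_accounted only for
 * VM_ACCOUNT vmas, *nr_as for every vma. The caller can then uncharge the
 * address space counter a single time with *nr_as.
 */
static void model_unmap_vmas(const struct model_vma *vmas, size_t n,
			     unsigned long *nr_accounted,
			     unsigned long *nr_as)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long pages;

		assert(vmas[i].end >= vmas[i].start);
		pages = (vmas[i].end - vmas[i].start) >> MODEL_PAGE_SHIFT;

		if (vmas[i].flags & MODEL_VM_ACCOUNT)
			*nr_accounted += pages;
		*nr_as += pages;
	}
}
```

This replaces one counter update per vma with a single update per unmap operation, at the cost of carrying a second total next to nr_accounted.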

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


balbir at linux

Mar 17, 2008, 6:14 PM

Post #14 of 17 (2612 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

Dave Hansen wrote:
> On Sun, 2008-03-16 at 23:00 +0530, Balbir Singh wrote:
>> @@ -787,6 +788,8 @@ static int ptrace_bts_realloc(struct tas
>> current->mm->total_vm -= old_size;
>> current->mm->locked_vm -= old_size;
>>
>> + mem_cgroup_update_as(current->mm, -old_size);
>> +
>> if (size == 0)
>> goto out;
>
> I think splattering these things all over is probably a bad idea.
>

I agree, and I tried to avoid the splattering.

> If you're going to do this, I think you need a couple of phases.
>
> 1. update the vm_(un)acct_memory() functions to take an mm

There are other problems:

1. vm_(un)acct_memory() is conditionally dependent on VM_ACCOUNT. Look at
shmem_(un)acct_size() for example.
2. These routines are not called from all contexts that we care about (look at
insert_special_mapping()).

> 2. start using them (or some other abstracted functions in place)
> 3. update the new functions for cgroups
>
> It's a bit non-obvious why you do the mem_cgroup_update_as() calls in
> the places that you do from context.
>
> Having some other vm-abstracted functions will also keep you from
> splattering mem_cgroup_update_as() across the tree. That's a pretty bad
> name. :) ...update_mapped() or ...update_vm() might be a wee bit
> better.
>

I am going to split mem_cgroup_update_as() into two routines with better names.
I agree with you in principle about splattering, but please see my comments above.
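
One possible shape for such a split, written as a standalone userspace
sketch: a charge routine that can fail and an uncharge routine that cannot.
The names as_charge_sketch() and as_uncharge_sketch() and the struct are
hypothetical, not names proposed in this thread. The return convention (0 on
success, 1 on failure) follows the comment in the posted patch.

```c
/* Hypothetical per-cgroup address space counter, in pages. */
struct as_counter {
    unsigned long usage;
    unsigned long limit;
};

/*
 * Charge nr_pages against the counter.
 * Returns 0 on success and 1 if the charge would exceed the limit,
 * in which case usage is left unchanged.
 */
static int as_charge_sketch(struct as_counter *c, unsigned long nr_pages)
{
    if (c->usage > c->limit || nr_pages > c->limit - c->usage)
        return 1;
    c->usage += nr_pages;
    return 0;
}

/* Uncharge nr_pages; clamps at zero rather than underflowing. */
static void as_uncharge_sketch(struct as_counter *c, unsigned long nr_pages)
{
    if (nr_pages > c->usage)
        nr_pages = c->usage;
    c->usage -= nr_pages;
}
```

Separating the two directions avoids passing negative deltas such as
-old_size through a single update function, which is one reading of why the
original name drew criticism.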

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL


haveblue at us

Mar 18, 2008, 10:11 AM

Post #15 of 17 (2611 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

On Tue, 2008-03-18 at 06:44 +0530, Balbir Singh wrote:
> > If you're going to do this, I think you need a couple of phases.
> >
> > 1. update the vm_(un)acct_memory() functions to take an mm
>
> There are other problems
>
> 1. vm_(un)acct_memory is conditionally dependent on VM_ACCOUNT. Look at
> shmem_(un)acct_size for example

Yeah, but if VM_ACCOUNT isn't set, do you really want the controller
accounting for them? It's there for a reason. :)

The shmem_acct_size() helpers look good. I wonder if we should be using
that kind of thing more generically.

> 2. These routines are not called from all contexts that we care about (look at
> insert_special_mapping())

Could you explain why "we" care about it and why it isn't accounted for
now?

-- Dave



balbir at linux

Mar 18, 2008, 10:58 AM

Post #16 of 17 (2631 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]



balbir at linux

Mar 18, 2008, 10:58 AM

Post #17 of 17 (2631 views)
Permalink
Re: [RFC][2/3] Account and control virtual address space allocations [In reply to]

Dave Hansen wrote:
> On Tue, 2008-03-18 at 06:44 +0530, Balbir Singh wrote:
>>> If you're going to do this, I think you need a couple of phases.
>>>
>>> 1. update the vm_(un)acct_memory() functions to take an mm
>> There are other problems
>>
>> 1. vm_(un)acct_memory is conditionally dependent on VM_ACCOUNT. Look at
>> shmem_(un)acct_size for example
>
> Yeah, but if VM_ACCOUNT isn't set, do you really want the controller
> accounting for them? It's there for a reason. :)
>

We are trying to account for virtual memory usage. Please see
http://lwn.net/Articles/5016/ or Documentation/vm/overcommit-accounting
to see what VM_ACCOUNT does. We want to account for and control virtual
memory usage, not necessarily implement overcommit accounting.

> The shmem_acct_size() helpers look good. I wonder if we should be using
> that kind of things more generically.
>

Yes, it is well written. I wish there were more such abstractions, but it does
not help us.

>> 2. These routines are not called from all contexts that we care about (look at
>> insert_special_mapping())
>
> Could you explain why "we" care about it and why it isn't accounted for
> now?

It is accounted for in total_vm, and that's why we care about it :)
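
To illustrate the point about total_vm: if every change to total_vm went
through one helper, the cgroup charge could live there and would cover paths
that never call vm_acct_memory(). The following is only a userspace sketch
under that assumption. struct mm_sketch, struct as_cgroup_sketch and
mm_adjust_total_vm() are hypothetical and do not exist in the kernel or in
this patch.

```c
#include <stddef.h>

/* Hypothetical per-cgroup address space state, in pages. */
struct as_cgroup_sketch {
    long usage_pages;
    long limit_pages;
};

/* Hypothetical, simplified stand-in for mm_struct. */
struct mm_sketch {
    unsigned long total_vm;
    struct as_cgroup_sketch *cg; /* NULL if the mm is in no cgroup */
};

/*
 * Adjust total_vm and the cgroup usage together.
 * Returns 0 on success and 1 if growing would exceed the cgroup limit,
 * in which case nothing is changed. Shrinking always succeeds.
 */
static int mm_adjust_total_vm(struct mm_sketch *mm, long delta_pages)
{
    struct as_cgroup_sketch *cg = mm->cg;

    if (delta_pages > 0 && cg &&
        cg->usage_pages + delta_pages > cg->limit_pages)
        return 1;

    if (delta_pages < 0)
        mm->total_vm -= (unsigned long)(-delta_pages);
    else
        mm->total_vm += (unsigned long)delta_pages;

    if (cg)
        cg->usage_pages += delta_pages;
    return 0;
}
```

Because the helper is keyed on total_vm rather than on VM_ACCOUNT, mappings
that skip overcommit accounting would still be charged, which matches the
stated goal of controlling virtual memory usage as a whole.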

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
