Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

Re: [PATCH] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE\

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


konrad at darnok

Jun 9, 2012, 7:03 PM

Post #1 of 4 (98 views)
Permalink
Re: [PATCH] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE\

On Thu, Jun 07, 2012 at 11:00:33PM +0200, Andrea Arcangeli wrote:
> In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding
> the mmap_sem for reading, cmpxchg8b cannot be used to read pmd
> contents under Xen.
>
> So instead of dealing only with "consistent" pmdvals in
> pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
> simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
> where the low 32bit and high 32bit could be inconsistent (to avoid
> having to use cmpxchg8b).

<nods>
>
> The only guarantee we get from pmd_read_atomic is that if the low part
> of the pmd was found null, the high part will be null too (so the pmd
> will be considered unstable). And if the low part of the pmd is found
> "stable" later, then it means the whole pmd was read atomically
> (because after a pmd is stable, neither MADV_DONTNEED nor page faults
> can alter it anymore, and we read the high part after the low part).
>
> In the 32bit PAE x86 case, it is enough to read the low part of the
> pmdval atomically to declare the pmd as "stable" and that's true for
> THP and no THP, furthermore in the THP case we also have a barrier()
> that will prevent any inconsistent pmdvals to be cached by a later
> re-read of the *pmd.

Nice. Andrew, any chane you could test this patch on the affected
Xen hypervisors? Was it as easy to reproduce this on a RHEL5 (U1?)
hypervisor or is it really only on Linode and Amazon EC2?

>
> Signed-off-by: Andrea Arcangeli <aarcange [at] redhat>
> ---
> arch/x86/include/asm/pgtable-3level.h | 30 +++++++++++++++++-------------
> include/asm-generic/pgtable.h | 10 ++++++++++
> 2 files changed, 27 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
> index 43876f1..cb00ccc 100644
> --- a/arch/x86/include/asm/pgtable-3level.h
> +++ b/arch/x86/include/asm/pgtable-3level.h
> @@ -47,16 +47,26 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
> * they can run pmd_offset_map_lock or pmd_trans_huge or other pmd
> * operations.
> *
> - * Without THP if the mmap_sem is hold for reading, the
> - * pmd can only transition from null to not null while pmd_read_atomic runs.
> - * So there's no need of literally reading it atomically.
> + * Without THP if the mmap_sem is hold for reading, the pmd can only
> + * transition from null to not null while pmd_read_atomic runs. So
> + * we can always return atomic pmd values with this function.
> *
> * With THP if the mmap_sem is hold for reading, the pmd can become
> - * THP or null or point to a pte (and in turn become "stable") at any
> - * time under pmd_read_atomic, so it's mandatory to read it atomically
> - * with cmpxchg8b.
> + * trans_huge or none or point to a pte (and in turn become "stable")
> + * at any time under pmd_read_atomic. We could read it really
> + * atomically here with a atomic64_read for the THP enabled case (and
> + * it would be a whole lot simpler), but to avoid using cmpxchg8b we
> + * only return an atomic pmdval if the low part of the pmdval is later
> + * found stable (i.e. pointing to a pte). And we're returning a none
> + * pmdval if the low part of the pmd is none. In some cases the high
> + * and low part of the pmdval returned may not be consistent if THP is
> + * enabled (the low part may point to previously mapped hugepage,
> + * while the high part may point to a more recently mapped hugepage),
> + * but pmd_none_or_trans_huge_or_clear_bad() only needs the low part
> + * of the pmd to be read atomically to decide if the pmd is unstable
> + * or not, with the only exception of when the low part of the pmd is
> + * zero in which case we return a none pmd.
> */
> -#ifndef CONFIG_TRANSPARENT_HUGEPAGE
> static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
> {
> pmdval_t ret;
> @@ -74,12 +84,6 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
>
> return (pmd_t) { ret };
> }
> -#else /* CONFIG_TRANSPARENT_HUGEPAGE */
> -static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
> -{
> - return (pmd_t) { atomic64_read((atomic64_t *)pmdp) };
> -}
> -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
> {
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index ae39c4b..0ff87ec 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -484,6 +484,16 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
> /*
> * The barrier will stabilize the pmdval in a register or on
> * the stack so that it will stop changing under the code.
> + *
> + * When CONFIG_TRANSPARENT_HUGEPAGE=y on x86 32bit PAE,
> + * pmd_read_atomic is allowed to return a not atomic pmdval
> + * (for example pointing to an hugepage that has never been
> + * mapped in the pmd). The below checks will only care about
> + * the low part of the pmd with 32bit PAE x86 anyway, with the
> + * exception of pmd_none(). So the important thing is that if
> + * the low part of the pmd is found null, the high part will
> + * be also null or the pmd_none() check below would be
> + * confused.
> */
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> barrier();
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo [at] kvack For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont [at] kvack"> email [at] kvack </a>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


drjones at redhat

Jun 11, 2012, 3:34 AM

Post #2 of 4 (92 views)
Permalink
Re: [Xen-devel] [PATCH] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE\ [In reply to]

----- Original Message -----
> On Thu, Jun 07, 2012 at 11:00:33PM +0200, Andrea Arcangeli wrote:
> > In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while
> > holding
> > the mmap_sem for reading, cmpxchg8b cannot be used to read pmd
> > contents under Xen.
> >
> > So instead of dealing only with "consistent" pmdvals in
> > pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
> > simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with
> > pmdvals
> > where the low 32bit and high 32bit could be inconsistent (to avoid
> > having to use cmpxchg8b).
>
> <nods>
> >
> > The only guarantee we get from pmd_read_atomic is that if the low
> > part
> > of the pmd was found null, the high part will be null too (so the
> > pmd
> > will be considered unstable). And if the low part of the pmd is
> > found
> > "stable" later, then it means the whole pmd was read atomically
> > (because after a pmd is stable, neither MADV_DONTNEED nor page
> > faults
> > can alter it anymore, and we read the high part after the low
> > part).
> >
> > In the 32bit PAE x86 case, it is enough to read the low part of the
> > pmdval atomically to declare the pmd as "stable" and that's true
> > for
> > THP and no THP, furthermore in the THP case we also have a
> > barrier()
> > that will prevent any inconsistent pmdvals to be cached by a later
> > re-read of the *pmd.
>
> Nice. Andrew, any chane you could test this patch on the affected
> Xen hypervisors? Was it as easy to reproduce this on a RHEL5 (U1?)
> hypervisor or is it really only on Linode and Amazon EC2?
>

Originally, I was able to reproduce the issue easily with a RHEL5
host. Now, with this patch it's fixed.

Drew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


konrad.wilk at oracle

Jun 11, 2012, 12:27 PM

Post #3 of 4 (91 views)
Permalink
Re: [Xen-devel] [PATCH] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE\ [In reply to]

> > Nice. Andrew, any chane you could test this patch on the affected
> > Xen hypervisors? Was it as easy to reproduce this on a RHEL5 (U1?)
> > hypervisor or is it really only on Linode and Amazon EC2?
> >
>
> Originally, I was able to reproduce the issue easily with a RHEL5
> host. Now, with this patch it's fixed.

OK, so Tested-by: Andrew Jones..
and from my perspective it looks good - so Acked-by: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>

Andrea, any chance you can respin this patch and send it to Linus for 3.5 please?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


aarcange at redhat

Jun 11, 2012, 12:41 PM

Post #4 of 4 (91 views)
Permalink
Re: [Xen-devel] [PATCH] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE\ [In reply to]

Hi,

On Mon, Jun 11, 2012 at 03:27:38PM -0400, Konrad Rzeszutek Wilk wrote:
> > > Nice. Andrew, any chane you could test this patch on the affected
> > > Xen hypervisors? Was it as easy to reproduce this on a RHEL5 (U1?)
> > > hypervisor or is it really only on Linode and Amazon EC2?
> > >
> >
> > Originally, I was able to reproduce the issue easily with a RHEL5
> > host. Now, with this patch it's fixed.
>
> OK, so Tested-by: Andrew Jones..
> and from my perspective it looks good - so Acked-by: Konrad Rzeszutek Wilk <konrad.wilk [at] oracle>

Thanks for testing and reviews.

> Andrea, any chance you can respin this patch and send it to Linus for 3.5 please?

Andrew merged it in -mm last Friday, so I would expect it to go
upstream soon through the -mm flow (I assume everyone has been
rightfully waiting a bit of time for testing and reviews to be sure).

Andrea
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.