
Mailing List Archive: Linux: Kernel

[PATCH 3/5] x86/pvclock: add vsyscall implementation

 

 



jeremy at goop

Oct 27, 2009, 11:20 AM

Post #26 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On 10/27/09 10:29, Dan Magenheimer wrote:
> Is there any way for an application to conclusively determine
> programmatically if the "fast vsyscall" pvclock is functional
> vs the much much slower gettimeofday/clock_gettime equivalents?
>
> If not, might it be possible to implement some (sysfs?)
> way to determine this, that would also be backwards compatible
> to existing OS's that don't have pvclock+vsyscall supported?
>

It would probably be simplest and most portable for the app to just
measure the performance of gettimeofday and see if it meets its needs.

J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
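Jeremy's suggestion amounts to timing a tight loop of gettimeofday() calls. A minimal C sketch, assuming Linux with glibc; the function names and the keep-the-best-of-several-trials policy are illustrative choices, not something proposed in the thread. Taking the minimum is one way to damp noise from preemption or hypervisor steal time, but it does not make the figure trustworthy in every VM.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds; used only to time the loop. */
static uint64_t mono_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/*
 * Estimated cost of one gettimeofday() call in nanoseconds.
 * Runs `trials` loops of `calls` calls each and returns the cheapest
 * per-call average, i.e. the sample least disturbed by preemption.
 */
static double gettimeofday_cost_ns(int calls, int trials)
{
    double best = -1.0;
    for (int t = 0; t < trials; t++) {
        struct timeval tv;
        uint64_t start = mono_now_ns();
        for (int i = 0; i < calls; i++)
            gettimeofday(&tv, NULL);
        double per_call = (double)(mono_now_ns() - start) / calls;
        if (best < 0.0 || per_call < best)
            best = per_call;
    }
    return best;
}
```

An application would compare, say, gettimeofday_cost_ns(100000, 5) against a per-call budget of its own choosing; nothing here defines what counts as "fast".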


avi at redhat

Oct 27, 2009, 10:52 PM

Post #27 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On 10/27/2009 08:20 PM, Jeremy Fitzhardinge wrote:
> On 10/27/09 10:29, Dan Magenheimer wrote:
>
>> Is there any way for an application to conclusively determine
>> programmatically if the "fast vsyscall" pvclock is functional
>> vs the much much slower gettimeofday/clock_gettime equivalents?
>>
>> If not, might it be possible to implement some (sysfs?)
>> way to determine this, that would also be backwards compatible
>> to existing OS's that don't have pvclock+vsyscall supported?
>>
>>
> It would probably be simplest and most portable for the app to just
> measure the performance of gettimeofday and see if it meets its needs.
>

How can you reliably measure performance in a virtualized environment?

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



glommer at redhat

Oct 28, 2009, 2:29 AM

Post #28 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On Wed, Oct 28, 2009 at 07:52:04AM +0200, Avi Kivity wrote:
> On 10/27/2009 08:20 PM, Jeremy Fitzhardinge wrote:
>> On 10/27/09 10:29, Dan Magenheimer wrote:
>>
>>> Is there any way for an application to conclusively determine
>>> programmatically if the "fast vsyscall" pvclock is functional
>>> vs the much much slower gettimeofday/clock_gettime equivalents?
>>>
>>> If not, might it be possible to implement some (sysfs?)
>>> way to determine this, that would also be backwards compatible
>>> to existing OS's that don't have pvclock+vsyscall supported?
>>>
>>>
>> It would probably be simplest and most portable for the app to just
>> measure the performance of gettimeofday and see if it meets its needs.
>>
>
> How can you reliably measure performance in a virtualized environment?
If we loop gettimeofday(), I would expect the vsyscall-based version not to show
up in strace, right?
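Glauber's check can also be approximated inside the process, without strace, by comparing the libc gettimeofday() wrapper against the same operation forced through a real system call with syscall(SYS_gettimeofday, ...). A rough C sketch for x86 Linux; the helper names and the ratio threshold are assumptions, and, as Avi notes, a noisy VM can still skew both timings.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Average cost of the libc wrapper, which may be served by the vDSO. */
static double libc_gettimeofday_ns(int calls)
{
    struct timeval tv;
    uint64_t start = now_ns();
    for (int i = 0; i < calls; i++)
        gettimeofday(&tv, NULL);
    return (double)(now_ns() - start) / calls;
}

/* Average cost of the same operation forced through a real system call. */
static double raw_gettimeofday_ns(int calls)
{
    struct timeval tv;
    uint64_t start = now_ns();
    for (int i = 0; i < calls; i++)
        syscall(SYS_gettimeofday, &tv, NULL);
    return (double)(now_ns() - start) / calls;
}

/*
 * Heuristic only: returns 1 if the wrapper is at least `ratio` times
 * cheaper than the forced syscall, suggesting it did not enter the kernel.
 */
static int gettimeofday_looks_fast(int calls, double ratio)
{
    return libc_gettimeofday_ns(calls) * ratio < raw_gettimeofday_ns(calls);
}
```

To try Glauber's version directly, run a loop of plain gettimeofday() calls under `strace -c -e trace=gettimeofday`; when the call is served without entering the kernel, strace reports no gettimeofday calls.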


avi at redhat

Oct 28, 2009, 2:34 AM

Post #29 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On 10/28/2009 11:29 AM, Glauber Costa wrote:
>> How can you reliably measure performance in a virtualized environment?
>>
> If we loop gettimeofday(), I would expect the vsyscall-based version not to show
> up in strace, right?
>

Much better to have an API for this. Life is hacky enough already.

--
error compiling committee.c: too many arguments to function
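Short of a dedicated API, Linux already exposes the active clocksource through sysfs, which is close to the interface Dan asked about. A C sketch that reads it (the helper name is hypothetical). The name is only indirect evidence: whether the vDSO can serve time from that clocksource without a syscall depends on the kernel version and configuration, which is what this patch series changes for pvclock.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copies the name of the active clocksource (for example "tsc",
 * "kvm-clock", "xen" or "hpet") into buf.
 * Returns 0 on success, -1 if the file is missing or unreadable.
 */
static int current_clocksource(char *buf, size_t len)
{
    FILE *f = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```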



jeremy at goop

Oct 28, 2009, 10:47 AM

Post #30 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On 10/28/09 02:34, Avi Kivity wrote:
> On 10/28/2009 11:29 AM, Glauber Costa wrote:
>>> How can you reliably measure performance in a virtualized environment?
>>>
>> If we loop gettimeofday(), I would expect the vsyscall-based version
>> not to show up in strace, right?
>>
>
> Much better to have an API for this. Life is hacky enough already.

My point is that if an app cares about property X then it should just
measure property X. The fact that gettimeofday is a vsyscall is just an
implementation detail that apps don't really care about. What they care
about is whether gettimeofday is fast or not.

If the environment has such unstable timing that the effect can't be
measured, then it is moot whether it's a vsyscall or not (but in that
case it's almost certainly better to use the standard API rather than
trying to roll your own timesource with rdtsc).

J


avi at redhat

Oct 29, 2009, 5:13 AM

Post #31 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On 10/28/2009 07:47 PM, Jeremy Fitzhardinge wrote:
>> Much better to have an API for this. Life is hacky enough already.
>>
> My point is that if an app cares about property X then it should just
> measure property X. The fact that gettimeofday is a vsyscall is just an
> implementation detail that apps don't really care about. What they care
> about is whether gettimeofday is fast or not.
>

But we cannot make a reliable measurement.

> If the environment has such unstable timing that the effect can't be
> measured, then it is moot whether it's a vsyscall or not (but in that
> case it's almost certainly better to use the standard API rather than
> trying to roll your own timesource with rdtsc).
>

If you're interested in gettimeofday() for a global monotonic counter
you can fall back to atomic_fetch_and_add(), which will be faster than a
syscall even on large systems. Maybe we should provide a vsyscall for
global monotonic counters and implement it using atomics or the tsc
instead of these hacks (I'm assuming here that the gettimeofday() calls
are used to implement an atomic counter - are they?)

--
error compiling committee.c: too many arguments to function
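Avi's atomic_fetch_and_add() is a generic name; in C11 the corresponding operation is atomic_fetch_add() from <stdatomic.h>. A minimal sketch of a process-wide monotonic counter, assuming a single process (sharing it between processes would need shared memory); the names are hypothetical. It provides ordering only and carries no relation to wall-clock time.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Process-wide sequence counter shared by all threads (hypothetical). */
static _Atomic uint64_t global_seq = 0;

/*
 * Returns a value strictly greater than any value previously returned
 * to any thread in this process. atomic_fetch_add returns the old
 * value, so 1 is added to make the first ticket 1.
 */
static uint64_t next_ticket(void)
{
    return atomic_fetch_add(&global_seq, 1) + 1;
}
```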



chris.mason at oracle

Oct 29, 2009, 6:03 AM

Post #32 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On Thu, Oct 29, 2009 at 02:13:50PM +0200, Avi Kivity wrote:
> On 10/28/2009 07:47 PM, Jeremy Fitzhardinge wrote:
> >>Much better to have an API for this. Life is hacky enough already.
> >My point is that if an app cares about property X then it should just
> >measure property X. The fact that gettimeofday is a vsyscall is just an
> >implementation detail that apps don't really care about. What they care
> >about is whether gettimeofday is fast or not.
>
> But we cannot make a reliable measurement.

I can't imagine how we'd decide what "fast" is. Please don't make the
applications guess.

-chris


dan.magenheimer at oracle

Oct 29, 2009, 7:46 AM

Post #33 of 50
RE: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

> From: Avi Kivity [mailto:avi [at] redhat]
>
> On 10/28/2009 07:47 PM, Jeremy Fitzhardinge wrote:
> >> Much better to have an API for this. Life is hacky enough already.
> >>
> > My point is that if an app cares about property X then it should just
> > measure property X. The fact that gettimeofday is a vsyscall is just an
> > implementation detail that apps don't really care about. What they care
> > about is whether gettimeofday is fast or not.
> >
>
> But we cannot make a reliable measurement.
>
> > If the environment has such unstable timing that the effect can't be
> > measured, then it is moot whether it's a vsyscall or not (but in that
> > case it's almost certainly better to use the standard API rather than
> > trying to roll your own timesource with rdtsc).
> >
>
> If you're interested in gettimeofday() for a global monotonic counter
> you can fall back to atomic_fetch_and_add(), which will be faster than a
> syscall even on large systems. Maybe we should provide a vsyscall for
> global monotonic counters and implement it using atomics or the tsc
> instead of these hacks (I'm assuming here that the gettimeofday() calls
> are used to implement an atomic counter - are they?)

No, the apps I'm familiar with (a DB and a JVM) need a timestamp,
not a monotonic counter. The timestamps must be relatively
accurate (e.g. we've been talking about gettimeofday generically,
but these apps would use clock_gettime for nsec resolution),
monotonically increasing, and work properly across a VM
migration. The timestamps are taken up to 100K/sec or
more, so the apps need to ensure they are using the fastest
mechanism available that meets those requirements.
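For the requirements Dan lists, the portable baseline is clock_gettime() folded into a single 64-bit nanosecond value. A C sketch with a hypothetical helper name; whether each call enters the kernel depends on the kernel, the clocksource and the C library rather than on this code, and behaviour across a VM migration is up to the hypervisor's time support.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/*
 * Reads clock `clk` as one 64-bit nanosecond count.
 * CLOCK_REALTIME gives wall-clock time and can be stepped;
 * CLOCK_MONOTONIC never goes backwards but has an arbitrary origin.
 * Returns 0 on error (an arbitrary convention for this sketch).
 */
static uint64_t timestamp_ns(clockid_t clk)
{
    struct timespec ts;
    if (clock_gettime(clk, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}
```

At 100K calls per second, even a 1 microsecond difference in per-call cost amounts to 0.1 s of CPU time per second, which is why the question of whether this call stays in user space matters so much here.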



avi at redhat

Oct 29, 2009, 8:07 AM

Post #34 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On 10/29/2009 04:46 PM, Dan Magenheimer wrote:
> No, the apps I'm familiar with (a DB and a JVM) need a timestamp,
> not a monotonic counter. The timestamps must be relatively
> accurate (e.g. we've been talking about gettimeofday generically,
> but these apps would use clock_gettime for nsec resolution),
> monotonically increasing, and work properly across a VM
> migration. The timestamps are taken up to 100K/sec or
> more, so the apps need to ensure they are using the fastest
> mechanism available that meets those requirements.
>

Out of interest, do you know (and can you relate) why those apps need
100k/sec monotonically increasing timestamps?

--
error compiling committee.c: too many arguments to function



dan.magenheimer at oracle

Oct 29, 2009, 8:55 AM

Post #35 of 50
RE: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

> From: Avi Kivity [mailto:avi [at] redhat]
> Sent: Thursday, October 29, 2009 9:07 AM
> To: Dan Magenheimer
> Cc: Jeremy Fitzhardinge; Glauber Costa; Jeremy Fitzhardinge; Kurt Hackel;
> the arch/x86 maintainers; Linux Kernel Mailing List; Glauber de Oliveira
> Costa; Xen-devel; Keir Fraser; Zach Brown; Chris Mason; Ingo Molnar
> Subject: Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation
>
>
> On 10/29/2009 04:46 PM, Dan Magenheimer wrote:
> > No, the apps I'm familiar with (a DB and a JVM) need a timestamp,
> > not a monotonic counter. The timestamps must be relatively
> > accurate (e.g. we've been talking about gettimeofday generically,
> > but these apps would use clock_gettime for nsec resolution),
> > monotonically increasing, and work properly across a VM
> > migration. The timestamps are taken up to 100K/sec or
> > more, so the apps need to ensure they are using the fastest
> > mechanism available that meets those requirements.
>
> Out of interest, do you know (and can you relate) why those apps need
> 100k/sec monotonically increasing timestamps?

I don't have any public data available for this DB usage, but basically
assume it is measuring transactions at a very high throughput, some
of which are to a memory-resident portion of the DB. Anecdotally,
I'm told the difference between non-vsyscall gettimeofday
and native rdtsc (on a machine with Invariant TSC support) can
affect overall DB performance by as much as 10-20%.

I did find the following public link for the JVM:

http://download.oracle.com/docs/cd/E13188_01/jrockit/tools/intro/jmc3.html

Search for "flight recorder". This feature is intended to
be enabled all the time, but with non-vsyscall gettimeofday
the performance impact is unacceptably high, so they are using
rdtscp instead (on those machines where it is available). With
rdtscp, the performance impact is not measurable.

Though the processor/server vendors have finally fixed the
"unsynced TSC" problem on recent x86 platforms, thus allowing
enterprise software to obtain timestamps at rdtsc performance,
the problem comes back all over again with virtualization
because of migration. Jeremy's vsyscall+pvclock is a great
solution if the app can ensure that it is present; if not,
the apps will instead continue to use rdtsc as even emulated
rdtsc is 2-3x faster than non-vsyscall gettimeofday.

Does that help?
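Before using rdtscp as Dan describes, an application has to check that the instruction exists and, ideally, that the TSC is invariant. A C sketch using the GCC/Clang <cpuid.h> and <x86intrin.h> helpers; the CPUID bits tested (leaf 0x80000001 EDX bit 27 for RDTSCP, leaf 0x80000007 EDX bit 8 for invariant TSC) are the documented ones, while the function names are hypothetical. Inside a guest the hypervisor decides what CPUID reports, and a check at startup says nothing about the host the VM is later migrated to, which is the problem raised here.

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>
#endif

/* 1 if CPUID leaf 0x80000001 reports RDTSCP (EDX bit 27), else 0. */
static int cpu_has_rdtscp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return 0;               /* extended leaf not supported */
    return (edx >> 27) & 1;
#else
    return 0;
#endif
}

/* 1 if CPUID leaf 0x80000007 reports an invariant TSC (EDX bit 8), else 0. */
static int cpu_has_invariant_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 8) & 1;
#else
    return 0;
#endif
}

/*
 * Reads the TSC with RDTSCP; *aux receives the TSC_AUX value, which
 * Linux uses to record the CPU number. Call only if cpu_has_rdtscp().
 */
static uint64_t read_tscp(unsigned int *aux)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtscp(aux);
#else
    *aux = 0;
    return 0;
#endif
}
```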


dan.magenheimer at oracle

Oct 29, 2009, 9:15 AM

Post #36 of 50
RE: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On a related note, though some topic drift, many of
the problems that occur in virtualization due to migration
could be better addressed if Linux had an architected
interface to allow it to be signaled if a migration
occurred, and if Linux could signal applications of
the same. I don't have any cycles (pun intended) to
think about this right now, but if anyone else starts
looking at it, I'd love to be cc'ed.

Thanks,
Dan


_______________________________________________
Xen-devel mailing list
Xen-devel [at] lists
http://lists.xensource.com/xen-devel


avi at redhat

Nov 1, 2009, 1:28 AM

Post #37 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On 10/29/2009 06:15 PM, Dan Magenheimer wrote:
> On a related note, though some topic drift, many of
> the problems that occur in virtualization due to migration
> could be better addressed if Linux had an architected
> interface to allow it to be signaled if a migration
> occurred, and if Linux could signal applications of
> the same. I don't have any cycles (pun intended) to
> think about this right now, but if anyone else starts
> looking at it, I'd love to be cc'ed.
>

IMO that's not a good direction. The hypervisor should not depend on
the guest for migration (the guest may be broken, or malicious, or being
debugged, or slow). So the notification must be asynchronous, which
means that it will only be delivered to applications after migration has
completed.

Instead of a "migration has occurred, run for the hills" signal we're
better off finding out why applications want to know about this event and
addressing specific needs.

--
error compiling committee.c: too many arguments to function



avi at redhat

Nov 1, 2009, 1:32 AM

Post #38 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On 10/29/2009 05:55 PM, Dan Magenheimer wrote:
>> From: Avi Kivity [mailto:avi [at] redhat]
>> Sent: Thursday, October 29, 2009 9:07 AM
>> To: Dan Magenheimer
>> Cc: Jeremy Fitzhardinge; Glauber Costa; Jeremy Fitzhardinge; Kurt Hackel;
>> the arch/x86 maintainers; Linux Kernel Mailing List; Glauber de Oliveira
>> Costa; Xen-devel; Keir Fraser; Zach Brown; Chris Mason; Ingo Molnar
>> Subject: Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation
>>
>>
>> On 10/29/2009 04:46 PM, Dan Magenheimer wrote:
>>
>>> No, the apps I'm familiar with (a DB and a JVM) need a timestamp,
>>> not a monotonic counter. The timestamps must be relatively
>>> accurate (e.g. we've been talking about gettimeofday generically,
>>> but these apps would use clock_gettime for nsec resolution),
>>> monotonically increasing, and work properly across a VM
>>> migration. The timestamps are taken up to 100K/sec or
>>> more, so the apps need to ensure they are using the fastest
>>> mechanism available that meets those requirements.
>>>
>> Out of interest, do you know (and can you relate) why those apps need
>> 100k/sec monotonically increasing timestamps?
>>
> I don't have any public data available for this DB usage, but basically
> assume it is measuring transactions at a very high throughput, some
> of which are to a memory-resident portion of the DB. Anecdotally,
> I'm told the difference between non-vsyscall gettimeofday
> and native rdtsc (on a machine with Invariant TSC support) can
> affect overall DB performance by as much as 10-20%.
>

Sorry, that doesn't explain anything.

> I did find the following public link for the JVM:
>
> http://download.oracle.com/docs/cd/E13188_01/jrockit/tools/intro/jmc3.html
>
> Search for "flight recorder". This feature is intended to
> be enabled all the time, but with non-vsyscall gettimeofday
> the performance impact is unacceptably high, so they are using
> rdtscp instead (on those machines where it is available). With
> rdtscp, the performance impact is not measurable.
>
> Though the processor/server vendors have finally fixed the
> "unsynced TSC" problem on recent x86 platforms, thus allowing
> enterprise software to obtain timestamps at rdtsc performance,
> the problem comes back all over again with virtualization
> because of migration. Jeremy's vsyscall+pvclock is a great
> solution if the app can ensure that it is present; if not,
> the apps will instead continue to use rdtsc as even emulated
> rdtsc is 2-3x faster than non-vsyscall gettimeofday.
>
> Does that help?
>

For profiling work fast timestamping is of course great, but surely
there is no monotonicity requirement?

I don't think we'll be able to provide monotonicity with vsyscall on
tsc-broken hosts, so we'll be limited to correcting the tsc frequency
after migration for good-tsc hosts.

--
error compiling committee.c: too many arguments to function
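The pvclock mechanism this series exposes through a vsyscall converts a raw TSC reading into nanoseconds using per-vCPU parameters that the hypervisor rewrites, for example after migrating the guest to a host with a different TSC frequency. A simplified C sketch: the struct mirrors the pvclock_vcpu_time_info layout shared by Xen and KVM (verify it against your kernel's pvclock headers before relying on it), the TSC read is a caller-supplied hook, and the C11 fences stand in for the kernel's rmb().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Per-vCPU time parameters published by the hypervisor. */
struct pvclock_time_info {
    uint32_t version;           /* odd while the hypervisor is updating */
    uint32_t pad0;
    uint64_t tsc_timestamp;     /* TSC value when system_time was sampled */
    uint64_t system_time;       /* guest time in ns at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* ns per shifted tick, in units of 2^-32 */
    int8_t   tsc_shift;         /* shift applied to the TSC delta first */
    uint8_t  flags;
    uint8_t  pad[2];
};

/* ((delta shifted by `shift`) * mul) >> 32, without 128-bit arithmetic. */
static uint64_t pvclock_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    uint64_t lo = delta & 0xffffffffu, hi = delta >> 32;
    return ((lo * mul) >> 32) + hi * mul;
}

/*
 * Seqlock-style read: retry if the version was odd (update in progress)
 * or changed while the fields were being read.
 */
static uint64_t pvclock_read_ns(const volatile struct pvclock_time_info *ti,
                                uint64_t (*read_tsc)(void))
{
    uint32_t version;
    uint64_t ns;
    do {
        version = ti->version;
        atomic_thread_fence(memory_order_acquire);
        ns = ti->system_time +
             pvclock_scale_delta(read_tsc() - ti->tsc_timestamp,
                                 ti->tsc_to_system_mul, ti->tsc_shift);
        atomic_thread_fence(memory_order_acquire);
    } while ((version & 1) || version != ti->version);
    return ns;
}
```

Correcting the TSC frequency after migration, as Avi describes, amounts to the hypervisor publishing new tsc_to_system_mul and tsc_shift values; readers pick them up on their next pass through the loop.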



dan.magenheimer at oracle

Nov 2, 2009, 7:28 AM

Post #39 of 50
RE: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

> From: Avi Kivity [mailto:avi [at] redhat]
>
> On 10/29/2009 06:15 PM, Dan Magenheimer wrote:
> > On a related note, though some topic drift, many of
> > the problems that occur in virtualization due to migration
> > could be better addressed if Linux had an architected
> > interface to allow it to be signaled if a migration
> > occurred, and if Linux could signal applications of
> > the same. I don't have any cycles (pun intended) to
> > think about this right now, but if anyone else starts
> > looking at it, I'd love to be cc'ed.
>
> IMO that's not a good direction. The hypervisor should not depend on
> the guest for migration (the guest may be broken, or malicious, or being
> debugged, or slow). So the notification must be asynchronous, which
> means that it will only be delivered to applications after migration has
> completed.

I definitely agree that the hypervisor can't wait for a guest
to respond.

You've likely thought through this a lot more than I have,
but I was thinking that if the kernel received the notification
as some form of interrupt, it could determine immediately
if any running threads had registered for "SIG_MIGRATE"
and deliver the signal synchronously.

> Instead of a "migration has occurred, run for the hills" signal we're
> better off finding out why applications want to know about this event and
> addressing specific needs.

Perhaps. It certainly isn't warranted for this one
special case of timestamp handling. But I'll bet 5-10 years
from now, after we've handled a few special cases, we'll
wish we had handled it more generically.

Dan


avi at redhat

Nov 2, 2009, 7:41 AM

Post #40 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On 11/02/2009 05:28 PM, Dan Magenheimer wrote:
>> From: Avi Kivity [mailto:avi [at] redhat]
>>
>> On 10/29/2009 06:15 PM, Dan Magenheimer wrote:
>>
>>> On a related note, though some topic drift, many of
>>> the problems that occur in virtualization due to migration
>>> could be better addressed if Linux had an architected
>>> interface to allow it to be signaled if a migration
>>> occurred, and if Linux could signal applications of
>>> the same. I don't have any cycles (pun intended) to
>>> think about this right now, but if anyone else starts
>>> looking at it, I'd love to be cc'ed.
>>>
>> IMO that's not a good direction. The hypervisor should not depend on
>> the guest for migration (the guest may be broken, or malicious, or being
>> debugged, or slow). So the notification must be asynchronous, which
>> means that it will only be delivered to applications after migration has
>> completed.
>>
> I definitely agree that the hypervisor can't wait for a guest
> to respond.
>
> You've likely thought through this a lot more than I have,
> but I was thinking that if the kernel received the notification
> as some form of interrupt, it could determine immediately
> if any running threads had registered for "SIG_MIGRATE"
> and deliver the signal synchronously.
>

Interrupts cannot be delivered immediately. Exceptions can, but not all
guest code is prepared to handle them. Once you start to handle the
exception, migration is complete and you are late.


>> Instead of a "migration has occurred, run for the hills" signal we're
>> better off finding out why applications want to know about this event and
>> addressing specific needs.
>>
> Perhaps. It certainly isn't warranted for this one
> special case of timestamp handling. But I'll bet 5-10 years
> from now, after we've handled a few special cases, we'll
> wish we had handled it more generically.
>

Or we'll find that backwards compatibility for the generic signal is
killing some optimization.

--
error compiling committee.c: too many arguments to function



dan.magenheimer at oracle

Nov 2, 2009, 7:46 AM

Post #41 of 50
RE: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

> > I don't have any public data available for this DB usage,
>
> Sorry, that doesn't explain anything.

Well for now just consider the DB usage as another use
of profiling. But one can easily draw scenarios where
a monotonic timestamp is also used to guarantee transaction
ordering.

> > Search for "flight recorder". This feature is intended to
> > be enabled all the time, but with non-vsyscall gettimeofday
> > the performance impact is unacceptably high, so they are using
>
> For profiling work fast timestamping is of course great, but surely
> there is no monotonicity requirement?

Yes and no. Monotonicity is a poor substitute for a more
generic mechanism that might provide an indication that a
discontinuity has occurred (forward or backward); if an app
could get both the timestamp AND some kind of "continuity
generation counter" (basically a much more sophisticated
form of TSC_AUX that changes whenever the timestamp is
coming from a different source), perhaps all problems could be solved.

> I don't think we'll be able to provide monotonicity with vsyscall on
> tsc-broken hosts, so we'll be limited to correcting the tsc frequency
> after migration for good-tsc hosts.

True, though clock_gettime(CLOCK_MONOTONIC) can provide
the monotonicity where it is required.

Dan


avi at redhat

Nov 2, 2009, 9:12 PM

Post #42 of 50
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On 11/02/2009 05:46 PM, Dan Magenheimer wrote:
>>> I don't have any public data available for this DB usage,
>>>
>> Sorry, that doesn't explain anything.
>>
> Well for now just consider the DB usage as another use
> of profiling. But one can easily draw scenarios where
> a monotonic timestamp is also used to guarantee transaction
> ordering.
>

In this case we should provide a facility for this. Providing a global
monotonic counter may be easier than providing a monotonic clock. Hence
my question.

>>> Search for "flight recorder". This feature is intended to
>>> be enabled all the time, but with non-vsyscall gettimeofday
>>> the performance impact is unacceptably high, so they are using
>>>
>> For profiling work fast timestamping is of course great, but surely
>> there is no monotonicity requirement?
>>
> Yes and no. Monotonicity is a poor substitute for a more
> generic mechanism that might provide an indication that a
> discontinuity has occurred (forward or backward); if an app
> could get both the timestamp AND some kind of "continuity
> generation counter" (basically a much more sophisticated
> form of TSC_AUX that changes whenever the timestamp is
> coming from a different source), perhaps all problems could be solved.
>

I doubt it. A discontinuity has occurred, but what do we know about it?
Nothing.

>> I don't think we'll be able to provide monotonicity with vsyscall on
>> tsc-broken hosts, so we'll be limited to correcting the tsc frequency
>> after migration for good-tsc hosts.
>>
> True, though clock_gettime(CLOCK_MONOTONIC) can provide
> the monotonicity where it is required.
>

We have that already. The question is how to implement it in a vsyscall.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



dan.magenheimer at oracle

Nov 4, 2009, 12:30 PM

Post #43 of 50
RE: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

> From: Avi Kivity [mailto:avi [at] redhat]
> Subject: Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation
>
> On 11/02/2009 05:46 PM, Dan Magenheimer wrote:
> >>> I don't have any public data available for this DB usage,
> >>>
> >> Sorry, that doesn't explain anything.
> >>
> > Well for now just consider the DB usage as another use
> > of profiling. But one can easily draw scenarios where
> > a monotonic timestamp is also used to guarantee transaction
> > ordering.
> >
>
> In this case we should provide a facility for this. Providing a global
> monotonic counter may be easier than providing a monotonic clock. Hence
> my question.

Maybe I'm misunderstanding something, but enterprise
apps can do this entirely on their own without any
kernel help, correct? Or are you trying to provide
it across guests, e.g. for clusters or something?

> >> For profiling work fast timestamping is of course great, but surely
> >> there is no monotonicity requirement?
> >>
> > Yes and no. Monotonicity is a poor substitute for a more
> > generic mechanism that might provide an indication that a
> > discontinuity has occurred (forward or backward); if an app
> > could get both the timestamp AND some kind of "continuity
> > generation counter" (basically a much more sophisticated
> > form of TSC_AUX that changes whenever the timestamp is
> > coming from a different source), perhaps all problems could be solved.
>
> I doubt it. A discontinuity has occurred, but what do we know
> about it? Nothing.

Actually, I think for many/most profiling applications,
just knowing a discontinuity occurred between two
timestamps is very useful as that one specific measurement
can be discarded. If a discontinuity is invisible,
one clearly knows that a negative interval is bad,
but if an interval is very small or very large,
one never knows if it is due to a discontinuity or
due to some other reason.

This would argue for a syscall/vsyscall that can
"return" two values: the "time" and a second
"continuity generation" counter.

> >> I don't think we'll be able to provide monotonicity with vsyscall on
> >> tsc-broken hosts, so we'll be limited to correcting the tsc frequency
> >> after migration for good-tsc hosts.
> >>
> > True, though clock_gettime(CLOCK_MONOTONIC) can provide
> > the monotonicity where it is required.
>
> We have that already. The question is how to implement it in
> a vsyscall.

Oh, I see. I missed that very crucial point.

So, just to verify/clarify... There is NO WAY for
a vsyscall to ensure monotonicity (presumably because
the previous reading can't be safely stored?). So
speed and "correctness" are mutually exclusive?

If true, yes, that's a potentially significant problem,
though an intelligent app can layer monotonicity
on top of the call I suppose.


johnstul at us

Nov 4, 2009, 1:19 PM

Post #44 of 50 (610 views)
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation [In reply to]

On Thu, Oct 29, 2009 at 7:07 AM, Avi Kivity <avi [at] redhat> wrote:
> On 10/29/2009 04:46 PM, Dan Magenheimer wrote:
>>
>> No, the apps I'm familiar with (a DB and a JVM) need a timestamp
>> not a monotonic counter.  The timestamps must be relatively
>> accurate (e.g. we've been talking about gettimeofday generically,
>> but these apps would use clock_gettime for nsec resolution),
>> monotonically increasing, and work properly across a VM
>> migration.  The timestamps are taken up to a 100K/sec or
>> more so the apps need to ensure they are using the fastest
>> mechanism available that meets those requirements.
>>
>
> Out of interest, do you know (and can you relate) why those apps need
> 100k/sec monotonically increasing timestamps?

This is sort of tangential, but depending on the need, this might be
of interest: Recently I've added a new clock_id,
CLOCK_MONOTONIC_COARSE (as well as CLOCK_REALTIME_COARSE), which
return a HZ granular timestamp (same granularity as filesystem
timestamps). Its very fast to access, since there's no hardware to
touch, and is accessible via vsyscall.

The idea being, if you're hitting clock_gettime 100k/sec but you really
don't have the need for nsec granular timestamps, it might provide a
really nice performance boost.

Here's the commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=da15cfdae03351c689736f8d142618592e3cebc3

thanks
-john


dan.magenheimer at oracle

Nov 4, 2009, 1:28 PM

Post #45 of 50 (615 views)
RE: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation [In reply to]

> From: john stultz [mailto:johnstul [at] us]
> On Thu, Oct 29, 2009 at 7:07 AM, Avi Kivity <avi [at] redhat> wrote:
> >
> > Out of interest, do you know (and can you relate) why those apps need
> > 100k/sec monotonically increasing timestamps?
>
> This is sort of tangential, but depending on the need, this might be
> of interest: Recently I've added a new clock_id,
> CLOCK_MONOTONIC_COARSE (as well as CLOCK_REALTIME_COARSE), which
> return a HZ granular timestamp (same granularity as filesystem
> timestamps). It's very fast to access, since there's no hardware to
> touch, and is accessible via vsyscall.
>
> The idea being, if you're hitting clock_gettime 100k/sec but you really
> don't have the need for nsec granular timestamps, it might provide a
> really nice performance boost.
>
> Here's the commit:

Hi John --

Yes, possibly of interest. But does it work with CONFIG_NO_HZ?
(I'm expecting that over time NO_HZ will become widespread
for VM OSes, though I'm interested in whether you agree.)

Also very interested in your thoughts about a variation
that returns something similar to a TSC_AUX to notify
caller that the underlying reference clock has/may have
changed.

Dan


johnstul at us

Nov 4, 2009, 4:02 PM

Post #46 of 50 (615 views)
RE: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation [In reply to]

On Wed, 2009-11-04 at 13:28 -0800, Dan Magenheimer wrote:
> > From: john stultz [mailto:johnstul [at] us]
> > On Thu, Oct 29, 2009 at 7:07 AM, Avi Kivity <avi [at] redhat> wrote:
> > >
> > > Out of interest, do you know (and can you relate) why those apps need
> > > 100k/sec monotonically increasing timestamps?
> >
> > This is sort of tangential, but depending on the need, this might be
> > of interest: Recently I've added a new clock_id,
> > CLOCK_MONOTONIC_COARSE (as well as CLOCK_REALTIME_COARSE), which
> > return a HZ granular timestamp (same granularity as filesystem
> > timestamps). It's very fast to access, since there's no hardware to
> > touch, and is accessible via vsyscall.
> >
> > The idea being, if you're hitting clock_gettime 100k/sec but you really
> > don't have the need for nsec granular timestamps, it might provide a
> > really nice performance boost.
> >
> > Here's the commit:
>
> Hi John --
>
> Yes, possibly of interest. But does it work with CONFIG_NO_HZ?
> (I'm expecting that over time NO_HZ will become widespread
> for VM OSes, though I'm interested in whether you agree.)

It should work with CONFIG_NO_HZ: as soon as we come out of a long idle
(likely due to a timer tick), the timekeeping code will accumulate all
the skipped ticks.

If we ever get to non-idle NOHZ, we'll need some extra work here
(probably lazy accumulation done conditionally in the read path), but
that's also true for filesystem timestamps.


> Also very interested in your thoughts about a variation
> that returns something similar to a TSC_AUX to notify
> caller that the underlying reference clock has/may have
> changed.

I haven't been following that closely. Personally, experience makes me
skeptical of workarounds for unsynced TSCs. But I'm sure there are sharper
folks out there that might make it work. The kernel just requires that
it *really really* works, and not "mostly" works. :)

thanks
-john






dan.magenheimer at oracle

Nov 4, 2009, 4:45 PM

Post #47 of 50 (608 views)
RE: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation [In reply to]

> > Yes, possibly of interest. But does it work with CONFIG_NO_HZ?
> > (I'm expecting that over time NO_HZ will become widespread
> > for VM OSes, though I'm interested in whether you agree.)
>
> It should work with CONFIG_NO_HZ: as soon as we come out of a long idle
> (likely due to a timer tick), the timekeeping code will accumulate all
> the skipped ticks.
>
> If we ever get to non-idle NOHZ, we'll need some extra work here
> (probably lazy accumulation done conditionally in the read path), but
> that's also true for filesystem timestamps.

OK, sounds good.

> > Also very interested in your thoughts about a variation
> > that returns something similar to a TSC_AUX to notify
> > caller that the underlying reference clock has/may have
> > changed.
>
> I haven't been following that closely. Personally, experience makes me
> skeptical of workarounds for unsynced TSCs. But I'm sure there are sharper
> folks out there that might make it work. The kernel just requires that
> it *really really* works, and not "mostly" works. :)

This is less a workaround for unsynced TSCs than it
is for VM migration (and maybe also times when a
VM is out-of-context or moved to a different pcpu),
though it could probably be made to work on
unsynced TSC boxes also.
Basically an application needing hi-res profiling
info would do:

   nsec1 = clock_gettime2(MONOTONIC,&aux1);
   (time passes)
   nsec2 = clock_gettime2(MONOTONIC,&aux2);
   if (aux1 != aux2)
       discard_measurement();
   else
       use_measurement(nsec2-nsec1);

and system software (hypervisor or kernel or
both) is responsible for ensuring aux value
monotonically increases whenever a different
crystal is used.

Without something like this as a vsyscall,
apps will just use rdtscp (which must be emulated
to work properly across a migration).


avi at redhat

Nov 4, 2009, 10:47 PM

Post #48 of 50 (602 views)
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation [In reply to]

On 11/04/2009 10:30 PM, Dan Magenheimer wrote:
>>
>> In this case we should provide a facility for this. Providing a global
>> monotonic counter may be easier than providing a monotonic clock. Hence
>> my question.
>>
> Maybe I'm misunderstanding something, but enterprise
> apps can do this entirely on their own without any
> kernel help, correct?
>

Within a process, yes. Across processes, not without writable shared
memory.

That's why I'm trying to understand what the actual requirements are.
Real monotonic, accurate, high resolution, low cost time sources are
hard to come by.

>> I doubt it. A discontinuity has occurred, but what do we know
>> about it? Nothing.
>>
> Actually, I think for many/most profiling applications,
> just knowing a discontinuity occurred between two
> timestamps is very useful as that one specific measurement
> can be discarded. If a discontinuity is invisible,
> one clearly knows that a negative interval is bad,
> but if an interval is very small or very large,
> one never knows if it is due to a discontinuity or
> due to some other reason.
>
> This would argue for a syscall/vsyscall that can
> "return" two values: the "time" and a second
> "continuity generation" counter.
>
>

I doubt it. You should expect discontinuities in user space due to
being swapped out, scheduled out, migrated to a different cpu, or your
laptop lid being closed. There are no guarantees to a userspace
application. Even the kernel can expect discontinuities due to SMIs.
So an explicit notification about one type of discontinuity adds nothing.

>>> True, though clock_gettime(CLOCK_MONOTONIC) can provide
>>> the monotonicity where it is required.
>>>
>> We have that already. The question is how to implement it in
>> a vsyscall.
>>
> Oh, I see. I missed that very crucial point.
>
> So, just to verify/clarify... There is NO WAY for
> a vsyscall to ensure monotonicity (presumably because
> the previous reading can't be safely stored?). So
> speed and "correctness" are mutually exclusive?
>

Yes.

> If true, yes, that's a potentially significant problem,
> though an intelligent app can layer monotonicity
> on top of the call I suppose.
>

Unless it's a multi-process app with limited trust.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



dan.magenheimer at oracle

Nov 5, 2009, 6:52 AM

Post #49 of 50 (596 views)
RE: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation [In reply to]

> From: Avi Kivity [mailto:avi [at] redhat]
>
> Within a process, yes. Across processes, not without writable shared
> memory.
>
> That's why I'm trying to understand what the actual requirements are.
> Real monotonic, accurate, high resolution, low cost time sources are
> hard to come by.

Hmmm... this has significant implications for the rdtsc
emulation discussion on xen-devel. Since that's not
a Linux question, I'll start another thread on xen-devel
with a shorter cc list.

> > Actually, I think for many/most profiling applications,
> > just knowing a discontinuity occurred between two
> > timestamps is very useful as that one specific measurement
> > can be discarded. If a discontinuity is invisible,
> > one clearly knows that a negative interval is bad,
> > but if an interval is very small or very large,
> > one never knows if it is due to a discontinuity or
> > due to some other reason.
> >
> > This would argue for a syscall/vsyscall that can
> > "return" two values: the "time" and a second
> > "continuity generation" counter.
>
> I doubt it. You should expect discontinuities in user space due to
> being swapped out, scheduled out, migrated to a different cpu, or your
> laptop lid being closed. There are no guarantees to a userspace
> application. Even the kernel can expect discontinuities due to SMIs.
> So an explicit notification about one type of discontinuity adds nothing.

Good point. I'm interested in enterprise apps that have more
control over the machine (and rarely suffer from laptop lid
closures :-) and would intend for all discontinuities visible
to a hypervisor or kernel to increment "AUX", but bare-metal-
kernel-invisible discontinuities such as SMI do throw a wrench
in the works.

Well, all this discussion has convinced me that
my original proposals do make sense for enterprise apps to be
virtualization-aware and use rdtsc/p directly for timestamping
needs rather than OS APIs (with the hypervisor deciding
whether or not to emulate rdtsc/p based on the underlying
physical machine and whether or not migration is enabled
or has occurred).


keir.fraser at eu

Nov 5, 2009, 7:07 AM

Post #50 of 50 (596 views)
Re: [Xen-devel] Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation [In reply to]

On 05/11/2009 14:52, "Dan Magenheimer" <dan.magenheimer [at] oracle> wrote:

> Well, all this discussion has convinced me that
> my original proposals do make sense

You surprise me, Dan. ;-)

-- Keir

