
Mailing List Archive: OpenStack: Dev

Nova and asynchronous instance launching

 

 



devin at openstack

Jun 27, 2012, 12:53 PM

Post #1 of 22
Nova and asynchronous instance launching

We filed a blueprint for this yesterday:

https://blueprints.launchpad.net/nova/+spec/launch-instances-async

"Currently if a user attempts to create a lot of instances with a single API call (using min_count) the request will hang for a long time while all RPC calls are completed. For a large number of instances this can take a very long time. The API should return immediately and asynchronously make RPC calls."

We are looking for creative ways to work around this problem, but in the meantime I'd like to hear from folks on what they think the preferred solution would be.


Devin


dug at us

Jun 27, 2012, 3:51 PM

Post #2 of 22
Re: Nova and asynchronous instance launching

Consider the creation of a "Job" type of entity that will be returned from
the original call - probably a 202. Then the client can check the Job to
see how things are going.
BTW - this pattern can be used for any async op, not just the launching of
multiple instances since technically any op might be long-running (or
queued) based on the current state of the system.
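
A minimal sketch of the "Job" pattern described above, assuming an in-memory store and a /jobs/<id> path (both are illustrative assumptions, not an existing Nova API). The original call returns 202 with a reference to the Job, and the client checks the Job later:

```python
import uuid

# Hypothetical in-memory job store; a real service would persist this.
JOBS = {}


def submit(operation, params):
    """Accept a request and return (status_code, body) without doing the work."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"id": job_id, "operation": operation, "params": params,
                    "state": "queued", "error": None}
    # 202 Accepted: the request was received, processing has not completed.
    return 202, {"job": {"id": job_id, "href": "/jobs/" + job_id}}


def get_job(job_id):
    """What a client might see on GET /jobs/<id> (the path is an assumption)."""
    job = JOBS.get(job_id)
    if job is None:
        return 404, {"error": "no such job"}
    return 200, {"job": {"id": job["id"], "operation": job["operation"],
                         "state": job["state"], "error": job["error"]}}


def finish(job_id, error=None):
    """Called by whatever worker ran the operation, once it is over."""
    job = JOBS[job_id]
    job["state"] = "failed" if error else "done"
    job["error"] = error
```

Because nothing in the sketch is specific to launching instances, the same three functions would serve any long-running or queued operation.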

thanks
-Doug
______________________________________________________
STSM | Standards Architect | IBM Software Group
(919) 254-6905 | IBM 444-6905 | dug [at] us
The more I'm around some people, the more I like my dog.





jaypipes at gmail

Jun 28, 2012, 9:01 AM

Post #3 of 22
Re: Nova and asynchronous instance launching

On 06/27/2012 06:51 PM, Doug Davis wrote:
> Consider the creation of a "Job" type of entity that will be returned
> from the original call - probably a 202. Then the client can check the
> Job to see how things are going.
> BTW - this pattern can be used for any async op, not just the launching
> of multiple instances since technically any op might be long-running (or
> queued) based on the current state of the system.

Note that much of the job of launching an instance is already
asynchronous -- the initial call to create an instance really just
creates an instance UUID and returns to the caller -- most of the actual
work to create the instance is then done via messaging calls and the
caller can continue to call for a status of her instance to check on it.
In this particular case, I believe Devin is referring to when you
indicate you want to spawn a whole bunch of instances and in that case,
things happen synchronously instead of asynchronously?

Devin, is that correct? If so, it seems like returning a packet
immediately that contains a list of the instance UUIDs that can be used
for checking status is the best option?
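
A rough sketch of that option, under stated assumptions: fake_rpc_cast is a hypothetical stand-in for the real messaging call, and a plain thread stands in for whatever concurrency mechanism the API service actually uses. The identifiers are generated up front and returned while the per-instance calls are still being issued in the background:

```python
import threading
import uuid


def fake_rpc_cast(instance_id):
    """Stand-in for the real RPC call; hypothetical, does no real work."""
    return instance_id


def create_instances(min_count, rpc_cast=fake_rpc_cast):
    """Return instance UUIDs right away; issue the RPC calls in the background."""
    instance_ids = [str(uuid.uuid4()) for _ in range(min_count)]

    def dispatch():
        for instance_id in instance_ids:
            rpc_cast(instance_id)

    worker = threading.Thread(target=dispatch, daemon=True)
    worker.start()
    # The caller gets the identifiers without waiting for dispatch() to finish.
    # The worker is returned only so that callers (or tests) can join it.
    return {"status": 202, "instance_ids": instance_ids}, worker
```

The client can then poll each UUID for status exactly as it does for a single instance today.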

Or am I missing something here?
-jay

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to : openstack [at] lists
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp


dug at us

Jun 28, 2012, 11:17 AM

Post #4 of 22
Re: Nova and asynchronous instance launching

Understood but I'd rather solve this more generically once instead of each
possible async op doing its own thing. I like consistency :-)

Note that I do distinguish between a 'real' async op (where you really
return little more than a 202) and one that returns a skeleton of the
resource being created - like instance.create() does now.

thanks
-Doug


devin at openstack

Jun 28, 2012, 2:19 PM

Post #5 of 22
Re: Nova and asynchronous instance launching

On Jun 28, 2012, at 9:01 AM, Jay Pipes wrote:

> On 06/27/2012 06:51 PM, Doug Davis wrote:
>> Consider the creation of a "Job" type of entity that will be returned
>> from the original call - probably a 202. Then the client can check the
>> Job to see how things are going.
>> BTW - this pattern can be used for any async op, not just the launching
>> of multiple instances since technically any op might be long-running (or
>> queued) based on the current state of the system.
>
> Note that much of the job of launching an instance is already asynchronous -- the initial call to create an instance really just creates an instance UUID and returns to the caller -- most of the actual work to create the instance is then done via messaging calls and the caller can continue to call for a status of her instance to check on it. In this particular case, I believe Devin is referring to when you indicate you want to spawn a whole bunch of instances and in that case, things happen synchronously instead of asynchronously?
>
> Devin, is that correct? If so, it seems like returning a packet immediately that contains a list of the instance UUIDs that can be used for checking status is the best option?

Yep, exactly. The client still waits synchronously for the underlying RPC to complete. An immediate 202 would be a great way to deal with this.

>
> Or am I missing something here?
> -jay


winston.d at gmail

Jun 29, 2012, 1:25 AM

Post #6 of 22
Re: Nova and asynchronous instance launching

On Fri, Jun 29, 2012 at 5:19 AM, Devin Carlen <devin [at] openstack> wrote:
> On Jun 28, 2012, at 9:01 AM, Jay Pipes wrote:
>
>> On 06/27/2012 06:51 PM, Doug Davis wrote:
>>> Consider the creation of a "Job" type of entity that will be returned
>>> from the original call - probably a 202.  Then the client can check the
>>> Job to see how things are going.
>>> BTW - this pattern can be used for any async op, not just the launching
>>> of multiple instances since technically any op might be long-running (or
>>> queued) based on the current state of the system.
>>
>> Note that much of the job of launching an instance is already asynchronous -- the initial call to create an instance really just creates an instance UUID and returns to the caller -- most of the actual work to create the instance is then done via messaging calls and the caller can continue to call for a status of her instance to check on it. In this particular case, I believe Devin is referring to when you indicate you want to spawn a whole bunch of instances and in that case, things happen synchronously instead of asynchronously?
>>
>> Devin, is that correct? If so, it seems like returning a packet immediately that contains a list of the instance UUIDs that can be used for checking status is the best option?
>
> Yep, exactly.  The client still waits synchronously for the underlying RPC to complete.

Sounds like a performance issue. I think this symptom can be much
eased if we spend some time fixing whatever bottleneck is causing this
(slow AMQP, scheduler, or network)? Now that Nova API has got
multiprocess enabled, we'd move to the next bottleneck in the long path
of 'launching instance'.
Devin, is it possible for you to provide more details about this issue
so that someone else can reproduce it?

>
>>
>> Or am I missing something here?
>> -jay



--
Regards
Huang Zhiteng



eglynn at redhat

Jun 29, 2012, 3:00 AM

Post #7 of 22
Re: Nova and asynchronous instance launching

> Note that I do distinguish between a 'real' async op (where you
> really return little more than a 202) and one that returns a
> skeleton of the resource being created - like instance.create() does
> now.

So the latter approach at least provides a way to poll on the resource
status, so as to figure out if and when it becomes usable.

In the happy-path, eventually the instance status transitions to
ACTIVE and away we go.

However, considering the unhappy-path for a second, is there a place
for surfacing some more context as to why the new instance unexpectedly
went into the ERROR state?

For example even just an indication that failure occurred in the scheduler
(e.g. resource starvation) or on the target compute node. Is the thought
that such information may be operationally sensitive, or just TMI for a
typical cloud user?
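
To make the polling approach concrete, here is a small sketch. The get_status callable is hypothetical and stands in for whatever call fetches the instance status; "BUILD" in the usage note is only an example of a non-terminal value. Note what the loop can and cannot tell the client: it learns that the instance reached ERROR, but nothing about why.

```python
import time


def wait_for_instance(get_status, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll get_status() until it returns ACTIVE or ERROR, or until timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("ACTIVE", "ERROR"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("instance still %s after %.0fs" % (status, timeout))
        sleep(interval)
```

For example, a status source that reports "BUILD" twice and then "ERROR" makes the function return "ERROR" on the third poll, with no further context attached.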

Cheers,
Eoghan



dug at us

Jun 29, 2012, 4:45 AM

Post #8 of 22
Re: Nova and asynchronous instance launching

Right - examining the current state isn't a good way to determine what
happened with one particular request. This is exactly one of the reasons
some providers create Jobs for all actions. Checking the resource "later"
to see why something bad happened is fragile since other operations might
have happened since then, erasing any "error message" type of state info.
And relying on event/error logs is hard since correlating one particular
action with a flood of events is tricky - especially in a multi-user
environment where several actions could be underway at once. If each
action resulted in a Job URI being returned then the client can check that
Job resource when it's convenient for them - and this could be quite useful
in both happy and unhappy situations.

And to be clear, a Job doesn't necessarily need to be a full new
resource; it could (under the covers) map to a grouping of event log
entries. The point is that from a client's perspective they have an
easy mechanism (e.g. issue a GET to a single URI) that returns all of the
info needed to determine what happened with one particular operation.
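
As an illustration of a Job that is only a view over event log entries, consider the sketch below. The event layout, the level names ("INFO", "ERROR", "DONE") and the job identifiers are assumptions made for the example:

```python
from collections import defaultdict

# Hypothetical event log: every entry is tagged with the job it belongs to.
EVENTS = defaultdict(list)


def record(job_id, component, level, message):
    """Append one log entry for the given job."""
    EVENTS[job_id].append(
        {"component": component, "level": level, "message": message})


def job_view(job_id):
    """Build the body a GET on the job URI could return, from log entries alone."""
    entries = EVENTS.get(job_id, [])
    errors = [e for e in entries if e["level"] == "ERROR"]
    if not entries:
        state = "unknown"
    elif errors:
        state = "failed"
    elif any(e["level"] == "DONE" for e in entries):
        state = "done"
    else:
        state = "in-progress"
    return {"id": job_id, "state": state, "errors": errors,
            "events": list(entries)}
```

Since entries are grouped by job at write time, the client never has to correlate its own action against events from other users.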

thanks
-Doug


eglynn at redhat

Jun 29, 2012, 7:10 AM

Post #9 of 22
Re: Nova and asynchronous instance launching

> Right - examining the current state isn't a good way to determine
> what happened with one particular request. This is exactly one of
> the reasons some providers create Jobs for all actions. Checking the
> resource "later" to see why something bad happened is fragile since
> other operations might have happened since then, erasing any "error
> message" type of state info. And relying on event/error logs is hard
> since correlating one particular action with a flood of events is
> tricky - especially in a multi-user environment where several
> actions could be underway at once. If each action resulted in a Job
> URI being returned then the client can check that Job resource when
> it's convenient for them - and this could be quite useful in both
> happy and unhappy situations.
>
> And to be clear, a Job doesn't necessarily need to be a full new
> resource; it could (under the covers) map to a grouping of event
> log entries. The point is that from a client's perspective they
> have an easy mechanism (e.g. issue a GET to a single URI) that
> returns all of the info needed to determine what happened with one
> particular operation.

Agreed on all points.

I wonder could we simply leverage the existing X-Compute-Request-Id
header to provide the context on the over-arching operation that the
client wishes to be informed about?

For example, by providing an administrative API extension allowing queries
on the async "Job" status, identified via the req-<UUID> string returned
from the initial call invoking the operation.

Since the components serving such an operation are generally distributed
(e.g. nova-api, nova-scheduler, nova-compute etc.) and tied together via
async messaging, I don't think simple log scraping would be sufficient.

But if each component was to follow logic such as:

1. when a context is received, check status in the nova DB for that
request ID - if absent, mark as in-progress

2. when an operation hits an unrecoverable error condition, the exception-
handling path should mark the request as failed in the nova DB

3. when an operation reaches a definitive endpoint, e.g. the instance
is successfully launched, then the request status is marked as complete

Step #3 would probably be most problematic, in the sense of identifying
what constitutes the logical endpoint for every operation (e.g. a volume
might be created from a snapshot in order to be attached somewhere in a
subsequent operation, or as part of a boot-from-volume operation).

There would be some extra DB manipulation to consider, adding overhead &
latency.

There would also be wrinkles around the lifecycle of entries in the request
status table, when to reap old entries etc.
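
The three steps and the reaping question could be sketched as follows. This uses an in-memory SQLite table purely for illustration; the table name, columns, and function names are assumptions and not part of the nova DB schema:

```python
import sqlite3
import time


def open_db():
    """Create the hypothetical request status table in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE request_status ("
               "request_id TEXT PRIMARY KEY, status TEXT NOT NULL, "
               "detail TEXT, updated_at REAL NOT NULL)")
    return db


def _now(now):
    return time.time() if now is None else now


def mark_in_progress(db, request_id, now=None):
    """Step 1: record the request as in-progress only if it is not yet known."""
    db.execute("INSERT OR IGNORE INTO request_status "
               "VALUES (?, 'in-progress', NULL, ?)", (request_id, _now(now)))


def mark_failed(db, request_id, detail, now=None):
    """Step 2: called from the exception-handling path."""
    db.execute("UPDATE request_status SET status = 'failed', detail = ?, "
               "updated_at = ? WHERE request_id = ?",
               (detail, _now(now), request_id))


def mark_complete(db, request_id, now=None):
    """Step 3: called when the operation reaches its definitive endpoint."""
    db.execute("UPDATE request_status SET status = 'complete', detail = NULL, "
               "updated_at = ? WHERE request_id = ?", (_now(now), request_id))


def get_status(db, request_id):
    """Return (status, detail), or None if the request ID is unknown."""
    return db.execute("SELECT status, detail FROM request_status "
                      "WHERE request_id = ?", (request_id,)).fetchone()


def reap(db, older_than, now=None):
    """Delete finished entries last updated more than older_than seconds ago."""
    cutoff = _now(now) - older_than
    cur = db.execute("DELETE FROM request_status "
                     "WHERE status != 'in-progress' AND updated_at < ?",
                     (cutoff,))
    return cur.rowcount
```

Every component that receives the context would call mark_in_progress, so the insert has to be idempotent; that is why a later call does not overwrite a status that is already recorded.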

Just a thought in any case ...

Cheers,
Eoghan



philip.day at hp

Jun 29, 2012, 8:40 AM

Post #10 of 22
Re: Nova and asynchronous instance launching

> However, considering the unhappy-path for a second, is there a place
> for surfacing some more context as to why the new instance unexpectedly
> went into the ERROR state?


I assume the philosophy is that the API has validated the request as far as it can, and returned any meaningful error messages, etc. Anything that fails past that point is something going wrong on the cloud provider's side, and there is nothing the user could have done to avoid the error, so any additional information won't help them.

However, on the basis that up-front validation is seldom perfect, and things can change while a request is in flight, I think that being able to tell a user that, for example, their request failed because the image was deleted before it could be downloaded would be useful.

One approach might be to make the task_state more granular and use that to qualify the error. In general our users have found that having the state shown as "vm_state (task_state)" is useful, as it shows progress during things like building.
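
A small sketch of that idea follows. The function names and the state values used with them are illustrative assumptions, in particular the finer-grained "image_download" task state, which is made up for the example:

```python
def display_state(vm_state, task_state=None):
    """Render the state as "vm_state (task_state)", omitting an empty task."""
    if task_state:
        return "%s (%s)" % (vm_state, task_state)
    return vm_state


def describe_failure(vm_state, task_state=None):
    """Qualify an error with the task that was running when it occurred."""
    if vm_state != "error":
        return None
    if task_state:
        return "failed while %s" % task_state
    return "failed (no task recorded)"
```

With a granular enough task_state, an instance shown as "error (image_download)" already tells the user roughly where the request failed, without exposing provider internals.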

Phil





jaypipes at gmail

Jun 29, 2012, 10:46 AM

Post #11 of 22
Re: Nova and asynchronous instance launching

On 06/29/2012 04:25 AM, Huang Zhiteng wrote:
> Sounds like a performance issue. I think this symptom can be much
> eased if we spend some time fixing whatever bottleneck is causing this
> (slow AMQP, scheduler, or network)? Now that Nova API has got
> multiprocess enabled, we'd move to the next bottleneck in the long path
> of 'launching instance'.
> Devin, is it possible that you provide more details about this issue
> so that someone else can reproduce it?

Actually, Vish, David Kranz and I had a discussion about similar stuff
on IRC yesterday. I think that an easy win for this would be to add much
more fine-grained DEBUG logging statements in the various nova service
pieces -- nova-compute, nova-network, etc. Right now, there are areas
that look like performance or locking culprits (iptables save/restore
for example), but because there aren't very fine-grained logging
statements, it's tough to say whether:

a) A process (or greenthread) has simply yielded to another while it
waits for something

b) A process is doing something that is blocking

or

c) A process is doing some other work but no log statements are being
logged about that work, which makes it seem like some other work is
taking much longer than it really is

This would be a really easy win for a beginner developer or someone
looking for something to assist with -- simply add informative
LOG.debug() statements at various points in the API call pipelines.
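
One way such statements might look, as a sketch using only the standard logging module: the log_step helper is hypothetical and not something that exists in nova. It brackets a step with DEBUG messages and records the elapsed time, which narrows down where the time is going:

```python
import contextlib
import logging
import time

LOG = logging.getLogger(__name__)


@contextlib.contextmanager
def log_step(name, log=LOG):
    """Log when a step starts and how long it took, even if it raises."""
    log.debug("%s: starting", name)
    start = time.monotonic()
    try:
        yield
    finally:
        log.debug("%s: finished in %.3fs", name, time.monotonic() - start)
```

Wrapping a suspect section, for example `with log_step("iptables save/restore"):`, produces a start line and a finish line for it. A long gap between the two with no other log output in between points at that section rather than at its neighbours.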

Best,
-jay



david.kranz at qrclab

Jun 29, 2012, 10:50 AM

Post #12 of 22
Re: Nova and asynchronous instance launching

An assumption is being made here that the "user" and "cloud provider"
are unrelated. But I think there are many projects under development
where a cloud-based service is being provided on top of an OpenStack
infrastructure. In that use case, the direct user of OpenStack APIs and
the "cloud provider" may be the same entity. It would be really nice if,
when an application fires up an instance that enters the error state,
there was an API that could get the reason why it failed, with as much
information as the OpenStack code that set the instance state to ERROR had.

If we are concerned that such information is sensitive and a public
provider might not want to give it all to users, this could be an
admin-only API. There are many variations of how the information is
controlled.
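
One such variation, sketched with made-up field names (the "fault" dictionary and its keys are assumptions for the example, not an existing API): every caller gets a short message, while the fields a public provider may consider sensitive are returned only to admins.

```python
def failure_info(instance, is_admin):
    """Return why an instance is in ERROR, with more detail for admins."""
    if instance.get("status") != "ERROR":
        return None
    fault = instance.get("fault") or {}
    info = {"message": fault.get("message", "unknown error")}
    if is_admin:
        # Fields a public provider might consider operationally sensitive.
        info["component"] = fault.get("component")
        info["host"] = fault.get("host")
        info["details"] = fault.get("details")
    return info
```

Where the user and the provider are the same entity, the deployment would simply grant the application the admin view.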

-David

On 6/29/2012 11:40 AM, Day, Phil wrote:
>
> >However, considering the unhappy-path for a second, is there a place
> for surfacing some more context as to why the new instance unexpectedly
> went into the ERROR state?
>
> I assume the philosophy is that the API has validated the request as
> far as it can, and returned any meaningful error messages, etc.
> Anything that fails past that point is something going wrong from the
> cloud provider and there is nothing the user could have done to avoid
> the error, so any additional information won't help them.
>
> However on the basis that up-front validation is seldom perfect, and
> things can change while a request is in flight I think that being able
> to tell a user that, for example, their request failed because the
> image was deleted before it could be downloaded would be useful.
>
> One approach might be to make the task_state more granular and use
> that to qualify the error. In general our users have found having
> the state shown as "vm_state (task_state)" was useful as it shows
> progress during things like building.
>
> Phil


dug at us

Jun 29, 2012, 2:45 PM

Post #13 of 22
Re: Nova and asynchronous instance launching

You don't really expect a client (think ec2-like-user) to analyze debug
info do you?

I really think we need a nice consistent way for people to see what's
going on with long-running operations. Debug info isn't that to me.

thanks
-Doug


jaypipes at gmail

Jun 29, 2012, 3:03 PM

Post #14 of 22
Re: Nova and asynchronous instance launching

I'm not expecting a client to do anything, and I'm not sure where you
got that from my response below... I'm talking about adding debug
statements into the nova-compute/nova-network logs that an *operator* or
*core developer* would use to determine which parts of the code are
taking the most time.

-jay

On 06/29/2012 05:45 PM, Doug Davis wrote:
>
> You don't really expect a client (think ec2-like-user) to analyze debug
> info do you?
>
> I really think we need a nice consistent way for people to see what's
> going on with long-running operations. Debug info isn't that to me.
>
> thanks
> -Doug


jaypipes at gmail

Jun 29, 2012, 3:05 PM

Post #15 of 22 (246 views)
Permalink
Re: Nova and asynchronous instance launching [In reply to]

On 06/29/2012 05:45 PM, Doug Davis wrote:
>
> You don't really expect a client (think ec2-like-user) to analyze debug
> info do you?
>
> I really think we need a nice consistent way for people to see what's
> going on with long-running operations. Debug info isn't that to me.
>
> thanks
> -Doug

Also, see:

http://wiki.openstack.org/MailingListEtiquette

particularly the first point, re: HTML email.

Cheers,
-jay


dug at us

Jun 29, 2012, 3:09 PM

Post #16 of 22 (244 views)
Permalink
Re: Nova and asynchronous instance launching [In reply to]

True, that's all useful info, but I thought the original problem being
addressed was how the end-user could know what's going on for long-running
ops.

thanks
-Doug
______________________________________________________
STSM | Standards Architect | IBM Software Group
(919) 254-6905 | IBM 444-6905 | dug [at] us
The more I'm around some people, the more I like my dog.


cbehrens at codestud

Jun 29, 2012, 4:32 PM

Post #17 of 22 (251 views)
Permalink
Re: Nova and asynchronous instance launching [In reply to]

There's only one RPC call unless you're running Cactus or something. The loop is in the schedulers, not in the API.

min_count is unfortunately special-cased right now to be a single call rather than a cast, though. I was going to fix that real soon. The problem is that the scheduler, rather than the API, creates the DB records in this case. I can expand on this when I'm not replying from a phone. :)

There are some other things that would be nice to do here with the API, but the call can change to a cast with no API behavior change (except for speeding up the response :)
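
To illustrate the call vs cast distinction, here is a rough sketch using only the Python standard library. It is not Nova's rpc module: the in-process queue stands in for the message bus, and scheduler_worker(), call(), cast() and timed() are hypothetical names made up for this example.

```python
import queue
import threading
import time

# Stands in for the message bus in this sketch.
requests = queue.Queue()


def scheduler_worker():
    """Consume messages; reply only when the sender asked for a reply."""
    while True:
        msg = requests.get()
        if msg is None:  # shutdown sentinel
            break
        time.sleep(0.05)  # pretend that scheduling takes a while
        reply_q = msg.get("reply_q")
        if reply_q is not None:
            reply_q.put({"scheduled": msg["count"]})


def call(count):
    """Blocking style: send the request and wait for the reply."""
    reply_q = queue.Queue()
    requests.put({"count": count, "reply_q": reply_q})
    return reply_q.get(timeout=5)


def cast(count):
    """Fire-and-forget style: send the request and return at once."""
    requests.put({"count": count})


def timed(fn, *args):
    """Return (result, seconds elapsed) for one invocation of fn."""
    start = time.monotonic()
    result = fn(*args)
    return result, time.monotonic() - start


if __name__ == "__main__":
    worker = threading.Thread(target=scheduler_worker, daemon=True)
    worker.start()
    print("call:", timed(call, 100))
    print("cast:", timed(cast, 100))
    requests.put(None)
    worker.join()
```

The caller of call() is held up for as long as the worker takes, while the caller of cast() returns as soon as the message is queued. The price of cast() is that the sender gets nothing back, so it has to learn about the outcome some other way.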

- Chris

On Jun 27, 2012, at 12:53 PM, Devin Carlen <devin [at] openstack> wrote:

> We filed a blueprint for this yesterday:
>
> https://blueprints.launchpad.net/nova/+spec/launch-instances-async
>
> "Currently if a user attempts to create a lot of instances with a single API call (using min_count) the request will hang for a long time while all RPC calls are completed. For a large number of instances this can take a very long time. The API should return immediately and asynchronously make RPC calls."
>
> We are looking for creative ways to work around this problem, but in the meantime I'd like to hear from folks on what they think the preferred solution would be.
>
>
> Devin


jaypipes at gmail

Jul 1, 2012, 11:06 AM

Post #18 of 22 (224 views)
Permalink
Re: Nova and asynchronous instance launching [In reply to]

On 06/29/2012 01:50 PM, David Kranz wrote:
> An assumption is being made here that the "user" and "cloud provider"
> are unrelated. But I think there are many projects under development
> where a cloud-based service is being provided on top of an OpenStack
> infrastructure. In that use case, the direct user of OpenStack APIs and
> the "cloud provider" may be the same entity. It would be really nice if
> when an application fires up an instance that enters the error state,
> there was an api that could get the reason why it failed with as much
> information as the OpenStack code that set the instance state to ERROR had.
>
> If we are concerned that such information is sensitive and a public
> provider might not want to give it all to users, this could be an
> admin-only API. There are many variations of how the information is
> controlled.

Yeah, I think this is an excellent suggestion. To be clear, I responded
earlier about adding more debug log statements to nova-network and
nova-compute -- but I wasn't suggesting that as a user-facing solution
to incident tracking :) I was only suggesting that more granular debug
messages in logs can assist the operator.

Best,
-jay


philip.day at hp

Jul 1, 2012, 3:04 PM

Post #19 of 22 (217 views)
Permalink
Re: Nova and asynchronous instance launching [In reply to]

Rather than adding debug statements, could we please add additional notification events (for example, a notification event whenever task_state changes)?

Anyone who wants log file entries could then use the log_notifier, while those who want to get information like this back into a central system could use rabbit_notifier.

Maybe we need some way of configuring filters on the notifier stream for those who want to decide which events should be logged, sent to MQ, or ignored altogether.
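
A rough sketch of what such a filter could look like, assuming a simple first-match routing table keyed on event type. ROUTES, route_for() and dispatch() are hypothetical names and the event types are only examples; none of this is an existing notifier option.

```python
import fnmatch

# Hypothetical routing table: the first matching pattern wins.
ROUTES = [
    ("compute.instance.update", "mq"),
    ("compute.instance.*", "log"),
    ("*", "ignore"),
]


def route_for(event_type, routes=ROUTES):
    """Return the destination name for an event type."""
    for pattern, destination in routes:
        if fnmatch.fnmatchcase(event_type, pattern):
            return destination
    return "ignore"


def dispatch(message, sinks, routes=ROUTES):
    """Hand the message to the sink chosen by the routing table.

    Destinations without a registered sink (such as "ignore") drop
    the message. The chosen destination name is returned.
    """
    destination = route_for(message["event_type"], routes)
    sink = sinks.get(destination)
    if sink is not None:
        sink(message)
    return destination


if __name__ == "__main__":
    logged, queued = [], []
    sinks = {"log": logged.append, "mq": queued.append}
    for event_type in ("compute.instance.update",
                       "compute.instance.create.start",
                       "scheduler.run_instance"):
        print(event_type, "->", dispatch({"event_type": event_type}, sinks))
```

Keeping the table ordered and first-match means a specific event can be sent to MQ while the rest of its family only goes to the log.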

Phil


cbehrens at codestud

Jul 1, 2012, 4:13 PM

Post #20 of 22 (216 views)
Permalink
Re: Nova and asynchronous instance launching [In reply to]

On Jul 1, 2012, at 3:04 PM, "Day, Phil" <philip.day [at] hp> wrote:

> Rather than adding debug statements could we please add additional notification events (for example a notification event whenever task_state changes)
>

This has been in trunk for a month or maybe a little longer.

FYI

- Chris


philip.day at hp

Jul 2, 2012, 4:38 AM

Post #21 of 22 (218 views)
Permalink
Re: Nova and asynchronous instance launching [In reply to]

Hi Chris,

Thanks for the pointer to the new notification on state change stuff; I'd missed that change.

Is there a blueprint or some such which describes the change?

In particular I'm trying to understand how the bandwidth_usage values fit in here. It seems that during a VM creation there would normally be a number of fairly rapid state changes, so re-calculating the bandwidth_usage figures might be quite expensive just to log a change in task_state from, say, "Networking" to "Block Device Mapping". I was kind of expecting that to be more part of the "compute.exists" messages than the update.

Do we have something that catalogues the various notification messages and their payloads?

Thanks,
Phil


cbehrens at codestud

Jul 3, 2012, 1:21 PM

Post #22 of 22 (207 views)
Permalink
Re: Nova and asynchronous instance launching [In reply to]

There wasn't a blueprint, but you can see the change here:

https://review.openstack.org/#/c/7542/

Bandwidth is updated in a DB table outside of notifications. The notification code just pulls the last data received and sends it. With rapid state changes, I would expect that bandwidth_usage would mostly not differ between the messages… unless a bandwidth update in the background happens to sneak in during the middle of the events.

In any case… these state change events are denoted by 'compute.instance.update'. For actions like 'rebuild', you'll get an 'exists' message when the action starts… but then you'll also see some instance.update events as the states switch.

At least this is how I understand it. Besides the code, your best resource for information about notification payloads, etc. is this:

http://wiki.openstack.org/SystemUsageData
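
As an illustration of picking the state changes out of a notification stream, here is a minimal sketch. The message layout (an 'event_type' plus a 'payload' dict with 'instance_id', 'old_task_state' and 'new_task_state') is an assumption made for this example and should be checked against the code and the wiki page above; task_state_transitions() is a hypothetical helper.

```python
def task_state_transitions(notifications):
    """Yield (instance_id, old, new) for each task_state change.

    Only 'compute.instance.update' messages are considered; other
    event types, and updates where the task state did not change,
    are skipped.
    """
    for note in notifications:
        if note.get("event_type") != "compute.instance.update":
            continue
        payload = note.get("payload", {})
        old = payload.get("old_task_state")
        new = payload.get("new_task_state")
        if old != new:
            yield payload.get("instance_id"), old, new


if __name__ == "__main__":
    # Hand-written sample messages, not captured from a real system.
    sample = [
        {"event_type": "compute.instance.update",
         "payload": {"instance_id": "abc",
                     "old_task_state": "networking",
                     "new_task_state": "block_device_mapping"}},
        {"event_type": "compute.instance.exists",
         "payload": {"instance_id": "abc"}},
    ]
    for transition in task_state_transitions(sample):
        print(transition)
```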

- Chris


