
Mailing List Archive: OpenStack: Dev

qpid_heartbeat...doesn't?

 

 



lars at seas

Jul 28, 2012, 7:24 PM

Post #1 of 9 (232 views)
Permalink
qpid_heartbeat...doesn't?

Our environment has connection-tracking firewalls that drop idle
connections after an hour. There is a connection between nova-compute
and our qpidd server that appears to be idle for long periods of time.

When the firewall drops this connection, the participating hosts are
unaware of that fact and ultimately stop communicating with each other
until we restart nova-compute.

I was hoping that the qpid_heartbeat parameter would avoid this
problem by keeping the connection active, but despite having
qpid_heartbeat set explicitly in our configuration...

# This is supposed to be the default
qpid_heartbeat = 5

...there is no traffic across this connection.

I can deal with this problem by forcing (via libkeepalive,
http://libkeepalive.sourceforge.net) SO_KEEPALIVE on the AMQ sockets
(and tuning the net.ipv4.tcp_keepalive_time sysctl to be < the
firewall connection timeout), but that seems a bit of a hack. It's
also possible to work around this by disabling idle connection
timeouts on the firewall, so we're not completely stymied...
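
For comparison, this is roughly what the SO_KEEPALIVE workaround amounts to when a program owns the socket itself instead of relying on libkeepalive. It is only a sketch: the helper name enable_keepalive and the timing values are assumptions for illustration, and the per-socket TCP_KEEP* options are applied only where the platform's socket module exposes them. The idle time would need to be shorter than the firewall's idle timeout.

```python
import socket


def enable_keepalive(sock, idle_seconds=600, interval_seconds=60, probes=5):
    """Turn on TCP keepalive for sock (hypothetical helper, values illustrative)."""
    # Ask the kernel to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning; these option names are not available on every
    # platform, so each one is applied only if the socket module defines it.
    tuning = (
        ("TCP_KEEPIDLE", idle_seconds),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_seconds),  # gap between unanswered probes
        ("TCP_KEEPCNT", probes),              # unanswered probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

Without the per-socket options, the system-wide net.ipv4.tcp_keepalive_time sysctl mentioned above decides when the first probe goes out.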

...but I would like to understand why setting qpid_heartbeat does not,
in fact, result in the regular transmission of heartbeat packets
across the connection.

We're running openstack-nova-2012.1.1-0.20120615.13614 from EPEL (and
qpid 0.14).

Thanks,

--
Lars Kellogg-Stedman <lars [at] seas> |
Senior Technologist | http://ac.seas.harvard.edu/
Academic Computing | http://code.seas.harvard.edu/
Harvard School of Engineering and Applied Sciences |

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to : openstack [at] lists
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp


P at draigBrady

Jul 28, 2012, 8:11 PM

Post #2 of 9 (218 views)
Permalink
Re: qpid_heartbeat...doesn't? [In reply to]

On 07/29/2012 03:24 AM, Lars Kellogg-Stedman wrote:
> Our environment has connection-tracking firewalls that drop idle
> connections after an hour. There is a connection between nova-compute
> and our qpidd server that appears to be idle for long periods of time.
>
> When the firewall drops this connection, the participating hosts are
> unaware of that fact and ultimately stop communicating with each other
> until we restart nova-compute.
>
> I was hoping that the qpid_heartbeat parameter would avoid this
> problem by keeping the connection active, but despite having
> qpid_heartbeat set explicitly in our configuration...
>
> # This is supposed to be the default
> qpid_heartbeat = 5
>
> ...there is no traffic across this connection
>
> I can deal with this problem by forcing (via libkeepalive,
> http://libkeepalive.sourceforge.net) SO_KEEPALIVE on the AMQ sockets
> (and tuning the net.ipv4.tcp_keepalive_time sysctl to be < the
> firewall connection timeout), but that seems a bit of a hack. It's
> also possible to work around this by disabling idle connection
> timeouts on the firewall, so we're not completely stymied...
>
> ...but I would like to understand why setting qpid_heartbeat does not,
> in fact, result in the regular transmission of heartbeat packets
> across the connection.
>
> We're running openstack-nova-2012.1.1-0.20120615.13614 from EPEL (and
> qpid 0.14).

Looks like a typo.
Could you try this?

cheers,
Pádraig.

diff --git a/nova/rpc/impl_qpid.py b/nova/rpc/impl_qpid.py
index 289f21b..e19079e 100644
--- a/nova/rpc/impl_qpid.py
+++ b/nova/rpc/impl_qpid.py
@@ -317,7 +317,7 @@ class Connection(object):
                 FLAGS.qpid_reconnect_interval_min)
         if FLAGS.qpid_reconnect_interval:
             self.connection.reconnect_interval = FLAGS.qpid_reconnect_interval
-        self.connection.hearbeat = FLAGS.qpid_heartbeat
+        self.connection.heartbeat = FLAGS.qpid_heartbeat
         self.connection.protocol = FLAGS.qpid_protocol
         self.connection.tcp_nodelay = FLAGS.qpid_tcp_nodelay
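
As an aside on why a misspelling like this raises no error: assigning to an attribute name that does not exist on an ordinary Python object simply creates it. The sketch below uses a hypothetical FakeConnection class as a stand-in (it is not the real qpid connection class) to show that the misspelled assignment leaves the real heartbeat attribute untouched.

```python
class FakeConnection(object):
    """Hypothetical stand-in for the connection object (not the real class)."""

    def __init__(self):
        # The attribute that the library is assumed to read.
        self.heartbeat = None


def configure(conn, attribute_name, value):
    """Assign value to attribute_name on conn and report conn.heartbeat."""
    # setattr on an unknown name silently creates a new attribute.
    setattr(conn, attribute_name, value)
    return conn.heartbeat


typo_result = configure(FakeConnection(), "hearbeat", 5)    # misspelled name
fixed_result = configure(FakeConnection(), "heartbeat", 5)  # corrected name
print(typo_result, fixed_result)  # prints: None 5
```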



lars at seas

Jul 29, 2012, 11:14 AM

Post #3 of 9 (216 views)
Permalink
Re: qpid_heartbeat...doesn't? [In reply to]

> Looks like a typo.
> Could you try this.

That seems better...although while the documentation says that
qpid_heartbeat is "Seconds between heartbeat messages" [1], observed
behavior suggests that it is actually *minutes* between messages.

[1]: http://docs.openstack.org/essex/openstack-compute/admin/content/configuration-qpid.html

--
Lars Kellogg-Stedman <lars [at] seas> |
Senior Technologist | http://ac.seas.harvard.edu/
Academic Computing | http://code.seas.harvard.edu/
Harvard School of Engineering and Applied Sciences |



P at draigBrady

Jul 29, 2012, 5:41 PM

Post #4 of 9 (212 views)
Permalink
Re: qpid_heartbeat...doesn't? [In reply to]

On 07/29/2012 07:14 PM, Lars Kellogg-Stedman wrote:
>> Looks like a typo.
>> Could you try this.
>
> That seems better...although while the documentation says that
> qpid_heartbeat is "Seconds between heartbeat messages" [1], observed
> behavior suggests that it is actually *minutes* between messages.
>
> [1]: http://docs.openstack.org/essex/openstack-compute/admin/content/configuration-qpid.html

That's surprising as the qpid code does:

    if self.connection.heartbeat:
        times.append(time.time() + self.connection.heartbeat)

Note that time.time() returns seconds, so
heartbeat must be given in seconds.
Perhaps there is another issue with the scheduling of this?
How are you monitoring the connection?
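
The arithmetic in that snippet can be checked in isolation. This is a minimal sketch, and next_heartbeat_deadline is a made-up name rather than anything in qpid: since time.time() returns seconds since the epoch, adding a heartbeat of 5 moves the deadline five seconds out, not five minutes.

```python
import time


def next_heartbeat_deadline(heartbeat, now=None):
    """Return the epoch time (in seconds) at which the next heartbeat is due."""
    if now is None:
        now = time.time()  # seconds since the epoch, as a float
    return now + heartbeat
```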

cheers,
Pádraig.



lars at seas

Jul 29, 2012, 6:49 PM

Post #5 of 9 (210 views)
Permalink
Re: qpid_heartbeat...doesn't? [In reply to]

On Mon, Jul 30, 2012 at 01:41:20AM +0100, Pádraig Brady wrote:
> Perhaps there is another issue with the scheduling of this?

That's likely. While I verified that the patch successfully fixed our
connection timeout issue, I didn't look closely to see exactly where
the behavior changed...and the connection that is standing out now
belongs to nova-volume, whereas the timeouts were happening with
nova-compute.

> How are you monitoring the connection?

Our firewall is a Cisco ASDM (6.1). I'm monitoring the connection by
running:

show conn lport 5672

Which gets me:

TCP compute-hosts:630 10.243.16.151:39756 controllers:621 openstack-controller:5672 idle 0:00:00 Bytes 6410148 FLAGS - UBOI
TCP compute-hosts:630 10.243.16.151:39881 controllers:621 openstack-controller:5672 idle 0:00:04 Bytes 10470 FLAGS - UBOI
TCP compute-hosts:630 10.243.16.151:39755 controllers:621 openstack-controller:5672 idle 0:00:02 Bytes 9717108 FLAGS - UBOI
TCP compute-hosts:630 10.243.16.151:39736 controllers:621 openstack-controller:5672 idle 0:03:59 Bytes 36206 FLAGS - UBOI
TCP compute-hosts:630 10.243.16.151:39752 controllers:621 openstack-controller:5672 idle 0:00:03 Bytes 4313246 FLAGS - UBOI

Where the fields are:

<protocol> <source interface> <source ip/port> <dest. interface> <dest ip/port> idle <idle time> ...

The connection from port 39736 on the compute host (which is the
nova-volume process) regularly cycles up to 5 minutes of idle time
before resetting to 0 (and the firewall sets the idle time to zero
whenever any traffic passes across the connection).

And indeed, if I run a packet trace on this connection, I can verify
that packets are only showing up at five-minute intervals.
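
For anyone who wants to watch for this without eyeballing the firewall output, here is a rough sketch that picks out connections whose idle time exceeds a threshold. The regular expression assumes exactly the field layout shown above, and the function name and threshold are illustrative only.

```python
import re

# Matches: TCP <src iface> <src ip/port> <dst iface> <dst ip/port> idle H:MM:SS
CONN_RE = re.compile(
    r"^TCP\s+\S+\s+(\S+)\s+\S+\s+(\S+)\s+idle\s+(\d+):(\d+):(\d+)")

# Two lines copied from the output above.
SAMPLE = """\
TCP compute-hosts:630 10.243.16.151:39881 controllers:621 openstack-controller:5672 idle 0:00:04 Bytes 10470 FLAGS - UBOI
TCP compute-hosts:630 10.243.16.151:39736 controllers:621 openstack-controller:5672 idle 0:03:59 Bytes 36206 FLAGS - UBOI
"""


def idle_connections(output, threshold_seconds):
    """Return (source, destination, idle_seconds) for connections idle too long."""
    results = []
    for line in output.splitlines():
        match = CONN_RE.match(line.strip())
        if match is None:
            continue  # not a connection line in the assumed format
        source, destination, hours, minutes, seconds = match.groups()
        idle = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        if idle > threshold_seconds:
            results.append((source, destination, idle))
    return results


print(idle_connections(SAMPLE, 60))
```

With a one-minute threshold, only the connection from port 39736 is reported, at 239 seconds idle.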

--
Lars Kellogg-Stedman <lars [at] seas> |
Senior Technologist | http://ac.seas.harvard.edu/
Academic Computing | http://code.seas.harvard.edu/
Harvard School of Engineering and Applied Sciences |




lars at seas

Jul 29, 2012, 6:58 PM

Post #6 of 9 (210 views)
Permalink
Re: qpid_heartbeat...doesn't? [In reply to]

On Sun, Jul 29, 2012 at 09:49:25PM -0400, Lars Kellogg-Stedman wrote:
> And indeed, if I run a packet trace on this connection, I can verify
> that packets are only showing up at five-minute intervals.

Horrors! It may be that nova-volume didn't get restarted when I
restarted everything else. After explicitly restarting nova-volume it
now seems to be emitting heartbeat traffic that corresponds with what
the other processes are doing.

--
Lars Kellogg-Stedman <lars [at] seas> |
Senior Technologist | http://ac.seas.harvard.edu/
Academic Computing | http://code.seas.harvard.edu/
Harvard School of Engineering and Applied Sciences |



reagul.2007 at gmail

Jul 30, 2012, 4:17 AM

Post #7 of 9 (206 views)
Permalink
Re: qpid_heartbeat...doesn't? [In reply to]

Registering the service heartbeat with the subsystem might alleviate the
need to restart certain items. This subsystem is the one that monitors
which services are up and running.

Ravi.

On Sun, Jul 29, 2012 at 9:58 PM, Lars Kellogg-Stedman <lars [at] seas
> wrote:

> On Sun, Jul 29, 2012 at 09:49:25PM -0400, Lars Kellogg-Stedman wrote:
> > And indeed, if I run a packet trace on this connection, I can verify
> > that packets are only showing up at five-minute intervals.
>
> Horrors! It may be that nova-volume didn't get restarted when I
> restarted everything else. After explicitly restarting nova-volume it
> now seems to be emitting heartbeat traffic that correspond with what
> the other processes are doing.
>
> --
> Lars Kellogg-Stedman <lars [at] seas> |
> Senior Technologist |
> http://ac.seas.harvard.edu/
> Academic Computing |
> http://code.seas.harvard.edu/
> Harvard School of Engineering and Applied Sciences |
>
>


lars at seas

Aug 2, 2012, 9:35 AM

Post #8 of 9 (190 views)
Permalink
Re: qpid_heartbeat...doesn't? [In reply to]

On Thu, Aug 02, 2012 at 12:33:13PM -0400, Lars Kellogg-Stedman wrote:
> > Looks like a typo.
> > Could you try this.
>
> FYI: The same typo appears to exist in notify_qpid.py.

Err, that is, glance/notifier/notify_qpid.py, in case it wasn't
obvious...
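
A quick way to check other trees for the same misspelling is to scan the Python sources for the string. The sketch below is only an illustration: find_misspelling is a hypothetical helper and the root directory is whatever checkout you point it at.

```python
import os


def find_misspelling(root, needle="hearbeat", suffix=".py"):
    """Return (path, line_number, line) for each line under root containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(suffix):
                continue
            path = os.path.join(dirpath, filename)
            # errors="replace" keeps the scan going past undecodable bytes.
            with open(path, errors="replace") as handle:
                for line_number, line in enumerate(handle, 1):
                    if needle in line:
                        hits.append((path, line_number, line.strip()))
    return hits
```

Calling find_misspelling on the top of a source checkout lists every file and line where the misspelled name still appears.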

--
Lars Kellogg-Stedman <lars [at] seas> |
Senior Technologist | http://ac.seas.harvard.edu/
Academic Computing | http://code.seas.harvard.edu/
Harvard School of Engineering and Applied Sciences |



P at draigBrady

Aug 2, 2012, 11:01 AM

Post #9 of 9 (190 views)
Permalink
Re: qpid_heartbeat...doesn't? [In reply to]

On 08/02/2012 05:35 PM, Lars Kellogg-Stedman wrote:
> On Thu, Aug 02, 2012 at 12:33:13PM -0400, Lars Kellogg-Stedman wrote:
>>> Looks like a typo.
>>> Could you try this.
>>
>> FYI: The same typo appears to exist in notify_qpid.py.
>
> Err, that is, glance/notifier/notify_qpid.py, in case it wasn't
> obvious...

Well spotted.
I've submitted a patch for:
https://bugs.launchpad.net/glance/+bug/1032314

cheers,
Pádraig.

