
Mailing List Archive: OpenStack: Dev

instances loosing IP address while running, due to No DHCPOFFER

trapni at gmail

Jun 14, 2012, 1:41 PM

Post #1 of 10 (1435 views)
instances loosing IP address while running, due to No DHCPOFFER

Hey all,

I'm sad to report this, now that we have had quite a few instances in
production for at least 5 days: I have now encountered the second instance
losing its IP address due to "No DHCPOFFER" (according to syslog in the
instance).

I checked the logs on the central nova-network and gateway node and found
that dnsmasq still replies to requests from all the other instances. It
even received the request from the instance in question and sent an OFFER,
as far as I can tell so far (I'm investigating and will post logs ASAP).
But while dnsmasq appears to send an offer, the instance says it didn't
receive one - wtf?

Please tell me what I can do to actually *fix* this issue, since it is
very serious for us.

One workaround I can see is to let a newly created instance retrieve its
IP via DHCP, but then reconfigure /etc/network/interfaces to continue with
a static networking setup. However, I'd rather just get the DHCP problem
fixed.

I'm very open to any kind of helpful comments :)

So long,
Christian.


nathanael.i.burton at gmail

Jun 14, 2012, 1:55 PM

Post #2 of 10 (1422 views)
Re: instances loosing IP address while running, due to No DHCPOFFER

Has nova-network been restarted? There was an issue where nova-network was
signalling dnsmasq which would cause dnsmasq to stop responding to requests
yet appear to be running fine.

You can see if killing dnsmasq, restarting nova-network, and rebooting an
instance allows it to get a dhcp address again ...

Nate


trapni at gmail

Jun 14, 2012, 4:02 PM

Post #3 of 10 (1421 views)
Re: instances loosing IP address while running, due to No DHCPOFFER

Hey,

thanks for your reply. Unfortunately, neither nova-network nor dnsmasq
was restarted; both processes seem to have been up for about 2 and 3 days.

However, why does dhcp_lease_time default to 120s? Not overriding it
causes the clients to re-acquire a DHCP lease every 42 seconds (at least
on my nodes), which is completely ridiculous.
OTOH, I took a look at the sources (linux_net.py) and found out why the
max_lease_time is set to 2048: because that is the size of my network.
So why is the max lease time the size of my network?
I've written a tiny patch to allow overriding this value in nova.conf and
will submit it to Launchpad soon. I hope it will be accepted and then also
applied to Essex, since it is a very straightforward, helpful few-line
change.

Nevertheless, that does not explain why I have now had 2 (well, 3
actually) instances that stopped getting DHCP replies/offers after some
hours or days.

I fixed the one host that caused issues today (a few hours ago) by hard
rebooting the instance. However, just about 40 minutes later it forgot its
IP again, so one might say that it stopped getting replies from the DHCP
server (dnsmasq) almost right after it got a lease on instance boot.

So long,
Christian.



vishvananda at gmail

Jun 14, 2012, 4:04 PM

Post #4 of 10 (1424 views)
Re: instances loosing IP address while running, due to No DHCPOFFER

Are you running in VLAN mode? If so, you probably need to update to a new version of dnsmasq. See this message for reference:

http://osdir.com/ml/openstack-cloud-computing/2012-05/msg00785.html

Vish



nathanael.i.burton at gmail

Jun 14, 2012, 4:14 PM

Post #5 of 10 (1423 views)
Re: instances loosing IP address while running, due to No DHCPOFFER

There's a flag 'dhcp_lease_time' (in seconds) that can be set in nova.conf.
DHCP clients typically renew every (dhcp_lease_time/2) seconds, but this
varies by client. Additionally, some DHCP clients are not persistent,
meaning that if there's ever a network hiccup and they don't get a DHCP
ACK, they will give up and stop checking in, thus losing their lease and
falling off the network.

On RHEL/CentOS/Fedora this is fixed by setting PERSISTENT_DHCLIENT=1 in
your ifcfg-eth0 file. Not sure about Ubuntu.
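
As a rough illustration of the renewal timing described above, here is a
minimal Python sketch. The 0.5 and 0.875 fractions are the RFC 2131
defaults for the renewal (T1) and rebinding (T2) timers; renewal_timers is
a hypothetical helper name, not anything from nova. Real clients may add
jitter or use server-supplied timer values, so observed intervals can
differ from these numbers.

```python
def renewal_timers(lease_seconds):
    """Return (T1, T2) in seconds using the RFC 2131 default fractions."""
    t1 = lease_seconds * 0.5    # client starts renewing with its original server
    t2 = lease_seconds * 0.875  # client starts rebinding with any server
    return t1, t2


# Compare the 120 s default mentioned in this thread with a 3-day lease.
for lease in (120, 3 * 24 * 3600):
    t1, t2 = renewal_timers(lease)
    print(f"lease={lease}s: renew after {t1:.0f}s, rebind after {t2:.0f}s")
```

With a short lease, a client that misses only a few renewals in a row
reaches expiry within a couple of minutes, which is why raising
dhcp_lease_time gives non-persistent clients more room.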

Nate


narayan.desai at gmail

Jun 14, 2012, 4:16 PM

Post #6 of 10 (1424 views)
Re: instances loosing IP address while running, due to No DHCPOFFER

I vaguely recall Vish mentioning a bug in dnsmasq that caused a somewhat
similar problem (it had to do with lease renewal problems on IP
aliases, or something like that).

This issue was particularly pronounced with Windows VMs, apparently.
-nld


_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to : openstack [at] lists
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp


trapni at gmail

Jun 14, 2012, 5:09 PM

Post #7 of 10 (1422 views)
Re: instances loosing IP address while running, due to No DHCPOFFER

Hey all,

many, many thanks for all your replies. I have already raised the DHCP
timeouts just now, so I'll be able to get enough sleep and actually apply
the dnsmasq fix tomorrow.

Yes, I am running in VLAN mode, since this is also the recommended way.

Maybe OpenStack (nova-network) should check the version number of dnsmasq
and, if running in VLAN mode, issue a (critical) warning in the logs,
especially since this kind of error can lead to disasters in datacenters. :)
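
A version check like the one suggested above could look roughly like the
following Python sketch. It is not existing nova code: the function names
and the min_version parameter are hypothetical, and this thread does not
say which dnsmasq release contains the fix, so the caller has to supply
that. The sketch only assumes that `dnsmasq --version` prints a line
containing "version X.Y".

```python
import re
import subprocess


def parse_dnsmasq_version(text):
    """Extract (major, minor) from text such as 'Dnsmasq version 2.59 ...'."""
    match = re.search(r"version\s+v?(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no dnsmasq version found in: %r" % text)
    return int(match.group(1)), int(match.group(2))


def warn_if_dnsmasq_too_old(min_version, vlan_mode=True):
    """Return a warning string if dnsmasq is older than min_version, else None.

    min_version is a (major, minor) tuple chosen by the caller (hypothetical).
    """
    if not vlan_mode:
        return None  # the check only matters for VLAN mode
    output = subprocess.run(
        ["dnsmasq", "--version"], capture_output=True, text=True, check=True
    ).stdout
    found = parse_dnsmasq_version(output)
    if found < tuple(min_version):
        return ("dnsmasq %d.%d is older than %d.%d; DHCP in VLAN mode may be "
                "unreliable" % (found + tuple(min_version)))
    return None
```

A caller would log the returned string at a critical level, for example
`warn_if_dnsmasq_too_old((2, 61))`, where (2, 61) is only a placeholder
and not a confirmed minimum version.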

I also hope that Ubuntu 12.04 will pick up this patch soon enough, so that
we won't end up with a patch-dominated distribution :-)

Good night all,
Christian.



nathanael.i.burton at gmail

Jun 14, 2012, 5:50 PM

Post #8 of 10 (1422 views)
Re: instances loosing IP address while running, due to No DHCPOFFER

FWIW I haven't run across the dnsmasq bug in our environment using EPEL
packages.

Nate


trapni at gmail

Jun 15, 2012, 4:19 PM

Post #9 of 10 (1413 views)
Re: instances loosing IP address while running, due to No DHCPOFFER

Hey all,

it has now happened twice more, both times today, the last at 22:00 UTC,
with the following in the syslog on the nova-network node:

root@gw:/var/log# grep 'dnsmasq.*10889' daemon.log
Jun 15 17:39:32 cesar1 dnsmasq[10889]: started, version v2.62-7-g4ce4f37 cachesize 150
Jun 15 17:39:32 cesar1 dnsmasq[10889]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack
Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: DHCP, static leases only on 10.10.40.3, lease time 3d
Jun 15 17:39:32 cesar1 dnsmasq[10889]: reading /etc/resolv.conf
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 4.2.2.1#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 178.63.26.173#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.122#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.121#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: read /etc/hosts - 519 addresses
Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: read /var/lib/nova/networks/nova-br100.conf
Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPREQUEST(br100) 10.10.40.16 fa:16:3e:3d:ff:f3
Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1

It seems that this one VM was the only one that sent a DHCP request over
the past 5 hours, and that first one was answered with a DHCP ACK, and
that is it.
That was the time the host behind that IP (redis-appdata1) stopped
functioning.
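
To see at a glance which clients dnsmasq actually heard from and answered,
the DHCP lines in daemon.log can be tallied per MAC address. The following
Python sketch assumes the dnsmasq-dhcp line format shown in the excerpt
above; DHCP_LINE, dhcp_events and count_by_mac are hypothetical names, and
the pattern may need adjusting for other dnsmasq versions.

```python
import re

# Matches dnsmasq-dhcp messages such as:
#   "... dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1"
# The IP is optional because some message types may be logged without one.
DHCP_LINE = re.compile(
    r"dnsmasq-dhcp\[\d+\]: (?P<kind>DHCP[A-Z]+)\((?P<iface>[^)]+)\) "
    r"(?:(?P<ip>\d+\.\d+\.\d+\.\d+) )?"
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
)


def dhcp_events(lines):
    """Yield (kind, iface, ip, mac) for every DHCP message line."""
    for line in lines:
        m = DHCP_LINE.search(line)
        if m:
            yield m.group("kind"), m.group("iface"), m.group("ip"), m.group("mac")


def count_by_mac(lines):
    """Count DHCP message types per client MAC address."""
    counts = {}
    for kind, _iface, _ip, mac in dhcp_events(lines):
        per_mac = counts.setdefault(mac, {})
        per_mac[kind] = per_mac.get(kind, 0) + 1
    return counts


sample = [
    "Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: read /var/lib/nova/networks/nova-br100.conf",
    "Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPREQUEST(br100) 10.10.40.16 fa:16:3e:3d:ff:f3",
    "Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1",
]
print(count_by_mac(sample))
```

A MAC that shows DHCPDISCOVER and DHCPOFFER entries on the server while the
instance logs "No DHCPOFFER" would point at replies being lost on the way
back, rather than at dnsmasq not answering.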

However, I did actually update dnsmasq on our gateway node to the latest
trunk of the dnsmasq git repository, killed dnsmasq, and restarted
nova-network (which auto-starts dnsmasq per device).

I really hoped that this one particular bug was the cause of the downtime,
but apparently there MIGHT be another factor.

There is unfortunately nothing to read in the VM's syslog.
What else could cause the VM to forget its IP?
Can this also be caused by send_arp_for_ha=True?

Regards,
Christian.



tom.sante at gmail

Jun 27, 2012, 3:59 AM

Post #10 of 10 (1385 views)
Re: instances loosing IP address while running, due to No DHCPOFFER

Hey,

I seem to have the same issue with our VMs. I commented (comment #7) on a bug report that seems to correspond to our DHCP issues:
https://bugs.launchpad.net/nova/+bug/887162

If you are still affected by this issue, please report it on the bug page so the developers can look into a fix.

Regards,





