
Mailing List Archive: OpenStack: Operators

Problem with Heavy Network IO and Dnsmasq


vachon at sessionm

Aug 15, 2012, 4:19 AM

Post #1 of 24 (3043 views)
Permalink
Problem with Heavy Network IO and Dnsmasq

I reported this as a bug here: https://bugs.launchpad.net/nova/+bug/1037065

However, I was looking to see if anyone else has seen this.

<snip>

I was running a load test against a 4-node Cassandra cluster in
OpenStack. I have a separate tenancy for each node to ensure there is
no funny contention. Running the test 3 times produced the same
results each time.

About 1/3 of the way through the test, the dnsmasq process crashes
(with no warning or error in any log). The instance continues
"working", but only inside the VNC console, as all outside
connectivity is now unroutable.

Here is a log from the dnsmasq process. The first two rows show that
dnsmasq was working, then it just fails to route correctly back to the
instance.

</snip>


--
Thomas Vachon
Principal Operations Architect
session M

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators [at] lists
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


narayan.desai at gmail

Aug 15, 2012, 5:00 AM

Post #2 of 24 (2957 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 6:19 AM, Thomas Vachon <vachon [at] sessionm> wrote:
> I reported this as a bug here: https://bugs.launchpad.net/nova/+bug/1037065
>
> However, I was looking to see if anyone else has seen this.
>
> <snip>
>
> I was running a load test against a 4 node Cassandra cluster in
> Openstack. I have separate tenancy for each node to ensure there was
> no funny contention. Running the test 3 times produced the same
> results each time.
>
> About 1/3 of the way through the test, the dnsmasq process crashes
> (with no warning or error in any log). The instance will continue
> "working" but only inside of the VNC console as all outside
> connectivity is now unroutable.
>
> Here is a log from the dnsmasq process. The first two rows show that
> dnsmasq was working, then it just fails to route correctly back to the
> instance.

I recently ran into some sort of virtio-net bug that manifested itself
in a similar fashion (Ubuntu Precise, fwiw). Basically, it happened when
moving large quantities of network traffic into VMs on some node types
(but not others, oddly enough). In my case, it looked like dnsmasq was
failing, but the process was still running; it had just stopped
getting requests from the clients. Rebooting instances would bring
them back into service.

Can you bring the network back up via VNC? If so, this isn't the same
as the issue I saw. If you can't, then something is stuck in virtio. I
was able to work around the problem by enabling the vhost_net module
in the hypervisor.
-nld



vachon at sessionm

Aug 15, 2012, 5:24 AM

Post #3 of 24 (2962 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 8:00 AM, Narayan Desai <narayan.desai [at] gmail> wrote:
> I ran into some sort of virtio-net bug that manifested itself in a
> similar fashion recently. (Ubuntu Precise, fwiw). Basically, when
> moving large quantities of network traffic into VMs on some node types
> (but not others, oddly enough). In my case, it looked like dnsmasq was
> failing, but the process was still running; it had just stopped
> getting requests from the clients. Rebooting instances would bring
> them back into service.
>
> Can you bring the network back up via VNC? If so, this isn't the same
> as the issue I saw. If you can't then something is stuck in virtio. I
> was able to work around the problem by enabling the vhost_net module
> in the hypervisor.
> -nld

I am also on Precise. Bringing the interface down and back up via VNC
did work. How exactly did you set up vhost_net? Can you provide your
libvirt config?
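
For readers who want to repeat the down/up test from the guest console, here is a minimal sketch. The thread does not say which commands were used, so the `ip link` calls, the `eth0` interface name, and the `RUN` variable are all assumptions for illustration. By default the helper only prints what it would run.

```shell
#!/bin/sh
# Bounce a guest interface. By default this only prints the commands;
# set RUN=sudo (an assumption about how you gain root) to execute them.
bounce_iface() {
    iface=${1:-eth0}      # eth0 is a placeholder interface name
    run=${RUN:-echo}      # "echo" makes this a dry run
    $run ip link set "$iface" down
    $run ip link set "$iface" up
}

bounce_iface eth0
```

On an Ubuntu guest managed by ifupdown, `ifdown eth0` followed by `ifup eth0` is the other common route, and it also re-requests the DHCP lease.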



narayan.desai at gmail

Aug 15, 2012, 5:53 AM

Post #4 of 24 (2953 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 7:24 AM, Thomas Vachon <vachon [at] sessionm> wrote:

> I am also on Precise. Downing/Up'ing the interface via VNC did work.
> How exactly did you setup vhost_net, can you provide your libvirt?

Hm, that might mean that isn't the problem, but this is an easy enough
thing to check.

Setting up vhost_net is pretty easy; it is auto-detected by kvm, so
all you need to do is modprobe it on the hypervisor.
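
As a rough sketch of that step: the module is loaded on the hypervisor with `sudo modprobe vhost_net`, and the snippet below checks whether it is actually present. The helper name `module_loaded` and its optional file argument are made up for this example; it reads `/proc/modules`, which is the same data `lsmod` prints.

```shell
#!/bin/sh
# Report whether a kernel module appears in /proc/modules (what lsmod reads).
module_loaded() {
    # $1: module name, e.g. vhost_net
    # $2: optional file to read instead of /proc/modules (handy for testing)
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded vhost_net; then
    echo "vhost_net is loaded"
else
    echo "vhost_net is not loaded; load it with: sudo modprobe vhost_net"
fi
```

On Ubuntu, adding a `vhost_net` line to `/etc/modules` is the usual way to have it loaded at boot.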
-nld



vachon at sessionm

Aug 15, 2012, 5:54 AM

Post #5 of 24 (2953 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 8:53 AM, Narayan Desai <narayan.desai [at] gmail> wrote:
> On Wed, Aug 15, 2012 at 7:24 AM, Thomas Vachon <vachon [at] sessionm> wrote:
>
>> I am also on Precise. Downing/Up'ing the interface via VNC did work.
>> How exactly did you setup vhost_net, can you provide your libvirt?
>
> Hm, that might mean that isn't the problem, but this is an easy enough
> thing to check.
>
> Setting up vhost_net is pretty easy; it is auto-detected by kvm, so
> all you need to do is modprobe it on the hypervisor.
> -nld

Found the docs saying that. It might help; I will report back on this
thread when I run the tests in a bit.



vachon at sessionm

Aug 15, 2012, 6:58 AM

Post #6 of 24 (2955 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 8:54 AM, Thomas Vachon <vachon [at] sessionm> wrote:
> Found the docs saying that. It might help, I will report back on this
> thread when I run the tests in a bit

OK, I lost all connectivity between nova-network and my VMs. dnsmasq
is running and I added the vhost_net module (and confirmed libvirt
sees it). Do I need to do something to the guests now too?
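
One way to check whether a running guest actually picked up vhost_net is to look at its qemu/kvm command line on the hypervisor. This is only a sketch, and it assumes libvirt passes `vhost=on` in the guest's `-netdev tap,...` argument when the backend is in use. The helper name `cmdline_uses_vhost` is invented for this example.

```shell
#!/bin/sh
# Succeed if the NUL-separated command line stored in file $1 contains vhost=on.
cmdline_uses_vhost() {
    tr '\0' ' ' 2>/dev/null < "$1" | grep -q 'vhost=on'
}

# Walk every process and report on the ones that look like qemu/kvm guests.
for f in /proc/[0-9]*/cmdline; do
    [ -r "$f" ] || continue
    tr '\0' ' ' 2>/dev/null < "$f" | grep -Eq 'qemu|kvm' || continue
    pid=${f#/proc/}
    pid=${pid%/cmdline}
    if cmdline_uses_vhost "$f"; then
        echo "pid $pid: vhost=on"
    else
        echo "pid $pid: no vhost backend"
    fi
done
```

The backend is chosen when the qemu process starts, so a guest that was already running before the module was loaded would not be expected to show `vhost=on` until it is restarted.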



vachon at sessionm

Aug 15, 2012, 7:16 AM

Post #7 of 24 (2952 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 9:58 AM, Thomas Vachon <vachon [at] sessionm> wrote:
> OK, I lost all connectivity between nova-network and my vm's. DNSMasq
> is running and I added the vhost_net module (and confirmed libvirt
> sees it). Do I need to do something to the guests now too?

I think I found the root bug:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/997978



narayan.desai at gmail

Aug 15, 2012, 7:28 AM

Post #8 of 24 (2949 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 9:16 AM, Thomas Vachon <vachon [at] sessionm> wrote:
> I think I found the root bug:
> https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/997978

Yeah, this is where I found the vhost_net workaround. (It seems to
work for some people and not for others, so I suspect the problem is
more complicated somehow.)
-nld



vachon at sessionm

Aug 15, 2012, 7:31 AM

Post #9 of 24 (2945 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 10:28 AM, Narayan Desai <narayan.desai [at] gmail> wrote:
> Yeah, this is where i found the vhost_net workaround. (it seems to
> work for some people and not for others, so I suspect the problem is
> more complicated somehow)
> -nld

The workaround of adding it? When I added it, I lost all connectivity
to the instances.



narayan.desai at gmail

Aug 15, 2012, 7:32 AM

Post #10 of 24 (2959 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 9:31 AM, Thomas Vachon <vachon [at] sessionm> wrote:
> The workaround of adding it? When I added it, I lose all connectivity
> to the instances.

Yeah, I think that all instances will need to be restarted. Or freshly
spawned even.
-nld



vachon at sessionm

Aug 15, 2012, 7:35 AM

Post #11 of 24 (2955 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 10:32 AM, Narayan Desai <narayan.desai [at] gmail> wrote:
> Yeah, I think that all instances will need to be restarted. Or freshly
> spawned even.
> -nld

I rebooted the whole thing (physical and all). I also launched a
CirrOS image and it had no connectivity. I removed it, rebooted the
physical host, and I can connect now at least.

It seems to be highly related to bonding multiple NICs together on
the physical host for FT and HA.
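
When comparing hosts for the bonding theory, it can help to list which NICs are actually enslaved to the bond. A small sketch follows; the bond name `bond0` is an assumption (the thread does not name the bond device), and `bond_slaves` is a helper invented here. It reads the status file the Linux bonding driver exposes under `/proc/net/bonding/`.

```shell
#!/bin/sh
# Print the slave interfaces listed in a bonding status file.
bond_slaves() {
    # $1: status file; defaults to /proc/net/bonding/bond0 (bond0 is a placeholder)
    sed -n 's/^Slave Interface: //p' "${1:-/proc/net/bonding/bond0}" 2>/dev/null
}

if [ -r /proc/net/bonding/bond0 ]; then
    bond_slaves
else
    echo "no bond0 status file found on this host"
fi
```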



narayan.desai at gmail

Aug 15, 2012, 7:40 AM

Post #12 of 24 (2950 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 9:35 AM, Thomas Vachon <vachon [at] sessionm> wrote:
> I rebooted the whole thing (physical and all). I also launched a
> Cirros image and it had no connectivity. I removed it, rebooted the
> physical and I can connect now at least.
>
> It seems to be highly related to bonding multiple nic's together on
> the physical host for FT and HA.

Hm. That could be. I wasn't doing that. Oddly enough, there was only
one type of system (HP DL580 G7) that had this issue; none of my other
nodes suffered from it.
-nld



vachon at sessionm

Aug 15, 2012, 7:44 AM

Post #13 of 24 (2954 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 10:40 AM, Narayan Desai <narayan.desai [at] gmail> wrote:
> Hm. That could be. I wasn't doing that. Oddly enough, there was only
> one type of system (HP DL580G7) that had this issue; all of my other
> nodes didn't suffer from the same issue.
> -nld

I'm using Dell R620s and seeing this. They use Broadcom NetXtreme IIs.



narayan.desai at gmail

Aug 15, 2012, 7:53 AM

Post #14 of 24 (2960 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 9:44 AM, Thomas Vachon <vachon [at] sessionm> wrote:

> I'm using Dell R620's and seeing this. They use Broadcom NetExtreme II's

That is the same chipset that we have on our HPs. Maybe this is a bug
in the driver? Are you using the bnx2 firmware, or running without?
-nld



enelen at helioscloud

Aug 15, 2012, 7:56 AM

Post #15 of 24 (2955 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 5:31 PM, Thomas Vachon <vachon [at] sessionm> wrote:

> The workaround of adding it? When I added it, I lose all connectivity
> to the instances.
>

I can confirm it.
Instances cannot get an IP address from dnsmasq when the vhost_net
module is loaded.
See the Caveats link at http://www.linux-kvm.org/page/VhostNet

When I use virtio for instances and the vhost_net module is not loaded,
my instances lose network connectivity.




vachon at sessionm

Aug 15, 2012, 8:01 AM

Post #16 of 24 (2977 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 10:56 AM, Eugene Nelen <enelen [at] helioscloud> wrote:
>
>
> On Wed, Aug 15, 2012 at 5:31 PM, Thomas Vachon <vachon [at] sessionm> wrote:
>> The workaround of adding it? When I added it, I lose all connectivity
>> to the instances.
>
>
> I can confirm it.
> Instance can not get ip address from dnsmasq when vhost_net module is
> loaded.
> http://www.linux-kvm.org/page/VhostNet see Caveats link.
>
> When I use virtio for instances and vhost_net module is not loaded
> my instances lose network connectivity.
>
>>

I actually have tg3 loaded, not bnx2. I just realized they are
NetXtreme, not NetXtreme II:

01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
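
For anyone comparing hardware in this thread, the driver bound to each NIC can be read from sysfs rather than inferred from the model name. This is only a sketch; `nic_driver` and its optional sysfs-root argument are names made up for this example.

```shell
#!/bin/sh
# Print the kernel driver bound to a network interface by following the
# sysfs "driver" symlink.
nic_driver() {
    # $1: interface name; $2: sysfs root (defaults to /sys)
    link="${2:-/sys}/class/net/$1/device/driver"
    [ -L "$link" ] && basename "$(readlink "$link")"
}

for dev in /sys/class/net/*; do
    name=${dev##*/}
    # Virtual interfaces (lo, bridges, bonds) have no device/driver link.
    drv=$(nic_driver "$name") || continue
    echo "$name: $drv"
done
```

`ethtool -i eth0` reports the same driver name along with the driver and firmware versions.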


Eugene, looking at that caveat, I couldn't find a "solution". Did you see one?



enelen at helioscloud

Aug 15, 2012, 8:19 AM

Post #17 of 24 (2977 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 6:01 PM, Thomas Vachon <vachon [at] sessionm> wrote:

> I actually have tg3 loaded, not bnx2. I just realized they are
> NetXtreme not II's
>
> 01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5720
> Gigabit Ethernet PCIe
> 01:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5720
> Gigabit Ethernet PCIe
> 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5720
> Gigabit Ethernet PCIe
> 02:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5720
> Gigabit Ethernet PCIe
>
>
> Eugene, looking at that caveat, I couldn't find a "solution", did you see
> one?
>
No. For now I am not using the vhost_net module or virtio for production
VMs; I am using the Realtek driver instead, but performance is really bad.
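
For anyone comparing setups, here is a minimal sketch for checking whether
vhost_net is currently loaded on a hypervisor. It assumes a Linux host that
exposes /proc/modules; the helper name module_loaded is hypothetical and
only for illustration.

```python
def module_loaded(name, modules_text):
    """Return True if `name` is listed in /proc/modules-style text.

    Each line of /proc/modules starts with the module name, followed by
    its size, reference count, dependent modules, state and address.
    """
    for line in modules_text.splitlines():
        fields = line.split()
        # Compare only the first field so that a module which merely
        # depends on `name` does not count as a match.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        # Not a Linux host, or /proc is unavailable.
        text = ""
    print("vhost_net loaded:", module_loaded("vhost_net", text))
```

Run it on the compute node itself, not inside a guest, since the module
lives on the hypervisor.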


vachon at sessionm

Aug 20, 2012, 6:44 AM

Post #18 of 24 (2934 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Wed, Aug 15, 2012 at 11:19 AM, Eugene Nelen <enelen [at] helioscloud>
wrote:
> No. For now I am not using vhost_net module/virtio for production VMs. I
> am using realtek driver.
> But performance is really bad.
>
>

OK, I have a feeling I know at least how to "fix" the issue. After
reading the entire thread, it seems to be 100% related to the
checksumming of DHCP in QEMU (which is broken). Would a simple way to
fix this be to have Puppet take the IP that DHCP hands out and then
write it into the interfaces file as a static IP? From my
understanding, this is safe, as the host server owns that IP until the
instance dies. As long as it is not baked into the base AMI/QCOW2
image, this seems like the only way to continue using KVM and not
switch to Xen or such.
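
A rough sketch of the idea above, assuming a Debian/Ubuntu guest that uses
/etc/network/interfaces. The function name static_stanza and the example
addresses are hypothetical; in practice Puppet (or any other tool) would
supply the values the instance actually received from DHCP.

```python
import ipaddress


def static_stanza(iface, address, netmask, gateway):
    """Build an /etc/network/interfaces stanza for a static IPv4 address.

    The values are expected to be the ones the instance was handed via
    DHCP; this only validates them and renders the text.
    """
    # strict=False lets us pass a host address together with its netmask.
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    if ipaddress.ip_address(gateway) not in network:
        raise ValueError(f"gateway {gateway} is not inside {network}")
    lines = [
        f"auto {iface}",
        f"iface {iface} inet static",
        f"    address {address}",
        f"    netmask {netmask}",
        f"    gateway {gateway}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; substitute the lease details of the instance.
    print(static_stanza("eth0", "10.0.0.5", "255.255.255.0", "10.0.0.1"))
```

The stanza would have to be written at first boot rather than stored in the
image, for the reason given above: the address belongs to one instance only.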



lorin at nimbisservices

Aug 20, 2012, 6:57 AM

Post #19 of 24 (2925 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Aug 20, 2012, at 9:44 AM, Thomas Vachon <vachon [at] sessionm> wrote:

> OK, I have a feeling I know at least how to "fix" the issue. After
> reading the entire thread, it seems to be 100% related to the
> checksumming of dhcp in QEMU (which is broken).

Thomas:

Do you have a link to the upstream bug that describes the DHCP checksum problem in QEMU? I'd like to add it to the docs.


Take care,

Lorin
--
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com


vachon at sessionm

Aug 20, 2012, 6:59 AM

Post #20 of 24 (2919 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Mon, Aug 20, 2012 at 9:57 AM, Lorin Hochstein
<lorin [at] nimbisservices> wrote:
> Do you have a link to the upstream bug that describes the DHCP checksum
> problem in QEMU? I'd like to add it to the docs.

This is the best one I can find:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/997978



vachon at sessionm

Aug 20, 2012, 7:47 AM

Post #21 of 24 (2922 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Mon, Aug 20, 2012 at 9:59 AM, Thomas Vachon <vachon [at] sessionm> wrote:
> This is the best one I can find:
> https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/997978

I can confirm this is still happening in the latest QEMU code released
a few days ago (1.0+noroms-0ubuntu14.1).

According to the thread I referenced above, dropping to 10.04 as your
guest might fix it. I am going to try that now.



lorin at nimbisservices

Aug 20, 2012, 9:45 AM

Post #22 of 24 (2921 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Aug 20, 2012, at 9:59 AM, Thomas Vachon <vachon [at] sessionm> wrote:

> This is the best one I can find:
> https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/997978

Ah, that one. I tried to document that issue in the Compute Admin guide under "KVM: network connectivity works initially, then fails": <http://docs.openstack.org/essex/openstack-compute/admin/content/network-troubleshooting.html#d6e6602>.


Take care,

Lorin
--
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com


vachon at sessionm

Aug 20, 2012, 9:54 AM

Post #23 of 24 (2932 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Mon, Aug 20, 2012 at 12:45 PM, Lorin Hochstein
<lorin [at] nimbisservices> wrote:
> Ah, that one. I tried to document that issue in the Compute Admin guide
> under "KVM: network connectivity works initially, then fails":
> <http://docs.openstack.org/essex/openstack-compute/admin/content/network-troubleshooting.html#d6e6602>.

I have just completed a test with Cassandra on 10.04 guests where I
inserted 4,000,000 records, each on average about 512 Mb. I never saw
it dump the network. On average, when running a 12.04 guest, I would
see it dump out after about 4 GB of writes across the cluster. This
problem seems to be isolated to the 12.04 guest QEMU code. I can still
run 12.04 as the host, just not as the guest.



lorin at nimbisservices

Aug 20, 2012, 10:12 AM

Post #24 of 24 (2919 views)
Permalink
Re: Problem with Heavy Network IO and Dnsmasq [In reply to]

On Aug 20, 2012, at 12:54 PM, Thomas Vachon <vachon [at] sessionm> wrote:

> I have just completed a test with Cassandra on 10.04 guests where I
> inserted 4,000,000 each on average about 512 Mb. I never saw it dump
> the network. On average when running a 12.04 guest, I would see it
> dump out after about 4GB of writes across the cluster. This problem
> seems to be isolated to 12.04 guest QEMU code. I still can run 12.04
> as the host, just not the guest.

I'll add to the docs that this bug appears to be specific to Precise guests.


Take care,

Lorin
--
Lorin Hochstein
Lead Architect - Cloud Services
Nimbis Services, Inc.
www.nimbisservices.com
