
rick.jones2 at hp
Jul 13, 2012, 9:58 AM
Re: [Openstack] Bad performance on physical hosts kvm + bonding + bridging
On 07/13/2012 06:55 AM, Leandro Reox wrote:
> Ok, here is the story, we deployed some inhouse APIs in our Openstack
> private cloud, and we were stressing them up, we realize that some
> packages were taking so long, to discard the behavior of the api, we
> installed apache, lighttpd and even tried with netcat, of course on
> the guest systems running ubuntu 10.10 w/virtio, after getting nuts
> modifying sysctl parameters to change the guest behavior, we realized
> that if we installed apache, or lighttpd on the PHYSICAL host the
> behavior was the same ...., that surprised us, when we try the same
> benchmark on a node without bonding, bridging and without any KVM
> packages or nova installed, with the same HW specs, the benchmark
> passes OK, but if we run the same tests on a spare nova node with
> everything installed + bonding + bridging that never run a virtual
> guest machine, the test fails too, so, so far:
>
> Tested on hosts with Ubuntu 10.10, 11.10 and 12.04
>
> - Clean node without bonding + bridging or KVM - just the eth0
>   configured - PASS
> - Spare node with bridging - PASS
> - Spare node with just bonding (dynamic link aggr mode4) - PASS
> - Spare node with nova + kvm + bonding + bridging - FAILS
> - Spare node with nova + kvm - PASS
>
> Is there a chance that working with bridging + bonding + nova some
> module get screwed, I'll attach the tests, you can see that a small
> amount of packages takes TOO LONG, like 3secs, and the overhead time
> is on the "CONNECT" phase

If I recall correctly, 3 seconds is the default initial TCP
retransmission timeout (at least in older kernels - what is your load
generator running?). Between that and your mentioning the connect
phase, my first guess (it is only a guess) would be that something is
causing TCP SYNchronize segments to be dropped. If that is the case, it
should show up in netstat -s statistics.
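To make the arithmetic behind that guess concrete, here is a rough sketch (not anything taken from the kernel source) of how much connect latency lost SYNs would add, assuming a 3 second initial retransmission timeout that doubles after every loss. The function name and the doubling model are illustrative assumptions; kernels with a 1 second initial timeout would give different numbers.

```python
def connect_delay_after_syn_drops(drops, initial_rto=3.0):
    """Extra seconds spent in connect() if the first `drops` SYNs are lost.

    Assumption: the retransmission timeout starts at `initial_rto` and
    doubles after each lost SYN (3 s, 6 s, 12 s, ...), so the total wait
    is a geometric sum.
    """
    return initial_rto * (2 ** drops - 1)


for drops in range(4):
    delay = connect_delay_after_syn_drops(drops)
    print(f"{drops} lost SYN(s): about {delay:.0f} s added to connect")
```

Under those assumptions one lost SYN costs 3 seconds and two cost 9, so a worst-case Connect time of 3009 ms in the quoted ApacheBench output is what a single dropped SYN followed by a successful retransmission would look like.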
Snap the netstat -s statistics on both client and server before the
test is started and after the test is completed, and then run them
through something like beforeafter
( ftp://ftp.cup.hp.com/dist/networking/tools ):

    netstat -s > before.server
    # run benchmark
    netstat -s > after.server
    beforeafter before.server after.server > delta.server
    less delta.server

(As a sanity check, make certain that before.server and after.server
have the same number of lines. Linux's netstat habitually skips
printing a statistic whose value is zero, which can sometimes confuse
beforeafter if a stat appears in after.server that was not present in
before.server.)

It might not be a bad idea to include ethtool -S statistics from each
of the interfaces in that procedure as well.

rick jones
probably a good idea to mention the bonding mode you are using

> This is ApacheBench, Version 2.3 <$Revision: 655654 $>
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Licensed to The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 172.16.161.25 (be patient)
> Completed 2500 requests
> Completed 5000 requests
> Completed 7500 requests
> Completed 10000 requests
> Completed 12500 requests
> Completed 15000 requests
> Completed 17500 requests
> Completed 20000 requests
> Completed 22500 requests
> Completed 25000 requests
> Finished 25000 requests
>
>
> Server Software:        Apache/2.2.16
> Server Hostname:        172.16.161.25
> Server Port:            80
>
> Document Path:          /
> Document Length:        177 bytes
>
> Concurrency Level:      5
> Time taken for tests:   7.493 seconds
> Complete requests:      25000
> Failed requests:        0
> Write errors:           0
> Total transferred:      11350000 bytes
> HTML transferred:       4425000 bytes
> Requests per second:    3336.53 [#/sec] (mean)
> Time per request:       1.499 [ms] (mean)
> Time per request:       0.300 [ms] (mean, across all concurrent requests)
> Transfer rate:          1479.28 [Kbytes/sec] received
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        0    1  46.6      0    3009
> Processing:     0    1   5.7      0     277
> Waiting:        0    0   4.6      0     277
> Total:          0    1  46.9      1    3010
>
> Percentage of the requests served within a certain time (ms)
>   50%      1
>   66%      1
>   75%      1
>   80%      1
>   90%      1
>   95%      1
>   98%      1
>   99%      1
>  100%   3010 (longest request)
>
> Regards!
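If beforeafter is not to hand, the before/after comparison of netstat -s snapshots described above can be approximated with a few lines of Python. This is only a sketch and not a reimplementation of beforeafter: the function names are made up, the sample snapshot text is invented for illustration, and it assumes each indented line carries one counter as its first standalone number. Counters are matched by label rather than by line position, so a statistic that netstat omitted from the "before" snapshot because it was zero is simply treated as zero.

```python
import re

# First standalone integer on a line; digits glued to a name are skipped.
NUMBER = re.compile(r"\b\d+\b")


def parse_netstat(text):
    """Map (section, label) -> counter value for `netstat -s` style text."""
    stats = {}
    section = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines such as "Tcp:" start a new section.
            section = line.strip().rstrip(":")
            continue
        match = NUMBER.search(line)
        if match is None:
            continue
        # Swap the counter for a placeholder so the label stays stable.
        label = (line[:match.start()] + "N" + line[match.end():]).strip()
        stats[(section, label)] = int(match.group())
    return stats


def delta(before_text, after_text):
    """Return the counters that changed, as after minus before."""
    before = parse_netstat(before_text)
    after = parse_netstat(after_text)
    # A counter missing from the "before" snapshot is treated as zero.
    return {key: value - before.get(key, 0)
            for key, value in after.items()
            if value != before.get(key, 0)}


# Invented sample snapshots, for illustration only.
SAMPLE_BEFORE = "Tcp:\n    10 active connections openings\n"
SAMPLE_AFTER = ("Tcp:\n    35 active connections openings\n"
                "    4 SYNs to LISTEN sockets dropped\n")

for (section, label), change in sorted(delta(SAMPLE_BEFORE, SAMPLE_AFTER).items()):
    print(f"{section}: {label} -> +{change}")
```

To use it on real data, read before.server and after.server into strings and pass them to delta(). Lines that carry more than one counter are only partly handled, since just the first number is compared.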