
jervin at email
Aug 15, 2002, 8:53 AM
arriba calculation? (slightly long)
We have an odd problem. We recently migrated our application to 4 new servers, identically configured with identical hardware. mod_backhand 1.2.1 is installed on all servers. Upon startup, the Arriba value calculated for each of these servers varies wildly:

    server 1: 625364
    server 2: 723409
    server 3: 643787
    server 4: 562925

These results persist after a restart of the Apache server and removal of the old Arriba files.

Our backhand configuration is as follows:

    BackhandSelfRedirect On
    Backhand byAge
    Backhand byRandom
    Backhand byLogWindow
    Backhand addSelf
    Backhand byBusyChildren 1

The weight value on the byBusyChildren directive is intended to prevent backhanding unless load goes over 1. We were testing this value to see how it worked. These servers are 2-processor Sun 280Rs and have no problem serving requests at low load, so we wanted to let users who landed on a server stay on that server unless load went above a certain threshold (which would probably indicate a problem on the server, or an especially long-running CGI). We intended to increase the threshold somewhat if initial results were satisfactory.

This could have been more thoroughly tested, except that the application in question is a web-based e-mail client, and it's difficult to emulate real-world conditions for that, for economic and other reasons.

In production, we're observing excessive usage of server 4, to the point where it's doing up to 50% of the work, according to our statistics. I can pinpoint three possible reasons for this:

1) Backhand is working properly (i.e., letting requests stay on the recipient server at low load), but more requests are coming in to server 4. This may be the result of misconfigured DNS servers that aren't caching our round-robin DNS entry correctly. I know that all Windows 2000 and higher boxes now have a built-in caching DNS server, so this is not out of the realm of possibility.

2) The Arriba "miscalculation" is causing more requests to go to server 4. (By implication, wouldn't this mean fewer requests would end up on server 2? However, we don't observe this; load seems to be evenly spread over the three remaining boxes. If anything, server 2 is doing slightly more work than servers 1 and 3.)

3) Server 4 is actually faster, somehow, despite being identical in every way.

I tend to discount option 3, and option 1 isn't really a topic for this list. However, I can't seem to find any solid information on how the Arriba value is calculated. Does anyone know of any? I looked at arriba.c briefly, but I'm not very adept at C. It appears to create 12 threads, measure the time they took, and calculate Arriba based on that; I was hoping to confirm that suspicion.

Also, with this backhand configuration, does the Arriba value matter at all? I thought putting the "byBusyChildren" directive last in the list would cause the redirection to happen solely on the basis of the number of busy Apache servers, which is essentially an estimate of the length of the run queue, and ignore the Arriba or other resource estimates on the server. We've avoided using the byLoad directive because we found that in a cluster of servers with different hardware configurations, byLoad tended to place too much emphasis on servers with extra memory.

Lastly, if option 1 is actually what's happening, shouldn't removing the weight value on byBusyChildren cause server 4 to begin redirecting more requests to the other servers? That would be acceptable.

I apologize for not being able to provide more information about what's actually happening. The application is in production and, for performance reasons, we chose not to enable logging of backhand information. Unfortunately, the application is dynamic and session-based, so now it's politically expedient to wait for an appropriate time of low usage to restart one or more of the servers to put the logging directives in.

Thanks for any suggestions,

James Ervin
UNC-Chapel Hill
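
To put a number on "varies wildly", the four Arriba values reported above can be compared against their mean. This is only a rough arithmetic sketch: the dictionary layout and the `relative_spread` helper are hypothetical names made up for illustration, and nothing here reflects how mod_backhand itself computes or uses Arriba.

```python
# Arriba values exactly as reported in the post (server number -> value).
arriba = {1: 625364, 2: 723409, 3: 643787, 4: 562925}


def relative_spread(values):
    """Hypothetical helper: return the mean, each value's percent
    deviation from that mean, and the ratio of largest to smallest."""
    mean = sum(values.values()) / len(values)
    deviations = {k: 100.0 * (v - mean) / mean for k, v in values.items()}
    ratio = max(values.values()) / min(values.values())
    return mean, deviations, ratio


mean, deviations, ratio = relative_spread(arriba)

print(f"mean Arriba: {mean:.0f}")
for server, dev in sorted(deviations.items()):
    print(f"server {server}: {dev:+.1f}% from mean")
print(f"max/min ratio: {ratio:.3f}")
```

On these figures, server 2 reports the highest value and server 4 the lowest, with the highest roughly 28% above the lowest. Whether a gap of that size matters depends on whether any candidacy function in the configuration actually consults Arriba, which is the open question in the post.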