
Mailing List Archive: Wikipedia: Wikitech

Downtime this morning

 

 



agarrett at wikimedia

Nov 16, 2009, 6:54 AM


Hi all,

There has been some downtime this morning (about 15 minutes) due to a
software update.

I pushed a software update, and servers immediately started crashing,
according to nagios. Judging by ganglia, this was the familiar problem
where scap pushes a few 4-CPU apaches into swap; they then crash and
come back a few minutes later. This time, however, a key memcached node
apparently also fell over, causing a database overload and leaving the
site mostly inaccessible for about ten minutes.

I prepared to revert the software update, but determined that the
update itself was not the problem and that another scap would
exacerbate the issue. The problem resolved itself spontaneously.

We need to fix things up so the scap script is less liable to push
machines into swap :)

--
Andrew Garrett
agarrett [at] wikimedia
http://werdn.us/


_______________________________________________
Wikitech-l mailing list
Wikitech-l [at] lists
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
