Six back-to-back power outages hit the SoMa neighborhood of San Francisco on Tuesday afternoon, causing major havoc with popular web services. 365 Main is down, along with craigslist, Technorati, Yelp, AdBrite and SixApart (including TypePad, LiveJournal and Vox). The outage also caused problems with servers used by Current TV, RedEnvelope and Second Life.
Pacific Gas and Electric (PG&E) is currently working on the issue. It is now estimated that over 30,000 PG&E customers are without power.
So the big question is: where is the backup power at the data centers used by these services? UPS systems and diesel generators should normally help in a situation like this.
More coverage:
- OpenDNS
- GigaOm
UPDATE 1: Digg is still up
UPDATE 2: Some services, like Technorati, are starting to come back online.
UPDATE 3: SixApart has been providing updates via Twitter. They are slowly bringing their services back up now. See the SixApart status page for more info.
UPDATE 4: We’re seeing several reports of a chaotic scene at 365 Main, with a line of sys admins forming outside the door waiting to get in to work on their servers.
UPDATE 5: According to PG&E, the outage was caused by an underground explosion in a transformer vault under a manhole at 560 Mission Street.
UPDATE 6: The funny thing is that The Onion predicted all of this a couple of weeks ago.
UPDATE 7: How ironic: 1UP is also down.
UPDATE 8: The Associated Press is reporting that the Netflix downtime is not related to the power outage in San Francisco. I’ve updated the post accordingly.
UPDATE 9: USA Today runs their blogs on TypePad, so they were down as well because of the outage.
UPDATE 10: Automattic sys admin (and former Laughing Squid sys admin) Barry Abrahamson has a great write-up on why data centers should have better power redundancy and what they have done with WordPress.com to help it survive possible outages like this.
UPDATE 11: What’s amazing about this is that there has still been no public response from 365 Main on their website. It seems like they should at least acknowledge the issue.
UPDATE 12: Lori just informed us that 365 Main has provided a summary of what happened, including a FAQ, where they address questions about the backup generators and how many customers were affected.
* At 1:49 p.m. on Tuesday, July 24, 365 Main’s San Francisco data center was affected by a power surge caused when a PG&E transformer failed in a manhole under 560 Mission St. While back-up electrical infrastructure is installed in the facility to defend against power surges, an initial investigation has revealed that certain 365 Main back-up generators did not start when the initial power surge hit the building. On-site facility engineers responded and manually started affected generators allowing stable power to be restored at approximately 2:34 pm across the entire facility.
* As a result of the incident, continuous power was interrupted for up to 45 minutes for certain customers. We’re certain 3 of the 8 colocation rooms were directly affected, and impact on other colocation rooms is still being investigated. Due to the complexity and specialization of data center electrical systems, we are currently working with Hitec, Valley Power Systems, Cupertino Electric and PG&E to further investigate the incident and determine the root cause of why certain generators did not start. All generators will continue to operate on diesel fuel until the root cause of the event has been identified and corrected. Generators are currently filled with over 4 days of fuel and additional fuel has already been ordered.
* We will apply knowledge gained in this investigation to all 365 Main facilities to help prevent this type of incident from happening again.
UPDATE 13: Rusty Hodge of SomaFM has a write-up on what he thinks happened at 365 Main. Oh, and contrary to what you may have heard, it wasn't some drunk guy going crazy in the data center.
{ 42 trackbacks }
{ 47 comments }
365 Main is refusing entry; however, many cages (it's unknown whether all of them) have not experienced a power event and are still up and running.
Oh.. the irony..
Livejournal too. There has been a serious disturbance in the force. It’s as if thousands of emo souls cried out in pain simultaneously.
Wow! That's huge. Netflix doesn't need anything else dragging down its stock. This is exactly what they don't need.
And craigslist. wow!
You’d think in an earthquake zone, these companies and their webhosts would have generators, backup centers in other places, etc.
Not to mention PG&E has had lots of major outages over the past few years.
LiveJournal seems to be down too. I wonder if that’s connected.
Even craigslist went down!
This is crazy. Why is there no generator backup for all of these huge companies? Here in Florida we prepare for disasters, and every tech company has a backup for their backup when a hurricane strikes, but you're telling me these tech-central, disaster-prone California businesses don't have a power backup? Get real.
Looks like the Sun Microsystems site (sun.com) is down as well. I was just on their site an hour ago, and now it's completely down. Just curious if this is related. Thanks!
Backup power only lasts so long: maybe 1 to 5 hours for battery backups, and for gas/diesel generators it depends on how much fuel you want to pay for.
Secondlife.com is down as well.
Joshua, LiveJournal is owned by SixApart.
lol, fire sale
@Steve, Bridget:
None of the sites in question run without provisions for emergency power. Take a look at the specs on 365 Main (http://365main.com), one of the colos in question. They've spent a fortune on power systems (generators being the least of what they provide) to keep the facility running. But even redundant power systems fail in extraordinary circumstances; I'm sure we'll get a detailed explanation of why in the next few hours.
SecondLife has also pulled the grid offline.
http://www.geeked.info/power-outage-takes-out-major-sites/
This is where the terms "IT Professional" and "IT Manager" come into play... and if the terms sound redundant, it's because they're supposed to be.
I wonder how all the various "professionals" are going to justify what's happened? I suspect there will be some intense "reviewing of contracts" in the next few days...
P.
SF's favorite Armory-owning company is back online, as well as craigslist.
A press release from March describes the backup power at their 5 data centers, so I’m not sure what went wrong.
http://365main.com/press_releases/pr_3_14_07_pge.html
It is ironically headlined:
Of note: my bandwidth cooperative’s colo facility, Coloserve, is on the other side of the same building as 365 Main and it stayed up just fine. It may not be a question of how much they claim to have invested in infrastructure, it’s how it’s installed. I’m glad we moved from 365 Main to Coloserve a couple of months ago (though the move was not painless, so I can’t unequivocally recommend Coloserve).
WOW. CREEPY.
I was at the Marriott on 2nd Street from Thursday until last night for the WordCamp conference.
My girlfriend and I were walking around 2nd and Mission most of the day Monday taking pictures. We were one block away from a shooting over a chess game on Market, right in front of that giant Nikon store.
The next day the place is exploding? Wow! It's creepy.
We were also given a tour of the CNET building, which is on that same block, by a guy we met at WordCamp.
For cryin' out freggin loud. UPSes are NOT BACKUP POWER SOURCES! They are, just as the name implies, uninterruptible power supplies. Their primary purpose is to supply an uninterrupted source of power when the primary power goes out, giving enough time for systems to be shut down gracefully or for the circuit to be switched over to generator power.
So for those who are crowing about UPSes: you apparently have never supported a data center before. So just keep your mouths shut about it if you don't know what the hell you are talking about.
Screenshot: Technorati error page.
USA Today blog site says “system maintenance” but I doubt it:
http://www.usatoday.com/blogs/serviceOutage.htm
Pretty lame. I used to live in S.F. I am now located in North Carolina. Our data center specifically has backup generators that can power the entire site for several days in the event of a power outage.
All these companies that went down today because their data center was poorly designed and managed need to get out of there.
Valleywag posits that a drunken, disgruntled employee at 365 Main is to blame:
bak UP!!!
1up.com seems to be down as well.
Power was dead here off and on at 2nd and Brannan, where your beloved Mule Design resides.
AP is reporting that Netflix and the power outage do not appear to be related.
http://www.msnbc.msn.com/id/19932882/
"The Wayback Machine" (http://www.webarchives.org) also appears to be affected:
The Wayback Machine service is experiencing technical difficulties.
Thank you for your patience while we restore this service.
Seems like the Craigslist front page is up and allows you to choose a city/state, but trying to enter anything in the search field comes up with an error.
365 Main is fully to blame for the outage in their colo. I was in colo 1 at 365 Main when the power went out. It went up and down twice before going off for about 20 minutes until they got the generators going. This is not what we and every other customer in a data center pay for. They are supposed to be able to handle the outage without even a blip on the customer's side. Needless to say, they failed today. What bothers me is that they're pushing the blame to PG&E and not themselves. Granted, PG&E did their share, but it should not have caused an issue for customers in their colo.
365 Main is quickly catching up with HE.net in terms of service quality.
Hosting.com stayed up. Craigslist used to be located in this data center when Verio owned it, but moved to 365 Main. Hosting.com has invested TONS of money in this location and has tested everything during off hours. I'm happy with my colo service...
Why data centers should have redundant power is one subject. The city’s power infrastructure is another.
The 365 Main people bought a really impressive infrastructure when they took over the building, and they got it for pennies on the dollar. The tour is really impressive: rooms of pipes that look like Super Mario levels, and fast-spinning 10-ton cylinders that provide instant interim power until the generators kick in. Seems bulletproof. This isn’t their first outage, though.
Their front-line techs were good guys, but I never had a talk with a higher-up there that didn’t result in at least one quote so ridiculous that it made all my sysadmin pals laugh. Whether it turns out to be garden-variety incompetence or a disgruntled employee, neither one would shock me a lot.
Three Embarcadero flickered in and out with power. Our consulting group runs our own servers, I was running programs up the wazoo during that time, and we were fine. Given that our group is not that technical or IT-oriented, I am really surprised that all these web pages went down. Really a shocker.
This is the first plausible and informed explanation I’ve seen on what happened at 365 Main:
http://scobleizer.com/2007/07/24/big-data-center-down-now-too/#comment-817817
I don't know if this is exactly right or not, but it at least points out the substantial infrastructure they have in place to guard against outages. Too bad; it now looks like the PR crisis will be a lot worse than the fix to their power control systems.
So does anyone know how the affected sites were able to put up the maintenance pages pictured above so fast? I doubt they switched DNS to a temporary server or anything like that. Anyone know how they did it?
The data center my racks are in runs a power-off test at least once a month just to see whether the generators kick in correctly. Maybe 365 Main should consider doing something like that. I don't see any mention on their website that they actually test their backup procedures. (But, to be fair, neither does my data center's.)
I work at a small company, and we had no backups, or disaster recovery plans for that matter. Found this article while surfing. http://www.smartbrief.com/news/aaaa/industryBW-detail.jsp?id=B3A11DDD-AD9B-4399-9682-6E54C82E6757
I had no idea data recovery was even an option. Anyway, we're backing up regularly now. More companies and people should back up because you never know. Hopefully it won't happen again, but if it does I'm taking my server to these guys. http://www.cbltech.com
There are no surprises here. San Francisco's Mission St. Substation feeds half a dozen significant data centers (365main, Level3, Coloserve, 400 Mission, and 650 Townsend) and has suffered 3 serious outages in the past 7 years. California itself had 2 straight summers of rolling blackouts, which only subsided thanks to the dot-com crash. California is running out of duct tape.
365main usually runs a good operation and is one of the best data centers in California. However, it's also the most expensive data center in California, and it should have a better track record than its lower-cost competitors like 200 Paul and Coloserve.
In May 2007 we moved our infrastructure out of 365, off of California's cancerous power grid, and onto a more reliable, greener, and cheaper grid. Yeah, we moved to Seattle. This was the best decision we ever made.
Most of our experience with 365 was extremely positive, however pricing, and power density problems forced us to move. I can’t list all of the good things 365main did, but here’s a list of 365’s power problems as we experienced them:
In April 2005, 365main had an outage that affected all customers for 50 minutes due to a failed EPO valve. 365 handled that outage spectacularly, calling all of their customers within 15 minutes of the outage.
In February 2006, 365main experienced a partial outage for 3 seconds that only affected some customers, but it caused problems in their telco spine, affecting connectivity.
In October 2006, 365main had a backup generator fail; supposedly no customers were directly affected, but customers were not allowed to enter the building between 3:29 PM and 4:40 PM.
According to their press release, they never initialized the software, nor did they test the hardware, that would have activated in the event of a power outage.
http://www.365main.com/press_releases/pr_8_1_07_365_main_report.html
Speaking from an IT officer's perspective, this is why we put monitoring checks in place and have failover built in. You never know when things like this will occur, but an automatic failover and monitoring solution pays off. Glad some of us invested.
Kind of scary to think that we’re all so vulnerable.
I really think something needs to be done about all these power outages. I mean, you would think the public would have learned by now what havoc a single power station can wreak if it goes out. Don't even get me started on the fact that these things are usually rigged up so that if one goes out, half of the state is out of power. Someone needs to wake up the person in charge of the maintenance for these things. It's one thing for the power to go out in a single area, but quite another when it affects web sites all over the place.
Questions about power stations need to be solved at the state level.
Someone needs to wake up the person in charge of the maintenance for these things