How do you handle Amazon EC2's failures?

By Clinton Gormley on February 8, 2010 7:13 PM

We've just moved our website to Amazon EC2, and within about 2 hours of going live, our proxy server went down. Just disappeared. We couldn't even terminate the instance.

OK, temporary glitch. It happens.

2 weeks go by, then last night, our alarms go crazy. All 3 database servers have gone down. They're there, just not responding to ssh or even ping.

We try to reboot the instances. From the console log, I can see that they reboot. Still not accessible. I launch another instance of our DB AMI. It boots, but is also unresponsive.

Eventually we boot a vanilla AMI, reinstall the DB and attach the EBS volumes.

Next surprise - 2 of our EBS volumes have disappeared - or at least the data on them has. Fortunately, they were redundant copies. But what happens if next time they're not?

All in all we had 2 hours of downtime. More than the previous 3 years with dedicated servers put together.

How on earth do other companies maintain their uptime (and data!) on a service that seems to fail way too frequently?

Tagged as: ec2, amazon

3 Comments

Ranguard | February 9, 2010 9:45 AM | Reply

I'm afraid this doesn't help as such.. but...

I've had an instance running for over a year without issue (it's in the US zone, don't know if that makes any difference).

For EBS, I take hourly/daily/weekly/monthly snapshots (rotating all except the monthly) to S3. Net::Amazon::EC2 lets you use the API for this.
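
As a rough illustration of the rotation idea, here is a minimal Python sketch of the bookkeeping only: given a list of snapshots labelled by tier, it picks the ones that have aged out. It makes no AWS calls. The Snapshot class, the KEEP retention counts, and the snapshots_to_delete function are all hypothetical names and values chosen for this example; the comment does not say how many snapshots of each tier are kept.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed retention counts per tier (not from the original comment).
# "monthly" is deliberately absent: monthly snapshots are never rotated.
KEEP = {"hourly": 24, "daily": 7, "weekly": 4}


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    tier: str          # "hourly", "daily", "weekly" or "monthly"
    created: datetime


def snapshots_to_delete(snapshots, keep=KEEP):
    """Return the snapshots that fall outside their tier's retention count."""
    by_tier = {}
    for snap in snapshots:
        by_tier.setdefault(snap.tier, []).append(snap)

    doomed = []
    for tier, snaps in by_tier.items():
        if tier not in keep:
            continue  # tiers without a limit (e.g. monthly) are kept forever
        newest_first = sorted(snaps, key=lambda s: s.created, reverse=True)
        doomed.extend(newest_first[keep[tier]:])
    return doomed


if __name__ == "__main__":
    # Example: 30 hourly snapshots and 3 monthly ones.
    base = datetime(2010, 2, 1)
    snaps = [Snapshot(f"h{i}", "hourly", base + timedelta(hours=i)) for i in range(30)]
    snaps += [Snapshot(f"m{i}", "monthly", base - timedelta(days=30 * i)) for i in range(3)]
    for snap in snapshots_to_delete(snaps):
        print("would delete", snap.snapshot_id)
```

In practice the returned list would be handed to whatever API client you use to actually delete the snapshots (Net::Amazon::EC2 in the setup described above); that step is left out here on purpose.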

I also run a spare machine on another cloud for major disaster recovery scenarios, although I've not had to use it yet *cross fingers*

Clinton Gormley replied to comment from Ranguard | February 9, 2010 4:05 PM | Reply

You've had one instance running for a year? What about the others? How often do you see instances falling over?

We need to make a decision this week if we're going to continue with Amazon or go back to hosted.

The savings aren't incredible (even with reserved instances), especially when you factor in the extra support you get with hosted, and the need to run extra instances elsewhere, just in case...

htbaa.myopenid.com | February 10, 2010 10:46 AM | Reply

You could always give Rackspace a try with their Cloud Servers. When those go down your data doesn't get wiped away. Their support also seems to be very good, although I haven't really experienced that (no need yet...) as I currently only use Cloud Files.
