Amazon Says Human Error Caused Massive Outage

The distress thousands of small businesses felt when the Amazon S3 service was crippled for roughly four hours, beginning on the morning of Feb. 28, was caused by, drum roll… a single command with one of its inputs entered incorrectly. In other words, it was human error. A typo.

Behind the Cause of the Amazon S3 Outage

Just so you get the explanation verbatim from Amazon (NASDAQ:AMZN), here is what the company said about the cause of the Amazon S3 outage:

“At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems.  One of these subsystems, the index subsystem, manages the metadata and location information of all S3 objects in the region.”

Amazon's statement goes into more detail, but this is what triggered the whole thing. The company was transparent about the event and kept everyone up to date with a thorough timeline as it unfolded.

It is also important to note that companies did not lose data. The effect was that resources could not be accessed in a timely manner, and for some, that was the full extent of the event. But for companies whose operations depend on those resources, it was a bad day.

This incident, the one in 2015, and those that will take place in the future should serve as valuable lessons for anyone with a digital presence. If your website is a critical part of your business, keep a copy of it hosted at a different location. Talk to different hosting companies and find the one that best addresses your needs.

There are also disaster recovery and business continuity (DR/BC) solutions that can help you set up the right system for your company. The sooner you implement one, the sooner you can relax. You have to be proactive in protecting your digital assets, just as you would your brick-and-mortar store.

This article, “Amazon Says Human Error Caused Massive Outage,” was first published on Small Business Trends.