We here at PlayFab are very proud of our uptime record, which had been 99.995% for more than two years… until a power outage at an Amazon Web Services (AWS) data center brought us down for two hours on Memorial Day (May 26), 2014.
While we couldn’t have anticipated this outage, we could have done more to keep it from taking down our whole service. A calculated risk that was acceptable back when we were the internal service for Uber Entertainment is no longer acceptable in our new incarnation as PlayFab. Now that reliability is a big part of what we’re selling, we need to reduce our exposure to downtime risk.
This blog post describes what we’ve done to increase redundancy and reduce the chances that another AWS outage takes us down.