
AWS Lambda and Alexa Skill Outage: A Lesson for Developers

On Thursday evening, the AWS Lambda service in US-EAST-1, which hosts many US-based third-party Alexa skills, went down. Alexa developer James Guinn, who goes by guinnjl in a developer Slack group, was one of the first to notice the outage.

Alexa Skills Not Working

Mark Tucker, another Alexa developer, checked and found that no AWS outage had been reported at that time. Alexa developer and Smart Home consultant Andrea Bianco then found an AWS CloudWatch notification showing a Lambda operational issue in US-EAST-1 that began at 9:07 PM EST and was still active 20 minutes later.

Guinn, Tucker and Bianco weren’t the only people to notice. A short time later, Bart Dorsey took to Twitter to mention that Alexa was having an issue, but indicated that his Apple HomeKit apps for the same home automation services were still working. A few minutes afterward, Derek Daczewitz made a prescient comment about applications that rely on Lambdas. Several hours later, the Twitter account Robot Taylor mentioned that an IoT-focused Facebook group was “freaking out.”

Alexa developer Nick Schwab is known for some of the most popular Alexa skills, such as Ocean Sounds and Thunderstorm Sounds. He noted that after he submitted a support ticket, Amazon put up a notification for users in the skill store.

In a LinkedIn post the next day, Mr. Tucker noted that an AWS notification indicated the Lambdas behind many third-party Alexa skills had been down for nine hours.

Is Your Alexa Skill Solely Dependent On AWS US-EAST-1?

Native Alexa skills, such as setting timers and alarms, appeared to be unaffected. Many third-party skills were not. It could be that the Lambdas for native Alexa skills are hosted in AWS US-WEST-2 or have automatic disaster recovery (DR) failover configured, whereas many third-party skills have no DR and are hosted only in AWS US-EAST-1. US-EAST-1 was also the epicenter of a significant AWS S3 storage outage in February that lasted about 4.5 hours.
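A skill hosted in a single region has nothing to fall back on. As a minimal sketch of the failover idea, assuming a skill backend deployed to two regions and some per-region health check (the region list and the `is_healthy` callback below are hypothetical placeholders), a router can try regions in priority order and only move traffic to the standby when the primary is unhealthy. This illustrates the concept only; it is not how Alexa or AWS route skill requests, and in practice this kind of failover is often handled by DNS health checks (for example, Amazon Route 53 failover routing) rather than in application code.

```python
from typing import Callable, Sequence

# Hypothetical priority order: primary region first, DR standby second.
REGIONS = ["us-east-1", "us-west-2"]


def pick_endpoint(regions: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first region whose health check passes.

    Regions are tried in order, so the standby is used only when
    every higher-priority region reports unhealthy.
    """
    for region in regions:
        if is_healthy(region):
            return region
    raise RuntimeError("no healthy region available")


# Simulate the US-EAST-1 outage: the primary fails its check.
down = {"us-east-1"}
print(pick_endpoint(REGIONS, lambda r: r not in down))  # us-west-2
```

Keeping the health check as a pluggable callback lets the same selection logic be driven by a real probe, a status feed, or a test stub.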

What is the Lesson for Alexa Devs?

But what is the larger lesson here? John Kelvie, CEO of voice application software tools provider Bespoken, said that few people are thinking about load balancing or DR for their Alexa skills.

The outage this week with Lambdas in the US-East region affected many skill providers. Lambdas are the most popular way to deploy Alexa skills, and they are generally reliable. But no system is foolproof. High availability is not yet a top priority for most skill devs, and understandably so. But as more skills transition from experiments to mission-critical applications, all the typical monitoring, scalability and availability concerns will come into play.

This incident could start changing developer priorities. The AWS S3 outage certainly awakened several companies and consumers to the risk of relying on a single hosting geography. The Seattle Times reported on that outage with a quote from Forrester analyst Dave Bartoletti: “[Companies] started treating this like dial tone, and then it goes away.” The same attitude is likely the norm among developers who host Alexa skills on Lambdas.

We are moving past a time when most skills are games or hobbies for developers. Some home automation users realized that they didn’t have an easy fallback for turning on their lights when their Alexa skills failed. Media companies that lose user sessions, and companies facing downtime in mission-critical business applications, should start looking at this market the way they look at other application categories. Bespoken’s Kelvie says that beyond high-availability architectures, Alexa skill developers also need production-quality monitoring and logging tools.

At Bespoken, our logging tools help to identify these issues. Our new monitoring tools, officially available this week, will proactively alert developers when there is a problem. We saw this outage in real time in our internally deployed beta dashboard.
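To make "proactively alert" concrete, here is a minimal, generic sketch of one common alerting rule. It is not Bespoken's implementation. It assumes some periodic health probe of the skill endpoint whose pass/fail result is fed in, and a hypothetical `alert` callback (email, chat, pager). The monitor fires once after a configurable number of consecutive failures, so a single transient timeout does not page anyone, and it re-arms after the endpoint recovers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OutageMonitor:
    """Fire `alert` once after `threshold` consecutive failed probes."""

    threshold: int
    alert: Callable[[str], None]  # hypothetical notification hook
    consecutive_failures: int = 0
    alerted: bool = False

    def record(self, probe_ok: bool) -> None:
        if probe_ok:
            # Recovery resets the counter and re-arms the alert.
            self.consecutive_failures = 0
            self.alerted = False
            return
        self.consecutive_failures += 1
        # Alert only once per outage, when the threshold is first reached.
        if self.consecutive_failures >= self.threshold and not self.alerted:
            self.alerted = True
            self.alert(f"skill endpoint failed {self.consecutive_failures} probes in a row")


# Usage: collect alerts in a list instead of sending them anywhere.
messages: List[str] = []
monitor = OutageMonitor(threshold=3, alert=messages.append)
for ok in [True, False, False, False, False, True]:
    monitor.record(ok)
print(messages)  # ['skill endpoint failed 3 probes in a row']
```

The threshold trades detection speed against false alarms; with probes every minute, a threshold of 3 would flag an outage like this one within a few minutes rather than hours.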

Despite the Long Outage, Few Social Media Complaints

The outage was not a big surprise. Systems go down, and developers need to account for that in their cloud hosting strategies. What was surprising is how few complaints showed up on social media. There appeared to be no complaints on Reddit and only a few tweets mentioning the issues. Is this good or bad? There has been no significant negative PR fallout for Amazon or Alexa skill developers. That would seem to be good news.

However, a recent Edison Research survey suggested that more than 42% of Amazon Echo owners consider the devices “essential to their everyday lives.” If the devices were truly essential, wouldn’t you expect more outrage over the outage? It could be that the devices are used more frequently early in the day, and that an evening outage that affects only a subset of skills and is resolved overnight doesn’t severely inconvenience users. That will likely change as more truly mission-critical Alexa skills are deployed in the coming months. High-volume social media outrage during a future outage will tell us that Alexa is truly essential. Expect that to come relatively soon.