Speech Markdown is the Simpler Way to Format Text-to-Speech Content Over SSML
The World of SSML
Since 2015, developers have coded text-to-speech responses for modern voice assistant platforms (Alexa, Google Assistant, Cortana, etc.). These responses were typically written into the project code or stored in a database. Over time, designers have become more involved, but the only available way to format responses was Speech Synthesis Markup Language (SSML). As the platforms and voice technologies continue to advance, content will be moved into Content Management Systems (CMS). The vast majority of voice-formatted text will be written by content authors. Designers and developers will continue to create content as well.
Some may not know, SSML is 15 years old. Version 1.0 became a W3C Recommendation in 2004 and version 1.1 came out in 2010. Many know the SSML syntax only through documentation from either the Amazon Alexa or Google Assistant documentation. It might surprise you to know that neither of these platforms fully supports the SSML standard. For the markup that they share, at times the implementation is different. There are even platform-specific additions not mentioned in the specification. In other words, these proprietary voice platforms implement standardized concepts differently.
Why is Speech Markdown Needed?
Websites rely heavily on Hyper Text Markup Language (HTML), but content posted on blogs, entered in forums, added to GitHub, or created across the web is not formatted directly with HTML. As an author writing content for publication on WordPress, you wouldn’t write HTML. Most of the time you would write plain text. But when you needed to format that text, you would use… markdown. Content authors use markdown.
What content authors creating for this new voice age need is a markdown of their own: Speech Markdown.
What are the Benefits of Speech Markdown?
Three of the benefits of Speech Markdown are: simplicity, incremental adoption, and cross platform.
Simpler than SSML
The first benefit of Speech Markdown is simplicity. It has an easier syntax to remember and less characters to type. The content doesn’t get lost in the end tags of the markup.
Here’s a quick comparison of the two formats:
For this single sentence, the number of characters for the SSML is more than 2 times that of the Speech Markdown.
Incremental Adoption
The reality of the situation is that SSML is what platforms are using today. The smaller content size of Speech Markdown is a compelling reason for these platforms to consider also supporting Speech Markdown. Just because today’s voice platforms use SSML, doesn’t mean that content authors must. Write content is Speech Markdown and let the tools and frameworks convert it to SSML or plain text.
The solution is to parse Speech Markdown into an intermediate format (an Abstract Syntax Tree) and then use formatters to convert it into a platform-specific flavor of SSML:
Cross Platform Support
As mentioned earlier, the implementation of SSML by the various voice platforms is inconsistent. Speech Markdown allows you to format your text and not have to worry about which markup is supported by which platform. This will become increasingly more important as Cortana, Mycroft, and other platforms will only add to the inconsistency and confusion.
We need new tools that will enable consistency of content across assistant platforms. A coding framework such as Jovo or a “no-code” visual designer such as Voiceflow promises the ability to build once and deploy to both Amazon Alexa and Google Assistant. This is talking primarily about code and not content. You will have consistency in as much as the text formatting complies with every platform, but that puts the burden on the content author to know this. Speech Markdown would give these frameworks and tools increased content consistency by using formatters.
Speech Markdown is Open Source
Speech Markdown is open source and belongs to the voice-first community. It is available to implement and improve. Visit www.speechmarkdown.org to learn more and follow @speechmarkdown on Twitter.
There are currently four projects for community members to join. All documentation for these projects is made available under the Creative Commons Attribution-ShareAlike 4.0 International License and all code samples and libraries are made available under a modified MIT license. Please read the Contributor Covenant Code of Conduct and Contribution Guidelines.
All projects can be found in the speechmarkdown GitHub organization:
- speechmarkdown-js – Speech Markdown grammar, parser, and formatters for use with JavaScript.
- docs-speechmarkdown-reference – The open source version of the Speech Markdown Reference docs.
- speechmarkdown-test-files – Test data files to assist those writing parsers, formatters, libraries, and other software.
- awesome-speechmarkdown – A curated list of awesome Speech Markdown projects, tools, frameworks, software, and resources.
I would love to see grammars, parsers and formatters coded in multiple programming languages. For all voice platforms to create formatters to convert markdown to SSML on their way to natively support Speech Markdown. For frameworks, tools, platforms, and vendors to embrace Speech Markdown for the benefit of their users. We are in exciting times and Speech Markdown holds great promise. This is an opportunity for you to make a significant contribution, so come help ensure the success of Speech Markdown.
About the Author: Mark Tucker is a Senior Architect, Voice Technology at Soar.com and an Alexa Champion with over 20 years of software design and development experience. He is the organizer of the Phoenix Alexa Meetup and the Phoenix Chapter of the Ubiquitous Voice Society. You can find him on Twitter @marktucker or LinkedIn.