Bixby Learnings from an Alexa Developer
As a voice-first developer with years of experience building custom Amazon Alexa skills, I started creating Samsung Bixby capsules only in the last two months. With one capsule published and another two in progress, there is still much for me to learn about this new platform. As both an Alexa Champion and a Bixby Premier Developer, I see benefits in both platforms and a future where both will co-exist. Let’s compare development for the two platforms with as much objectivity as possible.
Three Main Parts
Developing for Alexa using the Alexa Skills Kit (ASK) Command-Line Interface (CLI), you can divide skill creation into three main parts: language model, code, and (skill) info.
The language model maps the utterances that a user says to intents and slots, and ultimately to handlers in code. These handlers process a user’s request and return a response from a custom Alexa skill. The skill information includes the values needed for the entry in the Alexa Skill store (or marketplace), the permissions the skill should request of the user, and other distribution and configuration settings.
The following table shows the three parts with the information area expanded and shows where these values can be found in both Alexa and Bixby projects:
| Part | Alexa | Bixby |
|---|---|---|
| Language Model | en-US.json (model) | /resources/en/training |
| Invocation/Dispatch Name | en-US.json (model) | capsule-info.bxb |
| Marketplace Entry – icon, description, privacy, terms, keywords | skill.json | capsule.bxb |
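As a rough illustration of the intent-to-handler mapping described above, here is a plain Node.js sketch. This is not the actual ASK SDK; the intent name `GreetIntent` and the slot `firstName` are hypothetical examples:

```javascript
// Minimal sketch of intent-to-handler routing, similar in spirit to
// what the ASK SDK does for you. "GreetIntent" and the "firstName"
// slot are hypothetical, not part of any real skill.
const handlers = {
  GreetIntent: (slots) => `Hello, ${slots.firstName}!`,
  'AMAZON.HelpIntent': () => 'Try saying: greet me.',
};

function routeIntent(request) {
  // Look up the handler registered for this intent name.
  const handler = handlers[request.intent.name];
  if (!handler) {
    return "Sorry, I didn't understand that.";
  }
  // Pass along any slot values the language model extracted.
  const slots = {};
  for (const [name, slot] of Object.entries(request.intent.slots || {})) {
    slots[name] = slot.value;
  }
  return handler(slots);
}

// Example request, shaped loosely like an Alexa IntentRequest:
const speech = routeIntent({
  type: 'IntentRequest',
  intent: { name: 'GreetIntent', slots: { firstName: { value: 'Ada' } } },
});
// speech === 'Hello, Ada!'
```

The real SDK adds request-type checks, session handling, and response building on top of this idea, but the core flow is the same: utterance → intent → handler.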
Alexa developers have multiple frameworks to choose from, and the ASK SDK for Node.js published by Amazon has already gone through a rewrite to version 2. There are over 63,000 skills available in the U.S. Alexa Skill store and over 100,000 skills published worldwide. The community of Alexa developers has grown over the last four years, and these developers can participate in events, hackathons, and competitions. Over 325,000 developers have used the Alexa Skills Kit.
The Bixby developer community has just started and many of the tools and features are in version 1 status. Development is focused primarily on the U.S. and Korea and there are just over 100 capsules in the U.S. Bixby Marketplace. There are times when the Bixby Studio crashes or can’t sync my capsules or when the documentation is out of sync. That hasn’t bothered me too much because I see the fast-paced progress that the Bixby team is making. Samsung is promoting Bixby at more events and there was a big showing and hackathon at the recent Voice Summit 2019.
The first difference that I noticed between the two platforms was that I can use Visual Studio Code (VSCode), or any number of other editors, to code for Alexa, while Bixby requires its own Bixby Developer Studio. You can also have VSCode open on the same files in your Bixby project, but the Developer Studio constantly compiles and syncs your changes and is an integral part of development.
Mysterious .BXB Files
In a Bixby project, you can find .bxb files everywhere. They are used for configuration settings and for defining things such as concepts, actions, views, layouts, dialogs, and endpoints. These files are like JSON files in some ways, but they have defined schemas. They are also like YAML files in that strings do not need to be enclosed in quotes. Besides the language model, these configuration files are what define a capsule. There is significantly more technical configuration in a Bixby capsule than in an Alexa skill.
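For example, the top-level capsule.bxb is itself a .bxb file. A minimal sketch might look like the following (the capsule id and target are hypothetical, and exact fields vary by platform version — treat this as illustrative, not authoritative):

```
// Illustrative capsule.bxb sketch; id and target are made-up examples.
capsule {
  id (example.squareAbouts)
  version (1.0.0)
  targets {
    target (bixby-mobile-en-US)
  }
}
```

Note the JSON-like nesting with YAML-like unquoted strings mentioned above.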
Invocation and Dispatch
To explicitly start an Alexa skill, you use the invocation name. Invocation names do not need to be unique. You can either start the skill in conversation mode by saying something like “open <invocation name>” or with a one-shot phrase “ask <invocation name> to…”.
With Bixby, dispatch names are unique, and you can have additional dispatch aliases. For example, if my capsule has a dispatch name of “SquareAbouts” other similar dispatch aliases could be: “Square Abouts” and “Square About”. It appears that capsules can only be invoked with a dispatch phrase similar to Alexa’s one-shot phrase, but following the pattern: “with <dispatch name/alias>…”.
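As a sketch of how this might look in capsule-info.bxb (the field names here are from my reading of the docs and may differ slightly in your platform version — check the official reference):

```
// Illustrative capsule-info.bxb fragment for dispatch configuration.
capsule-info {
  display-name (SquareAbouts)
  dispatch-name (SquareAbouts)
  dispatch-aliases {
    dispatch-alias (Square Abouts)
    dispatch-alias (Square About)
  }
}
```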
Language Model – Training, Intents, and Actions
Both platforms require you to create a language model of what the user can say to interact with your voice app. For Alexa, you include custom or built-in intents as well as custom or built-in slot types. When a user interacts with a skill, the corresponding intent handler is called in code, and any values specified in slots are passed to that code. You can define the language model in either the Alexa Developer Console GUI or in a JSON file that can be updated via API or CLI.
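For illustration, a minimal Alexa en-US.json language model might look like the following. The schema shape is real, but the intent name, slot, and sample utterances are hypothetical:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "square abouts",
      "intents": [
        {
          "name": "GreetIntent",
          "slots": [
            { "name": "firstName", "type": "AMAZON.FirstName" }
          ],
          "samples": [
            "greet {firstName}",
            "say hello to {firstName}"
          ]
        }
      ]
    }
  }
}
```

When a user says "ask square abouts to greet Ada", Alexa resolves the utterance to `GreetIntent` with `firstName` set to "Ada" and sends that to your skill's endpoint.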
In the beginning of Alexa, there was the Echo smart speaker and the Alexa App. The only multimodal option available at that time was a set of limited cards that could be sent to the Alexa App. Now Alexa can be found on smart displays, TVs, and tablets. Development is meant to be a voice-first-plus-screen experience. However, you can still deploy voice-only skills and serve most Alexa users, who predominantly use smart speakers without displays.
There are already hundreds of millions of Samsung phones that support the newer Bixby based on Viv technology. So, from the beginning, capsules are encouraged to be multimodal and include results and detail views. Samsung plans to ship a smart speaker in 2019 which will only offer single-mode, voice-only interaction, but the vast majority of users and sessions will be multimodal by default.
The code for an Alexa skill is often hosted on AWS Lambda, but can be hosted on any cloud provider or server connected to the internet. Instead of provisioning their own endpoints and backend resources, Alexa developers also have the option to have the skill code hosted on the Alexa platform for free (within AWS Free Tier limits). This allows Alexa developers to not have to think about deployment at all.
Bixby capsule development shares many concepts with the development of Alexa skills, but the approach is quite different. With Alexa, the development is primarily focused on the language model and code, with a little configuration. With Bixby, the language model is still a key focus, but there is much more configuration and much less code. The configuration is still technical in nature, so it will likely be done by a developer.
With any new language, framework, or tool, there is a learning curve, and when I encounter a problem with Bixby, it almost always has to do with configuration. That is a strange feeling for a developer who is used to spending hours in code each day. The help I received from the various Bixby developer evangelists and partner engineers on my first capsule was very much appreciated.
By learning both platforms, you can be a better developer and have an increased perspective of voice technology today and where it is going soon. So, go out there and make some great voice apps. I would enjoy hearing about the voice experiences that you create on either platform.
About the Author
Mark Tucker is an Alexa Champion, Bixby Premier Developer, and software architect with over 20 years of design and development experience. Mark is the organizer of the Phoenix Alexa Meetup and the Phoenix Chapter of the Ubiquitous Voice Society. He is interested in ways that voice technology can help nonprofits, and his passion project is Speech Markdown. You can find him on Twitter @marktucker or LinkedIn.