Bixby Learnings from an Alexa Developer

As a voice-first developer with years of experience building custom Amazon Alexa skills, I started creating Samsung Bixby capsules only in the last two months. With one capsule published and two more in progress, there is still much to learn about this new platform. As both an Alexa Champion and Bixby Premier Developer, I see benefits in both platforms and a future where the two will co-exist. Let’s compare development on both platforms with as much objectivity as possible.

Three Main Parts

When developing for Alexa with the Alexa Skills Kit (ASK) Command-Line Interface (CLI), you can divide skill creation into three main parts: the language model, the code, and the (skill) information.

The language model maps the utterances that a user says to intents and slots, and ultimately to handlers in code. These handlers process a user’s request and return a response from the custom Alexa skill. The skill information includes the values needed for the entry in the Alexa Skill store (or marketplace), the permissions the skill should request of the user, and other distribution and configuration settings.
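
To make the intent-to-handler mapping concrete, here is a minimal sketch using the ASK SDK v2 for Node.js. The intent name, slot name, and wording are hypothetical, not taken from any published skill.

```javascript
// Hypothetical handler for a custom "FindPlaceIntent" (ASK SDK v2 for Node.js).
const Alexa = require('ask-sdk-core');

const FindPlaceIntentHandler = {
  canHandle(handlerInput) {
    // Handle only requests that the language model routed to FindPlaceIntent.
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'FindPlaceIntent';
  },
  handle(handlerInput) {
    // Slot values resolved by the language model arrive with the request.
    const city = Alexa.getSlotValue(handlerInput.requestEnvelope, 'city');
    return handlerInput.responseBuilder
      .speak(`Looking for places in ${city}.`)
      .getResponse();
  },
};

// Wire the handler into the skill and expose it as an AWS Lambda entry point.
exports.handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(FindPlaceIntentHandler)
  .lambda();
```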

The following table expands the information area of the three parts and shows where these values can be found in Alexa and Bixby projects:

Expanded Parts | Alexa | Bixby
--- | --- | ---
Language Model | en-US.json (model) | /resources/en/training, /models/actions, /models/concepts
Code | index.js in /lambda/custom | /code
Invocation/Dispatch Name | en-US.json (model) | capsule-info.bxb
Marketplace Entry (icon, description, privacy, terms, keywords) | skill.json | capsule.bxb
Permissions | skill.json | capsule.bxb
Settings | skill.json | capsule.bxb
Hints | skill.json | hints.bxb
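
On the Alexa side, most of that information lives in a single skill.json manifest. Here is a trimmed sketch with placeholder values (the skill name, phrases, and keywords are made up); the permission string shown is the one Alexa uses for full address access. The Bixby equivalents are spread across capsule.bxb, capsule-info.bxb, and hints.bxb.

```json
{
  "manifest": {
    "publishingInformation": {
      "locales": {
        "en-US": {
          "name": "Place Finder",
          "summary": "Find interesting places nearby.",
          "examplePhrases": ["Alexa, open place finder"],
          "keywords": ["places", "travel"]
        }
      }
    },
    "permissions": [
      { "name": "alexa::devices:all:address:full:read" }
    ]
  }
}
```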

Platform Maturity

Alexa developers have multiple frameworks to choose from, and the ASK SDK for Node.js published by Amazon has already gone through a rewrite to version 2. There are over 63,000 skills available in the U.S. Alexa Skill store and over 100,000 skills published worldwide. The community of Alexa developers has grown over the last four years, and these developers can participate in events, hackathons, and competitions. Over 325,000 developers have used the Alexa Skills Kit.

The Bixby developer community has just started, and many of the tools and features are at version 1. Development is focused primarily on the U.S. and Korea, and there are just over 100 capsules in the U.S. Bixby Marketplace. There are times when Bixby Developer Studio crashes or can’t sync my capsules, or when the documentation is out of sync with the tools. That hasn’t bothered me too much because I see the fast-paced progress that the Bixby team is making. Samsung is promoting Bixby at more events, and there was a big showing and a hackathon at the recent Voice Summit 2019.

Code Editors

The first difference I noticed between the two platforms is that I can use Visual Studio Code (VSCode), or any number of other editors, to write code for Alexa, while Bixby requires its own Bixby Developer Studio. You can have VSCode open on the same files in your Bixby project, but Developer Studio constantly compiles and syncs your changes and is an integral part of development.

Programming Languages

There are several programming language options for Alexa, including JavaScript (Node.js), Python, Java, and .NET. I like developing in Node.js with its many available npm packages. Currently, Bixby supports JavaScript, with full support for ES5 and some support for ES6. The JavaScript engine used by Bixby is based on Mozilla’s open-source Rhino project. You can import functions from 14 Bixby library capsules, including audio player, geo, and contacts, and from 12 function modules such as http, console, and dates.
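
As a rough sketch of what a Bixby JavaScript function can look like, here is a hypothetical action implementation that calls a remote API with the http module. The function name, endpoint URL, and response shape are made up, and the export convention and http options follow my reading of the Bixby documentation at the time of writing, so treat the details as assumptions.

```javascript
// Hypothetical Bixby action implementation (ES5 style).
// 'http' and 'console' are Bixby function modules; the URL and response shape are placeholders.
var http = require('http');
var console = require('console');

module.exports.function = function findPlaces(searchTerm) {
  // Call an external API and ask Bixby to parse the response as JSON.
  var response = http.getUrl('https://api.example.com/places?q=' + encodeURIComponent(searchTerm), {
    format: 'json'
  });
  console.log('Found ' + response.results.length + ' places');
  // The returned objects should match the structure concept declared in the model.
  return response.results;
};
```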

Mysterious .BXB Files

In a Bixby project, .bxb files are everywhere; they hold configuration settings and define things such as concepts, actions, views, layouts, dialogs, and endpoints. These files are like JSON files in some ways, but they have defined schemas. They are also like YAML files in that strings do not need to be enclosed in quotes. Besides the language model, these configuration files are what define a capsule. There is significantly more technical configuration in a Bixby capsule than in an Alexa skill.
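
To give a flavor of the format, here is a sketch of an action model in a .bxb file. The names are hypothetical and the keys follow my reading of the Bixby sample capsules, so treat it as illustrative rather than canonical.

```
// Hypothetical models/actions/FindPlaces.model.bxb
action (FindPlaces) {
  type (Search)
  description (Finds places that match a search term.)
  collect {
    input (searchTerm) {
      type (SearchTerm)
      min (Required)
      max (One)
    }
  }
  output (Place)
}
```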

Invocation and Dispatch

To explicitly start an Alexa skill, you use its invocation name. Invocation names do not need to be unique. You can either start the skill in conversation mode by saying something like “open <invocation name>” or use a one-shot phrase such as “ask <invocation name> to…”.

With Bixby, dispatch names are unique, and you can have additional dispatch aliases. For example, if my capsule has a dispatch name of “SquareAbouts”, similar dispatch aliases could be “Square Abouts” and “Square About”. It appears that capsules can only be invoked with a dispatch phrase similar to Alexa’s one-shot phrase, following the pattern “with <dispatch name/alias>…”.

Language Model – Training, Intents, and Actions

Both platforms require you to create a language model of what the user can say to interact with your voice app. For Alexa, you include custom or built-in intents as well as custom or built-in slot types. When a user interacts with a skill, the corresponding intent handler is called in code, and any values specified in slots are passed to that code. You can define the language model either in the Alexa Developer Console GUI or in a JSON file that can be updated via the API or CLI.
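
For reference, a trimmed en-US.json language model with one hypothetical custom intent and slot looks roughly like this (the invocation name, intent, and sample utterances are placeholders; AMAZON.City is a built-in slot type):

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "place finder",
      "intents": [
        {
          "name": "FindPlaceIntent",
          "slots": [
            { "name": "city", "type": "AMAZON.City" }
          ],
          "samples": [
            "find places in {city}",
            "what is near {city}"
          ]
        }
      ]
    }
  }
}
```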

With Bixby, you train a capsule with utterances that are typically mapped to actions or concepts. An action is an operation that can be performed, with inputs and outputs. The action is often implemented as a JavaScript function that can also make API calls. A concept is a generic term for an object or thing in the system. A concept can be of a primitive type: boolean, decimal, enum, integer, name, text, and so on. Multiple primitive concepts can be put together to form a structure concept. A structure can include other structures, and each property carries a cardinality: optional or required, and one or many. Creators of Bixby capsules have greater control over the configuration of the language model than Alexa developers do. In Bixby, there are simply more ways to mark up an utterance and determine how it will affect values or actions.
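
As a sketch of the concept side (again with hypothetical names, and syntax based on my reading of the sample capsules), a pair of primitive text concepts and a structure that uses them might look like this:

```
// Hypothetical concept models (normally split across files under models/concepts/).
text (PlaceName) {
  description (The display name of a place.)
}

text (PlaceAddress) {
  description (The street address of a place.)
}

structure (Place) {
  description (A place returned by a search.)
  property (name) {
    type (PlaceName)
    min (Required)
    max (One)
  }
  property (address) {
    type (PlaceAddress)
    min (Optional)
    max (One)
  }
}
```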

Multimodal

In the beginning of Alexa, there was the Echo smart speaker and the Alexa App. The only multimodal option available at that time was a set of limited cards that could be sent to the Alexa App. Now Alexa can be found on smart displays, TVs, and tablets. Development is meant to be a voice-first-plus-screen experience. However, you can still deploy voice-only skills and serve most Alexa users, who predominantly use smart speakers without displays.

There are already hundreds of millions of Samsung phones that support the newer Bixby based on Viv technology. So, from the beginning, capsules have been encouraged to be multimodal and to include result and detail views. Samsung plans to ship a smart speaker in 2019, which will offer only single-mode, voice-only interaction, but the vast majority of users and sessions will be multimodal by default.

Code Deployment

The code for an Alexa skill is often hosted on AWS Lambda, but it can be hosted on any cloud provider or server connected to the internet. Instead of provisioning their own endpoints and backend resources, Alexa developers also have the option of having the skill code hosted on the Alexa platform for free (within AWS Free Tier limits). This means Alexa developers do not have to think about deployment at all.

Bixby also takes the approach that capsules are hosted by the platform at no cost. Any JavaScript functions in a Bixby project are limited to the functionality available in the core modules and library capsules. There will be times when you want more than that, so you will create an external API that is then called from the capsule. Developers can host those external APIs on any cloud provider they choose but are responsible for any costs.
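
Such an external API does not need to be elaborate. As a minimal sketch (using Express, with a made-up route and response; nothing here is prescribed by Bixby), it could be a small Node.js service the developer hosts and pays for themselves:

```javascript
// Hypothetical external API that a Bixby capsule (or an Alexa skill) could call over HTTPS.
const express = require('express');
const app = express();

app.get('/places', (req, res) => {
  // In a real service this would query a database or a third-party API.
  const searchTerm = req.query.q || '';
  res.json({
    results: [{ name: `Example Cafe (${searchTerm})`, address: '123 Main St' }],
  });
});

// The developer chooses and pays for the hosting environment.
app.listen(process.env.PORT || 3000);
```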

Conclusion

Bixby capsule development shares many concepts with Alexa skill development, but the approach is quite different. With Alexa, development is primarily focused on the language model and code, with a little configuration. With Bixby, the language model is still a key focus, but there is much more configuration and much less code. That configuration is still technical in nature, so it will likely be done by a developer.

With any new language, framework, or tool there is a learning curve, and when I encounter a problem with Bixby it almost always has to do with configuration. That is a strange feeling for a developer who is used to spending hours in code each day. The help I received from the various Bixby developer evangelists and partner engineers on my first capsule was very much appreciated.

By learning both platforms, you can become a better developer and gain a broader perspective on voice technology today and where it is heading. So, go out there and make some great voice apps. I would enjoy hearing about the voice experiences you create on either platform.

About the Author

Mark Tucker is an Alexa Champion, Bixby Premier Developer, and software architect with over 20 years of design and development experience. Mark is the organizer of the Phoenix Alexa Meetup and the Phoenix Chapter of the Ubiquitous Voice Society. He is interested in ways that voice technology can help nonprofits, and his passion project is Speech Markdown. You can find him on Twitter @marktucker or on LinkedIn.