By using one of the Amazon Echo products, the Alexa App installed on your phone, or any third-party device running the Amazon Alexa Voice Service, you can access a variety of applications to use either at work or at home.
As a quick review, Alexa is a cloud-based voice service that powers speech-driven experiences on a wide range of devices, letting you build voice experiences that make daily activities faster and more appealing to customers. It covers many tasks, from playing music and controlling thermostats to providing weather and traffic information, making to-do lists, and setting alarms.
Building an Alexa Skill for a Display Interface
Developing an application (a skill, in Alexa’s terms) that supports a screen interface (typically for Amazon's Echo Spot and Echo Show) is not very different from programming a regular one, as for Amazon's Echo Dot. The most important thing is to think carefully about how the interaction with the user will be handled and what information will be displayed on the screen by implementing the Display Interface.
If you have already developed and published a skill, it is easy to make it available on the Echo Show and Spot by adding support for the Display Interface. Also, if your existing skill already configures a card, that card will be displayed automatically once the Display Interface is activated.
The types of screens you can design are: a list of content combining text and images, a single image, images with text, and an audio or video player. It might be necessary to support touch-selection events, especially when listing items.
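When the user taps a list item on a screen device, the skill receives a `Display.ElementSelected` request whose token identifies the selected item. The sketch below shows one way to handle it by working directly on the raw request JSON (no SDK assumed); the handler function and the response text are illustrative, not part of any official API.

```python
# Sketch: handling a touch-selection event on a list template.
# A tap on a rendered list item arrives as a "Display.ElementSelected"
# request; its "token" field carries the id assigned when the list was built.

def handle_request(request_envelope):
    request = request_envelope["request"]
    if request["type"] == "Display.ElementSelected":
        token = request["token"]  # token set when the list was rendered
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {
                    "type": "PlainText",
                    "text": f"You selected {token}",
                },
                "shouldEndSession": False,
            },
        }
    # ... other request types would be handled elsewhere
    return None

# Abridged example of the envelope Alexa would deliver:
envelope = {"request": {"type": "Display.ElementSelected", "token": "item_2"}}
response = handle_request(envelope)
print(response["response"]["outputSpeech"]["text"])  # You selected item_2
```

In a real skill you would map the token back to the underlying data (for example, a database key) rather than echoing it in the speech output.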
Enabling the display render templates is as easy as activating the “Display Interface” under the “Interfaces” menu of your skill in the Alexa Developer Portal. To build a video skill, enable the “Video App” interface instead.
A set of mandatory built-in intents will be added to your skill to support the necessary interaction with the screen. Make sure to add at least one utterance for each intent before rebuilding your model.
The next step is to supply the code that supports the visual interface. This behaviour is implemented at the skill’s endpoint, which can be a Lambda function or a custom HTTPS endpoint.
The endpoint must include the display render template directive in the skill’s response. It is also recommended to check in the code whether or not the request comes from a device that supports a screen.
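That device check can be done by inspecting the capabilities the device reports in the request envelope, under `context.System.device.supportedInterfaces`: a `Display` key means the device can render templates. A minimal sketch on the raw JSON, with hand-built example envelopes standing in for real requests:

```python
# Sketch: detecting whether the requesting device has a screen.
# Devices advertise their capabilities in context.System.device.supportedInterfaces;
# the presence of a "Display" entry means render-template directives are supported.

def supports_display(request_envelope):
    interfaces = (
        request_envelope.get("context", {})
        .get("System", {})
        .get("device", {})
        .get("supportedInterfaces", {})
    )
    return "Display" in interfaces

# Abridged example envelopes (the full requests carry many more fields):
echo_show = {"context": {"System": {"device": {
    "supportedInterfaces": {"Display": {}, "AudioPlayer": {}}}}}}
echo_dot = {"context": {"System": {"device": {
    "supportedInterfaces": {"AudioPlayer": {}}}}}}

print(supports_display(echo_show))  # True
print(supports_display(echo_dot))   # False
```

Branching on this check lets the same endpoint serve both screen and voice-only devices: add the display directive only when the check passes.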
There are two types of display templates: body templates and list templates. The first displays text and images, while the second is a scrollable, selectable list of items (each of which can combine text and images).
Items supported in the templates
- Type: the body or list template to render, chosen from the available set.
- Token: an id used to track selectable items on the screen.
- Back button: whether to show or hide the back button.
- Background image.
- Text content: the main content to be shown. There are three levels of text: primary, secondary, and tertiary.
- Image: describes the image properties, such as the size, the URL, or the pixel dimensions.
- List items: the items to be shown in a list template directive.
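The items above come together in a render-template directive attached to the skill response. The sketch below assembles one by hand for a body template with an image (here `BodyTemplate2`); the title, texts, token value, and image URL are placeholders, and the helper function is illustrative rather than any official API.

```python
# Sketch: assembling a Display.RenderTemplate directive by hand.
# Field names follow the template properties listed above; values are placeholders.

def build_body_template(title, primary_text, image_url):
    return {
        "type": "Display.RenderTemplate",
        "template": {
            "type": "BodyTemplate2",        # a body template with text and an image
            "token": "welcome_screen",      # id used to track this screen
            "backButton": "HIDDEN",         # show or hide the back button
            "title": title,
            "image": {"sources": [{"url": image_url}]},
            "textContent": {
                "primaryText": {"type": "PlainText", "text": primary_text},
            },
        },
    }

directive = build_body_template(
    "Welcome", "Hello from the screen!", "https://example.com/bg.png"
)

# The directive goes into the "directives" array of the skill response:
response = {
    "version": "1.0",
    "response": {
        "outputSpeech": {"type": "PlainText", "text": "Hello from the screen!"},
        "directives": [directive],
        "shouldEndSession": True,
    },
}
print(directive["template"]["type"])  # BodyTemplate2
```

A list template is built the same way, swapping the template type for a list variant and supplying a `listItems` array where each item carries its own token for selection events.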
Regarding the size and pixel density of the images to be supplied, Amazon provides guidelines: the Echo Show screen is 1024x600 px, and on the Echo Spot images are scaled down automatically.
Also, do not forget to check the Amazon Alexa Developer Portal for fully-featured examples and specifications.
We invite you to learn more about this technology by reading this post: Goodbye mobile apps, hello voice assistants?