What tool to choose for voice user interface design?

Written by pavelgvay | Published 2018/06/26
Tech Story Tags: design | voice-ux-design | voice-interfaces | voice-interface-design | voice-ux-tools


Hi, my name is Pavel. I work at a mobile app development company in Russia, and more than a year ago I immersed myself in the process of designing voice apps.

I took part in several real projects, spoke as an expert at various conferences, met the developers of Google Assistant, and shared my experience with other VUI designers. After all that, I thought hard about how to optimize the design and testing processes for voice apps.

That thought kicked off a long journey through the existing tools and an analysis of their limitations, and, well, led me to an expected conclusion, which I will save for the end of the article. First, the present state of things.

Theoretical part

Let me say a few words about the process of designing voice apps for those who have never used such interfaces at all.

A good voice app differs from a chatbot in that the user is not forced to use specific commands. The user can have a casual conversation with the service, similar to a real-life conversation. The main tools are voice and text, but if the device has a screen, the app can add visual accompaniment in the form of cards, carousels, and lists to deliver information more effectively.

For example, let’s look at a “pizza order”. Just imagine how many different phrases could tell the application that you want pizza. The user can order a specific pizza, but he can also ask for something with mushrooms and ham. He can ask the application to list all available products and pick a suitable one. Or he can simply say he is hungry.

All of these are scenarios for how the plot can develop, and we have to provide for every individual step along every possible path through every scenario of the app. What a mess! And we haven’t even ordered a pizza yet!

Design Methodology

Regardless of the platform, VUI design runs through a standard set of steps. You can find detailed guidelines on the developer pages of Google Assistant, Amazon Alexa, and Microsoft Cortana. Each team builds its process in its own way. As for me, I highlight the following steps (sometimes they don’t run strictly in order):

  1. Identify personas in the audience. Each persona is a collective image of a representative of one group within the app’s audience. Personas help us better understand user needs, and each of them has a certain set of phrases based on behavioral patterns.
  2. Filter scenarios. Now we need to decide which scenarios will be implemented in the application. You can’t simply take a graphical app and “convert” it into a voice one. For myself, I derived a simple rule: if I can’t imagine speaking a particular scenario aloud with a real person, then we shouldn’t take it on, because users will definitely have problems with it.
  3. Create a character. We should strive to make conversation with the app as close as possible to a regular talk with a living person. For that, the user should form an image of the person he is communicating with. You can achieve this by inventing a character: we add a name and sketch out the appearance, skills, a brief biography, a personality and, of course, a voice.
  4. Write example dialogues. We know who our users are, we know what functionality they want, and we know whose voice speaks to them. It’s time to write example dialogues between the user and the application for each scenario.
  5. Build a dialogue tree. To account for every possible course of events, all the steps that will lead the user to the hypothetical “pizza order”, it is worth visualizing the actions. I avoid confusion by drawing the dialogue tree as a flowchart.

  6. Work with phrases. We want the conversation to feel alive, so for each line on the interface side we should have three or more variants. And we want to help the developers with speech recognition, so we also write several variants of the user’s lines.

  7. Testing. Are all branches of the dialogue accounted for? Are there any logical dead ends or truncated phrases? We need to check our work. I use Wizard-of-Oz (WoZ) testing.
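Steps 5–7 can be sketched in a few lines of code. The step names, phrases, and transitions below are invented for the pizza example; this is only an illustration of storing a dialogue tree with phrase variants and checking it for dead ends, not any particular tool’s format.

```python
import random

# A dialogue tree as a plain dict. Each step holds its phrase variants
# (step 6: three or more per interface line) and its outgoing transitions.
DIALOG = {
    "greeting": {
        "phrases": [
            "Hi! What pizza would you like?",
            "Hello! Which pizza can I get you?",
            "Hey there! What pizza are you craving?",
        ],
        # user intent -> next step in the tree
        "next": {"order_specific": "confirm", "ask_menu": "list_menu", "hungry": "suggest"},
    },
    "list_menu": {
        "phrases": ["We have Margherita, Pepperoni, and Four Cheese."],
        "next": {"order_specific": "confirm"},
    },
    "suggest": {
        "phrases": ["How about a Pepperoni? It's our most popular pizza."],
        "next": {"yes": "confirm", "ask_menu": "list_menu"},
    },
    "confirm": {"phrases": ["Great, your pizza is on its way!"], "next": {}},
}

def respond(step: str) -> str:
    """Pick a random phrase variant so the conversation feels alive."""
    return random.choice(DIALOG[step]["phrases"])

def check_tree() -> list:
    """Step 7 in miniature: every transition must point at an existing step,
    otherwise we have a broken branch in the dialogue."""
    broken = []
    for name, step in DIALOG.items():
        for intent, target in step["next"].items():
            if target not in DIALOG:
                broken.append((name, intent, target))
    return broken

assert check_tree() == []  # no dangling transitions in this tree
```

This kind of automated check catches dangling branches, but it cannot judge whether the phrases sound natural; that part still needs a human WoZ session.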

Houston, we have a problem

The root of all problems for VUI designers is a huge mass of information: scenarios, the variants of passing through them, dialogue trees, hundreds of steps even in a small application.

All this information has to be stored somewhere, somehow synthesized, checked, tested, passed on to development, and shared with the customer, yet the guidelines from the developers of voice assistants offer no recommendations on how to choose an appropriate tool.

After painfully and laboriously designing my first apps, I identified a specific set of problems:

  • Huge dialogue map. A detailed and intuitive path from point A to point B, the entire labyrinth of the user’s winding movement toward the goal. A regular whiteboard is not suitable for such a task: just imagine writing out all the words and then dragging the board over to the developers. And we still have to agree with the team on a common set of symbols for the map. Gosh!
  • Routine work. A lot of time is spent not only on laying out information, but also on synchronizing edits and changes. You can’t place all the phrase variants on the map, so you have to keep them in Google Sheets or the like. A huge part of the time goes into manually synchronizing all the available information. Since every action is performed by hand, we are not immune to ordinary human mistakes and typos, so we have to recheck ourselves a hundred times.
  • Ponderous testing. Each time you want to check the quality of the completed work, you have to assemble a transcript of the dialogue manually, constantly switching between the transcript document, the dialogue map, and the phrase table. This is a terribly tedious and slow process that completely kills the desire to control the quality of your work.

The result of this constant struggle is not only extended development time, but also a loss of quality due to inattention and tiredness and, of course, a loss of motivation.

There are some tools that can ease this complicated process. Let’s review them.

Evaluation criteria

To be objective in my analysis, I took one and the same part of a real application I had worked on and tried to implement it with each of the proposed tools.

I put all the results into a table and evaluated each set of services against three basic criteria, scoring each on a 5-point scale:

  • Visibility of the dialogue map;
  • Simplicity and quality of testing;
  • Simplicity of making edits and synchronization.

Before you read

My team and I are the makers of one of the tools below. The main goal of this article is not to promote our tool, but to fairly describe the pros and cons of every tool so that you can decide for yourself which one to use.

In my opinion, every team has its own needs, so there is no clear winner or loser. I’ll be happy to discuss in the comments if I missed something or made a wrong statement. Happy reading!

Whiteboard (Realtimeboard)

Realtimeboard

Let’s start with the “classic”: building the dialogue map on a whiteboard, or more precisely its digital equivalent, Realtimeboard. The character description and example dialogues will be stored in Google Docs, and the phrases in Google Sheets.

I assume the workflow is very similar in other comparable tools such as draw.io, Lucidchart, and so on.

Dialog Map

Before building a map, you have to settle on your own notation. It takes a lot of time, but you end up with a personal framework for your team. When constructing the map, each step is drawn and aligned manually: it is slow, but the map becomes visually clearer.

Testing

Collecting the materials takes a lot of time. It looks like this: glance at the map, then take a phrase from the table and paste it into the document. No flexibility, solid routine, and constant switching between tools.

Editing and synchronization

It’s easy to edit the map: you can swap steps, move entire branches, and select individual elements in groups. But you have to synchronize the map with the phrase table manually, and again there is that nagging sense of data getting lost.

Summary

Realtimeboard scores well on visibility and offers a flexible design methodology. But testing takes far too long, and the manual synchronization doesn’t make us happy.

  • Dialog Map — 5/5
  • Editing and synchronization — 0/5
  • Testing — 0/5

Link: https://realtimeboard.com

Tortu

Tortu helps you map out your dialogues as diagrams, lets you store all the phrase variants, and can build an interactive prototype. For dialogue scripts and persona descriptions we will use Google Docs.

Dialog Map

The map is built from two main blocks: the user step and the interface step. Steps are added and linked conveniently, and they are aligned automatically, which saves time. The tool is very flexible and lets you connect any step with any other; the number of links is not limited. The map can be divided into scenarios.

However, the auto-alignment periodically plays a cruel joke. On a large map it does not always work correctly, and the map becomes difficult to “read”.

Editing and synchronizing

Each step on the map, whether a user step or an interface reaction, can contain an unlimited number of phrases. This lets us avoid storing them separately, so there is no manual synchronization problem. The information is edited conveniently, and internal notes can be attached to the steps.

Working with the map and making edits is also convenient: you can change steps, rename them, and change links. But there is no drag-and-drop, nor any group selection or copying.

Testing

One click builds an interactive prototype in which we select user and interface steps in turn. You can see the history and undo actions. The text is visible on screen, which means we can proofread the phrases for typos and review the history. You don’t waste time collecting materials, and the whole process is much simpler.

But there is no way to test the scenarios by voice, and there is no voice-over for your phrases.
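The idea of stepping through a dialogue in text form, with a visible history and undo, can be sketched as a toy prototype. This is my own illustration of the concept, not Tortu’s implementation; the steps and transitions are invented for the pizza example.

```python
# Each step maps to (interface phrase, reachable next steps).
FLOW = {
    "greeting": ("Hi! What pizza would you like?", ["order", "menu"]),
    "menu": ("We have Margherita and Pepperoni.", ["order"]),
    "order": ("Great, your pizza is on its way!", []),
}

class Prototype:
    """A tiny text walker over the flow, keeping a full history."""

    def __init__(self, start="greeting"):
        self.history = [start]  # transcript of visited steps

    @property
    def current(self):
        return self.history[-1]

    def options(self):
        """Steps reachable from the current one."""
        return FLOW[self.current][1]

    def choose(self, step):
        """Advance to a step, refusing unreachable transitions."""
        if step not in self.options():
            raise ValueError(f"{step!r} is not reachable from {self.current!r}")
        self.history.append(step)

    def undo(self):
        """Step back one move, like the prototype's undo button."""
        if len(self.history) > 1:
            self.history.pop()

    def transcript(self):
        """The readable history: (step, interface phrase) pairs."""
        return [(s, FLOW[s][0]) for s in self.history]

p = Prototype()
p.choose("menu")
p.undo()               # changed our mind, go back to the greeting
p.choose("order")
assert [s for s, _ in p.transcript()] == ["greeting", "order"]
```

Even this toy version shows why a text mode matters for testing: the transcript can be read back for typos, and a wrong turn costs one `undo` rather than a restart.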

Summary

The map is clear and convenient to create and work with. All phrases live inside the steps, so there are no problems with editing and synchronizing. An interactive prototype is built in one click and offers plenty of testing opportunities.

There is no drag-and-drop, the map gets harder to read as it grows, and there is no voice-over.

  • Dialog map — 3/5
  • Editing and synchronizing — 4/5
  • Testing — 4/5

Link: https://tortu.io

Sayspring

The maps and phrases live inside Sayspring; information about the character, the personas, and example dialogues goes in Google Docs.

Dialog map

The map is formed step by step: there are notations for the user and the interface, and it can be divided into scenarios. At the same time, the map is absolutely linear: transitions are not displayed. It can be built quite quickly and conveniently, but the result is hard to understand.

Testing

The service lets you test scenarios by voice, but there is no text equivalent, no way to go back a couple of steps (you have to start from the beginning), and speech recognition is available only for three languages and works poorly (or maybe it’s my accent’s fault, sorry). As a voice prototype it’s okay, but for testing this mode is useless, because there is no way to look at the dialogue history, inspect the phrases, or step back. You still have to assemble the dialogues in a text file to test them.

Fortunately, assembling the dialogues is easier here. At the click of a button the tool shows you the possible dialogues. There are many problems and inconveniences (for example, you can’t collect two scenarios in one file, and you can’t download the file, only view it in the tool), but it seriously saves time on collecting materials.

Editing and synchronizing

All lines are attached to a specific logical step on the map, which frees us from the phrase table and from the need to switch between tools and synchronize their state. This is a huge plus.

Editing the map is not convenient: you can drag elements only within one scenario, and there is no way to group steps.

Summary

Sayspring eliminates the routine work of collecting materials for testing and synchronizing the phrase table with the map, because the lines are attached to the steps. In a couple of clicks we get an interactive voice prototype.

The map is unclear and inconvenient to design with. There is an interactive prototype, but it works only by voice, which makes it useless for testing, since there is no way to inspect the lines or see the history, and the export of dialogues is limited.

  • Dialog Map — 0/5
  • Editing and synchronizing — 3/5
  • Testing — 3/5

Link: https://sayspring.com

Botsociety

This tool takes a fundamentally different approach: we write a dialogue, and the map is built automatically. Phrases and characters will be stored in Google Docs.

Dialog Map

On the map, the forks and the connections between steps are clearly visible. It is interactive: clicking on a step opens its contents.

There is no division into scenarios, which leads to a lot of repetition and a huge, confusing flowchart.

Testing

Testing is carried out in the form of a chat, which lets you inspect the lines and see the history.

However, there is no way to control the process; in effect, we just watch a video rather than test.

Editing and synchronizing

For a step, you can specify only one phrase variant, so the phrases and the map are stored separately and the synchronization problem remains. Making edits to the map is quite convenient, and there is drag-and-drop, but you can’t select several elements and apply an action to them all at once.

Other

The service has a “build mode”: you can add variables to phrases and access them through an API. In this way the tool can also act as a content keeper.
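The idea of variables inside phrases can be sketched like this. The template syntax, placeholder names, and storage format below are my own invention for illustration; they are not Botsociety’s actual API.

```python
import string

# Hypothetical phrase templates with variables, in the spirit of "build mode".
# $name, $pizza, and $eta are invented placeholders, not a real tool's schema.
TEMPLATES = {
    "confirm_order": "Thanks, $name! Your $pizza pizza will arrive in $eta minutes.",
}

def render(template_id: str, **variables) -> str:
    """Substitute runtime values into a stored phrase template."""
    return string.Template(TEMPLATES[template_id]).substitute(variables)

print(render("confirm_order", name="Pavel", pizza="Pepperoni", eta="30"))
# Thanks, Pavel! Your Pepperoni pizza will arrive in 30 minutes.
```

Keeping phrases as templates like this is what lets a design tool double as a content store: developers fetch the text through an API instead of copying it out of a spreadsheet.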

And there is a Dialogflow integration, which is cool.

Summary

The tool is built more for rapid prototyping of simple apps and chatbots. For full-fledged design, in my opinion, it is not a fit. Testing doesn’t really work, leaving the problem of collecting materials open, and dialogues can only be downloaded in MP4, GIF, or AVI format.

  • Dialog map — 2/5
  • Editing and synchronizing — 2/5
  • Testing — 1/5

Link: https://botsociety.io

Conclusions

The results of the research are quite predictable. Every team has its own processes, and there is definitely no single best tool; every tool has its pros and cons. But I hope this article helps you pick the right tool for you and your team.

Of course, I haven’t researched every VUI design tool, and I may have missed some. I’ll be glad to discuss in the comments both the choice of a particular tool and the design process itself.

Thank you for your attention 😉

P.S. I want to thank Maria Kruglova and Nikita Korneev from the mobile development studio KODE for helping translate this article into English.


Published by HackerNoon on 2018/06/26