Building Conversational Products & Interfaces

Introduction

With conversational products and interfaces, product creators now have to imagine an interaction where the screen plays an ancillary role. As a consequence, most of the best practices and principles of good user experience need to be redefined, if they are used at all.

Even within conversational interfaces, we don’t speak the same way we write. We don’t pronounce every word in a sentence, we run words together, we leave some words out entirely, and a huge part of our communication is non-verbal.

Hence the MVP for a conversational interface needs to be different from that of any other product. It needs to set clear expectations of what the product can do for a user.

Build Out Conversations, Not User Stories

In my experience, building user stories for conversational interfaces turns out very clunky, and a better approach is to build out possible conversations between the end user and the product. Focusing on the bookends of the conversations helps you to understand the trigger and the goal.

Improving the user experience means giving customers the solution to their problems with as little friction as possible. For a conversational UI, the key ways to reduce friction are improving the confidence with which the product recognizes what the user means and widening the range of utterances it understands.

From learning all the different ways that someone may ask for their account balance, to increasing the number of tasks that can be performed autonomously, every challenging question that a customer asks is an opportunity for us to learn and improve.
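As a rough illustration of learning the different ways someone may ask for the same thing, the sketch below matches an utterance against sample phrasings and reports a confidence score. The intent names, sample utterances, and the 0.5 threshold are all assumptions made for this example, and plain word overlap stands in for whatever language-understanding model a real product would use.

```python
import re

# Hypothetical sample utterances per intent; a real product would collect
# these from logs of what customers actually say.
INTENT_EXAMPLES = {
    "check_balance": [
        "what is my account balance",
        "how much money do i have",
        "show me my balance",
    ],
    "transfer_money": [
        "send money to a friend",
        "transfer funds to another account",
    ],
}


def tokenize(text):
    """Lowercase the text and return its words as a set."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance, threshold=0.5):
    """Return (intent, confidence); intent is None below the threshold.

    Confidence is the Jaccard word overlap with the closest sample
    utterance, a simple stand-in for a real model's score.
    """
    words = tokenize(utterance)
    best_intent, best_score = None, 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            example_words = tokenize(example)
            union = words | example_words
            score = len(words & example_words) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return None, best_score
    return best_intent, best_score
```

With these samples, "Show me my balance" matches check_balance, while "what's my balance" falls below the threshold. Utterances like the second one are the ones worth logging and adding as new samples.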

Here are some of the things to keep in mind when building out these conversations:

High-level flows

Once you have a few sample dialogs, you can abstract the flow and logic of the conversation. This provides the structure of your conversational interface. Good designs balance the need for clearly defined user paths with the users’ desire for shortcuts directly to what they want.

Start with the most ideal or optimal path that you envisage with the users. This is perfect alignment between customer actions and your designed intent.

Then expand the path as you go along.
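One way to picture an optimal path that still allows shortcuts is a list of required pieces of information, asked for in order but skipped when the user has already supplied them. This is only a sketch: the slot names and prompts below are hypothetical and describe an imagined table-booking flow.

```python
# Hypothetical slots for a table booking, in the order of the optimal path.
REQUIRED_SLOTS = [
    ("party_size", "How many people?"),
    ("time", "What time works for you?"),
]


def next_prompt(filled_slots):
    """Return the next prompt, skipping slots the user already filled."""
    for slot, prompt in REQUIRED_SLOTS:
        if slot not in filled_slots:
            return prompt
    # Every slot is filled, so move on to confirmation.
    return "Shall I book it?"
```

A user who follows the path step by step hears each question in turn. A user who opens with "a table for two at seven" fills both slots at once and goes straight to confirmation.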

Adapt to What Users would Naturally Say

The beauty of conversational interfaces is that they come naturally to users, who don’t have to learn how to use them. The product should adapt to the user’s word choices instead of forcing the user to memorize a set of commands. It’s easier, and more natural, for users to respond to a narrow-focus question (e.g., “Does that work for you?”) than to be taught what to say (e.g., “If that works, say ‘yes.’”).
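A small sketch of adapting to word choice: instead of accepting only a literal "yes", the product can accept the phrasings people actually use. The phrase lists here are illustrative assumptions, not an exhaustive vocabulary.

```python
import re

# Hypothetical phrase lists; a real product would grow these from user logs.
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "ok", "okay", "sounds good", "that works"}
NEGATIVE = {"no", "nope", "nah", "not really", "no thanks"}


def interpret_yes_no(utterance):
    """Return True for yes, False for no, None if the answer is unclear."""
    # Lowercase and drop punctuation so "Yeah!" and "yeah" look the same.
    text = re.sub(r"[^a-z' ]", "", utterance.lower()).strip()
    if text in AFFIRMATIVE:
        return True
    if text in NEGATIVE:
        return False
    return None
```

Returning None for anything unrecognized lets the conversation fall through to error handling rather than guessing.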

Provide Cues

A bot should drive the conversation forward and at times even restrict it. There are some easy ways to help the user stay within guardrails. Visually, you can use buttons or quick replies (FB Messenger and Slack support this) to nudge them in the right direction.

Consider suggesting things to do; this will help users discover additional functionality.

“Hey bot, book a table.”

“Table reserved. Would you like me to order an Uber?”

Bot interactions are a bit like the traditional e-commerce flows. We should constantly keep the user updated and help them move forward, while avoiding overwhelming the user with a wall of information.

Interactions Should Be Simple

The number of paths a conversation can take increases the potential for dead ends. It is better to limit the functionality and nudge the user down a particular path. A simple solution is to use structured messages to guide the users. Rather than asking the end user to type “yes” or “no,” show a structured message with two buttons.
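To make the two-button idea concrete, here is a sketch of a structured message as plain data. The field names are made up for illustration and do not follow any particular platform's schema; each messaging platform defines its own format for buttons and quick replies.

```python
import json


def yes_no_message(text):
    """Build a platform-neutral message with two reply buttons.

    The field names are illustrative, not any platform's real schema.
    """
    return {
        "text": text,
        "buttons": [
            {"label": "Yes", "value": "yes"},
            {"label": "No", "value": "no"},
        ],
    }


message = yes_no_message("Does that work for you?")
print(json.dumps(message, indent=2))
```

Because the user can only tap one of two values, the bot never has to interpret free text at this step.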

Focus on Spoken Conversations

When starting, it is better to focus on just the spoken conversation, without the technical distractions of code notation, complex flow diagrams, screen size etc. Getting the flow right is easier if everything is in one place. As you expand to other devices like mobile phones, pieces will move out of the spoken prompts and into the display prompts, chips, and visuals.

Handling errors

Conversational interfaces can face the following 3 types of errors:

No Input: The product did not record the user’s input.
System error: Error in fulfilment of the user prompt.
No Match: The Action couldn’t interpret the user’s response in context.

The ‘No Match’ error is a little trickier and needs to be handled well in the conversation. Here are a few possible causes of No Match errors.

Prompt: What time works for you?

User: Sometime late in the evening?

Prompt: Sorry, what time?

(The user says something relevant to the context, but the product doesn’t understand it.)

Prompt: What time works for you?

User: What’s the weather like?

Prompt: Sorry, what time?

(The user wants to switch topics entirely.)

In each context, it is important to consider why the user might be having difficulty. Then, in the subsequent prompt, include additional support in the form of options or additional information. E.g.

Prompt: What time works for you?

User: Sometime late in the evening?

Prompt: Sorry, we have two time slots: one in the afternoon between 1 pm and 2 pm, and the other in the evening between 4 pm and 5 pm. Now, which works for you?

If there is a ‘No Match’ even after the second attempt, end the conversation to avoid further user frustration and offer a substitute or alternative method for follow up. E.g.

Prompt: Sorry, I’m still having trouble, so you may want to visit our website instead.

Good error handling is context-specific. Even though you’re asking for the same information, the conversational context is different on the second or third attempt. In order to play the right error prompt for the context, you’ll need to keep track of how many, and what type of, errors have occurred.
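Keeping track of how many errors of each type have occurred can be sketched as a counter that picks the prompt for the current attempt. The error type names, the prompt wording for No Input and System errors, and the choice to end after the second No Match are assumptions made for this example.

```python
from collections import Counter

# Hypothetical prompts for the "what time" question.
OPTIONS_PROMPT = (
    "Sorry, we have two time slots: 1 pm to 2 pm, or 4 pm to 5 pm. "
    "Which works for you?"
)
GIVE_UP_PROMPT = (
    "Sorry, I'm still having trouble, so you may want to visit our website instead."
)


class ErrorTracker:
    """Counts errors by type within one conversation."""

    def __init__(self):
        self.counts = Counter()

    def record(self, error_type):
        """Register one more error and return the new count for its type."""
        self.counts[error_type] += 1
        return self.counts[error_type]


def next_error_prompt(tracker, error_type):
    """Return (prompt, end_conversation) for the error that just occurred."""
    count = tracker.record(error_type)
    if error_type == "no_match":
        # First miss: offer options. Second miss: exit with an alternative.
        if count >= 2:
            return GIVE_UP_PROMPT, True
        return OPTIONS_PROMPT, False
    if error_type == "no_input":
        return "Sorry, I didn't catch that. What time works for you?", False
    # Anything else is treated as a system error.
    return "Something went wrong on our side. Please try again later.", True
```

The same question produces a different prompt on the second attempt because the tracker remembers the first one.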

Randomize Prompts When Appropriate

Craft a variety of responses just like a person would. This makes the conversation feel more natural and keeps the experience from getting stale. E.g. randomize your first prompt with Hi, Hello, Hey, Welcome etc.
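A minimal sketch of prompt randomization, using the greetings above. The function name and the rule of not repeating the previous variant are assumptions added for illustration.

```python
import random

GREETINGS = ["Hi", "Hello", "Hey", "Welcome"]


def pick_prompt(variants, last=None, rng=random):
    """Pick a random variant, avoiding the one used last time if possible."""
    candidates = [v for v in variants if v != last] or list(variants)
    return rng.choice(candidates)
```

Passing the previously used greeting as last keeps a returning user from hearing the same opener twice in a row.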

Right mix of Conversational Components

Conversational components are all the things that make up a prompt, like acknowledgements or questions. They also include chips, which are used to continue or pivot the conversation. Prompts and chips are the core of the conversational interaction and should be designed for every turn in the dialog. Google’s design guidelines list the different conversational components that can be used in a prompt.

Be concise and relevant

The conversational interface is also linear, and unlike GUIs, there’s no way to skim over the information. By forcing users to sit through uninformative and irrelevant verbiage, you waste their time and test their patience. People do not appreciate taking extra time or jumping through hoops to find things out or to get things done. Successful conversational interface design should therefore be concise.

Graceful Exits

Make sure that the user can leave the current conversation flow, whether by typing “quit”, timing out, or even erroring out. The bot should always allow the user to exit gracefully without making them feel guilty.
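As a sketch, an exit check can run before any other handling on each turn. The exit phrases, the two-minute timeout, and the goodbye wording below are assumptions for illustration.

```python
# Hypothetical phrases that should always end the current flow.
EXIT_PHRASES = {"quit", "exit", "cancel", "stop", "never mind"}


def exit_message(utterance, seconds_idle=0.0, timeout_seconds=120.0):
    """Return a goodbye message if the flow should end, otherwise None."""
    if seconds_idle > timeout_seconds:
        return "I'll leave it here for now. Come back any time."
    # Ignore case and trailing punctuation so "Quit." still counts.
    if utterance.strip().lower().rstrip(".!") in EXIT_PHRASES:
        return "No problem. Talk to you later."
    return None
```

Both goodbye messages are neutral in tone, so leaving never reads as a failure on the user's part.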

All About Context

Once you have generated potential use cases and filtered them against real business goals, you next need to answer a few questions about your target users, such as the customer’s identity, feedback, the channel used for contact, date, time, location, and other information in real time.

All these questions will influence the implementation of a conversational interface experience. By analyzing all that data using artificial intelligence and machine learning, the bot can anticipate customer needs. In 2D design contexts, you can watch a user interact with your app (scrolling, tapping buttons, typing), but in conversational product design, you have to look at the broader user context.

Intent is easier to express through words than a hamburger menu, but natural language is harder to interpret than a click. Over time, it gets easier by understanding context and observing user behavior.

It is one thing to train humans to input commands that a computer understands. It is a completely different process to teach a computer to map natural human language inputs to intent. Mapping natural language to intent is the first layer of a conversational UI.

Leverage Platform Capabilities

Bots also differ completely from platform to platform, because each platform has its own functionality and feature set.

Alexa is a voice-first experience. Alexa’s versions of apps are called ‘skills’ — the idea being that you can teach your Alexa to do something new by enabling a new ‘skill’, created by a third party.

Google’s Assistant is more of a platform: ‘Actions’ (Google’s version of ‘skills’) can be either voice-activated, tapped on or typed to. A hybrid voice, text and type interface, Assistant lets you use whatever method suits you in your current context and move between methods seamlessly. This also means Assistant is available on a wider range of form factors, from smart speakers to watches to smartphones to TVs.

Facebook Messenger is a chat platform, augmented with a wide selection of customisable widgets allowing users to shortcut conversations and quickly get stuff done. It’s really easy to bring a Messenger chatbot into a human conversation to help you get stuff done.

A conversational product should try to get the most out of the platform it is on.

Not just NLP, but also prediction engine

When one person asks another a question, it takes an average of 200 milliseconds for them to respond. This is so fast that we can’t even hear the pause. In fact, it’s faster than our brains actually work. It takes the brain about half a second to retrieve the words to say something, which means that in conversation, one person is gearing up to speak before the other is even finished. By listening to the tone, grammar, and content of another’s speech, we can predict when they’ll be done.

Monitor Health of the Bot

The older analytics tools out there don’t cut it when it comes to bots, so it’s important to do a deep dive and not just plug in something off-the-shelf. There are new players, like Botmetrics and Dashbot, who are building the tools for bots from first principles, and they are improving every day. But even they may not provide all the information and functionality you need, and you might end up with your data being siloed and inaccessible.

1. Confusion Triggers: The entire funnel is fraught with ways for things to go wrong. Given the range of possible user inputs, bots often misinterpret or simply can’t determine what the user wants. Examine the previous prompts to see where you could make some clarifications. Was the call to action clear?

2. Conversation Steps: Knowing the average number of conversation steps enables you to start separating good experiences from bad ones. A conversation step is a single back-and-forth exchange between user and bot. Your average will differ depending on the type of bot. Conversations that significantly exceed or fall short of the average usually indicate a bad user experience: either the user gave up too quickly or the bot took too long to complete the user’s goal.

3. Conversation Flow:

Pay attention to the way users naturally ask for things. Does the flow make users feel like they can only provide one piece of information at a time, or does it encourage them to provide multiple details in one sentence?
Look for places where users look confused or are unsure of what to say or do.
Users might say something you didn’t expect. Take note of it and add handling for it in your design.
Signs of frustration or impatience show that the interaction is too long-winded. Review your prompts to see if you can be more concise.
Observe who’s speaking the most and whether users seem to be in control of the conversation.

4. Conversations Per User: Associated with the number of average steps is the average number of conversations per user. This indicates engagement with the bot.

5. Active vs. Engaged Users: If your bot supports both push and pull interactions, then you will want to compare the differences in how users interact with your bot. Active users will consume messages sent by the bot, whereas engaged users will respond. This is important because you can learn from users who are engaged. These are your happy users that are leaning into the experience. You will want to accommodate these types of conversations.

6. Response Time: Know how long it takes for your bot to respond. Know the latency period between command and response. This impacts the overall user experience. You’re looking for median time as well as outliers here.
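Several of these metrics can be computed from a simple conversation log. The sketch below assumes a hypothetical log format (one record per conversation, with a user id, a step count, and response times in milliseconds) and an arbitrary rule that flags anything more than twice, or less than half, the typical value.

```python
from statistics import mean, median

# Hypothetical log: one record per conversation.
conversations = [
    {"user": "a", "steps": 4, "response_ms": [300, 320]},
    {"user": "a", "steps": 5, "response_ms": [310, 330]},
    {"user": "b", "steps": 1, "response_ms": [290]},
    {"user": "c", "steps": 14, "response_ms": [305, 2400]},
]


def summarize(log, outlier_factor=2.0):
    """Compute average steps, conversations per user, and latency stats."""
    steps = [c["steps"] for c in log]
    avg_steps = mean(steps)
    latencies = [ms for c in log for ms in c["response_ms"]]
    median_ms = median(latencies)
    users = {c["user"] for c in log}
    return {
        "avg_steps": avg_steps,
        "conversations_per_user": len(log) / len(users),
        "median_response_ms": median_ms,
        # Responses much slower than the median.
        "slow_responses_ms": [
            ms for ms in latencies if ms > outlier_factor * median_ms
        ],
        # Conversations far longer or shorter than the average.
        "unusual_step_counts": [
            s for s in steps
            if s > outlier_factor * avg_steps or s < avg_steps / outlier_factor
        ],
    }


summary = summarize(conversations)
```

On this sample data, the one-step and fourteen-step conversations are flagged as unusual, and the 2400 ms response stands out against a median of 310 ms.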

A bot should save time and relieve stress by reducing friction and effort. Otherwise it is not much better than the website or app that came before it. Integrations and contextual relevancy are key elements to the experience.

A Hybrid Experience Is Okay

It’s a mistake to believe a conversational design means every user input and output has to be handled in a purely chat format. Chatbots live within messaging apps and users are already accustomed to communicating with friends using text, but text input is not efficient for all use cases. The GUI replaced the text-based terminal for a reason, and some tasks, like browsing and selecting, are faster with touch or click. Whenever it makes sense, it’s fine to consider a hybrid approach and use other interactive elements within the conversation (e.g. buttons, cards, forms) to help users complete their goals faster and with less friction.

Over-reliance on structured messages, however, will feel artificial as you lose the conversational element.

Conclusion

Designing from first principles is the only way to avoid fundamental usability problems, especially because this is a whole new paradigm of user experience. Relying on established conventions can be a reasonably safe assumption in web or mobile apps, because people have learnt the standard UIs and gestures, but not in the conversational interface world.

Conversational products are about delivering service over voice and messaging interfaces, but they also imply a different philosophy of product development, one that is much more about understanding context, constant iteration, and constant A/B testing.