Best Practices for User Testing Artificial Intelligence Products

Three rules from a seasoned UX designer, user researcher, and product builder at frog design

If you are building any AI product, you must user test. To get the most out of your user testing, follow these three rules:

Whatever you are testing, it must be tested in context. In my work at frog design, our religion is human-centered design, so we take this very seriously, and it yields incredible results. Here are some examples of context:

1 — Testing voice UI for cars:

Do this with a ‘drive-along.’ The user is driving; the designer is in the front passenger seat with a script, posing as the AI.

2 — Testing a chatbot (level 1):

The first tier of testing chatbots in context is to simply text a user. One designer is in one room, posing as the AI. The user and a second designer are in another room, responding to the AI. The user thinks out loud while the second designer takes notes.

3 — Testing a chatbot (level 2):

Once you feel somewhat settled on a general flow, start testing with users in a more natural, asynchronous way. Allow users to go about their daily lives and interact with your AI as they please. There is still not one line of NLP or AI code written here. The AI on the other end is a designer.

4 — Testing a robot for the home:

Do this in a home. Find an off-the-shelf robot that is similar to what you are building, bring it into people’s homes, and have them interact with it. Observe passively. Give them tasks. If you have the resources, modify the robot to make it more similar to your design.

In the examples above (voice UI for a car, chatbot, and home robot), there are different assumptions and expectations that we hope to see borne out in the user’s interaction. However, some of these engagements must unfold over time. To test assumptions about the order of events, relationship building, and the happy path to feature discovery, test strategically over time. Some examples of time-dependent interactions:

1 — Feature discovery: You want to start slow and simple, then, once a user masters the simple capabilities, teach them new tricks. We often study game design for this. In most complex video games, you go through an intro level that lets you explore how the buttons work, then features are revealed over time. The user must build an understanding of your mental model one piece at a time.

2 — Building a relationship: This takes time. In my work, we often codify which questions feel intrusive versus superficial so we are certain to ask them in the correct order, and so the AI shows reverence and respect for social norms. If we are developing a chatbot that is meant to be your friend, for example, there might be certain superficial ‘getting to know you’ questions on day 1 that lay the groundwork for deeper questions in week 1.

3 — Circadian rhythm: Human emotions are different in the morning than at night. In the case of the family home robot example, family dynamics change drastically throughout the day. If a robot is truly meant to live in the home, we must gather data throughout the day, week, month, etc.
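The question ordering described in item 2 can be sketched in a few lines of Python. This is only an illustration, not how we do it at frog: the 1-to-5 intrusiveness scale, the `earliest_day` field, and the sample questions are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str
    intrusiveness: int  # hypothetical scale: 1 = superficial, 5 = very personal
    earliest_day: int   # first day of the relationship on which it may be asked


def questions_for_day(questions, day, already_asked=()):
    """Return the questions unlocked by `day`, least intrusive first."""
    unlocked = [
        q for q in questions
        if q.earliest_day <= day and q.text not in already_asked
    ]
    return sorted(unlocked, key=lambda q: (q.intrusiveness, q.earliest_day))


# Hypothetical script for a 'friend' chatbot.
SCRIPT = [
    Question("What is worrying you most this week?", intrusiveness=4, earliest_day=7),
    Question("What should I call you?", intrusiveness=1, earliest_day=1),
    Question("What do you usually do on weekends?", intrusiveness=2, earliest_day=1),
]

for q in questions_for_day(SCRIPT, day=1):
    print(q.text)
```

On day 1 only the two superficial questions are available; the deeper one is held back until day 7, and the designer posing as the AI can pass in whatever has already been asked to avoid repeats.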

User testing can lead to a deluge of data. It is up to you how to organize it, but do not expect to synthesize it easily if it’s all on handwritten sticky notes. Some quick and dirty tools:

1 — Color-coded spreadsheets: Pre-write all of your messages and categorize them with a color-coded system. When you send a message, copy and paste it into your text thread, and also into a local log. The color-coded system lets you see patterns immediately, without having to code anything or use any analytics. This is useful while you are iterating with just a handful of users at a time, but you’ll outgrow it fast.

2 — APIs: Recently at work we were testing an app in Tokyo, where everyone uses the messaging app LINE. We used LINE’s API to push all messages programmatically to an HTML table so we could see all the different users’ answers to the same questions, all in one place.

3 — IFTTT (the ‘if this, then that’ app): I’m still testing this myself, but if you’re using an Android phone, you can have SMS messages post directly to a spreadsheet. I’ll write a separate ‘how to’ article on this topic once I’ve got the steps down!

4 — Node-based Facebook deployment with analytics: There are quite a few free node-based Facebook apps. I’m currently researching the spectrum, and I’ll certainly have a follow-up post analyzing some of them. One I’m liking is chatfuel.com (very quick to get up and running, and I’m making my own ‘TrashSee’ on it now, but it has no analytics). At the other end of the spectrum is http://botanalytics.co/, which supports 11 platforms, but you have to code in Node.js, which is contrary to what this whole post is about. Also, as expected, Google has some open source ideas. Their tool Dialogflow looks nice, but I don’t have a lot of hours with it yet. Please DM me if you have an opinion on that!
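For the API approach in item 2, the useful part is the pivot: one row per question, one column per user. Here is a rough sketch of that step only. It assumes the messages have already been pulled from your messaging platform into a list of dictionaries with `user`, `question`, and `answer` keys; those field names and the sample records are made up for illustration, and no LINE API calls are shown.

```python
from html import escape


def answers_table(records):
    """Build an HTML table: one row per question, one column per user."""
    users = sorted({r["user"] for r in records})
    questions = []
    answers = {}
    for r in records:
        if r["question"] not in questions:
            questions.append(r["question"])  # keep first-seen order
        answers[(r["question"], r["user"])] = r["answer"]

    header = "".join(f"<th>{escape(u)}</th>" for u in users)
    rows = [f"<tr><th>Question</th>{header}</tr>"]
    for q in questions:
        # A user who never answered a question gets an empty cell.
        cells = "".join(
            f"<td>{escape(answers.get((q, u), ''))}</td>" for u in users
        )
        rows.append(f"<tr><th>{escape(q)}</th>{cells}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Hypothetical records, as they might look after export from a messaging API.
RECORDS = [
    {"user": "user_a", "question": "How do you contact friends?", "answer": "Mostly texting"},
    {"user": "user_b", "question": "How do you contact friends?", "answer": "Calls"},
    {"user": "user_a", "question": "When are you most active?", "answer": "Evenings"},
]

print(answers_table(RECORDS))
```

Save the output to an `.html` file and open it in a browser, and you can read across a row to compare every user’s answer to the same question.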

Whatever you do, do not expect to be able to codify data that is not structured. Speak to a developer. Speak to a data scientist. Structure your data before gathering it!
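To make ‘structure your data before gathering it’ concrete, here is a minimal sketch of a message log with a fixed set of fields and a fixed list of categories, written out as CSV so it opens in any spreadsheet. The field names, the category list, and the sample rows are all assumptions for the sake of the example; your developer or data scientist will likely want a different shape.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

# Hypothetical categories; these play the role of the colors in a spreadsheet.
CATEGORIES = {"greeting", "task_prompt", "follow_up", "user_reply"}


@dataclass
class MessageRecord:
    session_id: str
    participant_id: str
    sender: str    # "designer_as_ai" or "user"
    category: str  # must be one of CATEGORIES
    text: str
    sent_at: str   # ISO 8601 timestamp

    def __post_init__(self):
        # Reject unknown categories at logging time, not at synthesis time.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def write_log(records, stream):
    """Write records as CSV with one column per MessageRecord field."""
    names = [f.name for f in fields(MessageRecord)]
    writer = csv.DictWriter(stream, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


buffer = io.StringIO()
write_log(
    [
        MessageRecord("s1", "p01", "designer_as_ai", "greeting",
                      "Hi! Ready to start?", "2024-01-01T09:00:00+00:00"),
        MessageRecord("s1", "p01", "user", "user_reply",
                      "Sure", "2024-01-01T09:00:30+00:00"),
    ],
    buffer,
)
print(buffer.getvalue())
```

Because every row has the same columns and a category from a known list, filtering and counting later is a spreadsheet filter rather than an afternoon of re-reading transcripts.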

Thanks! Please 👏👏👏 clap if you like this article, so others can find it! DM me if you’d like to hear more. I’m available for speaking engagements upon request.