A Conversational Agent for Data Science

Science fiction has long imagined a future in which humans converse with machines. But what are the limits of conversational interfaces? Agents like Siri or Cortana can help us with simple things such as getting directions or scheduling an appointment, but is it possible to apply these agents to more complex goals? Today, our group at Stanford is excited to announce a few ideas we have been exploring through Iris: a conversational agent for data science tasks.

In comparison to the domains targeted by most conversational agents, data science is unusually complicated, with many steps and dependencies necessary to run any kind of useful analysis. When interacting with data through analyses and visualizations, you can’t simply rely on a set of hardcoded, standalone commands (think of Siri commands such as doing a web-search or placing a call). Instead, commands need to build upon and support one another. For example, when analyzing some econometrics data, you might take the log-transform first, then run a statistical test.

Iris supports interactive command combination through a conversational model inspired by linguistic theory and programming language interpreters. Our approach allows us to leverage a simple language model to enable complex workflows: for example, allowing you to converse with the system to build a classifier based on a bag-of-words embedding model, or compare the inauguration speeches of Obama and Trump through lexical analyses. To make this concrete, consider the following example:

In the rest of this post, we’ll explore the specific interactions that Iris enables, why these interactions matter, and how developers can extend the system through a conversational domain specific language (DSL).

Basic functionality: how users interact with Iris

The Iris interface is a cross between a chat application like Facebook messenger and a development environment like RStudio (Figure 2). In the bottom left, users issue commands to the system, which will then appear in the window above, along with system responses. Data structures created in the course of a conversation appear in the upper right. There is a collapsible pane on the right sidebar for function search and documentation.

Figure 2: An overview of the Iris interface. Conversation and analysis occur in the left side window, while supporting metadata appear on the right.

One thing we learned early in the course of user testing is that people want to see, in advance, what command will be executed under their current input query. Iris displays hints above the text input pane, the leftmost of which will be executed on hitting enter (Figure 3). These hints allow users to reformulate queries when the proposed command does not match their expectations, resulting in fewer system errors. Hints also signal other elements of system state, for example, whether a user input argument parses correctly or matches the required type.

Figure 3: Hints allow for command search and query reformulation

Once a user hits enter to execute a command, Iris will automatically extract whatever arguments it can from the text of a request. Iris will then resolve any remaining arguments through a series of clarification requests with the user. For example, if you ask Iris to “take the mean of the column”, Iris might respond, “What column of data do you want me to use?” Once all arguments have been resolved, Iris will execute the command and display its output in the main window.

Commands as building blocks: sequencing commands

When working with data, it is rarely the case that a single command or function will accomplish everything you are interested in. More commonly, in programming a short script, you will chain together a series of commands. For example, first you might transform some text data into a bag-of-words feature representation, and then you might fit a classification model on these features. In Iris, any command in the system can be combined with any other command (subject to type constraints). This allows data scientists to use Iris to construct more complex workflows.

One intuitive way of combining commands is through sequencing: executing one command, and then using its result as the argument for some other command. This style of combination is similar to writing an imperative program, which Iris supports in three major ways. First, the result of any Iris command can be referenced in a follow-up command through pronouns such as “that”, “it”, and “those”. Second, Iris enables what is known in programming languages as assignment: storing a value in a named symbol. You can ask Iris, for example, to “save that result as my_array,” and a new variable named my_array will appear in the environment, where it can be referenced later in the conversation. Finally, many Iris commands save named variables in the course of their execution, which will similarly persist in the enviornment for future use.

Using these strategies, data scientists can chain together commands in sequence. For example, we might create a logistic regression model, then valuate it under cross-validation (Figure 4):

Figure 4: Here we use Iris to build a classification model to predict flower type, then evaluate it under cross-validation.

Commands as building blocks: composing commands

While sequencing commands is powerful and intuitive, it is not always the best way to stitch commands together. Consider how people speak, and you will often find that they craft meaning through a composition of statements. For example, if someone asks you, “what is the variance of that?”, and you respond, “sorry, what data are you talking about?”, then they might say, “I meant the first column of data’.” In programming terms, this conversation would compose a command to compute variance with another command to select the first column of data:

Iris supports composition through the ability to execute any other command when resolving some top-level command’s arguments. This style of sub-conversation can be nested upon itself to form conversations of arbitrary depth (in practice, exchanges that are more than two levels deep become somewhat confusing). For example, we can ask Iris to run a t-test, and interactively compose that with a log-transform (Figure 5):

Figure 5: Here we compose a command to run a t-test, with a command to log-transform columns of data

Like an interpreter in a dynamically typed programming language, Iris will check whether a composed command returns the type of value that an argument requires (for example, a command that takes the mean of some data might expect a composed command to return an array value). If the type returned by the composed function doesn’t match, Iris will reject the value and ask the user for further clarification.

Extending Python functions with conversational affordances

User contributions are an important part of any programming ecosystem. One of Iris’s goals is to allow expert users to extend the system with missing functionality. For this reason, all of the underlying commands in Iris are defined as Python functions, transformed by a high-level DSL into Iris commands with conversational affordances. For example, here is the implementation of an command to compute a standard deviation:

Figure 6: An Iris command implementation that computes the standard deviation of an array of data. The logic of the command is simply a Python function, annotated with natural language and type annotations.

The core logic of the function is wrapped in a class that defines metadata such as the title of a function (how it will appear in a hint), example user requests that execute the function, argument type information, and descriptive information. The DSL uses these metadata to transform the class into an automata that users can interact with conversationally.

While this DSL is relatively compact, we found in user testing that even expert users preferred a GUI to help with command creation. Iris has a command editor that supports command creation and modification dynamically within the environment, where changes can be saved to persist across sessions. We have also discovered, in practice, that the ability to compile and test commands directly within the enviornment leads to much faster iteration and debugging of new functionality.

Figure 7: Iris includes an editor that allows users to create new commands and edit existing commands. Any modifications to command can then be immediately tested through conversation in the environment.

Beyond text: working with mixed modality interfaces

Conversation may be an efficient way to create and execute programs, but data science tasks often rely on non-textual abstractions. For example, researchers often inspect and organize data through spreadsheets or write code that produces visual output, such as charts or graphs. Iris supports these modalities as a complement to its conversational interface. For example, when selecting a column from a dataframe, Iris will present the dataframe in a spreadsheet-like view, where users can click on the title of the column to populate the conversation request field.

Similarly, visualizing data is an important part of many data science workflows, and Iris supports this through the vega-lite library (a high-level wrapper around d3.js). Whenever an Iris command produces vega-lite schemas as output, for example, scatter plots or bar charts, these will be rendered as visual elements within the conversation.

Figure 8: Iris integrates with vega-lite to produce data visualizations

Launching an open source community

Iris is a research project, but it is also open source code, and as development continues to progress we hope that others will contribute. By building a community around this project, we aim to bootstrap an open dataset of natural language interactions that will allow us to push the boundaries of the kinds of tasks that conversational agents can support. We are planning to launch a desktop client for OSX later this summer, so if you are interested in helping us debug a beta release, check out our arXiv paper, follow the Gihub project or sign up on our mailing list at irisagent.com.

(This work is in collaboration with Binbin Chen, Julia Mendelsohn, Jon Bassen, and Michael Bernstein at Stanford.)