How to Build an Agent With an OpenAI Assistant in Python

This is the second part of a multi-part series on building Agents with OpenAI's Assistants API using the Python SDK.

Recap of Part 1

In Part 1 of this series, we created a simple conversational Agent using the OpenAI Assistants API.

Let's recall our definition of an Agent. An Agent is a piece of software that leverages an LLM (Large Language Model) to mimic human behavior. That means it can not only converse and understand language but also perform actions that have an impact on the real world. These actions are typically called tools.

I use the terms tools and functions interchangeably when it comes to functions that the agent is able to call. OpenAI also uses the term action interchangeably with tools to refer to the same concept. It seems the jury is still out on which term will become the standard.

In Part 2 of the series, let's build our first tool!

We will be building on top of the simple conversational Agent we built in Part 1.

We will purposefully call our implementation an Agent and refer to the OpenAI SDK implementation as an Assistant to easily distinguish between the two.

Prerequisites

To follow along with this tutorial, you will need the following:

Have read Part 1 or at least copied the code as your starting point
Python 3 installed on your machine (3.9 or later for the dict[str, ...] type hints used below)
An OpenAI API key
Basic knowledge of Python programming

Dependencies

We have one new dependency: docstring_parser, a library for parsing docstrings in Python code. Our Agent needs it to dynamically interpret and manage the functions it can call. More on this later.

python3 -m venv venv
source venv/bin/activate
pip install docstring_parser

The goal

In Part 1, we created an Agent that represents a famous hobbit who spends too much time thinking about breakfast 🍳

The goal of this tutorial is to add two abilities, or tools, to our Agent. The tools will:

Let our Agent eat a second breakfast if he has had only one, or else eat lunch.
Let him tell us the current date.

NOTE: For those of you who don't know, hobbits eat two breakfasts. Also, some humans, like myself...

It's a ridiculous example, but it will clearly illustrate the data flow you can set up for more complex use cases that interact with real-world data.

Setting up the mock database

Let's do a small amount of prep work before we dive in. We will create a simple mock database represented by a new file db.py containing a variable breakfast_count. In this case, the breakfast_count variable will keep track of how many breakfasts our hobbit has eaten.

breakfast_count = 1

While we are here, we will update the get_breakfast_count_from_db method on our Agent class to get data from our mock db.

import db

class Agent:
    # ... (rest of code)
    def get_breakfast_count_from_db(self):
        return db.breakfast_count

😴... That's out of the way. Let's move on to the fun part.

Our first tool

A tool is quite simply a function that the Agent is aware it needs to call in a specific scenario. Therefore, let's first define a simple function that handles our breakfast logic.

IMPORTANT: I want to draw your attention to the Google Style docstring in the tool functions we are about to define. It's crucial to get the documentation syntax right, as this is what will be used to extract the correct OpenAI function format in JSON. More on this soon.

Create a new tools.py file with this content:

import db


def eat_next_meal(breakfast_count: int):
    """
    Call this tool when user wants you to eat another meal.

    Args:
        breakfast_count (int): Value with same name from metadata.

    Returns:
        str: The meal you should eat next.
    """
    print("== eat_next_meal ==> tool called")
    
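    if breakfast_count == 2:
        return "You have already eaten breakfast twice today. You eat lunch now."
    if breakfast_count == 1:
        db.breakfast_count += 1
        return "You have only eaten one breakfast today. You eat second breakfast now."
    # Fallback so the tool always returns a string instead of None.
    return "You have lost count of your breakfasts today."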

Pretty simple, right? If breakfast_count is 1, eat second breakfast and update the database to reflect that 2 breakfasts have been eaten. If breakfast_count is 2, eat lunch.

You can return any string you want from the tool. The return value is fed right back into the Run to inform the Assistant of the outcome of the action taken.

If your agent only cares that the tool was called and completed successfully, you could just return a "success" or "failed" message.
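Since a tool is just a plain Python function, you can exercise it on its own before wiring it into the Agent. The sketch below is a minimal, single-file check, not part of the series code: it stands in for db.py with a SimpleNamespace (an assumption made only so the snippet runs by itself) and uses a condensed copy of the breakfast logic. The final fallback return is there so the function always hands back a string.

```python
from types import SimpleNamespace

# Stand-in for db.py so this sketch runs as a single file (assumption).
db = SimpleNamespace(breakfast_count=1)


def eat_next_meal(breakfast_count: int) -> str:
    """Condensed copy of the breakfast logic from tools.py."""
    if breakfast_count == 2:
        return "You have already eaten breakfast twice today. You eat lunch now."
    if breakfast_count == 1:
        db.breakfast_count += 1
        return "You have only eaten one breakfast today. You eat second breakfast now."
    # Fallback so a string is always returned, whatever the count.
    return "You have lost count of your breakfasts today."


first = eat_next_meal(db.breakfast_count)   # count is 1: second breakfast
second = eat_next_meal(db.breakfast_count)  # count is now 2: lunch

print(first)
print(second)
print(db.breakfast_count)
```

Calling the tool twice in a row shows the side effect on the mock database: the first call bumps the count, so the second call takes the lunch branch.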

Let's quickly create the second tool for our Agent so that we can illustrate how multiple tools can be used. We will create a tool to allow our Agent to tell us what the current date is.

from datetime import datetime  # place this import at the top of tools.py


def tell_the_date():
    """
    Call this tool when the user wants to know the date.

    Returns:
        str: The current date
    """
    print("== tell_the_date ==> tool called")
    current_date = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    return f"The date is {current_date}"

Providing the tools to our Agent

Now, let's provide these new tools to our Agent in main.py. We create a dictionary that maps function names to their corresponding callable objects and pass it as the tools argument when instantiating our class. We could hardcode the dictionary keys as strings, but we all know how easy typos are, so I chose to leverage the built-in __name__ attribute that every Python function has to make this less error-prone:

from tools import eat_next_meal, tell_the_date

agent = Agent(name="Bilbo Baggins",
              personality="You are the accomplished and renowned adventurer from The Hobbit. You act like you are a bit of a homebody, but you are always up for an adventure. You worry a bit too much about breakfast.",
              tools={
                  eat_next_meal.__name__: eat_next_meal,
                  tell_the_date.__name__: tell_the_date
              })
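If the list of tools grows, the same __name__ trick can be wrapped in a small helper. This is only a sketch: build_tool_belt is a hypothetical name, not part of the OpenAI SDK or the series code, and the two functions here are placeholders for the real ones in tools.py.

```python
from typing import Callable


def build_tool_belt(*tools: Callable) -> dict[str, Callable]:
    """Map each function's __name__ to the function itself."""
    return {tool.__name__: tool for tool in tools}


# Placeholder tools standing in for the ones defined in tools.py.
def eat_next_meal(breakfast_count: int) -> str:
    return "meal"


def tell_the_date() -> str:
    return "date"


tool_belt = build_tool_belt(eat_next_meal, tell_the_date)
print(sorted(tool_belt))
```

The resulting dictionary has the same shape as the one passed to the Agent above, so it could be handed to the tools argument in the same way.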

Updating our Agent to accept tools

Let's put these new tools into our Agent's tool belt by updating the Agent class constructor to accept a dictionary of tools. You could bake the tools right into the class as methods, but since it's common for different Agents to have different functionalities, we will pass tools as a constructor parameter and store them on the instance as the tool_belt attribute for later use.

from typing import Callable

class Agent:
    # ... (rest of code)
    def __init__(self, name: str, personality: str, tools: dict[str, Callable]):
        self.name = name
        self.personality = personality
        self.client = openai.OpenAI(api_key="sk-*****")

        self.tool_belt = tools
        self.assistant = self.client.beta.assistants.create(
            name=self.name,
            model="gpt-4-turbo-preview",
        )

Our Agent now has a tool belt it can access, but the OpenAI Assistant is still unaware of the tools in it. We could provide the tools when creating the Assistant (assistants.create) by passing a tools parameter, but tools can change often in development, and it's clunky to have to call assistants.update every time you modify a tool.

Therefore, a better approach is to provide the needed tools to every Run so that they are created dynamically each time.

Let's do exactly that by adding the tools parameter to our run creation:

class Agent:
    # ... (rest of code)
    def _create_run(self):
        count = self.get_breakfast_count_from_db()
        return self.client.beta.threads.runs.create(
            thread_id=self.thread.id,
            assistant_id=self.assistant.id,
            tools=self._get_tools_in_open_ai_format(), # add this line
            # ... (rest of code)
        )

Woah!? Wait a minute... ✋🏻 Where did _get_tools_in_open_ai_format come from? Why aren't we just passing tool_belt directly?

Well, let's create that method, and I'll explain right after:

import docstring_parser

class Agent:
    # ... (rest of code)
    def _get_tools_in_open_ai_format(self):
        python_type_to_json_type = {
            "str": "string",
            "int": "integer",
            "float": "number",
            "bool": "boolean",
            "list": "array",
            "dict": "object"
        }

        return [
            {
                "type": "function",
                "function": {
                    "name": tool.__name__,
                    "description": docstring_parser.parse(tool.__doc__).short_description,
                    "parameters": {
                        "type": "object",
                        "properties": {
                            p.arg_name: {
                                "type": python_type_to_json_type.get(p.type_name, "string"),
                                "description": p.description
                            }
                            for p in docstring_parser.parse(tool.__doc__).params
                        },
                        "required": [
                            p.arg_name
                            for p in docstring_parser.parse(tool.__doc__).params
                            if not p.is_optional
                        ]
                    }
                }
            }
            for tool in self.tool_belt.values()
        ]

😵... Ok, you really don't need to try to understand what each line is doing here since I took the time to work this all out for you.

All you need to know is that this method takes the tool_belt attribute and uses the docstring_parser library we installed at the beginning of this tutorial to parse the docstring of each function and build the JSON format OpenAI expects. That is why correctly formatting the docstring is so important.

Otherwise, we would have to define our functions as Python code and define them again manually in the OpenAI JSON format. That's too much room for human error in my book.
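To make the target format concrete, here is roughly the entry the method should build for eat_next_meal. This is written by hand for illustration rather than captured from a run; integer is the JSON Schema type for whole numbers, and the value actually emitted depends on the mapping table in the method.

```python
import json

# Hand-written illustration of the tool definition expected for eat_next_meal.
eat_next_meal_schema = {
    "type": "function",
    "function": {
        "name": "eat_next_meal",
        # Taken from the first line of the docstring.
        "description": "Call this tool when user wants you to eat another meal.",
        "parameters": {
            "type": "object",
            "properties": {
                # One entry per item in the docstring's Args section.
                "breakfast_count": {
                    "type": "integer",
                    "description": "Value with same name from metadata.",
                }
            },
            # Every argument not marked optional in the docstring.
            "required": ["breakfast_count"],
        },
    },
}

print(json.dumps(eat_next_meal_schema, indent=2))
```

Every field here comes from the function name or its docstring, which is why a typo in the Args section ends up directly in what the Assistant sees.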

Polling for tool calls

As we saw in Part 1, we have a polling mechanism to determine the current status of a Run. One of these statuses is requires_action. This means the assistant has determined that one or several tools need to be called based on the instructions and tool definitions provided. The Run will not continue until all the tools have been called and the results of those calls have been submitted to the Run.

Let's, therefore, update our _poll_run method to respond appropriately to the requires_action status:

class Agent:
    # ... (rest of code)
    def _poll_run(self, run: Run):
        status = run.status
        start_time = time.time()
        while status != "completed":
            if status == 'failed':
                raise Exception(f"Run failed with error: {run.last_error}")
            if status == 'expired':
                raise Exception("Run expired.")
            # add the below code block
            if status == 'requires_action':
                self._call_tools(
                    run.id, run.required_action.submit_tool_outputs.tool_calls)

            # ... (rest of method)

As you might have guessed, _call_tools will contain all the logic to dynamically call the available tools.

Remember, the Assistant only knows what tools need to be called but doesn't actually call them. The Run exposes a list of tool call objects, run.required_action.submit_tool_outputs.tool_calls, that contain the names of the functions that need to be called along with the arguments to pass to those function calls. It's up to our custom Agent implementation to obey the Assistant's commands and call the actual function implementations.

Let's implement _call_tools. We'll break this down as code comments so it's easier to understand what is happening:

import json

class Agent:
    # ... (rest of code)
    # tool_calls holds SDK objects (read with attribute access), not plain dicts.
    def _call_tools(self, run_id: str, tool_calls: list):
        # We create a tool_outputs list to collect the results of function calls.
        tool_outputs = []

        # We iterate over all the tool_calls to deal with them individually
        for tool_call in tool_calls:
            # We get the `function` object from the tool_call
            function = tool_call.function
            # We extract the arguments from the function object.
            # They are in JSON so we need to load them with the json module.
            function_args = json.loads(function.arguments)
            # We map the function name to our callable function in our Agent's tool belt.
            function_to_call = self.tool_belt[function.name]
            # We can now call the function with the provided arguments.
            function_response = function_to_call(**function_args)
            # We append the response to the tool_outputs list
            tool_outputs.append(
                {"tool_call_id": tool_call.id, "output": function_response})

        # Finally, we submit the tool outputs to OpenAI
        self.client.beta.threads.runs.submit_tool_outputs(
            thread_id=self.thread.id,
            run_id=run_id,
            tool_outputs=tool_outputs
        )

Once the OpenAI method submit_tool_outputs has been called with all the tool call outputs, we have completed the requirements for the Run to move past the requires_action status. The status will then switch to in_progress before eventually reaching the completed status and returning a response to the user.
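The dispatch logic can also be tried offline, without calling OpenAI at all. The sketch below is an assumption-laden stand-in: the fake tool calls are SimpleNamespace objects that only mimic the id, function.name and function.arguments fields used above, and the two tools are simplified placeholders with fixed return values.

```python
import json
from types import SimpleNamespace


def eat_next_meal(breakfast_count: int) -> str:
    return f"Meal chosen for breakfast_count={breakfast_count}"


def tell_the_date() -> str:
    return "The date is 2024-02-19"  # fixed value keeps the sketch deterministic


tool_belt = {f.__name__: f for f in (eat_next_meal, tell_the_date)}

# Hypothetical stand-ins for the SDK's tool call objects: each has an id and
# a function with a name and JSON-encoded arguments.
fake_tool_calls = [
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="eat_next_meal", arguments='{"breakfast_count": 1}')),
    SimpleNamespace(id="call_2", function=SimpleNamespace(
        name="tell_the_date", arguments="{}")),
]

tool_outputs = []
for tool_call in fake_tool_calls:
    # Decode the JSON arguments, look up the tool by name, and call it.
    args = json.loads(tool_call.function.arguments)
    result = tool_belt[tool_call.function.name](**args)
    tool_outputs.append({"tool_call_id": tool_call.id, "output": result})

print(tool_outputs)
```

The tool_outputs list built here has the same shape as the one passed to submit_tool_outputs: one entry per tool call, pairing the call's id with the string the tool returned.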

Running it

Let's give it a run:

python3 main.py

User: Good morning

Assistant: Good morning! What an excellent day it seems for an adventure... or perhaps for a hearty breakfast first. How can I assist you on this fine day?

User: Should you eat breakfast?

== eat_next_meal ==> tool called

Assistant: Ah, only one breakfast so far? Well then, it's clear what must be done. It's time for a second breakfast!

User: You should eat another meal.

== eat_next_meal ==> tool called

Assistant: Ah, having already enjoyed both breakfast and second breakfast, it seems it's time to move on to lunch! This is turning into quite the day of culinary adventures.

User: What day is it?

== tell_the_date ==> tool called

Assistant: It's the 19th of February, 2024. Seems like a perfect day to embark on a quest or to delve into the mysteries of Middle-earth. What plans do we have for today?

Voilà! Your hobbit can tell the date and eat the correct meal now. Even though this example is quite silly, my intention is to provide you with a functioning code sample to build your own powerful agents.

In Part 3, we will implement RAG (Retrieval Augmented Generation) using PostgreSQL. Wait, what? OpenAI Assistants have that built-in; why not use their implementation? I'll explain my reasoning in Part 3.

Thank you for reading. Happy to hear any thoughts and feedback in the comments. Follow me on LinkedIn for more content like this.
