Understanding What Artificial Intelligence Actually Sees

Many call artificial intelligence (AI) a “black box”, and it kinda is. One of the biggest problems with AI is that it’s incredibly difficult to understand how a model interprets the data it’s given.

Before we get our hands dirty and dive deeper, let’s play a little game.

I’m going to show you a series of abstract images that are either in category A or B.

Do you think the following image belongs to category A or B?

Hint: There’s no C.

We’ll get back to this later.

Let’s look at some more examples first.

Now can you tell if it belongs to A or B?

⚠️ Spoiler Alert

The answer is… A!

If you chose B, don’t be embarrassed; you’re not alone. When I put this question to a room full of engineers and developers, the split is always 50/50. So… why is the answer A…?

Because I said so.

The answer is A; there’s no debating it. But if you don’t agree with me, that’s my fault as the trainer.

As the trainer, I know that A is a red circle, so anything with a red circle in it is A. I also know that B is an orange circle. The rest of the image is irrelevant. It’s all about finding the pattern that runs across the set of images.

But it’s hard.

In an AI system, I can’t explain with words what makes the image A. All I can do is show you more pictures and hope it starts to click.

And you, the AI, can’t tell me why you think it’s B. It’s up to me to blindly feed you data until you get the answer right.

Here’s the same set of images, but less abstract. If I asked you the same question now, you would know right away that A is an apple and B is an orange. It’s so easy that many people think it’s a trick question. We know that the hand and the background are irrelevant because we’re humans and grew up learning these things, but for AI that’s not a given. To the model, every image is as abstract as the ones in our game, and it doesn’t know what you want it to focus on.

A Miscommunication

Let’s take a look at another toy scenario that shows how we might accidentally communicate the wrong signals to the AI system.

We have a few samples of oak trees. (It’s a bit cloudy where I live.)

Here are some palm trees. (It was really sunny on the beach.)

This next example is a palm tree, but the lighting is much closer to the oak trees. Which pattern should we focus on? The lighting? Or the shape of the tree? It might be difficult for the model to tell.

**Confidence:**

- Palm 0.75
- Oak 0.60

With this example, it might be pretty obvious that we left behind an unintended pattern for the AI to pick up. However, in reality, it’s normally something much more inconspicuous.
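To make the ambiguity concrete, here is a minimal Python sketch. The feature values, thresholds, and the two rule functions are all hypothetical, invented for illustration rather than measured from the photos above. It shows two different rules that both fit the training samples perfectly, yet disagree as soon as a palm tree shows up under cloudy light.

```python
# Hypothetical hand-made features (not measured from the article's photos):
# (label, brightness, frond_score), both scaled to the range 0..1.
training = [
    ("oak", 0.30, 0.10),
    ("oak", 0.35, 0.15),
    ("oak", 0.25, 0.05),
    ("palm", 0.90, 0.90),
    ("palm", 0.85, 0.95),
    ("palm", 0.95, 0.85),
]


def by_brightness(brightness, frond_score):
    """Rule 1: a bright photo means palm (the unintended pattern)."""
    return "palm" if brightness > 0.6 else "oak"


def by_shape(brightness, frond_score):
    """Rule 2: a frond-like shape means palm (the pattern we wanted)."""
    return "palm" if frond_score > 0.5 else "oak"


def accuracy(rule, samples):
    """Fraction of labelled samples that the rule classifies correctly."""
    correct = sum(rule(b, s) == label for label, b, s in samples)
    return correct / len(samples)


brightness_accuracy = accuracy(by_brightness, training)
shape_accuracy = accuracy(by_shape, training)

# A palm tree photographed on a cloudy day: dark photo, frond-like shape.
cloudy_palm = (0.30, 0.90)
brightness_guess = by_brightness(*cloudy_palm)
shape_guess = by_shape(*cloudy_palm)

print("training accuracy:", brightness_accuracy, shape_accuracy)
print("cloudy palm:", brightness_guess, "vs", shape_guess)
```

Nothing in the training set tells the two rules apart, so a model has no reason to prefer the one we had in mind.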

Peeking Under the Curtain

So how can we get more insight into what the AI is focusing on?

What if we passed a rectangle over the image and recorded the changes in confidence? If the confidence drops, then that’s probably an important part of the image.

Which picture makes it easier to tell that this cable is a USB?

The first image completely obscures the connector, making it nearly impossible to guess, so we can denote the region the rectangle covers as important. However, the rectangle in the second image doesn’t hinder our ability to determine the cable type. We can safely mark the location as insignificant.

We can continue to pass the rectangle over the image to establish a heat map of importance.

We can see that the model’s focus is on the tip of the connector, which is great. It’s looking where we want it to.
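The sliding-rectangle idea described above can be sketched in a few lines of Python with NumPy. This is not the code from the iOS app; it is a minimal illustration under stated assumptions. `predict` is any function that takes an image array and returns a single confidence, and `toy_predict` is a made-up stand-in for a real model that only looks at one fixed region. The patch size, stride, and gray fill value are arbitrary choices.

```python
import numpy as np


def occlusion_heatmap(image, predict, patch=8, stride=8, fill=0.5):
    """Slide a gray square over `image` and record how far confidence drops."""
    height, width = image.shape[:2]
    baseline = predict(image)
    heat = np.zeros((height, width), dtype=float)
    counts = np.zeros((height, width), dtype=float)
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            # A large drop means the covered region mattered to the model.
            drop = baseline - predict(occluded)
            heat[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    # Average overlapping patches; pixels never covered keep a value of 0.
    return heat / np.maximum(counts, 1)


def toy_predict(image):
    """Hypothetical model: confidence is the mean brightness of one region."""
    return float(image[8:16, 16:24].mean())


image = np.zeros((32, 32))
image[8:16, 16:24] = 1.0  # the "connector" that the toy model cares about

heat = occlusion_heatmap(image, toy_predict)
peak = np.unravel_index(np.argmax(heat), heat.shape)
print("most important pixel (row, col):", peak)
```

A smaller stride gives a smoother map at the cost of more calls to the model, since every rectangle position needs its own prediction.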

Let’s look at a model that wasn’t trained well.

**Confidence:**

- USB 0.76

The model correctly predicted that the cable was a USB with a confidence of 0.76. We might say that’s acceptable, especially since the photo was taken from far away and isn’t great quality.

However, upon closer inspection, the model seems to be focusing on the wrong area, not the ends of the cable like we would expect.

What does this tell us?

The model appears to rely too heavily on the wire and fingers. To improve accuracy and clear up the confusion, we can include more examples of wire and hands in a negative training set.
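One way to put a rough number on “looking in the wrong place” is to measure how much of the heat map falls inside the region where we expected the model to look. The sketch below is an assumption-laden illustration, not part of the app: `focus_share`, the hand-drawn `connector_box`, and the toy heat map values are all hypothetical.

```python
import numpy as np


def focus_share(heat, box):
    """Fraction of the positive importance in `heat` that lies inside `box`.

    `box` is (top, left, bottom, right) in pixels, bottom and right exclusive.
    """
    top, left, bottom, right = box
    positive = np.clip(heat, 0, None)  # ignore regions that raised confidence
    total = positive.sum()
    if total == 0:
        return 0.0
    return float(positive[top:bottom, left:right].sum() / total)


# Toy heat map: most importance lies along the "wire", little on the connector.
heat = np.zeros((10, 10))
heat[4:6, 0:8] = 1.0   # wire
heat[4:6, 8:10] = 0.5  # connector

connector_box = (4, 8, 6, 10)  # where we expected the model to look
share = focus_share(heat, connector_box)
print(f"share of importance on the connector: {share:.2f}")
```

A low share on images the model still classifies correctly is a hint that it is leaning on context, such as the wire or the fingers, rather than on the object itself.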

We don’t need to train on piles and piles of generic data until our model starts performing better. We can use this information tactically to guide retraining, saving us time and money.

Using the Tool

Wow! This is great, but I don’t want to put in the effort to actually implement this.

Good news! You can find the fully functional iOS app on my GitHub 😘

Final Thoughts

Creating your own model is easy, but that doesn’t mean the work stops there. The hardest part of machine learning is always producing good data.

Basic guidelines, such as keeping pose and lighting similar and using a consistent mix of stock and natural photos across our training images, give us a foothold in our quest toward a good model. After that, we are left with tools and our intuition to try to gain insight into the thought process of AI.

Thanks for reading! If you have any questions, feel free to reach out at bourdakos1@gmail.com, connect with me on LinkedIn, or follow me on Medium and Twitter.
