Objects Classification Using CNN-based Model

Written by imsparsh | Published 2020/12/06

— All the images (plots) are generated and modified by the Author.
Today we have a highly effective technique called transfer learning, which lets us use a pre-trained model released by Google to classify images of everyday visual objects.
Transfer learning is a machine learning method that reuses a pre-trained neural network. Here, the image recognition model we use, Inception-v3, consists of two parts:
  • Feature extraction part with a convolutional neural network.
  • Classification part with fully-connected and softmax layers.
Inception-v3 is a pre-trained convolutional neural network model that is 48 layers deep.
The version used here has already been trained on more than a million images from the ImageNet database. It is the third edition of Google's Inception CNN model, originally introduced for the ImageNet Large Scale Visual Recognition Challenge.
This pre-trained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 299-by-299. The model extracts general features from input images in the first part and classifies them based on those features in the second part.
Inception v3 is a widely used image recognition model that has been shown to attain greater than 78.1% top-1 accuracy and around 93.9% top-5 accuracy on the ImageNet dataset. The model is the culmination of many ideas developed by multiple researchers over the years. It is based on the original paper “Rethinking the Inception Architecture for Computer Vision” by Szegedy et al.
More information about the Inception architecture can be found in that paper.
In Transfer Learning, when you build a new model to classify your original dataset, you reuse the feature extraction part and re-train the classification part with your dataset. Since you don’t have to train the feature extraction part (which is the most complex part of the model), you can train the model with less computational resources and training time.
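This article itself only runs inference with the pre-trained weights, but as a point of reference, here is a minimal sketch of that retraining idea using the Keras API. It should be run separately from the TF-Slim code below (which disables TF2 behavior), and the dataset objects and number of target classes are placeholders, not part of this article's code.
import tensorflow as tf

# Feature extraction part: Inception v3 without its original classifier,
# initialized with ImageNet weights and frozen so it is not re-trained.
base_model = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3)
)
base_model.trainable = False

# Classification part: a new fully-connected + softmax head for your own classes.
num_target_classes = 10  # placeholder: e.g. the 10 classes of Animals-10
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_target_classes, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_dataset, validation_data=val_dataset, epochs=5)  # placeholder datasets
Only the small head is trained, which is why this approach needs far less data and compute than training the full network from scratch.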
In this article, we will simply use the pre-trained Inception v3 model to run predictions on a few images and fetch the top 5 predicted classes for each. Let’s begin.
We are using TensorFlow 2.x in v1 compatibility mode, together with the tf_slim package.
Import Data
import os
import numpy as np
from PIL import Image            # image resizing
from imageio import imread       # image reading
# TensorFlow 2.x in v1 compatibility mode, plus the TF-Slim Inception model
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import tf_slim as slim
from tf_slim.nets import inception
import matplotlib.pyplot as plt  # displaying the classified images
Data Loading
Set up the initial variables with the default file locations and their respective values.
ckpt_path = "/kaggle/input/inception_v3.ckpt"          # pre-trained Inception v3 checkpoint
images_path = "/kaggle/input/animals/*"                # images to classify
img_width = 299
img_height = 299
batch_size = 16
batch_shape = [batch_size, img_height, img_width, 3]
num_classes = 1001                                     # 1000 ImageNet classes + 1 background class
predict_output = []
class_names_path = "/kaggle/input/imagenet_class_names.txt"
with open(class_names_path) as f:
    class_names = f.readlines()
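As a quick sanity check (assuming the label file contains one class name per line), you can inspect what was loaded:
print("Number of class names:", len(class_names))   # how many labels were read (typically 1000 for ImageNet)
print("First class name:", class_names[0].strip())
# Note: the slim checkpoint's 1001-way output reserves index 0 for a background
# class, which is why num_classes is 1001 and the predicted indices are shifted
# by one when looking up class names later on.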
Create Inception v3 model
# Input placeholder for a batch of images
X = tf.placeholder(tf.float32, shape=batch_shape)

# Build the Inception v3 graph in inference mode
with slim.arg_scope(inception.inception_v3_arg_scope()):
    logits, end_points = inception.inception_v3(
        X, num_classes=num_classes, is_training=False
    )

# Softmax probabilities and a saver for restoring the pre-trained weights
predictions = end_points["Predictions"]
saver = tf.train.Saver(slim.get_model_variables())
Define a function that loads images in RGB mode, resizes them, and scales them into the range the model expects before sending them for evaluation.
def load_images(input_dir):
    """Yield batches of (filenames, images) scaled to the [-1, 1] range."""
    global batch_shape
    images = np.zeros(batch_shape)
    filenames = []
    idx = 0
    batch_size = batch_shape[0]
    files = tf.gfile.Glob(input_dir)[:20]
    files.sort()
    for filepath in files:
        with tf.gfile.Open(filepath, "rb") as f:
            # Read in RGB, resize to 299x299, and scale pixel values to [0, 1]
            img = Image.fromarray(imread(f, as_gray=False, pilmode="RGB"))
            imgRaw = np.array(img.resize((299, 299))).astype(np.float32) / 255.0
        # Inception v3 expects inputs in the [-1, 1] range
        images[idx, :, :, :] = imgRaw * 2.0 - 1.0
        filenames.append(os.path.basename(filepath))
        idx += 1
        if idx == batch_size:
            yield filenames, images
            filenames = []
            images = np.zeros(batch_shape)
            idx = 0
    if idx > 0:
        yield filenames, images
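Before running the full pipeline, you may want to test the generator on its own and inspect the first batch it produces (this assumes the image files are actually present at the images_path defined earlier):
for filenames, images in load_images(images_path):
    print("Batch of", len(filenames), "files, e.g.", filenames[0])
    print("Image tensor shape:", images.shape)                                # (16, 299, 299, 3)
    print("Pixel value range: %.2f to %.2f" % (images.min(), images.max()))   # roughly [-1, 1]
    break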
Load Pre-Trained Model
# Create a session that restores the pre-trained weights from the checkpoint file
session_creator = tf.train.ChiefSessionCreator(
        scaffold=tf.train.Scaffold(saver=saver),
        checkpoint_filename_with_path=ckpt_path,
        master='')
Classify Images using Model
with tf.train.MonitoredSession(session_creator=session_creator) as sess:
    for filenames, images in load_images(images_path):
        # Run the softmax predictions for the current batch of images
        labels = sess.run(predictions, feed_dict={X: images})
        for filename, label, image in zip(filenames, labels, images):
            predict_output.append([filename, label, image])
Predictions
We will use some images from the Animals-10 dataset on Kaggle to demonstrate the model's predictions.
for x in predict_output:
    out_list = list(x[1])
    # Indices of the 5 highest-probability classes, best first
    topPredict = sorted(range(len(out_list)), key=lambda i: out_list[i], reverse=True)[:5]
    # Undo the [-1, 1] scaling before displaying the image
    plt.imshow((((x[2] + 1) / 2) * 255).astype(np.uint8))
    plt.show()
    print("Filename:", x[0])
    print("Displaying the top 5 predictions for the above image:")
    for p in topPredict:
        # Index 0 of the 1001-way output is the background class, hence p - 1
        print(class_names[p - 1].strip())
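The same top-5 lookup can also be done with np.argsort, which makes it easy to print the model's confidence alongside each class name (the p - 1 offset accounts for the background class at index 0, as above):
for filename, label, _ in predict_output:
    top5 = np.argsort(label)[::-1][:5]   # indices of the 5 largest probabilities
    print("Filename:", filename)
    for p in top5:
        print("  %.3f  %s" % (label[p], class_names[p - 1].strip()))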
In the end, all of the images are classified correctly, and we can also see that the top 5 classes predicted by the model for each image are close and sensible.
I hope this post has been useful. I appreciate feedback and constructive criticism. If you want to talk about this article or other related topics, you can drop me a message here or on LinkedIn.
