Creating an Image Recognition Telegram Bot in Go

In this article, we will go over a project on image recognition using Go. We will also create a Telegram bot, through which we can send images for recognition.

The first thing we need is an already trained model. We will not train a model in this article; instead, we will use the pre-trained Inception model (inception5h) published by TensorFlow and run it in the ctava/tfcgo Docker image, which provides TensorFlow together with its Go bindings.

To launch our project, we will need four terminals open at the same time.

In the first terminal, we will launch the image recognition server. In the second, we will launch the bot. In the third, we will open a public tunnel that exposes our bot to the internet. In the fourth, we will run the command that registers our bot.

To start the recognition server, create a Dockerfile:

FROM ctava/tfcgo

RUN mkdir -p /model && \
  curl -o /model/inception5h.zip -s "http://download.tensorflow.org/models/inception5h.zip" && \
  unzip /model/inception5h.zip -d /model

WORKDIR /go/src/imgrecognize
COPY src/ .
RUN go build
ENTRYPOINT [ "/go/src/imgrecognize/imgrecognize" ]
EXPOSE 8080

This Dockerfile downloads the model and unpacks it into /model, copies our server's source code from src/ into /go/src/imgrecognize, builds it there, and runs the resulting imgrecognize binary, which listens on port 8080.

For the server, the first thing we need to do is set the TF_CPP_MIN_LOG_LEVEL environment variable:

os.Setenv("TF_CPP_MIN_LOG_LEVEL", "2")

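Level 2 hides TensorFlow's informational and warning log messages, such as this one:

I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

The following message, on the other hand, is a real error that the log level does not hide: it means the image data passed to the decoder was empty, for example because a request arrived without an image in its body.

unable to make a tensor from image: Expected image (JPEG, PNG, or GIF), got empty file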

We will not optimize our server here and will simply run it with http.ListenAndServe on port 8080. Before starting the server, we load our model (loadModel) to get the graph (modelGraph) and the labels (labels). The graph is stored in protobuf format in the file "/model/tensorflow_inception_graph.pb"; the labels are read from "/model/imagenet_comp_graph_label_strings.txt", one per line.

func loadModel() (*tensorflow.Graph, []string, error) {
	// Load inception model
	model, err := ioutil.ReadFile(graphFile)
	if err != nil {
		return nil, nil, err
	}
	graph := tensorflow.NewGraph()
	if err := graph.Import(model, ""); err != nil {
		return nil, nil, err
	}

	// Load labels
	// Use a new name so the package-level labelsFile path is not shadowed
	f, err := os.Open(labelsFile)
	if err != nil {
		return nil, nil, err
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	var labels []string
	for scanner.Scan() {
		labels = append(labels, scanner.Text())
	}

	return graph, labels, scanner.Err()
}

modelGraph holds the model itself: its structure together with its trained weights. labels holds the class names, one per line of the labels file; the i-th value of the model's output is the probability of the i-th label.

Inside our HTTP handler, we first have to normalize the received image (normalizeImage) so that it can be passed to the model's input. To do that, we wrap the raw image bytes read from the request body in a string tensor:

tensor, err := tensorflow.NewTensor(buf.String())

After that, we build a small helper graph and get three values (plus an error):

graph, input, output, err := getNormalizedGraph()

graph is a small TensorFlow graph that decodes, resizes, and normalizes an image. input is its entry point, a placeholder into which we feed our string tensor. output is the node that yields the normalized image.

To actually run the normalization, we open a session on graph:

session, err := tensorflow.NewSession(graph, nil)

Normalization Code:

func normalizeImage(imgBody io.ReadCloser) (*tensorflow.Tensor, error) {
	var buf bytes.Buffer
	_, err := io.Copy(&buf, imgBody)
	if err != nil {
		return nil, err
	}

	tensor, err := tensorflow.NewTensor(buf.String())
	if err != nil {
		return nil, err
	}

	graph, input, output, err := getNormalizedGraph()
	if err != nil {
		return nil, err
	}

	session, err := tensorflow.NewSession(graph, nil)
	if err != nil {
		return nil, err
	}
	// Release the session's resources once normalization is done
	defer session.Close()

	normalized, err := session.Run(
		map[tensorflow.Output]*tensorflow.Tensor{
			input: tensor,
		},
		[]tensorflow.Output{
			output,
		},
		nil)
	if err != nil {
		return nil, err
	}

	return normalized[0], nil
}

After normalizing the image, we create a session for inference over modelGraph.

session, err := tensorflow.NewSession(modelGraph, nil)

With this session, we run the recognition itself. The input is our normalized image, fed into the model's "input" operation:

modelGraph.Operation("input").Output(0): normalizedImg,

The result of the computation (recognition), read from the model's "output" operation, is saved in the outputRecognize variable.

From this data, we take the 3 most likely results (despite its name, getTopFiveLabels returns ResultCount, i.e. 3, labels):

res := getTopFiveLabels(labels, outputRecognize[0].Value().([][]float32)[0])

func getTopFiveLabels(labels []string, probabilities []float32) []Label {
	var resultLabels []Label
	for i, p := range probabilities {
		if i >= len(labels) {
			break
		}
		resultLabels = append(resultLabels, Label{Label: labels[i], Probability: p})
	}
	sort.Sort(Labels(resultLabels))

	return resultLabels[:ResultCount]
}
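getTopFiveLabels sorts all labels and then slices off the first ResultCount entries, which would panic if fewer than ResultCount labels were available. A minimal alternative sketch with sort.Slice and a bound check; topLabels and its k parameter are hypothetical, and Label is redeclared so the example compiles on its own:

```go
package main

import (
	"fmt"
	"sort"
)

// Label pairs a class name with the probability the model assigned to it.
type Label struct {
	Label       string
	Probability float32
}

// topLabels (hypothetical) returns up to k labels with the highest
// probabilities. probabilities[i] is the score for labels[i]; scores beyond
// the end of labels are ignored, as in getTopFiveLabels.
func topLabels(labels []string, probabilities []float32, k int) []Label {
	n := len(probabilities)
	if len(labels) < n {
		n = len(labels)
	}
	result := make([]Label, n)
	for i := 0; i < n; i++ {
		result[i] = Label{Label: labels[i], Probability: probabilities[i]}
	}
	// Sort by descending probability.
	sort.Slice(result, func(i, j int) bool {
		return result[i].Probability > result[j].Probability
	})
	// Never slice past the end, even if k is larger than the label count.
	if k < len(result) {
		result = result[:k]
	}
	return result
}

func main() {
	labels := []string{"cat", "dog", "car", "tree"}
	probs := []float32{0.1, 0.6, 0.05, 0.25}
	for _, l := range topLabels(labels, probs, 3) {
		fmt.Printf("%s %.2f%%\n", l.Label, l.Probability*100)
	}
}
```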

For the HTTP response, we return only the single most likely result:

msg := fmt.Sprintf("This is: %s (%.2f%%)", res[0].Label, res[0].Probability*100)
_, err = w.Write([]byte(msg))

The complete code of our recognition server:

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"os"
	"sort"

	tensorflow "github.com/tensorflow/tensorflow/tensorflow/go"
	"github.com/tensorflow/tensorflow/tensorflow/go/op"
)

const (
	ResultCount = 3
)

var (
	graphFile  = "/model/tensorflow_inception_graph.pb"
	labelsFile = "/model/imagenet_comp_graph_label_strings.txt"
)

type Label struct {
	Label       string
	Probability float32
}

type Labels []Label

func (l Labels) Len() int {
	return len(l)
}
func (l Labels) Swap(i, j int) {
	l[i], l[j] = l[j], l[i]
}
func (l Labels) Less(i, j int) bool {
	return l[i].Probability > l[j].Probability
}

var (
	modelGraph *tensorflow.Graph
	labels     []string
)

func main() {
	// Hide TensorFlow INFO and WARNING log messages, such as:
	// I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
	err := os.Setenv("TF_CPP_MIN_LOG_LEVEL", "2")
	if err != nil {
		log.Fatalln(err)
	}

	modelGraph, labels, err = loadModel()
	if err != nil {
		log.Fatalf("unable to load model: %v", err)
	}

	log.Println("Run RECOGNITION server ....")
	http.HandleFunc("/", mainHandler)
	err = http.ListenAndServe(":8080", nil)
	if err != nil {
		log.Fatalln(err)
	}
}

func mainHandler(w http.ResponseWriter, r *http.Request) {
	normalizedImg, err := normalizeImage(r.Body)
	if err != nil {
		// A bad upload should fail this request only, not stop the whole server
		log.Printf("unable to make a normalizedImg from image: %v", err)
		http.Error(w, "unable to read image", http.StatusBadRequest)
		return
	}

	// Create a session for inference over modelGraph and release it when done
	session, err := tensorflow.NewSession(modelGraph, nil)
	if err != nil {
		log.Printf("could not init session: %v", err)
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	defer session.Close()

	outputRecognize, err := session.Run(
		map[tensorflow.Output]*tensorflow.Tensor{
			modelGraph.Operation("input").Output(0): normalizedImg,
		},
		[]tensorflow.Output{
			modelGraph.Operation("output").Output(0),
		},
		nil,
	)
	if err != nil {
		log.Printf("could not run inference: %v", err)
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}

	res := getTopFiveLabels(labels, outputRecognize[0].Value().([][]float32)[0])
	log.Println("--- recognition result:")
	for _, l := range res {
		fmt.Printf("label: %s, probability: %.2f%%\n", l.Label, l.Probability*100)
	}
	log.Println("---")

	msg := fmt.Sprintf("This is: %s (%.2f%%)", res[0].Label, res[0].Probability*100)
	if _, err := w.Write([]byte(msg)); err != nil {
		log.Printf("could not write server response: %v", err)
	}
}

func loadModel() (*tensorflow.Graph, []string, error) {
	// Load inception model
	model, err := ioutil.ReadFile(graphFile)
	if err != nil {
		return nil, nil, err
	}
	graph := tensorflow.NewGraph()
	if err := graph.Import(model, ""); err != nil {
		return nil, nil, err
	}

	// Load labels
	// Use a new name so the package-level labelsFile path is not shadowed
	f, err := os.Open(labelsFile)
	if err != nil {
		return nil, nil, err
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	var labels []string
	for scanner.Scan() {
		labels = append(labels, scanner.Text())
	}

	return graph, labels, scanner.Err()
}

func getTopFiveLabels(labels []string, probabilities []float32) []Label {
	var resultLabels []Label
	for i, p := range probabilities {
		if i >= len(labels) {
			break
		}
		resultLabels = append(resultLabels, Label{Label: labels[i], Probability: p})
	}
	sort.Sort(Labels(resultLabels))

	return resultLabels[:ResultCount]
}

func normalizeImage(imgBody io.ReadCloser) (*tensorflow.Tensor, error) {
	var buf bytes.Buffer
	_, err := io.Copy(&buf, imgBody)
	if err != nil {
		return nil, err
	}

	tensor, err := tensorflow.NewTensor(buf.String())
	if err != nil {
		return nil, err
	}

	graph, input, output, err := getNormalizedGraph()
	if err != nil {
		return nil, err
	}

	session, err := tensorflow.NewSession(graph, nil)
	if err != nil {
		return nil, err
	}
	// Release the session's resources once normalization is done
	defer session.Close()

	normalized, err := session.Run(
		map[tensorflow.Output]*tensorflow.Tensor{
			input: tensor,
		},
		[]tensorflow.Output{
			output,
		},
		nil)
	if err != nil {
		return nil, err
	}

	return normalized[0], nil
}

// Creates a graph to decode, resize and normalize an image
func getNormalizedGraph() (graph *tensorflow.Graph, input, output tensorflow.Output, err error) {
	s := op.NewScope()
	input = op.Placeholder(s, tensorflow.String)
	decode := op.DecodeJpeg(s, input, op.DecodeJpegChannels(3)) // 3 RGB

	output = op.Sub(s,
		op.ResizeBilinear(s,
			op.ExpandDims(s,
				op.Cast(s, decode, tensorflow.Float),
				op.Const(s.SubScope("make_batch"), int32(0))),
			op.Const(s.SubScope("size"), []int32{224, 224})),
		op.Const(s.SubScope("mean"), float32(117)))
	graph, err = s.Finalize()

	return graph, input, output, err
}

Now we need to build this Docker image. We could build and run it by typing the docker commands in the console, but it is more convenient to collect them in a Makefile. So, let's create this handy file:

recognition_build:
	docker build -t imgrecognition .

recognition_run:
	docker run -it -p 8080:8080 imgrecognition

After that, open the first terminal and run:

make recognition_build && make recognition_run

Now, in the first terminal, we have a local HTTP server on port 8080 that accepts an image in the request body and responds with a text message describing what it recognized in the image.

This is, so to speak, the core of our project.

Creating a Telegram Bot

Next, we need to create a Telegram bot.

The bot is a second HTTP server, which receives the messages that Telegram forwards to it. The first server recognizes our images and listens on port 8080; the bot's server will listen on port 3000.

First, create a bot from your Telegram account with BotFather. During registration, you choose the bot's name and receive its token. Keep this token secret: anyone who has it can control your bot.

Let's put this token in the "BotToken" constant. You should get something like this:

const BotToken = "1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5"

Our bot's handler decodes the JSON body of the webhook request that Telegram sends to it:

json.NewDecoder(r.Body).Decode(webhookBody)

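We are interested in the photo in the incoming message:

webhookBody.Message.Photo

Telegram sends a photo as a list of PhotoSize entries: the same picture in several sizes. Using an entry's file ID,

photoSize.FileID

we build the URL of the Bot API getFile method, which returns JSON describing the file, including its path on Telegram's servers (file_path):

fmt.Sprintf(GetFileUrl, BotToken, photoSize.FileID)

From that path, we build the download URL (DownloadFileUrl) and download the image itself:

downloadResponse, err = http.Get(downloadFileUrl)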
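Only result.file_path from the getFile response is needed to build the download URL. A minimal sketch of that step, which also checks the ok flag before using the path; downloadURL is a hypothetical helper, and the sample response and token are made up:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Same format string as DownloadFileUrl in the bot code.
const DownloadFileUrl = "https://api.telegram.org/file/bot%s/%s"

// ImgFileInfo mirrors the getFile response, trimmed to the fields used here.
type ImgFileInfo struct {
	Ok     bool `json:"ok"`
	Result struct {
		FileId   string `json:"file_id"`
		FilePath string `json:"file_path"`
	} `json:"result"`
}

// downloadURL (hypothetical) turns a raw getFile response into the URL
// from which the file itself can be downloaded.
func downloadURL(token string, getFileResponse []byte) (string, error) {
	var info ImgFileInfo
	if err := json.Unmarshal(getFileResponse, &info); err != nil {
		return "", err
	}
	if !info.Ok || info.Result.FilePath == "" {
		return "", errors.New("getFile did not return a file_path")
	}
	return fmt.Sprintf(DownloadFileUrl, token, info.Result.FilePath), nil
}

func main() {
	// Illustrative response; real file IDs and paths come from Telegram.
	resp := []byte(`{"ok":true,"result":{"file_id":"large-id","file_path":"photos/file_0.jpg"}}`)
	u, err := downloadURL("123:abc", resp)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(u) // https://api.telegram.org/file/bot123:abc/photos/file_0.jpg
}
```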

We will send the image bytes to the handler of our first server:

msg := recognitionClient.Recognize(downloadResponse)

In response, we get a text message.

Finally, we send this string back to the user, as is, through the Bot API's sendMessage method.
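For reference, sendMessage takes a JSON body with (at least) chat_id and text. The sketch below shows what sendResponseToUser sends, using the same struct as the bot code; sendMessagePayload is a hypothetical helper and the chat ID and text are made-up values:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Same shape as sendMessageReqBody in the bot code.
type sendMessageReqBody struct {
	ChatID int64  `json:"chat_id"`
	Text   string `json:"text"`
}

// sendMessagePayload (hypothetical) builds the JSON body for sendMessage.
func sendMessagePayload(chatID int64, text string) ([]byte, error) {
	return json.Marshal(sendMessageReqBody{ChatID: chatID, Text: text})
}

func main() {
	b, err := sendMessagePayload(42, "This is: tabby (87.50%)")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(b)) // {"chat_id":42,"text":"This is: tabby (87.50%)"}
}
```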

The entire bot code:

package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"

	"github.com/romanitalian/recognition/src/bot/recognition"
)

// Register Bot: curl -F "url=https://9068b6869da7.ngrok.io" https://api.telegram.org/bot1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5/setWebhook
const (
	BotToken = "1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5"

	GetFileUrl       = "https://api.telegram.org/bot%s/getFile?file_id=%s"
	DownloadFileUrl  = "https://api.telegram.org/file/bot%s/%s"
	SendMsgToUserUrl = "https://api.telegram.org/bot%s/sendMessage"
)

type webhookReqBody struct {
	Message Msg
}

type Msg struct {
	MessageId int    `json:"message_id"`
	Text      string `json:"text"`
	From      struct {
		ID        int64  `json:"id"`
		FirstName string `json:"first_name"`
		Username  string `json:"username"`
	} `json:"from"`
	Photo *[]PhotoSize `json:"photo"`
	Chat  struct {
		ID        int64  `json:"id"`
		FirstName string `json:"first_name"`
		Username  string `json:"username"`
	} `json:"chat"`
	Date  int `json:"date"`
	Voice struct {
		Duration int64  `json:"duration"`
		MimeType string `json:"mime_type"`
		FileId   string `json:"file_id"`
		FileSize int64  `json:"file_size"`
	} `json:"voice"`
}

type PhotoSize struct {
	FileID   string `json:"file_id"`
	Width    int    `json:"width"`
	Height   int64  `json:"height"`
	FileSize int64  `json:"file_size"`
}
type ImgFileInfo struct {
	Ok     bool `json:"ok"`
	Result struct {
		FileId       string `json:"file_id"`
		FileUniqueId string `json:"file_unique_id"`
		FileSize     int    `json:"file_size"`
		FilePath     string `json:"file_path"`
	} `json:"result"`
}

func main() {
	log.Println("Run BOT server ....")
	err := http.ListenAndServe(":3000", http.HandlerFunc(Handler))
	if err != nil {
		log.Fatalln(err)
	}
}

// This handler is called every time Telegram sends us a webhook event
func Handler(w http.ResponseWriter, r *http.Request) {
	// First, decode the JSON request body
	webhookBody := &webhookReqBody{}
	err := json.NewDecoder(r.Body).Decode(webhookBody)
	if err != nil {
		log.Println("could not decode request body", err)
		return
	}

	// ------------------------- Download the photo

	var downloadResponse *http.Response

	if webhookBody.Message.Photo == nil || len(*webhookBody.Message.Photo) == 0 {
		log.Println("no photo in webhook body. webhookBody: ", webhookBody)
		return
	}
	// Telegram sends one PhotoSize per available size of the same picture.
	// Download just one of them: the last entry.
	photos := *webhookBody.Message.Photo
	photoSize := photos[len(photos)-1]

	// GET JSON ABOUT OUR IMG (IN ORDER TO GET FILE_PATH)
	imgFileInfoUrl := fmt.Sprintf(GetFileUrl, BotToken, photoSize.FileID)
	rr, err := http.Get(imgFileInfoUrl)
	if err != nil {
		log.Println("unable to retrieve img by FileID", err)
		return
	}
	defer rr.Body.Close()
	// READ JSON
	fileInfoJson, err := ioutil.ReadAll(rr.Body)
	if err != nil {
		log.Println("unable to read img by FileID", err)
		return
	}
	// UNMARSHAL JSON; stop here if it is invalid or Telegram reports an error
	imgInfo := &ImgFileInfo{}
	err = json.Unmarshal(fileInfoJson, imgInfo)
	if err != nil || !imgInfo.Ok {
		log.Println("unable to get file description from api.telegram by url: "+imgFileInfoUrl, err)
		return
	}

	// DOWNLOAD THE FILE BY FILE_PATH
	downloadFileUrl := fmt.Sprintf(DownloadFileUrl, BotToken, imgInfo.Result.FilePath)
	downloadResponse, err = http.Get(downloadFileUrl)
	if err != nil {
		log.Println("unable to download file by file_path: "+downloadFileUrl, err)
		return
	}
	defer downloadResponse.Body.Close()

	// --------------------------- Send img to server recognition.
	recognitionClient := recognition.New()
	msg := recognitionClient.Recognize(downloadResponse)

	if err := sendResponseToUser(webhookBody.Message.Chat.ID, msg); err != nil {
		log.Println("error in sending reply: ", err)
		return
	}
}

// The below code deals with the process of sending a response message
// to the user

// Create a struct to conform to the JSON body
// of the send message request
// https://core.telegram.org/bots/api#sendmessage
type sendMessageReqBody struct {
	ChatID int64  `json:"chat_id"`
	Text   string `json:"text"`
}

// sendResponseToUser notify user - what found on image.
func sendResponseToUser(chatID int64, msg string) error {
	// Create the request body struct
	msgBody := &sendMessageReqBody{
		ChatID: chatID,
		Text:   msg,
	}

	// Create the JSON body from the struct
	msgBytes, err := json.Marshal(msgBody)
	if err != nil {
		return err
	}

	// Send a post request with your token
	res, err := http.Post(fmt.Sprintf(SendMsgToUserUrl, BotToken), "application/json", bytes.NewBuffer(msgBytes))
	if err != nil {
		return err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		// Include Telegram's error description in the returned error
		buf := new(bytes.Buffer)
		_, err := buf.ReadFrom(res.Body)
		if err != nil {
			return err
		}
		return errors.New("unexpected status: " + res.Status + ": " + buf.String())
	}

	return nil
}

The client code that sends the image request from the Bot to the Recognition Server:

package recognition

import (
	"io/ioutil"
	"log"
	"net/http"
)

const imgRecognitionAddress = "http://localhost:8080/"

type Client struct {
	httpClient *http.Client
}

func New() *Client {
	return &Client{
		httpClient: &http.Client{},
	}
}

func (c *Client) Recognize(downloadResponse *http.Response) string {
	var msg string
	method := "POST"

	req, err := http.NewRequest(method, imgRecognitionAddress, downloadResponse.Body)
	if err != nil {
		log.Println("unable to create request to server recognition", err)
		return msg
	}
	// Telegram delivers photos as JPEG, and the recognition server decodes JPEG
	req.Header.Add("Content-Type", "image/jpeg")

	// do request to server recognition.
	recognitionResponse, err := c.httpClient.Do(req)
	if err != nil {
		log.Println(err)
		return msg
	}
	defer func() {
		er := recognitionResponse.Body.Close()
		if er != nil {
			log.Println(er)
		}
	}()

	recognitionResponseBody, err := ioutil.ReadAll(recognitionResponse.Body)
	if err != nil {
		log.Println("error on read response from server recognition", err)
		return msg
	}
	msg = string(recognitionResponseBody)

	return msg
}

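Finally, Telegram has to know where to deliver messages for our bot, and it only sends webhooks to a public HTTPS address. Since our bot listens locally on port 3000, in the third terminal we open a public tunnel to it with ngrok: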

ngrok http 3000

ngrok then prints the public addresses of the tunnel. We need the HTTPS one, for example:

https://9068b6869da7.ngrok.io

Then, in the fourth terminal, register the webhook, i.e. tell Telegram where to send updates for our bot:

curl -F "url=https://9068b6869da7.ngrok.io"  https://api.telegram.org/bot1695571234:AAEbodyrfOjto2xNE5yjpQpW2Gyq0Ob5X24D5/setWebhook

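Now you can send a photo to your bot and get a reply describing what is in it. Send it as a regular photo rather than as a file: an image sent as a file arrives as a document, and the handler only reads the photo field.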

Thanks for your attention.