How to Detect and Delete Emojis in Golang

Written by vgukasov | Published 2021/09/08

Some time ago, I ran into a problem: 10 million messages containing emoji were being written to a MySQL table that used the utf8 encoding.

For those who don’t know: you should use the utf8mb4 encoding in MySQL to support emoji.
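The reason is byte width: MySQL’s utf8 (an alias for utf8mb3) stores at most three bytes per character, while many emoji take four bytes in UTF-8. Below is a minimal sketch of that check in Go. The helper name needsUTF8MB4 is hypothetical and not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// needsUTF8MB4 is a hypothetical helper: it reports whether s contains
// a code point that is encoded with four bytes in UTF-8.
func needsUTF8MB4(s string) bool {
	for _, r := range s {
		if utf8.RuneLen(r) == 4 {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsUTF8MB4("plain text")) // false
	fmt.Println(needsUTF8MB4("clap 👏"))     // true
}
```

Note that this is a byte-width check, not emoji detection: some emoji, such as ☀ (U+2600), fit in three bytes, and some four-byte characters are not emoji at all.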

If it were a small table, I would simply alter it and convert the columns to utf8mb4. But the table contains hundreds of millions of rows across many shards, so altering it in production without degrading the system is very hard.

So the quick fix was to block such messages at the back-end level, before they reach the database. The service back-end is written in Go. In this article, I’ll walk through two ways to work with emojis in Go: the bad way and the good way.

Not So Good Way 🙅‍♂️

Some libraries and forums suggest using regexp to find emojis in a text:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	var emojiRx = regexp.MustCompile(`[\x{1F600}-\x{1F6FF}|[\x{2600}-\x{26FF}]`)

	fmt.Println(emojiRx.MatchString("Message with emoji 😆😛")) // true
}

It looks like it works, but a few drawbacks make this approach a poor choice:

  • it does NOT detect all emojis:
fmt.Println(emojiRx.MatchString("Message with emoji 👏")) // false
  • the regexp is hard to read and easy to get wrong: [\x{1F600}-\x{1F6FF}|[\x{2600}-\x{26FF}] has a stray |[ inside the character class, so it also matches a literal | or [
  • matching a regexp is slower than a direct lookup, which matters for high-load apps
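The performance point is easy to check on your own machine. The sketch below is a rough comparison under stated assumptions, not a rigorous benchmark: emojiSet is a tiny hypothetical stand-in for a full emoji table, the regexp is a cleaned-up variant of the one above, and the numbers will vary with hardware and Go version.

```go
package main

import (
	"fmt"
	"regexp"
	"testing"
)

// Cleaned-up variant of the regexp from the article (assumption: two plain ranges).
var emojiRx = regexp.MustCompile(`[\x{1F600}-\x{1F6FF}\x{2600}-\x{26FF}]`)

// emojiSet is a hypothetical, hand-written stand-in for a full emoji table.
var emojiSet = map[rune]struct{}{'😆': {}, '😛': {}}

// sink keeps the compiler from discarding the benchmarked calls.
var sink bool

// containsViaMap reports whether any rune of s is present in emojiSet.
func containsViaMap(s string) bool {
	for _, r := range s {
		if _, ok := emojiSet[r]; ok {
			return true
		}
	}
	return false
}

func main() {
	const msg = "Message with emoji 😆😛"

	// testing.Benchmark runs a benchmark function outside of "go test".
	rx := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = emojiRx.MatchString(msg)
		}
	})
	mp := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = containsViaMap(msg)
		}
	})

	fmt.Println("regexp:", rx)
	fmt.Println("map:   ", mp)
}
```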

Good Solution 🔥

The best way is to keep a table of all emojis and consult it whenever you need to detect an emoji in a text. That’s how the GoMoji library works.

First, let’s add the package to our project:

go get -u github.com/forPelevin/gomoji

Now it’s pretty simple to check whether a string contains emoji:

package main

import (
	"fmt"

	"github.com/forPelevin/gomoji"
)

func main() {
	fmt.Println(gomoji.ContainsEmoji("Message with emoji 👏")) // true
}
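To tie this back to the original problem, here is a minimal sketch of a guard that could sit in front of the database insert. The names validateMessage and ErrContainsEmoji are hypothetical. The detector is passed in as a function so the sketch compiles without third-party imports; in a real service you would pass gomoji.ContainsEmoji.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrContainsEmoji is a hypothetical sentinel error for rejected messages.
var ErrContainsEmoji = errors.New("message contains emoji")

// validateMessage rejects msg when hasEmoji reports an emoji in it.
// hasEmoji is injected; gomoji.ContainsEmoji has a matching signature.
func validateMessage(msg string, hasEmoji func(string) bool) error {
	if hasEmoji(msg) {
		return ErrContainsEmoji
	}
	return nil
}

func main() {
	// Stand-in detector for the demo: flags only the clapping-hands emoji.
	detector := func(s string) bool {
		for _, r := range s {
			if r == '👏' {
				return true
			}
		}
		return false
	}

	for _, msg := range []string{"hello", "well done 👏"} {
		if err := validateMessage(msg, detector); err != nil {
			fmt.Printf("skip insert of %q: %v\n", msg, err)
			continue
		}
		fmt.Printf("insert %q\n", msg)
	}
}
```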

GoMoji Internals 🧐

Let’s dive into the library internals and figure out how it works.

First, look at the ContainsEmoji function:

// ContainsEmoji checks whether given string contains emoji or not. It uses local emoji list as provider.
func ContainsEmoji(s string) bool {
	for _, r := range s {
		if _, ok := emojiMap[r]; ok {
			return true
		}
	}

	return false
}

We see that the lib iterates over the string’s runes and checks each one against emojiMap. So the complexity of the function is O(N), where N is the number of runes.

But what is emojiMap? It’s a map of Emoji models keyed by the numeric value of a code point (the key type is int32, which rune is an alias for):
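The same per-rune lookup can also be used to delete emojis. The sketch below only illustrates the technique and is not the library’s code: emojiSet is a hypothetical, hand-written table with three entries, and containsEmoji and removeEmojis are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// emojiSet is a hypothetical stand-in for a generated emoji table.
var emojiSet = map[rune]struct{}{
	'😆': {},
	'😛': {},
	'👏': {},
}

// containsEmoji reports whether any rune of s is in emojiSet.
func containsEmoji(s string) bool {
	for _, r := range s {
		if _, ok := emojiSet[r]; ok {
			return true
		}
	}
	return false
}

// removeEmojis returns s without the runes found in emojiSet.
// strings.Map drops a rune when the mapping function returns a negative value.
func removeEmojis(s string) string {
	return strings.Map(func(r rune) rune {
		if _, ok := emojiSet[r]; ok {
			return -1
		}
		return r
	}, s)
}

func main() {
	msg := "Nice work 👏"
	fmt.Println(containsEmoji(msg))       // true
	fmt.Printf("%q\n", removeEmojis(msg)) // "Nice work "
}
```

Because the check works rune by rune, emojis built from several code points (for example, a keycap followed by U+FE0F and U+20E3) are only removed cleanly if every code point in the sequence is in the table.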

// Emoji is an entity that represents comprehensive emoji info.
type Emoji struct {
	Slug        string `json:"slug"`
	Character   string `json:"character"`
	UnicodeName string `json:"unicode_name"`
	CodePoint   string `json:"code_point"`
	Group       string `json:"group"`
	SubGroup    string `json:"sub_group"`
}

// Code generated by generator.go ; DO NOT EDIT.

package gomoji

var (
	emojiMap = map[int32]Emoji{

		42: {
			Slug:        "keycap",
			Character:   "*️⃣",
			UnicodeName: "keycap: *",
			CodePoint:   "002A FE0F 20E3",
			Group:       "symbols",
			SubGroup:    "keycap",
		},
		...
	}
)

The pre-generated map gives us two advantages:

  • it contains every emoji that was in the data source when the map was generated

  • it returns an emoji by its code point in O(1) on average, which keeps detection very fast.

Here are some benchmarks:

BenchmarkContainsEmojiParallel-8   	94079461	        13.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkContainsEmoji-8           	23728635	        49.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkFindAllParallel-8         	10220854	       115 ns/op	     288 B/op	       2 allocs/op
BenchmarkFindAll-8                 	 4023626	       294 ns/op	     288 B/op	       2 allocs/op

A reasonable question is: where does all the emoji data come from?

The answer is in the generator.go file. It contains a CLI app that fetches all emojis from OpenAPI Emoji and saves them to the data.go file in the emojiMap map[int32]Emoji format. This lets the lib keep its emoji list up to date.
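To make the generation step more concrete, here is a rough sketch of how such a file could be rendered. It is not the library’s generator: the HTTP fetch is left out, the input is a hard-coded slice, the Emoji struct is reduced to two fields, and render is a hypothetical function name. Keying the map by the first code point of each emoji is an assumption inferred from the keycap entry shown earlier.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"sort"
)

// Emoji is a reduced, illustrative version of the model shown above.
type Emoji struct {
	Slug      string
	Character string
}

// render builds Go source for an emojiMap keyed by the first code point
// of each emoji (an assumption) and formats it with go/format.
func render(emojis []Emoji) ([]byte, error) {
	byKey := make(map[rune]Emoji, len(emojis))
	for _, e := range emojis {
		for _, r := range e.Character {
			byKey[r] = e
			break // only the first rune is used as the key
		}
	}

	// Sort the keys so the generated file is stable between runs.
	keys := make([]rune, 0, len(byKey))
	for k := range byKey {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	var buf bytes.Buffer
	buf.WriteString("// Code generated by generator.go ; DO NOT EDIT.\n\n")
	buf.WriteString("package gomoji\n\n")
	buf.WriteString("var emojiMap = map[int32]Emoji{\n")
	for _, k := range keys {
		e := byKey[k]
		fmt.Fprintf(&buf, "%d: {Slug: %q, Character: %q},\n", k, e.Slug, e.Character)
	}
	buf.WriteString("}\n")

	return format.Source(buf.Bytes())
}

func main() {
	src, err := render([]Emoji{
		{Slug: "clapping-hands", Character: "👏"},
		{Slug: "keycap", Character: "*\uFE0F\u20E3"},
	})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(string(src))
}
```

A real generator would write the result to data.go instead of printing it, and would fill in the remaining Emoji fields from the fetched data.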

Conclusion 💡

As software engineers, we encounter problems every day. The best solution isn’t always the first one that comes to mind or the one found on Stack Overflow.

So if you are stuck with emojis, consider using the simple and useful GoMoji lib. It can help you not only validate texts but also build great features in your chat app.


Written by vgukasov | Senior SWE at Akma Trading
Published by HackerNoon on 2021/09/08