How I Made My Own Library of HD GIFs

Written by sameen-ahmad | Published 2020/06/12
Tech Story Tags: javascript | programming | software-engineering | rabbitmq | technology | gif | software-development | nodejs

TLDR The idea behind the project was to experiment with new technologies with minimal investment. I used Amazon’s Elastic Compute Cloud (EC2) to launch my application: nothing but a virtual machine in the cloud that I could operate from my house. I used RabbitMQ to queue GIF-processing in the background and filter out the HD ones. The project grew to excite me a lot, and I couldn’t stop thinking about how big I could make it, though I met a set of problems along the way. via the TL;DR App

Back when I was learning to code on my own, I had the liberty to pick up and explore random new technologies and build small CLIs and Chrome extensions for myself.
Every mini project I took on had an interesting lesson to teach. But eventually, like everyone else around me, I landed in a regular company and, due to a pretty hectic work schedule, couldn’t continue with my independent explorations.
But now, fortunately or unfortunately, the COVID-19 situation has us all stuck at home, working remotely. This has given me time to go back to the things I loved doing most and build something exciting and, at the same time, useful.
Anyway, I love GIFs! I am always using them while chatting with my friends. But the idea to create an entire GIF hub entered my mind when one of my friends complained about how she couldn’t find great GIFs in HD quality. She had to visit several different websites just to come up with one decent GIF!
I was disappointed too! But at that moment the idea of building a resourceful, high-quality GIF hub struck me. I knew it would be another cool project! I dug through all the resources providing GIFs (the GIPHY API, Tenor, etc.) and found the same pattern everywhere: each GIF had a preview URL and a main URL available in a variety of sizes. Now, the only problem was how to filter them.
My mentor always says: solve the problem on paper first, then go about its practicality. This helped me break the problem down as follows:
  1. Which of these sizes should I select if I filter through my own Node application?
  2. Which JS library should I use to process my images?
  3. Where do I store the processed GIFs?
  4. How do I streamline everything (1, 2, and 3)?
  5. How do I create a collection of HD GIFs?
Picking one problem at a time, I decided to analyze GIPHY’s response to its search API for different keywords. The response was… well, huge: every GIF came back with a deeply nested images object listing dozens of renditions. *Takes a deep breath*
So, I decided to extract one preview URL and one main URL per GIF from the response, which would give me an array of GIF URLs and their respective preview URLs in a condensed form.
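Roughly, the extraction looked like this. This is a minimal sketch assuming GIPHY’s standard v1 search endpoint; the downsized_medium and preview_gif renditions are my stand-ins for whichever sizes you settle on:

```javascript
// Condense GIPHY's verbose search response into { id, gifUrl, previewUrl } triples.
const axios = require('axios');

async function fetchGifUrls(keyword, apiKey) {
  const { data } = await axios.get('https://api.giphy.com/v1/gifs/search', {
    params: { api_key: apiKey, q: keyword, limit: 50 },
  });

  return data.data.map((gif) => ({
    id: gif.id,
    gifUrl: gif.images.downsized_medium.url, // the "main" URL
    previewUrl: gif.images.preview_gif.url,  // the small preview
  }));
}
```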
I made sure that neither of them was too small, which would have meant deteriorated quality, nor too large to process at a time.
My next challenge was how to process these GIFs and set up well-defined criteria for HD quality ones.
I started exploring image-processing libraries like Jimp and Sharp, and finally settled on image-size. Since my only aim was to check the size of each image, I chose a lightweight library that would solve exactly that problem.
My next move was to decide what size of GIF would qualify as HD. Again, they needed to be light enough to be processed easily and, at the same time, look like an HD image while rendering. After a few iterations, I decided to keep 480 x 360 as the qualifying criterion for my HD GIFs.
Wrapping things up, I was able to fetch images from the API, extract an average-sized URL from the elaborate response, run it through image-size, and filter out the HD ones. Simple enough?
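The filter itself fits in a few lines. The helper name and the download step below are mine, but the 480 x 360 cut-off is the criterion described above:

```javascript
// Download a candidate GIF and keep it only if it meets the HD cut-off.
const axios = require('axios');
const sizeOf = require('image-size');

const HD_WIDTH = 480;
const HD_HEIGHT = 360;

async function isHd(gifUrl) {
  const res = await axios.get(gifUrl, { responseType: 'arraybuffer' });
  const { width, height } = sizeOf(Buffer.from(res.data));
  return width >= HD_WIDTH && height >= HD_HEIGHT;
}
```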
Now the next big challenge was where to store these GIFs. Since the data had a defined schema (a main URL, a preview URL, and a corresponding keyword), I decided to use a MySQL database.
After a few searches, I got a free AWS RDS (Amazon Relational Database Service) deal providing 20 GB of MySQL storage. This was good for starting my mini-library with minimal investment. 🎉
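The first version of the table was as simple as the schema above. Here’s a sketch of creating it from Node with mysql2; the column names are my guesses at "main URL, preview URL, keyword":

```javascript
const mysql = require('mysql2/promise');

async function createGifTable(connection) {
  // One row per processed GIF, mirroring the schema described above.
  await connection.query(`
    CREATE TABLE IF NOT EXISTS gifs (
      id INT AUTO_INCREMENT PRIMARY KEY,
      gif_url VARCHAR(512) NOT NULL,
      preview_url VARCHAR(512) NOT NULL,
      keyword VARCHAR(128) NOT NULL
    )
  `);
}
```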
It was my first time with AWS, so I started exploring their other free-tier resources. The idea behind the project was to experiment with new technologies with minimal investment, so I decided to use Amazon’s Elastic Compute Cloud (EC2) to launch my application. An EC2 instance is nothing but a virtual machine in the cloud that I could operate from my house.
By this time, the project had started to excite me a lot, and I couldn’t stop thinking about how big I could make it. 🎉
But again, I met a set of problems before I could push the GIFs to the database. I needed to implement a process where I could move GIF-processing out of the HTTP request and continue it in the background.
We used RabbitMQ to queue emails at our workplace, where the HTTP response completed first and the emails were sent to a queue, to be delivered later.
That’s when I decided to put RabbitMQ to use to queue my GIF-processing.
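The producer side of that pattern is small. Here’s a sketch using amqplib; the queue name and message shape are illustrative, not from my actual code. The HTTP handler just enqueues a job and returns, and the real work happens elsewhere:

```javascript
const amqp = require('amqplib');

let channel;

// Run once at startup: connect to the broker and declare the queue.
async function initQueue() {
  const conn = await amqp.connect('amqp://localhost');
  channel = await conn.createChannel();
  await channel.assertQueue('gif-processing', { durable: true });
}

// Called inside the HTTP handler: enqueue the job and return immediately.
function enqueueGif(job) {
  channel.sendToQueue('gif-processing', Buffer.from(JSON.stringify(job)), {
    persistent: true, // survive a broker restart
  });
}
```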
RabbitMQ acts as a postman for your services, so I will explain it using that analogy. Imagine you have a postbox with a pile of letters in it, and these letters need to be delivered to different consumers. RabbitMQ is the postman who picks these letters from your postbox and delivers them one by one to the required locations. Now, what are these locations in a real-world application?
In a real-world application, you often need to process certain PDFs, images, or videos in the background. So you queue these images (the letters) in RabbitMQ and ask the rabbit to deliver them one by one (imagine the postman with the letters now).
RabbitMQ has its own consumers (the locations) that receive these images (the letters). Being the owner of your application, you can decide what process to perform upon receiving each image.
You can launch as many consumers as you want, depending on how fast you want the messages processed. To use the analogy again: if I have 100 letters and assign 20 postmen to deliver them, the letters get delivered much faster than a single man could manage. Note that these workers consume your CPU; make sure you don’t launch so many workers that they start hogging your system.
For my mini-library, I launched 3 workers to process my GIFs. Imagine all of them running in parallel; that’s the power of RabbitMQ.
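A single worker is sketched below; starting this script three times gives the three parallel consumers I mentioned. prefetch(1) hands each worker one message at a time, and processGif is a hypothetical helper wrapping the image-size check and the DB insert:

```javascript
const amqp = require('amqplib');

async function startWorker() {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();
  await channel.assertQueue('gif-processing', { durable: true });
  channel.prefetch(1); // one unacknowledged message per worker

  channel.consume('gif-processing', async (msg) => {
    const job = JSON.parse(msg.content.toString());
    await processGif(job); // hypothetical: check dimensions, insert into MySQL
    channel.ack(msg);      // only ack once the GIF is fully processed
  });
}

startWorker();
```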
I found the RabbitMQ image on Docker Hub, through which I was able to set up RabbitMQ for my project. I decided to use its management plugin, which made it easy to monitor the queues through a browser-based UI. Cool stuff: I was able to see my GIFs being processed in the browser itself. 💯
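If you want to reproduce that setup, the standard management-plugin image boils down to one command (the container name is arbitrary; 5672 is the broker port and 15672 serves the management UI):

```bash
docker run -d --name gif-rabbit \
  -p 5672:5672 -p 15672:15672 \
  rabbitmq:3-management
```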
Now the next step was to connect my Amazon RDS to my Node application. I decided to use MySQL Workbench to make things easier for me, with a remote SSH connection to my database.
Excited to move further, I tried connecting the DB to my Node application. But whoops! It failed. How would I set up that SSH connection for a Node application running on my local machine?
After digging through the documentation, I found that my RDS security group rules only allowed connections from a whitelisted IP. So I decided to whitelist my EC2 instance’s IP, since my application would in turn be running on EC2, and its IP wouldn’t change the way my local machine’s does. That would save me the hassle of updating the IP every time I wanted to use it. Everything was in place: my EC2 instance’s IP whitelisted in RDS, and an SSH connection in MySQL Workbench.
So many moving parts!
But my Node application still wouldn’t know about my little trickery. I knew I had to tunnel the connection to fool my Node application. I made a few Google searches on how to tunnel my SQL connection to RDS, and finally, after a lot of attempts at switching ports and explicitly defining localhosts, I was successful. I was able to insert my processed GIFs into RDS. (This was a pats-her-own-back moment.) 🎉
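The trick, in sketch form: open an SSH connection to the EC2 box (whose IP is whitelisted in the RDS security group), forward a stream to RDS on port 3306, and hand that stream straight to mysql2. This uses the ssh2 package; the hostnames, user, and key path are all placeholders:

```javascript
const fs = require('fs');
const mysql = require('mysql2');
const { Client } = require('ssh2');

const ssh = new Client();
ssh
  .on('ready', () => {
    ssh.forwardOut(
      '127.0.0.1', 12345,                   // local end of the tunnel
      'my-db.xxxx.rds.amazonaws.com', 3306, // RDS endpoint, reachable from EC2
      (err, stream) => {
        if (err) throw err;
        const db = mysql.createConnection({
          user: 'admin',
          password: process.env.DB_PASSWORD,
          database: 'gif_library',
          stream, // mysql2 talks over the forwarded stream instead of plain TCP
        });
        db.query('SELECT 1', (e, rows) => console.log(e || rows));
      }
    );
  })
  .connect({
    host: 'ec2-xx-xx-xx-xx.compute.amazonaws.com',
    username: 'ec2-user',
    privateKey: fs.readFileSync('/path/to/key.pem'),
  });
```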
It was ready. I could create a collection of a million GIFs that people could use.
Now, since everything was in place, I started thinking about what optimizations to make, as well as a strategy to populate my library.
While observing my manually populated data, I realized all the GIFs shared the same URL pattern; they were differentiated only by their respective gif_id. So I thought, why not store just the gif_id with the keyword and build the URL on the fly? With my limited storage, this seemed like a good idea. 💁
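Rebuilding the URL then becomes a one-liner. I’m assuming GIPHY’s usual media-URL pattern here; verify it against your own stored data before relying on it:

```javascript
// Store only the id; reconstruct the full media URL when serving.
const gifUrl = (gifId) => `https://media.giphy.com/media/${gifId}/giphy.gif`;
```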
Filtered Data with Unique gif_ids
GIFs matched to cool and awesome keywords were ready. But what was I to do with them now?
  1. How do I populate the library? Do I build a GIF-API wrapper and process the HD ones in the background? That alone would take forever to build my library 😐
  2. What do I do with the keywords? Where do I store them?
  3. What do I do with GIFs that belong to multiple categories?
Phew. This had seemed like a 3-day job in the beginning, yet it was already 3 weeks in, with my full-time job running alongside, and the project didn’t seem to be coming to an end.
(Replays the coding mantra in my heart: KEEP IT SIMPLE, STUPID. And use your pen.)
And here it was: a separate table for keywords, holding unique keywords only, each with a corresponding id that would be mapped to my GIFs. That way, the same gif_id could have multiple `keyword_ids`. I used GIPHY’s categories endpoint to create the list of keywords in my library. PERFECT. My keywords were sorted, and so were the GIFs.
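In sketch form, the normalized layout looks like this (same mysql2 setup as before; the names are illustrative):

```javascript
// Unique keywords in their own table, plus a mapping table so one gif_id
// can carry several keyword_ids.
async function createKeywordTables(connection) {
  await connection.query(`
    CREATE TABLE IF NOT EXISTS keywords (
      keyword_id INT AUTO_INCREMENT PRIMARY KEY,
      keyword VARCHAR(128) NOT NULL UNIQUE
    )
  `);
  await connection.query(`
    CREATE TABLE IF NOT EXISTS gif_keywords (
      gif_id VARCHAR(64) NOT NULL,
      keyword_id INT NOT NULL,
      PRIMARY KEY (gif_id, keyword_id),
      FOREIGN KEY (keyword_id) REFERENCES keywords(keyword_id)
    )
  `);
}
```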
Now I needed a system that could get the pre-saved keywords from my DB, pick one, and keep calling my API so that my GIFs could be processed in the background. What easier way of doing it than setting up a CRON job, as sketched below!
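A minimal version of that job, using node-cron (my choice here; the schedule, endpoint, and credentials are illustrative):

```javascript
const cron = require('node-cron');
const axios = require('axios');
const mysql = require('mysql2/promise');

async function main() {
  const db = await mysql.createConnection({
    host: 'localhost', // or the tunnelled RDS connection from earlier
    user: 'admin',
    password: process.env.DB_PASSWORD,
    database: 'gif_library',
  });

  // Every 10 minutes: pick a random stored keyword and hit the processing API,
  // so GIFs keep flowing onto the queue in the background.
  cron.schedule('*/10 * * * *', async () => {
    const [rows] = await db.query(
      'SELECT keyword FROM keywords ORDER BY RAND() LIMIT 1'
    );
    await axios.get('http://localhost:3000/process', {
      params: { keyword: rows[0].keyword },
    });
  });
}

main();
```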
Obviously, I had to use some ethical hacks to avoid my API key being blocked by GIPHY, and here I was with 3 lakh (300,000) GIFs within 10 days :D
I can’t wait for the collection to reach a million GIFs so I can build a REST API on top of it for the general public. 💛
Here’s my mini dashboard to let you browse the collection:
GIF THIEF 🙋
My next few challenges include adding Tenor and other platform support, as well as keeping my library updated with new GIFs.

Written by sameen-ahmad | Developing Software Developer :D
Published by HackerNoon on 2020/06/12