How to Perform Data Augmentation with the AugLy Library

Written by davisdavid | Published 2021/08/04

TL;DR: AugLy is an open-source data augmentation library from Facebook that can help you evaluate and improve the robustness of your models. It was developed by Joanna Bitton (Software Engineer at Facebook AI), Zoe Papakipos (Research Engineer at FAIR), and other researchers and engineers at Facebook. The library supports four modalities (audio, video, image, and text) and contains over 100 augmentations. It has been used in projects such as the Image Similarity Challenge, a NeurIPS 2021 competition run by Facebook AI with $200k in prizes.

In machine learning and deep learning, having more data usually helps your models perform well. You can create more data with a technique called data augmentation, in which practitioners enlarge a dataset by creating modified copies of the existing data.
“We don’t have better algorithms. We just have more data.” - Peter Norvig
It is good practice to use data augmentation if you have a small dataset for your project or you want to reduce overfitting in your machine learning (ML) or deep learning (DL) models.
In this article, you will learn how to perform data augmentation by using a new open-source library from Facebook called AugLy.
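To make the idea of "creating modified data from the existing data" concrete, here is a minimal sketch in plain Python. It does not use AugLy; the helper names `swap_two_words` and `augment_dataset` are made up for this illustration, and swapping two words is just one simple transformation chosen for brevity.

```python
import random


def swap_two_words(text, rng):
    """Return a copy of `text` with two randomly chosen words swapped."""
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def augment_dataset(samples, copies_per_sample=2, seed=0):
    """Keep every original (text, label) pair and add modified copies of it."""
    rng = random.Random(seed)
    augmented = list(samples)
    for text, label in samples:
        for _ in range(copies_per_sample):
            # The label is carried over unchanged to the modified copy.
            augmented.append((swap_two_words(text, rng), label))
    return augmented


dataset = [("the movie was great", "positive"), ("the plot was boring", "negative")]
bigger = augment_dataset(dataset)
print(len(dataset), "->", len(bigger))  # prints: 2 -> 6
```

The originals stay in the dataset and each one contributes two extra samples, so two examples become six.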

What is AugLy?

AugLy is a data augmentation library that can help you evaluate and improve the robustness of your models. The library supports four modalities (audio, video, image, and text) and contains over 100 augmentations.
If you are working on a machine learning or deep learning project that uses audio, video, image, or text datasets, you can use this library to enlarge your data and improve your model's performance.
The library was developed by Joanna Bitton (Software Engineer at Facebook AI), Zoe Papakipos (Research Engineer at FAIR), and other researchers and engineers at Facebook.
The library has been used in different projects such as:
  • Image Similarity Challenge - a NeurIPS 2021 competition run by Facebook AI with $200k in prizes. It has produced the DISC21 dataset, which will be made publicly available after the challenge concludes!
  • DeepFake Detection Challenge - a Kaggle competition run by Facebook AI in 2020 with $1 million in prizes; also produced the DFDC dataset.
  • SimSearchNet - a near-duplicate detection model developed at Facebook AI to identify infringing content on the platforms.

How to Install AugLy

AugLy is a Python 3.6+ library. It can be installed with:
pip install augly
Note: The above command installs only the base requirements needed for the image and text modalities. For the audio and video modalities, you can install the extra dependencies with:
pip install augly[av]
In some environments, pip doesn't install python-magic as expected. In that case, you will need to additionally run:
conda install -c conda-forge python-magic
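After installing, you may want to confirm that the packages are visible to your Python environment. The sketch below uses only the standard library; `is_installed` is a helper name invented for this example, and `magic` is the import name of the python-magic package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("augly", "magic"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If `magic` is reported as missing, that is the case where the conda command above applies.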

Data Augmentation Techniques for Text Data

The first step is to import the text module, which contains the augmentation techniques for text data.
import augly.text as textaugs
Then create a simple text input.
# Define input text
input_text = "Hello, world! Today we learn Data Augmentation techniques"
Now we can apply various augmentations as follows:

(a) Simulate Typos

This function simulates typos in each text using misspellings, keyboard distance, and character swapping.
print(textaugs.simulate_typos(input_text))
Hello, world! Today ew leanr Dtaa Augmentation techniques
As you can see, this technique misspells some of the words in the text by swapping characters inside them.
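The character-swapping part of this idea can be sketched in a few lines of plain Python. This is not AugLy's implementation; `swap_typos` and its `word_p` parameter are hypothetical names used only here, and misspelling dictionaries and keyboard distance are left out.

```python
import random


def swap_typos(text, word_p=0.3, seed=None):
    """Swap two adjacent characters inside a random subset of words."""
    rng = random.Random(seed)
    out = []
    for word in text.split(" "):
        # Only touch words that have at least two characters to swap.
        if len(word) > 1 and rng.random() < word_p:
            i = rng.randrange(len(word) - 1)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)


input_text = "Hello, world! Today we learn Data Augmentation techniques"
print(swap_typos(input_text, word_p=0.3, seed=42))
```

Raising `word_p` toward 1.0 corrupts more words, which is a simple way to control how noisy the augmented text becomes.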

(b) Insert Punctuation Chars

This function inserts punctuation characters into each input text. In the output below, a comma has been inserted between every pair of adjacent characters.
print(textaugs.insert_punctuation_chars(input_text))
['H,e,l,l,o,,, ,w,o,r,l,d,!, ,T,o,d,a,y, ,w,e, ,l,e,a,r,n, ,D,a,t,a, ,A,u,g,m,e,n,t,a,t,i,o,n, ,t,e,c,h,n,i,q,u,e,s']
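The effect seen in the output above can be reproduced with a single string join. This is only a sketch of that one configuration (a comma between every character), not AugLy's implementation; `insert_between_chars` is a name made up for this example.

```python
def insert_between_chars(text, char=","):
    """Insert `char` between every pair of adjacent characters in `text`."""
    return char.join(text)


input_text = "Hello, world! Today we learn Data Augmentation techniques"
augmented = insert_between_chars(input_text)
print(augmented)

# Every second character of the result is an original character,
# so the original text can be recovered by slicing.
print(augmented[::2] == input_text)
```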

(c) Replace Bidirectional

This technique reverses each input text (or each word in it) and uses bidirectional marks so that the text still renders in its original order. Reversing each word separately keeps the word order intact even when a line wraps. In the output below, the whole string has been reversed and wrapped in the marks U+202E and U+202C.
print(textaugs.replace_bidirectional(input_text))
['\u202eseuqinhcet noitatnemguA ataD nrael ew yadoT !dlrow ,olleH\u202c']
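The output above can be rebuilt by hand, which shows what the two marks do: U+202E (RIGHT-TO-LEFT OVERRIDE) tells the renderer to draw the following characters right to left, and U+202C (POP DIRECTIONAL FORMATTING) ends that override. The function name below is hypothetical, and the sketch covers only the whole-string case.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def reverse_with_bidi_marks(text):
    """Reverse the stored characters and wrap them in bidi override marks."""
    return RLO + text[::-1] + PDF


input_text = "Hello, world! Today we learn Data Augmentation techniques"
augmented = reverse_with_bidi_marks(input_text)
print(repr(augmented))
```

The stored characters differ from the original, but a renderer that honors the marks displays them in the original reading order.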

(d) Replace Similar Characters 

This replaces letters in each text with similar characters.
print(textaugs.replace_similar_chars(input_text))
Hello, wor7d! T()day we learn Data Augm3^tati[]n techniques
As you can see, the character “l” has been replaced with the number 7, one “o” with “()”, “e” with the number 3, “n” with “^”, and another “o” with “[]”.
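A rough sketch of the same idea in plain Python follows. The mapping table is hand-picked from the replacements visible in the output above and is not AugLy's own table; `replace_similar` and `char_p` are names invented for this example.

```python
import random

# Hypothetical look-alike table for illustration only.
SIMILAR = {"l": ["7", "1"], "o": ["()", "[]", "0"], "e": ["3"], "n": ["^"]}


def replace_similar(text, char_p=0.3, seed=None):
    """Replace some characters with a randomly chosen look-alike."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        options = SIMILAR.get(ch)
        if options and rng.random() < char_p:
            ch = rng.choice(options)
        out.append(ch)
    return "".join(out)


input_text = "Hello, world! Today we learn Data Augmentation techniques"
print(replace_similar(input_text, char_p=0.3, seed=7))
```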

(e) Replace Upside Down

This flips words in the text upside down depending on the granularity.
print(textaugs.replace_upside_down(input_text))
sǝnbᴉuɥɔǝʇ uoᴉʇɐʇuǝɯɓnⱯ ɐʇɐᗡ uɹɐǝl ǝʍ ʎɐpoꞱ ¡plɹoʍ 'ollǝH

(f) Split Words 

This function splits words in the text into subwords.
print(textaugs.split_words(input_text))
He llo, world! To day we learn Data Augmentation techniques
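A simple way to picture this augmentation is to insert a space at a random position inside some of the words. The sketch below does exactly that; it is an assumption about the general idea rather than AugLy's implementation, and `split_some_words` and `word_p` are hypothetical names.

```python
import random


def split_some_words(text, word_p=0.3, seed=None):
    """Insert a space inside a random subset of the words in `text`."""
    rng = random.Random(seed)
    out = []
    for word in text.split(" "):
        if len(word) > 1 and rng.random() < word_p:
            # Cut strictly inside the word, never at its start or end.
            cut = rng.randrange(1, len(word))
            word = word[:cut] + " " + word[cut:]
        out.append(word)
    return " ".join(out)


input_text = "Hello, world! Today we learn Data Augmentation techniques"
print(split_some_words(input_text, word_p=0.3, seed=3))
```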

Data Augmentation Techniques for Image Data

The first step is to import the image module, which contains the augmentation techniques for image data, along with a few helper modules.
import os
import augly.image as imaugs
import augly.utils as utils
from IPython.display import display
Now we can apply various augmentations as follows:

(a) Image Scaling

The scale function alters the resolution of an image. Use the factor argument to set the ratio by which the image is downscaled (below 1.0) or upscaled (above 1.0).
input_img_path = "images/simple-image.jpg"

# We can use the AugLy scale augmentation

input_img = imaugs.scale(input_img_path, factor=0.2)
display(input_img)
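To see what a factor of 0.2 means for the output resolution, here is a small arithmetic sketch. It assumes both sides are multiplied by the factor and truncated to whole pixels; the exact rounding AugLy uses may differ, and `scaled_size` is a name invented for this example.

```python
def scaled_size(size, factor):
    """Return the (width, height) obtained by scaling both sides by `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    width, height = size
    # Assumption: truncate to whole pixels, but never go below 1 pixel.
    return max(1, int(width * factor)), max(1, int(height * factor))


print(scaled_size((1920, 1080), 0.2))  # (384, 216)
print(scaled_size((1920, 1080), 2.0))  # (3840, 2160)
```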

(b) Blur the Image

In this function, the larger the radius, the blurrier the image.
input_img = imaugs.blur(input_img, radius=5.0)
display(input_img)

(c) Change the Brightness of the Image

To change the brightness you need to adjust the factor argument in this function. Values less than 1.0 darken the image and values greater than 1.0 brighten the image. Setting the factor to 1.0 will not alter the image's brightness.
Let's set the factor to 1.5 to make the image brighter.
input_img = imaugs.brightness(input_img, factor=1.5)
display(input_img)
Then let’s set the factor to 0.5 to make it darker. Note that this call runs on the already brightened image from the previous step.
# Make it darker
input_img = imaugs.brightness(input_img, factor=0.5)
display(input_img)
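The effect of the factor can be sketched on a handful of 8-bit pixel values. This assumes brightness is a per-pixel multiplication clipped to the range 0 to 255, which is a simplification for illustration and not a statement about AugLy's internals; `adjust_brightness` is a hypothetical helper.

```python
def adjust_brightness(pixels, factor):
    """Scale 8-bit pixel values by `factor`, clipping the result to 0..255."""
    return [max(0, min(255, int(v * factor))) for v in pixels]


row = [40, 100, 200]
brighter = adjust_brightness(row, 1.5)
print(brighter)  # [60, 150, 255]

# Darkening the brightened row does not bring back the original values:
# the net factor is 0.75, and the clipped pixel has lost information.
print(adjust_brightness(brighter, 0.5))  # [30, 75, 127]
print(adjust_brightness(row, 0.5))       # [20, 50, 100]
```

Under this assumption, chaining augmentations on the same variable compounds their effects, so start again from the original image if you want to compare factors side by side.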

(d) Change the Aspect Ratio of the Image

In this function, the ratio argument is the width-to-height ratio of the new image you want to create.
input_img = imaugs.change_aspect_ratio(input_img, ratio=0.8)
display(input_img)

(e) Alter the Contrast of the Image

In this function, the factor argument controls everything: a factor of zero gives a grayscale image, values below 1.0 decrease the contrast, a factor of 1.0 gives the original image, and a factor greater than 1.0 increases the contrast.
input_img = imaugs.contrast(input_img, factor=1.7)
display(input_img)
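One common way to think about a contrast factor is as a stretch of each pixel's distance from the mean intensity. The sketch below applies that linear formula to a few gray values; it is an assumption made for illustration, not AugLy's code, and `adjust_contrast` is a made-up name.

```python
def adjust_contrast(pixels, factor):
    """Move each 8-bit value away from (or toward) the mean by `factor`."""
    mean = sum(pixels) / len(pixels)
    return [max(0, min(255, round(mean + factor * (v - mean)))) for v in pixels]


row = [50, 100, 150]
print(adjust_contrast(row, 0.0))  # [100, 100, 100]
print(adjust_contrast(row, 1.0))  # [50, 100, 150]
print(adjust_contrast(row, 1.7))  # [15, 100, 185]
```

Under this formula a factor of 0 collapses every pixel to the mean, 1.0 leaves the values alone, and 1.7 spreads them further apart.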

(f) Crop the Image 

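To crop the image, you need to define the positions of the left, top, right, and bottom edges of the cropped image. Here x1 and x2 are the left and right edges and y1 and y2 are the top and bottom edges, each given as a fraction of the image's width or height.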
input_img = imaugs.crop(input_img,
                        x1=0.25,
                        x2=0.75,
                        y1=0.25,
                        y2=0.75
                        )
display(input_img)
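To see which pixels the values 0.25 and 0.75 select, the fractions can be converted to a pixel box. This sketch assumes the coordinates are fractions of the width and height and truncates to whole pixels; `crop_box` is a hypothetical helper, not part of AugLy.

```python
def crop_box(size, x1=0.25, y1=0.25, x2=0.75, y2=0.75):
    """Convert relative crop edges into a (left, top, right, bottom) pixel box."""
    if not (0 <= x1 < x2 <= 1 and 0 <= y1 < y2 <= 1):
        raise ValueError("expected 0 <= x1 < x2 <= 1 and 0 <= y1 < y2 <= 1")
    width, height = size
    return int(width * x1), int(height * y1), int(width * x2), int(height * y2)


print(crop_box((800, 600)))  # (200, 150, 600, 450)
```

With these values the crop keeps the central half of the image in each dimension, so an 800x600 image becomes 400x300.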

Final Thoughts on Data Augmentation with the AugLy Library

In this article, you have learned why data augmentation matters in your ML or DL project. You have also learned how to perform data augmentation with the AugLy library for image and text data.
As mentioned earlier, the library has over 100 augmentation techniques, and most of them were not covered in this article.
If you want to learn how to perform data augmentation for audio and video data, please read the README for each modality!
If you learned something new or enjoyed reading this article, please share it so that others can see it. Until then, see you in the next post!
You can also find me on Twitter @Davis_McDavid.
And you can read more articles like this here.

Written by davisdavid | Data Scientist | AI Practitioner | Software Developer| Technical Writer
Published by HackerNoon on 2021/08/04