GAN Training: Generate Images Following a Sketch

Written by whatsai | Published 2021/08/14
Tech Story Tags: artificial-intelligence | ai | machine-learning | tech | deep-learning | image-synthesis | gans | generative-adversarial-network | web-monetization

TLDR: Machine learning models can now generate new images based on those they have seen in an existing set of images. This type of architecture is called a generative adversarial network, or GAN. You can control your GAN's output with the simplest type of knowledge you could provide it: hand-drawn sketches. This new method by Sheng-Yu Wang et al. from Carnegie Mellon University and MIT, called Sketch Your Own GAN, can take an existing model, for example a generator trained to produce images of cats, and control its output based on your sketches.

Make GAN training easier for everyone by generating images that follow a sketch!
Indeed, with this new method, you can control your GAN's outputs based on the simplest type of knowledge you could provide it: hand-drawn sketches.

Watch the video

Video Transcript

Machine learning models can now generate new images based on what they have seen in an existing set of images. We can't really say that the model is creative: even though each image is indeed new, the results are always highly inspired by similar photos it has seen in the past. This type of architecture is called a generative adversarial network, or GAN. If you already know how GANs work, you can skip ahead to see what the researchers did. If not, I'll quickly go over how it works.

This powerful architecture basically takes a bunch of images and tries to imitate them. There are typically two networks: the generator and the discriminator. Their names are pretty informative... The generator tries to generate a new image, and the discriminator tries to discriminate such images. The training process goes as follows: the discriminator is shown either an image coming from our training dataset, which is our set of real images, or an image made by the generator, called a fake image. Then, the discriminator tries to say whether the image was real or fake. If it guessed real when the image was actually fake, we say that the discriminator has been fooled, and we update its parameters to improve its detection ability for the next try. Conversely, if the discriminator guessed right, saying the image was fake, the generator is penalized and updated the same way, thus improving the quality of future generated images. This process is repeated over and over until the discriminator is fooled half the time, meaning that the generated images are very similar to what we have in our real dataset. The generated images then look like they were picked from our dataset, sharing the same style.
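
To make that loop concrete, here is a minimal, self-contained PyTorch sketch of the alternating training process just described. The tiny MLP networks and the random stand-in for real images are illustrative placeholders only, not anything from the paper:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the two networks: any generator/discriminator pair
# with matching shapes would fit this loop.
latent_dim, img_dim = 64, 28 * 28
generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                          nn.Linear(256, img_dim), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                              nn.Linear(256, 1))  # outputs a real/fake logit

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()
ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

for step in range(1000):
    real = torch.rand(32, img_dim) * 2 - 1  # placeholder for a real batch
    noise = torch.randn(32, latent_dim)

    # 1) Discriminator update: push real images toward "real" (1)
    #    and generated images toward "fake" (0).
    fake = generator(noise).detach()
    loss_d = bce(discriminator(real), ones) + bce(discriminator(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Generator update: the generator improves when the discriminator
    #    mistakes its output for a real image.
    loss_g = bce(discriminator(generator(noise)), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

When training goes well, the two losses settle into a balance where the discriminator is right only about half the time.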

If you'd like more details about how a generator and a discriminator model work and what they look like on the inside, I'd recommend watching one of the many videos I made covering them, like the one appearing in the top right corner right now...

The problem here is that this process has been a black box for a while, and such models are extremely difficult to train, especially when you want to control what kind of images are generated. There has been a lot of progress in understanding which part of the generator network is responsible for what. Still, traditionally, building a model with control over the generated images' style to produce what we want, like generating images of cats in a specific position, needs specialized knowledge in deep learning, engineering work, patience, and a lot of trial and error. It would also need a lot of manually curated image examples of what you aim to generate, and a great understanding of how the model works to adapt it correctly to your own needs. And you would have to repeat this process for any change you would like to make.

Instead, this new method by Sheng-Yu Wang et al. from Carnegie Mellon University and MIT, called Sketch Your Own GAN, can take an existing model, for example a generator trained to generate new images of cats, and control the output based on the simplest type of knowledge you could provide it: hand-drawn sketches. Something anyone can do, making GAN training a lot more accessible. No more hard work and hours of model tweaking to generate the cat in the position you wanted by figuring out which part of the model is in charge of which component in the image! How cool is that? It surely at least deserves a like on this video and sending it to your group chat! ;)

Of course, there's nothing special about generating a cat in a specific position, but imagine how powerful this can be. It can take a model trained to generate anything and, from a handful of sketches, control what will appear while conserving the other details and the same style! It is an architecture to re-train a generator model, encouraging it to produce images with the structure provided by the sketches while preserving the original model's diversity and the maximum image quality possible. This is also called fine-tuning a model: you take a powerful existing model and adapt it to perform better on your task.
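
In code, fine-tuning often amounts to loading a pretrained checkpoint and continuing to optimize with a new objective and a small learning rate. Here is a hedged sketch reusing the toy generator from above, with a hypothetical checkpoint path, not the authors' actual setup:

```python
import torch
import torch.nn as nn

# Same toy generator shape as before; in the actual paper the starting
# point is a pretrained StyleGAN2 generator.
generator = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                          nn.Linear(256, 28 * 28), nn.Tanh())

# Hypothetical checkpoint path: start from learned weights, not from scratch.
generator.load_state_dict(torch.load("pretrained_cat_generator.pt"))

# A small learning rate nudges the model toward the new task (matching the
# user's sketches) without erasing what it already knows about cats.
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
```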

Imagine you really wanted to build a gabled church but didn't know the colors or the specific architecture. Just send a sketch to the model and get infinite inspiration for your creation! Of course, this is still early research, and the model will always follow the style of the dataset used to train the generator, but still, the images are all *new* and can be surprisingly beautiful!

But how did they do that? What have they figured out about generative models that can be taken advantage of to control the output? There are various challenges for such a task, like the amount of data and the model expertise needed. The data problem is fixed by starting from a model that was already trained and simply adapting it to our task using a handful of sketches, instead of the hundreds or thousands of sketch-and-image pairs typically needed.

To attack the expertise problem, instead of manually figuring out which changes to make to the model, they transform the generated image into a sketch representation using another model trained to do exactly that, called PhotoSketch. Then, the generator is trained similarly to a traditional GAN, but with two discriminators instead of one. The first discriminator controls the quality of the output, just like in a regular GAN architecture, following the same training process we described earlier. The second discriminator is trained to tell the difference between the generated sketches and the sketches made by the user, thus encouraging the generated images to match the structure of the user's sketches, similarly to how the first discriminator encourages the generated images to match the images in the initial training dataset. This way, the model figures out by itself which parameters to change to fit this new task of imitating the sketches, removing the model-expertise requirement for playing with generative models.
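
Putting those pieces together, here is a rough PyTorch sketch of the generator's objective with the two discriminators. All modules are simplified stand-ins (the paper itself uses a pretrained StyleGAN2 generator and the off-the-shelf PhotoSketch network), so treat this as an illustration of the idea rather than the authors' implementation:

```python
import torch
import torch.nn as nn

dim = 28 * 28
generator = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                          nn.Linear(256, dim), nn.Tanh())
d_image = nn.Sequential(nn.Linear(dim, 256), nn.LeakyReLU(0.2),
                        nn.Linear(256, 1))   # judges image realism
d_sketch = nn.Sequential(nn.Linear(dim, 256), nn.LeakyReLU(0.2),
                         nn.Linear(256, 1))  # judges sketch structure

# Stand-in for the frozen image-to-sketch network (PhotoSketch's role).
photosketch = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())
for p in photosketch.parameters():
    p.requires_grad_(False)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
ones = torch.ones(32, 1)

fake = generator(torch.randn(32, 64))
fake_sketch = photosketch(fake)  # map the generated image to the sketch domain

# The generator must fool BOTH discriminators at once:
#   d_image  -> keep looking like the original training images,
#   d_sketch -> match the structure of the user's hand-drawn sketches.
loss_g = bce(d_image(fake), ones) + bce(d_sketch(fake_sketch), ones)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

The two discriminators themselves are trained against real images and user sketches respectively, following the same pattern as the standard loop shown earlier.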

This field of research is exciting because it allows anyone to play with generative models and control their outputs. It is much closer to something useful in the real world than the initial models, where you would need a lot of time, money, and expertise to build a model able to generate such images. Instead, from a handful of sketches anyone can draw, the resulting model can produce an infinite number of new images that resemble the input sketches, allowing many more people to play with these generative networks.

Let me know what you think, and whether this seems as exciting to you as it is to me! If you'd like more detail on this technique, I'd strongly recommend reading their paper, linked in the references below! Thank you for watching.

References

►Read the full article: https://www.louisbouchard.ai/make-gans-training-easier/
►Sheng-Yu Wang et al., "Sketch Your Own GAN", 2021, https://arxiv.org/pdf/2108.02774v1.pdf
►Project link: https://peterwang512.github.io/GANSketching/
►Code: https://github.com/PeterWang512/GANSketching
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/  

Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.
Published by HackerNoon on 2021/08/14