SDEdit Helps Regular People Do Complex Graphic Design Tasks

Written by whatsai | Published 2021/08/10
Tech Story Tags: artificial-intelligence | technology | image-synthesis | image-generation | ai | machine-learning | computer-vision | hackernoon-top-story | web-monetization

TLDR: This new method by Chenlin Meng et al. from Stanford University and Carnegie Mellon University can generate new images from any user-based input. Even people like me with zero artistic skills can now generate beautiful images or modifications out of quick sketches. It may sound weird at first, but by simply adding noise to the input, the method smooths out undesirable artifacts, such as rough user edits, while preserving the overall structure of the image. The noisy input is then sent to the model, which reverses the process. Learn more in the video and watch the amazing results!

Say goodbye to complex GAN and transformer architectures for image generation. This new method by Chenlin Meng et al. from Stanford University and Carnegie Mellon University can generate new images from any user-based input.
Even people like me with zero artistic skills can now generate beautiful images or modifications out of quick sketches. It may sound weird at first, but by simply adding noise to the input, the method smooths out undesirable artifacts, such as rough user edits, while preserving the overall structure of the image.
The image then looks like complete noise, yet we can still make out the overall shapes, the strokes, and specific colors. This noisy input is sent to the model, which reverses the noising process and generates a new version of the image that follows this overall structure.
In other words, the result follows the overall shapes and colors of the input, but loosely enough that the model can create new features, like replacing a sketched beard with a real-looking one. Learn more in the video and watch the amazing results!
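
To make the idea concrete, here is a minimal, hypothetical Python sketch of that perturb-then-denoise loop. It is not the authors' released code: the `score_model` callable, the noise schedule, and the step sizes are all illustrative assumptions.

```python
# Minimal sketch of the SDEdit idea, NOT the authors' official implementation.
# Assumption: `score_model(x, sigma)` is a pretrained network that estimates
# the score (gradient of the log-density) of training images perturbed with
# Gaussian noise of standard deviation `sigma`.
import numpy as np

def sdedit_sketch(guide, score_model, t0=0.5, n_steps=100, sigma_max=10.0):
    """Noise a user-provided guide image up to an intermediate time t0,
    then denoise it back to t=0 with Langevin-style updates, so the output
    looks realistic while keeping the guide's overall structure."""
    # Geometric noise schedule, from sigma_max down to ~1% of it.
    sigmas = sigma_max * 0.01 ** np.linspace(0.0, 1.0, n_steps)
    start = int((1.0 - t0) * n_steps)  # larger t0 -> more noise, more freedom
    # Forward step: add Gaussian noise, hiding artifacts like rough strokes.
    x = guide + sigmas[start] * np.random.randn(*guide.shape)
    # Reverse steps: follow the learned score toward realistic images.
    for sigma in sigmas[start:]:
        step = 0.5 * sigma**2
        x = x + step * score_model(x, sigma)
        x = x + np.sqrt(step) * 0.1 * np.random.randn(*x.shape)  # small jitter
    return x

# Toy usage with a stand-in "score" that pulls pixels toward zero
# (corresponds to a standard-normal prior; purely for demonstration):
result = sdedit_sketch(np.zeros((8, 8)), lambda x, s: -x / (1.0 + s**2))
```

The key knob is t0: more noise gives the model more freedom to invent realistic detail, while less noise stays more faithful to your edit.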

Watch the video

References:

Meng, C., et al., "SDEdit: Image Synthesis and Editing with Stochastic Differential Equations," arXiv:2108.01073, 2021.
Code: https://github.com/ermongroup/SDEdit

Video Transcript

[00:00] Say goodbye to complex GAN and transformer architectures for image generation. This new method by Chenlin Meng et al. from Stanford University and Carnegie Mellon University can generate new images from any user-based inputs. Even people like me with zero artistic skills can now generate beautiful images or modifications out of quick sketches.
[00:21] It may sound weird at first, but just by adding noise to the input, they can smooth out the undesirable artifacts, like the user edits, while preserving the overall structure of the image. So the image now looks like this: complete noise, but we can still see some shapes of the image, the strokes, and specific colors.
[00:38] This new noisy input is then sent to the model to reverse this process and generate a new version of the image following this overall structure, meaning that it will follow the overall shapes and colors of the image, but not so precisely, which lets it create new features, like replacing the sketch with a real-looking beard.
[00:56] In the same way, you can send a complete draft of an image like this, add noise to it, and it will remove the noise by simulating the reverse steps. This way, it will gradually improve the quality of the generated image, following a specific dataset's style, from any input. This is why you don't need any drawing skills anymore: since it generates an image from noise, it has no idea of, and doesn't need to know, the initial input before the noise was applied.
[01:21] This is a big difference and a huge advantage compared to other generative networks like conditional GANs, where you train a model to go from one style to another with image pairs coming from two different but related datasets.
[01:34] By the way, if you find this interesting, don't forget to subscribe, like the video, and share it with your friends or colleagues. It helps a lot, thank you!

[01:40] This model, called SDEdit, uses stochastic differential equations, or SDEs, which means that by injecting Gaussian noise, they can transform any complex data distribution into a known prior distribution. This known prior distribution is seen during training, and it is what the model learns to reconstruct the image from. So the model learns how to transform a Gaussian noisy input into a less noisy image, and repeats this step until we have an image following the desired style.

[02:10] This method works with whatever type of input, because if you add enough noise to it, the image will become so noisy that it joins the known distribution. Then the model can take this known distribution and do the reverse steps, denoising the image based on what it was trained on.
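
For the mathematically inclined, here is a sketch of the standard score-SDE equations this family of models relies on (the variance-exploding formulation of Song et al.); sigma(t) is the noise schedule and s_theta the learned score network. This is the general framework, not a line-by-line restatement of the SDEdit paper:

```latex
% Forward (noising) SDE: its marginal at time t is the image plus Gaussian
% noise, x(t) ~ N(x(0), sigma(t)^2 I).
\mathrm{d}x = \sqrt{\tfrac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t}}\,\mathrm{d}w

% Reverse (denoising) SDE, guided by the learned score
% s_\theta(x, t) \approx \nabla_x \log p_t(x):
\mathrm{d}x = -\tfrac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t}\, s_\theta(x, t)\,\mathrm{d}t
            + \sqrt{\tfrac{\mathrm{d}[\sigma^2(t)]}{\mathrm{d}t}}\,\mathrm{d}\bar{w}

% SDEdit's trick: start the reverse SDE not from pure noise at t = 1, but from
% the user's guide image perturbed to an intermediate time t_0 in (0, 1).
```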
[02:28] Indeed, just like GANs, we need a target dataset, which is the kind of data or images we want to generate. For example, to generate realistic faces, we need a dataset full of realistic faces. Then we add noise to these face images and teach the model to denoise them iteratively. And this is the beauty of this model: once it has learned how to denoise an image, we can pretty much do anything to the image before adding noise to it, like adding strokes, since they get blended into the expected image distribution by the noise we are adding.
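
As a rough illustration of that training recipe, here is a hypothetical PyTorch-style sketch of one denoising training step; the model, noise range, and objective are placeholder choices in the spirit of denoising score matching, not the paper's exact setup.

```python
# Hypothetical sketch of one denoising training step (a common diffusion/score
# recipe, not the authors' exact code). Assumes `model(noisy, sigma)` returns
# a prediction of the noise that was added.
import torch

def training_step(model, faces, sigma_min=0.01, sigma_max=10.0):
    """Perturb real images at random noise levels and predict that noise."""
    # One noise level per image, log-uniform between the two extremes.
    u = torch.rand(faces.shape[0], 1, 1, 1, device=faces.device)
    sigma = sigma_min * (sigma_max / sigma_min) ** u
    noise = torch.randn_like(faces)
    noisy = faces + sigma * noise              # forward perturbation
    pred = model(noisy, sigma.flatten())       # model sees the noise level too
    # A model that can predict the noise can iteratively denoise at test time.
    return torch.mean((pred - noise) ** 2)
```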
[02:59] Typically, editing an image based on such strokes is a challenging task for a GAN architecture, since these strokes are extremely different from the image and from what the model has seen during training. A GAN architecture would need two datasets to fix this: the target dataset, which is the one we try to imitate, and a source dataset, which contains the images with strokes that we are trying to edit. These are called paired datasets because we need each image to come in pairs in both datasets to train our model on. We also need to define a proper loss function to train it, making the image synthesis process very expensive and time-consuming.
[03:40] In our case, with SDEdit, we do not need any paired datasets, since the stroke and image styles are merged because of this noise. This makes the new noisy image part of the known data for the model, which uses it to generate a new image very similar to the training dataset, but taking the new structure into account. In other words, it can easily take an edited image as input, blur it just enough, but not too much, to keep the global semantics and structural detail, and denoise it to produce a new image that magically takes your edits into account. And the model wasn't even trained with stroke or edit examples, only with the original images.
[04:20] Of course, in the case of a simple user edit, they carefully designed the architecture to only generate the edited part and not recreate the whole picture. This is super cool because it enables applications such as conditional image generation, stroke-based image synthesis and editing, image inpainting, colorization, and other inverse problems to be solved using a single unconditional model, without retraining it.
work
04:48
for only one generation style which will
04:50
be the data set it was trained on
04:52
however it's still a big advantage as
04:55
you only need one data set
04:56
instead of multiple related data sets
04:59
with a GAN based
05:00
image and painting network as we
05:02
discussed the only downside
05:04
may be the time needed to generate the
05:05
new image as
05:07
this iterative process takes much more
05:09
time than a single pass
05:10
through a more traditional gan based
05:12
generative model
05:13
still i'd rather wait a couple of
05:15
seconds to have
05:16
great results for an image than having a
05:18
blurry fail
05:19
in real time you can try it yourself
05:22
with the code they made publicly
05:23
available
05:24
or use the demo on their website both
05:26
are linked in the description
05:28
let me know what you think of this model
05:30
i'm excited to see what will happen with
05:32
this
05:32
sd based method in a couple of months or
05:35
even less
05:36
thank you for watching

Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.
Published by HackerNoon on 2021/08/10