StyleCLIPDraw: Text-to-Drawing Synthesis with Artistic Control

Written by whatsai | Published 2021/11/15
Tech Story Tags: ai | artificial-intelligence | computer-vision | computer-science | technology | innovation | machine-learning | hackernoon-top-story | web-monetization

TL;DR: Simply take a picture of the style you want to copy, enter the text you want to generate, and this algorithm will generate a new picture out of it. The results are extremely impressive, especially if you consider that they were made from a single line of text! You can try it right now with this new method and its Google Colab notebook, available for everyone (see references). If that sounds interesting, watch the video and learn more!

Have you ever dreamed of taking the style of a picture, like this cool TikTok drawing style on the left, and applying it to a new picture of your choice? Well, I did, and it has never been easier to do. In fact, you can even achieve that from only text, and you can try it right now with this new method and their Google Colab notebook, available for everyone (see references).
Simply take a picture of the style you want to copy, enter the text you want to generate, and this algorithm will generate a new picture out of it! Just look back at the results above: such a big step forward! The results are extremely impressive, especially if you consider that they were made from a single line of text. If that sounds interesting, watch the video and learn more!

Watch the video

References

►Read the full article: https://www.louisbouchard.ai/clipdraw/
►CLIPDraw: Frans, K., Soros, L.B. and Witkowski, O., 2021. CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders. https://arxiv.org/abs/2106.14843
►StyleCLIPDraw: Schaldenbrand, P., Liu, Z. and Oh, J., 2021. StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis. https://arxiv.org/abs/2111.03133
►CLIPDraw Colab notebook: https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb
►StyleCLIPDraw code: https://github.com/pschaldenbrand/StyleCLIPDraw
►StyleCLIPDraw Colab notebook: https://colab.research.google.com/github/pschaldenbrand/StyleCLIPDraw/blob/master/Style_ClipDraw_1_0_Refactored.ipynb
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

[00:00] Have you ever dreamed of taking a picture, like this cool TikTok drawing style, and applying it to a new picture of your choice? Well, I did, and it has never been easier to do. In fact, you can even achieve that from only text, and you can try it right now with this new method and their Google Colab notebook, available for everyone. Simply take a picture of the style you want to copy, enter the text you want to generate, and this algorithm will generate a new picture out of it. Look at that, such a big step forward! The results are extremely impressive, especially if you consider that they were made from a single line of text. Here I tried imitating the same style with another text input. To be honest, sometimes it may look a bit all over the place, especially if you select a more complicated or messy drawing style like this one.
[00:46] Speaking of something messy: if you are like me and your model versioning and resource tracking looks like this, you may be the perfect candidate to try the sponsor of today's video, which is none other than Weights & Biases. I always assumed I could stack folders like this and simply add "old", "v1", "v2", "v3", and so on to my file names without any problem, until I had to work with someone. While it may be easy for me to find my old tests, it was impossible to explain my thought process behind this mess, and it was my teammate's nightmare. If you care about your teammates and reproducibility, don't do like I did and give Weights & Biases a shot. No more notebooks or results saved everywhere, as it creates a super friendly dashboard for you and your team to track your experiments, and it's super easy to set up and use. It's the first link in the description, and I promise within a month you will be completely dependent.
[01:37] As we said, this new model by Peter Schaldenbrand et al., called StyleCLIPDraw, which is an improvement upon CLIPDraw by Kevin Frans et al., takes an image and a text as inputs and can generate a new image based on your text, following the style in the image. So the model has to understand both what's in the text and what's in the image to correctly copy its style. As you may suspect, this is incredibly challenging, but we are fortunate enough to have a lot of researchers working on so many different challenges, like trying to link text with images, which is what CLIP can do.
[02:10] Quickly, CLIP is a model developed by OpenAI that can basically associate a line of text with an image. Both the text and the images are encoded similarly, so that they end up very close to each other in the new space they are encoded in if they both mean the same thing. Using CLIP, the researchers could understand the text from the user input and generate an image out of it. If you are not familiar with CLIP yet, I would recommend watching a video I made about it, together with DALL·E, earlier this year. But then, how did they apply a new style to it?
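
To make that "shared space" idea concrete, here is a minimal sketch using the open-source openai/CLIP package. The image path and candidate captions are placeholders; the point is just that a higher cosine similarity means the text and the image "mean the same thing" to CLIP:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder inputs: any image and any candidate captions work here.
image = preprocess(Image.open("drawing.png")).unsqueeze(0).to(device)
texts = clip.tokenize(["a watercolor cat", "a city skyline"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)

# Normalize and compare: text and image live in the same space,
# so a simple dot product measures how well they match.
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
similarity = (image_features @ text_features.T).squeeze(0)
print(similarity)  # one score per candidate caption
```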
[02:40] CLIP is just linking existing images to texts; it cannot create a new image. Indeed, we also need something else to capture the style of the image sent in, both the textures and the shapes. Well, the image generation process is quite unique: it won't simply generate an image right away. Rather, it will draw on a canvas and get better and better over time. It will just draw random lines at first and create an initial image. This new image is then sent back to the algorithm and compared with both the style image and the text, which will generate another version. This is one iteration. At each iteration, we draw random curves again, oriented by the two losses we'll see in a second. This random process is quite cool since it allows each new test to look different, so using the same image and the same text as inputs, you will end up with different results that may look even better. Here you can see a very important step called image augmentation. It basically creates multiple variations of the image and allows the model to converge on results that look right to humans, and not simply on the right numerical values for the machine. This simple process is repeated until we are satisfied with the results, as sketched in the code below.
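
Here is a heavily simplified sketch of that iterative loop. This is not the authors' code: `init_random_curves`, `render_curves`, `clip_content_loss`, and `style_loss` are hypothetical stand-ins for the differentiable vector renderer (diffvg in the actual papers) and the two losses explained next; only the augmentations use real torchvision transforms.

```python
import torch
import torchvision.transforms as T

# Hypothetical helpers standing in for the real components (the papers
# use the diffvg differentiable renderer plus CLIP and VGG losses).
from sketch_helpers import init_random_curves, render_curves, clip_content_loss, style_loss

style_image = torch.rand(1, 3, 224, 224)  # placeholder for the loaded style photo

# Random augmentations: the losses are averaged over several distorted
# views so the drawing converges on results that look right to humans,
# not just on the right numbers for one exact pixel grid.
augment = T.Compose([
    T.RandomPerspective(distortion_scale=0.5, p=1.0),
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),
])

curves = init_random_curves(num_paths=256)            # random strokes at first
optimizer = torch.optim.Adam(curves.parameters(), lr=0.01)

for step in range(500):                               # a few hundred iterations
    canvas = render_curves(curves)                    # differentiable rasterization, (3, H, W)
    views = torch.stack([augment(canvas) for _ in range(4)])
    loss = clip_content_loss(views, "a fox in the snow") \
         + style_loss(views, style_image)
    optimizer.zero_grad()
    loss.backward()                                   # gradients flow back through the renderer
    optimizer.step()                                  # the curves improve a little each step
```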
[03:51] So this whole model learns on the fly over many iterations, optimizing the two losses we see here: one for aligning the content of the image with the text sent, and the other for the style. Here you can see that the first loss is based on how close the CLIP encodings are, as we said earlier: CLIP is basically judging the results, and its decision will orient the next generation. The second one is also very simple: we send both images into a pre-trained convolutional neural network like VGG, which will encode the images similarly to CLIP. We then compare these encodings to measure how close they are to each other. This will be our second judge, orienting the next generation as well. This way, using both judges, we can get closer to the text and the wanted style at the same time in the next generation. If you are not familiar with convolutional neural networks and encodings, I would strongly recommend watching the video I made explaining them in simple terms.
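
For the style side, here is a minimal sketch of that second judge, assuming the common VGG-feature recipe: encode both images with a pretrained convolutional network and penalize the distance between encodings. The paper's exact loss and layer choices may differ from this sketch.

```python
import torch
import torchvision.models as models

# A frozen, pretrained VGG acts as the style judge: it only encodes
# images, it is never trained here. (Inputs should be ImageNet-normalized
# (1, 3, H, W) tensors; normalization is omitted for brevity.)
vgg = models.vgg16(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def features(x, up_to=16):
    # Run the image through the first `up_to` VGG layers.
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == up_to:
            break
    return x

def style_loss(generated, style_image):
    # Closer encodings = closer style; this gradient orients the
    # next generation of curves, just like the CLIP content loss does.
    return torch.nn.functional.mse_loss(features(generated), features(style_image))
```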
[04:47] This iterative process makes the model a bit slow to generate a beautiful image, but after a few hundred iterations, or in other words after a few minutes, you have your new image, and I promise it's worth the wait. It also means that it doesn't require any other training, which is pretty cool.
[05:02] Now, the interesting part you've been waiting for: indeed, you can use it right now for free, or at least pretty cheaply, using the Colab notebook linked in the description below. I had some problems running it, and I would recommend buying the Pro version of Colab if you'd like to play with it without any issues. Otherwise, feel free to ask me any questions in the comments if you encounter any problems; I pretty much went through all of them myself. To use it, you simply run all cells like that, and that's it. You can now enter a new text for the generation or send a new image for the style from a link, and voilà. Now tweak the parameters and see what you can do.
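
For reference, the knobs you tweak look something like this; the names below are illustrative placeholders, not the notebook's exact variables:

```python
# Illustrative placeholders; the notebook's actual variable names differ.
prompt = "a fox in the snow"                    # text describing what to draw
style_url = "https://example.com/style.jpg"     # link to the style image
num_paths = 256                                 # how many curves the canvas gets
num_iterations = 500                            # more iterations, cleaner result
```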
[05:40] If you play with it, please send me the results on Twitter and tag me; I'd love to see them. As they state in the paper, the results will have the same biases as the models they use, such as CLIP, which you should consider if you play with it. Of course, this was a simple overview of the paper, and I strongly invite you to read both CLIPDraw and StyleCLIPDraw for more technical details and try their Colab notebooks; both are linked in the description below. Thank you once again, Weights & Biases, for sponsoring this video, and huge thanks to you for watching until the end. I hope you enjoyed this week's video. Let me know what you think and how you will use this new model!

[Music]


