Imagic: AI Image Editing from Text Commands

Written by whatsai | Published 2022/10/23
Tech Story Tags: machine-learning | artificial-intelligence | hackernoon-top-story | youtubers | computer-vision | youtube-transcripts | ai-top-story | technology | web-monetization

TLDR: Imagic takes a diffusion-based model that can generate images from text and adapts it to edit images. You can generate an image and then teach the model to edit it any way you want. Imagic: Text-Based Real Image Editing with Diffusion Models. arXiv preprint arXiv:2210.09276. Use it with Stable Diffusion: https://www.louisbouchard.ai/imagic/

This week’s paper may just be your next favorite model to date.
If you think recent image generation models like DALL·E or Stable Diffusion are cool, you just won't believe how incredible this one is.
"This one" is Imagic.
Imagic takes such a diffusion-based model, one that can generate images from text, and adapts it to edit images. Just look at that... You can generate an image and then teach the model to edit it any way you want.
Learn more in the video below...

References:

►Read the full article: https://www.louisbouchard.ai/imagic/
►Kawar, B., Zada, S., Lang, O., Tov, O., Chang, H., Dekel, T., Mosseri, I. and Irani, M., 2022. Imagic: Text-Based Real Image Editing with Diffusion Models. arXiv preprint arXiv:2210.09276.
►Use it with Stable Diffusion: https://github.com/justinpinkney/stable-diffusion/blob/main/notebooks/imagic.ipynb
►My Newsletter (a new AI application explained weekly, straight to your inbox!): https://www.louisbouchard.ai/newsletter/

Video Transcript

0:24
Look at that! You can generate an image and then teach the model to edit it any way you want. This is a pretty big step towards having your very own Photoshop designer for free. The model not only understands what you want to show, but it's also able to stay realistic while keeping the properties of the initial images. Just look at how the dog stays the same in all images here.
0:46
This task is called text-conditioned image editing, which means editing images using only text and an initial image. It was pretty much impossible not even a year ago. Now look at what it can do! Yes, this is all done from a single input image and a short sentence where you say what you'd like to have. How amazing is that?
1:09
The only thing even cooler is how it works, so let's dive into it. But first, if you are currently learning AI or want to start learning it, you will love this opportunity. I know how hard it can be to make real progress when learning AI. Sometimes, extra structure and accountability can be what propels you to the next level.
1:29
If that sounds like you, join the sponsor of this video, Delta Academy. At Delta Academy, you learn reinforcement learning by building game AIs in a live cohort. Go from zero to AlphaGo through expert-crafted interactive tutorials, live discussions with these experts, and weekly AI-building competitions. It's not just another course-spam website: it's intense, hands-on, and focused on high quality, designed by experts from DeepMind, Oxford, and Cambridge. It's where coders go to future-proof their careers from the advance of AI, and have fun. Plus, with a live community of peers and experts to push you forward, you'll write iconic algorithms in Python, ranging from DQN to AlphaGo, one of the coolest programs ever made. Join them now through my link below and use the promo code "what's AI" to get 10% off.
2:23
So how does Imagic work? As we said, it takes an image and a caption to edit said image, and you can even generate multiple variations of it. This model, like the vast majority of papers released these days, is based on diffusion models. More specifically, it takes an image generator model that was already trained to generate images from text and adapts it to image editing. In their case, it uses Imagen, which I covered in a previous video. It's a diffusion-based generative model able to create high-definition images after being trained on a huge dataset of image-caption pairs. In the case of Imagic, they simply take this pre-trained Imagen model as a baseline and modify it to edit the images sent as input, keeping the image-specific appearance, such as the dog's breed and identity, while editing it following our text.
3:18
So, to start, we have to encode both the text and the initial image so that they can be understood by our Imagen model. When this is done, we optimize our text encodings, our text embeddings, to better fit our initial image. Basically, we take our text representation and optimize it for our initial image, giving an embedding called e_opt, to be sure the model understands that, in this example, we want to generate the same kind of image with a similar-looking bird and background.
3:48
Then, we take our pre-trained image generator and fine-tune it, meaning that we retrain the Imagen model while keeping the optimized text embeddings we just produced fixed. These two steps bring the text embedding and the image closer together by freezing one of the two pieces (first the model, then the embedding) and adjusting the other, which ensures that we optimize for both the text and the initial image, not only one of the two.
4:12
Now that our model understands the initial image and our text, and understands that they are similar, we need to teach it to generate new image variations for this text. This part is super simple. Our text embeddings and image-optimized embeddings are very similar, but still not exactly the same. The only thing we do here is take the optimized embedding in our encoded space and move it a bit toward the text embedding. At this point, if you ask the Imagic model to generate an image using the optimized embedding, it should give you the same image as your input image. So, if you move the embedding a bit toward your text embedding, it will also edit the image a bit toward what you want. The more you move it in this space, the bigger the edit and the farther away you get from your initial image. So the only thing you need to figure out now is the size of the step you want to take toward your text. And voilà: when you find your perfect balance, you have a new model able to generate as many variations as you want, conserving the important image attributes while editing the way you want.
5:25
Of course, the results are not perfect yet, as you can see here, where the model either does not edit properly or makes random modifications to the initial image, like cropping or zooming inappropriately. Still, it stays pretty impressive if you ask me. I find the pace of image generation progress incredible, and that's both amazing and scary at the same time. I'd love to know your opinion on these kinds of image-generating and image-editing models. Do you think they are a good or bad thing? What kinds of consequences can you think of from such models becoming more and more powerful?
6:02
You can find more details on the specific parameters they used to achieve these results in their paper, which I definitely invite you to read. I also invite you to watch my Imagen video if you'd like more information about the image generation part and to understand how it works. Huge thanks to my friends at Delta Academy for working on making learning AI fun, something I am passionate about. Please give it a try and let me know what you think. I personally love this way of teaching, and I am sure you will too. Thank you for supporting my work by checking out their website and by watching the whole video, and I hope you enjoyed it. I will see you next week with another amazing paper!
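
The three steps described in the transcript can be summarized as formulas. This is a sketch based on our reading of the Imagic paper, written in the standard ε-prediction form of the diffusion loss, not a verbatim copy of the authors' equations: x is the input image, e_tgt is the embedding of the edit text, f_θ is the pre-trained text-conditioned denoiser (Imagen in the paper), and η is the interpolation weight that sets the size of the step toward the text. The noise schedule and the exact number of optimization steps are omitted; in practice each arg min is approximated with a finite number of gradient steps.

```latex
\begin{aligned}
\mathcal{L}(\mathbf{x}, \mathbf{e}, \theta) &= \mathbb{E}_{t,\,\boldsymbol{\epsilon} \sim \mathcal{N}(0, \mathbf{I})}\left[\big\lVert \boldsymbol{\epsilon} - f_\theta(\mathbf{x}_t, t, \mathbf{e}) \big\rVert_2^2\right], \qquad \mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x} + \sqrt{1 - \bar\alpha_t}\,\boldsymbol{\epsilon} \\
\mathbf{e}_{\text{opt}} &= \arg\min_{\mathbf{e}} \mathcal{L}(\mathbf{x}, \mathbf{e}, \theta) && \text{(step 1: } \theta \text{ frozen, } \mathbf{e} \text{ initialized at } \mathbf{e}_{\text{tgt}}) \\
\hat\theta &= \arg\min_{\theta} \mathcal{L}(\mathbf{x}, \mathbf{e}_{\text{opt}}, \theta) && \text{(step 2: } \mathbf{e}_{\text{opt}} \text{ frozen)} \\
\bar{\mathbf{e}} &= \eta\,\mathbf{e}_{\text{tgt}} + (1 - \eta)\,\mathbf{e}_{\text{opt}} && \text{(step 3: generate with } f_{\hat\theta} \text{ conditioned on } \bar{\mathbf{e}})
\end{aligned}
```

With η = 0, the fine-tuned model conditioned on ē = e_opt should approximately reproduce the input image; increasing η moves the result toward the text, which is the trade-off the video describes.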

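The same three-step recipe can be tried end to end on a toy problem. This is a minimal sketch, not the authors' code, and it does not use Imagen: as a deliberate simplification, it replaces the diffusion model with a hypothetical linear generator `W @ e` and replaces the denoising loss with a squared reconstruction error, so each step fits in a few lines of NumPy. The function names (`optimize_embedding`, `finetune_generator`, `interpolate`, `imagic_toy`), the vector sizes, and the step sizes are all illustrative assumptions. What it keeps is the structure from the video: optimize the embedding with the generator frozen, fine-tune the generator with the optimized embedding frozen, then interpolate toward the text embedding.

```python
import numpy as np

# Toy analogue of Imagic's three steps. ASSUMPTIONS (not from the paper):
# a linear generator W @ e stands in for Imagen, and a squared reconstruction
# error stands in for the diffusion denoising loss.

D_IMG, D_EMB = 16, 4  # hypothetical toy "image" and "embedding" sizes


def generate(W, e):
    """Toy generator: maps an embedding to an 'image' vector."""
    return W @ e


def recon_loss(W, e, x):
    """Squared error between the generated image and the input image x."""
    r = generate(W, e) - x
    return float(r @ r)


def optimize_embedding(W, x, e_init, steps=500):
    """Step 1: keep W frozen and move the embedding so it reproduces x."""
    lr = 0.5 / np.linalg.norm(W, 2) ** 2  # stable step: below 1 / sigma_max^2
    e = e_init.copy()
    for _ in range(steps):
        e -= lr * 2 * W.T @ (generate(W, e) - x)  # gradient of ||W e - x||^2 w.r.t. e
    return e


def finetune_generator(W, x, e_opt, steps=200):
    """Step 2: keep e_opt frozen and adjust W so it reproduces x more closely."""
    lr = 0.25 / float(e_opt @ e_opt)  # with this step the residual halves each iteration
    W = W.copy()
    for _ in range(steps):
        W -= lr * 2 * np.outer(generate(W, e_opt) - x, e_opt)  # gradient w.r.t. W
    return W


def interpolate(e_tgt, e_opt, eta):
    """Step 3: e_bar = eta * e_tgt + (1 - eta) * e_opt."""
    return eta * e_tgt + (1 - eta) * e_opt


def imagic_toy(seed=0, etas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(D_IMG, D_EMB)) / np.sqrt(D_IMG)  # "pre-trained" generator
    x = rng.normal(size=D_IMG)  # the input image to edit
    e_tgt = rng.normal(size=D_EMB)  # embedding of the edit text
    e_opt = optimize_embedding(W, x, e_tgt)  # initialized at e_tgt
    W_ft = finetune_generator(W, x, e_opt)
    recon = generate(W_ft, e_opt)  # should match x almost exactly
    edits = [generate(W_ft, interpolate(e_tgt, e_opt, eta)) for eta in etas]
    return {"W": W, "W_ft": W_ft, "x": x, "e_tgt": e_tgt,
            "e_opt": e_opt, "recon": recon, "etas": etas, "edits": edits}


if __name__ == "__main__":
    out = imagic_toy()
    for eta, img in zip(out["etas"], out["edits"]):
        dist = np.linalg.norm(img - out["recon"])
        print(f"eta={eta:.2f}  distance from reconstruction={dist:.3f}")
```

Running it prints one distance per η: at η = 0 the output coincides with the reconstruction of the input, and the distance grows as η increases, mirroring "the more you move it, the bigger the edit." In this linear toy the growth is exactly proportional to η; a real diffusion model gives no such guarantee, which is why the step size has to be figured out for each edit, as the video notes.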

Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.
Published by HackerNoon on 2022/10/23