High-Resolution Photorealistic Image Translation in Real Time

Written by whatsai | Published 2021/05/29
Tech Story Tags: style-transfer | machine-learning | artificial-intelligence | ai | computer-vision | hackernoon-top-story | neural-networks | image-recognition | web-monetization

TLDR: You can apply any design, lighting, or graphics style to a 4K image in real time using this new machine learning-based approach. It translates 4K images in under a tenth of a second on a single regular GPU and is, on average, 80 times faster than prior approaches on 480p image translation. If this looks interesting, watch the video on this topic and read more about it in the references below.

You can apply any design, lighting, or graphics style to your 4K image in real-time using this new machine learning-based approach! If you think this looks interesting, watch the video on this topic and read more about it from the references below 👇

Watch the video

References

►Liang, Jie; Zeng, Hui; Zhang, Lei (2021), "High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network", https://export.arxiv.org/pdf/2105.09188.pdf

Video Transcript

You've all seen these kinds of pictures where a person's face is "toonified" into an anime character. Many of you must have seen other kinds of image transformations like this, where an image is changed to follow the style of a certain artist. An even more challenging task is something like this, where an image is transformed into another season or time of day. What you have not seen yet is the time it takes to produce these results and the actual resolution of the produced pictures. This new paper is completely transparent about this, as it attacks exactly that problem.
Indeed, compared to most approaches, they translate high-definition 4K images, and they do it in real time. In this work, they show their results on season translation, day-to-night translation, and photo retouching, which you've been looking at for the last minute. This task is also known as image-to-image translation, and all the results you see here were produced in 4K. Of course, this video is not in 4K, and the images were taken from their paper, so they might not look that high-quality here. Please look at their paper or try their code if you are not convinced!
These are the most impressive results of this paper. Below, you can see their technique, called LPTN, which stands for Laplacian Pyramid Translation Network. Look at how much less time LPTN takes to produce the image translations; most approaches cannot even run at this resolution, as this amount of definition is just too computationally demanding. And yes, those timings are in seconds. They could translate 4K images in under a tenth of a second using a single regular GPU. LPTN is also faster than all these approaches on 480p image translation: not eight times faster, but 80 times faster on average! But how is that possible? How can they be so much more efficient and still produce amazing, high-quality results?
This is achieved by exploiting the fact that illumination and color manipulation, which relate to the style of an image, are contained in its low-frequency component, whereas the content details, which we want to keep when translating an image into another style, can be adaptively refined on the high-frequency components. This is where it becomes interesting: these two components can be treated as two tasks performed simultaneously by the GPU. Indeed, they split the image into low-frequency and high-frequency components, use a network to process the low-frequency information, that is, the style of the image, and render a final image by merging this processed style with the refined high-frequency component, which holds the details of the image, adapted by a smaller sub-network to fit the new style. This dodges the otherwise unavoidable heavy computation of processing the high-resolution components in the whole network.
Such decompositions have long been studied and are achieved with a popular technique called the Laplacian pyramid. The main idea of the Laplacian pyramid method is to decompose the image into high- and low-frequency bands and reconstruct it afterward.
First, we produce an averaged version of the initial image, making it blurry and removing high-frequency components. This is done using a kernel that passes over the whole image, averaging patches of pixels together. For example, a 3-by-3 kernel would go through the whole image averaging 3-by-3 patches, smoothing out any outlier values. It is basically blurring the image by softening the edges.
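To make the kernel step concrete, here is a minimal NumPy sketch (my own illustration, not the paper's code) of a 3-by-3 averaging kernel; notice how a single outlier pixel is smoothed away into its neighborhood:

```python
import numpy as np

def avg3x3(img):
    """Blur by averaging each pixel with its 3x3 neighborhood (edge-padded)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

# A lone "unique" value of 9 gets averaged away into a flat region of 1s.
delta = np.array([[0, 0, 0],
                  [0, 9, 0],
                  [0, 0, 0]], dtype=float)
print(avg3x3(delta))
```

Every output pixel becomes 1.0 here, because each 3-by-3 window contains the single 9 exactly once: the edge (the unique value) has been softened, which is precisely the low-pass effect the pyramid relies on.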
Then, the difference between this blurry image and the initial image is saved, to be used at the end of the algorithm to re-introduce the details, which are the high-frequency components. This is repeated three times with bigger and bigger averaging kernels, producing smaller and smaller low-frequency versions of the image with less and less high-frequency detail.
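The blur-downsample-save-residual loop just described can be sketched in a few lines of NumPy. This is a simplified stand-in for the paper's pyramid (box blur and nearest-neighbor resampling, whereas implementations typically use a Gaussian kernel), but it shows the key property: because each level stores the exact residual, reconstruction is lossless:

```python
import numpy as np

def box_blur(img):
    # 3x3 average: the simplest low-pass kernel.
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def build_pyramid(img, levels=3):
    """Split img into `levels` high-frequency residuals plus one low-frequency band."""
    highs, low = [], img
    for _ in range(levels):
        small = box_blur(low)[::2, ::2]                  # blur, then downsample by 2
        up = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
        highs.append(low - up)                           # saved high-frequency detail
        low = small
    return highs, low

def reconstruct(highs, low):
    """Merge the residuals back in, coarsest level first."""
    for high in reversed(highs):
        low = np.repeat(np.repeat(low, 2, axis=0), 2, axis=1) + high
    return low

img = np.random.rand(16, 16)
highs, low = build_pyramid(img)
assert low.shape == (2, 2)                               # 16 -> 8 -> 4 -> 2
assert np.allclose(reconstruct(highs, low), img)         # lossless round trip
```

After three levels, the low-frequency band is 64 times smaller than the input, yet the original image can be rebuilt exactly from it plus the saved residuals.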
If you remember, these low-frequency versions of the image contain information about the colors and illumination of the image. Indeed, they are basically just a blurred, low-quality version of our image, which is why the model is so much more efficient. This is convenient: they are smaller versions of the image, and they carry exactly the information we are trying to change when translating the image into another style. This means that using these low-frequency versions is much more computationally efficient than using the whole image directly, while also being focused on the information we want to change, which is why the results are so great.
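The arithmetic behind the speedup is simple (these are my back-of-the-envelope numbers, not figures from the paper): three halvings take a 3840x2160 (4K) frame down to 480x270, roughly 480p, so the heavy network sees 64 times fewer pixels:

```python
w, h = 3840, 2160                  # 4K frame
for _ in range(3):                 # three pyramid levels, each halving both sides
    w, h = w // 2, h // 2
print(w, h)                        # 480 270
print((3840 * 2160) // (w * h))    # 64x fewer pixels for the heavy network
```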
This lower-quality version of the image can easily be translated using an encoder-decoder, just like any other image translation technique we previously mentioned, but since it is done on a much smaller, lower-quality image, it is dramatically faster to process. The best part is that the quality of the results depends only on the initially saved high-frequency versions of the image, which are not processed throughout the whole network. This high-frequency information is simply merged with the low-frequency image at the end of the processing to restore the details.
Basically, it is so much faster because the researchers split the image's information in two: low-frequency general information and detailed high-frequency information. Then, they send through the network only the computation-friendly part of the image, which is exactly what we want to transform, the blurry, low-quality general style of the image, or in other words, the low-frequency information. Only fast and straightforward transformations are applied to the high-frequency parts of the image to resize them and merge them with the newly stylized blurry image, improving the results by adding details back on all the edges in the picture.
And voilà! You have your results with a fraction of the time and computational power needed. This is brilliant, and the code is publicly available if you would like to try it, which is always cool! As always, the links to the complete article and references are in the description of the video. Thank you for watching!

Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.
Published by HackerNoon on 2021/05/29