Zoom in to WebRTC

Written by victor-denisenko | Published 2021/10/02
Tech Story Tags: telepresence-robots | boteyes | webrtc | video-zooming | video-zooming-in-webrtc | ways-for-video-scaling | digital-scaling | technology

TL;DR: WebRTC, released by Google for general use in 2011, has found wide application in video communication systems, video broadcasting, group chats, video conferences, telepresence robots, and promoter robots. Its main advantage is that the basic methods and classes are implemented directly in the browser, which provides broad compatibility across operating systems and hardware platforms. This article describes the video scaling method we implemented for transmission over WebRTC, using JavaScript and HTML5.

WebRTC, the technology released by Google for general use in 2011, has found wide application in video communication systems, video broadcasting, group chats, video conferences, and telepresence robots.

Its main advantage is that the basic methods and classes are implemented directly in the browser, which provides broad compatibility with various operating systems and hardware platforms.

This article describes the video scaling method implemented in the BotEyes telepresence robot, using JavaScript and HTML5.

One of the WebRTC features is the automatic adaptation of the transmitted video resolution to the communication channel bandwidth.

The bandwidth depends not only on Internet speed but also on processor performance and RAM capacity on both the receiving and transmitting sides. As a result, it is impossible to transmit video with a resolution of more than 0.5-1 MP when using lower-priced gadgets and an Internet connection with a bandwidth of less than 5 Mbit/s.

Therefore, if we want to transmit small image details via WebRTC with a low bandwidth of the communication channel, it is not enough to increase the camera resolution.

This is because the WebRTC algorithm will compress the video anyway, automatically adjusting it to the bandwidth of the communication channel to transmit it without delay. Video transmission without delay is the main purpose of WebRTC, where "RT" means "Real-Time."

The only way out is to scale the image before it is transmitted over the Internet. Let's look at how you can do this.

Three Ways of Video Scaling

There are three main ways of zooming: optical magnification and software (digital) magnification, which in turn can be either built into the camera or implemented by a separate program.

Optical magnification is performed by a system of lenses whose spacing is changed mechanically by micromotors. It is the most effective method because the enlarged image is sampled at the full resolution of the image sensor's optical matrix.

The disadvantages are the high price of such a camera ($200 to $700), the frequent lack of software support on common tablets, and the need to use USB-OTG, which prevents charging the tablet while the camera is connected.

Before considering the principle of digital magnification, let us recall that the main limitation of the resolution of video transmitted over the Internet is the bandwidth of the Internet communication channel. In addition, the real-time requirement in WebRTC prohibits the use of a buffer to compensate for sudden delays in the communication channel, i.e., the video must be transmitted in real-time.

Therefore, if, for example, the video sensor matrix has a size of 8 MP and the bandwidth of the communication channel only allows video with a resolution of 1 MP, the video is downscaled to 1 MP before transmission to prevent unacceptably long delays. In this case, small details of the image become indistinguishable.

Digital image scaling under such conditions can consist of cutting out the required 1 MP fragment from the full-scale video (which has the maximum resolution for this matrix) before the resolution is reduced from 8 MP to 1 MP, and only then transmitting it through the communication channel, see Fig. 1.

Reducing the resolution is then no longer required, since the cut-out fragment already has the 1 MP size allowed by the communication channel.

The described procedure can be performed in two ways. The first way is to set the camera parameter:

var constraints = { advanced: [{ zoom: 1 }] }

(see https://w3c.github.io/mediacapture-image/#zoom). The second way is to use HTML5 with the method

CanvasRenderingContext2D.drawImage()

from the Canvas 2D API.

The first way implies that the manufacturers of the tablet (or computer) camera implement scaling firmware (we will call it native) directly in the camera processor and provide the operating system with an interface for controlling the "zoom" parameter.

Unfortunately, not all tablets perform scaling in the same way: some enlarge the image while increasing the camera resolution, others scale without changing the resolution, and still others do not support this parameter at all. In the second case, scaling can only increase the number of pixels by interpolation, without adding any information to the video. The image gets larger, but its clarity decreases and it becomes more blurry.
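
Below is a minimal sketch of the first (native) way. It assumes a camera stream has already been obtained with getUserMedia, and the zoom factor of 2 is chosen only for illustration; the track's capabilities are checked first because, as noted above, not every camera or browser exposes the zoom parameter.

// Sketch: native (in-camera) zoom via MediaStream track constraints.
const stream = await navigator.mediaDevices.getUserMedia({
  video: { facingMode: 'environment' }
});
const [track] = stream.getVideoTracks();
const capabilities = track.getCapabilities();

if ('zoom' in capabilities) {
  // Clamp the requested factor to the range the camera actually reports.
  const zoom = Math.min(2, capabilities.zoom.max);
  await track.applyConstraints({ advanced: [{ zoom }] });
} else {
  console.warn('Native zoom is not supported by this camera/browser');
}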

The second way is implemented not in the camera but in software executed by the tablet's processor. Such software can be written, for example, in Java, JavaScript, or Objective-C.

The scaling algorithm consists of two stages: increasing the camera resolution by a specified factor and then cutting out the desired video fragment so that the transmitted image keeps the same size in pixels.

At the same time, the volume of the video stream remains the same, and it can be transmitted without delay through a communication channel with the same bandwidth as before scaling.
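
As an illustration of this second way, here is a minimal sketch; the element names, canvas size, zoom factor, and frame rate are assumptions, not the exact BotEyes implementation. drawImage() cuts the central fragment out of the full-resolution frame and draws it onto a canvas of the original output size, and canvas.captureStream() then produces a MediaStream that can be sent over WebRTC instead of the raw camera stream.

// Sketch of software (canvas-based) zoom. Assumes a hidden <video> element is
// already playing the full-resolution camera stream from getUserMedia.
const video = document.querySelector('video');
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
const zoom = 2;                                  // magnification factor (assumed)

canvas.width = 1280;                             // output size sent over the channel (assumed)
canvas.height = 720;

function drawZoomedFrame() {
  const srcW = video.videoWidth / zoom;          // size of the cut-out fragment
  const srcH = video.videoHeight / zoom;
  const srcX = (video.videoWidth - srcW) / 2;    // center the fragment
  const srcY = (video.videoHeight - srcH) / 2;
  ctx.drawImage(video, srcX, srcY, srcW, srcH, 0, 0, canvas.width, canvas.height);
  requestAnimationFrame(drawZoomedFrame);
}
drawZoomedFrame();

// captureStream() yields a MediaStream whose video track can replace the raw
// camera track in the RTCPeerConnection; 15 fps is an assumed, reduced rate.
const zoomedStream = canvas.captureStream(15);
// peerConnection.getSenders()[0].replaceTrack(zoomedStream.getVideoTracks()[0]);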

Digital Scaling

Native digital scaling is not supported by all tablets, and even if it is supported by the tablet, it may not be supported by the browser. For example, native scaling on the Samsung Galaxy Tab S7+ tablet is supported in Google Chrome but not in Microsoft Edge.

Unfortunately, when using the software method, you have to handle a large image in the tablet's memory, and this sometimes (quite rarely) leads to the well-known "Aw, Snap!" error in Google Chrome when the maximum resolution is selected. A similar error does not appear in Microsoft Edge; at least, we were not able to reproduce it experimentally.

We have implemented both scaling methods, and Figure 2 shows the result. Text that is completely illegible before scaling becomes readable after it.

To reduce the load on the processor and the likelihood of the "Aw, Snap!" error, we made it possible to reduce the frame rate of the video after zooming in.

This does not significantly reduce the convenience of using WebRTC in video chats, since the described magnification method does not affect the pace of speech, and magnification is usually used to examine stationary objects (a blackboard, a sheet of paper on a table, equipment parts) rather than dynamic scenes.

At the same time, the frame rate can be increased if the user has a gadget with a large RAM and a powerful processor.
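
As a sketch of such frame rate control (the helper name and the 5 fps value are assumptions, not the exact BotEyes implementation), the frame rate of a camera track can be limited and restored with a standard frameRate constraint:

// Lower or raise the frame rate of an existing video track to trade smoothness
// for CPU and memory load; not every device honors every requested value.
async function setFrameRate(track, fps) {
  try {
    await track.applyConstraints({ frameRate: { max: fps } });
  } catch (e) {
    console.warn('frameRate constraint rejected:', e);
  }
}

// e.g. drop to 5 fps while reading a document, return to 30 fps afterwards:
// setFrameRate(stream.getVideoTracks()[0], 5);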

Since the video magnification factor depends on the camera resolution, and the rear camera of most tablets has a resolution almost twice that of the front one, we made it possible to switch cameras quickly. To get the maximum magnification, you can use the rear camera.
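
A minimal sketch of such switching is shown below (the function name and resolution values are assumptions): the facingMode constraint selects the front or rear camera, and the maximum available resolution is requested so that the software zoom has as many source pixels as possible.

// Re-open the camera with the requested facing mode; 'environment' is the
// rear camera, 'user' is the front one.
async function openCamera(useRear) {
  return navigator.mediaDevices.getUserMedia({
    video: {
      facingMode: useRear ? 'environment' : 'user',
      width: { ideal: 4096 },    // ask for the largest frame the sensor can give
      height: { ideal: 2160 }
    },
    audio: false
  });
}

// const rearStream = await openCamera(true);  // maximum magnification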

Another problem associated with software magnification is automatic focusing.

With the software scaling method, the autofocus criterion uses the entire field of the matrix, while only part of it is enlarged. As a result, the focus may not be set on the part of the image that we have enlarged, and that part turns out to be blurry.
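
Where the camera exposes them, the focusMode and pointsOfInterest properties from the same mediacapture-image specification can be used to steer autofocus toward the enlarged region. Browser and hardware support for these constraints is limited, so the sketch below is an assumption rather than a universal fix; the 0.5/0.5 coordinates denote the center of the frame, i.e. the region the software zoom cuts out.

// Try to point the autofocus at the center of the frame; silently give up if
// the camera does not expose focus control to the browser.
async function focusOnZoomedArea(track) {
  const capabilities = track.getCapabilities();
  if (!('focusMode' in capabilities)) {
    return;
  }
  await track.applyConstraints({
    advanced: [{
      focusMode: 'single-shot',
      pointsOfInterest: [{ x: 0.5, y: 0.5 }]
    }]
  });
}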

In Fig. 2, as well as in this video, you can see the result of this algorithm on an inexpensive Huawei MediaPad T5 tablet using its rear camera. You can also see the algorithm in operation in a real WebRTC application at this link.


Written by victor-denisenko | We do electronics for factory automation: PLCs, IO modules, sensors, and Telepresence Robots.
Published by HackerNoon on 2021/10/02