An Introduction to WebRTC App Development

In simple terms, WebRTC is a technology that provides peer-to-peer communication between web browsers and mobile applications. The name stands for Web Real-Time Communication; it is an open-source project that allows transmission of audio, video, and data.

WebRTC developers say that it is a simple yet complex technology. The simplicity comes down to ease of implementation: five to ten lines of code are enough to set up peer-to-peer video communication between two browsers. At the same time, there is always a “but” when we work with technologies.

In the case of WebRTC, the main challenge is the backend: developers must ensure that the solution works in different networks. This article covers the other reasons for the technology’s complexity and how to overcome them, and also describes the future of WebRTC, business cases, and how they relate to app development.

What is WebRTC?

An open-source project released by Google in 2011, WebRTC provides API-based communication between web browsers and mobile applications, including transmission of audio, video, and data. Because it eliminates the need for native plugins and app installations, these connections are user-friendly, and WebRTC is supported by all the major browsers and mobile operating systems.

The adoption of WebRTC in the tech community has grown dramatically in the past few years. Facebook, Amazon, and Google are among the major technology companies that have implemented WebRTC to make their web applications faster, more reliable, and more secure.

WebRTC features are also provided in off-the-shelf solutions that can be easily integrated with other software. A good example is OpenTok, a PaaS for live communications, courtesy of our business partners at former TokBox (now Vonage). We successfully used it in many solutions for our clients, including an advanced authentication service based on biometric techniques.

As already mentioned, the key characteristic of WebRTC is that it is a simple yet complex technology. The simplicity comes down to ease of implementation: five to ten lines of code are enough to set up peer-to-peer video communication between two browsers. The complexity comes from the specifics of WebRTC: it must be adapted to different browsers, and it is hard to configure when it doesn’t work correctly. Also, to obtain the desired result, you need to understand STUN, TURN, and NAT.

STUN is a standardized set of methods, including a network protocol, for traversal of network address translator (NAT) gateways in applications of real-time voice, video, messaging, and other interactive communications. Why do we need it?

STUN is needed when we connect two browsers that sit behind NAT and do not have external IP addresses of their own. Each browser contacts a STUN server to find out its public IP address and port. The browsers then exchange these addresses and ports and use them to reach each other.

TURN solves the same problem in a different way: it relays the traffic through itself when a direct connection cannot be established. The traffic isn’t modified in any way. This approach also allows us to connect two points over TCP (a more reliable but slower protocol than UDP). It is noteworthy that about 15% of calls cannot be made without TURN.
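In the browser API, STUN and TURN servers are handed to RTCPeerConnection as a list of ICE servers. The sketch below builds such a configuration. The server URLs and credentials are placeholders invented for illustration, and buildIceConfig is a hypothetical helper name; only the iceServers and iceTransportPolicy fields belong to the standard configuration object.

```javascript
// Builds a configuration object for RTCPeerConnection.
// All URLs and credentials below are placeholders, not real servers.
function buildIceConfig({ forceRelay = false } = {}) {
  return {
    iceServers: [
      // STUN: lets the browser discover its public IP address and port.
      { urls: "stun:stun.example.org:3478" },
      // TURN: relays the media when no direct path can be found.
      // "?transport=tcp" asks the browser to reach the relay over TCP.
      {
        urls: "turn:turn.example.org:3478?transport=tcp",
        username: "demo-user",
        credential: "demo-password",
      },
    ],
    // "relay" forces all traffic through TURN (handy for testing the relay);
    // "all" lets the browser try direct and STUN-derived paths as well.
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

// In a browser: const pc = new RTCPeerConnection(buildIceConfig());
```

Forcing the relay policy during testing is one way to check that the TURN server really works before users on restrictive networks depend on it.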

Now that you know what WebRTC is, let’s look at how the technology works and in which cases it can be used. We’ll also review the pros and cons of the technology, examples of WebRTC solutions, and in-demand WebRTC apps. By default, these applications are based on peer-to-peer communication. If we need to organize group calls and live streaming, it’s mandatory to use a server that itself acts as a WebRTC client.

How does WebRTC work?

The primary focus of WebRTC is to provide real-time audio and video communication between participants, who use web browsers to start conversations, locate each other, and bypass firewalls.

WebRTC is built into the browser and is used through JavaScript APIs and HTML5. The typical features of a WebRTC application are as follows:

Send and receive streaming audio and video.
Retrieve network configuration data, e.g., IP addresses, application ports, firewalls, and NATs (network address translators), which are needed to send and receive data to and from another client using the WebRTC API.
Open and close connections and report errors.
Transmit media metadata, e.g., image resolution and video codecs.

To send and receive streams of data, WebRTC provides the following APIs that can be used in web applications:

RTCPeerConnection for audio and video transmissions, encryption, and bandwidth configuration
RTCDataChannel for transmission of generic data
MediaStream for access to multimedia data streams from such devices as digital cameras, webcams, microphones, or shared desktops
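To show how the three APIs fit together, here is a minimal sketch of the calling side. It assumes a signaling function, called sendToPeer here, that delivers messages to the other browser (for example over a WebSocket); WebRTC does not define signaling, so that name and the message shapes are hypothetical. The env parameter exists only so the function can be exercised outside a browser.

```javascript
// Caller-side sketch. `sendToPeer` is a hypothetical signaling function;
// `env` defaults to the browser globals and can be replaced in tests.
async function startCall(sendToPeer, env = globalThis) {
  // MediaStream: ask the user for camera and microphone access.
  const stream = await env.navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
  });

  // RTCPeerConnection: carries the encrypted audio and video.
  const pc = new env.RTCPeerConnection();
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);
  }

  // RTCDataChannel: carries arbitrary application data.
  const channel = pc.createDataChannel("chat");

  // Network candidates are discovered gradually; forward each one.
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      sendToPeer({ type: "candidate", candidate: event.candidate });
    }
  };

  // Describe our side of the session and send it to the other browser.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer({ type: "offer", description: pc.localDescription });

  return { pc, channel, stream };
}
```

The receiving browser would mirror these steps with setRemoteDescription, createAnswer, and addIceCandidate. That half is omitted to keep the sketch short.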

A set of standards for the use of WebRTC in software is being developed by the Internet Engineering Task Force and the W3C Web Real-Time Communications Working Group.

WebRTC under the hood

WebRTC is primarily just a way to send and receive UDP packets inside browsers. WebRTC also knows how to transfer media, both audio and video, and it can connect two clients directly, peer-to-peer. Developers admit that under the hood, WebRTC is a fairly simple thing: open a UDP port, learn the partner’s IP address and port, and wrap the traffic in RTP.

Let’s talk about what happens between the capture from the camera and the video playback on the screen. This process consists of 7 basic steps:

1. Capture from the camera

The browser has an API that allows us to ask users for access to the camera or microphone: navigator.mediaDevices.getUserMedia(), which resolves to a MediaStream (the older navigator.getUserMedia is deprecated). The main difficulty is that we can’t immediately send media streams to the other party because they weigh a lot without compression. For example, one 640×480 image stored as an uncompressed bitmap at 4 bytes per pixel weighs about 1.2 MB. At 30 such pictures per second, one second of video weighs about 36 MB, so the bit rate is roughly 290 Mbps. Data must be compressed for transfer, so the next step, coding, is mandatory.
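The arithmetic behind these numbers is easy to reproduce. The sketch below assumes 4 bytes per pixel, which is the pixel size that makes a 640×480 frame come out at about 1.2 megabytes; rawVideoBitrate is a helper name made up for this example.

```javascript
// Size and bit rate of uncompressed video.
function rawVideoBitrate(width, height, bytesPerPixel, framesPerSecond) {
  const bytesPerFrame = width * height * bytesPerPixel;
  const bytesPerSecond = bytesPerFrame * framesPerSecond;
  return {
    bytesPerFrame,
    bytesPerSecond,
    bitsPerSecond: bytesPerSecond * 8,
  };
}

const raw = rawVideoBitrate(640, 480, 4, 30);
console.log(raw.bytesPerFrame);  // 1228800   (about 1.2 megabytes per frame)
console.log(raw.bytesPerSecond); // 36864000  (about 36 megabytes per second)
console.log(raw.bitsPerSecond);  // 294912000 (roughly 290 to 295 megabits per second)
```

Working from the rounded figure of 36 megabytes per second gives the 288 Mbps quoted above; either way, the stream is far too heavy to send uncompressed.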

2. Coding

In simple terms, codecs compress audio and video streams. There is a broad set of such codecs, and some of them are available in WebRTC. Let’s take VP9 as an example. This codec is used for encoding video in WebRTC. It can compress 1280×720 video at 30 frames per second to about 1.5 Mbps. How can VP9 do this?

Instead of constantly sending complete images, VP9 encodes the difference between consecutive images. At the output we get a keyframe, followed by interframes that describe only what has changed. More motion in the picture means heavier frames.

At the first stage, a keyframe with information about all pixels is produced, and each interframe then represents the difference from the previous state. If we lose even one interframe in the chain, we cannot draw the following interframes until a new keyframe arrives.
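The keyframe and interframe idea can be illustrated with a toy model. This is not how VP9 works internally (a real codec uses motion prediction, transforms, and entropy coding); here a frame is just an array of pixel values, and an interframe is a list of the pixels that changed. All function names are invented for the example.

```javascript
// Toy model: the first frame is sent in full (keyframe),
// every later frame only as a list of changed pixels (interframe).
function encodeSequence(frames) {
  const encoded = [{ type: "key", pixels: [...frames[0]] }];
  for (let i = 1; i < frames.length; i++) {
    const changes = [];
    frames[i].forEach((value, index) => {
      if (value !== frames[i - 1][index]) changes.push([index, value]);
    });
    encoded.push({ type: "inter", changes });
  }
  return encoded;
}

// Rebuilds the frames by applying each interframe to the previous picture.
function decodeSequence(encoded) {
  const frames = [];
  let current = null;
  for (const item of encoded) {
    if (item.type === "key") {
      current = [...item.pixels];
    } else {
      if (current === null) throw new Error("interframe without a keyframe");
      current = [...current];
      for (const [index, value] of item.changes) current[index] = value;
    }
    frames.push(current);
  }
  return frames;
}
```

If one interframe is removed from the encoded list before decoding, every later frame is reconstructed on top of the wrong picture. That is the same effect as the broken chain described above, and it lasts until the next keyframe.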

3. Packing in RTP

Data is packed in RTP, the Real-time Transport Protocol, which carries information about the order of the packets. It’s a mandatory step because packets can arrive in a different order or even be lost. We need the sequence number of each packet to play them back in the correct order. RTP also stores timing information that allows synchronization of audio and video tracks. This extra RTP information adds a small overhead of about 5%.

RTP has a companion protocol named RTCP. It is used to exchange information about lost packets and reception statistics.
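To make the ordering and timing fields concrete, here is a sketch that writes and reads the fixed 12-byte RTP header (version, marker, payload type, sequence number, timestamp, and stream identifier). It ignores optional header extensions and contributing-source lists, and the function names are assumptions made for this example; browsers do this work internally and do not expose it through the WebRTC API.

```javascript
// Writes the fixed 12-byte RTP header in network (big-endian) byte order.
function buildRtpHeader({ payloadType, sequenceNumber, timestamp, ssrc, marker = false }) {
  const bytes = new Uint8Array(12);
  const view = new DataView(bytes.buffer);
  view.setUint8(0, 0x80); // version 2, no padding, no extension, no CSRC
  view.setUint8(1, (marker ? 0x80 : 0) | (payloadType & 0x7f));
  view.setUint16(2, sequenceNumber & 0xffff); // order of the packet
  view.setUint32(4, timestamp >>> 0);         // media time, used for A/V sync
  view.setUint32(8, ssrc >>> 0);              // identifies the stream
  return bytes;
}

// Reads the same fields back from a received packet.
function parseRtpHeader(bytes) {
  if (bytes.length < 12) throw new Error("RTP header needs at least 12 bytes");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const first = view.getUint8(0);
  const second = view.getUint8(1);
  return {
    version: first >> 6,
    marker: (second & 0x80) !== 0,
    payloadType: second & 0x7f,
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
  };
}
```

The sequence number is only 16 bits wide, so it wraps around to zero after 65535. A receiver has to take that into account when restoring the order.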

4. Network transmission over UDP

Data is sent as UDP packets. Compared with TCP, the main advantage of UDP is the minimal delay between packets. UDP also has a few disadvantages: packets can be lost, arrive late, or end up in the wrong order.

5. Unpacking RTP

The order of packets is restored at this stage. The video traffic is received and passed to the decoder.
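Restoring the order can be sketched as a small buffer that holds early arrivals until the gap before them is filled. ReorderBuffer is a name invented for this example, and the sketch is deliberately naive: it waits forever for a missing packet, whereas a real receiver gives up after a short time.

```javascript
// Releases packets strictly in sequence-number order.
class ReorderBuffer {
  constructor(firstSequenceNumber) {
    this.next = firstSequenceNumber; // the number we are waiting for
    this.pending = new Map();        // early arrivals, keyed by number
  }

  // Accepts one packet and returns those that can now be delivered in order.
  push(packet) {
    this.pending.set(packet.sequenceNumber, packet);
    const ready = [];
    while (this.pending.has(this.next)) {
      ready.push(this.pending.get(this.next));
      this.pending.delete(this.next);
      this.next = (this.next + 1) & 0xffff; // 16-bit numbers wrap to 0
    }
    return ready;
  }
}
```

For example, if the buffer expects number 10 and number 11 arrives first, push returns an empty list; when number 10 arrives, push returns both packets in the right order.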

6. Decoding

The data reaches the decoder in the correct order, and at the output we get a raw video stream: a MediaStream.

7. Drawing on the screen

We attach the stream to the video element and get the image.
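In browser code, this last step usually comes down to a few lines. The sketch below assumes an existing RTCPeerConnection and a video element already on the page; renderRemoteVideo is a hypothetical helper name.

```javascript
// Shows the remote stream in a <video> element as soon as a track arrives.
function renderRemoteVideo(peerConnection, videoElement) {
  peerConnection.ontrack = (event) => {
    // event.streams lists the MediaStreams the incoming track belongs to.
    const [stream] = event.streams;
    if (stream) {
      videoElement.srcObject = stream;
    }
  };
  videoElement.autoplay = true;    // start playback without an explicit play()
  videoElement.playsInline = true; // keep mobile browsers from going fullscreen
}

// In a browser:
// renderRemoteVideo(pc, document.querySelector("video"));
```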

During peer-to-peer communication between two browsers, you will sometimes notice that the video is covered with squares or freezes. The reason is packet loss, which has several causes:

Random loss on a lossy network (in simple words, some of the packets are left in the walls of the house).
Packets dropped by mistake (bugs in the OS or network equipment).
Network congestion.

To achieve stable video communication, we need to cope with packet loss. Four main solutions help with this:

Jitter buffer. We render one RTT later, which gives us time to request a missing packet again. In the case of a massive loss, the freeze is shorter because there is more time to request a keyframe. The main downside of this approach is the additional constant delay.
Decrease the bitrate. Bitrate = FPS * quality * resolution. We can manipulate bitrate by changing any of these parameters.
Forward Error Correction. The codec duplicates some data, so the stream sent to the client contains a certain amount of redundancy. This can exacerbate network congestion, but we have a higher chance of delivering content the first time.
Network tuning. Choose the best network routes (networks can be designed so that routes are optimal, and the media server is selected by the lowest ping) and tune servers and routers.
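The idea behind Forward Error Correction can be shown with the simplest possible scheme: one extra parity packet per group, computed with XOR, which lets the receiver rebuild any single lost packet of that group. This is an illustration of the principle under the assumption that all packets in a group have the same length, not the exact scheme a browser uses; the function names are invented for the example.

```javascript
// XORs all packets of a group byte by byte into one parity packet.
function xorParity(packets) {
  const length = packets.reduce((max, p) => Math.max(max, p.length), 0);
  const parity = new Uint8Array(length);
  for (const packet of packets) {
    for (let i = 0; i < packet.length; i++) {
      parity[i] ^= packet[i];
    }
  }
  return parity;
}

// Rebuilds the single missing packet of a group:
// XOR of the packets that did arrive and the parity packet.
function recoverLostPacket(receivedPackets, parity) {
  return xorParity([...receivedPackets, parity]);
}
```

The price is visible in the numbers: for a group of three packets, the sender transmits four, which is the extra load on the network mentioned above. If two packets of the same group are lost, this simple scheme cannot recover either of them.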

Pros and cons of WebRTC technology

The main advantages of WebRTC are:

There are implementations for all platforms.
Using modern audio and video codecs promotes high-quality communication.
Secure and encrypted DTLS and SRTP connections.
There is a built-in mechanism of content grabbing (desktop sharing).
P2P = End-to-end encryption.
Browsers negotiate with each other directly.
Flexible implementation of the control interface based on HTML5 and JavaScript.
Open-source.
Versatility: a standard-based application works well on any OS as long as the browser supports WebRTC.

The main, if conditional, disadvantage of WebRTC is the high cost of its maintenance, which comes from the need for powerful servers.

Business use cases and examples of WebRTC

As already mentioned, the basis for Web Real-Time Communication is video chat. Services with audio and video calls and data sharing are the primary types of applications involving WebRTC technologies, the most famous examples being WhatsApp, Google Hangouts, and Facebook Messenger. But if we piece all the business cases and examples of WebRTC together, we find many more areas of use.

The technology is in high demand in telehealth, surveillance and remote monitoring, online education, Internet of Things, virtual reality gaming, streaming, online games with voice communications, betting, emergency response, etc.

MobiDev has repeatedly faced the need to apply WebRTC in different niches. One of the most notable use cases is remote assistance via shared AR and WebRTC. Here WebRTC provides the two-way connection: it is used for peer-to-peer communication and helps to avoid server overload. The essence of the case is that real-time two-way communication combined with AR makes it possible to assist people remotely with tasks in many areas.

The simplest example is the repair and maintenance of any equipment. In this case, WebRTC app development is combined with our experience working with Augmented Reality.

The Future of WebRTC: Trends and Predictions

According to Market Study Report, the global WebRTC market size is predicted to reach $16,570.5 million (about $16.6 billion) in 2026. For comparison, in 2016 the worldwide market value of products using WebRTC was $10.7 billion. The turning point for WebRTC came in 2017, when Microsoft Edge and iOS Safari 11 began supporting it.

In terms of global coverage, the WebRTC market spans North America, Europe, Asia, the Middle East, South America, and Africa. North America is expected to remain the dominant region, owing to easy access to high-speed internet and the massive number of mobile device owners.

Google is putting great effort into the development of Web Real-Time Communication, so the future of WebRTC looks cloudless. This is easy to verify by looking at Google’s investments in the technology: all of them are directed at code optimization and at expanding or improving the feature set.

The main trends related to WebRTC in 2021-2022 are:

WebRTC, now a W3C standard, will develop rapidly.
The meeting sizes supported by WebRTC will grow, which increases the complexity of solutions. Notably, a meeting with 1,000 users is a real challenge that needs a new architecture.
Additional tools like background blur and noise suppression have already been developed and will be improved in the future; these tools are tied to the implementation of WebRTC in Chrome. The pandemic triggered their boom.
A great deal of work on user privacy and application security will be done.
Codecs VP9 and AV1 will be modernized.

The future of WebRTC is associated with the technology’s emergence in new markets. Furthermore, because WebRTC is a W3C standard, anybody can influence its development, which implies great prospects.

If you are interested in WebRTC app development not in the future, but right now, be sure that MobiDev will implement any of your projects.

The author - Yuriy Luchaninov, JavaScript Group Leader at MobiDev