Learning WebRTC in Practice: Best Tools and Demos

Written by vbeskrovnov | Published 2024/03/18
Tech Story Tags: webrtc | videoconferencing | webrtc-guide | how-to-set-up-webrtc | webrtc-basics | how-webrtc-works | webrtc-internals | webrtc-logs

TL;DR: This post serves as a practical guide to WebRTC, offering a compilation of tools, demos, and open-source projects, but no tutorials. It covers WebRTC's main components, such as the Media Capture API and the Connection Establishment API, and explains how WebRTC enables real-time communication in browsers. Tools for analyzing WebRTC in Chrome and various demo applications are highlighted to assist in understanding WebRTC's functionality in practice. Additionally, it introduces open-source applications like Jitsi Meet, Janus, and Mediasoup, aiming to help developers grasp how WebRTC is applied in large products.

Who is this article for?

In this article (or rather, digest), I will share key tools, demonstration applications, and open-source projects essential for a practical understanding of WebRTC. There will be no tutorials or detailed explanations of any part of WebRTC, just a digest of resources that will help you better understand the topic. If you have been working with this technology for some time, you're unlikely to find anything new here.

WebRTC Basics: What You Need to Know

If you are familiar with how video calls work in browsers, feel free to skip this section (and possibly the entire material).

Most popular video call platforms (except for Zoom) use WebRTC technology to implement their products in browsers. It has become the de facto standard for creating real-time communication services and is supported by all modern browsers.

WebRTC consists of several main components:

  • Media Capture API: It allows for the capture of audio and video streams from a device's camera and microphone.
  • Connection Establishment API: This includes protocols and methods for establishing direct connections between browsers for the real-time transmission of media and data. This involves the exchange of network information, choosing the best route for data, and connection management.
  • Codecs and Media Processing: WebRTC supports various audio and video codecs for encoding and decoding media streams. There are also tools available for media processing, including echo cancellation, noise suppression, and automatic gain control.
  • Security: WebRTC uses encryption for all transmitted data and media streams to ensure confidentiality and security over the network.
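To make the Media Capture API a bit more concrete, here is a minimal sketch in TypeScript. The constraint object's shape follows the standard `getUserMedia` constraints dictionary; `buildConstraints` itself is just an illustrative helper name, not part of any API:

```typescript
// Build a typical constraints object for capturing audio and video.
// The shape matches the standard MediaStreamConstraints dictionary;
// the helper itself is illustrative.
function buildConstraints(width: number, height: number, frameRate = 30) {
  return {
    audio: {
      echoCancellation: true,   // media-processing features mentioned above
      noiseSuppression: true,
      autoGainControl: true,
    },
    video: {
      width: { ideal: width },
      height: { ideal: height },
      frameRate: { ideal: frameRate },
    },
  };
}

// In a browser you would then capture a stream:
// const stream = await navigator.mediaDevices.getUserMedia(buildConstraints(1280, 720));
// videoElement.srcObject = stream;
```

Using `ideal` (rather than `exact`) lets the browser fall back gracefully when the camera can't deliver the requested resolution.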

How WebRTC Works:

  • Signalling: To start the exchange of media data between peers, a signalling process is necessary. This process is not directly included in the WebRTC specification, leaving developers free to choose their signalling method (via WebSocket, REST API, etc.). Signalling is used to exchange metadata such as media types, codec parameters, network addresses, and ports needed to establish a connection.
  • Connection Establishment: After exchanging initial information, ICE (Interactive Connectivity Establishment) mechanisms are applied using STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT) servers to bypass NAT (Network Address Translation) and establish a direct connection between peers.
  • Data Transfer: Once the connection is established, the transfer of streaming data between peers begins through a secured connection, using RTP (Real-time Transport Protocol) for media and SCTP (Stream Control Transmission Protocol) for data.
  • Adaptation and Processing: During transmission, WebRTC can adapt to changing network conditions by adjusting the stream quality to optimize performance. Media processing also occurs, including noise filtering, echo cancellation, and other functions to improve communication quality.
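As a small illustration of the connection-establishment step: ICE candidates are exchanged as text lines whose `typ` field tells you whether a path is direct (`host`), discovered via a STUN server (`srflx`), or relayed through TURN (`relay`). The `candidateType` helper below is an assumed name for illustration, not a WebRTC API:

```typescript
// Classify an ICE candidate line by its "typ" field:
// host  = local address, direct connection
// srflx = server-reflexive, discovered via STUN
// prflx = peer-reflexive, discovered during connectivity checks
// relay = relayed through a TURN server
function candidateType(candidate: string): string | null {
  const match = candidate.match(/ typ (host|srflx|prflx|relay)/);
  return match ? match[1] : null;
}
```

In chrome://webrtc-internals you can see exactly these candidate lines, so a classifier like this is a handy mental model when reading them.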

Next, in three sections, I will cover the tools/services and demos that helped me learn this technology hands-on. Then, I'll share open-source products that will let you understand, in detail and in practice, how WebRTC works in large products.

Tools

Chrome Tools

The Chrome browser offers several powerful tools for analyzing WebRTC:

  • chrome://webrtc‑internals
  • chrome://webrtc-logs

Webrtc-internals

Let's start with chrome://webrtc-internals. This is probably the most basic tool that people get acquainted with at the very beginning of their journey when working with WebRTC. It provides a wealth of information about current connections, which helps with debugging and reverse engineering of other products.

Here, you can find out which codecs existing video call services use. For example, by joining a call on Google Meet and opening chrome://webrtc-internals, you can see that Google Meet uses the AV1 video codec. You can also see, for example, the number of frames sent per second and their resolution. The main limitation is that the data is available only during the call (or if recording was enabled in advance), which means the tool can't be used to investigate a problem after it has already happened.
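For a sense of how such metrics come together, the frames-per-second graphs in chrome://webrtc-internals are essentially derived from deltas between successive `getStats()` samples. A sketch of that calculation (`estimateFps` and `FrameSample` are illustrative names, not real APIs):

```typescript
// A simplified stats sample: cumulative frames sent plus a timestamp,
// as reported in outbound-rtp stats.
interface FrameSample {
  framesSent: number;
  timestampMs: number;
}

// Estimate frames per second from two successive samples by dividing
// the frame delta by the elapsed time.
function estimateFps(prev: FrameSample, curr: FrameSample): number {
  const seconds = (curr.timestampMs - prev.timestampMs) / 1000;
  return seconds > 0 ? (curr.framesSent - prev.framesSent) / seconds : 0;
}
```

In a browser, the samples would come from periodically calling `RTCPeerConnection.getStats()` and reading the `outbound-rtp` entries.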

More information can be found here: https://getstream.io/blog/debugging-webrtc-calls/

Webrtc-logs

This section is used much less frequently, as it contains more detailed information and is necessary for a more thorough analysis of complex issues. Unlike chrome://webrtc-internals, it contains logs in text format about previous sessions. Thus, you can review logs for an already completed call, something chrome://webrtc-internals cannot do.

Inside, you can find information about the start and end of video calls, media stream parameters, codecs, video resolution, frame rate, and other media stream quality metrics.

Some information from these logs can be used to build metrics similar to those in chrome://webrtc-internals. Unfortunately, Chrome does not provide such functionality out of the box; however, there is an open-source product that solves this task: https://fippo.github.io/dump-webrtc-event-log/
This tool is ideal if there was an issue during one of the real calls, and when real-time analysis is not available. By default, these logs are saved in the directory …/Google/Chrome/Default/WebRTC Logs/<log_name>.gz, which allows for collecting this data from non-technical users who are willing to help with the problem analysis.

TestRTC

Link

analyzeRTC from the testRTC service is still somewhat in beta. The platform as a whole is paid, but you can analyze your dump for free. Generally, you won't find anything new here after chrome://webrtc-internals, as the data source is the same. However, analyzeRTC presents this data much better.

For example, information about the video stream is presented especially conveniently:

Here, you can also view the overall call rating and audio quality:

Score is an overall rating of the scenario from 0 to 10, considering both audio and video across all channels. It helps evaluate the service's performance and requires subjective interpretation based on experience with the application. MOS (Mean Opinion Score) assesses only the audio channels of the scenario, representing the subjective quality of sound on a scale from 1 to 5, where above 3 is considered good quality, 2 to 3 is satisfactory, and below 2 is poor.
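The MOS bands above can be captured in a few lines. `mosLabel` is purely an illustrative helper that follows the thresholds described in the text, not anything from testRTC itself:

```typescript
// Map a MOS value (1-5) to the quality bands described in the text:
// above 3 = good, 2 to 3 = satisfactory, below 2 = poor.
function mosLabel(mos: number): "good" | "satisfactory" | "poor" {
  if (mos > 3) return "good";
  if (mos >= 2) return "satisfactory";
  return "poor";
}
```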

A short video tutorial can be found here

Demo Applications

Official Demos

Link

Next come the official examples of using WebRTC. These are not tools like the ones described above, but a set of working examples. They cover a wide range of functionality, from basic peer connection establishment to more complex scenarios such as data exchange and media stream management.

Each link contains a working demo and a link to the source code on GitHub. Using just these examples, one can already understand how to develop a rather complex web application with video call functionality. Overall, these examples help to better understand each aspect of WebRTC in practice, to reinforce theoretical knowledge.

SDP Explanation

Link

Next, let's talk about SDP, the protocol used to negotiate a connection before a call starts. The format is described in RFC 8866; it's used to agree on media stream parameters, such as codec formats, video resolution, and the types of media (audio, video, data) needed to establish a connection and exchange media data.

The protocol is designed to be human-readable on the one hand and as compact as possible on the other. As a result, it's quite difficult to understand at first glance what a specific message contains. This tool exists exactly to solve that problem.

In interactive mode, you can read about each line from the SDP protocol, saving time on searching for explanations in the RFC.
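Every SDP line has the shape `type=value`, which makes a toy parser easy to write. The sketch below (`parseSdp` is an assumed helper name) just splits a message into typed lines, which is the first step a tool like this performs before attaching explanations:

```typescript
// One parsed SDP line: a single-character type and its value,
// e.g. "m" / "audio 49170 RTP/AVP 111".
interface SdpLine {
  type: string;
  value: string;
}

// Split an SDP message into typed lines. Real parsers go further and
// interpret each type (v=, o=, m=, a=, ...) per RFC 8866.
function parseSdp(sdp: string): SdpLine[] {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.length > 2 && line[1] === "=")
    .map((line) => ({ type: line[0], value: line.slice(2) }));
}
```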

Media test

Link

In addition to establishing connections and transmitting media streams, an important component of any video calling application is the processing of these media streams. This tool helps to understand how video can be processed.

Here, you can experiment with various aspects of video streams, such as creating a stream from scratch or from a camera, selecting video resolution and frame rate. You can also apply transformations to frames, for example, changing colors, converting to black and white, encoding/decoding in H.264, and adding overlays for tracking display time in a <video> element.

After experimenting, you can review the processing results, including processing time at different stages (e.g., conversion to RGBX, adding backgrounds and overlays), frame display time, total processing time from start to finish, and queuing time. Each metric includes the number of processed frames and the average, median, minimum, and maximum processing times. This data helps assess the performance and efficiency of video processing in the application and understand the cost of each operation.
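As an example of what one of these per-frame transforms actually does, a black-and-white conversion is just a weighted sum over raw RGBA pixel data. Here is a sketch, assuming BT.601 luma weights (`toGrayscale` is an illustrative name, not part of the demo's code):

```typescript
// Convert raw RGBA pixel data to grayscale in place-style fashion,
// using ITU-R BT.601 luma weights. Each pixel is 4 bytes: R, G, B, A.
function toGrayscale(rgba: Uint8ClampedArray): Uint8ClampedArray {
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    const y = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
    out[i] = out[i + 1] = out[i + 2] = y; // clamped and rounded on assignment
    out[i + 3] = rgba[i + 3];             // keep the alpha channel
  }
  return out;
}
```

In the browser, such a function would typically run inside an Insertable Streams transform, fed by frames copied out of a `VideoFrame`.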

Slides from the authors.

Simulcast Playgrounds

The First

Link

Another demo, this time unofficial, but still open source. Here, we focus on Simulcast. It's an approach that allows conference participants with different network bandwidths to receive video from you in different resolutions (the higher the bandwidth, the better the resolution).

To avoid spending server resources on transcoding, each participant sends several video streams in different resolutions. This algorithm is simulated in this example. It lets you configure video transmission parameters, including resolution, frame rate, codecs, and Simulcast-specific settings such as adding and removing layers, changing priorities, and configuring the encoder.
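A typical three-layer setup can be sketched like this. The `sendEncodings` shape (`rid`, `scaleResolutionDownBy`, `maxBitrate`) is the standard WebRTC API; the specific rids, bitrates, and the `layerResolutions` helper are illustrative assumptions:

```typescript
// Three simulcast layers: quarter, half, and full resolution.
// The rids and bitrates here are example values, not a standard.
const simulcastEncodings = [
  { rid: "q", scaleResolutionDownBy: 4, maxBitrate: 150_000 },
  { rid: "h", scaleResolutionDownBy: 2, maxBitrate: 500_000 },
  { rid: "f", scaleResolutionDownBy: 1, maxBitrate: 1_500_000 },
];

// Compute the resolution each layer would actually be sent at,
// given the captured resolution.
function layerResolutions(width: number, height: number) {
  return simulcastEncodings.map((e) => ({
    rid: e.rid,
    width: width / e.scaleResolutionDownBy,
    height: height / e.scaleResolutionDownBy,
  }));
}

// In a browser, the encodings are passed when adding the track:
// pc.addTransceiver(videoTrack, { direction: "sendonly", sendEncodings: simulcastEncodings });
```

An SFU can then forward whichever layer matches each receiver's bandwidth, without re-encoding anything.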

If you've never heard about Simulcast, it might be better to spend some time on the theory before diving into this demo.

The Second

Link

Another variation of the simulcast demo, not very flexible, but much more representative. You can see the standard configuration with three parallel streams and metrics from chrome://webrtc-internals. It looks more familiar and doesn't distract with additional settings. An excellent addition to the tool described above.

Pagination Demo App

Link

A wonderful tool that helps understand the display of incoming video streams and manage the resources they use. Unfortunately, the expanded demo was not working at the time the article was written, but it wouldn't have been of much help anyway, as the main value of this product is its source code.

There, you can figure out how to properly manage incoming video/audio streams from other call participants. This is especially important with a large number of participants, when not all of them fit on the screen and pagination becomes necessary. There's no point in receiving streams from participants on other pages, and this must be handled correctly.
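The core of such logic boils down to deciding which participants are visible on the current page; only their streams need to be subscribed to, while everyone else's can be paused or unsubscribed. A minimal sketch with assumed names (not taken from the demo's source):

```typescript
// Return the participant ids whose streams should be active for a given
// page: a simple slice over the full participant list.
function visibleParticipants(ids: string[], page: number, pageSize: number): string[] {
  return ids.slice(page * pageSize, (page + 1) * pageSize);
}

// Everything not in visibleParticipants(...) is a candidate for
// unsubscribing, which is where the real resource savings come from.
```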

The most interesting component is ‘components/PaginatedGrid.js’, which contains almost all the logic for managing rendering and stream management. A description of the source code and the application as a whole can be found in the official documentation.

Opensource Applications

Jitsi Meet

GitHub repository

One of the most popular open-source products for building browser-based video calls. In fact, Jitsi includes not just a web client but all the other parts needed to build the entire product. It's a fantastic resource for diving deep into RTC, but it will take a significant amount of time, as Jitsi is very flexible and quite an old product with a huge codebase.

This tool generally has a relatively high entry threshold due to its size and age. Documentation is available and quite detailed, but again, very extensive: https://jitsi.github.io/handbook/docs/category/developer-guide

Janus

GitHub repository

Janus Gateway is an open-source WebRTC server that addresses a similar issue. However, unlike Jitsi Meet, Janus focuses on providing a low-level API for managing media streams, making it more flexible but potentially more complex for beginners. This product is also more suitable for those interested in the server side rather than the client side, which Jitsi Meet would better assist with.

Janus comes with very detailed documentation and a whole bunch of separate demos (which, by the way, were not included in the previous section, but I would still recommend checking them out).

Mediasoup

GitHub repository

MediaSoup is another WebRTC server. It addresses the same issues as Janus Gateway, with the difference that MediaSoup provides both server and client components.

Other products

In fact, there are a whole bunch of open-source products for building video calls; I've mentioned those that I consider the most useful and convenient for learning about the technologies and approaches used in WebRTC.

You can learn about other products via the link.

Instead of Conclusion

There were no debates or innovations in this article, so there are no conclusions to draw. I merely described the tools I've used, and I'm sure this is far from an exhaustive list. I would be grateful for recommendations in the comments of any tools and services that I missed!


Written by vbeskrovnov | Software Engineer enjoying working with any platform
Published by HackerNoon on 2024/03/18