A Guide to WebRTC Architecture by RTCWeb.in

WebRTC is a technology for peer-to-peer connections on the web: one browser connects to another and shares data such as video, audio streams, or JSON. That is the simple definition. However, when it comes to knowing the technology inside out, decision-makers do not find it so simple.

Many of our clients ask us to elaborate on the technology, and RTCWeb.in always encourages such questions. We maintain that you must understand the technology completely so you can use it to your best advantage. In this blog, we explain the complete WebRTC architecture. The aim is to help decision-makers understand the mechanism behind the technology.

WebRTC involves many objects and classes. Many components go into building something like Google Meet, Skype, or Hangouts: for instance, MediaStream, RTCPeerConnection, RTCDataChannel, media servers, and server-side infrastructure. There are some hard words too, like NAT, STUN, TURN, and signaling server.

We’ll demystify all those components and pick the most important ones to explain the overall architecture of WebRTC.

While WebRTC has simplified real-time communication on the web, under the hood it consists of a series of standards, protocols, and JavaScript APIs. These protocols and APIs enable peer-to-peer audio, video, and data sharing between browsers.

The WebRTC architecture includes different standards, covering both the application and browser APIs. Although the primary purpose of WebRTC is to enable real-time communication, it is also designed so that it can be integrated with existing communication systems: VoIP, SIP clients, and the PSTN, just to name a few.

The Architecture

***

First up, understanding the server-side infrastructure for WebRTC

WebRTC is a peer-to-peer communication technology, and most of its development focuses on the client device. Regardless, it is important to have a clear understanding of the server-side infrastructure for WebRTC. Every WebRTC application has infrastructure for exchanging signaling messages and for handling media as well.
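The signaling part of that infrastructure can be pictured as a relay that forwards messages between peers. The sketch below is a minimal in-memory illustration only: the SignalingRelay class and its method names are hypothetical, and a real deployment would put a network transport such as WebSocket behind the same idea.

```javascript
// Hypothetical in-memory stand-in for a signaling server.
// WebRTC does not define a signaling protocol; this only shows the relay role.
class SignalingRelay {
  constructor() {
    this.peers = new Map(); // peerId -> callback receiving (fromId, payload)
  }

  // A peer registers under an id and supplies a handler for incoming messages.
  register(peerId, onMessage) {
    if (this.peers.has(peerId)) {
      throw new Error(`peer id already in use: ${peerId}`);
    }
    this.peers.set(peerId, onMessage);
  }

  // Forward a payload unchanged; the relay never inspects the contents.
  send(fromId, toId, payload) {
    const deliver = this.peers.get(toId);
    if (!deliver) {
      return false; // recipient is not connected
    }
    deliver(fromId, payload);
    return true;
  }
}
```

The design point is that the relay only moves messages between registered peers. It does not need to understand what the browsers are negotiating.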

There are three WebRTC architectures: Peer-to-Peer, Multipoint Conferencing Units (MCU), and Selective Forwarding Units (SFU). Each has its strengths and weaknesses and suits different use cases. Here is an article discussing Peer-to-Peer, MCU, and SFU in detail.
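One way to see the trade-off is to count the media streams each participant has to send and receive. The function below is a rough sketch under simplifying assumptions: every participant shares exactly one stream, and optimizations such as simulcast are ignored. The function name and the topology labels are made up for this illustration.

```javascript
// Media streams each participant sends (up) and receives (down)
// in a call with n participants, assuming everyone shares one stream.
function streamsPerParticipant(topology, n) {
  if (!Number.isInteger(n) || n < 2) {
    throw new RangeError("a call needs at least 2 participants");
  }
  switch (topology) {
    case "p2p": // full mesh: a direct connection to every other peer
      return { up: n - 1, down: n - 1 };
    case "sfu": // send once to the server, which forwards the other streams
      return { up: 1, down: n - 1 };
    case "mcu": // send once, receive one mixed stream from the server
      return { up: 1, down: 1 };
    default:
      throw new Error(`unknown topology: ${topology}`);
  }
}
```

For a five-person call, streamsPerParticipant("p2p", 5) gives 4 up and 4 down, the SFU case gives 1 up and 4 down, and the MCU case gives 1 up and 1 down. The lighter the load on the client, the more work the server takes on.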

That covers the server-side infrastructure of WebRTC, but there are more components in WebRTC technology that go beyond the server and include devices, media, signaling, etc.

***

MediaStream

To transmit video from one browser to another, we first need to capture the webcam (a media device) using the MediaStream API. The MediaStream API gives us access to the microphone or webcam stream. Even if we don’t want to transmit anything, we can show the webcam view inside our web page. All of this can be written in JavaScript.

And of course, user permission is required for accessing media. Whenever the capture method, navigator.mediaDevices.getUserMedia(), is called, the browser shows a permission prompt or signals via the camera light that the application is trying to access the camera. If the user accepts, our application gets the media stream and attaches it to the video element on our page.
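The capture step might look roughly like the sketch below. The function name showLocalCamera is made up for illustration, and the mediaDevices parameter exists only so the function can be tried outside a browser; in a page it defaults to navigator.mediaDevices.

```javascript
// Ask for camera and microphone, then show the stream in a <video> element.
// `mediaDevices` defaults to the browser's navigator.mediaDevices; it is a
// parameter only so the function can be exercised outside a browser.
async function showLocalCamera(videoElement, mediaDevices = navigator.mediaDevices) {
  try {
    // Triggers the browser's permission prompt on first use.
    const stream = await mediaDevices.getUserMedia({ video: true, audio: true });
    videoElement.muted = true;       // avoid echoing our own microphone
    videoElement.srcObject = stream; // attach the live stream to the element
    return stream;
  } catch (err) {
    // NotAllowedError is raised when the user (or a policy) denies access.
    console.warn(`Could not access media devices: ${err.name}`);
    return null;
  }
}
```

In a page this would be called as showLocalCamera(document.querySelector("video")). The video element also needs the autoplay attribute, or a call to play(), before the picture appears.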

***

RTCPeerConnection

After successfully grabbing the stream, or any other data for that matter, we can send it to another browser. We use RTCPeerConnection for that. To establish a connection between two browsers, a signaling server comes into play.
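Before any signaling happens, the captured tracks are usually handed to the connection object. The sketch below is one way to do it. The function name is made up, the STUN address is a placeholder rather than a real server, and the PeerConnection parameter exists only so the function can be tried outside a browser.

```javascript
// Create a connection and queue every captured track for sending.
// The STUN URL below is a placeholder, not a real server.
function createPeerWithStream(stream, PeerConnection = RTCPeerConnection) {
  const pc = new PeerConnection({
    // A STUN server helps a browser behind NAT learn its public address.
    iceServers: [{ urls: "stun:stun.example.org:3478" }],
  });
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream); // associate each track with its stream
  }
  return pc;
}
```

Nothing is transmitted at this point. The tracks start flowing only after the two browsers have completed their exchange through the signaling server.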

Here is how the conversation takes place between the signaling server and the two browsers that want to share a stream (let’s call them Alpha and Charlie).

Alpha to Signaling Server:

Hey, I would like to connect to someone I know. Can you give me my unique identifier?

Signaling Server to Alpha:

Sure, here is your Unique Identifier.

Alpha to Signaling Server:

Thank you, Signaling Server. I am putting this unique identifier in my “local descriptor”.

Alpha to Charlie:

Hello Charlie, Alpha here, initiating a WebRTC connection with you. Sending my unique identifier.

(The application sends the unique identifier to browser Charlie. We can do this however we want; the important part is that Charlie gets this data.)

Charlie to Alpha:

Hi, I heard you want to stream some data with me. I also got your unique identifier. Putting it in my remote descriptor, so I know who I am talking to.

Charlie to Signaling Server:

Hi, I would like to accept an offer from Alpha browser. Create a unique Identifier for me too. 

Signaling Server to Charlie:

Sure, here is one.

Charlie to Signaling Server:

Thanks, putting it in my local descriptor 

Charlie to Alpha:

Hey, catch my unique identifier (the identifier is sent the same way it was received).

Alpha to Charlie:

Great, got it, putting it in my remote descriptor, now we can share our streams.

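So, here we have two RTCPeerConnection instances sharing their unique identifiers to establish a connection. In the actual API, each “unique identifier” corresponds to a session description (an SDP offer or answer): the browser generates it through its RTCPeerConnection, and the signaling server’s job is to relay it to the other peer. The same exchange between browsers and signaling servers takes place in a peer-to-peer connection, with Multipoint Conferencing Units, and with Selective Forwarding Units. This conversation happens very fast, and a connection is usually established within moments.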
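The conversation above maps fairly directly onto the RTCPeerConnection API. The three helper functions below are a sketch with made-up names; sendToPeer stands for whatever signaling channel the application uses. The exchange of ICE candidates, which real applications also pass through the signaling channel, is left out to keep the example short.

```javascript
// Alpha: create an offer, store it locally, and hand it to the signaling channel.
async function makeOffer(pc, sendToPeer) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);  // Alpha's "local descriptor"
  sendToPeer(pc.localDescription);
}

// Charlie: store Alpha's offer, create an answer, and send it back.
async function acceptOffer(pc, offer, sendToPeer) {
  await pc.setRemoteDescription(offer); // Charlie's "remote descriptor"
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer); // Charlie's "local descriptor"
  sendToPeer(pc.localDescription);
}

// Alpha: store Charlie's answer; both sides now hold both descriptions.
async function acceptAnswer(pc, answer) {
  await pc.setRemoteDescription(answer); // Alpha's "remote descriptor"
}
```

Alpha calls makeOffer, Charlie calls acceptOffer with what arrives, and Alpha finishes with acceptAnswer. The order matters: each side sets a description only after it has been created locally or received from the other peer.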

To sum up… 

WebRTC may look overwhelming to begin with. However, at its core it is as simple as a conversation between two browsers, each with a unique identifier. All this was to help you gain an understanding of the technology. For practical implementation and bringing the application to life, the experts at RTCWeb.in are ready to serve you. Contact us to have a WebRTC application built.