Breaking Down a WebRTC-Powered Video Conferencing App
These times of social distancing have made video conferencing and virtual meetups the new normal. As remote working capabilities are being put to the test, so are real-time communication facilities.
Be it Zoom, Jitsi, Skype, Talkroom, or any other browser-based interface that you are using, if it facilitates encrypted real-time communication, it is in all likelihood built on the WebRTC standard.
We at RTCWeb.in have developed open-source peer-to-peer video conferencing applications over the years. In case you are thinking of building a video conferencing application for your enterprise, here is a brief look at what goes on under the hood.
This is to give you a basic understanding of the technologies involved, right from the new APIs integrated into the browser to the servers and protocols that establish the connections.
The new browser APIs are the most important part of WebRTC. The APIs that give access to the user’s camera and microphone were designed especially for video conferencing. That isn’t all: there are other APIs for different purposes, such as handling audio and video.
When experimenting with WebRTC, the first function developers encounter is getUserMedia. It is used to get a MediaStream of the user’s camera and microphone.
We can specify constraints when requesting the stream, depending on our needs. For instance, we can request a higher resolution or a specific camera and microphone that the user has chosen from the list of available devices. The browser ensures that this feature is not exploited: users are always asked for permission before access is granted.
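As a rough illustration of such constraints, the sketch below builds the object that would be passed to getUserMedia. The helper name buildConstraints and the DesiredCapture type are hypothetical and only exist for this example; the constraint shapes (ideal for resolution, exact for a device ID) follow the standard constraint syntax. The types are declared locally so the snippet does not depend on browser typings.

```typescript
// What the application would like to capture (hypothetical type for this sketch).
interface DesiredCapture {
  width?: number;        // preferred width in pixels
  height?: number;       // preferred height in pixels
  cameraId?: string;     // deviceId of the camera the user picked, if any
  microphoneId?: string; // deviceId of the microphone the user picked, if any
}

// Minimal local copies of the constraint shapes used below.
interface TrackConstraints {
  width?: { ideal: number };
  height?: { ideal: number };
  deviceId?: { exact: string };
}

interface StreamConstraints {
  audio: boolean | TrackConstraints;
  video: boolean | TrackConstraints;
}

// buildConstraints is a hypothetical helper, not part of the WebRTC API.
function buildConstraints(desired: DesiredCapture = {}): StreamConstraints {
  const video: TrackConstraints = {};
  if (desired.width !== undefined) video.width = { ideal: desired.width };
  if (desired.height !== undefined) video.height = { ideal: desired.height };
  if (desired.cameraId !== undefined) video.deviceId = { exact: desired.cameraId };

  const audio: TrackConstraints = {};
  if (desired.microphoneId !== undefined) audio.deviceId = { exact: desired.microphoneId };

  return {
    // Fall back to `true` (browser defaults) when nothing specific was requested.
    audio: Object.keys(audio).length > 0 ? audio : true,
    video: Object.keys(video).length > 0 ? video : true,
  };
}

// In a browser, the result would be used roughly like this:
//   const devices = await navigator.mediaDevices.enumerateDevices();
//   const stream = await navigator.mediaDevices.getUserMedia(
//     buildConstraints({ width: 1280, height: 720 }));
```

Using ideal rather than exact for the resolution lets the browser fall back to the closest supported mode instead of rejecting the request.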
Once the user accepts the request for the media stream, we can display the stream locally by attaching it to a video tag, or we can add it to an RTCPeerConnection to send it to others. Which one we use depends on the nature and purpose of the product or service. RTCPeerConnection is the second big part of the WebRTC standard.
We can add the stream to the peer connection and receive the tracks from the other peer once the connection is established. More complex scenarios, such as screen sharing, require the use of transceivers. Such complexities are often handled with a WebRTC library.
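A minimal sketch of that step is shown here. The function name shareStream and the three "Like" interfaces are assumptions made for this example; they mirror only the small part of RTCPeerConnection and MediaStream that the sketch touches (addTrack, ontrack, getTracks), so the logic can be read and tried without a browser.

```typescript
// Structural stand-ins for the browser objects (assumptions for this sketch).
interface TrackLike { kind: string }
interface StreamLike { getTracks(): TrackLike[] }
interface TrackEventLike { track: TrackLike; streams: readonly StreamLike[] }
interface PeerConnectionLike {
  addTrack(track: TrackLike, ...streams: StreamLike[]): unknown;
  ontrack: ((event: TrackEventLike) => void) | null;
}

// shareStream is a hypothetical helper: it sends every local track to the
// other peer and reports remote streams as they arrive.
function shareStream(
  pc: PeerConnectionLike,
  local: StreamLike,
  onRemoteStream: (stream: StreamLike) => void,
): void {
  // Register the handler first so no incoming track is missed.
  pc.ontrack = (event) => {
    const remote = event.streams[0];
    if (remote) onRemoteStream(remote);
  };
  // Each track is added individually and associated with its stream.
  for (const track of local.getTracks()) {
    pc.addTrack(track, local);
  }
}

// In a browser, onRemoteStream would typically attach the stream to a video
// element, for example: (stream) => { videoElement.srcObject = stream; }
```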
Establishing a connection
Media streams, APIs, and permissions are all good, but how is the connection between two users established?
To find each other, peers need to exchange some signaling messages. These messages contain the supported codecs, information about the network, and information about the media tracks that are to be sent and received.
There are three types of messages that can be exchanged over the signaling mechanism:
- Media data: Type of media that is to be shared (audio or video).
- Session control data: Opening and closing of the communication.
- Network data: The exchange of IP addresses and ports between users.
The data in the signaling messages is encoded using SDP (Session Description Protocol), a format that is well established in voice over IP telephony.
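To make the SDP format a little more concrete, here is a small sketch that reads the media sections and codec names out of a session description. The function name summarizeSdp and the shortened sample are made up for this example; real descriptions produced by a browser contain many more lines. The sketch relies only on two standard line types: m= lines, which open a media section, and a=rtpmap: lines, which map a payload type to a codec.

```typescript
interface MediaSection {
  kind: string;     // "audio", "video", ...
  codecs: string[]; // codec names announced for this section
}

// summarizeSdp is a hypothetical helper for illustration only.
function summarizeSdp(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      // m=<media> <port> <proto> <payload types...>
      sections.push({ kind: line.slice(2).split(" ")[0], codecs: [] });
    } else if (line.startsWith("a=rtpmap:") && sections.length > 0) {
      // a=rtpmap:<payload type> <codec name>/<clock rate>[/<channels>]
      const encoding = line.split(" ")[1] ?? "";
      sections[sections.length - 1].codecs.push(encoding.split("/")[0]);
    }
  }
  return sections;
}

// A heavily shortened, hand-written sample (not captured from a real browser).
const sampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

console.log(summarizeSdp(sampleSdp));
```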
The standard does not define how the messages are transmitted, but this is usually done through a WebSocket. There are signaling servers created specifically for WebRTC-powered video conferencing. These servers typically use the abstraction of named rooms, which clients join to find other peers and exchange messages with them.
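The named-room idea can be sketched without any network code. The class below is a hypothetical in-memory model, not the API of any particular signaling server: in a real deployment each deliver callback would write to that peer's WebSocket.

```typescript
// Called to hand a message to one peer (for example, by writing to its socket).
type Deliver = (from: string, message: string) => void;

// RoomRegistry is a hypothetical name for this sketch.
class RoomRegistry {
  private readonly rooms = new Map<string, Map<string, Deliver>>();

  // A peer joins a named room and registers how it can be reached.
  join(room: string, peerId: string, deliver: Deliver): void {
    let members = this.rooms.get(room);
    if (!members) {
      members = new Map<string, Deliver>();
      this.rooms.set(room, members);
    }
    members.set(peerId, deliver);
  }

  // A peer leaves; empty rooms are dropped.
  leave(room: string, peerId: string): void {
    const members = this.rooms.get(room);
    if (!members) return;
    members.delete(peerId);
    if (members.size === 0) this.rooms.delete(room);
  }

  // Forward a signaling message to everyone else in the room.
  // Returns how many peers received it.
  relay(room: string, from: string, message: string): number {
    const members = this.rooms.get(room);
    if (!members) return 0;
    let delivered = 0;
    members.forEach((deliver, peerId) => {
      if (peerId === from) return; // never echo back to the sender
      deliver(from, message);
      delivered++;
    });
    return delivered;
  }
}
```

The server never inspects the message body here; it only needs to know which room a message belongs to, which is why the same relay can carry offers, answers, and network information alike.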
There are other ways to exchange messages between users, too. In most of them, one user creates an offer and the other responds with an answer. Once the signaling messages have been exchanged, the browsers know which codecs and settings both clients support. The browsers then establish the connection using the transmitted network information.
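The offer and answer exchange can be sketched as three small steps. The function names (makeOffer, handleOffer, handleAnswer) and the NegotiatingPeer interface are assumptions for this example; the four methods in the interface carry the same names as the corresponding RTCPeerConnection methods, and send stands for whatever signaling channel is in use.

```typescript
interface SessionDescription {
  type: "offer" | "answer";
  sdp: string;
}

// The part of a peer connection that negotiation needs (stand-in for this sketch).
interface NegotiatingPeer {
  createOffer(): Promise<SessionDescription>;
  createAnswer(): Promise<SessionDescription>;
  setLocalDescription(description: SessionDescription): Promise<void>;
  setRemoteDescription(description: SessionDescription): Promise<void>;
}

type Send = (description: SessionDescription) => void;

// Step 1, on the calling side: create the offer, apply it locally, send it.
async function makeOffer(caller: NegotiatingPeer, send: Send): Promise<void> {
  const offer = await caller.createOffer();
  await caller.setLocalDescription(offer);
  send(offer);
}

// Step 2, on the receiving side: apply the offer, then answer it.
async function handleOffer(
  callee: NegotiatingPeer,
  offer: SessionDescription,
  send: Send,
): Promise<void> {
  await callee.setRemoteDescription(offer);
  const answer = await callee.createAnswer();
  await callee.setLocalDescription(answer);
  send(answer);
}

// Step 3, back on the calling side: apply the answer.
async function handleAnswer(
  caller: NegotiatingPeer,
  answer: SessionDescription,
): Promise<void> {
  await caller.setRemoteDescription(answer);
}
```

After step 3, both sides hold a local and a remote description, which is the point at which they know the codecs and settings they have in common.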
Connection through firewalls and NATs
The easiest case is when both clients are in the same local network. Since the clients send all of their known IP addresses through the signaling channel, they can reach each other directly on their local addresses. But of course, this won’t always be the case.
More commonly, clients try to connect over the internet from behind a router. Outgoing data then generally appears to come from a different port than the one it was originally sent from. Moreover, incoming connections cannot be accepted unless port forwarding has been set up manually.
If one client (say Alpha) is directly connected to the internet and the other (say Charlie) is connected through a router, a connection can still be created. Charlie sends a packet to Alpha through the router, which changes the outgoing port. Alpha simply answers to the address and port she received the data from. Charlie’s router receives the answer and forwards it to Charlie, because it remembers sending data through that port before. Both clients can now send and receive data.
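The router's "memory" in this scenario can be modeled as a simple lookup table. The class below is a toy simulation written for this article, with made-up names and example addresses; it assumes a router that forwards any packet arriving on a known public port, which is the behavior described above. Real routers differ in how strictly they filter incoming packets.

```typescript
interface Endpoint {
  ip: string;
  port: number;
}

// SimulatedRouter is a hypothetical toy model, not a real networking API.
class SimulatedRouter {
  private readonly publicIp: string;
  private nextPublicPort = 40000; // arbitrary starting port for the sketch
  private readonly byPublicPort = new Map<number, Endpoint>();
  private readonly byInternal = new Map<string, number>();

  constructor(publicIp: string) {
    this.publicIp = publicIp;
  }

  // An internal client sends a packet out. The router rewrites the source to
  // its own public address and a public port, and remembers the mapping.
  sendOut(source: Endpoint): Endpoint {
    const key = source.ip + ":" + source.port;
    let publicPort = this.byInternal.get(key);
    if (publicPort === undefined) {
      publicPort = this.nextPublicPort++;
      this.byInternal.set(key, publicPort);
      this.byPublicPort.set(publicPort, source);
    }
    return { ip: this.publicIp, port: publicPort };
  }

  // A packet arrives from outside on a public port. It is forwarded only if
  // an earlier outgoing packet created a mapping for that port.
  receive(publicPort: number): Endpoint | null {
    return this.byPublicPort.get(publicPort) ?? null;
  }
}

// Charlie (internal) sends to Alpha; Alpha replies to the endpoint she saw.
const router = new SimulatedRouter("203.0.113.7");
const charlie: Endpoint = { ip: "192.168.1.20", port: 5000 };
const seenByAlpha = router.sendOut(charlie);
console.log(seenByAlpha, router.receive(seenByAlpha.port));
```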
Now the big question. What if both clients are using a router? For this scenario, WebRTC uses the STUN (Session Traversal Utilities for NAT) protocol to establish the connection.
Both users send a UDP packet to a STUN server. In response, the STUN server sends back the public IP address and port from which it received the packet.
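For readers curious about what travels over the wire, the sketch below builds a STUN Binding Request and decodes the XOR-MAPPED-ADDRESS value from a response, following the STUN message layout (a 20-byte header with a fixed magic cookie, and the address XORed with that cookie). The function names are made up for this example, only IPv4 is handled, and a browser does all of this internally, so an application never has to write such code itself.

```typescript
const STUN_MAGIC_COOKIE = 0x2112a442;

// Build the 20-byte header of a Binding Request without attributes.
function buildBindingRequest(transactionId: Uint8Array): Uint8Array {
  if (transactionId.length !== 12) {
    throw new Error("transaction ID must be 12 bytes");
  }
  const packet = new Uint8Array(20);
  const view = new DataView(packet.buffer);
  view.setUint16(0, 0x0001);            // message type: Binding Request
  view.setUint16(2, 0);                 // message length: no attributes
  view.setUint32(4, STUN_MAGIC_COOKIE); // fixed magic cookie
  packet.set(transactionId, 8);         // random ID chosen by the client
  return packet;
}

// Decode the 8-byte value of an IPv4 XOR-MAPPED-ADDRESS attribute:
// 1 reserved byte, 1 family byte, 2 bytes X-Port, 4 bytes X-Address.
function decodeXorMappedAddress(value: Uint8Array): { ip: string; port: number } {
  if (value.length !== 8 || value[1] !== 0x01) {
    throw new Error("only IPv4 values are handled in this sketch");
  }
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  // The port is XORed with the upper 16 bits of the magic cookie.
  const port = view.getUint16(2) ^ (STUN_MAGIC_COOKIE >>> 16);
  // The address is XORed with the whole magic cookie.
  const address = (view.getUint32(4) ^ STUN_MAGIC_COOKIE) >>> 0;
  const ip = [
    address >>> 24,
    (address >>> 16) & 0xff,
    (address >>> 8) & 0xff,
    address & 0xff,
  ].join(".");
  return { ip, port };
}

// In an application, the STUN server is simply listed in the configuration.
// The URL below is a placeholder, not a real server:
//   new RTCPeerConnection({ iceServers: [{ urls: "stun:stun.example.org:3478" }] });
```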
Let’s go to Alpha and Charlie again.
Alpha sends the address information she received from the STUN server to Charlie via the signaling channel. Charlie uses the given IP address and port to establish the connection. It is important that Charlie uses exactly this port: Alpha’s router remembers that it sent a UDP packet from that public port, even though the destination was a different address (the STUN server rather than Charlie). The router forwards Charlie’s packet to Alpha, Alpha simply responds, and we have an open connection. Easy, no?
We mentioned peer-to-peer connections a lot. You must be wondering: are all WebRTC applications peer-to-peer?
Peer connection is a concept given by the WebRTC standard. What we do with it is up to us.
A peer connection can be used to build an actual peer-to-peer application or to connect to an intermediary server. Such servers can be used to connect multiple participants who have limited bandwidth. A purely peer-to-peer application, on the other hand, connects every client to all other peers and sends all the data to each of them.
[Figure: peer-to-peer conference]
A media server can limit the bandwidth requirements for users of a WebRTC application. All clients connect to the media server instead of creating direct connections. They only have to send the data once and the media server will distribute it to all peers.
[Figure: conference using a media server]
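The difference between the two setups is easy to put into numbers. The sketch below counts streams and connections for a conference of a given size. The function and type names are made up for this example, and the media server is assumed to forward each incoming stream unchanged to every other participant; a server that mixes streams into one would lower the download count further.

```typescript
interface ConferenceLoad {
  uploadsPerClient: number;   // streams each client has to send
  downloadsPerClient: number; // streams each client has to receive
  totalConnections: number;   // peer connections in the whole conference
}

// Full mesh: every client is connected to every other client.
function meshLoad(participants: number): ConferenceLoad {
  const others = Math.max(participants - 1, 0);
  return {
    uploadsPerClient: others,
    downloadsPerClient: others,
    totalConnections: (participants * others) / 2,
  };
}

// Forwarding media server: every client has one connection to the server.
function mediaServerLoad(participants: number): ConferenceLoad {
  const others = Math.max(participants - 1, 0);
  return {
    uploadsPerClient: participants > 0 ? 1 : 0,
    downloadsPerClient: others,
    totalConnections: participants,
  };
}

console.log(meshLoad(5), mediaServerLoad(5));
```

For five participants, each mesh client uploads four copies of its stream, while with a media server each client uploads only one.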
Some pointers –
- A peer-to-peer WebRTC application will always be end-to-end encrypted.
- On the other hand, a media server has to decrypt the media it receives over the peer connection and can therefore access your data.
- Media servers also have infrastructure costs.
- In a peer-to-peer connection, the traffic is exchanged between the peers directly, so the infrastructure only needs limited resources.
- A media server has to handle a lot of traffic. Depending on the type of media server, it might also use a lot of CPU.
- Popular video conference apps with a media server: Jitsi Meet and BigBlueButton.
- Examples of a peer-to-peer application:
To sum up
These were some of the main components that work under the hood. If you are planning to build a WebRTC application for your business, it helps to be familiar with these terms. With all our blogs and articles, we aim to explore different aspects of WebRTC and make our clients familiar with this technology.
Thanks for reading. Hope to see you again for the next article!