Scaling Your WebRTC Application in 2021

Anyone who builds web-based software products eventually asks, ‘How do I scale it?’ It is better to address this question in the early stages of product development, and the importance of having a scalability plan in place cannot be emphasized enough. Unfortunately, scalability is often considered only after go-live, once the application demands it.

Scalability is critical for web-based real-time communication, especially if video is an integral component of your solution. Scaling WebRTC-based applications isn’t that hard. You only need the right pieces in the right places.

How to Scale WebRTC Applications

To get started, evaluate your application and ask yourself these questions:

  • How many users do I need to support in a session?
  • What is the average number of users expected in a session?
  • How many media files are expected to be exchanged between users within a session?
  • Are users connecting from residential or corporate networks?
  • Are there users connecting from high-security networks like hospitals?
  • What needs to be recorded, and with what guarantees?

These are basic questions that require some forethought and research. Clear answers will help you build a technical architecture that matches your needs.

Once you have the answers, you can choose between:

  • Mesh
  • Forwarding
  • Mixing
  • Hybrid

Mesh Architecture

Mesh depends entirely on the clients’ capabilities. Each client connects directly to every other client, managing n-1 bidirectional connections, n being the total number of clients. In a session with 4 clients, each one has to manage 3 separate connections. Each connection is encrypted uniquely, so each user needs to encrypt and upload their media 3 separate times.
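To see how quickly a mesh grows, the counting can be written out as a small Python sketch. The function name and the returned keys are made up for illustration; the arithmetic is simply n-1 connections per client, with every connection shared by two clients.

```python
def mesh_connections(n: int) -> dict:
    """Connection counts for a full-mesh session of n clients."""
    if n < 1:
        raise ValueError("a session needs at least one client")
    per_client = n - 1            # one bidirectional connection to every other client
    total = n * (n - 1) // 2      # each connection is shared by two clients
    return {
        "per_client": per_client,
        "uploads_per_client": per_client,  # media is encrypted and sent once per peer
        "total": total,
    }


for n in (2, 4, 8):
    print(n, mesh_connections(n))
```

For 4 clients this gives 3 connections and 3 uploads per client, and 6 connections in the session overall. At 8 clients the session already holds 28 connections.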

Most Internet connections are asymmetric (download bandwidth > upload bandwidth), so upload bandwidth becomes the limiting factor when sending data-heavy video. Encrypting each media stream separately for every connection also adds processing load, especially on mobile devices.
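A quick back-of-the-envelope check makes the upload limit concrete. The sketch below is only an estimate: the 1000 kbps video bitrate and 5000 kbps uplink are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
def mesh_uplink_kbps(clients: int, stream_kbps: int) -> int:
    """Upload bandwidth one client needs to send its stream to every peer."""
    return max(clients - 1, 0) * stream_kbps


def max_mesh_clients(uplink_kbps: int, stream_kbps: int) -> int:
    """Largest session size whose per-client upload still fits in the uplink."""
    return uplink_kbps // stream_kbps + 1


STREAM_KBPS = 1000   # assumed bitrate of one video stream
UPLINK_KBPS = 5000   # assumed upload capacity of one client

print(mesh_uplink_kbps(4, STREAM_KBPS))            # 3000
print(max_mesh_clients(UPLINK_KBPS, STREAM_KBPS))  # 6
```

Real sessions would leave headroom for audio, retransmissions, and other traffic, so the practical limit tends to be lower than this estimate.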

That said, mesh architecture needs no server infrastructure except for the requisite signaling and TURN server(s). Mesh is an excellent, low-cost option for applications hosting sessions with 2-3 clients.

Forwarding Architecture

Forwarding relies on an SFU (selective forwarding unit), an intelligent media relay in the middle of a session.

Each client connects to the SFU once to send media, and then once more for every other client to receive media. Thus, each client manages n unidirectional connections, n being the total number of clients. In a session with 5 clients, each one has to manage 5 separate connections: one for sending media and four for receiving media.
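The same counting exercise for forwarding can be sketched as follows. The function and key names are hypothetical; the server-side totals simply add up the per-client numbers described above.

```python
def sfu_connections(n: int) -> dict:
    """Unidirectional connection counts for an SFU session of n clients."""
    if n < 1:
        raise ValueError("a session needs at least one client")
    return {
        "client_send": 1,                    # one upload to the SFU
        "client_receive": n - 1,             # one download per other client
        "client_total": n,
        "server_inbound": n,                 # one incoming stream per client
        "server_outbound": n * (n - 1),      # every stream forwarded to every other client
    }


print(sfu_connections(5))
```

With 5 clients each client holds 5 connections but uploads only once, while the SFU receives 5 streams and forwards 20.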

Forwarding is far more scalable for your users. It works with the asymmetric nature of most Internet connections by requiring each client to upload only once.

A big advantage forwarding has over mesh is that the SFU can employ various scaling techniques:

  • It acts as a proxy between the sender and receiver.
  • It monitors the bandwidth capabilities of each leg and selectively adjusts the frame rate and resolution of the packet stream forwarded on that leg.
  • It requires additional server infrastructure but is highly efficient. An SFU forwards packets without attempting to depacketize the stream.
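One common way to get per-leg adaptation is for the sender to publish several quality layers and for the SFU to forward, to each receiver, the best layer that fits that receiver's estimated bandwidth. The sketch below assumes that approach; the layer names, resolutions, frame rates, and bitrates are invented for illustration and do not come from any particular SFU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    width: int
    height: int
    fps: int
    kbps: int


# Hypothetical layers a sender might publish (values are illustrative).
LAYERS = [
    Layer("low", 320, 180, 15, 150),
    Layer("mid", 640, 360, 30, 500),
    Layer("high", 1280, 720, 30, 1500),
]


def pick_layer(estimated_kbps: int, layers=LAYERS) -> Layer:
    """Return the highest-bitrate layer that fits the receiver's estimate."""
    affordable = [layer for layer in layers if layer.kbps <= estimated_kbps]
    if affordable:
        return max(affordable, key=lambda layer: layer.kbps)
    # Nothing fits: fall back to the cheapest layer rather than sending nothing.
    return min(layers, key=lambda layer: layer.kbps)


for estimate in (100, 600, 5000):
    print(estimate, pick_layer(estimate).name)
```

Because the choice is made per receiver, one participant on a weak connection gets the low layer without lowering quality for anyone else.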

Mixing Architecture

Mixing relies on a multipoint control unit (MCU), which acts as a high-powered media mixer in the middle of a session. Every client connects to the MCU once and manages a single bidirectional connection, regardless of the number of clients in the session. That connection is used both to send media to the server and to receive media from it.

As with forwarding, each client encrypts and uploads its media only once. This is the most efficient approach from the client’s perspective and the least efficient from the server’s perspective: the burden of depacketizing, decoding, mixing, encoding, and packetizing is borne entirely by the server.
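A rough way to picture the server's share of the work is to count decode and encode operations per session. This is a simplified sketch under stated assumptions: the MCU decodes every incoming stream, and it either encodes one shared mix or, when `per_client_mix` is set, one mix per client (for example, so that participants do not receive their own media back). The function name and flag are hypothetical.

```python
def mcu_workload(n: int, per_client_mix: bool = False) -> dict:
    """Approximate per-session media work for an MCU with n clients."""
    if n < 1:
        raise ValueError("a session needs at least one client")
    return {
        "client_connections": 1,                       # single bidirectional connection
        "server_decodes": n,                           # every incoming stream is decoded
        "server_encodes": n if per_client_mix else 1,  # mixes produced for sending
    }


print(mcu_workload(10))
print(mcu_workload(10, per_client_mix=True))
```

The client-side cost stays flat at one connection while the server-side cost grows with every participant, which is where the server bill comes from.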

Mixing also takes advantage of scaling techniques:

  • The server can adapt quickly to changing conditions on the network routes to individual clients by applying temporal and spatial scaling to the output of the audio and/or video mixer(s).
  • Quality is kept as high as possible for each client without affecting other clients in the session.

This is a great approach for applications with large numbers of active participants, like virtual classrooms, but a potentially expensive one in terms of server cost.

Hybrid Architecture

As the name implies, hybrid architecture is a combination of mesh, forwarding, and/or mixing. In a hybrid setup, participants join a session using whichever approach makes the most sense for the session and the user’s endpoint.

To conclude:

  • For simple two-party calls, a mesh setup is best and requires minimal server resources.
  • For small group sessions, broadcasts, etc., forwarding meets the needs better.
  • For larger group sessions or telephony integrations, mixing is a practical option.
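These rules of thumb can be captured in a small decision helper. Treat it as a starting point only: the `mesh_max` and `small_group_max` thresholds are assumptions chosen for illustration, and the function and parameter names are hypothetical.

```python
def choose_architecture(
    participants: int,
    telephony: bool = False,
    broadcast: bool = False,
    mesh_max: int = 3,           # assumed upper bound for a comfortable mesh
    small_group_max: int = 10,   # assumed upper bound for a "small group"
) -> str:
    """Suggest mesh, forwarding, or mixing for a session."""
    if telephony:
        return "mixing"          # telephony integrations favor a mixed stream
    if broadcast:
        return "forwarding"      # broadcasts need a server-side relay
    if participants <= mesh_max:
        return "mesh"
    if participants <= small_group_max:
        return "forwarding"
    return "mixing"


print(choose_architecture(2))                    # mesh
print(choose_architecture(6))                    # forwarding
print(choose_architecture(50))                   # mixing
print(choose_architecture(4, telephony=True))    # mixing
```

A hybrid deployment could call a helper like this per session, or even per participant, instead of fixing one architecture for the whole application.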

Hardware Required to Scale Your WebRTC Application

No matter the architecture, you will always need:

  • A signaling server (for registration and presence)
  • A TURN server (for network traversal)
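The two pieces usually meet when the signaling server hands each client its ICE server list. The sketch below builds such a list in the `iceServers` shape that WebRTC clients accept. The host name and credentials are placeholders, and it assumes the TURN server also answers STUN requests and listens on the standard ports 3478 and 5349.

```python
import json


def ice_config(turn_host: str, username: str, credential: str) -> dict:
    """Build an ICE configuration a signaling server could send to clients."""
    return {
        "iceServers": [
            # Assumes the TURN server also answers STUN on the same port.
            {"urls": [f"stun:{turn_host}:3478"]},
            {
                "urls": [
                    f"turn:{turn_host}:3478?transport=udp",
                    # TCP and TLS fallbacks for networks that block UDP.
                    f"turn:{turn_host}:3478?transport=tcp",
                    f"turns:{turn_host}:5349?transport=tcp",
                ],
                "username": username,
                "credential": credential,
            },
        ]
    }


# Placeholder host and credentials, for illustration only.
config = ice_config("turn.example.com", "demo-user", "demo-secret")
print(json.dumps(config, indent=2))
```

The TCP and TLS entries matter most for users on corporate or high-security networks, where direct UDP traffic is often blocked.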

