
ML-based Noise Cancelation in WebRTC

Background noises are everywhere, and they are annoying. Quite often, important business or personal calls are disrupted by the background noise in your surroundings. This issue needs to be addressed quickly, since remote and hybrid work models are here to stay.

Technologies such as WebRTC and cloud computing have helped enterprises adopt this remote work model. Quality, speed, and ease of communication are more important today than ever before. As a result, communication vendors are investing in ML/AI for media processing.

Relying solely on heuristics in media processing algorithms has lost its appeal. The future lies in combining machine learning engines with heuristics. Advances in technology have largely made this possible, and the global pandemic only added urgency to such innovations.

ML in media processing is challenging

Machine learning is challenging in itself, and so is media processing. These are two separate disciplines that need to come together to address this huge problem. Here is what you need to work through:

  • Finding machine learning engineers.
  • Making sure those engineers also have expertise in media processing.
  • Generating or obtaining a suitable data set.
  • Having enough data.
  • Deciding where to focus your efforts: audio, video, or the network.
  • Choosing between a server-side and a client-side implementation.
  • Optimizing the model.

Incorporating machine learning into media processing takes a lot of planning, management, and research, far more than the other features you have lined up for your app.

***

Machine learning is finding a home in communications in two main areas: video background processing (a topic for a future article) and noise suppression. Both areas have long been worked on, but they took center stage during the pandemic.

People started working from offbeat places – coffee shops, parks, their living rooms, or ‘Workations’. 

Machine learning-based noise suppression aims to ensure that no one else on the call hears the lawnmower buzzing in your neighborhood.

This need led to a few quick wins in the WebRTC domain. Here are three stories:

Google Meet

Google Meet has built its own noise suppression technology.

Here is Serge Lachapelle, G Suite Director of Product Management, giving a quick interview for VentureBeat on Google Meet’s noise suppression.


Google has implemented this technology in the cloud, running it on “secure” Tensor Processing Units (TPUs), Google's specialized machine learning chips in Google Cloud. The feature is optional. It suppresses arbitrary noises, and Google will fine-tune it over time.

While Google contributes to the open-source WebRTC community, it differentiates Meet by keeping its machine learning implementation outside of the open-source WebRTC library.

Discord!

By partnering with Krisp, Discord “bought” its way to noise suppression. Krisp is one of the very few vendors offering machine learning-based media processing as a product/service, and it has been doing so successfully for a couple of years now.

The noise suppression feature first arrived as a beta in Discord’s desktop application. Later, Discord brought Krisp-powered noise suppression to iOS and Android.


Cisco!

Cisco acquired BabbleLabs to own its noise suppression technology, opting for the traditional approach of reducing risk through acquisition. Like Krisp, BabbleLabs offers machine learning-based algorithms for voice processing. Cisco has integrated this technology into Webex.

What is ahead?

More vendors will take note of this advancement and integrate noise suppression into their apps, either by developing it in-house or by licensing it from a third party.

Everyone needs noise suppression now, and this is just the beginning. Machine learning is finding its place in communications in several ways, with three main areas seeing growing investment: voice treatment (noise suppression and cancelation), video treatment (compression, super-resolution, etc.), and background blur/replacement.

Are you planning for ML or AI in WebRTC?

Machine learning and artificial intelligence are the future, both in the communications space and elsewhere, and they are coming to media processing too. In time, they will become a common requirement.

Are you planning for ML/AI? Unsure whether to rely on a third party or build the technology in-house? If you need help answering these questions, RTCWeb.in is here for you. Contact us now for all your WebRTC needs.