New AI-based audio codecs in WebRTC – Lyra, Satin

Google Lyra and Microsoft Satin are two new audio codecs that bank on AI-based voice coding. Both are competing for inclusion in WebRTC. Microsoft announced its AI-powered voice codec Satin in February. Google reciprocated a week later with Lyra, its low-bitrate codec for speech compression.

Now come the questions. How are these codecs relevant, and to whom? What is their future? What do these codecs mean for WebRTC, for developers, and for you?

Let’s discuss audio codecs in WebRTC

WebRTC mandates support for specific codecs. Alongside the video codecs, there are mandatory audio/voice codecs: G.711 and Opus.

G.711 is a legacy codec dealing with narrowband audio, which results in low-quality audio. G.711 is mostly reserved for connecting to telephony networks. It isn't recommended as a solution otherwise.

Opus is the main voice codec, offering a flexible solution. It can handle anything from narrowband speech to full-band stereo, even at low bitrates. Opus is 10 years old, and a lot has been said and written about it. Opus is a merger of two different codecs: SILK (for speech) and CELT (for music).
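In a WebRTC session, the audio codecs on offer appear as `a=rtpmap` lines in the SDP exchanged during negotiation. A minimal sketch of reading them out; the sample SDP lines below are illustrative, not captured from a real browser:

```javascript
// Illustrative SDP audio section: Opus plus the two G.711 variants.
const sampleSdp = [
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 0 8",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:0 PCMU/8000", // G.711 mu-law
  "a=rtpmap:8 PCMA/8000", // G.711 A-law
].join("\r\n");

// Collect every rtpmap line as { payloadType, name, clockRate }.
function listAudioCodecs(sdp) {
  const codecs = [];
  for (const line of sdp.split("\r\n")) {
    const m = line.match(/^a=rtpmap:(\d+) ([^/]+)\/(\d+)/);
    if (m) {
      codecs.push({
        payloadType: Number(m[1]),
        name: m[2],
        clockRate: Number(m[3]),
      });
    }
  }
  return codecs;
}

console.log(listAudioCodecs(sampleSdp));
// Three codecs: Opus at 48 kHz plus the two G.711 variants at 8 kHz.
```

In a browser you would run the same parsing over `RTCPeerConnection`'s `offer.sdp` rather than a hand-written string.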

How Opus works

  • Opus covers a bitrate range of 6-510 kbps.
  • In WebRTC, typical bitrates are 6-40 kbps.
  • Opus runs the gamut from narrowband up to full-band stereo.
  • Algorithmic delay of 26.5 ms, i.e. very little inherent delay of its own.
  • For speech, Opus uses a modified SILK.
  • For music, it uses CELT.
  • Opus can use both SILK and CELT if needed.
  • It also has a machine learning model for classifying the audio input as either speech or music.
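The 6-40 kbps range typically used in WebRTC can be enforced from the application side. RFC 7587 defines Opus's SDP `fmtp` parameters, including `maxaveragebitrate` (in bits per second), which an application can set by munging the SDP before calling `setLocalDescription()`. A minimal string-level sketch, with an illustrative SDP fragment:

```javascript
// Sketch: cap Opus's average bitrate via the RFC 7587 "maxaveragebitrate"
// fmtp parameter. 40000 bps matches the top of the 6-40 kbps range.
function capOpusBitrate(sdp, maxAverageBitrate) {
  const rtpmap = sdp.match(/a=rtpmap:(\d+) opus\/48000\/2/);
  if (!rtpmap) return sdp; // no Opus in this SDP; leave it untouched
  const pt = rtpmap[1];
  const fmtpRe = new RegExp(`a=fmtp:${pt} .*`);
  if (fmtpRe.test(sdp)) {
    // Append the parameter to the existing fmtp line for Opus.
    return sdp.replace(fmtpRe, (line) => `${line};maxaveragebitrate=${maxAverageBitrate}`);
  }
  // No fmtp line yet: insert one right after the Opus rtpmap line.
  return sdp.replace(
    rtpmap[0],
    `${rtpmap[0]}\r\na=fmtp:${pt} maxaveragebitrate=${maxAverageBitrate}`
  );
}

// Illustrative offer fragment (not captured from a real browser).
const offer = [
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "a=fmtp:111 minptime=10;useinbandfec=1",
].join("\r\n");

console.log(capOpusBitrate(offer, 40000));
// The fmtp line now ends with ";maxaveragebitrate=40000".
```

In a real call you would apply this to `offer.sdp` between `createOffer()` and `setLocalDescription()`.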

In 2021, virtual meetings are the new normal. Businesses rely heavily on real-time communication. WebRTC usage is at an all-time high, and here Opus has started to show its age and its limitations. Opus is a great codec, but it has served its purpose and it is time for better things.

The impact of AI on audio codecs

Audio codecs are entering a new generation: a migration from the old heuristic ways to a new world of machine learning and artificial intelligence.

ML/AI is where the future lies for algorithms, particularly those built on rule engines or heuristics. Both are found in abundance in real-time media processing, e.g. WebRTC.

AI and ML trends are seeping into WebRTC and real-time communication. The path is becoming somewhat clearer. We have discussed AI-assisted noise suppression and background replacement solutions in this blog.

AI in media compression is picking up, with research around AI compression flourishing. There is even an AI-specific standards organization, MPAI (Moving Picture, Audio, and Data Coding by Artificial Intelligence). It is still small, but significant for the future. There's also Mozilla's Common Voice, an open-source, high-quality, labeled multi-language dataset for training language-related models.

Where are Microsoft Satin and Google Lyra going?

Microsoft Satin

Microsoft Satin is an AI-powered audio codec. Here's what it can do:

  • Super wideband speech at a bitrate of 6 kbps.
  • Full-band stereo music starting at a bitrate of 17 kbps.
  • Higher quality at higher bitrates.
  • Quality audio even under high packet loss.
  • Better redundancy algorithms protecting against burst loss.

Satin was presented as a battle-tested codec. Microsoft is already using it in Microsoft Teams and Skype in 2-way calls with plans to extend it to group calls. Satin is a brand new codec aiming to replace Opus altogether.

Google Lyra

  • Very low-bitrate speech codec.
  • Processing latency of 90 ms.
  • Made to operate at 3 kbps.
  • Optimized for the 64-bit ARM Android platform.

Lyra is meant for SPEECH, not for AUDIO in general. It is no replacement for Opus. Google believes that, paired with AV1, Lyra can offer a decent video conferencing experience even at modem bitrates of 56 kbps. Lyra has been rolled out in Google Duo for low-bandwidth connections. And that's about it for the time being.
Lyra has been open-sourced by Google. One of the reasons: many of Google's earlier AI advancements around real-time communications weren't open sourced.

Lyra and Satin will be fighting it out for inclusion in WebRTC. Lyra is superior to Satin at very low bitrates around 3 kbps, since Satin was designed for 6 kbps and above. At higher bitrates, however, Lyra is no match for Satin.

What is the audio future of WebRTC?

Lyra and Satin are not yet included in WebRTC. You can use them only where Google and Microsoft permit. Announcing these efforts was important for standardization and for WebRTC: WebRTC needs backing on the audio front, while for video, things are pretty much sorted.

There are a few possibilities for how things can pan out. First, Satin becomes a third optional codec in WebRTC; for that, Microsoft will need Google's approval and help. If not, Satin will be kept proprietary and niche. Second, Lyra makes no sense as a standalone codec, and it will be challenging to add it to WebRTC as-is. If Google pushes Lyra, there are a couple of routes:

  1. Make Lyra handle wideband and full-band audio better, competing head-on with Opus, and let things pan out.
  2. Incorporate it into Opus alongside SILK and CELT. Let Lyra join the party and leave it to Opus to decide which one to use.

We foresee more optional AI-powered audio codecs finding their way into WebRTC. The WebRTC space will see innovation and experiments. Whichever way things turn, we will keep you updated. Till then, keep reading the blogs here. For any WebRTC-related help and development projects, contact us now.