News

Mozilla introduces LPCNet: A DSP and deep learning-powered speech synthesizer for lower-power devices

2 min read

Yesterday, Mozilla’s Emerging Technologies group introduced a new project called LPCNet, which is a WaveRNN variant. LPCNet aims to improve the efficiency of speech synthesis by combining deep learning and digital signal processing (DSP) techniques. It can be used for Text-to-Speech (TTS), speech compression, time stretching, noise suppression, codec post-filtering, and packet loss concealment.

Why is LPCNet introduced?

Many recent neural speech synthesis algorithms have made it possible to synthesize high-quality speech and code high-quality speech at very low bitrate. These algorithms, which are often based on algorithms like WaveNet, give promising results in real-time with a high-end GPU. But LPCNet aims to perform speech synthesis on end-user devices like mobile phones, which generally do not have powerful GPUs and have a very limited battery capacity.

We do have some low complexity parametric synthesis models such as low bitrate vocoders, but their quality is a concern. Generally, they are efficient at modeling the spectral envelope of the speech using linear prediction, but no such simple model exists for the excitation. LPCNet aims to show that the efficiency of speaker-independent speech synthesis can be improved by combining newer neural synthesis techniques with linear prediction.

What mechanisms does LPCNet use?

In addition to linear prediction, it includes the following tricks:

  • Pre-emphasis/de-emphasis filters: These filters allow shaping the noise caused by the μ-law quantization. LPCNet is capable of shaping the μ-law quantization noise to be mostly inaudible.
  • Sparse matrices: LPCNet uses sparse matrices in the main RNN similar to WaveRNN. These block-sparse matrices consist of blocks with size 16×1 to make it easier to vectorize the products. Instead of forcing many non-zero blocks along the diagonal, as a minor improvement, all the weights on the diagonal of the matrices are kept.
  • Input embedding: Instead of feeding the inputs directly to the network, the developers have used an embedding matrix. Embedding is generally used in natural language processing, but using it for μ-law values makes it possible to learn non-linear functions of the input.

You can read more in detail about LPCNet on Mozilla’s official website.

Read Next

Mozilla v. FCC: Mozilla challenges FCC’s elimination of net neutrality protection rules

Mozilla shares why Firefox 63 supports Web Components

Mozilla shares how AV1, the new open source royalty-free video codec, works

Bhagyashree R

Share
Published by
Bhagyashree R
Tags: AI News

Recent Posts

Top life hacks for prepping for your IT certification exam

I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…

3 years ago

Learn Transformers for Natural Language Processing with Denis Rothman

Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…

3 years ago

Learning Essential Linux Commands for Navigating the Shell Effectively

Once we learn how to deploy an Ubuntu server, how to manage users, and how…

3 years ago

Clean Coding in Python with Mariano Anaya

Key-takeaways:   Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…

3 years ago

Exploring Forms in Angular – types, benefits and differences   

While developing a web application, or setting dynamic pages and meta tags we need to deal with…

3 years ago

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Software architecture is one of the most discussed topics in the software industry today, and…

3 years ago