
Facebook introduces a fully convolutional speech recognition approach and open sources wav2letter++ and flashlight


Last week, the Facebook AI Research (FAIR) speech team introduced the first fully convolutional speech recognition approach. The team also open-sourced flashlight, a C++ library for machine learning, and wav2letter++, a fast and simple system for developing end-to-end speech recognizers.

Fully convolutional speech recognition approach

Current state-of-the-art speech recognition systems rely on recurrent neural networks (RNNs) for acoustic or language modeling. Facebook’s newly introduced system provides an alternative approach based solely on convolutional neural networks (CNNs). It eliminates the feature extraction step altogether, because it is trained end-to-end to predict characters from the raw waveform. An external convolutional language model is then used to decode words.

The architecture of this CNN-based speech recognition system consists of the following components:

[Diagram: architecture of the CNN-based speech recognition system. Source: Facebook]

  • Learnable frontend: This part of the system starts with a convolution of width 2 that emulates the pre-emphasis step, followed by a complex convolution of width 25 ms. After the squared absolute value is computed, a low-pass filter and stride perform the decimation. Finally, the frontend applies log-compression and per-channel mean-variance normalization.
  • Acoustic model: A CNN with gated linear units (GLUs), fed with the output of the learnable frontend. It is trained to predict letters directly with the Auto Segmentation Criterion (ASG).
  • Language model: The convolutional language model (LM) contains 14 convolutional residual blocks and uses GLUs as the activation function. In the beam-search decoder, it scores candidate transcriptions alongside the acoustic model.
  • Beam-search decoder: Generates word sequences from the output of the acoustic model.
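
A few of the building blocks named in the list above can be sketched in plain Python. This is only an illustrative sketch, not Facebook's implementation: the function names, the fixed pre-emphasis coefficient of 0.97, and the small epsilon constants are assumptions made for the example, and in the described system the frontend weights are learned rather than fixed. The complex convolution and the low-pass decimation are left out.

```python
import math


def pre_emphasis(signal, coeff=0.97):
    """Width-2 convolution y[t] = x[t] - coeff * x[t-1].

    coeff=0.97 is an assumed, commonly used value; a learnable
    frontend would train this weight instead of fixing it.
    """
    return [signal[0]] + [
        signal[t] - coeff * signal[t - 1] for t in range(1, len(signal))
    ]


def log_compress(energies, eps=1e-6):
    """Log-compression of non-negative energies; eps (assumed) avoids log(0)."""
    return [math.log(e + eps) for e in energies]


def mean_var_normalize(channel, eps=1e-8):
    """Normalize one channel over time to zero mean and unit variance."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


def glu(values):
    """Gated linear unit: split the input in half and multiply the
    first half by the sigmoid of the second half."""
    half = len(values) // 2
    linear, gate = values[:half], values[half:2 * half]
    return [x / (1.0 + math.exp(-g)) for x, g in zip(linear, gate)]


if __name__ == "__main__":
    # A gate input of 0 gives sigmoid(0) = 0.5, so 2.0 is halved.
    print(glu([2.0, 0.0]))  # prints [1.0]
```

The GLU halves the number of channels: one half carries the signal and the other half decides, per element, how much of it passes through.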

Alongside this CNN-based approach, Facebook released the wav2letter++ and flashlight frameworks to complement it and enable reproducibility.

flashlight is a standalone C++ library for machine learning. It uses the ArrayFire tensor library and features just-in-time compilation with modern C++. It targets both CPU and GPU backends to provide maximum efficiency and scale.

The wav2letter++ toolkit is built on top of flashlight and written entirely in C++. It also uses ArrayFire as its primary library for tensor operations. ArrayFire is a highly optimized tensor library that can execute on multiple backends, including a CUDA GPU backend and a CPU backend. wav2letter++ supports multiple audio file formats, such as wav and flac, and several feature types, including raw audio, a linearly scaled power spectrum, log-Mels (MFSC), and MFCCs.

For more details, check out Facebook’s official announcement.


Bhagyashree R
