
Google announced a new, free Android app called Live Transcribe earlier this week. Live Transcribe aims to make real-world conversations more accessible for deaf and hard of hearing (HoH) people around the world. Powered by Google Cloud, it automatically captions conversations in real time and supports more than 70 languages, spoken by more than 80% of the world’s population.

How does Live Transcribe work?

Live Transcribe combines the results of extensive user experience (UX) research with sustained connectivity to speech processing servers. The app relies on cloud ASR (Automated Speech Recognition) for greater accuracy, but connecting to these servers can cause excessive data usage. To reduce the network data consumption Live Transcribe requires, the team implemented an on-device neural network-based speech detector.
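
To make the idea concrete, here is a minimal sketch of that gating pattern. It is not Google’s implementation: the `cloud_asr_send` and `cloud_asr_close` callables, the threshold, and the energy-based detector are all hypothetical placeholders that stand in for the real on-device model and streaming client.

```python
# Illustrative sketch only -- not Google's implementation.
# Idea: stream audio to a (hypothetical) cloud ASR engine only while an
# on-device detector says speech is present, to limit network data usage.
import numpy as np

SPEECH_THRESHOLD = 0.5  # detector score above which a frame counts as speech (assumed)

def on_device_speech_probability(frame: np.ndarray) -> float:
    """Stand-in for a small on-device neural network that scores a frame of
    audio for speech presence; here it is just a trivial energy heuristic."""
    rms = float(np.sqrt(np.mean(frame ** 2)))
    return min(1.0, rms / 0.05)

def transcribe_stream(frames, cloud_asr_send, cloud_asr_close):
    """Forward only speech-bearing frames to the cloud ASR engine.

    `cloud_asr_send` / `cloud_asr_close` are hypothetical callables wrapping a
    streaming ASR connection; they stand in for whatever client is actually used.
    """
    connection_open = False
    for frame in frames:
        if on_device_speech_probability(frame) >= SPEECH_THRESHOLD:
            cloud_asr_send(frame)   # stream audio only while speech is detected
            connection_open = True
        elif connection_open:
            cloud_asr_close()       # drop the connection during silence to save data
            connection_open = False
    if connection_open:
        cloud_asr_close()
```

The point of the sketch is only the control flow: the network connection stays closed during silence, so long stretches without speech cost no data.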

 

The on-device neural network-based speech detector builds on Google’s previous work with AudioSet, a dataset for audio event research announced last year. The detector uses an image-like model that detects speech, automatically manages the network connection to the cloud ASR engine, and minimizes data usage over long periods of use.
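
As a rough illustration of what an “image-like” model means here, the sketch below converts audio into a log-mel spectrogram and classifies it with a tiny convolutional network. The architecture, layer sizes, and all parameters are invented for the example and are not the model used in Live Transcribe.

```python
# Toy sketch of an "image-like" speech detector: treat a log-mel spectrogram as a
# 2-D image and classify it with a small CNN. All sizes and choices are assumed.
import numpy as np
import librosa
import torch
import torch.nn as nn

def log_mel_image(waveform: np.ndarray, sample_rate: int = 16000) -> torch.Tensor:
    """Convert raw audio into a log-mel spectrogram 'image' shaped [1, 1, mels, frames]."""
    mel = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_mels=64)
    log_mel = librosa.power_to_db(mel)
    return torch.tensor(log_mel, dtype=torch.float32).unsqueeze(0).unsqueeze(0)

class TinySpeechDetector(nn.Module):
    """Toy stand-in for an image-style audio classifier (speech vs. non-speech)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(8 * 4 * 4, 2)  # logits: [non-speech, speech]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# Usage: score one second of (here, random) audio for speech presence.
audio = np.random.randn(16000).astype(np.float32)
logits = TinySpeechDetector()(log_mel_image(audio))
speech_probability = torch.softmax(logits, dim=-1)[0, 1].item()
```

Treating the spectrogram as an image lets the detector reuse compact, well-understood vision-style architectures that run comfortably on a phone.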

Additionally, the Google team partnered with Gallaudet University on user experience research to make Live Transcribe intuitive, ensuring that core user needs are satisfied while maximizing the app’s potential. Google considered different devices for effectively displaying auditory information and captions, ranging from computers and tablets to smartphones and small projectors. After rigorous analysis, Google chose smartphones because of their “sheer ubiquity” and enhanced capabilities.

Addressing the transcription confidence issue

Google mentions that while building Live Transcribe, the team faced a challenge in deciding how to display transcription confidence. The researchers explored whether to show word-level or phrase-level confidence, which has traditionally been considered helpful. Drawing on previous UX research, they found that a transcript is easiest to read when it is not layered with confidence indicators, and that it is better to focus on clear presentation of the text and to supplement it with other auditory signals beyond speech.

Another useful UX signal is the noise level of the current environment. To surface this, the researchers built an indicator that visualizes the volume of the user’s speech relative to background noise, giving users instant feedback on microphone performance and allowing them to adjust the placement of the phone.
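
The post does not describe how the indicator is computed; a minimal sketch, assuming a simple RMS-based level meter with a slowly tracked noise floor, might look like this:

```python
# Minimal sketch of a loudness indicator (assumed approach, not the app's code):
# compare the level of the current audio frame against a running noise-floor estimate.
import numpy as np

def rms_db(frame: np.ndarray) -> float:
    """Root-mean-square level of an audio frame in dB (relative to full scale)."""
    rms = np.sqrt(np.mean(frame ** 2)) + 1e-12  # epsilon avoids log(0) on silence
    return 20.0 * np.log10(rms)

def speech_over_noise(frame: np.ndarray, noise_floor_db: float, alpha: float = 0.05):
    """Return (level above the noise floor in dB, updated noise floor).

    The noise floor is tracked as a slow exponential moving average of quiet
    frames; `alpha` and the 3 dB margin are assumed values for this sketch.
    """
    level = rms_db(frame)
    if level < noise_floor_db + 3.0:  # quiet frame: refine the noise estimate
        noise_floor_db = (1 - alpha) * noise_floor_db + alpha * level
    return level - noise_floor_db, noise_floor_db
```

A reading near 0 dB would mean the speaker is barely louder than the background, a hint that the phone should be moved closer to them.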

What next?

To enhance the capabilities of this mobile-based automatic speech transcription service, the researchers plan to add on-device recognition, speaker separation, and speech enhancement.

“Our research with Gallaudet University shows that combining it with other auditory signals like speech detection and a loudness indicator makes a tangibly meaningful change in communication options for our users,” the researchers state. Google has rolled out a test version of Live Transcribe on the Play Store, and it comes pre-installed on all Pixel 3 devices with the latest update.

Public reaction to the news has been largely positive, with people appreciating the newly released app.

For more information, check out the official Live Transcribe blog.

