Earlier this week, Google announced Live Transcribe, a new, free Android app. Live Transcribe aims to make real-world conversations more accessible for deaf and hard of hearing (HoH) people around the world. Powered by Google Cloud, it automatically captions conversations in real time. It supports more than 70 languages, covering more than 80% of the world’s population.
How does Live Transcribe work?
Live Transcribe combines the results of extensive user experience (UX) research with sustainable connectivity to speech processing servers. The team relies on cloud ASR (automatic speech recognition) for greater accuracy, but wanted to make sure that connectivity to these servers doesn’t cause excessive data usage. To reduce the network data that Live Transcribe consumes, they implemented an on-device neural network-based speech detector.
The on-device speech detector builds on Google’s earlier work with AudioSet, its dataset for audio event research, announced in 2017. The detector is an image-like model that detects speech and automatically manages network connections to the cloud ASR engine, which minimizes data usage over long periods of use.
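To make the idea concrete, here is a minimal sketch of how a speech detector could gate a connection to a cloud ASR service. It is not Google's implementation: the `SpeechGate` class, the `hangover_frames` parameter, and the simple energy threshold that stands in for the neural network detector are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def energy_detector(threshold: float = 0.01) -> Callable[[Sequence[float]], bool]:
    """Return a crude stand-in for the on-device detector (assumption:
    mean signal energy above `threshold` counts as speech)."""
    def detect(frame: Sequence[float]) -> bool:
        if not frame:
            return False
        return sum(s * s for s in frame) / len(frame) > threshold
    return detect


class SpeechGate:
    """Keeps a hypothetical cloud ASR stream open only while speech is present."""

    def __init__(self, is_speech: Callable[[Sequence[float]], bool],
                 hangover_frames: int = 3) -> None:
        self.is_speech = is_speech
        self.hangover_frames = hangover_frames  # silent frames tolerated before closing
        self.connected = False
        self.frames_sent = 0
        self.events: List[str] = []
        self._silent_run = 0

    def process(self, frame: Sequence[float]) -> None:
        if self.is_speech(frame):
            self._silent_run = 0
            if not self.connected:
                self.connected = True
                self.events.append("open")   # a real app would open the stream here
        elif self.connected:
            self._silent_run += 1
            if self._silent_run > self.hangover_frames:
                self.connected = False
                self.events.append("close")  # a real app would close the stream here
        if self.connected:
            self.frames_sent += 1            # a real app would upload the frame here


if __name__ == "__main__":
    gate = SpeechGate(energy_detector())
    silence, speech = [0.0] * 160, [0.5] * 160
    for frame in [silence] * 2 + [speech] * 3 + [silence] * 5 + [speech]:
        gate.process(frame)
    print(gate.events, gate.frames_sent)
```

The short hangover keeps the stream open across brief pauses, so a sentence is not cut off mid-thought, while long stretches of silence never reach the network.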
Additionally, the Google team partnered with Gallaudet University on user experience research to make Live Transcribe intuitive and to ensure that core user needs are satisfied while maximizing the app’s potential. Google considered several device types for displaying auditory information and captions, including computers, tablets, smartphones, and small projectors. After rigorous analysis, Google chose smartphones because of their “sheer ubiquity” and enhanced capabilities.
Addressing the transcription confidence issue
Google mentions that one challenge in building Live Transcribe was how to display transcription confidence. The researchers explored whether to show word-level or phrase-level confidence, which has traditionally been considered helpful. Drawing on previous UX research, they found that a transcript is easiest to read when it is not layered with these signals. Live Transcribe therefore focuses on presenting the text well and supplements it with auditory signals other than speech.
Another useful UX signal is the noise level of the current environment. To convey it, researchers built an indicator that visualizes the volume of user speech relative to background noise. This gives users instant feedback on microphone performance, allowing them to adjust the placement of the phone.
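One way such an indicator could be computed is sketched below. This is an illustration under stated assumptions, not the app's actual method: the `LoudnessIndicator` class, the `rms_dbfs` helper, the smoothing factor, and the idea of estimating the noise floor from frames marked as non-speech are all hypothetical choices.

```python
import math
from typing import Optional, Sequence


def rms_dbfs(frame: Sequence[float], floor_db: float = -100.0) -> float:
    """Level of a frame in dB relative to full scale (samples in [-1, 1])."""
    if not frame:
        return floor_db
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(mean_square))


class LoudnessIndicator:
    """Reports how far the current frame sits above the estimated noise floor."""

    def __init__(self, smoothing: float = 0.9) -> None:
        self.smoothing = smoothing            # closer to 1.0 = slower noise tracking
        self.noise_db: Optional[float] = None

    def update(self, frame: Sequence[float], is_speech: bool) -> float:
        level = rms_dbfs(frame)
        if not is_speech:
            # Track background noise with an exponential moving average.
            if self.noise_db is None:
                self.noise_db = level
            else:
                self.noise_db = (self.smoothing * self.noise_db
                                 + (1.0 - self.smoothing) * level)
        if self.noise_db is None:
            return 0.0                        # no noise estimate yet
        return max(0.0, level - self.noise_db)


if __name__ == "__main__":
    indicator = LoudnessIndicator()
    indicator.update([0.01] * 160, is_speech=False)   # quiet room
    print(indicator.update([0.1] * 160, is_speech=True))
```

A UI could map the returned value, in decibels above the noise floor, to the size or color of an on-screen indicator, so a small value tells the user to move the phone closer to the speaker.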
To enhance the capabilities of this mobile-based automatic speech transcription service, researchers plan to add on-device recognition, speaker separation, and speech enhancement.
“Our research with Gallaudet University shows that combining it with other auditory signals like speech detection and a loudness indicator makes a tangibly meaningful change in communication options for our users,” state the researchers. Google has rolled out a test version of Live Transcribe on the Play Store, and the app comes pre-installed on all Pixel 3 devices with the latest update.
Public reaction to the news has been largely positive, with people appreciating the newly released app:
I'm hard of hearing and I've just tested this…wow.
— Matt Williams (@MattWilliams84) February 4, 2019
Great work. Look forward to seeing u guys implement computer vision in android. So that The visually challenged community is able to do some image processing in apps like Whatsapp etc.
— Abhisar Waghmare (@iamAbhisarW) February 5, 2019
Thank you for continuing to improve your #accessibility
— 𝓢𝓮𝓪𝓷 𝓜. 𝓐𝓻𝓷𝓸𝓵𝓭 (@seanmarnold) February 4, 2019
For more information, check out the official Live Transcribe blog.