Microsoft’s new neural text-to-speech service lets machines speak like people

September 28, 2018 - 8:02 am

Microsoft has released a production system that performs text-to-speech (TTS) synthesis using deep neural networks. The synthesized voice is hard to distinguish from recordings of human speech.

Neural text-to-speech synthesis significantly reduces the ‘listening fatigue’ that users experience when interacting with AI systems. It gives a system a human-like, natural-sounding voice that makes interaction with chatbots and virtual assistants more engaging. The Microsoft team demonstrated the neural-network-powered text-to-speech system at the Microsoft Ignite conference in Orlando, Florida, this week.

Neural text-to-speech can also convert digital text such as e-books into audiobooks and improve in-car navigation systems. Deep neural networks overcome limits of traditional text-to-speech systems in two ways: they match the patterns of stress and intonation in spoken language, called prosody, more accurately, and they are more effective at synthesizing the units of speech into a computer voice.

Neural TTS

Traditional text-to-speech systems generally split prosody handling into separate linguistic analysis and acoustic prediction steps, each governed by an independent model. This usually results in muffled, buzzy voice synthesis. Neural networks, by contrast, perform prosody prediction and voice synthesis simultaneously, which results in a more fluid and natural-sounding voice.

Microsoft uses the computational power of Azure to offer real-time streaming, which makes the service useful for situations such as interacting with a chatbot or virtual assistant. The TTS capability is served from Azure Kubernetes Service to ensure high scalability and availability.

The text-to-speech service is currently available only in preview. The preview comes with two pre-built neural text-to-speech voices in English: Jessa and Guy. Microsoft says it will make more languages available soon and will offer customization services in 49 languages for customers who want to build branded voices optimized for their specific needs.
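
For readers wondering what selecting one of these voices might look like in practice, here is a minimal sketch in Python. It assumes the request text is wrapped in SSML (the W3C Speech Synthesis Markup Language), which is a common way to name a voice for a TTS engine. The article does not describe the request format, so the helper `build_ssml` and the voice identifier `example-neural-voice` are hypothetical placeholders, not the service's actual names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C SSML 1.0 specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, voice_name: str, lang: str = "en-US") -> str:
    """Wrap `text` in a minimal SSML document that requests `voice_name`.

    `voice_name` is whatever identifier the TTS service assigns to a voice;
    the value used in the demo below is a made-up placeholder.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # Hypothetical voice identifier, for illustration only.
    ssml = build_ssml("Turn left in 200 meters.", "example-neural-voice")
    # Parse the result back to confirm it is well-formed XML.
    root = ET.fromstring(ssml)
    print(root[0].get("name"), "->", root[0].text)
```

The sketch only builds the markup. Sending it to a service, and the exact voice names that service expects, are left out because the article does not specify them.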

For more information, check out the official Microsoft Blog post.
