Speech synthesis has advanced rapidly in recent years, with neural networks from DeepMind producing realistic, human-like voices, and Google is now working in the same direction to advance state-of-the-art research on fake-audio detection. Google's speech synthesis, or text-to-speech (TTS), technology already powers products such as Google Maps and Google Home. Last year, the Google News Initiative (GNI) announced that it wanted to tackle "deep fakes" and other synthetic audio designed to bypass voice-authentication systems.
Yesterday, Google AI and the Google News Initiative (GNI) partnered to create a body of synthetic speech containing thousands of phrases spoken by Google's deep learning text-to-speech (TTS) models. The dataset comprises 68 synthetic voices, spanning a variety of regional accents, reading phrases drawn from English newspaper articles.
Malicious actors can synthesize speech to fool voice-authentication systems, or even create forged audio recordings to defame public figures. Deep fakes, audio or video clips generated by deep learning models, can be exploited to undermine trust in media: it becomes difficult to distinguish real content from tampered content, and bad actors can in turn claim that authentic recordings are fake. Countering this threat requires a large database of synthetic speech to train detection systems against. The effort also aligns with Google's AI Principles, which call for "strong safety practices to avoid unintended results that create risks of harm."
Currently, the dataset is available to participants in the 2019 ASVspoof challenge, who are developing countermeasures against fake speech. The aim is to make automatic speaker verification (ASV) systems more secure. By training models on both real and computer-generated speech, ASVspoof participants can build systems that learn to distinguish between the two. Results of the challenge will be announced in September at the 2019 Interspeech conference in Graz, Austria.
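At its core, the countermeasure task is binary classification: given acoustic features extracted from an utterance, decide whether it is genuine or synthetic. The sketch below illustrates that idea in miniature with a nearest-centroid classifier over random stand-in feature vectors; the feature distributions, dimensions, and model are all illustrative assumptions, not the challenge's actual front end or methods (real systems learn far subtler artifacts from features such as MFCCs or CQCCs).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in acoustic features: genuine speech clustered around one mean,
# synthetic speech around another. Purely illustrative data, not ASVspoof's.
genuine = rng.normal(loc=0.0, scale=1.0, size=(200, 20))
spoofed = rng.normal(loc=1.0, scale=1.0, size=(200, 20))

# "Training" on both classes: estimate a centroid per label.
centroid_genuine = genuine.mean(axis=0)
centroid_spoofed = spoofed.mean(axis=0)

def classify(x):
    """Label a feature vector by its nearest class centroid."""
    d_g = np.linalg.norm(x - centroid_genuine)
    d_s = np.linalg.norm(x - centroid_spoofed)
    return "genuine" if d_g < d_s else "spoofed"

X = np.vstack([genuine, spoofed])
labels = ["genuine"] * 200 + ["spoofed"] * 200
accuracy = np.mean([classify(x) == lab for x, lab in zip(X, labels)])
print(f"training accuracy: {accuracy:.2f}")
```

Because the two toy distributions are well separated, even this simple model classifies them reliably; the hard part of the real challenge is that modern TTS output leaves only faint statistical traces for a detector to latch onto.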