
Baidu has released a new AI-powered tool called STACL that performs simultaneous interpretation. A simultaneous interpreter translates concurrently with the speaker’s speech, with a delay of only a few seconds. Baidu has gone a step further by predicting and anticipating the words a speaker is about to say a few seconds in the future.

Current simultaneous translation systems are generally prone to latency, such as a “3-word delay”, and they tend to be overcomplicated and slow to train. Baidu’s STACL overcomes these limitations by predicting the verb to come, based on all the sentences it has seen in the past.

The system uses a simple “wait-k” model trained to generate the target sentence concurrently with the source sentence, but always k words behind, for any given k. STACL directly predicts target words and integrates anticipation and translation in a single model. STACL is also flexible in terms of the latency-quality trade-off: the user can specify an arbitrary latency requirement (e.g., a one-word delay or a five-word delay). Presently, STACL works on text-to-text translation and speech-to-text translation.
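To make the “k words behind” idea concrete, here is a minimal sketch of the read/write schedule such a policy implies. It is an illustration only, not Baidu's implementation: the function names `wait_k_schedule` and `wait_k_actions` are hypothetical, and the sketch assumes the decoder simply reads everything that remains once the source sentence ends.

```python
from typing import List


def wait_k_schedule(source_len: int, target_len: int, k: int) -> List[int]:
    """For each target position t (1-indexed), return how many source
    words have been read before target word t is emitted."""
    # Wait for k source words, then alternate one read per write,
    # capped at the length of the source sentence (assumption).
    return [min(k + t - 1, source_len) for t in range(1, target_len + 1)]


def wait_k_actions(source_len: int, target_len: int, k: int) -> List[str]:
    """Expand the schedule into an explicit READ/WRITE action sequence."""
    actions: List[str] = []
    read = 0
    for needed in wait_k_schedule(source_len, target_len, k):
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


if __name__ == "__main__":
    print(wait_k_schedule(source_len=5, target_len=5, k=2))
    print(wait_k_actions(source_len=3, target_len=3, k=1))
```

A smaller k lowers the delay but forces the model to commit to target words with less source context, which is where the anticipation described above comes in.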

The model is trained on newswire articles, where the same story appeared in multiple languages. In the paper, the researchers demonstrated its capabilities in translating from Chinese to English.

Baidu STACL Demo 1

Source: Baidu

They have also come up with a new metric of latency called “Averaged Lagging”, which addresses deficiencies in previous metrics.
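This article does not give the formula for the metric. The sketch below is an assumption based on the definition in the STACL paper and should be checked against it: the lag of each target word is measured against an ideal translator that keeps perfect pace with the speaker, and the lags are averaged up to the first target word emitted after the whole source has been read. The function name `average_lagging` and the list-based representation of the schedule are hypothetical.

```python
from typing import List


def average_lagging(g: List[int], source_len: int, target_len: int) -> float:
    """Average lag, in source words, behind an ideal pace-keeping translator.

    g[t-1] is the number of source words read before target word t is written.
    Formula assumed from the STACL paper, not stated in this article.
    """
    # Target-to-source length ratio, so sentences of unequal length compare fairly.
    r = target_len / source_len
    # tau: first target step at which the full source has been read
    # (falls back to the last step if that never happens).
    tau = next((t for t, read in enumerate(g, start=1) if read >= source_len), len(g))
    # Lag at step t is words read minus where the ideal translator would be.
    return sum(g[t - 1] - (t - 1) / r for t in range(1, tau + 1)) / tau


if __name__ == "__main__":
    wait_3 = [min(3 + t - 1, 10) for t in range(1, 11)]
    print(average_lagging(wait_3, source_len=10, target_len=10))
```

Under this definition, a wait-k schedule on sentences of equal length scores exactly k, and a system that waits for the full sentence scores the source length.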

The system is, of course, far from perfect. For instance, at present it can’t correct its mistakes or apologize for them. However, it is adjustable in the sense that users will be able to make trade-offs between speed and accuracy. It can also be made more accurate by training it on a particular subject so that it learns the sentences likely to appear in presentations on that subject. The researchers are also planning to add speech-to-speech translation capabilities to STACL. To do this, they will need to integrate speech synthesis into the system while trying to make it sound natural.

According to Liang Huang, principal scientist of Baidu’s Silicon Valley AI Lab, STACL will be demoed at the Baidu World conference on November 1st, where it will provide live simultaneous translation of the speeches. Baidu has previously shown off a prototype consumer device that does sentence-by-sentence translation, and Huang says his team plans to integrate STACL into that gadget.

Go through the research paper and video demos for extensive coverage.


