
Baidu open sources ERNIE 2.0, a continual pre-training NLP model that outperforms BERT and XLNet on 16 NLP tasks


Today Baidu released ERNIE 2.0, a continual pre-training framework for natural language processing. ERNIE stands for Enhanced Representation through kNowledge IntEgration. Baidu claims in its research paper that ERNIE 2.0 outperforms BERT and the recent XLNet on 16 NLP tasks in Chinese and English. Additionally, Baidu has open sourced the ERNIE 2.0 model.

In March, Baidu announced the release of ERNIE 1.0, its pre-trained model based on PaddlePaddle, Baidu’s open deep learning platform. According to Baidu, ERNIE 1.0 outperformed BERT in all Chinese language understanding tasks.

The paper points out that the pre-training procedures of models such as BERT, XLNet and ERNIE 1.0 are mainly based on a few simple tasks that model the co-occurrence of words or sentences. For example, BERT constructed a bidirectional language model task and a next sentence prediction task to capture the co-occurrence information of words and sentences; XLNet constructed a permutation language model task to capture the co-occurrence information of words.
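To make the two BERT-style tasks concrete, here is a minimal sketch of how training examples for them might be built. It is a simplification for illustration only: the function names are hypothetical, and the masked positions are passed in explicitly rather than sampled at random as a real pre-training pipeline would do.

```python
MASK = "[MASK]"

def mask_tokens(tokens, positions):
    """Build a masked language model example.

    Returns (masked_tokens, labels). labels holds the original token at
    each masked position and None elsewhere, so the model is only scored
    on the positions it has to reconstruct.
    """
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i in positions:
        labels[i] = tokens[i]
        masked[i] = MASK
    return masked, labels

def sentence_pair_example(first, second, is_next):
    """Build a next sentence prediction example from two token lists."""
    tokens = ["[CLS]"] + first + ["[SEP]"] + second + ["[SEP]"]
    return {"tokens": tokens, "is_next": is_next}

# Illustrative inputs (not taken from the paper).
masked, labels = mask_tokens(["baidu", "released", "ernie"], positions=[2])
pair = sentence_pair_example(["baidu", "released", "ernie"], ["it", "is", "open", "source"], True)
print(masked, labels)
print(pair["tokens"])
```

Both examples only ask the model which words or sentences tend to appear together, which is the limitation the paper highlights.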

But besides co-occurrence information, training corpora contain much richer lexical, syntactic and semantic information. For example, named entities, such as person names, place names, and organization names, contain concept information; sentence order and sentence proximity information can enable the models to learn structure-aware representations; semantic similarity at the document level or discourse relations among sentences can enable the models to learn semantic-aware representations. So is it possible to further improve performance if the model is trained continually on more kinds of tasks?

Source: ERNIE 2.0 research paper

Based on this idea, Baidu has proposed a continual pre-training framework for language understanding in which pre-training tasks can be incrementally built and learned through multi-task learning. According to Baidu, different customized tasks can be introduced into this framework at any time, and these tasks are trained through multi-task learning, which enables the encoding of lexical, syntactic and semantic information across tasks. Whenever a new task arrives, the framework can incrementally train the distributed representations without forgetting the previously trained parameters.
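The idea of adding tasks in stages while still training on earlier ones can be sketched as a toy training schedule. This is not Baidu's implementation: the task names, the round-robin rotation and the `steps_per_stage` parameter are assumptions made purely for illustration.

```python
from itertools import cycle

def continual_multitask_schedule(tasks, steps_per_stage):
    """Yield (stage, task_name) pairs for a toy continual multi-task schedule.

    At stage k the first k tasks are all active, and training steps rotate
    through them, so earlier tasks keep being revisited after a new task
    is introduced instead of being dropped.
    """
    for stage in range(1, len(tasks) + 1):
        active = tasks[:stage]      # every task introduced so far
        rotation = cycle(active)    # simple round-robin over active tasks
        for _ in range(steps_per_stage):
            yield stage, next(rotation)

# Hypothetical task names, one per kind of information mentioned above.
tasks = ["word_level_task", "structure_level_task", "semantic_level_task"]
schedule = list(continual_multitask_schedule(tasks, steps_per_stage=4))
for stage, name in schedule:
    print(stage, name)
```

In this sketch, a model trained only on the newest task at each stage would risk forgetting the earlier ones; mixing all active tasks in every stage is the simplest way to show how multi-task training avoids that.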

The Structure of Released ERNIE 2.0 Model

Source: ERNIE 2.0 research paper

ERNIE is a continual pre-training framework that gives developers a feasible scheme for building their own NLP models. The fine-tuning source code for ERNIE 2.0 and the pre-trained English models can be downloaded from the GitHub page.

The team at Baidu compared ERNIE 2.0 with existing pre-training models on the English GLUE benchmark and, separately, on 9 popular Chinese datasets. The results show that ERNIE 2.0 outperforms BERT and XLNet on 7 GLUE language understanding tasks and outperforms BERT on all 9 Chinese NLP tasks, such as DuReader machine reading comprehension, sentiment analysis and question answering.

Specifically, the experimental results on the GLUE datasets show that ERNIE 2.0 outperforms BERT and XLNet on almost all English tasks, for both the base and the large model. Furthermore, the research paper shows that the ERNIE 2.0 large model achieves the best performance on the Chinese NLP tasks, setting new top results.

Source: ERNIE 2.0 research paper

To learn more about ERNIE 2.0, read the research paper and check out the official blog on Baidu’s website.


Fatema Patrawala

