2 min read

Google open-sourced Bidirectional Encoder Representations from Transformers (BERT) last Friday for NLP pre-training. Natural language processing (NLP) consists of topics like sentiment analysis, language translation, question answering, and other language-related tasks. Large datasets for NLP containing millions, or billions, of annotated training examples is scarce.

Google says that with BERT, you can train your own state-of-the-art question answering system in 30 minutes on a single Cloud TPU, or a few hours using a single GPU. The source code built on top of TensorFlow. A number of pre-trained language representation models are also included.

BERT features

BERT improves on recent work in pre-training contextual representations. This includes semi-supervised sequence learning, generative pre-training, ELMo, and ULMFit. BERT is different from these models, it is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus – Wikipedia.

Context-free models like word2vec generate a single word embedding representation for every word. Contextual models, on the other hand, generate a representation\ of each word based on the other words in the sentence. BERT is deeply bidirectional as it considers the previous and next words.


It is not possible to train bidirectional models by simply conditioning each word on words before and after it. Doing this would allow the word that’s being predicted to indirectly see itself in a multi-layer model. To solve this, Google researchers used a straightforward technique of masking out some words in the input and condition each word bidirectionally in order to predict the masked words. This idea is not new, but BERT is the first technique where it was successfully used to pre-train a deep neural network.


On The Stanford Question Answering Dataset (SQuAD) v1.1, BERT achieved 93.2% F1 score surpassing the previous state-of-the-art score of 91.6% and human-level score of 91.2%. BERT also improves the state-of-the-art by 7.6% absolute on the very challenging GLUE benchmark, a set of 9 diverse Natural Language Understanding (NLU) tasks.

For more details, visit the Google Blog.

Read next

Intel AI Lab introduces NLP Architect Library

FAT Conference 2018 Session 3: Fairness in Computer Vision and NLP

Implement Named Entity Recognition (NER) using OpenNLP and Java

Data science enthusiast. Cycling, music, food, movies. Likes FPS and strategy games.