2 min read
Google says that with BERT, you can train your own state-of-the-art question answering system in 30 minutes on a single Cloud TPU, or a few hours using a single GPU. The source code built on top of TensorFlow. A number of pre-trained language representation models are also included.
BERT improves on recent work in pre-training contextual representations. This includes semi-supervised sequence learning, generative pre-training, ELMo, and ULMFit. BERT is different from these models, it is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus – Wikipedia.
Context-free models like word2vec generate a single word embedding representation for every word. Contextual models, on the other hand, generate a representation\ of each word based on the other words in the sentence. BERT is deeply bidirectional as it considers the previous and next words.
It is not possible to train bidirectional models by simply conditioning each word on words before and after it. Doing this would allow the word that’s being predicted to indirectly see itself in a multi-layer model. To solve this, Google researchers used a straightforward technique of masking out some words in the input and condition each word bidirectionally in order to predict the masked words. This idea is not new, but BERT is the first technique where it was successfully used to pre-train a deep neural network.
On The Stanford Question Answering Dataset (SQuAD) v1.1, BERT achieved 93.2% F1 score surpassing the previous state-of-the-art score of 91.6% and human-level score of 91.2%. BERT also improves the state-of-the-art by 7.6% absolute on the very challenging GLUE benchmark, a set of 9 diverse Natural Language Understanding (NLU) tasks.
For more details, visit the Google Blog.