
OpenAI researchers yesterday demonstrated a new AI model called GPT-2 that can generate coherent paragraphs of text without any task-specific training. In other words, give it the first line of a story and it will write the rest. Apart from generating articles, it can also perform rudimentary reading comprehension, summarization, machine translation, and question answering.

GPT-2 is an unsupervised language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. GPT-2 was trained simply to predict the next word in 40GB of internet text, says the OpenAI team. The team states that it outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use those domain-specific training datasets.
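To make the idea of next-word prediction concrete, here is a minimal sketch in Python. It is not GPT-2 and not OpenAI's code: it swaps the neural network for a toy bigram counter, and the names `train_bigram` and `generate`, as well as the tiny corpus, are made up for illustration. The loop is the same in spirit, though: look at the text so far, pick a likely next word, append it, and repeat.

```python
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, prompt, max_words=10, seed=0):
    """Extend a non-empty prompt one word at a time by sampling a next word."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    output = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(output[-1])
        if not candidates:  # the last word was never followed by anything
            break
        words, counts = zip(*candidates.items())
        # Sample in proportion to how often each word followed the last one.
        output.append(rng.choices(words, weights=counts, k=1)[0])
    return " ".join(output)


# Hypothetical toy corpus; GPT-2 itself was trained on 40GB of web text.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram(corpus)
print(generate(model, "the cat"))
```

A bigram model only ever looks at the single previous word, which is why its output drifts quickly. A model like GPT-2 conditions on a much longer stretch of preceding text, which is what lets it stay on topic across paragraphs.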

For language-related tasks such as question answering, reading comprehension, and summarization, GPT-2 can learn these tasks directly from raw text and doesn’t require any task-specific training data. The OpenAI team states that the GPT-2 model is ‘chameleon-like’ and easily adapts to the style and content of the input text.
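In practice, performing a task without task-specific training means the task is expressed in the input text itself, and the model's continuation is read as the answer. The sketch below shows what such prompts might look like. The helper `build_prompt` and the exact formats are assumptions made for illustration, loosely modeled on the kind of cues (such as a trailing "TL;DR:" for summaries) described in OpenAI's research paper; this is not OpenAI's code, and it only builds strings without calling any model.

```python
def build_prompt(task, text, examples=(), question=None):
    """Phrase a task as plain text for a language model to continue.

    Hypothetical helper: the formats below are illustrative assumptions.
    """
    if task == "summarize":
        # A trailing "TL;DR:" nudges the model to continue with a summary.
        return f"{text}\nTL;DR:"
    if task == "translate":
        # A few "source = target" pairs, then the sentence to translate.
        shots = "".join(f"{src} = {tgt}\n" for src, tgt in examples)
        return f"{shots}{text} ="
    if task == "answer":
        # A passage followed by a question; the model continues after "A:".
        return f"{text}\nQ: {question}\nA:"
    raise ValueError(f"unknown task: {task}")


print(build_prompt("summarize", "OpenAI demonstrated a new language model."))
print(build_prompt("translate", "good morning", examples=[("hello", "bonjour")]))
print(build_prompt("answer", "GPT-2 has 1.5 billion parameters.",
                   question="How many parameters does GPT-2 have?"))
```

Whatever text the model generates after each prompt would then be taken as the summary, translation, or answer.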

However, the team has observed certain failures in the model, such as repetitive text, world modeling failures, and unnatural topic switching. Whether the model produces a good sample depends on how familiar it is with the prompt’s context. For instance, when the model is prompted with topics that are ‘highly represented in data’, like Miley Cyrus or Lord of the Rings, it is able to generate reasonable samples about 50% of the time. On the other hand, the model performs poorly on highly technical or complex content.


The OpenAI team envisions GPT-2 being used to develop AI writing assistants, advanced dialogue agents, unsupervised translation between languages, and enhanced speech recognition systems. It has also outlined potential misuses of GPT-2: the model could be used to generate misleading news articles and to automate the large-scale production of fake and phishing content on social media.

Due to concerns about this kind of misuse of language-generating models, OpenAI has decided to release only a ‘small’ version of GPT-2, along with its sampling code and a research paper, for researchers to experiment with. The dataset, the training code, and the weights of the full GPT-2 model have been excluded from the release.

The OpenAI team states that this release strategy will give it and the wider AI community time to discuss the implications of such systems in more depth. It also wants governments to take initiatives to monitor the societal impact of AI technologies and to track the progress of capabilities in these systems. “If pursued, these efforts could yield a better evidence base for decisions by AI labs and governments regarding publication decisions and AI policy more broadly”, states the OpenAI team.

Public reaction to the news has been largely positive. However, not everyone is comfortable with OpenAI’s release strategy, and some feel that the move signals a shift towards ‘closed AI’ and propagates the ‘fear of AI’.

For more information, check out the official OpenAI GPT-2 blog post.
