Categories: NewsData

DeepMind introduces NarrativeQA: A real-world dataset for testing the limits of Reading Comprehension

2 min read

DeepMind introduces NarrativeQA, a data repository setup for understanding complex narratives.

Reading comprehension (RC)—in contrast to information retrieval—requires integrating information and reasoning about events, entities, and their relations across a full document. The question answering technique is traditionally used to assess the abilities of RC, both in AI agents and in children who are learning to read.

However, DeepMind surveyed that the existing RC datasets such as MCTest, Children’s Book Test(CBT), CNN/Daily Mail, NewsQA, SearchQA, and so on and found out certain limitations which include, presence of small datasets, unnatural data, requirement of a single sentence of information to answer the questions, and so on. Hence, these RC datasets are unable to test an important integrative aspect of machine’s Reading Comprehension.

In order to encourage deeper comprehension of language, DeepMind presents a brand new dataset and a set of tasks, known as the NarrativeQA.

This dataset includes fictional stories, which are 1,567 complete stories from books and movie scripts, with human written questions and answers based solely on human-generated abstract summaries.

The dataset is divided into three parts:

  • non-overlapping training
  • validation and
  • testing

There are 46,765 pairs of answers to questions written by humans and includes mostly the more complicated variety of questions such as “when / where / who / why”. This dataset permits the training of neural network-based models over word embeddings and provide decent lexical coverage and diversity.Thus, this dataset would test and reward agents that approach human level of competency.

Having given a quantitative and qualitative analysis of the difficulty of the more complex tasks, DeepMind suggests research directions that may help bridge the gap between existing models and human performance. DeepMind also hopes that this dataset will serve not only as a challenge for the machine reading community, but also as a driver for the development of a new class of neural models which will take a significant step beyond the level of complexity which existing datasets and tasks permit.

To have a detailed understanding on the working of NarrativeQA dataset, you can have a look at the research paper here.

Savia Lobo

A Data science fanatic. Loves to be updated with the tech happenings around the globe. Loves singing and composing songs. Believes in putting the art in smart.

Share
Published by
Savia Lobo

Recent Posts

Top life hacks for prepping for your IT certification exam

I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…

3 years ago

Learn Transformers for Natural Language Processing with Denis Rothman

Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…

3 years ago

Learning Essential Linux Commands for Navigating the Shell Effectively

Once we learn how to deploy an Ubuntu server, how to manage users, and how…

3 years ago

Clean Coding in Python with Mariano Anaya

Key-takeaways:   Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…

3 years ago

Exploring Forms in Angular – types, benefits and differences   

While developing a web application, or setting dynamic pages and meta tags we need to deal with…

3 years ago

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Software architecture is one of the most discussed topics in the software industry today, and…

3 years ago