Last week, the Facebook AI research team published a progress report on its dialogue research, which aims to build more engaging and personalized AI systems.
According to the team, “Dialogue research is a crucial component of building the next generation of intelligent agents. While there’s been progress with chatbots in single-domain dialogue, agents today are far from capable of carrying an open-domain conversation across a multitude of topics. Agents that can chat with humans in the way that people talk to each other will be easier and more enjoyable to use in our day-to-day lives — going beyond simple tasks like playing a song or booking an appointment.”
In their blog post, the team describes new open source data sets, algorithms, and models that address five common weaknesses of open-domain chatbots today: maintaining consistency, specificity, empathy, knowledgeability, and multimodal understanding. Let us look at each one in detail:
Dataset called Dialogue NLI introduced for maintaining consistency
Inconsistencies are a common issue for chatbots, partly because most models lack explicit long-term memory and semantic understanding. The Facebook team, in collaboration with colleagues at NYU, developed a new way of framing the consistency of dialogue agents as natural language inference (NLI) and created a new NLI data set, Dialogue NLI, which is used to improve and evaluate the consistency of dialogue models.
In the Dialogue NLI data set, the team treats two utterances in a dialogue as the premise and the hypothesis, respectively. Each pair is labeled to indicate whether the premise entails, contradicts, or is neutral with respect to the hypothesis.
Training an NLI model on this data set and using it to rerank the dialogue model’s candidate responses, so that they entail (or at least remain consistent with) previous utterances, improved the overall consistency of the dialogue agent. Across these tests, the team reports three times fewer contradictions in the generated sentences.
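The reranking idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_nli` is a hypothetical lookup table standing in for a trained NLI model, the example utterances and scores are invented, and the penalty scheme is just one plausible way to demote contradicting candidates, not necessarily the one the team used.

```python
from typing import Dict, List, Tuple

# Hypothetical labeled pairs (premise, hypothesis) -> label, in the spirit of
# Dialogue NLI. A real system would use a trained NLI classifier instead.
LABELED_PAIRS: Dict[Tuple[str, str], str] = {
    ("i have two dogs.", "i do not have any pets."): "contradiction",
    ("i have two dogs.", "i have pets at home."): "entailment",
}


def toy_nli(premise: str, hypothesis: str) -> str:
    """Return 'entailment', 'contradiction', or 'neutral' for a pair."""
    return LABELED_PAIRS.get((premise.lower(), hypothesis.lower()), "neutral")


def rerank(
    history: List[str],
    candidates: List[Tuple[str, float]],
    penalty: float = 1.0,
) -> List[Tuple[str, float]]:
    """Lower the score of each candidate once per earlier utterance it contradicts."""
    rescored = []
    for text, score in candidates:
        contradictions = sum(
            toy_nli(previous, text) == "contradiction" for previous in history
        )
        rescored.append((text, score - penalty * contradictions))
    # Highest adjusted score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


history = ["I have two dogs."]
candidates = [
    ("I do not have any pets.", 0.9),  # top base score, but contradicts history
    ("I have pets at home.", 0.7),
    ("I like pizza.", 0.5),
]
ranked = rerank(history, candidates)
print(ranked[0][0])
```

In this toy run, the candidate with the highest base score contradicts the earlier utterance, so it drops to the bottom of the ranking and a consistent response is chosen instead.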
Several conversational attributes were studied to balance specificity
According to the team, generative dialogue models frequently default to generic, safe responses, such as “I don’t know”, even when a query calls for a specific answer. The Facebook team, in collaboration with Stanford AI researcher Abigail See, studied how to fix this by controlling several conversational attributes, such as the level of specificity.
In one experiment, they conditioned a bot on character information and asked “What do you do for a living?” A typical chatbot responds with the generic statement “I’m a construction worker.” With control methods, the chatbot produced more specific and engaging responses, like “I build antique homes and refurbish houses.”
In addition to specificity, the team mentioned “that balancing question-asking and answering and controlling how repetitive our models are make significant differences. The better the overall conversation flow, the more engaging and personable the chatbots and dialogue agents of the future will be.”
Chatbot’s ability to display empathy while responding was measured
The team worked with researchers from the University of Washington to introduce the first benchmark task of human-written empathetic dialogues centered on specific emotional labels to measure a chatbot’s ability to display empathy.
In addition to improving on automatic metrics, the team showed that using this data for both fine-tuning and as retrieval candidates leads to responses that are evaluated by humans as more empathetic, with an average improvement of 0.95 points (on a 1-to-5 scale) across three different retrieval and generative models.
The next challenge for the team is to make empathy-focused models perform well in complex dialogue situations, where agents may need to balance empathy with staying on topic or providing information.
Wikipedia dataset used to make dialogue models more knowledgeable
The research team has improved the ability of dialogue models to demonstrate knowledge by collecting a data set of conversations grounded in Wikipedia, and by creating new model architectures that retrieve knowledge, read it, and condition responses on it.
This generative model yielded the most pronounced improvement: humans rated it as 26% more engaging than its knowledgeless counterparts.
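The retrieve, read, and condition steps can be sketched in a few lines. This is only a toy illustration under stated assumptions: the knowledge sentences, the word-overlap retriever, and the template-based `respond` function are hypothetical stand-ins for the team's learned neural components.

```python
import re
from typing import List, Set

# Hypothetical knowledge store; the real data set draws on Wikipedia passages.
KNOWLEDGE: List[str] = [
    "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
    "Mount Everest is the highest mountain above sea level.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, knowledge: List[str]) -> str:
    """Retrieve the sentence sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    return max(knowledge, key=lambda s: len(query_tokens & tokenize(s)))


def respond(query: str, knowledge: List[str]) -> str:
    """Condition the reply on the retrieved fact (template instead of a generator)."""
    fact = retrieve(query, knowledge)
    return f"Here is something I know about that: {fact}"


reply = respond("Tell me about the Eiffel Tower", KNOWLEDGE)
print(reply)
```

A learned model would replace both the overlap-based retriever and the fixed template, but the flow of data is the same: the response depends on a piece of knowledge selected for the current turn.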
To engage with images, personality based captions were used
To engage with humans, agents should not only comprehend dialogue but also understand images. In this research, the team focused on image captioning that is engaging for humans by incorporating personality. They collected a data set of human comments grounded in images, and trained models capable of discussing images with given personalities, which makes the system more interesting for humans to talk to. 64% of human evaluators preferred these personality-based captions over traditional captions.
To build strong models, the team considered both retrieval and generative variants, and leveraged modules from both the vision and language domains. They defined a powerful retrieval architecture, named TransResNet, that works by projecting the image, personality, and caption in the same space using image, personality, and text encoders.
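As a rough sketch of retrieval in a shared embedding space, the example below scores candidate captions against an image and a personality. All vectors are hand-written, hypothetical stand-ins for encoder outputs, and combining the image and personality by summing them before taking a dot product with each caption is an assumption made for illustration.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical embeddings; in a real system these come from trained
# image, personality, and text encoders that share one output space.
IMAGE_VEC: Vector = [1.0, 0.0, 0.5]
PERSONALITY_VECS: Dict[str, Vector] = {
    "sweet": [0.0, 1.0, 0.0],
    "sarcastic": [0.0, -1.0, 0.0],
}
CAPTION_VECS: Dict[str, Vector] = {
    "What a lovely little puppy!": [1.0, 1.0, 0.5],
    "Oh great, another dog photo.": [1.0, -1.0, 0.5],
}


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def best_caption(image: Vector, personality: Vector, captions: Dict[str, Vector]) -> str:
    """Pick the caption whose embedding best matches image + personality."""
    query = add(image, personality)
    return max(captions, key=lambda text: dot(query, captions[text]))


sweet_caption = best_caption(IMAGE_VEC, PERSONALITY_VECS["sweet"], CAPTION_VECS)
sarcastic_caption = best_caption(IMAGE_VEC, PERSONALITY_VECS["sarcastic"], CAPTION_VECS)
print(sweet_caption)
print(sarcastic_caption)
```

The same image yields a different caption when the personality vector changes, which is the behavior a personality-conditioned retrieval model is meant to provide.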
The team showed that their system produces captions that come close to matching human performance in terms of engagement and relevance. Annotators preferred the retrieval model’s captions over captions written by people 49.5% of the time.
Apart from this, the Facebook team has released a new data collection and model evaluation tool, a Messenger-based chatbot game called Beat the Bot, which allows people to interact directly with bots and other humans in real time, creating rich examples to help train models.
To conclude, the Facebook AI team mentions, “Our research has shown that it is possible to train models to improve on some of the most common weaknesses of chatbots today. Over time, we’ll work toward bringing these subtasks together into one unified intelligent agent by narrowing and eventually closing the gap with human performance. In the future, intelligent chatbots will be capable of open-domain dialogue in a way that’s personable, consistent, empathetic, and engaging.”
On Hacker News, this research has drawn both positive and negative reactions. Some users argue that AI capable of conversing like humans will do a lot of harm, while others call it an impressive improvement in the field of conversational AI.
A user comment reads, “I gotta say, when AI is able to converse like humans, a lot of bad stuff will happen. People are so used to the other conversation partner having self-interest, empathy, being reasonable. When enough bots all have a “swarm” program to move conversations in a particular direction, they will overwhelm any public conversation. Moreover, in individual conversations, you won’t be able to trust anything anyone says or negotiates. Just like playing chess or poker online now. And with deepfakes, you won’t be able to trust audio or video either.
The ultimate shock will come when software can render deepfakes in realtime to carry on a conversation, as your friend but not. As a politician who “said crazy stuff” but really didn’t, but it’s in the realm of believability.
I would give it about 20 years until it all goes to shit. If you thought fake news was bad, realtime deepfakes and AI conversations with “friends” will be worse.”