Top Research papers showcased at NIPS 2017 – Part 2

Continuing from where we left off in our previous post, we are back with a quick roundup of top research papers on Machine Translation, Predictive Modelling, Image-to-Image Translation, and Recommendation Systems from NIPS 2017.

Machine Translation

In layman's terms, machine translation (MT) is the process by which computer software translates text from one natural language to another. This year at NIPS, a large number of presentations focused on innovative ways of improving translations. Here are our top picks.

Value Networks: Improving beam search for better Translation

Microsoft researchers introduce value networks for translation in their paper “Decoding with Value Networks for Neural Machine Translation”. Their value network improves beam search, the standard decoding method in Neural Machine Translation (NMT), whose shortcoming is that it looks only one step ahead when ranking candidates. The new method, inspired by the success of AlphaGo, takes the source sentence x, the currently available decoding output y_1, ..., y_{t-1}, and a candidate word w at step t as inputs, and predicts the long-term value (e.g., BLEU score) of the partial target sentence if it is completed by the NMT model. Experiments show that this approach significantly improves translation accuracy on several translation tasks.
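To make the idea concrete, here is a minimal sketch of a beam search that ranks candidates by a mix of the model's length-normalized log-probability and the log of a predicted long-term value. The names `beam_search`, `toy_next_log_probs`, `toy_value`, and the mixing weight `alpha` are hypothetical, and both toy functions are hand-written stand-ins for a trained NMT model and value network, not the paper's implementation.

```python
import math

def beam_search(next_log_probs, value_fn, beam_size=2, max_len=3, alpha=0.5, eos="</s>"):
    """Beam search that ranks candidates by model score mixed with a predicted value."""
    beams = [([], 0.0)]  # each entry: (tokens so far, cumulative log-probability)

    def combined(candidate):
        tokens, logp = candidate
        # Length-normalized log-probability mixed with the log of the predicted value.
        return alpha * logp / len(tokens) + (1 - alpha) * math.log(value_fn(tokens))

    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            if tokens and tokens[-1] == eos:
                candidates.append((tokens, logp))  # finished hypotheses carry over
                continue
            for word, lp in next_log_probs(tokens).items():
                candidates.append((tokens + [word], logp + lp))
        beams = sorted(candidates, key=combined, reverse=True)[:beam_size]
    return beams[0][0]

def toy_next_log_probs(prefix):
    # Hypothetical stand-in for one decoding step of an NMT model.
    if not prefix:
        return {"a": math.log(0.6), "the": math.log(0.4)}
    return {"cat": math.log(0.5), "</s>": math.log(0.5)}

def toy_value(prefix):
    # Hypothetical stand-in for a value network: it expects prefixes that
    # start with "the" to end in a better final translation.
    return 0.9 if prefix[0] == "the" else 0.3

# alpha=1.0 ignores the value estimate and follows the locally best word.
print(beam_search(toy_next_log_probs, toy_value, beam_size=1, max_len=1, alpha=1.0))  # ['a']
# alpha=0.5 lets the predicted long-term value override the local choice.
print(beam_search(toy_next_log_probs, toy_value, beam_size=1, max_len=1, alpha=0.5))  # ['the']
```

With a beam of size one, the purely probability-driven search commits to the locally likelier word, while the value-aware search prefers the word the value function expects to pay off later.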

CoVe: Contextualizing Word Vectors for Machine Translation

Salesforce researchers take a new approach to contextualizing word vectors in their paper “Learned in Translation: Contextualized Word Vectors”. Models for a wide variety of common NLP tasks, namely sentiment analysis, question classification, entailment, and question answering, typically rely only on unsupervised word and character vectors. The paper instead contextualizes word vectors with a deep LSTM encoder taken from an attentional sequence-to-sequence model trained for machine translation. Their research shows that adding these context vectors (CoVe) improves performance over using only unsupervised word and character vectors. For fine-grained sentiment analysis and entailment, CoVe lifts the baseline models to state-of-the-art performance.

Predictive Modelling

A lot of the research showcased at NIPS focused on improving the predictive capabilities of neural networks. Here is a quick look at the top presentations.

Deep Ensembles for Predictive Uncertainty Estimation

Bayesian methods are the most frequently used approach to quantifying predictive uncertainty in neural networks. However, these methods can be computationally intensive, and they require significant modifications to the training pipeline. DeepMind researchers have proposed an alternative to Bayesian NNs in their paper “Simple and scalable predictive uncertainty estimation using deep ensembles”. Their proposed method is easy to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high-quality predictive uncertainty estimates.
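As a rough illustration of how an ensemble yields an uncertainty estimate, the sketch below combines the outputs of several regression networks, each assumed to predict a mean and a variance, by treating them as an equally weighted mixture of Gaussians. The function name `combine_ensemble` and the numbers are hypothetical, and training the individual networks is not shown.

```python
def combine_ensemble(means, variances):
    """Collapse per-network Gaussian predictions into one mean and variance.

    The ensemble is treated as an equally weighted mixture of Gaussians and
    summarized by the mean and variance of that mixture.
    """
    m = len(means)
    mean = sum(means) / m
    # Mixture variance: average second moment minus the squared mixture mean.
    variance = sum(v + mu ** 2 for mu, v in zip(means, variances)) / m - mean ** 2
    return mean, variance

# Two hypothetical networks that disagree about the prediction.
mean, variance = combine_ensemble([1.0, 3.0], [0.5, 0.5])
print(mean, variance)  # 2.0 1.5
```

Note that the combined variance (1.5) exceeds each network's own variance (0.5): disagreement between ensemble members shows up as additional predictive uncertainty.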

VAIN: Scaling Multi-agent Predictive Modelling

Multi-agent predictive modeling predicts the behavior of large physical or social systems by modeling the interactions between their agents. However, most approaches come at a prohibitive cost. For instance, Interaction Networks (INs) do not scale well with the number of interactions in the system (typically quadratic or higher order in the number of agents). Facebook researchers have introduced VAIN, a simple attentional mechanism for multi-agent predictive modeling that scales linearly with the number of agents. It achieves similar accuracy at a much lower cost. You can read more about the mechanism in their paper “VAIN: Attentional Multi-agent Predictive Modeling”.

PredRNN: RNNs for Predictive Learning with ST-LSTM

Another paper, titled “PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs”, showcased a new predictive recurrent neural network. This architecture is based on the idea that spatiotemporal predictive learning should memorize both spatial appearances and temporal variations in a unified memory pool. The core of this RNN is a new Spatiotemporal LSTM (ST-LSTM) unit that extracts and memorizes spatial and temporal representations simultaneously. Memory states are allowed to zigzag in two directions: vertically across stacked RNN layers and horizontally through all RNN states. PredRNN is a general framework that can be easily extended to other predictive learning tasks by integrating it with other architectures. It achieved state-of-the-art prediction performance on three video prediction datasets.

Recommendation Systems

Google and Microsoft presented new research that addresses the cold-start problem and aims to build robust and powerful recommendation systems.

Off-Policy Evaluation For Slate Recommendation

Microsoft researchers have studied and evaluated policies that recommend an ordered set of items in their paper “Off-Policy Evaluation For Slate Recommendation”.

Standard evaluation approaches require large amounts of logged data to evaluate whole-page metrics that depend on multiple recommended items, as happens when showing ranked lists. These ordered lists are called slates, and the number of possible slates grows combinatorially with the number of items. Microsoft researchers have developed a technique for evaluating page-level metrics of such policies offline using logged past data, reducing the need for online A/B tests. Their method models the observed quality of the recommended set as an additive decomposition across items. This assumption fits many realistic measures of quality, and the method shows exponential savings in the amount of required data compared with other off-policy evaluation approaches.
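The counting argument behind those savings can be sketched in a few lines. This is only an illustration of why an additive decomposition helps, not the paper's estimator; the helper names and the per-position contribution table are hypothetical.

```python
import math

def num_slates(num_items, slate_len):
    """Number of ordered slates of distinct items: m! / (m - l)!"""
    return math.factorial(num_items) // math.factorial(num_items - slate_len)

def num_additive_terms(num_items, slate_len):
    """Number of (position, item) contributions under an additive model."""
    return num_items * slate_len

def slate_value(slate, contribution):
    """Additive decomposition: slate quality is a sum of per-position item contributions."""
    return sum(contribution[(pos, item)] for pos, item in enumerate(slate))

print(num_slates(10, 5))          # 30240 distinct slates to reason about directly
print(num_additive_terms(10, 5))  # 50 per-position contributions under the additive model

# Hypothetical contributions for a two-position slate.
contribution = {(0, "a"): 0.5, (0, "b"): 0.25, (1, "a"): 0.125, (1, "b"): 0.25}
print(slate_value(["a", "b"], contribution))  # 0.75
```

Even for 10 items and 5 positions, the number of whole slates is hundreds of times larger than the number of per-position terms, and the gap widens quickly as the slate grows.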

Meta-Learning on Cold-Start Recommendations

Matrix Factorization (MF) techniques for product recommendations, although efficient, suffer from serious cold-start problems. The cold-start problem concerns recommendations for users with little or no history, i.e., new users, and, analogously, for newly added items. Providing recommendations in such cases is difficult because the learning and predictive ability of recommendation models is limited. Google researchers have come up with a meta-learning strategy to address item cold-start when new items arrive continuously. Their paper “A Meta-Learning Perspective on Cold-Start Recommendations for Items” presents two deep neural network architectures that implement this meta-learning strategy. The first architecture learns a linear classifier whose weights are determined by the item history, while the second learns a neural network whose biases are instead adjusted based on the item history. Evaluated on the real-world problem of Tweet recommendation, the proposed techniques significantly beat the MF baseline.
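A simplified sketch of the first idea, a linear classifier whose weights come from the item's history, is shown below. It assumes users are already represented as embedding vectors and, purely for illustration, sets the weights to the difference between the mean embeddings of users who engaged with the item and users who did not. The function names and the toy embeddings are hypothetical, and this is not the paper's exact architecture.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def item_weights(positive_users, negative_users):
    """Derive classifier weights for a new item from its short engagement history."""
    pos = mean_vector(positive_users)
    neg = mean_vector(negative_users)
    return [p - q for p, q in zip(pos, neg)]

def engagement_probability(weights, user):
    """Logistic score of a user embedding under the item-specific linear classifier."""
    score = sum(w * u for w, u in zip(weights, user))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical history of a newly arrived item: two users engaged, two did not.
weights = item_weights(
    positive_users=[[1.0, 0.0], [0.8, 0.2]],
    negative_users=[[0.0, 1.0], [0.2, 0.8]],
)
print(engagement_probability(weights, [1.0, 0.0]) > 0.5)  # True
print(engagement_probability(weights, [0.0, 1.0]) < 0.5)  # True
```

Because the weights are computed from the history rather than trained per item, a new item gets a usable classifier as soon as a handful of interactions have been logged.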

Image-to-Image Translation

NIPS 2017 exhibited a new image-to-image translation system, a model to hide images within images, and the use of feature transforms to improve universal style transfer.

Unsupervised Image-to-Image Translation

Researchers at Nvidia have proposed an unsupervised image-to-image translation framework based on Coupled GANs. Unsupervised image-to-image translation learns a joint distribution of images in different domains by using images from the marginal distributions in individual domains. However, there exists an infinite set of joint distributions that can give rise to the given marginal distributions, so one can infer nothing about the joint distribution from the marginal distributions without additional assumptions. Their paper “Unsupervised Image-to-Image Translation Networks” uses a shared-latent space assumption to address this issue. Their method presents high-quality image translation results on various challenging unsupervised image translation tasks, such as street scene image translation, animal image translation, and face image translation.

Deep Steganography

Steganography is commonly used to unobtrusively hide a small message within the noisy regions of a larger image. Google researchers, in their paper “Hiding Images in Plain Sight: Deep Steganography”, have demonstrated the successful application of deep learning to hiding images. They place a full-size color image within another image of the same size. The deep neural networks that perform the hiding and revealing processes are trained together and are designed to work specifically as a pair. Their approach compresses and distributes the secret image’s representation across all of the available bits, instead of encoding the secret message within the least significant bits of the carrier image. The system is trained on images drawn randomly from the ImageNet database and works well on natural images.
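For contrast, the classical least-significant-bit scheme that the paper moves away from can be sketched as follows. This is the traditional baseline, not the deep approach; the function names and pixel values are made up for illustration.

```python
def hide_lsb(cover_pixels, message_bits):
    """Overwrite the least significant bit of the first pixels with message bits."""
    stego = list(cover_pixels)
    for i, bit in enumerate(message_bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the lowest bit, then set it to the message bit
    return stego

def reveal_lsb(stego_pixels, num_bits):
    """Read the message back from the least significant bits."""
    return [pixel & 1 for pixel in stego_pixels[:num_bits]]

cover = [200, 13, 54, 77]          # hypothetical 8-bit pixel intensities
stego = hide_lsb(cover, [1, 0, 1])
print(stego)                       # [201, 12, 55, 77]
print(reveal_lsb(stego, 3))        # [1, 0, 1]
```

Each pixel changes by at most one intensity level, which keeps the carrier visually intact but leaves room for only one bit per pixel value. Hiding a full-size image requires spreading the secret across many more bits, which is what the learned hiding and revealing networks are meant to do.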

Improving Universal style transfer on images

NIPS 2017 witnessed another paper aimed at improving universal style transfer, which is used for transferring arbitrary visual styles to content images. The paper “Universal Style Transfer via Feature Transforms” by Nvidia researchers highlights feature transforms as a simple yet effective way to tackle the limitations of existing feed-forward methods for universal style transfer, without training on any pre-defined styles. Existing feed-forward methods are mainly limited by their inability to generalize to unseen styles or by compromised visual quality. The paper embeds a pair of feature transforms, whitening and coloring, in an image reconstruction network. The whitening and coloring transforms directly match the feature covariance of the content image to that of a given style image. The algorithm generates high-quality stylized images that compare favorably with a number of recent methods.
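The covariance matching performed by whitening and coloring can be sketched with NumPy on plain feature matrices. The sketch assumes features are given as arrays of shape (channels, positions); in the paper they come from a pretrained encoder and the result is passed through a decoder, neither of which is shown here. The function name `whiten_and_color` and the small regularizer `eps` are choices made for this illustration.

```python
import numpy as np

def whiten_and_color(content_feat, style_feat, eps=1e-5):
    """Match the covariance and mean of content features to those of style features.

    Both inputs are arrays of shape (channels, positions).
    """
    content_mean = content_feat.mean(axis=1, keepdims=True)
    style_mean = style_feat.mean(axis=1, keepdims=True)
    fc = content_feat - content_mean
    fs = style_feat - style_mean

    channels = fc.shape[0]
    # eps keeps the covariance matrices well conditioned before inversion.
    cov_c = fc @ fc.T / (fc.shape[1] - 1) + eps * np.eye(channels)
    cov_s = fs @ fs.T / (fs.shape[1] - 1) + eps * np.eye(channels)

    dc, ec = np.linalg.eigh(cov_c)
    ds, es = np.linalg.eigh(cov_s)

    # Whitening: remove the correlations present in the content features.
    whitened = ec @ np.diag(dc ** -0.5) @ ec.T @ fc
    # Coloring: impose the correlations of the style features.
    colored = es @ np.diag(ds ** 0.5) @ es.T @ whitened
    return colored + style_mean

rng = np.random.default_rng(0)
content = rng.normal(size=(3, 500))
style = rng.normal(size=(3, 400)) * np.array([[1.0], [2.0], [0.5]]) + 1.0
stylized = whiten_and_color(content, style)
print(np.allclose(np.cov(stylized), np.cov(style), atol=1e-3))  # True
```

After the transform, the stylized features keep the content features' spatial layout (one column per position) but carry the style features' channel statistics.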

Key Takeaways from NIPS 2017

The research papers covered in this and the previous post highlight that major technology organizations are at the forefront of machine learning and are actively exploring virtually all aspects of the field.

Deep learning was also a dominant trend. The conference focused on the current state of and recent advances in deep learning. A lot of talks and presentations were about industry-ready neural networks, suggesting a fast transition from research to industry.

Researchers are also focusing on language understanding, speech recognition, translation, visual processing, and prediction. Some of these techniques, particularly in image translation, rely on GANs.

For live content coverage, you can visit NIPS’ Facebook page.

