
Are Recurrent Neural Networks capable of warping time?

2 min read

‘Can recurrent neural networks warp time?’ is a paper authored by Corentin Tallec and Yann Ollivier, to be presented at ICLR 2018.

This paper explains that plain RNNs cannot account for warpings, leaky RNNs can account for uniform time scalings but not irregular warpings, and gated RNNs can adapt to irregular warpings.
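For reference, the three families of update rules being compared look roughly as follows (a sketch in standard notation; the paper's exact formulation may differ in details):

```latex
% Plain RNN: no mechanism for rescaling its own dynamics in time
h_{t+1} = \tanh(W_x x_t + W_h h_t + b)

% Leaky RNN: a fixed leak rate \alpha acts as a single, global time constant,
% so it can absorb a uniform time scaling but not an irregular warping
h_{t+1} = \alpha \tanh(W_x x_t + W_h h_t + b) + (1 - \alpha)\, h_t

% Gated RNN (LSTM/GRU-style): learnable, input-dependent gates f_t and i_t
% act as local time constants and can adapt to irregular warpings
h_{t+1} = f_t \odot h_t + i_t \odot \tanh(W_x x_t + W_h h_t + b)
```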

Relating the gating mechanism of LSTMs (and GRUs) to time invariance and warping

What problem is the paper trying to solve?

In this paper, the authors prove that learnable gates in a recurrent model formally provide quasi-invariance to general time transformations in the input data. Further, the authors try to recover part of the LSTM architecture from a simple axiomatic approach. This leads to a new way of initializing gate biases in LSTMs and GRUs. Experimentally, this new chrono initialization is shown to greatly improve learning of long-term dependencies, with minimal implementation effort.
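As a concrete illustration, the chrono initialization amounts to setting the forget-gate bias to log(u) with u drawn uniformly from [1, T_max − 1], where T_max is the longest time dependency expected in the data, and the input-gate bias to its negative. Below is a minimal sketch assuming PyTorch's nn.LSTM; the chrono_init helper and the choice of t_max are illustrative, not taken from the authors' code:

```python
import torch
import torch.nn as nn

def chrono_init(lstm: nn.LSTM, t_max: float) -> None:
    """Chrono initialization for a single-layer nn.LSTM (sketch).

    Forget-gate bias: b_f = log(u), u ~ Uniform(1, t_max - 1)
    Input-gate bias:  b_i = -b_f
    All other biases are zeroed. PyTorch stores gates in (i, f, g, o) order
    and splits the bias between bias_ih and bias_hh; here the whole chrono
    bias is placed in bias_ih and bias_hh is left at zero.
    """
    h = lstm.hidden_size
    for name, param in lstm.named_parameters():
        if "bias" in name:
            nn.init.zeros_(param)
    with torch.no_grad():
        b_f = torch.log(torch.empty(h).uniform_(1.0, t_max - 1.0))
        lstm.bias_ih_l0[h:2 * h] = b_f   # forget gate
        lstm.bias_ih_l0[0:h] = -b_f      # input gate

# Example: dependencies expected over roughly 784 steps (hypothetical setting)
lstm = nn.LSTM(input_size=1, hidden_size=128)
chrono_init(lstm, t_max=784)
```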

Paper summary

The authors derive the self-loop (feedback) gating mechanism of recurrent networks from first principles, via a postulate of invariance to time warpings. Gated connections turn out to regulate the local time constants in recurrent models. With this in mind, they introduce the chrono initialization, a principled way of initializing gate biases in LSTMs. Experimentally, chrono initialization is shown to bring notable benefits when facing long-term dependencies.
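The "local time constant" reading can be made concrete with a back-of-the-envelope argument (a hedged sketch consistent with the paper's reasoning, not a quote from it): if the forget gate stays near its bias value, memory decays geometrically at that rate, so the bias directly sets the characteristic forgetting time.

```latex
f \approx \sigma(b_f), \qquad
\text{memory after } t \text{ steps} \sim f^{\,t}, \qquad
\tau \approx \frac{1}{1 - f}
\;\Longrightarrow\;
b_f = \log(T - 1) \;\text{ gives }\; f = \frac{T - 1}{T},\ \ \tau \approx T .
```

This is exactly the time scale targeted by the chrono initialization sketched above.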

Key takeaways

  • In this paper, the authors show that postulating invariance to time transformations in the data (taking invariance to time warping as an axiom) necessarily leads to a gate-like mechanism in recurrent models.
  • The paper provides precise prescriptions on how to initialize gate biases depending on the range of time dependencies to be captured.
  • The authors test the empirical benefits of the new initialization on both synthetic and real-world data.
  • The authors also observed a substantial improvement with long-term dependencies, and slight gains or no change when short-term dependencies dominate.

Reviewer comments summary

Overall Score: 25/30

Average Score: 8

According to a reviewer, the core insight of the paper is the link between recurrent network design and its effect on how the network reacts to time transformations; the reviewer finds this insight simple, elegant, and valuable. A minor complaint is that the paper has an unnecessarily large number of paragraph breaks, which makes reading slightly jarring.

Read Next:

Recurrent neural networks and the LSTM architecture

Build generative chatbot using recurrent neural networks (LSTM RNNs)

How to recognize Patterns with Neural Networks in Java

 

Savia Lobo

A Data science fanatic. Loves to be updated with the tech happenings around the globe. Loves singing and composing songs. Believes in putting the art in smart.
