
Recurrent neural networks and the LSTM architecture


A recurrent neural network (RNN) is a class of artificial neural networks in which the connections between nodes form a directed graph containing cycles, so the output of a node at one step can feed back into the network at the next. These nodes can be classified as input, hidden, or output. Input nodes receive data from outside the network, hidden nodes transform that data while carrying state from one step to the next, and output nodes provide the intended results. RNNs are well known for their extensive use in NLP tasks.

The video tutorial above has been taken from Natural Language Processing with Python.

Why are recurrent neural networks well suited for NLP?

What makes RNNs so popular and effective for natural language processing tasks is that they operate sequentially, processing one element of the input at a time. For example, a movie review is a sequence of words and characters of arbitrary length, which the RNN can take as its input. The hidden and output layers are likewise capable of working with sequences.
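To make that concrete, here is a minimal NumPy sketch of a recurrent step applied across a sequence. The weight names, sizes, and dummy data are illustrative assumptions rather than anything taken from the tutorial: the hidden nodes combine the current input with the previous hidden state, and the output nodes read from the result.

```python
import numpy as np

# Minimal sketch of a vanilla RNN step (illustrative sizes and random weights).
rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 8, 16, 2

W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input -> hidden
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden -> hidden (the recurrent loop)
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1  # hidden -> output
b_h, b_y = np.zeros(hidden_size), np.zeros(output_size)

def rnn_step(x_t, h_prev):
    """Update the hidden state from one input element, then read an output."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# The same cell is applied at every position, carrying the hidden state forward.
h = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):   # five dummy time steps
    h, y = rnn_step(x_t, h)
```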

In a basic sentiment analysis example, you might just have a binary output, such as classifying movie reviews as positive or negative. RNNs can do more than this: they can also produce a sequential output, such as taking an input sentence in English and generating its Spanish translation. This ability to consume and produce sequences is what makes recurrent neural networks so well suited to NLP tasks.
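As an illustration of the binary case, here is a hedged sketch of a sentiment classifier, assuming PyTorch and made-up names and sizes (the tutorial itself may use different tools): an embedding layer turns token ids into vectors, an RNN reads the review one token at a time, and the final hidden state is squashed into a single "positive review" probability.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: vocabulary size, dimensions, and names are assumptions.
class SentimentRNN(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.classify = nn.Linear(hidden_dim, 1)      # one score: positive vs. negative

    def forward(self, token_ids):                     # token_ids: (batch, seq_len)
        embedded = self.embed(token_ids)              # (batch, seq_len, embed_dim)
        _, h_n = self.rnn(embedded)                   # final hidden state: (1, batch, hidden_dim)
        return torch.sigmoid(self.classify(h_n[-1]))  # (batch, 1) probability of "positive"

model = SentimentRNN()
dummy_reviews = torch.randint(0, 10_000, (2, 20))     # two dummy 20-token reviews
print(model(dummy_reviews))                           # untrained, so outputs are arbitrary
```

For a sequence-to-sequence task such as translation, the model would instead emit an output at every step, or hand its encoded state to a decoder RNN, rather than producing a single label.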

RNNs and long short-term memory

Recurrent neural networks can become unstable during training: as errors are propagated back through many time steps, the gradients flowing along the recurrent connections tend to vanish or explode. That’s where the LSTM architecture helps. LSTM introduces something called a memory cell. The memory cell keeps what could be an incredibly complex update manageable by using a series of gates to govern the way its state changes within the network (a short code sketch of these gates follows the list below):

  • The input gate manages how much new information is written into the memory cell
  • The output gate manages how much of the memory cell is exposed to the rest of the network
  • A self-recurrent connection keeps the memory cell in a consistent state between steps
  • The forget gate allows the memory cell to ‘forget’ its previous state when it is no longer useful
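Taken together, these gates update the cell at every time step. Here is a minimal NumPy sketch of a single LSTM step; the weight shapes, initialisation, and variable names are illustrative assumptions rather than the exact formulation used in the video tutorial.

```python
import numpy as np

# Illustrative sketch of one LSTM step (assumed sizes and random weights).
rng = np.random.default_rng(0)
input_size, hidden_size = 8, 16

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, each acting on [previous hidden state, current input].
W_i, W_f, W_o, W_c = (rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
                      for _ in range(4))
b = np.zeros(hidden_size)   # zero biases, shared here purely to keep the sketch short

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W_i @ z + b)         # input gate: how much new information is written
    f = sigmoid(W_f @ z + b)         # forget gate: how much of the old cell state survives
    o = sigmoid(W_o @ z + b)         # output gate: how much of the cell is exposed
    c_tilde = np.tanh(W_c @ z + b)   # candidate values to write into the cell
    c_t = f * c_prev + i * c_tilde   # self-recurrent memory cell update
    h_t = o * np.tanh(c_t)           # new hidden state read out through the output gate
    return h_t, c_t

# Carry both the hidden state and the cell state across time steps.
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):   # five dummy time steps
    h, c = lstm_step(x_t, h, c)
```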


Richard Gall

Co-editor of the Packt Hub. Interested in politics, tech culture, and how software and business are changing each other.
