As humans, we always tend to consider the previous states of a task while doing something. For example, as you read this article, you take in each word based on your understanding of the words you just encountered in the sentence. In a sense, you are predicting what you will read next based on what you are encountering now. Imagine you want a network to read the lyrics of a song. This task cannot be accomplished by a traditional artificial neural network, because it requires persistence of information. This is where Recurrent Neural Networks come into play. In the following article, I'm going to give readers an elaborate idea of what an RNN is and how it actually works, based on what I learnt in a subject module at university.

What is an RNN?

An RNN is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. RNNs should not be confused with recursive neural networks.

The Recurrent Neural Network (RNN) is a neural network model proposed in the 80s for sequence modelling, especially time series. When the state of a variable is defined by the previous state of that same variable, the data is called sequence data. All time series are sequential in this sense.

The structure of the network is similar to a feed-forward neural network, with the distinction that it allows a recurrent hidden state whose activation at each time step depends on that of the previous time step (a cycle).

The Architecture of RNN (Vanilla Architecture)

An RNN has loops, allowing information to persist. As in the above diagram, an instance of the network looks at some input x(t) and outputs a value h(t). A loop allows information to be passed from one step of the network to the next. A recurrent neural network can thus be thought of as multiple copies of the same network, each passing a message to its successor.

Since RNNs are supervised networks, we need a teacher signal in order to train them. We call the labelled data the teacher signal because it is a sequence of labelled objects over the time steps.

RNN Formula

The recurrence is h(t) = f(h(t-1), x(t); θ). It basically says the current hidden state h(t) is a function f of the previous hidden state h(t-1) and the current input x(t), where θ denotes the parameters of f. The network typically learns to use h(t) as a kind of lossy summary of the task-relevant aspects of the past sequence of inputs up to time t.
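To make this concrete, here is a minimal sketch of one such step in NumPy. The weight names (W_hh, W_xh, b_h), the sizes, and the tanh activation are my own illustrative choices, not something fixed by the formula itself:

```python
import numpy as np

# A minimal sketch of one vanilla RNN step, h(t) = f(h(t-1), x(t); theta).
# The parameter names and tanh nonlinearity are illustrative assumptions.

hidden_size, input_size = 64, 32
rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.01, size=(hidden_size, hidden_size))  # hidden-to-hidden
W_xh = rng.normal(scale=0.01, size=(hidden_size, input_size))   # input-to-hidden
b_h = np.zeros(hidden_size)                                     # hidden bias

def rnn_step(h_prev, x_t):
    """Compute h(t) from h(t-1) and x(t)."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

# Unrolling over a sequence: the same parameters theta are reused at
# every step, which is exactly the weight sharing that makes the
# network "recurrent".
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(10, input_size)):  # a toy 10-step input sequence
    h = rnn_step(h, x_t)
```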

Loss function

The total loss for a given sequence of x values paired with a sequence of y values is just the sum of the losses over all the time steps. For example, if L(t) is the negative log-likelihood of y(t) given x(1), . . . , x(t), then summing the L(t) over t gives the loss for the whole sequence.
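As a quick sketch, this is what that summation looks like in NumPy. The probabilities here are random toy data standing in for the softmax outputs of an RNN:

```python
import numpy as np

# Sketch: total sequence loss as the sum of per-step negative
# log-likelihoods L(t) = -log p(y(t) | x(1), ..., x(t)). In a real RNN,
# `probs` would come from a softmax over the network's outputs.

rng = np.random.default_rng(1)
T, vocab_size = 10, 5
logits = rng.normal(size=(T, vocab_size))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax per step
targets = rng.integers(vocab_size, size=T)  # the true y(t) at each step

per_step_nll = -np.log(probs[np.arange(T), targets])  # L(t) for each t
sequence_loss = per_step_nll.sum()                    # total loss = sum of L(t)
```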

How to train an RNN using the teacher signal

  • Initialize the weights randomly
  • Give the model a char pair (input char & target char; the target char is the char the network should guess, i.e. the next char in our sequence)
  • Forward pass (we calculate the probability of every possible next char according to the state of the model, using the weights)
  • Measure the error (the distance between the predicted probability distribution and the target char)
  • Calculate the gradient for each of the weights to see the impact it has on the loss
  • Update all weights in the direction given by the gradients so as to minimize the loss. Because the same weights are shared by every time step, the per-step gradients are accumulated (e.g. by averaging over all the states) into a single update
  • This way of updating the weights is known as Backpropagation Through Time (BPTT)
  • This procedure is repeated until the error is sufficiently small (a minimal sketch of the loop follows this list)
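Here is a minimal, hedged sketch of that training loop for a character-level RNN, loosely in the style of Karpathy's min-char-rnn. All names, sizes, and hyperparameters are illustrative assumptions:

```python
import numpy as np

data = "some lyrics to learn"              # toy training text
chars = sorted(set(data))
char_to_ix = {c: i for i, c in enumerate(chars)}
V, H = len(chars), 32                       # vocab size and hidden size

rng = np.random.default_rng(0)
Wxh = rng.normal(scale=0.01, size=(H, V))   # input-to-hidden
Whh = rng.normal(scale=0.01, size=(H, H))   # hidden-to-hidden
Why = rng.normal(scale=0.01, size=(V, H))   # hidden-to-output
bh, by = np.zeros((H, 1)), np.zeros((V, 1))

def train_step(inputs, targets, h, lr=0.1):
    """Forward over the sequence, then backpropagate through time."""
    xs, hs, ps = {}, {-1: h}, {}
    loss = 0.0
    # Forward pass: probability of every possible next char at each step.
    for t, ix in enumerate(inputs):
        xs[t] = np.zeros((V, 1)); xs[t][ix] = 1           # one-hot input char
        hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t-1] + bh)
        y = Why @ hs[t] + by
        ps[t] = np.exp(y) / np.exp(y).sum()               # softmax
        loss += -np.log(ps[t][targets[t], 0])             # NLL of target char
    # Backward pass (BPTT): accumulate gradients across all time steps,
    # because the same weights are shared at every step.
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dh_next = np.zeros((H, 1))
    for t in reversed(range(len(inputs))):
        dy = ps[t].copy(); dy[targets[t]] -= 1            # d(loss)/d(logits)
        dWhy += dy @ hs[t].T; dby += dy
        dh = Why.T @ dy + dh_next
        draw = (1 - hs[t] ** 2) * dh                      # back through tanh
        dWxh += draw @ xs[t].T; dWhh += draw @ hs[t-1].T; dbh += draw
        dh_next = Whh.T @ draw
    # Gradient-descent update on the shared weights.
    for W, dW in [(Wxh, dWxh), (Whh, dWhh), (Why, dWhy), (bh, dbh), (by, dby)]:
        W -= lr * np.clip(dW, -5, 5)                      # clip to tame exploding gradients
    return loss, hs[len(inputs) - 1]

# Repeat over (input char, target char) pairs until the loss is small.
h = np.zeros((H, 1))
ixs = [char_to_ix[c] for c in data]
for epoch in range(100):
    loss, h = train_step(ixs[:-1], ixs[1:], h)
```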

Applications of RNN

  • Machine Translation
  • Robot control
  • Time series prediction
  • Speech recognition
  • Speech synthesis
  • Time series anomaly detection
  • Rhythm learning
  • Music composition
  • Grammar learning
  • Handwriting recognition
  • Human action recognition
  • Protein Homology Detection
  • Predicting sub-cellular localization of proteins
  • Several prediction tasks in the area of business process management
  • Prediction in medical care pathways

Limitations of RNN

  • Vanishing gradient problem (illustrated in the sketch after this list)
  • The API is too constrained: they accept a fixed-sized vector as input (e.g. an image) and produce a fixed-sized vector as output (e.g. probabilities of different classes)
  • They perform the mapping using a fixed number of computational steps (e.g. the number of layers in the model)
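To illustrate the first limitation, here is a small numerical sketch of how a gradient shrinks as it is pushed back through many time steps. The weight scale and the fixed hidden state are simplifying assumptions of mine, but the geometric decay is the point:

```python
import numpy as np

# The gradient flowing back through k time steps is repeatedly scaled by
# roughly W_hh^T * diag(1 - h^2); when that map is contractive, the
# gradient's norm shrinks geometrically, i.e. it vanishes.

rng = np.random.default_rng(2)
H = 32
W_hh = rng.normal(scale=0.5 / np.sqrt(H), size=(H, H))  # smallish recurrent weights

grad = np.ones(H)                  # pretend gradient arriving at the last step
h = np.tanh(rng.normal(size=H))    # a fixed hidden state, for simplicity
for k in range(1, 51):
    grad = W_hh.T @ ((1 - h ** 2) * grad)   # one step of backprop through tanh
    if k % 10 == 0:
        print(f"after {k} steps, gradient norm = {np.linalg.norm(grad):.2e}")
```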
