Deep Learning – An Annotated Bibliography

RCL lab director Assoc/Prof David Kearney has made the following information available in preparation for the deep learning reading group to be held on Wednesday 15th January 2014 at DSTO Edinburgh.


Deep Learning – An annotated bibliography by David Kearney

This is a work in progress; please send feedback to David dot Kearney at unisa dot edu dot au.


This bibliography focuses on material that provides conceptual understanding without assuming knowledge of advanced probability and professional-level mathematics, which not everyone who wants to understand deep learning will have. I also emphasise simple software tools and video presentations that are easily accessible to those without specialist mathematical backgrounds. This is not to underrate mathematical treatments, but rather to recognise that the mathematical treatments in the literature are often completely inaccessible without years of study.

What’s in a name?

There are a number of words that have entered the jargon of deep learning. These include:

Deep belief networks, HMAX, deep architectures, SIFT, hierarchical models, deep networks, structural SVMs, convolutional networks, Hierarchical Temporal Memory, hierarchical sparse coding.


The Coursera course from Geoff Hinton and his group is highly recommended. Although the course has finished, you can still enrol and watch the videos. Depending on your current knowledge of neural networks, you could skip the early lectures and start in the middle with Hopfield nets, which are more relevant to deep learning.

If you want to hear from the experts in the field all in one place, then you should watch the presentations from the UCLA Institute for Pure and Applied Mathematics (IPAM) Graduate Summer School: Deep Learning, Feature Learning, July 9–27, 2012.

Andrew Ng from Stanford has a good introductory lecture here.

I also found this tutorial helpful:

Many other videos are listed on the deep learning web site:


No dedicated textbook on deep learning has appeared yet. A highly rated textbook on probabilistic machine learning devotes only its last chapter to deep learning:

Historical Perspective

Geoff Hinton has provided a historical introduction to deep learning which contains good conceptual insights and almost no mathematics.

[To Recognize Shapes, First Learn to Generate Images - Geoffrey Hinton]

Specific topics

Hopfield nets

The Hebbian learning rule (“fire together, wire together”) used in Hopfield nets is explained well in these slides:
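To make the rule concrete, here is a minimal plain-Python sketch, assuming the classic formulation: units take values +1/-1, the weights are w_ij = (1/P) Σ_p x_i x_j summed over the P stored patterns, there are no self-connections, and recall updates one unit at a time to the sign of its weighted input. The function names `train_hebbian` and `recall` and the example patterns are hypothetical, not taken from the slides.

```python
import random


def train_hebbian(patterns):
    """Store +1/-1 patterns with the Hebbian rule w_ij = (1/P) * sum_p x_i * x_j.

    Units that are active together in a pattern get a positive weight
    ("fire together, wire together"); self-connections w_ii stay at zero.
    """
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w


def recall(w, state, sweeps=5, seed=0):
    """Asynchronous recall: each unit in turn takes the sign of its weighted input."""
    rng = random.Random(seed)
    s = list(state)
    n = len(s)
    for _ in range(sweeps):
        for i in rng.sample(range(n), n):  # visit the units in a random order
            h = sum(w[i][j] * s[j] for j in range(n))
            s[i] = 1 if h >= 0 else -1
    return s


stored = [[1, 1, 1, 1, -1, -1, -1, -1],
          [1, -1, 1, -1, 1, -1, 1, -1]]
w = train_hebbian(stored)
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # first stored pattern with unit 0 flipped
print(recall(w, noisy) == stored[0])  # → True
```

The two stored patterns here are orthogonal, so the corrupted unit is corrected within the first sweep; storing many correlated patterns in a small net tends to produce spurious recalls instead.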

Restricted Boltzmann Machines (RBMs)

There is a good explanation of the key algorithms in probabilistic machine learning, including the best conceptual description of Gibbs sampling that I have seen so far:
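For an RBM, Gibbs sampling is especially simple: units within one layer are conditionally independent given the other layer, so a whole layer can be sampled at once and the sampler just alternates between layers. The sketch below assumes binary units and the standard conditionals p(h_j = 1 | v) = σ(b_j + Σ_i v_i W_ij) and p(v_i = 1 | h) = σ(a_i + Σ_j W_ij h_j); the function names and the hand-set parameters in the usage lines are hypothetical, chosen only for illustration rather than learned.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, b, rng):
    """Sample all hidden units at once: p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W[i][j])."""
    probs = [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs], probs


def sample_visible(h, W, a, rng):
    """Sample all visible units at once: p(v_i = 1 | h) = sigmoid(a_i + sum_j W[i][j] * h_j)."""
    probs = [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(a))]
    return [1 if rng.random() < p else 0 for p in probs], probs


def gibbs_samples(v0, W, a, b, steps, seed=0):
    """Block Gibbs sampling: alternate h ~ p(h | v) and v ~ p(v | h), recording each v."""
    rng = random.Random(seed)
    v, samples = list(v0), []
    for _ in range(steps):
        h, _ = sample_hidden(v, W, b, rng)
        v, _ = sample_visible(h, W, a, rng)
        samples.append(v)
    return samples


# Hypothetical hand-set parameters: 4 visible units, 1 hidden unit. With h = 1 the
# visibles favour [1, 1, 0, 0]; with h = 0 they favour [0, 0, 1, 1].
W = [[4.0], [4.0], [-4.0], [-4.0]]
a = [-2.0, -2.0, 2.0, 2.0]
b = [0.0]

samples = gibbs_samples([1, 0, 1, 0], W, a, b, steps=1000)
common = sum(s in ([1, 1, 0, 0], [0, 0, 1, 1]) for s in samples)
print(f"{common / len(samples):.0%} of samples match one of the two prototypes")
```

With these weights each visible unit agrees with the current hidden state's prototype with probability σ(2), so roughly σ(2)^4, or about 60%, of the samples should match a prototype exactly.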

Convolutional deep belief networks

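Convolution is built into the restricted Boltzmann machine itself: each hidden feature's weights are shared across all positions in the image, so a feature is detected wherever it appears, and probabilistic max-pooling over neighbouring positions makes the representation approximately shift invariant. Unfortunately, I have yet to find a good conceptual description of these types of deep learning networks. You can read the original paper by Andrew Ng’s team: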

[Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations - Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng]
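The effect of weight sharing and pooling can be seen in a few lines. This plain-Python sketch applies one shared 2x2 kernel at every position of a small image (a valid cross-correlation) and then max-pools the result. It is a simplification, not the paper's model: the paper pools hidden-unit probabilities with probabilistic max-pooling, whereas here plain max-pooling is applied to raw filter responses. The image, kernel and function names are hypothetical.

```python
def feature_map(image, kernel, bias=0.0):
    """Valid 2-D cross-correlation: one shared kernel is applied at every image position."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(image), len(image[0])
    return [[bias + sum(kernel[u][v] * image[r + u][c + v]
                        for u in range(kh) for v in range(kw))
             for c in range(cols - kw + 1)]
            for r in range(rows - kh + 1)]


def max_pool(fmap, size=2):
    """Plain non-overlapping max-pooling over size x size blocks."""
    return [[max(fmap[r + u][c + v] for u in range(size) for v in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


# Hypothetical 5x5 image containing a short diagonal stroke, and a diagonal detector.
image = [[0] * 5 for _ in range(5)]
image[0][0] = image[1][1] = 1
kernel = [[1, 0],
          [0, 1]]

shifted = [[0] + row[:-1] for row in image]  # the same stroke, one pixel to the right
fa, fb = feature_map(image, kernel), feature_map(shifted, kernel)

# Shared weights: shifting the input shifts the feature map by the same amount.
print(all(fb[r][c + 1] == fa[r][c]
          for r in range(len(fa)) for c in range(len(fa[0]) - 1)))  # → True
# Pooling: the small shift stays inside one 2x2 block, so that pooled unit's response is unchanged.
print(max_pool(fa)[0][0] == max_pool(fb)[0][0])  # → True
```

Weight sharing alone gives shift equivariance (the response moves with the input); it is the pooling step that turns small shifts into approximate invariance.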

Training a deep belief network

Stacking several RBMs on top of one another and using the output of one layer as the input to the next is the main way that deep belief networks gain their classification power.

“A more general solution (for training) was proposed by Hinton and collaborators (Hinton et al., 2006). They showed that a deep network can be trained in two steps. First, each layer is trained in sequence by using an unsupervised algorithm to model the distribution of the input. Once a layer has been trained, it is used to produce the input to train the layer above. After all layers have been trained in an unsupervised way, the whole network is trained by traditional back-propagation of the error (e.g., classification error), but the parameters are initialized using the weights learned in the first phase. Since the parameters are nicely initialized, the optimization of the whole system can be carried out successfully.”

[Learning Feature Hierarchies for Object Recognition - Koray Kavukcuoglu  (dissertation)]

[Unsupervised Learning of Feature Hierarchies - Marc’Aurelio Ranzato (dissertation)]
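The first, unsupervised phase of the two-step scheme in the quote can be sketched as follows. This is a minimal plain-Python illustration, assuming binary RBMs trained with one-step contrastive divergence (CD-1, Hinton's usual approximation for RBM training) and the common practice of passing each layer's hidden probabilities to the next layer as its input. The class and function names, layer sizes, learning rate and toy data are all hypothetical, and the second phase (fine-tuning the whole network by back-propagation) is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.a = [0.0] * n_visible  # visible biases
        self.b = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        return [sigmoid(self.a[i] + sum(self.W[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1_update(self, v0, lr):
        ph0 = self.hidden_probs(v0)                            # positive phase
        h0 = [1 if self.rng.random() < p else 0 for p in ph0]  # sample hidden states
        v1 = self.visible_probs(h0)                            # reconstruction (probabilities)
        ph1 = self.hidden_probs(v1)                            # negative phase
        for i in range(len(self.a)):
            for j in range(len(self.b)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.b)):
            self.b[j] += lr * (ph0[j] - ph1[j])


def pretrain_stack(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pretraining: each trained RBM produces the input for the next."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        layers.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]  # output of this layer
    return layers


# Hypothetical toy data: two 6-bit patterns, repeated.
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10
stack = pretrain_stack(data, layer_sizes=[4, 2])

top = data[0]
for rbm in stack:
    top = rbm.hidden_probs(top)
print([round(p, 2) for p in top])
```

Each RBM only ever sees the representation produced by the layer below it, which is what makes the procedure greedy; the learned weights would then initialise a feed-forward network for the supervised back-propagation phase.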

Introducing temporal information into deep learning

Again, it is hard to find a good, easy-to-grasp conceptual explanation of how temporal information is included in an RBM, and thus in a deep belief network. The best available seems to be:

[Modeling Human Motion Using Binary Latent Variables (Advances in Neural Information Processing Systems) - Graham W. Taylor, Geoffrey E. Hinton, Sam T. Roweis]

Conditional Restricted Boltzmann Machines

See Taylor, Hinton and Roweis NIPS 2006, JMLR 2011:

There is a set of slides and a presentation from the IPAM grad course:

You can watch the video at:
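In the conditional RBM, the previous few frames are not extra visible units to be modelled; they feed directed connections that shift the biases of the current visible and hidden units, so the same current frame can be interpreted differently depending on recent history. The sketch below illustrates only that inference step, assuming real-valued (unit-variance Gaussian) visible units, binary hidden units, and past frames flattened into one vector with the most recent frame first. The function names, sizes and all weights are hypothetical, and training by contrastive divergence is omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def concat_history(frames, order):
    """Flatten the `order` most recent frames (most recent first) into one vector."""
    past = []
    for frame in reversed(frames[-order:]):
        past.extend(frame)
    return past


def dynamic_bias(static, past, M):
    """Static bias plus a linear contribution from the past frames (M is len(past) x len(static))."""
    return [static[k] + sum(past[m] * M[m][k] for m in range(len(past)))
            for k in range(len(static))]


def crbm_hidden_probs(v_t, past, W, B, b):
    """p(h_j = 1 | v_t, past): like an RBM, but with history-dependent hidden biases."""
    b_hat = dynamic_bias(b, past, B)
    return [sigmoid(b_hat[j] + sum(v_t[i] * W[i][j] for i in range(len(v_t))))
            for j in range(len(b))]


def crbm_visible_mean(h, past, W, A, a):
    """Mean of unit-variance Gaussian visibles: autoregressive dynamic bias plus top-down input."""
    a_hat = dynamic_bias(a, past, A)
    return [a_hat[i] + sum(W[i][j] * h[j] for j in range(len(h)))
            for i in range(len(a))]


# Hypothetical toy sizes: 2-D frames, 1 hidden unit, model order 2 (past length 4).
W = [[1.0], [1.0]]
B = [[2.0], [0.0], [0.0], [0.0]]  # only the first coordinate of frame t-1 drives h
b = [0.0]
v_t = [0.5, -0.5]                  # identical current frame in both cases

rising = concat_history([[0.0, 0.0], [1.0, 0.0]], order=2)
falling = concat_history([[0.0, 0.0], [-1.0, 0.0]], order=2)
print([round(p, 2) for p in crbm_hidden_probs(v_t, rising, W, B, b)])   # → [0.88]
print([round(p, 2) for p in crbm_hidden_probs(v_t, falling, W, B, b)])  # → [0.12]
```

The current frame is identical in both calls; only the history differs, and it alone moves the hidden unit from likely on to likely off.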


Simple software examples

Hopfield nets

There is a Java-based simulation of a Hopfield net that illustrates its ability to store patterns and to recover them from input patterns contaminated by noise:

Restricted Boltzmann Machines

A simple Python example of a restricted Boltzmann machine learning movie preferences is provided here:

Complete software examples

There is a lot of software available, but each package often requires complex installation and support. The following two have been tried out (on a Macintosh running OS X 10.9) and found either to be relatively straightforward to install or to have adequate installation instructions.

This tracking example requires MATLAB and uses numerous open-source packages, which are supplied with the code. It is interesting because it shows deep learning applied to a tracking problem. However, understanding the tracking example requires extra knowledge that this bibliography does not attempt to cover.

The deep learning site has a complete example of a stacked set of restricted Boltzmann machines in Python (also known as a Deep Belief Net, or DBN):

Training this in a reasonable time requires configuring your GPU to work with the software, which is not covered well in the documentation. The example also requires the Theano Python expression compiler.
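As one hedged illustration of that configuration step: Theano reads its settings from the `THEANO_FLAGS` environment variable (or a `.theanorc` file) when it is first imported. The sketch below assumes a Theano release of that era, where `device=gpu` selected the old CUDA backend; later releases replaced this with `device=cuda`, so check the documentation for the version you actually install.

```python
import os

# Theano reads THEANO_FLAGS once, on first import, so it must be set beforehand.
# "device=gpu" selects the old CUDA backend (Theano 0.6/0.7 era); later releases
# use "device=cuda" instead. The old GPU backend only supported float32.
os.environ.setdefault("THEANO_FLAGS", "mode=FAST_RUN,device=gpu,floatX=float32")

try:
    import theano
    print("Theano device:", theano.config.device, "| floatX:", theano.config.floatX)
except Exception as exc:  # ImportError if Theano is missing; newer releases reject device=gpu
    print("Could not load Theano with THEANO_FLAGS =", os.environ["THEANO_FLAGS"], "->", exc)
```

The same flags can equally be given on the command line, in front of the `python` command that launches the DBN script.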