An Introduction to Convolutional Neural Networks
Background
The Early Neural Network Models

Computational models of neural networks have been around for more than half a century, beginning with the simple model that McCulloch and Pitts developed in 1943 [1]. Hebb subsequently contributed a learning algorithm to train such models [2], summed up by the familiar refrain: 'Neurons that fire together, wire together.' Hebb's rule, and a popular variant known as the Delta rule, were crucial for early models of cognition, but they quickly ran into trouble with respect to their computational power. In their extremely influential book, Perceptrons, Minsky and Papert proved that these networks couldn't even learn the boolean XOR function, because they could only learn a single layer of weights [3]. Luckily, a more complex learning algorithm, backpropagation, eventually emerged; it could learn across arbitrarily many layers, and multilayer networks were later proven capable of approximating any continuous function.
How a Feed-Forward Neural Network Works
The backpropagation algorithm is defined over a multilayer feed-forward neural network, or FFNN. An FFNN can be thought of in terms of neural activation and the strength of the connections between each pair of neurons. Because we are only concerned with feed-forward networks, the pools of neurons are connected in some directed, acyclic way, so that the network's activation has a clear starting and stopping place (i.e. an input pool and an output pool). The pools between these two extremes are known as hidden pools.
The flow of activation in these networks is specified through a weighted summation process. Each neuron sends its current activation to every unit it is connected to; that activation is multiplied by the weight of the connection to the receiving neuron, and the receiving neuron's weighted sum is passed through some squashing function, typically a sigmoid, to introduce nonlinearities (if this were a purely linear process, additional layers wouldn't matter, since adding two linear combinations together produces another linear combination). Since we typically assume each layer to be completely connected to the next, these calculations can be done by multiplying the vector of activations by the weight matrix and then passing the results through the squashing function.
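The matrix form of this weighted-summation-plus-squashing step can be sketched in a few lines of NumPy. The layer sizes and random weights below are illustrative, not from the text:

```python
import numpy as np

def sigmoid(z):
    # Squashing function: maps any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forward_layer(activations, weights):
    # One feed-forward step: multiply the activation vector by the
    # weight matrix, then squash. `weights` has shape (n_out, n_in).
    return sigmoid(weights @ activations)

# Illustrative sizes: a 3-unit input pool feeding a 2-unit hidden pool.
rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 0.25])
W = rng.normal(size=(2, 3))
hidden = forward_layer(x, W)
```

Because the sigmoid maps every weighted sum into (0, 1), the resulting `hidden` vector is a valid activation vector for the next layer in the chain.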
Learning in these networks occurs through changing the weights so as to minimize some error function, typically specified as the difference between the output pool's activation vector and the desired activation vector. Normally this is accomplished incrementally via the previously mentioned backpropagation algorithm, in which the partial derivatives of the error with respect to the last layer of weights are calculated (and generally scaled down) and used to update those weights. The partial derivatives can then be calculated for the second-to-last weight layer, and so on, with the process repeating recursively until the weight layer connected to the input pool is updated. For more information about how these derivatives are calculated, visit the PDP Handbook.
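For the last weight layer the update described above reduces to the chain rule applied through the sigmoid. A minimal sketch, assuming a single weight layer, a squared-error function, and made-up input/target values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, target, W, lr=0.1):
    # Forward pass through one weight layer.
    y = sigmoid(W @ x)
    # Derivative of the squared error, chained through the sigmoid:
    # dE/dz = (y - target) * y * (1 - y).
    delta = (y - target) * y * (1.0 - y)
    # The gradient with respect to W is the outer product of delta and
    # the input activations; step downhill, scaled down by lr.
    W -= lr * np.outer(delta, x)
    return W

# Illustrative data: one input pattern and its desired output.
rng = np.random.default_rng(1)
x = np.array([1.0, 0.0, 1.0])
t = np.array([1.0, 0.0])
W = rng.normal(size=(2, 3))

err_before = np.sum((sigmoid(W @ x) - t) ** 2)
for _ in range(100):
    W = backprop_step(x, t, W)
err_after = np.sum((sigmoid(W @ x) - t) ** 2)
```

In a multilayer network the same `delta` quantity is propagated backward through each weight matrix in turn, which is where the algorithm gets its name.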
Problems with Backpropagation

Despite being a universal function approximator in theory, FFNNs weren't good at dealing with many sorts of problems in practice. The example relevant to the current discussion is the FFNN's poor ability to recognize objects presented visually. Since every unit in a pool was connected to every unit in the next pool, the number of weights grew very rapidly with the dimensionality of the input, which led to slow learning in the typically high-dimensional domain of vision. Even more disconcerting was the spatial ignorance of the FFNN. Since every pair of neurons between two pools had its own weight, learning to recognize an object in one location wouldn't transfer to the same object presented in a different part of the visual field; separate weights would be involved in that calculation. What was needed was an architecture that exploited the two-dimensional spatial constraints imposed by its input modality whilst reducing the number of parameters involved in training. Convolutional neural networks are that architecture.
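The parameter blow-up can be made concrete with some back-of-the-envelope arithmetic. The image and filter sizes below are hypothetical, chosen only to illustrate the scale of the difference between a fully connected layer and a single shared convolutional filter:

```python
# Hypothetical input: a 32x32 grayscale image.
height, width = 32, 32
n_inputs = height * width              # 1024 input units

# Fully connected: every input unit connects to every unit in a
# same-sized next pool, so the weight count is quadratic in the input.
dense_weights = n_inputs * n_inputs    # over a million weights

# Convolutional: one 5x5 filter is shared across every spatial
# position, so the same 25 weights recognize a feature anywhere
# in the visual field.
kernel = 5
conv_weights = kernel * kernel         # 25 weights
```

The weight sharing is what buys both savings: fewer parameters to train, and automatic transfer of what is learned at one image location to every other location.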
Convolutional Neural Networks
LeCun's formulation
Serre's H-max Pools
Open Questions
References
- [1] McCulloch, W. & Pitts, W. (1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity". Bulletin of Mathematical Biophysics 5 (4): 115–133.
- [2] Hebb, D. (1949). The Organization of Behavior. New York: Wiley.
- [3] Minsky, M. & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press.