Neural Networks in Neuroscience and Computer Science
Revision as of 10:25, 6 June 2013

This wiki explores some of the applications and models of neural networks in both biology and neuroscience and in artificial intelligence and computer science. Modeling how the brain sends signals through neural networks has led to many breakthroughs in the field of learning.
Introduction
A neural network is a network of neurons working together, sending a flow of signals to accomplish some task. The original biological neural networks consist of neurons that interact with their neighbors through axon terminals connected via synapses to the dendrites of other neurons. A neural circuit is a functional entity of interconnected neurons that regulates its own activity using a feedback loop. Artificial intelligence in the field of computer science adopted this information-processing paradigm to create artificial neural networks, which have been applied successfully to speech recognition, image analysis, and recognition tasks. Much of the research in Professor Andrew Ng's lab is geared towards applying neural networks to unsupervised learning tasks. [1]
Neural Networks in Neuroscience
History

Neural networks were first described and modeled in the late 1800s by biologists and psychologists including Herbert Spencer, Theodor Meynert, William James, and Sigmund Freud. The first rule of neuronal learning, Hebbian learning, was described by Hebb[2] in 1949: "the persistence or repetition of a reverberatory activity (or "trace") tends to induce lasting cellular changes that add to its stability...when an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." The rule attempts to explain "associative learning," in which simultaneous activation of cells leads to pronounced increases in synaptic strength between those cells.
Neuron connections
Neuron connections consist mainly of chemical synapses and electrical gap junctions. One principle by which neurons operate is neural summation: potentials at the post-synaptic membrane sum in the cell body, and if they surpass a certain threshold, an action potential occurs and travels down the axon to the terminal endings to transmit a signal to other neurons. Neuroplasticity refers to changes in the brain's response characteristics caused by activity or experience, which underlies the idea of memory.
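As an illustrative sketch rather than a biophysical model, the summation-and-threshold behavior described above can be captured by a simple leaky integrate-and-fire neuron; the threshold and leak values here are hypothetical, not taken from any measured cell:

```python
def simulate_lif(inputs, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: post-synaptic potentials sum in the
    cell body with passive decay; when the membrane potential crosses the
    threshold, the neuron fires and the potential resets."""
    v = 0.0
    spikes = []
    for i in inputs:
        v = leak * v + i      # neural summation with leak
        if v >= threshold:
            spikes.append(1)  # action potential fired
            v = 0.0           # reset after firing
        else:
            spikes.append(0)
    return spikes

# three sub-threshold inputs eventually sum past the threshold
print(simulate_lif([0.5, 0.5, 0.5]))  # → [0, 0, 1]
```

No single 0.5 input crosses the threshold, but the accumulated potential does on the third input, which is the essence of summation.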
Along the axon, back-propagating action potentials cannot occur: after an action potential travels down a given segment of the axon, the sodium channels in that segment are inactivated, blocking the generation of an action potential traveling back towards the cell body. In some cells, however, back-propagation does occur through the dendritic arbor and has important effects on synaptic plasticity and computational applications.
Receptive fields
A receptive field is a small region within the entire visual field; the concept is central to convolutional neural networks. Any given neuron responds only to a subset of stimuli within its receptive field (its tuning). A neuron in V1 may fire to any vertical stimulus in its receptive field because of its simple tuning, but in higher visual areas such as the fusiform gyrus, a neuron may fire only when a certain face appears in the receptive field. Memories are very likely represented by patterns of activation among these neural networks. The study and modeling of these networks has attracted interest in many different fields, since it has the potential to explain various aspects of behavior, learning, and memory. The most important property of neural networks is the ability to learn complex patterns, which is heavily emphasized in other fields such as computer science.
Neural Networks in Computer Science (Artificial Intelligence)

Neural Network Models
In an artificial neural network (ANN), there are multiple layers of neurons. The first layer contains input neurons, which send data via synapses to a second layer of neurons, and so on through to the final layer of output neurons. More complex systems have more layers of neurons with different responsibilities. The synapses store parameters called "weights" that manipulate the data in the calculations.
An ANN typically has the following:
- Interconnection pattern between different layers of neurons
- Learning process for updating the weights of the interconnections
- Activation function that converts a neuron's weighted input to its output activation
A neuron's network function is defined as a composition of other functions: a weighted sum of its inputs that is then passed through a non-linear activation function such as the hyperbolic tangent or the sigmoid function, σ(z) = 1/(1 + e^(−z)).
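As a minimal sketch of this composition, the following Python/NumPy code computes a forward pass through a small feedforward network; the layer sizes and random weights are purely illustrative:

```python
import numpy as np

def sigmoid(z):
    # logistic activation: squashes the weighted input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass: each layer computes a weighted sum of the previous
    layer's activations and applies the sigmoid non-linearity."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)  # weighted sum, then activation
    return a

# tiny 2-3-1 network with hypothetical random weights
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [np.zeros(3), np.zeros(1)]
y = forward(np.array([0.5, -0.2]), weights, biases)
```

The output is a single activation in (0, 1), produced by composing the two layers' weighted sums and non-linearities.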

Networks such as these are commonly called feedforward networks because the graph is a directed acyclic graph.
Supervised Learning
Suppose we have a fixed training set {(x^(1), y^(1)), ..., (x^(m), y^(m))} of m training examples. We can train our neural network using batch gradient descent. In detail, for a single training example (x, y), we define the cost function with respect to that single example to be:

    J(W,b; x,y) = (1/2) ||h_{W,b}(x) − y||^2

This is a (one-half) squared-error cost function. Given a training set of m examples, we then define the overall cost function to be:

    J(W,b) = (1/m) Σ_{i=1}^{m} J(W,b; x^(i), y^(i)) + (λ/2) Σ_{l} Σ_{i} Σ_{j} (W_{ji}^{(l)})^2
The first term in the definition of J(W,b) is an average sum-of-squares error term. The second term is a regularization term (also called a weight decay term) that tends to decrease the magnitude of the weights and helps prevent overfitting.
The weight decay parameter λ controls the relative importance of the two terms. Note also the slightly overloaded notation: J(W,b; x,y) is the squared-error cost with respect to a single example, while J(W,b) is the overall cost function, which includes the weight decay term.
Our goal is to minimize J(W,b) as a function of W and b. To train our neural network, we initialize all the network parameters to small random values near zero and then apply an optimization algorithm such as batch gradient descent. Since J(W,b) is a non-convex function, gradient descent could converge to a local optimum; in practice, however, it works fairly well.
One iteration of gradient descent updates the parameters as follows:

    W_{ij}^{(l)} := W_{ij}^{(l)} − α ∂J(W,b)/∂W_{ij}^{(l)}
    b_{i}^{(l)} := b_{i}^{(l)} − α ∂J(W,b)/∂b_{i}^{(l)}

where α is the learning rate.
The key components are the gradients, which can be calculated using the back-propagation algorithm. The intuition behind back-propagation is as follows. Given a training example (x, y), we first run a "forward pass" to compute all the activations throughout the network, including the output value of the hypothesis h_{W,b}(x). Then, for each node i in layer l, we compute an "error term" δ_i^{(l)} that measures how much that node was "responsible" for any errors in the output. For an output node, we can directly measure the difference between the network's activation and the true target value and use that to define δ_i^{(n_l)} (where layer n_l is the output layer). For the hidden units, we compute δ_i^{(l)} based on a weighted average of the error terms of the nodes that use a_i^{(l)} as an input.
To train our neural network, we can now repeatedly take steps of gradient descent to reduce our cost function J(W,b).
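The forward pass, error terms, and gradient-descent update described above can be sketched in Python/NumPy for a single hidden layer, assuming sigmoid activations and the half squared-error cost; the variable names and hyper-parameters are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, alpha=0.1, lam=0.0):
    """One gradient-descent step on the half squared-error cost for a
    single example (x, y), using back-propagation."""
    # forward pass: compute all activations
    a2 = sigmoid(W1 @ x + b1)           # hidden-layer activations
    a3 = sigmoid(W2 @ a2 + b2)          # network output h_{W,b}(x)
    # output-layer error term: (activation - target) times sigmoid derivative
    delta3 = (a3 - y) * a3 * (1 - a3)
    # hidden-layer error term: weighted combination of downstream error terms
    delta2 = (W2.T @ delta3) * a2 * (1 - a2)
    # gradient-descent updates, with weight decay parameter lam
    W2 -= alpha * (np.outer(delta3, a2) + lam * W2)
    b2 -= alpha * delta3
    W1 -= alpha * (np.outer(delta2, x) + lam * W1)
    b1 -= alpha * delta2
    return W1, b1, W2, b2
```

Repeatedly calling backprop_step on the training examples implements the "repeatedly take steps of gradient descent" loop, driving the cost J(W,b) down over the iterations.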
Unsupervised Learning
An autoencoder neural network is an unsupervised learning algorithm that applies back-propagation, setting the target values to be equal to the inputs; i.e., it uses y^(i) = x^(i).

The autoencoder tries to learn an approximation to the identity function. By limiting the number of hidden units, we can discover interesting structure in the data. Suppose there are only 50 hidden units in layer L_2 for a 100-pixel input; the network is then forced to learn a compressed representation of the input, and the algorithm can discover some of the correlations among the input features.

When a sparse autoencoder with 100 hidden units is trained on 10x10 pixel inputs, many of the learned features look like edges at different positions and orientations. When a new image is passed through this network, edges similar to these features activate the corresponding hidden units, much as stimuli excite neurons across synapses in a biological network. If enough units are activated, the network recognizes the image as positive for the object of interest (such as a face or a numerical digit).
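A minimal dense (non-sparse) autoencoder along these lines can be sketched in Python/NumPy; the layer sizes and hyper-parameters below are illustrative rather than the 100-unit configuration from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, alpha=0.5, epochs=2000, seed=0):
    """Minimal autoencoder: back-propagation with the targets set equal to
    the inputs (y = x), so the hidden layer learns a compressed code."""
    n = X.shape[1]
    rng = np.random.default_rng(seed)
    W1 = 0.5 * rng.standard_normal((n_hidden, n))  # encoder weights
    W2 = 0.5 * rng.standard_normal((n, n_hidden))  # decoder weights
    b1, b2 = np.zeros(n_hidden), np.zeros(n)
    for _ in range(epochs):
        for x in X:
            h = sigmoid(W1 @ x + b1)       # compressed representation
            xhat = sigmoid(W2 @ h + b2)    # reconstruction of the input
            # back-propagation with target y = x
            d2 = (xhat - x) * xhat * (1 - xhat)
            d1 = (W2.T @ d2) * h * (1 - h)
            W2 -= alpha * np.outer(d2, h); b2 -= alpha * d2
            W1 -= alpha * np.outer(d1, x); b1 -= alpha * d1
    return W1, b1, W2, b2
```

For example, training on the four one-hot patterns of np.eye(4) with only 2 hidden units forces the network to encode four inputs in a two-unit code, a small-scale analogue of the 100-to-50 compression described above.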
Conclusion
References
- Ng, Andrew. Neural Networks Representation. 2012. Retrieved from http://cs.uky.edu/~jacobs/classes/2012_learning/lectures/neuralnets_ng.pdf.
- Hebb, D.O. (1949). The Organization of Behavior. New York: Wiley and Sons.