CDOT Wiki β

DPS921/PyTorch: Convolutional Neural Networks

3,024 bytes added, 23:36, 29 November 2020
Introduction to Neural Networks
Even with the above infographic, there is probably still some confusion about what distinguishes Deep Learning from Machine Learning. If you are confused, it is important to remember that everything said about Machine Learning also applies to Deep Learning: both “enable machines to improve with experience.” The main difference is that conventional Machine Learning algorithms require manual intervention for feature extraction, while Deep Learning algorithms extract features themselves. See the infographic below:
[[File:Figure2.jpg]]
What this means is that a Deep Learning algorithm has the added advantage that it can be set up randomly, with random weights and biases. As long as we tell it what we want the output to be, let’s say a car, it will find the best way to decide whether any given input is indeed a car. In other words, it can teach itself. With traditional machine learning, on the other hand, the programmer would have to tell the algorithm what “features” to look for when determining if something is a car.
The Artificial Neural Network (ANN) is probably the simplest type of neural network and the easiest to understand. The name Neural Network comes from the fact that its architecture is vaguely modelled on neurons in the brain. Artificial neurons can be activated like a real neuron, and they are often linked with other neurons. The analogy is considered misleading, so I’ll stop it there. It’s better to think of these artificial neurons as bits that can hold any value from zero to one, instead of only zero or one. For our purposes, we want to classify an image. So the neural network starts by associating each pixel with a value from zero to one, as mentioned above. One represents a fully colored-in pixel, while zero represents an empty pixel. See below:
[[File:Figure3.png]]
 
All these pixel neurons make up the first layer, the input layer. Getting from this series of values to the final output decision requires intervening layers. These “hidden” layers do a lot of the actual computation, and their job is to extract features from a given image of a number. These features are then used to determine if the image represents, for example, a nine.
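As a minimal sketch of how an image becomes an input layer, the snippet below flattens a tiny grayscale image and scales each pixel into the zero-to-one range. The 3x3 image, the 0 to 255 pixel range, and the name `to_input_layer` are assumptions made for illustration; they do not come from the figures in this article.

```python
# Hypothetical 3x3 grayscale image: 0 = empty pixel, 255 = fully colored pixel.
image = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 128, 0],
]

def to_input_layer(pixels, max_value=255):
    """Flatten a 2D image row by row and scale every pixel into [0, 1]."""
    return [value / max_value for row in pixels for value in row]

input_layer = to_input_layer(image)
print(len(input_layer))  # 9 input neurons, one per pixel
print(input_layer[1])    # 1.0, a fully colored pixel
```

A real digit image would simply produce a longer list, one input neuron per pixel.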
 
[[File:Figure4.png]]
 
Each one of these hidden layer neurons, also called Perceptrons, is tasked with activating if it finds its corresponding feature. A programmer can initially set these features or set random features, but ultimately the ANN will come up with its own rules. It does this through constantly evolving weights and biases, which are stored in its Perceptrons. See the figure below:
 
[[File:Figure5.gif]]
 
A Perceptron is composed of weights and a bias, and it gives each input a separate weight. It multiplies each input by its weight and takes the sum of those products. In this case, the sum is w1 × 0.7 + w2 × 0.6 + w3 × 1.4. A bias is then added to the weighted sum to determine if the value is high enough for activation. The activation stage takes the result of the last step and squishes it into a number between zero and one. A value close to one suggests high activation, while a value close to zero suggests no activation. These activation values are often fed into a subsequent layer of Perceptrons.
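The weighted sum, bias and activation steps can be sketched in a few lines of Python. The inputs 0.7, 0.6 and 1.4 are the ones used above; the weights and the bias are made-up values, and the sigmoid function is only one common way to squish a number into the zero-to-one range (the figure may use a different one).

```python
import math

def sigmoid(z):
    """Squish any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs, plus the bias, passed through the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(weighted_sum + bias)

inputs = [0.7, 0.6, 1.4]    # the input values from the text
weights = [0.5, -0.2, 0.1]  # hypothetical weights w1, w2, w3
bias = -0.1                 # hypothetical bias

activation = perceptron(inputs, weights, bias)
print(round(activation, 3))  # about 0.567, a fairly weak activation
```

With these values the weighted sum is 0.37, the bias brings it down to 0.27, and the sigmoid maps that to roughly 0.567.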
 
Finally, the last layer of Perceptrons spits out a confidence value for each possible number. For example, it might identify a number as a two with 90% confidence.
 
[[File:Figure6.png]]
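One way to turn the raw outputs of the last layer into confidence values is the softmax function, sketched below. The article does not say which function the network in the figure uses, so softmax is an assumption here, and the ten raw scores are invented for the example.

```python
import math

def softmax(scores):
    """Convert raw scores into positive confidences that sum to one."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw output scores for the digits 0 to 9.
scores = [0.1, 0.3, 4.0, 0.2, 0.0, 0.5, 0.1, 1.2, 0.4, 0.3]
confidences = softmax(scores)

# The prediction is the digit with the highest confidence.
predicted_digit = max(range(len(confidences)), key=confidences.__getitem__)
print(predicted_digit)  # 2
```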
 
=== Back Propagation ===
As you might imagine, the ANN is unlikely to get it right the first time. In fact, it will undoubtedly get it wrong, horribly wrong! It improves itself by adjusting those weights and biases mentioned earlier. In order to do this, it must be trained with tons of example numbers, as well as a cheat sheet to check its answers. How far the ANN’s final answer is from the correct answer is called the cost. Once a cost is determined, the weights and biases that make up the ANN are adjusted to minimise this cost. That’s a lot of math I summed up in one sentence. The algorithm that does this math is called Back Propagation, and it’s how an ANN learns. It’s called that because it works backwards from what it wants the output to be, back through the hidden layers. This is extremely computationally intensive because the training set usually has tens of thousands of answers to work backwards from.
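To make the idea of "adjusting weights and biases to minimise the cost" a bit more concrete, here is a minimal sketch for a single neuron with one input. It is not full Back Propagation through hidden layers; it only shows the chain-rule step and the update that Back Propagation repeats for every weight. The two training examples, the squared-error cost, the learning rate and the number of passes are all assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical training data: (input, correct answer from the "cheat sheet").
examples = [(0.0, 0.0), (1.0, 1.0)]

def cost(weight, bias):
    """Mean squared distance between the neuron's answers and the correct ones."""
    return sum((sigmoid(weight * x + bias) - y) ** 2 for x, y in examples) / len(examples)

def train(weight, bias, learning_rate=1.0, epochs=500):
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in examples:
            a = sigmoid(weight * x + bias)
            # Chain rule: d(cost)/d(a) * d(a)/d(z), working backwards from the output.
            delta = 2 * (a - y) * a * (1 - a)
            grad_w += delta * x
            grad_b += delta
        # Step against the gradient to reduce the cost.
        weight -= learning_rate * grad_w / len(examples)
        bias -= learning_rate * grad_b / len(examples)
    return weight, bias

initial_cost = cost(0.0, 0.0)
weight, bias = train(0.0, 0.0)
final_cost = cost(weight, bias)
print(initial_cost, final_cost)  # the cost after training is lower than before
```

In a network with hidden layers, the same `delta` idea is passed backwards layer by layer, which is where the name comes from.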
 
What is commonly done instead is that the training data is split into batches, and the back-prop algorithm is performed on one batch at a time. Each update is less accurate than one computed from the entire training set, but the loss in accuracy is small compared to the gain in speed. As you might imagine, each batch also has an opportunity to be parallelized.
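Splitting the training data into batches can be sketched as follows. The helper name `make_batches`, the batch size, and the use of plain integers as stand-ins for labelled training examples are assumptions for illustration only.

```python
import random

def make_batches(examples, batch_size, seed=0):
    """Shuffle a copy of the training set and cut it into batches."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

training_set = list(range(10))  # stand-ins for 10 labelled examples
batches = make_batches(training_set, batch_size=4)
print([len(batch) for batch in batches])  # [4, 4, 2]
```

The weights and biases would then be updated once per batch. Within a batch, each example's contribution does not depend on the others, which is what makes the work a candidate for parallelization.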
 
=== CNN ===
Convolutional Neural Networks (CNNs) are faster and more accurate at image recognition than standard ANNs. They achieve this by focusing on spatial features, such as ears, a nose, or a mouth, and by ignoring irrelevant data.
 
[[File:Figure7.gif]]
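The operation that gives CNNs their name is the convolution: a small grid of weights (a kernel) is slid over the image, and each position produces one value that says how strongly a spatial feature is present there. Below is a minimal pure-Python sketch. The 4x4 image and the 2x2 vertical-edge kernel are made up for the example, and like most deep learning libraries it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        output.append(row)
    return output

# Hypothetical image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the edge sits; the flat regions, which carry no information about this feature, produce zeros. In a CNN the kernel values are not set by hand like this but learned as weights.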
== Implementation of a Neural Network ==