Motivation for Convolutional Neural Networks
Finding good internal representations of images, objects, and features has been the main goal since the beginning of computer vision. Therefore, many tools have been invented to deal with images, and many of them are based on a mathematical operation called convolution. Even when Neural Networks are used to process images, convolution remains the core operation.
Convolutional Neural Networks take advantage of the general strengths of Neural Networks (see Neural Networks) and go even further to deal with two-dimensional data. Accordingly, the trainable parameters are the elements of two-dimensional filters. Applying such a filter to an image produces a feature map, which indicates how well the filter fits the image patch at each position.
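The filtering step can be sketched as follows. This is a minimal, illustrative example only; note that what CNN frameworks call convolution is usually cross-correlation (the filter is not flipped), and the image and filter values below are arbitrary.

import numpy as np

def feature_map(image, kernel):
    """Slide a two-dimensional filter over an image ('valid' cross-correlation).

    Each output value measures how well the filter fits the image patch
    at that position.
    """
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)
    return out

# Illustrative example: a vertical-edge filter applied to a random 8x8 image
image = np.random.rand(8, 8)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])
print(feature_map(image, kernel).shape)  # (6, 6)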
Additionally, convolution connects perceptrons only locally. Because features always belong to a spatial position in the image, there is no need to fully connect consecutive stages. Convolving preserves information about the neighbouring perceptrons and processes it according to the corresponding weights. In each stage, the data is additionally processed by a non-linearity, typically a rectification (ReLU). Finally, pooling subsamples each feature map.
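These two steps can be sketched as follows, assuming the rectification is the ReLU function and the pooling is max pooling over non-overlapping 2x2 windows; both are common choices but not the only ones.

import numpy as np

def relu(feature_map):
    """Rectification (non-linearity): negative responses are set to zero."""
    return np.maximum(feature_map, 0)

def max_pool(feature_map, size=2, stride=2):
    """Subsample a feature map by keeping the strongest response per window."""
    h, w = feature_map.shape
    out = np.zeros(((h - size) // stride + 1, (w - size) // stride + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = feature_map[y * stride:y * stride + size,
                                 x * stride:x * stride + size]
            out[y, x] = window.max()
    return out

# Illustrative example: rectify and subsample a random 6x6 feature map
fm = np.random.randn(6, 6)
print(max_pool(relu(fm)).shape)  # (3, 3)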
Stacking multiple trainable stages (deep learning) finally structures the internal representation hierarchically. Such a representation has turned out to be very powerful, especially for images: low-level stages detect elementary edges, while high-level stages combine this information to determine where and how objects are positioned within the scene.
Figure 1: Typical convolutional neural network with two feature stages [2].
After introducing the relevant basics of image processing and discrete convolution, the typical layers of convolutional neural networks are examined more precisely.
Layers
A typical convolutional neural network is composed of multiple stages. Each of them takes a volume of feature maps as input and provides a new volume, henceforth called the activation volume. Each stage consists of three consecutive layers: a convolutional layer, a ReLU layer, and a pooling layer. At the output, a fully-connected layer finally maps the last activation volume onto a probability distribution over the classes.
The following chapters provide an overview of the structure and the tasks of each layer.
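As a rough orientation, such a stack of stages could look as follows in PyTorch. This is only a sketch; the input size (a single-channel 28x28 image), the number of filters, and the ten output classes are illustrative assumptions, not values from the text.

import torch
import torch.nn as nn

# Two stages (convolution -> ReLU -> pooling) followed by a fully-connected
# layer that maps the last activation volume onto class scores.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),   # stage 1: 1x28x28 -> 8x26x26
    nn.ReLU(),
    nn.MaxPool2d(2),                  #          8x26x26 -> 8x13x13
    nn.Conv2d(8, 16, kernel_size=3),  # stage 2: 8x13x13 -> 16x11x11
    nn.ReLU(),
    nn.MaxPool2d(2),                  #          16x11x11 -> 16x5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 10),        # fully-connected layer: 10 class scores
)

x = torch.randn(1, 1, 28, 28)              # one grayscale image
probabilities = torch.softmax(model(x), dim=1)
print(probabilities.shape)                 # torch.Size([1, 10])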
Literature
[1] Y. Ma, S. Soatto, J. Kosecka, S. S. Sastry. An Invitation to 3-D Vision: From Images to Geometric Models. Springer New York, 2005.
[2] Y. LeCun, K. Kavukcuoglu, and C. Farabet. Convolutional networks and applications in vision. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), pages 253–256, 2010.
[3] Y. LeCun, Y. Bengio, G. Hinton. Deep learning. Nature 521, pages 436–444, May 2015.
2 Comments
Unknown user (ga25dis) says:
27 January 2017
Hi Simon,
here are my suggestions to your text:
- Information is clear in general
- References are still missing
- Try to use shorter and less complex sentences, to make the text easier to understand
- In Section Layers:
  - The following layers (Not finished yet?)
  - Remove “//” ?
Correction suggestions:
- Since the beginning of Visual Science the key is to achieve a good internal representation.
- Finding good internal representations of images, objects and features has been the main goal since the beginning of visual science.
- Therefore many tools has been invented to deal with images
- Therefore many tools have been invented to deal with images
- Convolutional Neural Networks finally takes the advantages of Neural Networks in general ( ref) and goes even further to deal with two-dimensional data.
- Convolutional Neural Networks finally take the advantages of Neural Networks in general ( ref) and go even further in order to deal with two-dimensional data.
- Thus, the training parameters are elements of two-dimension filters
- Thus, the training parameters are elements of two-dimensional filters
- Convolving includes information about the surrounding perceptrons and process them according their corresponding weights.
- Convolving preserves information about the surrounding perceptrons and processes them according to their corresponding weights.
- Thus, in low-level stages primary edges are detected and lastly high-level stages connect information how objects are placed with regard to the scene
- Low-level stages are used to detected primary edges. High-level stages lastly connect information on where and how objects are positioned regarding the scene.
- Therefore, the resulting feature map provides information how well the local filter fits to the patch.
- Therefore, the resulting feature map provides information on how well the local filter fits to the patch.
- At the output finally the fully-connected layer map the last activation volume into a class of probability distribution.
- The fully-connected layer finally maps the last activation volume onto a class of probability distributions at the output.
- As a result of convolution in a neural networks, the image is split into perceptrons, creating local receptive fields and finally compressing the perceptrons in feature maps of size .
- As a result of convolution in neural networks, the image is split into perceptrons, creating local receptive fields and finally compressing the perceptrons in feature maps of size .
- The result of staging these convolutional layer in conjunction with the following layers is that the information of the image is classified like in vision.
- The result of staging these convolutional layers in conjunction with the following layers is that the information of the image is classified like in vision.
Unknown user (ga46yar) says:
29 January 2017
Apart from the stuff Florian suggested, which I agree on, I would:
use "computer vision" instead of "visual science", which may also include biology, optics etc.
"more precisely" instead of "more precise".
Apart from that, I can't find anything further that hasn't been said already.