The field of machine learning that deals with neural networks is called deep learning.
A neural network (NN) is a mathematical model for solving a machine learning problem. It accepts data as input — photos, text, videos, etc. — and outputs an answer to the task. Neural networks were inspired by the biological structure of the human brain.
Neural networks are used in a variety of tasks: recognizing faces in a crowd, translating texts, forecasting a company's sales, replacing the background in video conferences, recognizing text in pictures — the list goes on indefinitely, and it is expanding day by day.
You may have seen pictures of a neural network, like this one.
In such a diagram, each circle is a separate neuron, and each column of neurons is called a layer.
To start working, the NN takes information into its input layer. It performs the calculations it needs in its hidden layers, of which there can be anywhere from one to several dozen. Once the network has computed the result, it sends it to the output layer.
Let’s analyze how the NN works on one of the simplest tasks — determining which number is shown in the picture.
Each image is a set of points (pixels), and to work with it, our network must be able to look at every one of those points.
The layer of blue circles on the left corresponds to the individual pixels of the image: each of these neurons takes its value from the corresponding point in the image. For example, take a black-and-white (for simplicity) 32x32 image — it has 32x32 = 1024 pixels, so there should be 1024 neurons in the input layer of the network.
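The correspondence between pixels and input neurons can be sketched in a few lines of Python. The image here is randomly generated just as a stand-in for a real 32x32 picture:

```python
import numpy as np

# A stand-in 32x32 grayscale image: one brightness value per pixel.
image = np.random.rand(32, 32)

# The input layer simply receives one value per pixel, so the image is
# flattened into a vector of 32 * 32 = 1024 numbers.
input_layer = image.flatten()

print(input_layer.shape)  # (1024,)
```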
On the right, the green circles indicate the output neurons, through which the network returns its answer. Since we are determining which digit is shown in the picture, there will be 10 of them — one for each possible digit.
The layers of black circles in the middle are ordinary neurons; their principle of operation is discussed below.
In general, when working with an image, we try to recognize patterns in it that are characteristic of a certain category of images. In our case, an image category is the set of all images showing the same digit. Take the number 8, for example. Loosely speaking, you can imagine that it consists of two circles, one at the top and one at the bottom. When we see that an image has such a circle at the bottom, we can infer that we are most likely looking at either an 8 or a 6.
Thus, we divide the problem of image classification into subtasks: the search for characteristic features. These features can be simple or more complex. For example, the lower circle of the digit 8 may in turn consist of straight lines at different angles.
In general, this is how the NN solves the classification problem — it recognizes patterns at different levels of abstraction (line → circle). The simpler the feature, the earlier the hidden layer in which our network learns to see it. In the end, when the network has decided that the picture shows an 8, it will, out of all the neurons in the output layer, somehow highlight the neuron corresponding to the class of the digit 8, and thus give us the answer.
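The "highlighting" of one output neuron is usually done by simply picking the neuron with the largest value. A minimal sketch, with made-up activation values for the 10 output neurons:

```python
import numpy as np

# Hypothetical activations of the 10 output neurons (one per digit 0-9).
outputs = np.array([0.01, 0.02, 0.05, 0.01, 0.03, 0.02, 0.10, 0.01, 0.70, 0.05])

# The network's answer is the index of the strongest output neuron.
answer = int(np.argmax(outputs))
print(answer)  # 8
```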
Let’s take a look at the work of a single neuron. Like the network as a whole, a neuron receives input signals, processes them, and outputs a result. A single neuron is the simplest possible element: it can have many inputs, but only one output.
Consider a neuron with 3 input signals: x, y, and z. Suppose our neuron is already trained to solve a certain problem. This means it already knows which input signals matter most to it and which deserve less attention. Accordingly, the neuron has memorized a coefficient (weight) for each input signal, determining how strongly that signal should be taken into account when solving the problem. In the example, we feed the neuron three values: x=5, y=1, and z=8. It multiplies each input value by the corresponding weight, adds the products together, then applies a certain transformation to the resulting sum and outputs the result of that transformation as its output signal, or response. What these transformations are and why they are needed will be discussed in another article, about how neural networks are trained.
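This computation fits in a few lines. The weights below are made up for illustration — a trained neuron would have learned them — and the transformation applied to the sum is the sigmoid function, one common choice of activation:

```python
import math

def neuron(inputs, weights, bias=0.0):
    """A single neuron: weighted sum of inputs, then a transformation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The "transformation" is the activation function; here, a sigmoid,
    # which squashes the sum into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-total))

# Inputs from the example: x=5, y=1, z=8. The weights are hypothetical.
output = neuron([5, 1, 8], [0.2, -0.5, 0.1])
# Weighted sum: 5*0.2 + 1*(-0.5) + 8*0.1 = 1.3, then sigmoid(1.3) ≈ 0.786
```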
Let me draw attention to one subtle point. You might not have noticed, but we simply handed the neuron the signal values it was to process. In our network, the only neurons that receive signals from the outside are those in the input layer. In fact, this layer performs no calculations at all — it is called a layer only for the sake of common terminology. It merely takes in the input information; all the layers after it do the processing.
From the previous example, it can be seen that each neuron in the network that processes information remembers its own numbers for its incoming signals. There are a huge number of these values. When you see a headline like “Neural network with 30 million parameters,” it means the NN remembers 30 million such numbers.
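It is easy to count these parameters for a fully connected network. The architecture below is hypothetical (1024 inputs, one hidden layer of 128 neurons, 10 outputs): each neuron stores one weight per input plus one bias term.

```python
# Hypothetical fully connected architecture:
# 1024 input values -> 128 hidden neurons -> 10 output neurons.
layers = [1024, 128, 10]

# For each pair of adjacent layers: weights (n_in * n_out) plus biases (n_out).
params = sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))
print(params)  # 132490
```

Even this tiny network remembers over 130 thousand numbers, which gives a sense of scale for the millions of parameters in modern networks.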
To learn to solve the problem it is designed for, a neural network must be trained. The training process is important and interesting enough to deserve a separate post.
I will describe the training process in general terms. The network is fed images whose class is known in advance — for our task, images with a known digit on them. The NN looks at the input image and makes a guess, which at first is essentially a shot in the dark. If it notices that it was wrong (that is, its answer did not match the correct one), it slightly corrects the weights in its neurons so as not to make that mistake again. This process is called training the neural network, so the saying “smart people learn from other people’s mistakes” clearly does not apply to NNs.
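The guess-check-correct loop can be sketched on a toy problem. Here a single weight is trained to learn the made-up rule "output = 2*x" from examples with known answers; the learning rate and the number of passes are arbitrary choices for this sketch:

```python
import random

random.seed(0)

# Toy task with known answers: learn the rule "output = 2*x".
examples = [(x, 2 * x) for x in range(1, 6)]

# Start with a random weight -- the initial "shot in the dark".
w = random.random()

learning_rate = 0.01
for _ in range(100):            # repeat over the examples many times
    for x, target in examples:
        prediction = w * x
        error = prediction - target       # how wrong was the guess?
        w -= learning_rate * error * x    # nudge the weight to reduce the error

# After training, w is very close to the true value 2.
```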
Throughout, I have written that the network learns certain patterns/abstractions, and that the closer to the output layers, the more they resemble the target objects. But if you think about it, only the original image exists at the input. Each pixel in it is taken into account with some weight — so where can these dashes and circles come from, if they may not even be present in the original image? In 2009, a group of engineers published an article showing that it is possible to take a particular class from a neural network's output layer and generate a completely artificial image for it that serves as an almost generalized representation of that image class.
Later, this study had many follow-ups — in particular, on how to similarly visualize a specific neuron of interest, or even an entire layer, in a trained neural network. If you are interested, here is a link to one of the articles on this topic. Interestingly, a separate genre of contemporary art was later formed around such images. This article turned out somewhat lengthy, but so is the topic. Thanks for your attention!
I hope it was interesting and informative for you.
If you have any questions or suggestions, please contact me at https://www.linkedin.com/in/nikita-sidorov/