In this article I will not dwell on how neural networks work, because the main purpose of this text is to explain why they work so well in some domains.
As with many tasks in machine learning, which model to use depends on your data. That is why it is better to explain when neural networks are appropriate and can boost your performance. First of all, these are tasks that process visual information: images and videos. Since about 2015, computers have outperformed humans on some benchmark tasks of classifying images or determining what is shown in them. Another field that comes to mind is everything related to audio and text. Those of you who use web translators have probably noticed that translations have become much more natural and reliable, especially if you remember PROMT. The qualitative leap came in 2016, when Google, and a little later Yandex, switched to neural networks of a special type for translating text from one language to another.
What unites these areas, and why are classical machine learning models not suitable for them?
Strictly speaking, classical ML approaches are applicable here, but they often do not give results comparable to those of a person on such tasks. This brings us to the answer to the first part of the question. Classical machine learning involves working with the data and creating additional features so that the algorithm can process them correctly and make more accurate predictions. In the tasks mentioned above, it is often difficult to craft new meaningful features by hand, whereas neural networks build complex internal representations of the data on their own during training and then rely on those representations to solve the problem.
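To make the idea of "creating additional features" concrete, here is a minimal sketch. The XOR toy task, the `fit_predict` helper, and the choice of the product feature are my own assumptions for illustration, not something taken from a real project. A linear model fitted on the raw inputs cannot do better than predicting 0.5 everywhere, while the same model with one hand-crafted feature fits the targets exactly.

```python
import numpy as np

# Toy task (an assumption for illustration): XOR of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)


def fit_predict(features, targets):
    """Fit ordinary least squares with an intercept and return its predictions."""
    design = np.column_stack([np.ones(len(features)), features])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return design @ weights


# Raw inputs only: no linear combination of x1 and x2 reproduces XOR.
raw_pred = fit_predict(X, y)

# Classical approach: the engineer adds a feature by hand, here x1 * x2.
X_engineered = np.column_stack([X, X[:, 0] * X[:, 1]])
engineered_pred = fit_predict(X_engineered, y)

print("raw features:       ", np.round(raw_pred, 3))
print("with x1 * x2 added: ", np.round(engineered_pred, 3))
```

On this toy task a person can easily guess the right feature. On images, audio, or text there is usually no such obvious guess, which is where learned representations pay off.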
Why do neural networks work, and work so well? The main key to their success, as some believe, is that the depth of the network (a sequence of different layers of neurons) lets it represent a task as a sequence of blocks, each of which can be computed in parallel. Moving from one layer to the next, the network learns different representations and lets you look at the original data from a new angle, revealing new (increasingly abstract) features in the data. The abstractions built at the early layers may be completely incomprehensible, while the representations at later layers take shape as something more meaningful and recognizable. A good example is the intermediate representations that the model learns to recognize as part of solving the overall problem (a picture with filters). This is why deep learning is so well suited to tasks that work with poorly or complexly structured information. Interpreting and visualizing a neural network, however, is difficult and sometimes impossible. As for text, a neural network that translates from one language to another learns on its own which word in the original sentence corresponds to which word in the translated sentence.
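A tiny sketch of what "a new representation" means in practice. The weights below are picked by hand purely for illustration (a real network would learn them during training), and XOR is an assumed toy task. The raw inputs are not linearly separable, but after one ReLU layer a plain weighted sum of the hidden values is enough to get the answer.

```python
import numpy as np


def relu(z):
    """Rectified linear unit: negative values are replaced by zero."""
    return np.maximum(z, 0.0)


# All four combinations of two binary inputs; the target is XOR: 0, 1, 1, 0.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked weights (an assumption for illustration, not learned).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])      # shape: (inputs, hidden units)
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])       # output weights, one per hidden unit

# Layer 1 turns the raw inputs into a new, intermediate representation.
hidden = relu(X @ W1 + b1)

# Layer 2 only needs a linear combination of that representation.
output = hidden @ W2

print("hidden representation:")
print(hidden)
print("output:", output)
```

The first hidden unit counts how many inputs are on, the second fires only when both are on. Neither is the answer by itself, but together they make the final step trivial, which is the same role intermediate layers play in deeper networks.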
The other part of the answer is computational: the layers run one after another, but the work inside each layer can be performed in parallel. Since the number of neurons in a layer is often in the hundreds, GPUs come to the rescue. They allow us to vectorize the calculations and process the data in parallel as much as possible, which significantly speeds up computation.
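Here is a rough sketch of what vectorizing a layer means. It runs on the CPU with NumPy, and the layer sizes and function names are arbitrary assumptions. The point is only that computing every neuron for every example is a single matrix multiplication, and that is the form of computation GPU libraries are built to parallelize.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary sizes chosen for illustration: 32 examples, 64 inputs, 200 neurons.
batch_size, n_inputs, n_neurons = 32, 64, 200
x = rng.standard_normal((batch_size, n_inputs))
W = rng.standard_normal((n_inputs, n_neurons))
b = rng.standard_normal(n_neurons)


def layer_loop(x, W, b):
    """Compute the layer one example and one neuron at a time."""
    out = np.zeros((x.shape[0], W.shape[1]))
    for i in range(x.shape[0]):
        for j in range(W.shape[1]):
            out[i, j] = np.dot(x[i], W[:, j]) + b[j]
    return out


def layer_vectorized(x, W, b):
    """Compute all neurons for all examples with one matrix multiplication."""
    return x @ W + b


slow = layer_loop(x, W, b)
fast = layer_vectorized(x, W, b)
print("same result:", np.allclose(slow, fast))
print("output shape:", fast.shape)
```

Each of the 32 x 200 output values is independent of the others, so nothing forces them to be computed one by one. The loop version hides that fact, the matrix version exposes it.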
In the following articles, I will write in more detail about the algorithms used to train neural networks and about the key points that engineers run into when working with them but sometimes do not think about, and I will analyze the most important structural components of deep learning.
I hope it was interesting and informative for you.
If you have any questions or suggestions, please contact me on LinkedIn: https://www.linkedin.com/in/nikita-sidorov/