Recurrent Neural Networks (RNNs) are used to model and predict variables that evolve over time. Unlike standard neural networks, they can remember the context in which they operate. A traditional (feedforward) neural network assumes that all inputs (and outputs) are independent of each other, whereas an RNN performs the same task for each element of a sequence, and its output depends on previous computations. Many practical problems, such as recognizing and processing text, speech or images, are solved more effectively when previous states of the variables are taken into account. It is this dependence of the current state on previous states that recurrent networks attempt to reproduce. For this purpose, loops are introduced into their structure, so that certain parts of the network are applied repeatedly. In other words, RNNs contain feedback: connections that pass output signals from downstream layers of the network back to neurons in the input layer or in earlier hidden layers. This makes it possible to perform more complex computations than in networks where information flows in only one direction.
The recurrence of RNNs boils down to performing the same task for each element of the sequence. Intuitively, one can think of an RNN as an entity with a memory that stores information about past computations.
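The recurrence described above can be illustrated with a minimal sketch. It assumes the common Elman-style cell h_t = tanh(W_x x_t + W_h h_{t-1} + b); the article does not commit to a specific cell. The names `rnn_forward`, `W_x`, `W_h` and `b` and the toy weights are illustrative choices, not taken from the source. The sketch uses plain Python lists so that it runs without extra libraries.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_forward(xs: List[Vector], W_x: Matrix, W_h: Matrix, b: Vector) -> List[Vector]:
    """Run a simple recurrent layer over a sequence and return every hidden state.

    The same weights are used at every step:
        h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    """
    h = [0.0] * len(b)  # initial state: nothing remembered yet
    states = []
    for x in xs:
        pre = [a + r + c for a, r, c in zip(matvec(W_x, x), matvec(W_h, h), b)]
        h = [math.tanh(p) for p in pre]  # new state mixes current input and previous state
        states.append(h)
    return states


if __name__ == "__main__":
    W_x = [[0.5], [-0.3]]            # 2 hidden units, 1 input feature
    W_h = [[0.1, 0.2], [0.0, 0.4]]   # recurrent (feedback) weights
    b = [0.0, 0.0]
    # Only the first input is non-zero; later states still carry a trace of it.
    for t, h in enumerate(rnn_forward([[1.0], [0.0], [0.0]], W_x, W_h, b)):
        print(t, [round(v, 3) for v in h])
```

Only the first input is non-zero, yet the hidden state remains non-zero at later steps. The same weights are reused at every step, and the state carries information forward through the feedback weights `W_h`.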
Figure 1. Hopfield neural network diagram
Source: Basic neural network models
The RNN concept is not new: a recurrent architecture was proposed as early as the 1980s by Hopfield (a schematic of its design is shown in Figure 1). A Hopfield network can be described as an arrangement of many identical elements, each connected to every other element. Because of the feedback loops, neurons are excited repeatedly within a single cycle, updating their states until the network settles. Hopfield's work contributed to a revival of neural network research, which, after a period of rapid development in the 1950s and 1960s, had come to an abrupt halt in the early 1970s. This resurgence also accelerated the research and development of RNNs.
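The storage-and-recall behavior of a Hopfield network can be sketched in a few lines. The sketch assumes the textbook formulation, which the article does not spell out:

- bipolar (+1/-1) neurons;
- symmetric weights set with the Hebbian rule, with no self-connections;
- asynchronous updates in which each neuron takes the sign of its weighted input.

The function names are hypothetical.

```python
from typing import List

Pattern = List[int]  # bipolar values: +1 or -1


def train_hebbian(patterns: List[Pattern]) -> List[List[float]]:
    """Build a symmetric weight matrix with the Hebbian rule; no self-connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # every neuron is connected to every other neuron
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w: List[List[float]], state: Pattern, max_sweeps: int = 10) -> Pattern:
    """Update neurons one at a time, repeatedly, until no neuron changes (a stable state)."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))  # weighted input from all others
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s


if __name__ == "__main__":
    stored = [1, 1, 1, 1, -1, -1, -1, -1]
    w = train_hebbian([stored])
    noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # first bit flipped
    print(recall(w, noisy))
```

The run starts from a copy of the stored pattern with one flipped bit. The repeated updates pull the state back to the stored pattern, which is how a Hopfield network acts as an associative memory.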
It turned out that while RNNs do have memory, it is rather short. In theory, neurons with recurrent connections can model dependencies separated by arbitrarily long intervals, but practice does not confirm this. The short memory of an RNN results from the gradual fading of the earliest inputs: after a certain time, the network's state contains virtually no trace of them. Formally, the short memory of RNNs is associated with the vanishing gradient for long-range dependencies, with the gradient shrinking exponentially as the time distance grows. This is not the only problem: RNNs also often suffer from training instability, known as the exploding gradient problem. The latter can be addressed with gradient clipping, which limits the magnitude of the gradients, and with regularization, which penalizes excessive complexity of the estimated model. The vanishing gradient problem, on the other hand, is generally addressed by introducing different types of cells with long-term memory.
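The simplest possible recurrence illustrates the exponential fading and its mirror image, the exploding gradient. The toy model is a scalar linear RNN, h_t = w * h_{t-1} + x_t, chosen for clarity rather than taken from the article. Backpropagating through T steps multiplies the gradient by w once per step, so it behaves like w^T.

The sketch also shows the gradient clipping (limiting of gradient values) mentioned above, in its norm-based variant: the whole gradient is rescaled when its L2 norm exceeds a threshold. This is one common form; clipping each value separately is another. The helper names are hypothetical. Frameworks ship their own versions, for example PyTorch's `torch.nn.utils.clip_grad_norm_`.

```python
import math
from typing import List


def gradient_through_time(w: float, steps: int) -> float:
    """d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1} + x_t."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: one factor dh_t/dh_{t-1} = w per time step
    return grad


def clip_by_norm(grad: List[float], max_norm: float) -> List[float]:
    """Rescale the gradient if its L2 norm exceeds max_norm; the direction is kept."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


if __name__ == "__main__":
    for w in (0.5, 1.5):
        print("w =", w, [gradient_through_time(w, t) for t in (1, 10, 50)])
    print(clip_by_norm([30.0, 40.0], max_norm=5.0))
```

With w = 0.5, the 50-step gradient is 2^-50, about 8.9e-16, so distant inputs have almost no influence on learning. With w = 1.5, it exceeds 6e8. Clipping rescales [30, 40] (norm 50) to a vector of norm 5 pointing in the same direction.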
The Long Short-Term Memory (LSTM) network is based on just such a solution. It was proposed by Hochreiter and Schmidhuber in 1997. In addition to standard units, the LSTM has three types of special gates that control the stored data: the input gate, the forget gate and the output gate (Figure 2).
Figure 2. LSTM neuron architecture
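The gate mechanism can be sketched as a single LSTM time step. The sketch assumes the widely used formulation with a forget gate. In it, each gate is a sigmoid of an affine function of the previous hidden state and the current input. The cell state is updated as c = f * c + i * g, and the output is h = o * tanh(c). The parameter layout (a dict of (W, b) pairs acting on the concatenation [h, x]) and all names are illustrative assumptions, not taken from the article.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(w: Matrix, b: Vector, v: Vector) -> Vector:
    """Compute w @ v + b using plain lists."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]


def lstm_step(x: Vector, h: Vector, c: Vector,
              params: Dict[str, Tuple[Matrix, Vector]]) -> Tuple[Vector, Vector]:
    """One LSTM time step; params maps 'f', 'i', 'o', 'g' to (W, b) acting on [h, x]."""
    z = h + x  # list concatenation: previous hidden state followed by current input
    f = [sigmoid(v) for v in affine(*params["f"], z)]    # forget gate: share of old cell state kept
    i = [sigmoid(v) for v in affine(*params["i"], z)]    # input gate: share of new content written
    o = [sigmoid(v) for v in affine(*params["o"], z)]    # output gate: share of cell state exposed
    g = [math.tanh(v) for v in affine(*params["g"], z)]  # candidate content
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


if __name__ == "__main__":
    # Hand-picked (not trained) weights for 1 hidden unit and 1 input feature.
    params = {
        "f": ([[0.0, 0.0]], [10.0]),    # forget gate close to 1: keep the memory
        "i": ([[0.0, 20.0]], [-10.0]),  # input gate open (almost) only when x is 1
        "o": ([[0.0, 0.0]], [10.0]),    # always expose the cell state
        "g": ([[0.0, 2.0]], [0.0]),     # candidate value tanh(2 * x)
    }
    h, c = [0.0], [0.0]
    for x in [1.0] + [0.0] * 50:  # a single "event" followed by 50 empty steps
        h, c = lstm_step([x], h, c, params)
    print("cell state after 51 steps:", c)
```

With these hand-picked weights, the forget gate stays close to 1. The value written at the first step is therefore still almost fully present in the cell state 50 steps later, in contrast to the exponential fading described earlier for plain RNNs.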
The addition of gates significantly improved the effectiveness of recurrent networks. RNNs with LSTM units enabled, among other things:
Areas of further application are limited only by human imagination.
This article was written thanks to funding from the European Union under the Operational Program Intelligent Development 2014-2020, in a project implemented under the National Center for Research and Development "Fast Track" competition for micro, small and medium-sized enterprises (competition for projects from less developed regions), Measure 1.1: R&D projects of enterprises, Sub-measure 1.1.1: Industrial research and development work carried out by enterprises. Project title: “Developing software to improve forecast accuracy and inventory optimization from the perspective of customer and supplier collaborating in the supply chain using fuzzy deep neural networks.”