Qnet 2000

Backpropagation Technical Overview

This overview is intended to provide both general information on backpropagation theory and some specific details of Qnet's modeling techniques. While an understanding of specific theoretical details is not required to use Qnet, it can give the initiated user a more complete picture of Qnet's internal operation.

Qnet backpropagation neural networks are multi-layered and feedforward (connections must connect to the next layer) in design. Networks can be fully connected or connections can be removed individually. Removed connections are modeled in Qnet by explicitly setting the connection's receiving weight to 0. This removes the effect of that individual connection on the network's response.
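
As a minimal illustration of this idea (not Qnet's actual code), the Python sketch below zeroes one weight with a mask and checks that the corresponding input no longer affects the receiving node. The array names, the layer sizes, and the use of NumPy are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer: 3 receiving nodes, each with 4 incoming connections.
weights = rng.uniform(-0.5, 0.5, size=(3, 4))

# Mask with 1 for kept connections and 0 for removed ones.
mask = np.ones_like(weights)
mask[1, 2] = 0.0          # remove the connection from input 2 to node 1
weights *= mask           # the removed connection's weight is now exactly 0

x = np.array([0.2, 0.4, 0.6, 0.8])
x_changed = x.copy()
x_changed[2] = 0.9        # change only the input on the removed connection

sum_before = weights @ x
sum_after = weights @ x_changed
# Node 1's weighted sum does not depend on input 2 any more.
```

In a training loop the mask would presumably have to be reapplied after every weight update so that the removed connection stays at 0; the text above does not describe how Qnet handles this.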

New networks have randomly initialized weight values. Each initialization produces a different network state, so identical training runs started from newly initialized networks may exhibit different learning characteristics. However, the converged states of two such training runs will be nearly identical in the vast majority of cases.

Backpropagation training is accomplished using the following logic sequence (NOTE: Vectors presented in Italics):

1. Input patterns are stored in an array of input vectors, X(P,1) = (Xp,1, Xp,2, Xp,3, ..., Xp,N), where P is the pattern sequence number and N is the vector length (equal to the number of input nodes). These vectors are unaltered by the input layer and are output to the nodes in the first hidden layer. (NOTE: The vector elements must be pre-normalized between 0 and 1, either by Qnet or by the user.)

2. Each node of a given hidden or output layer receives an identical input vector, X, from the preceding layer. Each node processes the vector internally through the equation:

Y(P,L,J) = X(P,L-1) W(J,L) + B(J,L)

where Y(P,L,J) is the processed result for node J in layer L (i.e., the input layer is layer 1, the first hidden layer is layer 2, etc.). The dot product is taken between the node's input vector, X(P,L-1), and the node's internal weight vector, W(J,L), and summed with the bias value, B(J,L).

3. The resulting value, Y(P,L,J), for node J is then processed through a transfer function to determine the signal strength for the node's output connection. The transfer function used by Qnet is one of the following: the sigmoid function, f(Y) = 1/(1+exp(-Y)); the Gaussian function, f(Y) = exp(-Y*Y); the hyperbolic tangent, f(Y) = (tanh(Y) + 1)/2; or the hyperbolic secant function, f(Y) = sech(Y). Each of these functions serves to normalize the output of a node between 0 and 1 and is continuous in form (the first derivative must exist for backpropagation training).

4. Each node's output value is combined in the current hidden or output layer to form the layer's output vector:

X(P,L) = (f(Y(P,L,1)), f(Y(P,L,2)), ..., f(Y(P,L,K)))

where K is the total number of nodes in layer L. This output vector becomes the input vector to the next layer.

5. Processing proceeds to the output layer, where the final output vector, X(P,O), is obtained. (In recall mode, processing ends at this point.)

6. The final output vector is combined with the training target vector, T(P), to obtain the output layer's error vector, E(P). The equation governing the computation of the error vector is:

E(P,O) = (T(P) - X(P,O)) X'(P,O)

where:

X'(P,O) = (f'(Y(P,O,1)), f'(Y(P,O,2)), ..., f'(Y(P,O,K))) (note: f' is the first derivative of the transfer function f, and the product in the error equation above is taken element by element).

The error for node J in hidden layer L is computed by the equation:

E(P,L,J) = X'(P,L,J) * SUMK(E(P,L+1,K) * W(K,L+1,J))

where SUMK denotes the sum over every node K in layer L+1, and W(K,L+1,J) is the weight that node K applies to the output of node J. Through this method, the error vector, E(P,L), is obtained for each hidden layer. Note that this equation causes the errors to be backpropagated through the network (thus the name for the paradigm).

7. Next the weight vectors for each node must be updated. The new weights for node J in layer L (output and hidden) are computed by:

W(J,L)[T+1] = W(J,L)[T] + (eta) * SUMP(E(P,L,J) * X(P,L-1)) + (alpha) * (W(J,L)[T] - W(J,L)[T-1])

where (eta) is the learning rate, (alpha) is the momentum factor, T is the iteration cycle, and SUMP denotes the sum over all training patterns P. Note that the weight change computed from the previous weight update cycle is multiplied by the momentum factor. The momentum term helps to keep the training process stable by damping weight change oscillations.

8. All input vectors (patterns) are processed through the network to adjust the weights for a given iteration.

9. The RMS error between the network response and the training targets is computed by Qnet after each iteration. Its equation is given by:

RMS Error = SQRT( SUMP,K((T(P,K) - X(P,O,K))^2)/(PT*KT))

where P is the Pth input pattern and K is the Kth output node. PT is the total number of patterns and KT is the total number of output nodes. The RMS error is also equivalent to the standard deviation of the error in the network's response, provided the mean error is zero.

10. If Learn Rate Control is active for the run, a new learning rate, (eta), is computed by Qnet based on the change in the RMS Error value.

11. The entire process cycles again with the next training iteration.
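
The training sequence above can be condensed into a short Python sketch. It is an illustration under stated assumptions, not Qnet's implementation: it uses NumPy, supports only the sigmoid transfer function, draws initial weights from an arbitrary range, keeps the learning rate fixed (Learn Rate Control is omitted because its rule is not given here), and updates the biases as if they were weights on a constant input of 1, which the text does not specify. All function names and the small OR data set are hypothetical.

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def sigmoid_deriv(y):
    s = sigmoid(y)
    return s * (1.0 - s)

def forward(x, weights, biases):
    """Steps 1-5: xs[0] is the input; xs[-1] is the final output X(P,O)."""
    xs, ys = [x], []
    for w, b in zip(weights, biases):
        y = xs[-1] @ w.T + b              # Y(P,L,J) = X(P,L-1) . W(J,L) + B(J,L)
        ys.append(y)
        xs.append(sigmoid(y))             # layer output vector X(P,L)
    return xs, ys

def backward(xs, ys, weights, targets):
    """Step 6: error vectors for the output layer and each hidden layer."""
    errors = [None] * len(weights)
    errors[-1] = (targets - xs[-1]) * sigmoid_deriv(ys[-1])
    for l in range(len(weights) - 2, -1, -1):
        # Sum over nodes K of the next layer: E(P,L+1,K) * W(K,L+1,J)
        errors[l] = sigmoid_deriv(ys[l]) * (errors[l + 1] @ weights[l + 1])
    return errors

def rms_error(targets, outputs):
    """Step 9: squared error averaged over all patterns and output nodes."""
    return np.sqrt(np.mean((targets - outputs) ** 2))

def train(x, targets, sizes, eta=0.2, alpha=0.5, iterations=3000, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed initialization range; one weight row per receiving node.
    weights = [rng.uniform(-0.5, 0.5, (n_out, n_in))
               for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n_out) for n_out in sizes[1:]]
    dw_prev = [np.zeros_like(w) for w in weights]
    db_prev = [np.zeros_like(b) for b in biases]
    history = []
    for _ in range(iterations):           # steps 8 and 11: one pass per iteration
        xs, ys = forward(x, weights, biases)
        history.append(rms_error(targets, xs[-1]))
        errors = backward(xs, ys, weights, targets)
        for l in range(len(weights)):
            # Step 7: learning term summed over patterns, plus momentum term.
            dw = eta * errors[l].T @ xs[l] + alpha * dw_prev[l]
            db = eta * errors[l].sum(axis=0) + alpha * db_prev[l]  # assumed bias rule
            weights[l] += dw
            biases[l] += db
            dw_prev[l], db_prev[l] = dw, db
    return weights, biases, history

# Toy data: logical OR, already normalized between 0 and 1.
x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([[0.0], [1.0], [1.0], [1.0]])
weights, biases, history = train(x, t, sizes=[2, 3, 1])
recall = forward(x, weights, biases)[0][-1]   # recall mode: forward pass only
```

Here `history` holds the RMS error recorded at the start of each iteration, so plotting it gives a rough learning curve. Because the patterns are processed together as one array, the sum over patterns in the weight update becomes a single matrix product.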

The FAST-Prop method used in Qnet differs in its weight update algorithm. The modified form of the weight update equation is:

W(J,L)[T+1] = W(J,L)[T] + (eta) * E(P,L,J) * (X(P,L-1) + (fp) * E(P,L-1)) + (alpha) * (W(J,L)[T] - W(J,L)[T-1])

where (fp) is the FAST-Prop coefficient.
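
The following Python sketch shows one way to read the FAST-Prop equation for a single node and a single pattern. It is only an interpretation: the function name and sample values are hypothetical, NumPy is assumed, and the error vector of the preceding layer, E(P,L-1), is taken as already available (the text does not say how it is defined when the preceding layer is the input layer).

```python
import numpy as np

def fastprop_update(w, w_prev, e_node, x_prev, e_prev, eta, alpha, fp):
    """Return the new weight vector of one node for one pattern.

    w, w_prev : the node's weight vector at iterations T and T-1
    e_node    : scalar error E(P,L,J) of the node
    x_prev    : input vector X(P,L-1) from the preceding layer
    e_prev    : error vector E(P,L-1) of the preceding layer (assumed given)
    """
    return w + eta * e_node * (x_prev + fp * e_prev) + alpha * (w - w_prev)

# Hypothetical sample values for a node with three incoming connections.
w = np.array([0.1, -0.2, 0.3])
w_prev = np.array([0.0, -0.1, 0.3])
x_prev = np.array([0.5, 0.9, 0.1])
e_prev = np.array([0.02, -0.01, 0.03])

standard = fastprop_update(w, w_prev, 0.05, x_prev, e_prev,
                           eta=0.1, alpha=0.8, fp=0.0)
fast = fastprop_update(w, w_prev, 0.05, x_prev, e_prev,
                       eta=0.1, alpha=0.8, fp=0.5)
```

With the FAST-Prop coefficient set to 0, the function reduces to the learning and momentum terms alone for that pattern; a nonzero coefficient shifts each weight change by an amount proportional to the preceding layer's error.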