Qnet 2000

Training Analysis Tools

Qnet provides many tools for analyzing the training process, including tools for overtraining analysis, for interrogating the quality of the model and for examining essential model development issues. The tools available to graph and analyze models during training include:

RMS Error History plot: The training error history can be monitored to determine the rate of network learning and to determine when learning has reached its maximum level. Other interesting information can be derived from the training and test set error history plots. It is common to find long “plateaus” in the error level where no significant learning takes place. This behavior is particularly common when multiple hidden layers are being employed and indicates that the network is trying to “figure out” certain input/output relationships at a given layer. Plateaus are often followed by steep descents in the training error, yielding accelerated periods of learning. It is important that “plateau” conditions are not mistaken for a converged network. Another common feature in error history plots is minor oscillations representing training instabilities. Training instabilities are quickly damped by Qnet's Learn Rate Control (LRC) feature, if active. When oscillations occur frequently, consider augmenting LRC by setting a maximum learn rate (eta) that LRC should not exceed.

Correlation History plot: The correlation coefficient measures how well the network predictions trend with the targets in the training set. The range of the correlation coefficient is from -1 to 1. The closer the coefficient is to 1, the more accurate the predictions. The closer to 0 (or below), the less accurate and more random the predictions become. This plot often trends opposite the RMS error; the correlation increases as RMS error decreases. It can, however, be more informative because it uses an absolute scale to better quantify the agreement (1 is perfect linear correlation, 0 is random). The extreme targets and predictions are the most heavily weighted in the calculation of the correlation coefficient. In models where accuracy should be optimized for extreme predictions (some financial forecasting models for example), the use of correlation may be desired over RMS error as the best tool for determining optimal training conditions.
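As a rough illustration of the two measures, the sketch below computes an RMS error and a Pearson correlation coefficient for a small, made-up set of targets and predictions. The function names and sample data are hypothetical and are not part of Qnet; Qnet's own calculation may differ in detail.

```python
import math

def rms_error(targets, predictions):
    """Root-mean-square difference between targets and predictions."""
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(targets, predictions)) / n)

def correlation(targets, predictions):
    """Pearson correlation coefficient (range -1 to 1)."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(var_t * var_p)

# Illustrative data: every prediction is offset from its target by 0.1.
targets = [0.1, 0.4, 0.5, 0.9]
shifted = [t + 0.1 for t in targets]

print(rms_error(targets, shifted))    # nonzero: the offset counts as error
print(correlation(targets, shifted))  # essentially 1: the trend is still perfect
```

In this example the predictions trend perfectly with the targets but carry a constant offset, so the correlation stays near 1 while the RMS error does not reach 0. This is one way the two histories can give different indications.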

Tolerance History plot: The tolerance history produces the percentage of training set predictions that fall within the user’s pre-defined tolerance of the targets. This can be beneficial when a pre-defined accuracy must be achieved by the model. The history also helps to better quantify when additional training is not producing tangible or measurable improvements in the training set agreement.
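The tolerance percentage can be sketched in a few lines. This is only an illustration: the function name and data are hypothetical, and the tolerance is assumed here to be an absolute difference on normalized values, which may not match how Qnet defines it.

```python
def percent_within_tolerance(targets, predictions, tolerance):
    """Percentage of predictions within +/- tolerance of their targets."""
    hits = sum(1 for t, p in zip(targets, predictions) if abs(p - t) <= tolerance)
    return 100.0 * hits / len(targets)

# Illustrative data: the absolute errors are 0.02, 0.10, 0.01 and 0.15.
targets = [0.20, 0.50, 0.80, 0.30]
predictions = [0.22, 0.60, 0.79, 0.45]

print(percent_within_tolerance(targets, predictions, tolerance=0.05))  # 50.0
```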

Test RMS Error plot: This plot is used for overtraining analysis and helps determine how well the network generalizes learned information. This plot depicts how well the network predicts cases not used in the training process.

Test Correlation History plot: The correlation coefficient measures how well the network predictions trend with the targets for the cases outside the training set. The correlation ranges from -1 to 1. As with the RMS error of the test set, the correlation history can be used to determine overtraining. While the correlation and RMS errors often trend together, they can give slightly different indications as to the onset of overtraining.

Test Tolerance History plot: The tolerance history produces the percentage of test set predictions that fall within the user’s pre-defined tolerance of the targets. As with the test set RMS error and the correlation coefficient, this percentage can also be used to help determine where generalized learning is optimal and overtraining has begun.
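One common way to read the test set histories is to look for the point where the test error stops improving even though the training error keeps falling. The sketch below applies that reading to made-up error histories. The function name and numbers are hypothetical, and this is not a description of any automatic check inside Qnet.

```python
def best_test_point(test_rms_history):
    """Index of the lowest test-set RMS error in the recorded history."""
    return min(range(len(test_rms_history)), key=test_rms_history.__getitem__)

# Illustrative histories, one value per recorded training interval.
train_rms = [0.30, 0.22, 0.15, 0.11, 0.08, 0.06, 0.05]
test_rms = [0.32, 0.25, 0.19, 0.17, 0.18, 0.21, 0.24]

best = best_test_point(test_rms)
print(best, test_rms[best])  # 3 0.17
```

After index 3 the training error continues to fall while the test error rises, which suggests that further training is fitting the training set at the expense of generalization.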

Learning Rate History plot: The learn rate (eta) history can be viewed to determine how Qnet's Learn Rate Control has adjusted eta during training. Use this option to determine what eta limits should be applied to Qnet's LRC option to help avoid training instabilities.

Targets/Network Outputs plots: Qnet provides three separate plot formats for viewing training targets and network outputs. The quickest overview plot is the "Targets vs. Network Outputs" plot. This plot displays network predictions vs. the targets for all output nodes on a single plot (all data remains normalized so that all output nodes will share a common scale). The closer the points fall on the plotted “X=Y” line, the better the overall agreement for the model. Training and test set points are plotted with different symbols. Network predictions and targets may also be compared separately for each output node. In this case, "Targets/Net Outputs vs. Pattern Sequence", the information is plotted versus the input sequence number. Up to three separate curves will be shown: the training targets, the training set network responses and the test set network responses. The test and training set predictions can be distinguished by different colored curves or symbols. This plot format offers a detailed view of the agreement between output predictions and the training targets. A final format shows the error between the network predictions and targets plotted versus the training pattern sequence number (separate plot for each output node).

Input Node plots: The input node data is displayed versus the training pattern sequence number (separate plot for each input node). This plot format can be used to scan the input node sets for possible data anomalies. It is recommended that input node plots be reviewed at some point during training to scan the inputs for bad data.

Input Interrogator: After a network is near its fully trained state, it is often useful to determine what inputs are important to a network’s output response. The Input Interrogator will plot (or list under the Info menu), for each output node, the relative importance of each input on that particular output. Sensitivities are determined by cycling each input for all training patterns and computing the effect on the network’s output response. This plot helps to determine what the key inputs for the model are and which are not effective in formulating output predictions. Note: this sensitivity study assumes that each input value is independent of all other inputs. For models where this is not true, some caution should be used when interpreting the results.
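The idea of cycling each input and measuring the effect on the output can be sketched as follows. This is a minimal illustration under stated assumptions, not Qnet's algorithm: the sweep values, the use of output spread as the sensitivity measure, the single-output toy model and all names are hypothetical.

```python
def input_sensitivities(model, patterns, sweep=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Relative importance (percent) of each input for a single-output model.

    For every pattern, each input in turn is cycled through the sweep values
    while the other inputs stay fixed. The spread in the output is summed
    over all patterns and then scaled so the inputs total 100.
    """
    n_inputs = len(patterns[0])
    spreads = [0.0] * n_inputs
    for pattern in patterns:
        for i in range(n_inputs):
            outputs = []
            for value in sweep:
                trial = list(pattern)
                trial[i] = value  # vary one input, hold the others fixed
                outputs.append(model(trial))
            spreads[i] += max(outputs) - min(outputs)
    total = sum(spreads)
    if total == 0:
        return [0.0] * n_inputs  # the output ignores every input
    return [100.0 * s / total for s in spreads]

def toy_model(x):
    """Stand-in for a trained network: the third input has no effect."""
    return 0.8 * x[0] + 0.2 * x[1] + 0.0 * x[2]

patterns = [[0.1, 0.9, 0.5], [0.6, 0.2, 0.3], [0.4, 0.4, 0.8]]
sensitivities = input_sensitivities(toy_model, patterns)
print(sensitivities)  # roughly 80, 20 and 0 percent
```

Because each input is varied while the others are held fixed, the sketch shares the independence assumption noted above: correlated inputs can make the percentages misleading.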

Input Color Contours: Choose two inputs and visualize their contribution in formulating an output. Full color contours are produced depicting the influence of the inputs on the selected output.

Node Analyzer: The Node Analyzer plot (or list under the Info menu) helps determine how the hidden nodes are being utilized by the network. For networks that are over designed in the hidden layer structure, many nodes may contribute little or nothing to the output response. For each hidden layer in the network, a plot will be generated comparing the relative strengths of all output connections for that layer. The plot shows the nodes’ percent contribution to that layer’s output signals over all training patterns. If there are many nodes in a layer that are showing limited contributions, then that layer may be specified with too many nodes. Likewise, if all nodes show strong contributions, it is possible adding extra nodes will help the model.
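A percent-contribution figure of this kind could be computed along the following lines. The exact measure Qnet uses is not specified here, so the sketch assumes one plausible definition: the magnitude of each node's outgoing signals (activation times connection weight), summed over all training patterns. All names and values are hypothetical.

```python
def node_contributions(activations, out_weights):
    """Percent contribution of each node in one hidden layer.

    activations: one list of node outputs per training pattern.
    out_weights: out_weights[j][k] is the weight from node j of this layer
                 to node k of the next layer.
    """
    n_nodes = len(out_weights)
    strength = [0.0] * n_nodes
    for pattern_acts in activations:
        for j in range(n_nodes):
            # Assumed measure: total magnitude of the signals node j sends on.
            strength[j] += sum(abs(pattern_acts[j] * w) for w in out_weights[j])
    total = sum(strength)
    return [100.0 * s / total for s in strength]

# Illustrative layer of three nodes feeding two nodes, over two patterns.
activations = [[0.9, 0.5, 0.1], [0.7, 0.5, 0.1]]
out_weights = [[1.0, -1.0], [0.5, 0.5], [0.0, 0.0]]

contributions = node_contributions(activations, out_weights)
print(contributions)  # the third node contributes 0 percent
```

Here the third node has zero-weight output connections and therefore contributes nothing, the kind of result that suggests a layer has more nodes than it needs.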

Network Information: View detailed information about a model’s construction and training state. This information is particularly useful if you are sorting through several network designs. Printing this record before terminating training provides details that are pertinent when comparing the current model with other model constructions. Having a detailed record for each model attempted offers a quick way of comparing results between models.

Statistics: The statistics option is used during training to statistically compare network predictions with training targets. For each output node and for both training and test sets, the standard deviation, error bias, maximum error and correlation coefficient are computed and displayed. The standard deviation between the predictions and targets assumes a Gaussian distribution exists in the prediction error. The bias value measures any shift between predictions and targets. This indicates whether the prediction is systematically high or low. If the error in the predictions is a true Gaussian distribution, then the bias value should approach zero. The correlation coefficient is a statistical measure of how well the predictions agree with the targets. A value of 1 indicates perfect correlation. Values close to 0 (or below for this analysis) indicate that little or no correlation exists between network predictions and targets. As with all statistical computations, validity of the calculations increases with sample size.
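The bias, standard deviation and maximum error for one output node can be sketched as below. The names and data are hypothetical. The sketch assumes the bias is the mean of (prediction minus target) and that the standard deviation is measured about that bias; whether Qnet measures spread about the bias or about zero is not stated here.

```python
import math

def error_statistics(targets, predictions):
    """Bias, standard deviation and maximum error for one output node."""
    errors = [p - t for t, p in zip(targets, predictions)]
    n = len(errors)
    bias = sum(errors) / n  # positive: predictions run high
    std_dev = math.sqrt(sum((e - bias) ** 2 for e in errors) / n)
    max_error = max(abs(e) for e in errors)
    return {"bias": bias, "std_dev": std_dev, "max_error": max_error}

# Illustrative data: the errors are 0.05, 0.03, 0.06 and 0.06.
targets = [0.20, 0.40, 0.60, 0.80]
predictions = [0.25, 0.43, 0.66, 0.86]

stats = error_statistics(targets, predictions)
print(stats)
```

In this example every prediction is above its target, so the bias is positive (about 0.05), indicating systematically high predictions.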

Tolerance Checking: The tolerance checking option is useful in determining the number of network predictions that fall within the selected tolerance from the training targets. This option displays the actual "correct" and "wrong" counts for the tolerance percentage tracked during training.

Threshold Checking: The threshold checking option is beneficial for analyzing network models that generate an up or down prediction (for example, a financial model that is predicting percent gains and losses). A threshold value is specified and only network predictions that exceed the threshold (+ or -) are counted “Correct” or “Wrong”. “Correct” and “Wrong” are determined by the direction of change of the actual target. If both the prediction and target move in the same direction (i.e. have the same sign) and the prediction exceeds the threshold, a “Correct” case is counted. “Wrong” cases occur when the predicted direction is not correct. If the network prediction does not exceed the threshold, the case is ignored. By analyzing the results for several threshold values, one can determine the point at which the network model begins to yield reliable predictions. For models that do not have this type of output format, this option provides no useful information.
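The counting rule described above can be sketched as follows. The function name and data are hypothetical, and the handling of a target of exactly zero (counted as “Wrong” here) is an assumption.

```python
def threshold_check(targets, predictions, threshold):
    """Count (correct, wrong, ignored) cases for signed up/down predictions."""
    correct = wrong = ignored = 0
    for t, p in zip(targets, predictions):
        if abs(p) <= threshold:
            ignored += 1   # prediction does not exceed the threshold
        elif p * t > 0:
            correct += 1   # prediction and target have the same sign
        else:
            wrong += 1     # opposite signs (or a target of exactly zero)
    return correct, wrong, ignored

# Illustrative percent gains and losses.
targets = [1.5, -0.8, 0.3, -2.0, 0.9]
predictions = [1.2, 0.6, 0.1, -1.7, -0.2]

print(threshold_check(targets, predictions, threshold=0.5))  # (2, 1, 2)
```

Running the check for several threshold values and comparing the counts shows how strong a prediction must be before its direction becomes reliable.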

Divergence Check: If a divergence occurs during training, the diverged output node will produce all 0’s or 1’s for sigmoid or hyperbolic tangent transfer functions at the output layer and all 0’s for Gaussian or hyperbolic secant transfer functions (normalized responses). This check scans the network output nodes for this behavior. The first output node found exhibiting this characteristic will be indicated. Additional plotting and analysis of the node will verify the diverged condition.
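A scan of this kind might look like the sketch below, which reports the first output node whose normalized responses are stuck at 0 or at 1 for every pattern. The function name, the `saturation_levels` parameter and the `eps` comparison tolerance are hypothetical and are not taken from Qnet.

```python
def first_diverged_node(outputs, saturation_levels=(0.0, 1.0), eps=1e-6):
    """Index of the first output node stuck at a saturation level, else None.

    outputs: one list of normalized output-node responses per pattern.
    """
    n_nodes = len(outputs[0])
    for node in range(n_nodes):
        column = [row[node] for row in outputs]
        for level in saturation_levels:
            if all(abs(v - level) <= eps for v in column):
                return node
    return None

# Illustrative responses: the second output node returns 1 for every pattern.
outputs = [[0.31, 1.0], [0.72, 1.0], [0.55, 1.0]]

print(first_diverged_node(outputs))  # 1
```

For Gaussian or hyperbolic secant output nodes, where only the all-0 condition applies, the same sketch can be called with `saturation_levels=(0.0,)`.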

Qnet’s rich set of real-time analysis tools allows the user to perform thorough and detailed model analysis. All tools are menu selectable during training, making them accessible at any time. Using these tools during the training process will improve your ability to train and identify good network models.