Qnet 2000

Learning Rates and Learn Rate Control

The backpropagation training paradigm uses two controllable factors that affect the algorithm’s rate of learning. To optimize the rate at which a network learns, these factors must be set properly and, where needed, adjusted during training. The two factors are the learning rate coefficient, eta, and the momentum factor, alpha. The valid range for both eta and alpha is between 0 and 1. Higher values adjust node weights in larger increments, increasing the rate at which the network attempts to converge, while lower values decrease the rate of learning. Just as there are limits to how fast a brain can learn ideas and concepts, there are also limits to the rate at which a network can learn. If a network is forced to learn too fast, instabilities develop that can lead to training divergence.
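To make the roles of eta and alpha concrete, here is a minimal sketch of the commonly used backpropagation weight update with momentum, delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1). This is the textbook form and is assumed here; it is not taken from Qnet's internals. The function names (`momentum_update`, `train_quadratic`) and the one-weight quadratic error surface E(w) = 0.5 * curvature * w^2 are hypothetical, chosen only to show that a small eta converges while a larger eta, still inside the 0 to 1 range, can diverge.

```python
def momentum_update(weights, grads, prev_deltas, eta, alpha):
    """Return (new_weights, deltas) for one weight update with momentum.

    Assumed rule: delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1)
    """
    if not (0.0 <= eta <= 1.0 and 0.0 <= alpha <= 1.0):
        raise ValueError("eta and alpha must lie in [0, 1]")
    deltas = [-eta * g + alpha * d for g, d in zip(grads, prev_deltas)]
    new_weights = [w + d for w, d in zip(weights, deltas)]
    return new_weights, deltas


def train_quadratic(eta, alpha=0.0, curvature=4.0, w0=1.0, steps=50):
    """Minimize the toy error E(w) = 0.5 * curvature * w**2; return the final weight."""
    w, d = [w0], [0.0]
    for _ in range(steps):
        grad = [curvature * w[0]]  # dE/dw for the toy error surface
        w, d = momentum_update(w, grad, d, eta, alpha)
    return w[0]


print(train_quadratic(eta=0.1))  # shrinks toward 0: stable learning
print(train_quadratic(eta=0.6))  # magnitude grows each step: training diverges
```

On this toy surface the stability limit without momentum is eta < 2 / curvature = 0.5, which is why eta = 0.6 diverges even though it is a "valid" setting; real networks have their own, unknown limit.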

The learn rate coefficient can be controlled manually during training, or Qnet can control it automatically using its Learn Rate Control (LRC) feature. LRC drives eta higher or lower in a systematic fashion depending on the current learning activity. If the network appears to be learning at a relatively slow rate, eta is driven up quickly. Conversely, if the network is learning at a fast pace, Qnet will hold eta constant or even lower it to avoid instabilities. If at any time the network shows signs of instability (seen as oscillations in the training error), eta is lowered to damp the instabilities. Damping instabilities is critical to preventing training divergence. The LRC feature can be turned on and off interactively during training, and it can be activated at setup time by specifying the iteration at which LRC will start. For new networks, wait several hundred iterations before turning learn rate control on.
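The exact LRC rules are not given in this section, so the following is only a hypothetical controller in the same spirit: it stays inactive until a start iteration, raises eta when the relative improvement in training error is small, lowers it sharply when the error rises (treated as an oscillation), and otherwise holds it. The class name, thresholds, factors, and the `eta_min`/`eta_max` clamp are all illustrative assumptions, not Qnet's algorithm.

```python
class LearnRateController:
    """Hypothetical learn rate controller loosely modeled on the LRC description.

    All rules and constants are illustrative assumptions.
    """

    def __init__(self, eta, start_iteration=500, eta_min=0.001, eta_max=1.0,
                 slow_threshold=0.001, up_factor=1.1, down_factor=0.5):
        self.eta = eta
        self.start_iteration = start_iteration  # LRC stays off before this iteration
        self.eta_min = eta_min
        self.eta_max = eta_max
        self.slow_threshold = slow_threshold  # relative improvement below this is "slow"
        self.up_factor = up_factor
        self.down_factor = down_factor
        self.prev_error = None

    def update(self, iteration, error):
        """Adjust eta from the latest training error and return the new value."""
        prev, self.prev_error = self.prev_error, error
        if iteration < self.start_iteration or prev is None:
            return self.eta  # not active yet: eta stays where the user set it
        if error > prev:
            self.eta *= self.down_factor  # error went up: damp a possible oscillation
        elif prev > 0 and (prev - error) / prev < self.slow_threshold:
            self.eta *= self.up_factor  # learning slowly: drive eta up
        # otherwise learning is fast: hold eta constant
        self.eta = min(max(self.eta, self.eta_min), self.eta_max)
        return self.eta
```

A usage pattern would be to call `update(iteration, error)` once per training iteration and feed the returned eta into the next weight update. Lowering eta by a large factor but raising it by a small one is a deliberate asymmetry: an unstable step costs far more than a slightly slow one.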

The LRC system also interacts with the eta min and max values. LRC analyzes the stability ceiling during training and adjusts the eta channel as necessary to promote stable training. If at any time you wish to take manual control of eta, simply toggle LRC off and set learn rates as desired. It is also wise to save your network before making any change to the learn rate (or to learn rate control), so that no training is lost should a divergence occur.
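The advice to save the network before changing the learn rate amounts to a checkpoint-and-rollback pattern. The sketch below illustrates that pattern outside Qnet, assuming hypothetical `grad_fn` and `error_fn` callables and an arbitrary "error grew tenfold" divergence test; in Qnet itself the save is done through the program rather than with code like this.

```python
import copy


def try_new_eta(weights, grad_fn, error_fn, eta, steps=20, blowup=10.0):
    """Train with a new eta, restoring a saved copy of the weights if training diverges.

    Hypothetical helper: grad_fn(weights) returns dE/dw for each weight and
    error_fn(weights) returns the training error. Returns (weights, succeeded).
    """
    checkpoint = copy.deepcopy(weights)  # "save the network" before changing eta
    start_error = error_fn(weights)
    for _ in range(steps):
        weights = [w - eta * g for w, g in zip(weights, grad_fn(weights))]
        if error_fn(weights) > blowup * start_error:
            return checkpoint, False  # divergence detected: roll back to the saved copy
    return weights, True


# Toy single-weight error surface E(w) = 2 * w**2, so dE/dw = 4 * w.
grad = lambda w: [4.0 * w[0]]
err = lambda w: 2.0 * w[0] ** 2
print(try_new_eta([1.0], grad, err, eta=0.6))  # diverges, so the saved weights come back
print(try_new_eta([1.0], grad, err, eta=0.1))  # stable, so training progress is kept
```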

LRC concerns itself only with control of eta. Usually, little or no interaction is required with the momentum factor. The momentum factor damps high-frequency weight changes and helps with overall algorithm stability, while still promoting fast learning. For the majority of networks, alpha can be set in the 0.8 to 0.9 range and left there. However, there is no definitive rule regarding alpha. Some networks may train better with lower alpha values, and some train well with no alpha term at all (alpha set to 0). Most neural modelers prefer higher momentum values, since the damping effect usually improves training characteristics. If training problems occur with a given alpha value, it may be useful to experiment with different values. Alpha can be changed interactively at any time during training with Qnet.
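The claim that momentum damps high-frequency weight changes while still promoting fast learning can be checked with the usual momentum recurrence delta_w(t) = -eta * g(t) + alpha * delta_w(t-1) (assumed here, as Qnet's exact formula is not given). The hypothetical `steady_step` function feeds either a constant gradient or one that flips sign every iteration and reports the final weight change.

```python
def steady_step(gradient, eta=0.1, alpha=0.9, steps=300):
    """Return the last weight change produced by momentum for a gradient sequence.

    gradient(t) gives the (hypothetical) error gradient at iteration t.
    """
    delta = 0.0
    for t in range(steps):
        delta = -eta * gradient(t) + alpha * delta  # assumed momentum recurrence
    return delta


constant = lambda t: 1.0                           # steady, low-frequency signal
alternating = lambda t: 1.0 if t % 2 == 0 else -1.0  # sign flips every step

print(steady_step(constant))     # approaches -eta / (1 - alpha) = -1.0
print(steady_step(alternating))  # magnitude approaches eta / (1 + alpha), about 0.053
```

With alpha = 0.9, a consistent gradient direction produces steps ten times larger than eta alone would (-eta / (1 - alpha)), while a sign-flipping gradient produces steps about half as large (eta / (1 + alpha)). That is the damping and acceleration described above; with alpha = 0 both cases give steps of size eta.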

Note: For many network designs and data models, LRC is an effective tool that accelerates learning and prevents divergence. If a model exhibits poor learning characteristics with LRC active (e.g., training divergences or instabilities), simply turn LRC off (Options menu of the training window). Take manual control of eta and set it to a value low enough for stable, sustained learning. In addition, when not all training cases are used in each weight update cycle (Patterns per Weight Update Cycle), LRC should be off. Error descent in this case can be rather noisy, and varying the learn rate can adversely affect training characteristics.
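Why partial updates and automatic learn rate control mix poorly can be seen on a tiny example. The sketch fits a one-weight linear model y = w * x to four made-up data points and records the error seen at each weight update. The data, function names, and the "count how often the error rose" check are hypothetical; the point is that any controller treating a rising error as instability, like the sketch controller earlier in this section, would fire repeatedly on the noisy partial-update error even though nothing is diverging.

```python
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.5, 3.5, 6.5, 7.5]  # hypothetical data, roughly y = 2x plus noise


def train_errors(patterns_per_update, eta=0.05, cycles=20):
    """Fit y = w * x by gradient descent; record the error seen at each weight update."""
    w, errors = 0.0, []
    n = len(X)
    for _ in range(cycles):
        for start in range(0, n, patterns_per_update):
            batch = range(start, min(start + patterns_per_update, n))
            size = len(batch)
            err = sum(0.5 * (w * X[i] - Y[i]) ** 2 for i in batch) / size
            grad = sum((w * X[i] - Y[i]) * X[i] for i in batch) / size
            errors.append(err)  # error on the patterns used for this update
            w -= eta * grad
    return errors


def count_rises(errors):
    """Count how often the recorded error went up from one update to the next."""
    return sum(1 for a, b in zip(errors, errors[1:]) if b > a)


print(count_rises(train_errors(4)))  # all patterns per update: error falls every time
print(count_rises(train_errors(1)))  # one pattern per update: error rises often
```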