Qnet 2000

Hidden Processing Layers

In the loan application example, it is clear that determining the number of input and output nodes is trivial once the data model has been formulated. Choosing the number of hidden layers and the number of hidden nodes in each layer is not so trivial. The construction of the hidden processing structure of the network is arbitrary. While there is normally a large envelope of hidden layer constructions that yield similar results, the importance of selecting an adequate hidden structure should not be underestimated. Many factors play a part in determining what the optimal configuration should be. These factors include the quantity of training patterns, the number of input and output nodes, and the relationships between the input and output data.

It may often be tempting to construct a network with many hidden layers and processing units, falling into “the bigger the brain, the better the model” trap. This philosophy can easily result in a poorly performing model. When a network’s hidden processing structure is too large and complex for the model being developed, the network may tend to memorize input and output sets rather than learn the relationships between them. Such a network may train well but test poorly when presented with inputs outside the training set. In addition, network training time increases significantly when a network is unnecessarily large and complex. The concept of memorization learning versus cognizant or generalized learning is explained in detail in chapter 9.

Generally, it is best to start with simple network designs that use relatively few hidden layers and processing nodes. If the degree of learning is not sufficient, or certain trends and relationships cannot be grasped, the network complexity can be increased in an attempt to improve learning. A plausible starting point for the loan application model would be 2 hidden layers with 3 to 4 nodes per layer. If this design does not train sufficiently, the size and complexity of the hidden structure can be increased. For this problem, memorization is unlikely because of the relatively large number of training patterns (5000).
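
As a rough illustration of why memorization is unlikely here, the sketch below counts the connection weights in the suggested starting designs and compares that count with the 5000 training patterns. It is not part of Qnet. The input and output node counts (8 and 1) are hypothetical placeholders, and the count assumes a fully connected feed-forward network with no bias terms.

```python
def count_weights(n_inputs, hidden_layers, n_outputs):
    """Count connection weights between adjacent layers.

    Assumes every node connects to every node in the next layer
    and ignores bias terms (both are assumptions of this sketch).
    """
    sizes = [n_inputs, *hidden_layers, n_outputs]
    return sum(a * b for a, b in zip(sizes, sizes[1:]))


N_INPUTS = 8       # hypothetical: not given in this section
N_OUTPUTS = 1      # hypothetical: not given in this section
N_PATTERNS = 5000  # number of training patterns stated in the text

# The suggested starting point: 2 hidden layers with 3 to 4 nodes each.
for nodes in (3, 4):
    weights = count_weights(N_INPUTS, [nodes, nodes], N_OUTPUTS)
    ratio = N_PATTERNS / weights
    print(f"2 hidden layers x {nodes} nodes: {weights} weights, "
          f"about {ratio:.0f} training patterns per weight")
```

Under these assumed node counts, the starting designs have only a few dozen adjustable weights against thousands of patterns, which is consistent with the point that such a small network has little capacity to memorize the training set.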

It has been demonstrated theoretically that for a given network design with multiple hidden layers, there will always exist a design with a single hidden layer that will learn at an equivalent level. However, in practice, it is usually better to employ multiple hidden layers for solving complex problems. To adequately model a complex problem, a single hidden layer design may require a substantial increase in the number of hidden nodes compared to a 3, 4 or 5 hidden layer construction. In simple terms, a single hidden layer design with 10 nodes may not learn and perform as well as a network with two hidden layers containing 5 nodes each. Multi-hidden layer networks tend to grasp complex concepts more easily than networks with one layer. One reason for this is that the multi-hidden layer construction creates an increased cross-factoring of information and relationships. Thus, a network’s learning ability is controlled by both the total number of hidden layers and the total number of hidden nodes.
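
One rough way to picture the "cross-factoring" described above is to compare the two 10-node designs by their connections and by the number of distinct input-to-output paths through the network. The sketch below does this for hypothetical input and output counts (8 and 1), assuming fully connected layers with no bias terms. The path count is only an illustration of how layers multiply interactions; it is not a measure of learning ability and not something Qnet reports.

```python
from math import prod


def layer_connections(n_inputs, hidden_layers, n_outputs):
    """Connections between each pair of adjacent layers (fully connected)."""
    sizes = [n_inputs, *hidden_layers, n_outputs]
    return [a * b for a, b in zip(sizes, sizes[1:])]


def input_output_paths(n_inputs, hidden_layers, n_outputs):
    """Number of distinct node-to-node routes from any input to any output."""
    return prod([n_inputs, *hidden_layers, n_outputs])


N_INPUTS, N_OUTPUTS = 8, 1  # hypothetical sizes chosen for illustration

# Both designs use 10 hidden nodes in total.
for hidden in ([10], [5, 5]):
    conns = layer_connections(N_INPUTS, hidden, N_OUTPUTS)
    paths = input_output_paths(N_INPUTS, hidden, N_OUTPUTS)
    print(f"hidden layers {hidden}: connections per stage {conns}, "
          f"total weights {sum(conns)}, input-to-output paths {paths}")
```

With these assumed sizes, the two-layer design has fewer weights (70 versus 90) but more input-to-output paths (200 versus 80), because every hidden-to-hidden connection recombines what the first layer produced.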

Qnet allows up to 8 hidden layers, although experience has shown that the vast majority of problems work fine with 4 or fewer. The number of nodes per layer in Qnet is limited only by available memory and by practical limits on processing speed, so expect the usable network size to be closely tied to your processor speed and memory capacity.
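
When hidden structures are planned outside Qnet (for example, in a script that lists candidate designs to try), the 8-layer limit can be checked up front. The following is a minimal sketch, not Qnet code: the function name is hypothetical, and the requirement of at least one hidden layer with at least one node is an assumption of the sketch.

```python
MAX_HIDDEN_LAYERS = 8  # limit stated in the text


def validate_hidden_structure(hidden_layers):
    """Check a planned design, given as a list of nodes per hidden layer.

    Hypothetical helper: raises ValueError if the design exceeds the
    8-hidden-layer limit or contains an empty layer.
    """
    layers = list(hidden_layers)
    if not 1 <= len(layers) <= MAX_HIDDEN_LAYERS:
        raise ValueError(
            f"expected 1 to {MAX_HIDDEN_LAYERS} hidden layers, got {len(layers)}"
        )
    if any(nodes < 1 for nodes in layers):
        raise ValueError("every hidden layer needs at least one node")
    return layers


# The starting design suggested for the loan application model passes.
print(validate_hidden_structure([4, 4]))

# A 9-layer design is rejected.
try:
    validate_hidden_structure([3] * 9)
except ValueError as err:
    print("rejected:", err)
```

The check deliberately places no upper bound on nodes per layer, since the text ties that limit to available memory and processing speed rather than to a fixed number.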

The design of the hidden processing structure is specified in the network design dialog.