Qnet 2000

Optical Character Recognition / Pattern Recognition

Construction

Layer     Transfer Function    Node Count
Input     Normalized/Linear    64
Hidden    Gaussian             10
Hidden    Tanh                 10
Hidden    Sech                 10
Output    Sigmoid              10
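
The layer structure above can be sketched as a plain forward pass. This is a minimal illustration, not Qnet's implementation: the formulas for the Gaussian and sech transfer functions are the common textbook forms (Qnet's exact definitions are not given here), bias terms are omitted, and the names init_network, forward and count_weights are hypothetical.

```python
import math
import random

# Common textbook forms; assumed, since the exact Qnet formulas are not stated.
def gaussian(x):
    return math.exp(-x * x)

def sech(x):
    return 1.0 / math.cosh(x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# 64 inputs, three hidden layers of 10 nodes, 10 outputs (from the table).
LAYER_SIZES = [64, 10, 10, 10, 10]
# One transfer function per non-input layer, in table order.
TRANSFER = [gaussian, math.tanh, sech, sigmoid]

def init_network(seed=0):
    """Return one weight matrix per layer pair, filled with small random values."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])
    ]

def forward(network, inputs):
    """Propagate 64 input values through the fully connected layers."""
    values = list(inputs)
    for weights, transfer in zip(network, TRANSFER):
        values = [transfer(sum(w * v for w, v in zip(row, values)))
                  for row in weights]
    return values

def count_weights():
    """Number of connection weights in the fully connected structure."""
    return sum(a * b for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:]))
```

With these layer sizes the fully connected structure has 64x10 + 10x10 + 10x10 + 10x10 = 940 connection weights, before any bias terms.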

Training

Training Data          57 cases
Test Data              11 cases
Convergence Time       Fast
Learn Rate Control     Off
Overtraining           No

This example is designed to illustrate the effectiveness of using Qnet and backpropagation neural networks for pattern recognition. One of the most difficult and intriguing pattern recognition problems is optical character recognition. Optical character recognition (OCR) applications have proliferated with the advent of digital documents (from PC scanners and fax machines). OCR software allows the user to turn graphical images of text into editable documents. Neural networks are often used in OCR applications for the character recognition task and have been shown to greatly improve the translation accuracy over less sophisticated methods.

We will set up a small example to show how a neural network can be designed to recognize characters. The numbers 0 through 9 will make up the character set for this model. A full-featured OCR program would be designed to recognize a large set of alphanumeric characters. The characters in this example will consist of 8x8 bitmap images. This means that a total of 64 bits will be used to draw the image of each number (8 bits across by 8 bits down). Several different images (or font types) will be used for each number so that we can teach our neural network a variety of possible number types. For example, there is more than one way to draw the number “4” and we want the neural model to successfully handle the different possibilities. The training set, therefore, will consist of multiple bitmap images of the numbers 0-9. The test set for this problem will consist of number images slightly different from the numbers in the training set. This will enable us to determine how well the network has learned to generalize the differences between each number. If the network can only recognize character images that exactly match the ones in the training set, the model would not be useful in an OCR application that must process many different and imperfect character images.

Some of the bitmaps for numbers 1 through 5 are shown on the following page. The test cases used for numbers 1 through 5 are also shown. For each number, 64 inputs are generated for our neural model. Every bit that is turned on in a character’s bitmap pattern has a value of 1 and each bit that is off has a value of 0. An input array of 64 1’s and 0’s will make up each character used in the training (and test) set. Consequently, the network design for this model must consist of 64 nodes in the input layer. The output layer has been designed with 10 output nodes—one node for each of the characters we wish to recognize. When a number is recognized from a set of inputs, the network will output a 1 at the appropriate output node. For this model, when the number 1 is recognized the first output node responds with a 1 and all other output nodes respond with a 0. The second output node responds with 1 when the number 2 is recognized and so on. The hidden structure has 3 layers containing 10 nodes each. Data normalization is not necessary for this model since all input node data and training targets use a binary representation. The network is fully connected and utilizes a hybrid transfer function structure.
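
The input and target encoding described above can be sketched as follows. The bitmap drawn here is an illustrative "1" and is not taken from Ocr.dat, and the helper names bitmap_to_inputs and target_for_node are hypothetical. Because the text states the node assignment only for the numbers 1 and 2, the target helper takes an output node number rather than a digit.

```python
# Illustrative 8x8 bitmap of the number 1 ('#' = bit on, '.' = bit off).
# This drawing is an assumption for the example, not a pattern from Ocr.dat.
ONE = [
    "...##...",
    "..###...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    ".######.",
]

def bitmap_to_inputs(rows):
    """Flatten an 8x8 bitmap, row by row, into 64 input values of 1 or 0."""
    if len(rows) != 8 or any(len(row) != 8 for row in rows):
        raise ValueError("expected 8 rows of 8 characters")
    return [1 if ch == "#" else 0 for row in rows for ch in row]

def target_for_node(node_number, n_outputs=10):
    """Build the training target: 1 at the given output node (1-based), 0 elsewhere."""
    if not 1 <= node_number <= n_outputs:
        raise ValueError("node number out of range")
    return [1 if i == node_number else 0 for i in range(1, n_outputs + 1)]

inputs = bitmap_to_inputs(ONE)   # 64 values for the input layer
target = target_for_node(1)      # the number 1 maps to the first output node
```

The row-by-row ordering of the 64 inputs is also an assumption; any fixed ordering works as long as the training and test sets use the same one.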

The training set is contained in the file Ocr.dat. The file contains 68 total patterns (58 used for training and 10 used for the test set). Data columns 2 through 11 contain the training targets and 12 through 75 contain the input node bitmap data. The Qnet network files OptCharRec.net (the untrained network) and OptCharRecTrained.net (the trained network) are available to run. Use NetGraph to visually analyze the quality of agreement between the model’s output response and the training targets.
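
For readers who want to inspect the data outside Qnet, the column layout described above can be sketched as below. The sketch assumes each pattern is one line of whitespace-separated numeric fields, which is not confirmed here; the content of column 1 is not described, so it is skipped. The names split_pattern and parse_line are hypothetical.

```python
def split_pattern(fields):
    """Split one record into (targets, inputs) using the 1-based column layout:
    columns 2-11 are the training targets, columns 12-75 are the bitmap inputs."""
    if len(fields) < 75:
        raise ValueError("expected at least 75 columns, got %d" % len(fields))
    targets = [float(f) for f in fields[1:11]]   # columns 2 through 11
    inputs = [float(f) for f in fields[11:75]]   # columns 12 through 75
    return targets, inputs

def parse_line(line):
    """Parse one text line, assuming whitespace-separated fields."""
    return split_pattern(line.split())
```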

The training results indicate that the Qnet neural model easily learned to correctly classify the bitmap images of the training set. All of the 10 test cases were also correctly recognized by the network. This indicates that the learning was generalized enough that it could classify non-learned, similar case types to an excellent degree of accuracy.
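
One simple way to score such results is to treat the output node with the largest response as the recognized character and compare it with the node marked 1 in the target. This winner-take-all rule is an assumption for illustration; how Qnet or NetGraph scores agreement is not described here, and the function names are hypothetical.

```python
def winning_node(outputs):
    """Return the 1-based number of the output node with the largest response."""
    return max(range(len(outputs)), key=lambda i: outputs[i]) + 1

def count_correct(output_rows, target_rows):
    """Count the cases whose winning output node matches the target node."""
    return sum(
        winning_node(outputs) == winning_node(targets)
        for outputs, targets in zip(output_rows, target_rows)
    )
```

Network outputs are rarely exactly 1 or 0, so a response such as 0.93 at the correct node and small values elsewhere still counts as a correct recognition under this rule.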

To build a character recognition model for a full-featured OCR application, the neural net model must be significantly more sophisticated than the sample shown here. There would likely be 100 or more output nodes to properly classify most of the common characters. Advanced capabilities like font type detection can be added. The 8x8 bitmap pattern used to represent a character in this example may be too small for resolving full character sets. More sophisticated methods of handling input bitmaps, such as packing 1 or 2 bytes of bitmap data into each input node, should be considered to reduce network size and increase efficiency. If 10 to 20 different font types were used in training, along with imperfections in these fonts (e.g., slightly rotated or non-centered characters), we would likely have a training set with many thousands of patterns. To adequately learn all this information, a large, complex hidden structure would be necessary. While a full OCR example is beyond the scope of our sample problems here, it is completely within the capabilities of Qnet to handle such modeling tasks.
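
The idea of carrying one byte of bitmap data per input node can be sketched as below. This is one possible reading of that suggestion, not a documented Qnet feature: each 8-bit row of the bitmap is packed into a single value from 0 to 255, so an 8x8 character needs 8 input nodes instead of 64. The function name and the most-significant-bit-first ordering are assumptions.

```python
def pack_rows_to_bytes(bits):
    """Pack 64 bitmap bits (row by row) into 8 byte values, one per row.
    The leftmost bit of each row becomes the most significant bit."""
    if len(bits) != 64:
        raise ValueError("expected 64 bits")
    packed = []
    for row_start in range(0, 64, 8):
        value = 0
        for bit in bits[row_start:row_start + 8]:
            value = (value << 1) | bit
        packed.append(value)
    return packed
```

Unlike the binary inputs used in this example, packed values span 0 to 255, so they would need to be normalized before being presented to the network.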