Qnet 2000
Optical Character Recognition / Pattern Recognition
This example is designed to illustrate the effectiveness of using Qnet and
backpropagation neural networks for pattern recognition. One of the most difficult and
intriguing pattern recognition problems is optical character recognition. Optical
character recognition (OCR) applications have proliferated greatly with the advent of digital
documents (PC scanners and faxes). OCR software allows the user to turn graphical images
of text into editable documents. Neural networks are often used in OCR applications for
the character recognition task and have been shown to greatly improve the translation
accuracy over less sophisticated methods.
We will set up a small example to show how a neural network can be designed to recognize
characters. The numbers 0 through 9 will make up the character set for this model. A
full-featured OCR program would be designed to recognize a large set of alphanumeric
characters. The characters in this example will consist of 8x8 bitmap images. This means
that a total of 64 bits will be used to draw the image of each number (8 bits across by 8
bits down). Several different images (or font types) will be used for each number so that
we can teach our neural network a variety of possible number types. For example, there is
more than one way to draw the number 4 and we want the neural model to
successfully handle the different possibilities. The training set, therefore, will consist
of multiple bitmap images of the numbers 0-9. The test set for this problem will consist
of number images slightly different from the numbers in the training set. This will enable
us to determine how well the network has generalized what distinguishes one number from
another. If the network could only recognize character images that exactly match the
ones in the training set, the model would not be useful in an OCR application that must
process many different and imperfect character images.
Some of the bitmaps for numbers 1 through 5 are shown on the following page. The test
cases used for numbers 1 through 5 are also shown. For each number, 64 inputs are
generated for our neural model. Every bit that is turned on in a character's bitmap
pattern has a value of 1 and each bit that is off has a value of 0. An input array of 64
1s and 0s will make up each character used in the training (and test) set.
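The encoding just described can be sketched in a few lines of Python. This is only an illustration, not part of Qnet: the bitmap drawn below is a made-up image of the number 1 (it is not taken from Ocr.dat), the function name bitmap_to_inputs is hypothetical, and the row-by-row ordering of the 64 bits is an assumption, since the bit order used in the data file is not stated here.

```python
# Hypothetical 8x8 bitmap of the number 1; '#' marks a bit that is on.
# This drawing is illustrative only and is not taken from Ocr.dat.
ONE = [
    "...##...",
    "..###...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "..####..",
]

def bitmap_to_inputs(rows):
    """Flatten an 8x8 bitmap into 64 input values (1 = bit on, 0 = bit off).

    Bits are read row by row, left to right (an assumed ordering).
    """
    if len(rows) != 8 or any(len(row) != 8 for row in rows):
        raise ValueError("expected an 8x8 bitmap")
    return [1 if ch == "#" else 0 for row in rows for ch in row]

inputs = bitmap_to_inputs(ONE)
print(len(inputs))  # number of values presented to the input layer
```

Each training or test pattern would supply one such array of 64 values.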
Consequently, the network design for this model must consist of 64 nodes in the input
layer. The output layer has been designed with 10 output nodes, one node for each of
the characters we wish to recognize. When a number is recognized from a set of inputs, the
network will output a 1 at the appropriate output node. For this model, when the number 1
is recognized the first output node responds with a 1 and all other output nodes respond
with a 0. The second output node responds with 1 when the number 2 is recognized and so
on. The hidden structure has 3 layers containing 10 nodes each. Data normalization is not
necessary for this model since all input node data and training targets use a binary
representation. The network is fully connected and utilizes a hybrid transfer function
structure.
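To make the layer sizes concrete, here is a minimal Python sketch of a fully connected 64-10-10-10-10 network and of a training target with a single 1. It is not Qnet's implementation. Several things are assumptions made for illustration: a plain sigmoid stands in for Qnet's hybrid transfer function structure, each node is given a bias term, the initial weights are small random values, and all function names are hypothetical.

```python
import math
import random

# Input layer, three hidden layers of 10 nodes, and the output layer.
LAYER_SIZES = [64, 10, 10, 10, 10]

def sigmoid(x):
    """Sigmoid transfer function (an assumed stand-in, not Qnet's hybrid structure)."""
    return 1.0 / (1.0 + math.exp(-x))

def init_network(sizes, seed=0):
    """Create one (weights, biases) pair per connection between adjacent layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, inputs):
    """Propagate 64 inputs through every layer and return the 10 output responses."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

def target_for_node(node, n_outputs=10):
    """Training target with a 1 at the given output node (1-based) and 0 elsewhere."""
    return [1 if i == node else 0 for i in range(1, n_outputs + 1)]

network = init_network(LAYER_SIZES)
outputs = forward(network, [0] * 64)   # untrained response to a blank bitmap
print(len(outputs))
print(target_for_node(1))              # target used when the number 1 is presented
```

Because the weights here are untrained, the output values are meaningless; the sketch only shows how 64 inputs map to 10 output responses through three hidden layers.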
The training set is contained in the file Ocr.dat. The file contains 68 total patterns (58
used for training and 10 used for the test set). Data columns 2 through 11 contain the
training targets and 12 through 75 contain the input node bitmap data. The Qnet network
files OptCharRec.net (the untrained network) and OptCharRecTrained.net (the trained
network) are available to run. Use NetGraph to visually analyze the quality of agreement
between the model's output response and the training targets.
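For readers who want to inspect the data outside of Qnet, the column layout described above can be expressed as a small Python helper. This is a sketch under assumptions: it operates on a pattern that has already been read into a list of numbers (the on-disk format of Ocr.dat is not described here, so no parsing is attempted), the name split_pattern is hypothetical, and the example row is synthetic rather than real file content.

```python
def split_pattern(row):
    """Split one pattern into (targets, inputs) using the documented column layout."""
    if len(row) < 75:
        raise ValueError("expected at least 75 data columns")
    targets = row[1:11]    # data columns 2 through 11 (counting from 1)
    inputs = row[11:75]    # data columns 12 through 75 (counting from 1)
    return targets, inputs

# Synthetic row for illustration only (not real Ocr.dat content):
# a placeholder in column 1, output node 1 switched on, every bitmap bit off.
example_row = [0] + [1] + [0] * 9 + [0] * 64
targets, inputs = split_pattern(example_row)
print(len(targets), len(inputs))
```

The contents of data column 1 are not specified above, so the helper leaves that column alone.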
The training results indicate that the Qnet neural model easily learned to correctly
classify the bitmap images of the training set. All of the 10 test cases were also
correctly recognized by the network. This indicates that the learning generalized well
enough for the network to classify similar images it was not trained on with a high degree of
accuracy.
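One simple way to count correct classifications is to treat the output node with the largest response as the recognized character and compare it with the node that holds the 1 in the target. The Python sketch below assumes that decision rule; it is not necessarily how Qnet or NetGraph scores a pattern. The function names are hypothetical and the response values are made up for illustration, not results from the trained network.

```python
def recognized_node(responses):
    """Return the 1-based index of the node with the largest response."""
    best = max(range(len(responses)), key=lambda i: responses[i])
    return best + 1

def count_correct(output_rows, target_rows):
    """Count patterns whose strongest output node matches the target node."""
    return sum(
        recognized_node(out) == recognized_node(tgt)
        for out, tgt in zip(output_rows, target_rows)
    )

# Made-up responses for two patterns (illustrative values only).
outputs = [
    [0.97, 0.01, 0.02, 0.00, 0.01, 0.00, 0.03, 0.00, 0.01, 0.00],
    [0.04, 0.91, 0.00, 0.02, 0.00, 0.01, 0.00, 0.05, 0.00, 0.01],
]
targets = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
]
correct = count_correct(outputs, targets)
print(correct, "of", len(targets), "patterns recognized")
```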
To build a character recognition model for a full-featured OCR application, the neural net
model must be significantly more sophisticated than our sample shown here. There would
likely be 100 or more output nodes to properly classify most of the common characters.
Advanced capabilities like font type detection can be added. The 8x8 bitmap pattern used
to represent a character in this example may be too small for resolving full character
sets. More sophisticated methods of handling input bitmaps, such as using 1 or 2 bytes of
bitmap data per input node, should be considered to reduce network size and increase
efficiency. If 10 to 20 different font types were used in training, along with
imperfections in these fonts (e.g., slightly rotated or non-centered characters), we would likely
have a training set with many thousands of patterns. To adequately learn all this
information, a large, complex hidden structure would be necessary. While a full OCR example
is beyond the scope of our sample problems here, it is completely within the capabilities
of Qnet to handle such modeling tasks.
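As an illustration of the suggestion above to use 1 byte of bitmap data per input node, the following Python sketch packs each row of an 8x8 bitmap into a single value, reducing 64 inputs to 8. It is one possible reading of that idea, not a Qnet feature. The function name pack_rows is hypothetical, and both the row-by-row bit order and the scaling of each byte to the range 0 to 1 are assumptions.

```python
def pack_rows(bits):
    """Pack 64 bits (row by row) into 8 inputs, one byte per bitmap row.

    Each byte value (0..255) is scaled to 0..1; the scaling is an assumption.
    """
    if len(bits) != 64:
        raise ValueError("expected 64 bits")
    packed = []
    for r in range(8):
        value = 0
        for bit in bits[r * 8:(r + 1) * 8]:
            value = (value << 1) | bit   # leftmost bit becomes the most significant
        packed.append(value / 255.0)
    return packed

# Synthetic bitmap: only the top-left bit is on.
bits = [1] + [0] * 63
packed = pack_rows(bits)
print(len(packed))   # 8 inputs instead of 64
```

Unlike the purely binary inputs of the sample problem, packed inputs take many distinct values, so some form of input scaling would need to be considered.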
