Qnet 2000

Input File Format

Preparing training data for use with Qnet is an easy process. Training data files use the universally compatible ASCII (text) row/column (columnar) input format. Each data column represents data for one input node or a target for one output node. Each row in the file represents one input training case or pattern. Using the loan application example, the data columns would include the age, marital status, number of dependents, years of education, total family income, monthly debt payments, and the monthly payment required for the loan. There is also one output node, the qualification status. The target data for the output node(s) can be located in the same file as the inputs or in a separate file. If the same file were used for both, then the loan application training set would consist of 8 data columns. If the historical database contained 5000 loans, the training file would have 5000 rows (i.e., lines or records).

Data column delimiters can be any combination of spaces, commas or tabs. The only requirement is that both the input node data and output node targets be in contiguous data columns. For the loan example, the data columns containing the input node information can be 1 through 7, 2-8, 10-16, etc. You simply tell Qnet which data column to start reading from. The same rules apply for the target data used by the output node(s). Blank and commented lines are ignored. Comments can be inserted in the files by starting the line with a “#” character. The following is a template of what an input file may look like:

# THIS IS A COMMENT
<INPUT NODE 1> <INPUT NODE 2> <INPUT NODE 3> <INPUT NODE 4> ...... <OUTPUT NODE 1> ...... <=== PATTERN 1
<INPUT NODE 1> <INPUT NODE 2> <INPUT NODE 3> <INPUT NODE 4> ...... <OUTPUT NODE 1> ...... <=== PATTERN 2
<INPUT NODE 1> <INPUT NODE 2> <INPUT NODE 3> <INPUT NODE 4> ...... <OUTPUT NODE 1> ...... <=== PATTERN 3
<INPUT NODE 1> <INPUT NODE 2> <INPUT NODE 3> <INPUT NODE 4> ...... <OUTPUT NODE 1> ...... <=== PATTERN 4
.
.
.
.

It is not required that the input node data columns precede the output node data columns if the two sets of information are contained in the same file.
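The reading rules above can be illustrated with a short Python sketch. This is not Qnet's own reader; the function name `read_patterns` and its parameters are hypothetical, and it assumes that any run of spaces, commas or tabs counts as a single delimiter. Column numbers are 1-based, as in the loan example.

```python
import re

# Assumption: any run of spaces, commas or tabs acts as one delimiter.
_DELIMS = re.compile(r"[ ,\t]+")

def read_patterns(text, input_start, n_inputs, target_start, n_targets):
    """Return (inputs, targets) parsed from Qnet-style text.

    input_start and target_start are 1-based starting columns.
    """
    inputs, targets = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank and commented lines are ignored
        fields = [float(f) for f in _DELIMS.split(stripped) if f]
        i0, t0 = input_start - 1, target_start - 1
        inputs.append(fields[i0:i0 + n_inputs])    # contiguous input columns
        targets.append(fields[t0:t0 + n_targets])  # contiguous target columns
    return inputs, targets

# Three patterns using space, comma and tab delimiters in turn.
sample = """# loan data
24 0 1 0 26000 732 399 0

35,1,3,3,66000,1412,299,1
58\t1\t6\t2\t120000\t3800\t2200\t0
"""
inputs, targets = read_patterns(sample, 1, 7, 8, 1)
print(len(inputs), "patterns read")
```

Because only the starting column and the number of columns are given, the inputs and targets can sit anywhere in the record as long as each group is contiguous.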

Qnet also supports the use of column labels. If you choose to use column labels, they should appear in the first record (or line) of the file and start with the comment (“#”) character. Labels, if present, will be used when plotting and viewing training and recall data. The format for labels should be as follows:

#"Label 1" "Label 2" "Label 3" ....

As with data, the labels can be delimited by commas, spaces or tabs. Each label must be enclosed in quotes.
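As an illustration of the label format, the following Python sketch extracts the quoted labels from a label record. The helper name `parse_labels` is hypothetical, and the sketch assumes plain ASCII double quotes around each label.

```python
import re

def parse_labels(first_line):
    """Return the quoted labels from a '#'-prefixed record, or [] if there are none."""
    line = first_line.strip()
    if not line.startswith("#"):
        return []  # not a comment record, so it cannot hold labels
    # Each label is the text between a pair of double quotes; the
    # delimiters between labels (spaces, commas, tabs) are skipped.
    return re.findall(r'"([^"]*)"', line[1:])

labels = parse_labels('#"Age" "Married", "Dependents"\t"Qualified?"')
print(labels)
```

Quoting is what allows a label such as "Total Income" to contain a space without being split into two columns.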

The use of a spreadsheet as a training data preprocessor to Qnet is highly recommended. A spreadsheet will allow you to group columns, move, add or eliminate rows and perform virtually any type of data preparation required. If the training data can be imported into or has been generated with a spreadsheet application, it is very easy to format and save the data in an ASCII (text) format. For example, Microsoft Excel™ allows any spreadsheet to be saved in a formatted text mode with comma (.CSV), tab (.TXT) or space (.PRN) column delimiters, all of which are compatible with Qnet. Most other popular spreadsheets can also save data as tab-delimited .TXT text files. Refer to your spreadsheet documentation for further information. If the loan application problem were arranged in an Excel spreadsheet, the setup could appear as follows:

#Age  Married (1Y,0N)  Dependents  Education (0,1,2,3)  Total Income  Monthly Debt  Loan Payment  Qualified?
24    0                1           0                    $26,000       $732          $399          0
35    1                3           3                    $66,000       $1412         $299          1
58    1                6           2                    $120,000      $3800         $2200         0
44    0                1           1                    $39,000       $500          $300          1
               
(Education code - 0 No HS, 1 HS, 2 College, 3 Higher)

The “#” character in front of the “Age” label is used to create a Qnet comment record. For many spreadsheets, you will also need to turn off special formatting features such as currency or percent formats (the currency format is used in the income, debt and payment columns above), because spreadsheets often write the “$” or “%” signs into the output file. These types of number formats are not supported in Qnet. The number formats supported by Qnet are integer (non-decimal numbers), floating point (numbers with decimals) and scientific notation (numbers with exponential formats). Also, make sure that negative numbers are preceded by a “-” sign rather than enclosed in parentheses, and that commas are not used to partition large numbers (e.g., 1,000,000). In our example, after performing the necessary reformatting operations (removing the currency and comma formats), a .csv (Comma Separated Values) file could be written for use with Qnet. If the file is saved as Loan.csv, this file name would be set in Qnet’s Training Setup/Training Data dialog window. The starting column locations of the input node data (column 1) and the target data (column 8) must also be specified.
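If the formatting cannot be turned off in the spreadsheet, the exported cells can be cleaned afterwards. The Python sketch below shows one way to do this. The helper `clean_number` is hypothetical, and converting a percent value such as 12% to 0.12 is an assumption that may not suit your data.

```python
import csv
import io

def clean_number(cell):
    """Convert a cell such as '$26,000', '12%' or '(1,250)' to a float."""
    s = cell.strip()
    negative = s.startswith("(") and s.endswith(")")  # accounting-style negative
    if negative:
        s = s[1:-1]
    percent = s.endswith("%")
    # Remove the currency sign, percent sign and thousands separators.
    s = s.replace("$", "").replace("%", "").replace(",", "")
    value = float(s)  # accepts integer, decimal and scientific notation
    if percent:
        value /= 100.0  # assumption: 12% is stored as 0.12
    return -value if negative else value

# A CSV export quotes cells that contain commas, so csv.reader keeps
# "$26,000" together as one field.
raw = '24,0,1,0,"$26,000",$732,$399,0\n'
for row in csv.reader(io.StringIO(raw)):
    cleaned = [clean_number(c) for c in row]
    print(",".join(str(v) for v in cleaned))
```

The cleaned values contain only digits, a decimal point, an optional leading minus sign and, for very large or small values, an exponent, which matches the number formats listed above.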

Likewise, all popular database applications can create data files in an ASCII text format. If the training data comes from your own application, simply follow the above rules when writing formatted text files. The prepared data file is specified in Qnet’s Training Setup/Training Data Specification.
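For data generated by your own program, a writer that follows these rules can be very small. The Python sketch below is one possible approach; the function name `write_training_text` and the choice of comma delimiters are assumptions made for illustration.

```python
def write_training_text(labels, records):
    """Build file contents: one quoted label record, then one line per pattern."""
    width = len(labels)
    # Label record: starts with '#', each label enclosed in double quotes.
    lines = ["#" + " ".join('"{}"'.format(name) for name in labels)]
    for record in records:
        if len(record) != width:
            raise ValueError("every pattern needs one value per column")
        # repr(float) never emits currency signs or thousands separators.
        lines.append(",".join(repr(float(v)) for v in record))
    return "\n".join(lines) + "\n"

labels = ["Age", "Married", "Dependents", "Education",
          "Total Income", "Monthly Debt", "Loan Payment", "Qualified?"]
loans = [
    (24, 0, 1, 0, 26000, 732, 399, 0),
    (35, 1, 3, 3, 66000, 1412, 299, 1),
]
text = write_training_text(labels, loans)
print(text)
```

The returned text can then be written to a file such as Loan.csv using your application's normal file output routines.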

An alternative to creating an ASCII file from your spreadsheet is to transfer your training data directly into Qnet’s DataPro application and save the data to a Qnet-compatible file using DataPro. See the DataPro section for more information.

For very large models with hundreds or thousands of input and output nodes, each line in the file can become quite long. Qnet has no limit on the length of each line or record that will be scanned for data. Most spreadsheet programs, however, limit output to 256 data columns. When generating a Qnet file that would contain data for 1000 input nodes, several files would have to be combined to create the entire input set. If you require utilities for working with large data sets, contact Vesta for assistance.
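Combining several column-limited files amounts to joining them side by side, pattern by pattern. The Python sketch below illustrates the idea. It is not a Vesta utility; the function name `combine_columns` is hypothetical, and it assumes every partial file lists the same patterns in the same order.

```python
def combine_columns(parts):
    """Join the Nth data record of every part into one long record."""
    def data_lines(text):
        # Keep only data records: skip blank and commented lines.
        return [ln.strip() for ln in text.splitlines()
                if ln.strip() and not ln.lstrip().startswith("#")]

    columns = [data_lines(p) for p in parts]
    if len({len(c) for c in columns}) != 1:
        raise ValueError("all parts must contain the same number of patterns")
    # zip pairs record 1 of each part, then record 2, and so on.
    return [",".join(row) for row in zip(*columns)]

part_a = "# columns 1-2\n1,2\n3,4\n"
part_b = "# column 3\n5\n6\n"
combined = combine_columns([part_a, part_b])
print("\n".join(combined))
```

Label records are dropped by this sketch because they are comments; if labels are needed, a new label record would have to be written at the top of the combined file.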