CHAPTER 11
DATA VALIDATION
CHAPTER OBJECTIVES
Upon completion of this chapter, the student should be able to:
1. Describe consequences of running a program using invalid input data.
2. Design and code validation routines that use the NUMERIC, ALPHABETIC,
POSITIVE, NEGATIVE, and ZERO tests to verify that an input field contains the
correct type of data.
3. Design and code a validation routine that checks for missing data by
comparing the input field to SPACES.
4. Design and code validation routines that use the INSPECT…TALLYING,
INSPECT…REPLACING, and INSPECT…CONVERTING statements.
5. Design and code validation routines using range tests and limit tests to
check for reasonableness of input data.
6. Design and code validation routines that use condition-names to check coded
fields for valid contents.
7. Design and code a validation routine that verifies that input records are in
sequence on the control or key field.
8. Design and code validation routines that use the EVALUATE verb.
9. Describe how a control listing (audit trail) may be used as a validation
tool.
10. Describe two procedures commonly used for verifying input data in
interactive programs.
11. Describe the following actions commonly taken when input errors are
detected: print an error record, terminate execution of the program, partially
process erroneous records, bypass erroneous records, terminate execution of the
program when an excessive number of errors is detected, use switches, and print
control totals.
12. Explain how the READ…INTO statement can be used to read data records
directly into WORKING-STORAGE.
13. Describe the purpose and format of the INITIALIZE verb.
14. Describe the meaning and causes of the following common program interrupts:
data exception, divide exception, addressing error, operation error, and
specification error.
15. Describe some of the global considerations in COBOL, such as the way
decimal points and commas are used, and ways COBOL can handle the differences
between the United States and other countries.
16. Describe how the USAGE clause can be used to improve program efficiency.
LECTURE OUTLINE
I. Avoiding Logic Errors by Validating Input
A. Debugging Tips:
1. Test data should include cases that
check each branch of an IF statement.
2. To test page breaks, include enough
test data to print several pages of output.
3. Test ON SIZE ERROR clauses by including
data that produces size errors.
4. Inserting DISPLAY statements at key
points in the program is a helpful debugging tool.
5. Output files should be checked for
accuracy.
6. Check loop counts to make sure that the
loop is being executed the correct number of times.
B. Why Input to a Business System Must Be
Validated
1. Due to the large volume of data
typically submitted as input, the risks of data entry or input errors are
great.
2. Programs should include error control
procedures to identify input errors.
C. Some Consequences of Invalid Input
1. If data is entered incorrectly, the
result could be inaccurate output. Such
an error might be the transposition of two digits within a numeric field.
2. Data entered incorrectly could result
in a program interrupt. For example, a blank or space entered into a numeric
field will cause abnormal termination of the program when the field is used in
an arithmetic calculation.
D. Data Validation Techniques
1. Testing Fields to Ensure a Correct
Format
a. Use the class test NUMERIC to determine
if numeric data fields do, in fact, contain numeric data. Likewise, use the class test ALPHABETIC to
determine if alphabetic data fields do, in fact, contain alphabetic data.
b. Use a sign test (POSITIVE, NEGATIVE,
ZERO) to verify that a signed numeric field contains the appropriate type of
value.
2. Check for missing data by comparing the
field to SPACES.
3. The three variations of the INSPECT
statement may be used to perform the following types of validation. Formats of
and examples using each type of INSPECT may be found in the text.
a. INSPECT…TALLYING may be used to count
the occurrences of a given character in a field.
b. INSPECT…REPLACING will replace specific
occurrences of a given character with another character.
c. INSPECT…CONVERTING may also be used to
replace characters; each character in one set is converted to the
corresponding character in another set.
4. Testing for Reasonableness
a. Use a range test to determine if the
value of a field falls within an established range.
b. Use a limit test to determine that the
value in a field does not exceed an established limit.
5. Condition-Names: Checking Coded Fields for Valid Contents
When verifying that specified fields contain valid codes or values, use
condition-names to help document the routine.
6. Determine, where necessary, that input
records are in sequence on the control or key field.
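The field tests listed in section D might be combined into a single edit
routine along the following lines. This is a minimal sketch rather than code
from the text: the program name, data-names, sample values, and limits (hours
between 1 and 80, valid status codes F, P, and T, no negative adjustments) are
all assumptions chosen for illustration. It is written in fixed format, with
Area A starting in column 8.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. VALIDEMO.
      * Hypothetical sketch: every data-name, value, and limit
      * here is an assumption made for illustration.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  IN-REC.
           05  IN-EMP-NO           PIC X(5)    VALUE '00123'.
           05  IN-NAME             PIC X(20)   VALUE SPACES.
           05  IN-HOURS            PIC 9(3)    VALUE 045.
           05  IN-ADJUSTMENT       PIC S9(3)   VALUE -10.
           05  IN-STATUS-CODE      PIC X       VALUE 'X'.
               88  VALID-STATUS                VALUE 'F' 'P' 'T'.
       01  ERROR-COUNT             PIC 9(2)    VALUE ZERO.
       PROCEDURE DIVISION.
       100-MAIN.
      * Class test: the key field must hold digits only.
           IF IN-EMP-NO IS NOT NUMERIC
               DISPLAY 'EMPLOYEE NUMBER IS NOT NUMERIC'
               ADD 1 TO ERROR-COUNT
           END-IF
      * Missing-data test: compare the field to SPACES.
           IF IN-NAME = SPACES
               DISPLAY 'NAME IS MISSING'
               ADD 1 TO ERROR-COUNT
           END-IF
      * Range test: hours must be from 1 to 80 (assumed limits).
           IF IN-HOURS < 1 OR IN-HOURS > 80
               DISPLAY 'HOURS ARE OUT OF RANGE'
               ADD 1 TO ERROR-COUNT
           END-IF
      * Sign test: adjustments may not be negative (assumed rule).
           IF IN-ADJUSTMENT IS NEGATIVE
               DISPLAY 'ADJUSTMENT IS NEGATIVE'
               ADD 1 TO ERROR-COUNT
           END-IF
      * Condition-name test: only codes F, P, and T are valid.
           IF NOT VALID-STATUS
               DISPLAY 'STATUS CODE IS INVALID'
               ADD 1 TO ERROR-COUNT
           END-IF
           DISPLAY 'ERRORS FOUND: ' ERROR-COUNT
           STOP RUN.
```

With the sample values shown, the name, adjustment, and status code tests fail
while the employee number and hours tests pass. Testing every field, instead of
stopping at the first error, lets one error listing report all the problems in
a record.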
E. Using the EVALUATE Verb for Data
Validation
1. The EVALUATE statement is commonly used
for data validation.
2. The WHEN OTHER clause may be used to
trap invalid data.
3. A THRU clause may be used with the WHEN
clause of the EVALUATE statement to specify a range of values.
4. The EVALUATE statement may make use of
condition-names.
5. The most commonly used formats of the
EVALUATE statement:
a. EVALUATE identifier
WHEN value(s) PERFORM ...
END-EVALUATE
b. EVALUATE TRUE
WHEN condition PERFORM ...
END-EVALUATE
c. EVALUATE condition
WHEN TRUE PERFORM ...
WHEN FALSE PERFORM ...
END-EVALUATE
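The following sketch illustrates format (a) together with the WHEN OTHER and
THRU options described above. It is not taken from the text: the data-names,
transaction codes, and score bands are assumptions chosen for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EVALDEMO.
      * Hypothetical sketch: field names, codes, and score bands
      * are assumptions made for illustration.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  IN-TRANS-CODE           PIC X       VALUE 'Q'.
       01  IN-SCORE                PIC 9(3)    VALUE 085.
       PROCEDURE DIVISION.
       100-MAIN.
      * EVALUATE identifier: WHEN OTHER traps invalid codes.
           EVALUATE IN-TRANS-CODE
               WHEN 'A'
                   DISPLAY 'ADD TRANSACTION'
               WHEN 'C'
                   DISPLAY 'CHANGE TRANSACTION'
               WHEN 'D'
                   DISPLAY 'DELETE TRANSACTION'
               WHEN OTHER
                   DISPLAY 'INVALID TRANSACTION CODE: ' IN-TRANS-CODE
           END-EVALUATE
      * THRU gives each WHEN clause a range of values.
           EVALUATE IN-SCORE
               WHEN 0 THRU 59
                   DISPLAY 'SCORE IS BELOW PASSING'
               WHEN 60 THRU 100
                   DISPLAY 'SCORE IS PASSING'
               WHEN OTHER
                   DISPLAY 'SCORE IS OUT OF RANGE'
           END-EVALUATE
           STOP RUN.
```

With the sample values, the code 'Q' falls through to WHEN OTHER and the score
of 85 matches the 60 THRU 100 range.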
F. Other Methods for Validating Data
1. Print a control listing or audit trail
that is manually checked by the user to help minimize the risk of undetected
errors. A control listing includes:
a. the key field and other identifying
data for every input record.
b. any errors encountered.
c. totals of amounts accumulated for
groups of input records processed.
2. There are two commonly used
verification procedures available for interactive processing:
a. DISPLAY the fields just entered and
prompt the user to verify that the fields are correct.
b. Use a re-keying procedure, checking to
see that the data originally keyed is the same as the data being keyed the
second time.
II. What to Do If Input Errors Occur
A. Print an Error Record Containing the Key
Field, the Contents of the Erroneous Field, and an Error Message.
1. It is a good idea to print an error
record that contains the key field, the contents of the erroneous field, and an
appropriate error message.
2. A count of each specific error type
should be maintained. If the count of a
specific type of error is excessive, there may be a program error, or a data
entry operator may be repeatedly making a particular mistake.
B. Stop the Run
1. If a major error occurs and data integrity
is the primary consideration, it may be best to stop the run.
2. Close all files and print or display an
error message before terminating the program.
C. Partially Process or Bypass Erroneous
Records
1. If the error is a major one, the
program should bypass the erroneous record.
2. If the error is a minor one, the
program could partially process some portion of the erroneous record.
D. Stop the Run if the Number of Errors
Exceeds a Predetermined Limit
1. Normally, when errors occur, processing
continues.
2. The program may count errors and
terminate processing if the number of errors is unacceptable.
E. Use Switches
1. A switch is a field that may be used to
indicate whether an input record contains valid or invalid data.
2. As a record is processed, the error
switch is initialized to 'N' to indicate no errors. If any one of a number of
error conditions occurs, the switch is set to 'Y'. The error switch can then be
checked at any time to determine whether the record is valid or invalid.
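An error switch of the kind described above might be coded as follows. This is
a minimal sketch, not code from the text: the record layout, the two edit
rules, and all data-names other than the paragraph name 200-EDIT-CHECK are
assumptions chosen for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SWITDEMO.
      * Hypothetical sketch: the record layout and the two edit
      * rules are assumptions made for illustration.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  IN-REC.
           05  IN-ACCT-NO          PIC X(5)    VALUE '12A45'.
           05  IN-AMOUNT           PIC 9(5)    VALUE 00250.
       01  ERROR-SWITCH            PIC X       VALUE 'N'.
           88  RECORD-HAS-ERROR                VALUE 'Y'.
       PROCEDURE DIVISION.
       100-MAIN.
           PERFORM 200-EDIT-CHECK
           IF RECORD-HAS-ERROR
               DISPLAY 'RECORD BYPASSED: ' IN-ACCT-NO
           ELSE
               DISPLAY 'RECORD ACCEPTED: ' IN-ACCT-NO
           END-IF
           STOP RUN.
       200-EDIT-CHECK.
      * Reset the switch first so that an error in an earlier
      * record does not carry over to this one.
           MOVE 'N' TO ERROR-SWITCH
           IF IN-ACCT-NO IS NOT NUMERIC
               MOVE 'Y' TO ERROR-SWITCH
           END-IF
           IF IN-AMOUNT = ZERO
               MOVE 'Y' TO ERROR-SWITCH
           END-IF.
```

Because the sample account number contains a letter, the switch is set to 'Y'
and the record is bypassed. Resetting the switch at the top of the edit
paragraph is the design point: a switch that is set once and never
reinitialized would mark every later record as invalid.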
F. Print Totals
1. Programs should provide a count of all
records processed as well as a count of all errors.
2. Batch totals are printed to check
groups of input data. The
computer-generated batch totals are compared with those created manually by the
user.
III. Global Considerations in COBOL
A. COBOL is used both in the United States and globally.
B. Outside the United States, numbers may be represented differently. For
example, some countries use a comma as the decimal point and a period to
separate thousands.
C. The DECIMAL-POINT IS COMMA clause of the
SPECIAL-NAMES paragraph can be used to handle the difference.
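The effect of the DECIMAL-POINT IS COMMA clause can be sketched as follows.
The program name, data-names, and sample amount are assumptions chosen for
illustration. Once the clause is coded, the roles of the comma and the period
are exchanged in both numeric literals and PICTURE clauses.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COMMADEM.
      * Hypothetical sketch: data-names and the sample amount
      * are assumptions made for illustration.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
           DECIMAL-POINT IS COMMA.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * The comma in this literal is the decimal point.
       01  WS-AMOUNT               PIC 9(5)V99 VALUE 12345,67.
      * The period is now the insertion character for thousands.
       01  WS-EDITED-AMOUNT        PIC ZZ.ZZ9,99.
       PROCEDURE DIVISION.
       100-MAIN.
           MOVE WS-AMOUNT TO WS-EDITED-AMOUNT
           DISPLAY 'AMOUNT: ' WS-EDITED-AMOUNT
           STOP RUN.
```

The edited field should hold the amount in the form 12.345,67.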
IV. When Data Should Be Validated
A. All programs run on a regular basis
should include data validation techniques to minimize errors.
B. Data validation is of the utmost
importance if data is being entered into a system for the first time.
C. COBOL 2000+ Changes:
1. The INSPECT statement will no longer have the one-character limitation
placed on the size of the AFTER/BEFORE items coded in the REPLACING clause.
2. A VALIDATE statement has been added; it will be used to check the format of
data fields and to verify that the contents of such fields fall within
established ranges or have acceptable contents as defined by DATA DIVISION
VALUEs.
V. Understanding Program Interrupts
A. During program execution, logic errors
sometimes cause the program to abnormally terminate before it has completed all
processing. This is called a program
interrupt.
B. Common program interrupts are listed
below. Refer to the text for
descriptions and a list of possible causes for each type of program interrupt.
1. DATA EXCEPTION
2. DIVIDE EXCEPTION
3. ADDRESSING ERROR
4. OPERATION ERROR
5. SPECIFICATION ERROR
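Two of these interrupts can often be prevented by validating fields before
they are used in arithmetic. The sketch below assumes that a data exception
results from nonnumeric data in a numeric field and that a divide exception
results from division by zero. The program name, data-names, and sample values
are assumptions chosen for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. GUARDDEM.
      * Hypothetical sketch: data-names and sample values are
      * assumptions made for illustration.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Leading spaces make this field nonnumeric.
       01  IN-TOTAL-X              PIC X(5)    VALUE '  150'.
       01  IN-TOTAL REDEFINES IN-TOTAL-X PIC 9(5).
       01  IN-COUNT                PIC 9(3)    VALUE ZERO.
       01  WS-AVERAGE              PIC 9(5)V99 VALUE ZERO.
       PROCEDURE DIVISION.
       100-MAIN.
      * Guard against a data exception: test before arithmetic.
           IF IN-TOTAL IS NOT NUMERIC
               DISPLAY 'TOTAL IS NOT NUMERIC - RECORD BYPASSED'
           ELSE
      * Guard against a divide exception: test the divisor.
               IF IN-COUNT = ZERO
                   DISPLAY 'COUNT IS ZERO - AVERAGE NOT COMPUTED'
               ELSE
                   DIVIDE IN-TOTAL BY IN-COUNT GIVING WS-AVERAGE
                   DISPLAY 'AVERAGE: ' WS-AVERAGE
               END-IF
           END-IF
           STOP RUN.
```

With the sample values, the class test fails because of the leading spaces, so
the DIVIDE statement is never reached.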
VI. Verifying File-Names with ACCEPT and
DISPLAY Statements When Using a PC Compiler
A. The file-names in the SELECT statements
of programs written for a PC compiler are often specified in the exact form in
which they appear on disk (such as 'A:\REPORT.OUT'). This practice creates two problems:
1. In order to change the file-names, it
is necessary to modify and recompile the program.
2. There is the danger of destroying a
previously-created output file when the same file-name is used a second time as
an output file.
B. The text suggests two strategies for
avoiding the above problems.
1. Use the ACCEPT and DISPLAY statements
in a routine that asks the user to verify that the file about to be created
does not already exist. This routine should be executed before any files are
opened.
2. Use a variable name instead of an
actual file-name in the SELECT statement.
Then use the ACCEPT and DISPLAY statements in a routine that prompts the
user to enter the names of the files from the keyboard.
VII. The READ...INTO and INITIALIZE Statements
A. The READ...INTO Statement in Place of
Using READ and MOVE Statements
1. WORKING-STORAGE is sometimes used for
storing input records.
2. A MOVE operation can be used to move
the record defined in the FILE SECTION to the WORKING-STORAGE SECTION after the
record has been read.
3. The READ ... INTO statement will read a
record and then move it into an area defined in the WORKING-STORAGE SECTION.
B. Clearing Fields Using the INITIALIZE
Statement
1. A series of elementary items contained
within a group item can all be initialized with the INITIALIZE verb.
2. Numeric fields are initialized to zero
and nonnumeric fields are initialized with blanks.
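The two statements might be used together as in the sketch below. It is not
code from the text: the file name READINTO.TMP, the record layout, and all
data-names are assumptions chosen for illustration. So that the sketch can run
on its own, it first writes a one-record test file and then reads it back.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. READINTO.
      * Hypothetical sketch: the file name, record layout, and
      * data-names are assumptions made for illustration.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SALES-FILE ASSIGN TO 'READINTO.TMP'
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  SALES-FILE.
       01  SALES-REC               PIC X(25).
       WORKING-STORAGE SECTION.
       01  WS-SALES-REC.
           05  WS-NAME             PIC X(20).
           05  WS-AMOUNT           PIC 9(5).
       01  MORE-RECS               PIC X(3)    VALUE 'YES'.
       PROCEDURE DIVISION.
       100-MAIN.
      * Build a one-record test file so the sketch runs alone.
           MOVE 'JONES' TO WS-NAME
           MOVE 150 TO WS-AMOUNT
           OPEN OUTPUT SALES-FILE
           WRITE SALES-REC FROM WS-SALES-REC
           CLOSE SALES-FILE
      * INITIALIZE sets WS-NAME to spaces and WS-AMOUNT to zero.
           INITIALIZE WS-SALES-REC
           DISPLAY 'AFTER INITIALIZE: [' WS-NAME '] ' WS-AMOUNT
           OPEN INPUT SALES-FILE
           PERFORM UNTIL MORE-RECS = 'NO '
      * READ ... INTO replaces a READ followed by a MOVE.
               READ SALES-FILE INTO WS-SALES-REC
                   AT END
                       MOVE 'NO ' TO MORE-RECS
                   NOT AT END
                       DISPLAY 'AFTER READ INTO:  [' WS-NAME '] '
                               WS-AMOUNT
               END-READ
           END-PERFORM
           CLOSE SALES-FILE
           STOP RUN.
```

A single INITIALIZE on the group item clears both elementary fields, each
according to its own type, which is why it is preferred over moving SPACES or
ZEROS to the group as a whole.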
VIII. Improving Program Efficiency with the USAGE
Clause
A. Format
1. There are many ways in which numeric
data can be stored internally within the computer.
2. The specific method for storing numeric
data affects the program’s efficiency.
3. The USAGE clause, which specifies the
form in which numeric data is stored, has the following format:
[USAGE IS] {DISPLAY/COMPUTATIONAL/COMP/PACKED-DECIMAL}
4. When the USAGE clause is used with a
group item, it applies to all elements within the group.
B. USAGE IS DISPLAY
1. USAGE IS DISPLAY stores one character
of data per storage position.
2. USAGE IS DISPLAY is the default.
C. USAGE IS PACKED-DECIMAL or COMPUTATIONAL-3 (COMP-3) – A Common Enhancement
1. On many computers, PACKED-DECIMAL or
COMPUTATIONAL-3 enables the computer to store two digits in each storage
position, except for the rightmost position, which holds one digit and the
sign.
2. USAGE IS PACKED-DECIMAL saves a
significant amount of storage space both in WORKING-STORAGE and for disk files.
3. The PACKED-DECIMAL or COMPUTATIONAL-3
option should not be used for printing output because packed-decimal data is
not readable.
4. The computer automatically converts
from packed to unpacked form and vice versa when a MOVE statement is executed.
D. USAGE IS COMPUTATIONAL (COMP)
1. USAGE IS COMPUTATIONAL stores data in
the form in which the computer actually does its computation, usually in binary
format.
2. Use of this clause is desirable when
many arithmetic computations must be performed.
3. Since subscripts and counters are
generated in binary form on many computers, the programmer should define them
with USAGE IS COMP or COMPUTATIONAL.
4. The USAGE IS BINARY clause may be used
to represent data in binary form.
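The three USAGE options might appear together as in the sketch below. The
program name and all data-names are assumptions chosen for illustration, and
the number of bytes each field occupies depends on the compiler. On many
systems a PIC S9(7) field occupies 7 bytes as DISPLAY and 4 bytes as
PACKED-DECIMAL.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. USAGEDEM.
      * Hypothetical sketch: all data-names are assumptions made
      * for illustration; field sizes in bytes vary by compiler.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  AMT-DISPLAY             PIC S9(7)   VALUE 1234567.
       01  AMT-PACKED              PIC S9(7)   PACKED-DECIMAL.
       01  LOOP-COUNTER            PIC S9(4)   COMP.
       01  FIELD-BYTES             PIC 9(2).
       01  AMT-PRINT               PIC Z,ZZZ,ZZ9.
       PROCEDURE DIVISION.
       100-MAIN.
      * MOVE converts from DISPLAY (unpacked) to packed form.
           MOVE AMT-DISPLAY TO AMT-PACKED
           COMPUTE FIELD-BYTES = FUNCTION LENGTH (AMT-DISPLAY)
           DISPLAY 'BYTES USED BY DISPLAY FIELD: ' FIELD-BYTES
           COMPUTE FIELD-BYTES = FUNCTION LENGTH (AMT-PACKED)
           DISPLAY 'BYTES USED BY PACKED FIELD:  ' FIELD-BYTES
      * A binary (COMP) counter controls the loop.
           PERFORM VARYING LOOP-COUNTER FROM 1 BY 1
                   UNTIL LOOP-COUNTER > 3
               ADD 1 TO AMT-PACKED
           END-PERFORM
      * Move packed data to an edited field before printing.
           MOVE AMT-PACKED TO AMT-PRINT
           DISPLAY 'AMOUNT: ' AMT-PRINT
           STOP RUN.
```

The packed field is used for arithmetic and storage, and its contents are moved
to an edited DISPLAY field only when they must be printed.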
SOLUTIONS TO REVIEW QUESTIONS
I. True-False Questions
1. T
2. F  Syntax errors are detected by the compiler, while run-time errors are
detected later, when the program is being run.
3. T
4. F  When checking for a condition-name, no relational condition is needed.
The code should read:
    IF X-ON PERFORM 200-X-RTN.
5. T
6. T
7. F  The field may be alphanumeric.
8. F  It may be zero, which is neither positive nor negative.
9. T
10. T
II. General Questions
1. WORKING-STORAGE SECTION field definitions:
    01  WORK-FIELDS.
        05  SALESPERSON-NAME    PIC X(20).
        05  A-COUNTER           PIC 9(2) VALUE ZERO.
PROCEDURE DIVISION code:
    DISPLAY 'ENTER A NAME (20 CHARACTER MAXIMUM): '
    ACCEPT SALESPERSON-NAME
    INSPECT SALESPERSON-NAME
        TALLYING A-COUNTER
        FOR ALL 'A'
    DISPLAY 'THE NAME CONTAINS ' A-COUNTER ' A''S'.
2. WORKING-STORAGE SECTION field definitions:
    01  EDITED-AMOUNT           PIC X(11).
PROCEDURE DIVISION code:
    DISPLAY 'ENTER AN EDITED DOLLAR/CENTS AMOUNT'
    DISPLAY 'IN THE FORMAT $ZZZ,ZZ9.99: '
    ACCEPT EDITED-AMOUNT
    INSPECT EDITED-AMOUNT
        REPLACING ALL SPACE BY '*'
        AFTER INITIAL '$'
    DISPLAY 'THE REFORMATTED AMOUNT IS ' EDITED-AMOUNT.
3. This solution uses the mathematical divisibility test for 5: a number is
divisible by 5 if and only if its last digit is 0 or 5.
WORKING-STORAGE SECTION field definitions:
    01  UNIT-PRICE              PIC X(6).
PROCEDURE DIVISION code:
    DISPLAY 'ENTER A UNIT PRICE IN THE FORMAT 999.99: '
    ACCEPT UNIT-PRICE
    IF UNIT-PRICE (6:1) NOT = '0' AND NOT = '5'
        DISPLAY 'THE UNIT PRICE IS NOT DIVISIBLE BY 5'
    ELSE
        DISPLAY 'THE UNIT PRICE IS DIVISIBLE BY 5'
    END-IF.
4. WORKING-STORAGE SECTION field definitions:
    01  TEST-FIELDS.
        05  TEST-DATE.
            10  TEST-YEAR       PIC 9(4).
            10  TEST-MONTH      PIC 9(2).
            10  TEST-DAY        PIC 9(2).
        05  REMAINDER-4         PIC 9(1).
        05  REMAINDER-400       PIC 9(3).
        05  LEAP-YEAR-FLAG      PIC X(3).
PROCEDURE DIVISION code:
    DISPLAY 'ENTER A DATE (YYYYMMDD): '
    ACCEPT TEST-DATE
    MOVE FUNCTION REM (TEST-YEAR, 4) TO REMAINDER-4
    IF REMAINDER-4 = 0
        IF TEST-YEAR (3:2) = ZERO
            MOVE FUNCTION REM (TEST-YEAR, 400) TO REMAINDER-400
            IF REMAINDER-400 = 0
                MOVE 'YES' TO LEAP-YEAR-FLAG
            ELSE
                MOVE 'NO ' TO LEAP-YEAR-FLAG
            END-IF
        ELSE
            MOVE 'YES' TO LEAP-YEAR-FLAG
        END-IF
    ELSE
        MOVE 'NO ' TO LEAP-YEAR-FLAG
    END-IF
    IF LEAP-YEAR-FLAG = 'NO ' AND TEST-MONTH = 2
            AND TEST-DAY > 28
        DISPLAY 'THIS IS AN INVALID DATE'
    END-IF.
5. WORKING-STORAGE SECTION field definitions:
    01  INPUT-FIELD             PIC X(30).
    01  LOWER-CASE-LETTERS      PIC X(26)
            VALUE 'abcdefghijklmnopqrstuvwxyz'.
PROCEDURE DIVISION code:
    DISPLAY 'ENTER AN ALPHANUMERIC FIELD: '
    ACCEPT INPUT-FIELD
    INSPECT INPUT-FIELD
        CONVERTING LOWER-CASE-LETTERS TO SPACES
    DISPLAY 'THE REFORMATTED FIELD: ' INPUT-FIELD.
III. Internet/Critical Thinking Questions
Since the following Web sites contain current news stories, future searches
will most likely locate different pages than the ones listed here.
Search Engine: yahoo.com>Computers & Internet
URL: http://www.post-gazette.com/localnews/20020501911out5.asp
Contents: 911 System Failure Tied To Programming Error
Search Engine: yahoo.com>Computers & Internet
Keywords: “programming error”
URL: http://209.58.136.120/recall/p/99/wxrad2.htm
Contents: Weather Alert Radio Recalled for Programming Error
SOLUTIONS TO DEBUGGING EXERCISES
1. AMT2 cannot be negative because its PICTURE clause is unsigned. The field
should be defined with a sign:
    05  AMT2                PIC S9(3).
Some compilers may not flag this as a syntax error, but it would then be a
logic error.
2. Yes. 200-EDIT-CHECK is performed once for every input record, and the ADD
statement is executed once each time through 200-EDIT-CHECK.
3. The word NOT needs to be coded in both conditions.
4. Once ERR-SWITCH is set to a non-zero value, it is never reinitialized.
ERR-SWITCH needs to be reset to zero as the first line of code in
200-EDIT-CHECK.