CHAPTER 11

DATA VALIDATION

CHAPTER OBJECTIVES

Upon completion of this chapter, the student should be able to:

1. Describe the consequences of running a program with invalid input data.

2. Design and code validation routines that use the NUMERIC, ALPHABETIC, POSITIVE, NEGATIVE, and ZERO tests to verify that an input field contains the correct type of data.

3. Design and code a validation routine that checks for missing data by comparing the input field to SPACES.

4. Design and code validation routines that use the INSPECT…TALLYING, INSPECT…REPLACING, and INSPECT…CONVERTING statements.

5. Design and code validation routines using range tests and limit tests to check for reasonableness of input data.

6. Design and code validation routines that use condition-names to check coded fields for valid contents.

7. Design and code a validation routine that verifies that input records are in sequence on the control or key field.

8. Design and code validation routines that use the EVALUATE verb.

9. Describe how a control listing (audit trail) may be used as a validation tool.

10. Describe two procedures commonly used for verifying input data in interactive programs.

11. Describe the following actions commonly taken when input errors are detected: print an error record, terminate execution of the program, partially process erroneous records, bypass erroneous records, terminate execution of the program when an excessive number of errors are detected, use switches, and print control totals.

12. Explain how the READ…INTO statement can be used to read a data record and move it into WORKING-STORAGE in a single statement.

13. Describe the purpose and format of the INITIALIZE verb.

14. Describe the meaning and causes of the following common program interrupts: data exception, divide exception, addressing error, operation error, and specification error.

15. Describe some global considerations in COBOL, such as the different ways decimal points and commas are used in numeric values, and how COBOL can handle differences between the conventions of the United States and other countries.

16. Describe how the USAGE clause can be used to improve program efficiency.

LECTURE OUTLINE

I. Avoiding Logic Errors by Validating Input

A. Debugging Tips:

1. Test data should include cases that check each branch of an IF statement.

2. To test page breaks, include enough test data to print several pages of output.

3. Test ON SIZE ERROR clauses by including data that produces size errors.

4. Inserting DISPLAY statements at key points in the program is a helpful debugging tool.

5. Output files should be checked for accuracy.

6. Check loop counts to make sure that the loop is being executed the correct number of times.

B. Why Input to a Business System Must Be Validated

1. Because business systems typically process a large volume of input data, the risk of data entry errors is high.

2. Programs should include error control procedures to identify input errors.

C. Some Consequences of Invalid Input

1. If data is entered incorrectly, the output may be inaccurate. For example, two digits within a numeric field might be transposed.

2. Incorrectly entered data can also cause a program interrupt. For example, a blank entered in a numeric field can cause the program to terminate abnormally when that field is used in an arithmetic operation.

D. Data Validation Techniques

1. Testing Fields to Ensure a Correct Format

a. Use the class test NUMERIC to determine whether a numeric field actually contains numeric data. Likewise, use the class test ALPHABETIC to determine whether an alphabetic field contains only letters and spaces.

b. Use a sign test (POSITIVE, NEGATIVE, ZERO) to verify that a signed numeric field contains the appropriate type of value.

2. Check for missing data by comparing the field to SPACES.
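The class, sign, and missing-data tests above can be illustrated with a short program. This is a minimal sketch, assuming a COBOL 85 compiler such as GnuCOBOL and fixed-format source; the field names and hard-coded values are hypothetical stand-ins for fields of an input record.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FORMAT-CHECK.
      * Sketch: class, sign, and missing-data tests. The fields and
      * values are hypothetical stand-ins for input-record fields.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  AMOUNT-IN       PIC X(5)    VALUE '12A45'.
       01  NAME-IN         PIC X(10)   VALUE 'SMITH JOHN'.
       01  BALANCE-IN      PIC S9(5)   VALUE -250.
       01  CITY-IN         PIC X(10)   VALUE SPACES.
       PROCEDURE DIVISION.
       100-MAIN.
      * NUMERIC is false because AMOUNT-IN contains a letter.
           IF AMOUNT-IN IS NOT NUMERIC
               DISPLAY 'AMOUNT IS NOT NUMERIC'
           END-IF
      * ALPHABETIC is true for letters and spaces only.
           IF NAME-IN IS ALPHABETIC
               DISPLAY 'NAME IS ALPHABETIC'
           END-IF
      * Sign tests apply to signed numeric fields.
           IF BALANCE-IN IS NEGATIVE
               DISPLAY 'BALANCE IS NEGATIVE'
           END-IF
      * Missing data: the field was left blank.
           IF CITY-IN = SPACES
               DISPLAY 'CITY IS MISSING'
           END-IF
           STOP RUN.
```

In a production program, each failed test would normally also set an error switch and print an error line rather than just DISPLAY a message.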

3. The three variations of the INSPECT statement may be used to perform the following types of validation. Formats and examples for each type of INSPECT are given in the text.

a. INSPECT…TALLYING may be used to count the occurrences of a given character in a field.

b. INSPECT…REPLACING will replace specific occurrences of a given character with another character.

c. INSPECT…CONVERTING replaces each occurrence of any character in one list with the corresponding character in a second list. It is a more compact alternative to a series of INSPECT…REPLACING statements.
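The three INSPECT forms can be compared side by side. The following is a minimal sketch, assuming a COBOL 85 compiler such as GnuCOBOL; all field names and sample values are hypothetical. Note that the tally field must be set to zero before INSPECT…TALLYING, because INSPECT adds to its current contents.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INSPECT-DEMO.
      * Sketch of the three INSPECT forms; all field names and
      * sample values are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  NAME-IN         PIC X(12)   VALUE 'ANNA BANANA'.
       01  A-COUNT         PIC 9(2)    VALUE ZERO.
       01  AMOUNT-IN       PIC X(7)    VALUE '   1250'.
       01  CODE-IN         PIC X(6)    VALUE 'ab-12c'.
       PROCEDURE DIVISION.
       100-MAIN.
      * TALLYING: count every 'A' in the name (A-COUNT starts at 0).
           INSPECT NAME-IN TALLYING A-COUNT FOR ALL 'A'
           DISPLAY 'NUMBER OF A''S: ' A-COUNT
      * REPLACING: turn leading blanks into zeros, then retest.
           INSPECT AMOUNT-IN REPLACING LEADING SPACES BY ZEROS
           DISPLAY 'AMOUNT: ' AMOUNT-IN
           IF AMOUNT-IN IS NUMERIC
               DISPLAY 'AMOUNT IS NOW NUMERIC'
           END-IF
      * CONVERTING: convert the lowercase a, b, c to uppercase.
           INSPECT CODE-IN CONVERTING 'abc' TO 'ABC'
           DISPLAY 'CODE: ' CODE-IN
           STOP RUN.
```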

4. Testing for Reasonableness

a. Use a range test to determine if the value of a field falls within an established range.

b. Use a limit test to determine that the value in a field does not exceed an established limit.
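A range test checks both a lower and an upper bound, while a limit test checks a single maximum. This is a minimal sketch, assuming a COBOL 85 compiler; the fields and the 80-hour limit are hypothetical examples, not values from the text.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RANGE-LIMIT.
      * Sketch of a range test and a limit test. The fields and the
      * 80-hour limit are hypothetical examples.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  MONTH-IN        PIC 9(2)    VALUE 13.
       01  HOURS-IN        PIC 9(3)    VALUE 125.
       01  MAX-HOURS       PIC 9(3)    VALUE 80.
       PROCEDURE DIVISION.
       100-MAIN.
      * Range test: the value must fall between 1 and 12.
           IF MONTH-IN >= 1 AND MONTH-IN <= 12
               DISPLAY 'MONTH IS IN RANGE'
           ELSE
               DISPLAY 'MONTH OUT OF RANGE: ' MONTH-IN
           END-IF
      * Limit test: the value must not exceed a single maximum.
           IF HOURS-IN > MAX-HOURS
               DISPLAY 'HOURS EXCEED LIMIT: ' HOURS-IN
           ELSE
               DISPLAY 'HOURS ARE WITHIN LIMIT'
           END-IF
           STOP RUN.
```

Keeping the limit in a named field such as MAX-HOURS makes it easier to change than a literal buried in the IF statement.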

5. Condition-Names: Checking Coded Fields for Valid Contents

When verifying that specified fields contain valid codes or values, use condition-names to help document the routine.
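Condition-names let the valid codes be listed once in the DATA DIVISION and tested by name. This is a minimal sketch, assuming a COBOL 85 compiler; the field names and code values are hypothetical. The first 88-level lists individual values and the second uses THRU for a range.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COND-NAMES.
      * Sketch: condition-names document the valid codes. The field
      * names and code values are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  MARITAL-CODE    PIC X       VALUE 'W'.
           88  VALID-MARITAL-CODE      VALUE 'M' 'S' 'D'.
       01  DEPT-NO         PIC 9(2)    VALUE 17.
           88  VALID-DEPT-NO           VALUE 10 THRU 25.
       PROCEDURE DIVISION.
       100-MAIN.
           IF NOT VALID-MARITAL-CODE
               DISPLAY 'INVALID MARITAL CODE: ' MARITAL-CODE
           END-IF
           IF VALID-DEPT-NO
               DISPLAY 'DEPARTMENT ' DEPT-NO ' IS VALID'
           END-IF
           STOP RUN.
```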

6. Determine, where necessary, that input records are in sequence on the control or key field.
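A sequence check compares each record's key with the key of the previous good record. This is a minimal sketch, assuming a COBOL 85 compiler; a small table of hypothetical account keys stands in for the input file so the program needs no data file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SEQ-CHECK.
      * Sketch of a sequence check on a key field. A small table of
      * hypothetical account keys stands in for the input file.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  KEY-VALUES      PIC X(12)   VALUE '101104103110'.
       01  KEY-TABLE REDEFINES KEY-VALUES.
           05  ACCT-KEY    PIC 9(3)    OCCURS 4 TIMES.
       01  PREV-KEY        PIC 9(3)    VALUE ZERO.
       01  SUB             PIC 9       VALUE ZERO.
       PROCEDURE DIVISION.
       100-MAIN.
           PERFORM VARYING SUB FROM 1 BY 1 UNTIL SUB > 4
      * A key lower than the previous good key is out of sequence.
               IF ACCT-KEY (SUB) < PREV-KEY
                   DISPLAY 'OUT OF SEQUENCE: ' ACCT-KEY (SUB)
               ELSE
                   MOVE ACCT-KEY (SUB) TO PREV-KEY
               END-IF
           END-PERFORM
           STOP RUN.
```

Equal keys pass this test. If duplicate keys are also errors, the condition could be written as ACCT-KEY (SUB) NOT > PREV-KEY.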

E. Using the EVALUATE Verb for Data Validation

1. The EVALUATE statement is commonly used for data validation.

2. The WHEN OTHER clause may be used to trap invalid data.

3. A THRU clause may be used with the WHEN clause of the EVALUATE statement to specify a range of values.

4. The EVALUATE statement may make use of condition-names.

5. The most commonly used formats of the EVALUATE statement:

a. EVALUATE identifier

WHEN value(s) PERFORM ...

END-EVALUATE

b. EVALUATE TRUE

WHEN condition PERFORM ...

END-EVALUATE

c. EVALUATE condition

WHEN TRUE PERFORM ...

WHEN FALSE PERFORM ...

END-EVALUATE
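The three formats can be shown in one program. This is a minimal sketch, assuming a COBOL 85 compiler; PAY-CODE and HOURS are hypothetical fields, and DISPLAY statements take the place of the PERFORMs in the formats so that the program is self-contained.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EVAL-DEMO.
      * Sketch of the three EVALUATE formats. PAY-CODE and HOURS
      * are hypothetical input fields.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  PAY-CODE        PIC X       VALUE 'H'.
           88  HOURLY                  VALUE 'H'.
           88  SALARIED                VALUE 'S'.
       01  HOURS           PIC 9(2)    VALUE 45.
       PROCEDURE DIVISION.
       100-MAIN.
      * Format a: EVALUATE identifier, with THRU and WHEN OTHER.
           EVALUATE HOURS
               WHEN 0 THRU 40
                   DISPLAY 'REGULAR HOURS'
               WHEN 41 THRU 60
                   DISPLAY 'OVERTIME HOURS'
               WHEN OTHER
                   DISPLAY 'INVALID HOURS'
           END-EVALUATE
      * Format b: EVALUATE TRUE, with condition-names.
           EVALUATE TRUE
               WHEN HOURLY
                   DISPLAY 'HOURLY EMPLOYEE'
               WHEN SALARIED
                   DISPLAY 'SALARIED EMPLOYEE'
               WHEN OTHER
                   DISPLAY 'INVALID PAY CODE'
           END-EVALUATE
      * Format c: EVALUATE condition, WHEN TRUE and WHEN FALSE.
           EVALUATE HOURS > 40
               WHEN TRUE
                   DISPLAY 'OVERTIME APPLIES'
               WHEN FALSE
                   DISPLAY 'NO OVERTIME'
           END-EVALUATE
           STOP RUN.
```

In each format, WHEN OTHER (or the WHEN FALSE branch) is where invalid data is trapped.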

F. Other Methods for Validating Data

1. Print a control listing or audit trail that is manually checked by the user to help minimize the risk of undetected errors. A control listing includes:

a. the key field and other identifying data for every input record.

b. any errors encountered.

c. totals of amounts accumulated for groups of input records processed.

2. Two verification procedures are commonly used in interactive processing:

a. DISPLAY the fields just entered and prompt the user to verify that the fields are correct.

b. Use a re-keying procedure, checking to see that the data originally keyed is the same as the data being keyed the second time.

II. What to Do If Input Errors Occur

A. Print an Error Record Containing the Key Field, the Contents of the Erroneous Field, and an Error Message.

1. For each error, print a line containing the key field, the contents of the erroneous field, and an appropriate error message.

2. A count of each specific error type should be maintained. If the count of a specific type of error is excessive, there may be a program error, or a data entry operator may be repeatedly making a particular mistake.

B. Stop the Run

1. If a major error occurs and data integrity is the primary consideration, it may be best to stop the run.

2. Close all files and print or display an error message before terminating the program.

C. Partially Process or Bypass Erroneous Records

1. If the error is a major one, the program should bypass the erroneous record.

2. If the error is a minor one, the program could partially process some portion of the erroneous record.

D. Stop the Run if the Number of Errors Exceeds a Predetermined Limit

1. Normally, when errors occur, processing continues.

2. The program may count errors and terminate processing if the number of errors is unacceptable.
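Counting errors and stopping once a limit is passed can be sketched as follows. This assumes a COBOL 85 compiler; a hypothetical table of one-character quantity fields stands in for the input file, and the limit of 3 errors is an arbitrary example. A real program would also close its files before stopping.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ERROR-LIMIT.
      * Sketch: stop processing when errors exceed a limit. The
      * table of quantities and MAX-ERRORS are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  INPUT-VALUES    PIC X(10)   VALUE '1A2B3C4D5E'.
       01  INPUT-TABLE REDEFINES INPUT-VALUES.
           05  QTY-IN      PIC X       OCCURS 10 TIMES.
       01  SUB             PIC 9(2)    VALUE ZERO.
       01  ERROR-COUNT     PIC 9(2)    VALUE ZERO.
       01  MAX-ERRORS      PIC 9(2)    VALUE 3.
       01  STOP-SWITCH     PIC X       VALUE 'N'.
           88  TOO-MANY-ERRORS         VALUE 'Y'.
       PROCEDURE DIVISION.
       100-MAIN.
           PERFORM VARYING SUB FROM 1 BY 1
                   UNTIL SUB > 10 OR TOO-MANY-ERRORS
               IF QTY-IN (SUB) IS NOT NUMERIC
                   ADD 1 TO ERROR-COUNT
                   DISPLAY 'ERROR IN RECORD ' SUB
      * Set the stop switch once the limit is passed.
                   IF ERROR-COUNT > MAX-ERRORS
                       SET TOO-MANY-ERRORS TO TRUE
                   END-IF
               END-IF
           END-PERFORM
           IF TOO-MANY-ERRORS
               DISPLAY 'RUN TERMINATED: TOO MANY ERRORS'
           END-IF
           STOP RUN.
```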

E. Use Switches

1. A switch is a field that may be used to indicate whether an input record contains valid or invalid data.

2. As each record is processed, the error switch is first initialized to 'N' to indicate no errors. If any error condition occurs, the switch is set to 'Y'. The switch can then be tested at any point to determine whether the record is valid or invalid.
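The switch logic can be sketched as below, assuming a COBOL 85 compiler. A hypothetical table of three five-character records (a 4-digit employee number followed by a pay code of 'H' or 'S') stands in for the input file. The important detail is that the switch is reset to 'N' at the start of every record; otherwise one bad record would mark every later record invalid.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SWITCH-DEMO.
      * Sketch of a per-record error switch. The record layout and
      * sample records are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  REC-VALUES      PIC X(15)   VALUE '0001H00A2S0003X'.
       01  REC-TABLE REDEFINES REC-VALUES.
           05  EMP-REC     OCCURS 3 TIMES.
               10  EMP-NO      PIC X(4).
               10  PAY-CODE    PIC X.
       01  SUB             PIC 9       VALUE ZERO.
       01  ERROR-SWITCH    PIC X       VALUE 'N'.
           88  RECORD-HAS-ERRORS       VALUE 'Y'.
       PROCEDURE DIVISION.
       100-MAIN.
           PERFORM VARYING SUB FROM 1 BY 1 UNTIL SUB > 3
      * Reset the switch for every record.
               MOVE 'N' TO ERROR-SWITCH
               IF EMP-NO (SUB) IS NOT NUMERIC
                   MOVE 'Y' TO ERROR-SWITCH
               END-IF
               IF PAY-CODE (SUB) NOT = 'H' AND NOT = 'S'
                   MOVE 'Y' TO ERROR-SWITCH
               END-IF
               IF RECORD-HAS-ERRORS
                   DISPLAY 'INVALID RECORD: ' EMP-REC (SUB)
               ELSE
                   DISPLAY 'VALID RECORD:   ' EMP-REC (SUB)
               END-IF
           END-PERFORM
           STOP RUN.
```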

F. Print Totals

1. Programs should provide a count of all records processed as well as a count of all errors.

2. Batch totals are printed to check groups of input data. The computer-generated batch totals are compared with those created manually by the user.

III. Global Considerations in COBOL

A. COBOL is used both in the United States and globally.

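B. Outside the United States, numbers are often written differently. In many countries, a comma serves as the decimal point and a period separates groups of digits, so 1,234.56 is written as 1.234,56.

C. The DECIMAL-POINT IS COMMA clause of the SPECIAL-NAMES paragraph handles this difference by exchanging the roles of the comma and the period in PICTURE clauses and numeric literals.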
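The effect of the clause can be seen in a short sketch, assuming a COBOL 85 compiler such as GnuCOBOL; the field names are hypothetical. Once DECIMAL-POINT IS COMMA is specified, the numeric literal 1234,56 uses a decimal comma, and in the PICTURE ZZ.ZZ9,99 the period is the insertion character for thousands while the comma marks the decimal position.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COMMA-DEMO.
      * Sketch of DECIMAL-POINT IS COMMA; field names are
      * hypothetical.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
           DECIMAL-POINT IS COMMA.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * The comma now marks the decimal position and the period
      * inserts a thousands separator.
       01  AMOUNT          PIC 9(5)V99     VALUE 1234,56.
       01  AMOUNT-OUT      PIC ZZ.ZZ9,99.
       PROCEDURE DIVISION.
       100-MAIN.
           MOVE AMOUNT TO AMOUNT-OUT
           DISPLAY 'AMOUNT: ' AMOUNT-OUT
           STOP RUN.
```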

IV. When Data Should Be Validated

A. All programs run on a regular basis should include data validation techniques to minimize errors.

B. Data validation is of the utmost importance if data is being entered into a system for the first time.

C. COBOL 2000+ Changes:

1. The INSPECT statement will no longer have the one-character limitation placed on the size of the AFTER/BEFORE items coded in the REPLACING clause.

2. A VALIDATE statement has been added; it will be used to check the format of data fields and to verify that the contents of such fields fall within established ranges or have acceptable contents as defined by DATA DIVISION VALUEs.

V. Understanding Program Interrupts

A. During program execution, logic errors sometimes cause the program to abnormally terminate before it has completed all processing. This is called a program interrupt.

B. Common program interrupts are listed below. Refer to the text for descriptions and a list of possible causes for each type of program interrupt.

1. DATA EXCEPTION

2. DIVIDE EXCEPTION

3. ADDRESSING ERROR

4. OPERATION ERROR

5. SPECIFICATION ERROR

VI. Verifying File-Names with ACCEPT and DISPLAY Statements When Using a PC Compiler

A. The file-names in the SELECT statements of programs written for a PC compiler are often specified in the exact form in which they appear on disk (such as 'A:\REPORT.OUT'). This practice creates two problems:

1. In order to change the file-names, it is necessary to modify and recompile the program.

2. A previously created output file may be destroyed if the same file-name is used again for a new output file.

B. The text suggests two strategies for avoiding the above problems.

1. Use the ACCEPT and DISPLAY statements in a routine that asks the user to verify that the file about to be created does not already exist. This routine should be executed before any files are opened.

2. Use a data-name instead of an actual file-name in the ASSIGN clause of the SELECT statement. Then use the ACCEPT and DISPLAY statements in a routine that prompts the user to enter the names of the files from the keyboard.

VII. Other Methods for Improving Program Performance

A. The READ...INTO Statement in Place of Using READ and MOVE Statements

1. WORKING-STORAGE is sometimes used for storing input records.

2. A MOVE operation can be used to move the record defined in the FILE SECTION to the WORKING-STORAGE SECTION after the record has been read.

3. The READ ... INTO statement will read a record and then move it into an area defined in the WORKING-STORAGE SECTION.
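READ…INTO can be sketched in a program that first writes a small file and then reads it back. This assumes a PC compiler such as GnuCOBOL that accepts a literal file-name in the ASSIGN clause (a mainframe program would normally assign to a ddname instead); the file name CUSTDEMO.DAT and the record layout are hypothetical. WRITE…FROM, the output counterpart of READ…INTO, is used to build the file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. READ-INTO-DEMO.
      * Sketch of READ ... INTO. The file name CUSTDEMO.DAT and the
      * record layout are hypothetical.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT CUST-FILE ASSIGN TO 'CUSTDEMO.DAT'
               ORGANIZATION IS SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  CUST-FILE.
       01  CUST-REC            PIC X(20).
       WORKING-STORAGE SECTION.
       01  WS-CUST-REC.
           05  WS-CUST-NO      PIC 9(5).
           05  WS-CUST-NAME    PIC X(15).
       01  WS-EOF-SWITCH       PIC X       VALUE 'N'.
           88  NO-MORE-RECORDS             VALUE 'Y'.
       PROCEDURE DIVISION.
       100-MAIN.
      * Write two records first so the sketch needs no input file.
           OPEN OUTPUT CUST-FILE
           MOVE 10001 TO WS-CUST-NO
           MOVE 'ADAMS' TO WS-CUST-NAME
           WRITE CUST-REC FROM WS-CUST-REC
           MOVE 10002 TO WS-CUST-NO
           MOVE 'BAKER' TO WS-CUST-NAME
           WRITE CUST-REC FROM WS-CUST-REC
           CLOSE CUST-FILE
      * READ ... INTO reads each record and moves it to WS-CUST-REC.
           OPEN INPUT CUST-FILE
           PERFORM UNTIL NO-MORE-RECORDS
               READ CUST-FILE INTO WS-CUST-REC
                   AT END
                       SET NO-MORE-RECORDS TO TRUE
                   NOT AT END
                       DISPLAY WS-CUST-NO ' ' WS-CUST-NAME
               END-READ
           END-PERFORM
           CLOSE CUST-FILE
           STOP RUN.
```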

B. Clearing Fields Using the INITIALIZE Statement

1. A series of elementary items contained within a group item can all be initialized with the INITIALIZE verb.

2. Numeric and numeric-edited fields are initialized to zero; alphabetic, alphanumeric, and alphanumeric-edited fields are initialized to spaces.
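A one-statement reset of a group item can be sketched as follows, assuming a COBOL 85 compiler; TOTALS-REC and its fields are hypothetical. The group is displayed before and after INITIALIZE, with brackets marking its boundaries.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INIT-DEMO.
      * Sketch of INITIALIZE; TOTALS-REC is a hypothetical group.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  TOTALS-REC.
           05  TOT-NAME        PIC X(5)    VALUE 'ABCDE'.
           05  TOT-COUNT       PIC 9(3)    VALUE 123.
       PROCEDURE DIVISION.
       100-MAIN.
           DISPLAY '[' TOTALS-REC ']'
      * The alphanumeric field becomes spaces, the numeric one zeros.
           INITIALIZE TOTALS-REC
           DISPLAY '[' TOTALS-REC ']'
           STOP RUN.
```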

VIII. Improving Program Efficiency with the USAGE Clause

A. Format

1. There are many ways in which numeric data can be stored internally within the computer.

2. The specific method for storing numeric data affects the program’s efficiency.

3. The USAGE clause, which specifies the form in which numeric data is stored, has the following format:

[USAGE IS] {DISPLAY/COMPUTATIONAL/COMP/PACKED-DECIMAL}

4. When the USAGE clause is used with a group item, it applies to all elements within the group.

B. USAGE IS DISPLAY

1. USAGE IS DISPLAY stores one character of data per storage position.

2. USAGE IS DISPLAY is the default.

C. USAGE IS PACKED-DECIMAL or COMPUTATIONAL-3 (COMP-3) – A Common Enhancement

1. On many computers, PACKED-DECIMAL or COMPUTATIONAL-3 enables the computer to store two digits in each storage position, except for the rightmost position, which holds one digit and the sign.

2. USAGE IS PACKED-DECIMAL saves a significant amount of storage space both in WORKING-STORAGE and for disk files.

3. The PACKED-DECIMAL or COMPUTATIONAL-3 option should not be used for printing output because packed-decimal data is not readable.

4. The computer automatically converts from packed to unpacked form and vice versa when a MOVE statement is executed.
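The storage saving and the conversion for printing can be sketched together, assuming a COBOL 85 compiler such as GnuCOBOL; the field names are hypothetical. A seven-digit field occupies 7 bytes as DISPLAY but (7 + 1) / 2 = 4 bytes as PACKED-DECIMAL, and moving the packed field to a numeric-edited field produces readable output.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PACKED-DEMO.
      * Sketch comparing DISPLAY and PACKED-DECIMAL storage; the
      * field names are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  AMT-DISPLAY     PIC S9(5)V99  USAGE IS DISPLAY.
       01  AMT-PACKED      PIC S9(5)V99  USAGE IS PACKED-DECIMAL.
       01  AMT-OUT         PIC ZZ,ZZ9.99-.
       01  SIZE-OUT        PIC 9.
       PROCEDURE DIVISION.
       100-MAIN.
      * 7 digits: 7 bytes as DISPLAY, 4 bytes as PACKED-DECIMAL.
           MOVE FUNCTION LENGTH (AMT-DISPLAY) TO SIZE-OUT
           DISPLAY 'DISPLAY BYTES: ' SIZE-OUT
           MOVE FUNCTION LENGTH (AMT-PACKED) TO SIZE-OUT
           DISPLAY 'PACKED BYTES:  ' SIZE-OUT
      * Arithmetic works directly on the packed field.
           COMPUTE AMT-PACKED = 12345.67 - 20000
      * MOVE converts the packed value to a readable edited field.
           MOVE AMT-PACKED TO AMT-OUT
           DISPLAY 'AMOUNT: ' AMT-OUT
           STOP RUN.
```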

D. USAGE IS COMPUTATIONAL (COMP)

1. USAGE IS COMPUTATIONAL stores data in the form in which the computer actually does its computation, usually in binary format.

2. Use of this clause is desirable when many arithmetic computations must be performed.

3. Since subscripts and counters are generated in binary form on many computers, the programmer should define them with USAGE IS COMP or COMPUTATIONAL.

4. The USAGE IS BINARY clause may be used to represent data in binary form.
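Binary subscripts and counters can be sketched as follows, assuming a COBOL 85 compiler; the sales table and field names are hypothetical. SUB is declared with COMP and REC-COUNT with BINARY to show both spellings, and the results are moved to edited fields before they are displayed, because binary fields are not meant to be printed directly.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COMP-DEMO.
      * Sketch of binary subscripts and counters; the sales table
      * and field names are hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  SALES-VALUES        PIC X(12)   VALUE '010020030040'.
       01  SALES-TABLE REDEFINES SALES-VALUES.
           05  SALES-AMT       PIC 9(3)    OCCURS 4 TIMES.
      * Subscript and counter are kept in binary form.
       01  SUB                 PIC S9(4)   USAGE IS COMP.
       01  REC-COUNT           PIC S9(4)   USAGE IS BINARY
                                           VALUE ZERO.
       01  TOTAL-SALES         PIC S9(5)   USAGE IS PACKED-DECIMAL
                                           VALUE ZERO.
       01  COUNT-OUT           PIC ZZZ9.
       01  TOTAL-OUT           PIC ZZZZ9.
       PROCEDURE DIVISION.
       100-MAIN.
           PERFORM VARYING SUB FROM 1 BY 1 UNTIL SUB > 4
               ADD SALES-AMT (SUB) TO TOTAL-SALES
               ADD 1 TO REC-COUNT
           END-PERFORM
      * Edited fields make the binary and packed results readable.
           MOVE REC-COUNT TO COUNT-OUT
           MOVE TOTAL-SALES TO TOTAL-OUT
           DISPLAY 'ITEMS: ' COUNT-OUT
           DISPLAY 'TOTAL: ' TOTAL-OUT
           STOP RUN.
```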

SOLUTIONS TO REVIEW QUESTIONS

I. True-False Questions

1. T

2. F Syntax errors are detected by the compiler, while run-time errors are detected later, when the program is being run.

3. T

4. F When checking for a condition-name, no relational condition is needed. The code should read:

IF X-ON PERFORM 200-X-RTN.

5. T

6. T

7. F The field may be alphanumeric.

8. F It may be zero, which is neither positive nor negative.

9. T

10. T

II. General Questions

1. WORKING-STORAGE SECTION field definitions:

01 WORK-FIELDS.
   05 SALESPERSON-NAME PIC X(20).
   05 A-COUNTER PIC 9(2) VALUE ZERO.

PROCEDURE DIVISION code:

DISPLAY 'ENTER A NAME (20 CHARACTER MAXIMUM): '
ACCEPT SALESPERSON-NAME
INSPECT SALESPERSON-NAME
    TALLYING A-COUNTER
    FOR ALL 'A'
DISPLAY 'THE NAME CONTAINS ' A-COUNTER ' A''S'.

2. WORKING-STORAGE SECTION field definitions:

01 EDITED-AMOUNT PIC X(11).

PROCEDURE DIVISION code:

DISPLAY 'ENTER AN EDITED DOLLAR/CENTS AMOUNT'
DISPLAY 'IN THE FORMAT $ZZZ,ZZ9.99: '
ACCEPT EDITED-AMOUNT
INSPECT EDITED-AMOUNT
    REPLACING ALL SPACE BY '*'
    AFTER INITIAL '$'
DISPLAY 'THE REFORMATTED AMOUNT IS ' EDITED-AMOUNT.

3. This solution uses the divisibility test for 5: a number is divisible by 5 if and only if its last digit is 0 or 5. For a price entered in the format 999.99, the last digit is in position 6.

WORKING-STORAGE SECTION field definitions:

01 UNIT-PRICE PIC X(6).

PROCEDURE DIVISION code:

DISPLAY 'ENTER A UNIT PRICE IN THE FORMAT 999.99: '
ACCEPT UNIT-PRICE
IF UNIT-PRICE (6:1) NOT = '0' AND NOT = '5'
    DISPLAY 'THE UNIT PRICE IS NOT DIVISIBLE BY 5'
ELSE
    DISPLAY 'THE UNIT PRICE IS DIVISIBLE BY 5'
END-IF.

4. WORKING-STORAGE SECTION field definitions:

01 TEST-FIELDS.
   05 TEST-DATE.
      10 TEST-YEAR PIC 9(4).
      10 TEST-MONTH PIC 9(2).
      10 TEST-DAY PIC 9(2).
   05 REMAINDER-4 PIC 9(1).
   05 REMAINDER-400 PIC 9(3).
   05 LEAP-YEAR-FLAG PIC X(3).

PROCEDURE DIVISION code:

DISPLAY 'ENTER A DATE (YYYYMMDD): '
ACCEPT TEST-DATE
MOVE FUNCTION REM (TEST-YEAR, 4) TO REMAINDER-4
IF REMAINDER-4 = 0
    IF TEST-YEAR (3:2) = ZERO
        MOVE FUNCTION REM (TEST-YEAR, 400)
            TO REMAINDER-400
        IF REMAINDER-400 = 0
            MOVE 'YES' TO LEAP-YEAR-FLAG
        ELSE
            MOVE 'NO ' TO LEAP-YEAR-FLAG
        END-IF
    ELSE
        MOVE 'YES' TO LEAP-YEAR-FLAG
    END-IF
ELSE
    MOVE 'NO ' TO LEAP-YEAR-FLAG
END-IF
IF LEAP-YEAR-FLAG = 'NO ' AND TEST-MONTH = 2
        AND TEST-DAY > 28
    DISPLAY 'THIS IS AN INVALID DATE'
END-IF.

5. WORKING-STORAGE SECTION field definitions:

01 INPUT-FIELD PIC X(30).
01 LOWER-CASE-LETTERS PIC X(26)
       VALUE 'abcdefghijklmnopqrstuvwxyz'.

PROCEDURE DIVISION code:

DISPLAY 'ENTER AN ALPHANUMERIC FIELD: '
ACCEPT INPUT-FIELD
INSPECT INPUT-FIELD
    CONVERTING LOWER-CASE-LETTERS TO SPACES
DISPLAY 'THE REFORMATTED FIELD: ' INPUT-FIELD.

III. Internet/Critical Thinking Questions

Since the following Web sites contain current news stories, future searches will most likely locate different pages than the ones listed here.

Search Engine: yahoo.com>Computers & Internet

Keywords: “programming error”

URL: http://www.post-gazette.com/localnews/20020501911out5.asp

Contents: 911 System Failure Tied To Programming Error

Search Engine: yahoo.com>Computers & Internet

Keywords: “programming error”

URL: http://209.58.136.120/recall/p/99/wxrad2.htm

Contents: Weather Alert Radio Recalled for Programming Error

SOLUTIONS TO DEBUGGING EXERCISES

1. AMT2 cannot hold a negative value because its PICTURE clause is unsigned. It should be defined as:

05 AMT2 PIC S9(3).

Some compilers do not flag this as a syntax error; in that case, it becomes a logic error.

2. Yes. 200-EDIT-CHECK is performed once for every input record, and the ADD statement is executed once each time through 200-EDIT-CHECK.

3. The word NOT needs to be coded in both conditions.

4. Once ERR-SWITCH is set to a non-zero value, it is never reinitialized. ERR-SWITCH needs to be reset to zero as the first line of code in 200-EDIT-CHECK.