CHAPTER 14
SORTING AND MERGING
CHAPTER OBJECTIVES
Upon completion of this chapter, the student should be able to:
1. Explain how files may be sorted within a COBOL program.
2. Explain how to use an INPUT PROCEDURE to process a file before it is actually sorted.
3. Explain how to use an OUTPUT PROCEDURE to process a file after it is sorted.
4. Explain when it is appropriate to use an INPUT and/or OUTPUT PROCEDURE.
5. Explain how to use the MERGE verb for merging files.
LECTURE OUTLINE
I. The SORT Feature: An Overview
A. Format of the SORT Statement
1. Sorting is a common procedure used for arranging records into a specific order so that sequential processing may be performed.
2. Two techniques are used for sorting files processed by COBOL programs:
a. A utility sort program may be used on the file before executing the COBOL program.
b. Using COBOL's SORT verb, a file may be sorted within a COBOL program.
3. Format of the SORT statement:
   SORT file-name-1
       {ON {ASCENDING/DESCENDING} KEY data-name-1 ...} ...
       USING file-name-2
       GIVING file-name-3
B. ASCENDING or DESCENDING Key
1. The programmer must specify whether the key field in the file is to be put into ASCENDING or DESCENDING order.
2. Records may be sorted using either numeric or nonnumeric key fields.
3. Collating sequence refers to the specific order in which characters are sequenced from lowest to highest.
4. The results of a sort on alphanumeric fields containing both letters and digits or special characters will differ depending upon the collating sequence used by the computer:
a. Letters are "greater than" digits in ASCII, and letters are "less than" digits in EBCDIC.
b. Lowercase letters are "less than" uppercase letters in EBCDIC and "greater than" uppercase letters in ASCII.
5. Multiple key fields are permitted in a SORT statement. Key fields are listed in order of importance, from the major key field to the minor key field.
6. Records with the same value in the sort key field(s) may be placed in the sort file in the same order in which they appeared in the original file. The WITH DUPLICATES IN ORDER clause is used to accomplish this; without it, the relative order of such records is not guaranteed.
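The key rules in B can be tried out outside COBOL. The following is a Python sketch, not COBOL code: it assumes Python's `cp037` codec as a stand-in for an EBCDIC machine (other EBCDIC code pages agree on letters and digits but may place some special characters differently), and the names `SALES`, `FIELDS`, `collate`, and `sort_records` are hypothetical. The sample records are the ones used in the General Questions later in this chapter.

```python
# Sketch (Python, not COBOL) of SORT key rules: collating sequence,
# major/minor keys, and duplicate keys kept in input order.

SALES = [  # (STORE-NO, DEPT-NO, SALESPERSON, AMT-OF-SALES) in input order
    ("001", "01", "O'CONNOR", 5899),
    ("002", "01", "GONZALES", 12500),
    ("002", "02", "CHANG", 6275),
    ("002", "02", "ADAMS", 18733),
    ("003", "01", "FRANKLIN", 12358),
    ("003", "02", "BROWN", 5873),
    ("003", "02", "ANDREWS", 9277),
]
FIELDS = {"STORE-NO": 0, "DEPT-NO": 1, "SALESPERSON": 2, "AMT-OF-SALES": 3}


def collate(keys, codec):
    """Order strings by their byte values in a character set:
    "ascii" for ASCII, "cp037" for IBM EBCDIC (US/Canada)."""
    return sorted(keys, key=lambda k: k.encode(codec))


def sort_records(records, keys):
    """keys: (field-name, "ASCENDING" or "DESCENDING") pairs, major key first."""
    result = list(records)
    # Sort on the minor key first and the major key last. Python's sort is
    # stable, so each pass keeps the order set by the earlier passes, and
    # records whose keys are all equal stay in input order (the effect that
    # WITH DUPLICATES IN ORDER guarantees in COBOL).
    for name, direction in reversed(keys):
        pos = FIELDS[name]
        result.sort(key=lambda r: r[pos], reverse=(direction == "DESCENDING"))
    return result


print(collate(["apple", "Apple", "123"], "ascii"))  # ['123', 'Apple', 'apple']
print(collate(["apple", "Apple", "123"], "cp037"))  # ['apple', 'Apple', '123']

by_store_amt = sort_records(
    SALES, [("STORE-NO", "ASCENDING"), ("AMT-OF-SALES", "DESCENDING")])
print([r[2] for r in by_store_amt])
# ["O'CONNOR", 'ADAMS', 'GONZALES', 'CHANG', 'FRANKLIN', 'ANDREWS', 'BROWN']

by_dept = sort_records(SALES, [("DEPT-NO", "ASCENDING")])
print([r[2] for r in by_dept])
# ["O'CONNOR", 'GONZALES', 'FRANKLIN', 'CHANG', 'ADAMS', 'BROWN', 'ANDREWS']
```

Python documents `reverse=True` as preserving stability, which is why a DESCENDING pass does not disturb the input order of equal keys.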
C. Coding a Simple SORT Procedure with the USING and GIVING Options
1. Three files are used in a SORT:
a. An input file of unsorted records.
b. A work or sort file used to temporarily store records during the sorting process.
c. An output file of sorted records.
2. All three files are defined in the ENVIRONMENT DIVISION using standard ASSIGN clauses.
3. The input and output files are defined and described in the DATA DIVISION with FDs.
4. The work or sort file is defined in the DATA DIVISION with an SD (sort file description) rather than an FD.
5. SD entries must not have a LABEL RECORDS clause.
6. The key field(s) for the SORT must be defined as part of the sort record format.
7. A SORT procedure can precede an update or control break procedure within the same program.
II. Processing Data Before and/or After Sorting
A. When the USING and GIVING clauses are used, the SORT statement performs the following operations:
1. Opens the input file (USING clause) and output file (GIVING clause).
2. Moves the records from the input file to the sort file.
3. Sorts the sort file into the sequence specified by the ASCENDING/DESCENDING clause(s).
4. Moves the sorted records from the sort file to the output file.
5. Closes both files specified in the USING and GIVING clauses.
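The five steps above can be modeled in a few lines of Python. This is a sketch, not how a COBOL run-time actually sorts: the function `sort_using_giving`, the file names `sales.in`/`sales.out`, and the single ascending key given as a byte offset and length are all assumptions for illustration. The record layout follows SORT-RECORD in General Question 3.

```python
import os
import tempfile


def sort_using_giving(input_path, output_path, key_start, key_len):
    """Sketch of SORT ... USING input GIVING output on one ascending key."""
    # Steps 1-2: open the USING file and move its records to a work area.
    # (Python closes the input file here; COBOL closes both files at step 5.)
    with open(input_path) as in_file:
        work = [line.rstrip("\n") for line in in_file]
    # Step 3: sort the work area on the key field in ascending order.
    work.sort(key=lambda rec: rec[key_start:key_start + key_len])
    # Steps 4-5: open the GIVING file, write the sorted records, close it.
    with open(output_path, "w") as out_file:
        for rec in work:
            out_file.write(rec + "\n")


# 20-character records laid out like SORT-RECORD in General Question 3:
# STORE-NO (3), DEPT-NO (2), SALESPERSON (10), AMT-OF-SALES (5).
records = [
    "00201GONZALES  12500",
    "00101O'CONNOR  05899",
    "00202ADAMS     18733",
]
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "sales.in")
    out_path = os.path.join(tmp, "sales.out")
    with open(in_path, "w") as f:
        f.write("\n".join(records) + "\n")
    sort_using_giving(in_path, out_path, key_start=5, key_len=10)  # SALESPERSON
    with open(out_path) as f:
        sorted_records = f.read().splitlines()

for rec in sorted_records:
    print(rec)
```

The program never touches the work area directly except through the sort, just as a COBOL program never OPENs or CLOSEs the SD file.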
B. The SORT statement can also be used in conjunction with procedures that process records before they are sorted and/or process records after they are sorted.
C. INPUT PROCEDURE
1. An INPUT PROCEDURE processes data from the unsorted input file prior to sorting.
2. An INPUT PROCEDURE clause is used in place of a USING clause.
3. INPUT PROCEDURE Summary:
a. The INPUT PROCEDURE of the SORT should refer to a paragraph-name, but it could refer to a section-name.
b. In the paragraph specified in the INPUT PROCEDURE:
i. OPEN the input file.
ii. PERFORM a paragraph that will read and process input records until there is no more data.
iii. After all records have been processed, CLOSE the input file.
iv. After the last sentence in the INPUT PROCEDURE paragraph is executed, control returns to the SORT, at which time the records in the sort file are sorted.
c. In the paragraph that processes input records prior to sorting:
i. Perform any operations on input that are required.
ii. MOVE input data to the sort record.
iii. RELEASE each sort record, which makes it available for sorting.
iv. Continue to read input until there is no more data.
4. The RELEASE statement is necessary in an INPUT PROCEDURE to make records available for sorting. It functions like the WRITE statement and has a similar format (RELEASE sort-record-name [FROM identifier]); it writes records to the sort or work file.
5. Never OPEN or CLOSE the sort file specified in the SD. It is always opened and closed automatically, as are files specified with USING or GIVING.
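The control flow of an INPUT PROCEDURE can be sketched in Python by passing the procedure a `release` function. This is only a model under stated assumptions: `sort_with_input_procedure`, `select_large_sales`, the 10000 cutoff, and the choice to release just SALESPERSON and AMT-OF-SALES are hypothetical, chosen to show records being both eliminated and reformatted before the sort.

```python
def sort_with_input_procedure(input_procedure, sort_key):
    """Sketch of SORT ... INPUT PROCEDURE ... GIVING (returns a list)."""
    sort_file = []  # stands in for the SD file; the procedure never opens it
    # The INPUT PROCEDURE runs first and calls release(record) once for each
    # record it wants sorted, much as RELEASE writes to the sort file.
    input_procedure(sort_file.append)
    # When the procedure ends, control returns to the SORT, which sorts.
    sort_file.sort(key=sort_key)
    return sort_file


INPUT_FILE = [  # (STORE-NO, DEPT-NO, SALESPERSON, AMT-OF-SALES)
    ("001", "01", "O'CONNOR", 5899),
    ("002", "01", "GONZALES", 12500),
    ("002", "02", "CHANG", 6275),
    ("002", "02", "ADAMS", 18733),
    ("003", "01", "FRANKLIN", 12358),
    ("003", "02", "BROWN", 5873),
    ("003", "02", "ANDREWS", 9277),
]


def select_large_sales(release):
    # Hypothetical INPUT PROCEDURE: read every input record, drop sales under
    # 10000, and release only SALESPERSON and AMT-OF-SALES to the sort.
    for store_no, dept_no, salesperson, amt in INPUT_FILE:
        if amt >= 10000:
            release((salesperson, amt))


sorted_sales = sort_with_input_procedure(select_large_sales, sort_key=lambda r: r[0])
print(sorted_sales)  # [('ADAMS', 18733), ('FRANKLIN', 12358), ('GONZALES', 12500)]
```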
D. OUTPUT PROCEDURE
1. An OUTPUT PROCEDURE is used to process records of the sort file after they have been sorted.
2. An OUTPUT PROCEDURE clause is used in place of the GIVING clause.
3. OUTPUT PROCEDURE Summary for COBOL 85:
a. The OUTPUT PROCEDURE of the SORT should refer to a paragraph-name, but it could refer to a section-name.
b. In the paragraph specified in the OUTPUT PROCEDURE:
i. OPEN the output file.
ii. PERFORM a paragraph that will RETURN (which is like a READ) and process records from the sort file until there is no more data. The records in the sort file will be in sequence at this point.
iii. After all records have been processed, CLOSE the output file.
iv. When the OUTPUT PROCEDURE paragraph has been fully executed, control returns to the SORT.
c. In the paragraph that processes the sort records after they have been sorted but before they are written as output:
i. Perform any operations required on the work or sort records.
ii. MOVE the work or sort record to the output area.
iii. WRITE each output record to the output file. (A WRITE ... FROM can be used in place of the separate MOVE and WRITE.)
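The OUTPUT PROCEDURE side can be modeled the same way, with a `return_record` function playing the part of RETURN and `None` standing for AT END. The sketch loosely follows General Question 4 (sort on DEPT-NO and SALESPERSON, total AMT-OF-SALES, write every record); the function names and the list standing in for OUTPUT-FILE are hypothetical, and the code is a Python model rather than COBOL.

```python
def sort_with_output_procedure(using_records, sort_key, output_procedure):
    """Sketch of SORT ... USING ... OUTPUT PROCEDURE."""
    # USING: the SORT moves the input records to the sort file and sorts them.
    sort_file = sorted(using_records, key=sort_key)
    pending = iter(sort_file)

    def return_record():
        # Like RETURN: the next record in sorted sequence, or None AT END.
        return next(pending, None)

    # The OUTPUT PROCEDURE runs after the sort and pulls records one by one.
    return output_procedure(return_record)


INPUT_FILE = [  # (STORE-NO, DEPT-NO, SALESPERSON, AMT-OF-SALES)
    ("001", "01", "O'CONNOR", 5899),
    ("002", "01", "GONZALES", 12500),
    ("002", "02", "CHANG", 6275),
    ("002", "02", "ADAMS", 18733),
    ("003", "01", "FRANKLIN", 12358),
    ("003", "02", "BROWN", 5873),
    ("003", "02", "ANDREWS", 9277),
]
output_file = []  # stands in for OUTPUT-FILE


def accumulate_sales(return_record):
    # Hypothetical OUTPUT PROCEDURE: RETURN each record, add its amount to a
    # running total, and WRITE it to the output.
    total = 0
    while True:
        record = return_record()
        if record is None:  # AT END
            break
        total += record[3]
        output_file.append(record)
    return total


total = sort_with_output_procedure(
    INPUT_FILE, lambda r: (r[1], r[2]), accumulate_sales)  # DEPT-NO, SALESPERSON
print([r[2] for r in output_file])
# ['FRANKLIN', 'GONZALES', "O'CONNOR", 'ADAMS', 'ANDREWS', 'BROWN', 'CHANG']
print(total)  # 70915
```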
E. When to Use INPUT and/or OUTPUT PROCEDUREs
1. Sometimes it is more efficient to process data before it is sorted; at other times it is more efficient to process data after it is sorted.
2. If a large number of records will be eliminated during a run, it is more efficient to remove them before sorting. In this way, computer resources are not wasted sorting records that will eventually be discarded anyway.
3. An INPUT or OUTPUT PROCEDURE must be used if the unsorted and sorted files have different-sized fields or have fields in a different order.
4. As an alternative to using an INPUT and/or OUTPUT PROCEDURE, it is possible to code a stand-alone SORT along with separate procedures to process the records before and/or after the SORT is executed.
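Point 2 can be checked with a small Python sketch: filtering before the sort (INPUT PROCEDURE style) and filtering after it (OUTPUT PROCEDURE style) produce the same output, but the first approach hands fewer records to the sort. The selection rule (sales of at least 10000) and all function names are hypothetical.

```python
def filter_then_sort(records, keep, key):
    # INPUT PROCEDURE style: discard unwanted records, then sort the rest.
    selected = [r for r in records if keep(r)]
    return sorted(selected, key=key), len(selected)


def sort_then_filter(records, keep, key):
    # OUTPUT PROCEDURE style: sort every record, then discard afterward.
    ordered = sorted(records, key=key)
    return [r for r in ordered if keep(r)], len(ordered)


def is_large_sale(record):
    # Hypothetical selection rule.
    return record[1] >= 10000


def by_name(record):
    return record[0]


SALES = [("O'CONNOR", 5899), ("GONZALES", 12500), ("CHANG", 6275),
         ("ADAMS", 18733), ("FRANKLIN", 12358), ("BROWN", 5873),
         ("ANDREWS", 9277)]

early, sorted_early = filter_then_sort(SALES, is_large_sale, by_name)
late, sorted_late = sort_then_filter(SALES, is_large_sale, by_name)
print(early == late)              # True: same final output
print(sorted_early, sorted_late)  # 3 7: records passed to the sort
```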
III. The MERGE Statement
A. A MERGE statement combines two or more files into a single file.
B. The format of the MERGE statement is very similar to that of the SORT:
1. Key fields, specified in order of importance, must be defined within the SD.
2. The USING clause specifies a minimum of two files that are to be merged.
3. An INPUT PROCEDURE may not be specified with the MERGE statement, but an OUTPUT PROCEDURE is permitted.
C. The MERGE statement automatically handles the opening, closing, and input/output associated with the files named in its USING and GIVING clauses.
D. Files to be merged must already be in sequence by the key field(s).
E. The merged file will be in the same key sequence as the input files.
F. The same rules apply to OUTPUT PROCEDUREs for the MERGE as for the SORT.
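The essential property of a merge, interleaving inputs that are already in key sequence without re-sorting them, can be sketched with Python's `heapq.merge`. This is a model, not COBOL: `merge_files`, `STORE_1`, and `STORE_2` are hypothetical, and the explicit sequence check is an addition of this sketch. What a particular COBOL compiler does with out-of-sequence input depends on the implementation.

```python
import heapq


def merge_files(*sorted_files, key):
    """Sketch of MERGE ... USING file-1 file-2 ... GIVING (returns a list)."""
    # Every input must already be in sequence by the key; the merge only
    # interleaves records, it never re-sorts them. This check is an addition
    # of the sketch, not something the MERGE statement is guaranteed to do.
    for records in sorted_files:
        if any(key(a) > key(b) for a, b in zip(records, records[1:])):
            raise ValueError("input file is not in sequence by the merge key")
    return list(heapq.merge(*sorted_files, key=key))


STORE_1 = [("ADAMS", 18733), ("CHANG", 6275), ("GONZALES", 12500)]
STORE_2 = [("ANDREWS", 9277), ("BROWN", 5873), ("FRANKLIN", 12358)]

merged = merge_files(STORE_1, STORE_2, key=lambda r: r[0])
print([name for name, _ in merged])
# ['ADAMS', 'ANDREWS', 'BROWN', 'CHANG', 'FRANKLIN', 'GONZALES']
```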
SOLUTIONS TO REVIEW QUESTIONS
I. True-False Questions
1. F An OUTPUT PROCEDURE may be used along with the USING clause.
2. T
3. F The collating sequences are different. For example, the relative positions of letters and digits are different.
4. T
5. F There is no limit to the number of sort fields that may be specified.
6. F An alternative is to use either a utility or a database management system's sort program. A programmer may also write his or her own sort program.
7. F The sort field need not be numeric.
8. F While paragraph-names are most common, section-names may be used.
9. F Any file described with an SD must be defined in a SELECT clause.
10. F While there are lower values in the EBCDIC collating sequence, blank has the lowest value of the printable characters. However, the SORT verb DOES distinguish between uppercase and lowercase letters.
II. General Questions
1.
Store No   Dept No   Salesperson   Amt of Sales
001        01        O'CONNOR      05899
002        01        GONZALES      12500
002        02        CHANG         06275
002        02        ADAMS         18733
003        01        FRANKLIN      12358
003        02        BROWN         05873
003        02        ANDREWS       09277
2.
Store No   Dept No   Salesperson   Amt of Sales
003        02        ANDREWS       09277
003        02        BROWN         05873
003        01        FRANKLIN      12358
002        02        ADAMS         18733
002        02        CHANG         06275
002        01        GONZALES      12500
001        01        O'CONNOR      05899
3.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CH14QST3.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUT-FILE ASSIGN TO 'CH14QST3.IN'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT SORT-FILE ASSIGN TO 'CH14QST3.SRT'.
           SELECT OUTPUT-FILE ASSIGN TO 'CH14QST3.OUT'
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  INPUT-FILE.
       01  INPUT-RECORD                PIC X(20).
       SD  SORT-FILE.
       01  SORT-RECORD.
           05  STORE-NO                PIC X(3).
           05  DEPT-NO                 PIC X(2).
           05  SALESPERSON             PIC X(10).
           05  AMT-OF-SALES            PIC 9(5).
       FD  OUTPUT-FILE.
       01  OUTPUT-RECORD               PIC X(20).
       WORKING-STORAGE SECTION.
       01  ARE-THERE-MORE-RECORDS      PIC X(3) VALUE 'YES'.
           88  NO-MORE-RECORDS                  VALUE 'NO '.
       01  INPUT-RECORD-COUNTER        PIC 9(3) VALUE ZERO.
       PROCEDURE DIVISION.
       100-MAIN-MODULE.
           SORT SORT-FILE
               ASCENDING KEY DEPT-NO
               INPUT PROCEDURE IS 200-COUNT-INPUT-RECORDS
               GIVING OUTPUT-FILE
           STOP RUN.
      * INPUT PROCEDURE: count each input record and RELEASE it
      * to the sort file unchanged.
       200-COUNT-INPUT-RECORDS.
           OPEN INPUT INPUT-FILE
           PERFORM UNTIL NO-MORE-RECORDS
               READ INPUT-FILE
                   AT END
                       MOVE 'NO ' TO ARE-THERE-MORE-RECORDS
                   NOT AT END
                       ADD 1 TO INPUT-RECORD-COUNTER
                       RELEASE SORT-RECORD FROM INPUT-RECORD
               END-READ
           END-PERFORM
           CLOSE INPUT-FILE
           DISPLAY 'THE INPUT FILE CONTAINS '
               INPUT-RECORD-COUNTER ' RECORDS'.
4.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CH14QST4.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUT-FILE ASSIGN TO 'CH14QST4.IN'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT SORT-FILE ASSIGN TO 'CH14QST4.SRT'.
           SELECT OUTPUT-FILE ASSIGN TO 'CH14QST4.OUT'
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  INPUT-FILE.
       01  INPUT-RECORD                PIC X(20).
       SD  SORT-FILE.
       01  SORT-RECORD.
           05  STORE-NO                PIC X(3).
           05  DEPT-NO                 PIC X(2).
           05  SALESPERSON             PIC X(10).
           05  AMT-OF-SALES            PIC 9(5).
       FD  OUTPUT-FILE.
       01  OUTPUT-RECORD               PIC X(20).
       WORKING-STORAGE SECTION.
       01  ARE-THERE-MORE-RECORDS      PIC X(3) VALUE 'YES'.
           88  NO-MORE-RECORDS                  VALUE 'NO '.
       01  TOTAL-AMT-OF-SALES          PIC 9(7) VALUE ZERO.
       01  TOTAL-AMT-OF-SALES-EDITED   PIC $$,$$$,$$9.
       PROCEDURE DIVISION.
       100-MAIN-MODULE.
           SORT SORT-FILE
               ASCENDING KEY DEPT-NO
                             SALESPERSON
               USING INPUT-FILE
               OUTPUT PROCEDURE IS 200-ACCUMULATE-SALES
           STOP RUN.
      * OUTPUT PROCEDURE: RETURN each sorted record, add its amount
      * to the running total, and write it to the output file.
       200-ACCUMULATE-SALES.
           OPEN OUTPUT OUTPUT-FILE
           PERFORM UNTIL NO-MORE-RECORDS
               RETURN SORT-FILE
                   AT END
                       MOVE 'NO ' TO ARE-THERE-MORE-RECORDS
                   NOT AT END
                       ADD AMT-OF-SALES TO TOTAL-AMT-OF-SALES
                       WRITE OUTPUT-RECORD FROM SORT-RECORD
               END-RETURN
           END-PERFORM
           MOVE TOTAL-AMT-OF-SALES TO TOTAL-AMT-OF-SALES-EDITED
           CLOSE OUTPUT-FILE
           DISPLAY 'TOTAL AMOUNT OF SALES = '
               TOTAL-AMT-OF-SALES-EDITED.
5.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CH14QST5.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUT-FILE ASSIGN TO 'CH14QST5.IN'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT SORT-FILE ASSIGN TO 'CH14QST5.SRT'.
       DATA DIVISION.
       FILE SECTION.
       FD  INPUT-FILE.
       01  INPUT-RECORD                PIC X(20).
       SD  SORT-FILE.
       01  SORT-RECORD.
           05  STORE-NO                PIC X(3).
           05  DEPT-NO                 PIC X(2).
           05  SALESPERSON             PIC X(10).
           05  AMT-OF-SALES            PIC 9(5).
       WORKING-STORAGE SECTION.
       01  FLAGS.
           05  ARE-THERE-MORE-RECORDS  PIC X(3) VALUE 'YES'.
               88  NO-MORE-RECORDS              VALUE 'NO '.
               88  MORE-RECORDS                 VALUE 'YES'.
           05  FIRST-RECORD-FLAG       PIC X(3) VALUE 'YES'.
               88  FIRST-RECORD                 VALUE 'YES'.
       01  CALCULATION-FIELDS.
           05  DEPT-TOTAL-SALES        PIC 9(7) VALUE ZERO.
           05  DEPT-AVERAGE-SALES      PIC $$$,$$9.99.
           05  SALESPERSON-COUNTER     PIC 9(2) VALUE ZERO.
       01  CONTROL-BREAK-HOLD-FIELDS.
           05  STORE-NO-HOLD           PIC X(3).
           05  DEPT-NO-HOLD            PIC X(2).
       PROCEDURE DIVISION.
       100-MAIN-MODULE.
           SORT SORT-FILE
               ASCENDING KEY STORE-NO
                             DEPT-NO
               USING INPUT-FILE
               OUTPUT PROCEDURE IS 200-DISPLAY-DEPT-TOTALS
           STOP RUN.
      * OUTPUT PROCEDURE: RETURN the sorted records and display the
      * average sale for each department within each store.
       200-DISPLAY-DEPT-TOTALS.
           PERFORM UNTIL NO-MORE-RECORDS
               RETURN SORT-FILE
                   AT END
                       MOVE 'NO ' TO ARE-THERE-MORE-RECORDS
                   NOT AT END
                       PERFORM 300-PROCESS-ONE-RECORD
               END-RETURN
           END-PERFORM
      * Print the last department's average, but only if at least one
      * record was returned (avoids dividing by a zero counter).
           IF NOT FIRST-RECORD
               PERFORM 400-DEPT-BREAK
               PERFORM 500-STORE-BREAK
           END-IF.
       300-PROCESS-ONE-RECORD.
           EVALUATE TRUE
               WHEN FIRST-RECORD
                   MOVE STORE-NO TO STORE-NO-HOLD
                   MOVE DEPT-NO TO DEPT-NO-HOLD
                   DISPLAY 'SALES AVERAGES FOR STORE '
                       STORE-NO-HOLD ':'
                   MOVE 'NO' TO FIRST-RECORD-FLAG
               WHEN STORE-NO NOT = STORE-NO-HOLD
                   PERFORM 400-DEPT-BREAK
                   PERFORM 500-STORE-BREAK
               WHEN DEPT-NO NOT = DEPT-NO-HOLD
                   PERFORM 400-DEPT-BREAK
           END-EVALUATE
           ADD AMT-OF-SALES TO DEPT-TOTAL-SALES
           ADD 1 TO SALESPERSON-COUNTER.
      * Department break: display the average for the department just
      * finished, then reset the accumulators for the next department.
       400-DEPT-BREAK.
           DIVIDE SALESPERSON-COUNTER INTO DEPT-TOTAL-SALES
               GIVING DEPT-AVERAGE-SALES ROUNDED
           DISPLAY '   DEPARTMENT ' DEPT-NO-HOLD ' = '
               DEPT-AVERAGE-SALES
           MOVE DEPT-NO TO DEPT-NO-HOLD
           MOVE 0 TO DEPT-TOTAL-SALES
                     SALESPERSON-COUNTER.
      * Store break: display a new store heading if records remain.
       500-STORE-BREAK.
           IF MORE-RECORDS
               MOVE STORE-NO TO STORE-NO-HOLD
               DISPLAY 'SALES AVERAGES FOR STORE '
                   STORE-NO-HOLD ':'
           END-IF.
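The control-break logic in the program above depends on the SORT having grouped the records by store and department. A compact Python sketch of the same idea uses `itertools.groupby`, which starts a new group each time the key changes; it therefore also requires sorted input. The function name `dept_averages` and the use of floating-point averages are assumptions of this sketch, not part of the COBOL solution.

```python
from itertools import groupby


def dept_averages(sorted_records):
    """Average AMT-OF-SALES per (STORE-NO, DEPT-NO), using control breaks.

    The records must already be sorted on STORE-NO (major) and DEPT-NO
    (minor); groupby starts a new group each time that key changes, which
    is exactly where the COBOL program performs a department break.
    """
    averages = {}
    for (store_no, dept_no), group in groupby(
            sorted_records, key=lambda r: (r[0], r[1])):
        amounts = [r[3] for r in group]
        # round() is only for display; COBOL's ROUNDED rounds halves away
        # from zero, which Python's round() does not always do.
        averages[(store_no, dept_no)] = round(sum(amounts) / len(amounts), 2)
    return averages


INPUT_FILE = [  # (STORE-NO, DEPT-NO, SALESPERSON, AMT-OF-SALES)
    ("001", "01", "O'CONNOR", 5899),
    ("002", "01", "GONZALES", 12500),
    ("002", "02", "CHANG", 6275),
    ("002", "02", "ADAMS", 18733),
    ("003", "01", "FRANKLIN", 12358),
    ("003", "02", "BROWN", 5873),
    ("003", "02", "ANDREWS", 9277),
]

result = dept_averages(sorted(INPUT_FILE, key=lambda r: (r[0], r[1])))
for (store_no, dept_no), avg in result.items():
    print(store_no, dept_no, f"{avg:.2f}")
```

Unlike the COBOL version, the sketch needs no STORE-NO-HOLD or DEPT-NO-HOLD fields, because `groupby` tracks the previous key itself; an empty input simply yields no averages.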
III. Validating Data
1. Routines should be added to check that TERR, AREAX, and DEPT are valid numeric fields.
2. A control listing should be produced that includes:
a. the total number of records processed from the IN-FILE file.
b. the number of records containing errors from the IN-FILE file.
c. a detailed description of each error found in the IN-FILE file.
IV. Internet/Critical Thinking Questions
1.
Search Engine: yahoo.com
Keywords: COBOL +"external sort"
URL: http://www.nd.edu/~ndora/standard/cobol.htm
Contents: Brief description of programming standards. Includes a recommendation regarding sorts.

Search Engine: yahoo.com
Keywords: COBOL +"external sort"
URL: http://cayfer.bilkent.edu.tr/~cayfer/ctp108/sort.htm
Contents: Discussion and examples of sorting files in COBOL (RM-Cobol).

Search Engine: altavista.com
Keywords: "external sort"
Contents: Dictionary of Algorithms and Data Structures. Look up "sort".

Search Engine: altavista.com
Keywords: "external sort"
URL: http://csc208.csudh.edu/makinde/csc353/ch6.html
Contents: Chapter 6. External Sort/Merge Algorithms. Discussion of external sort/merge algorithms.
2. Since sorting creates great demands upon system resources, it is advisable to minimize the size of the file to be sorted when possible. The INPUT PROCEDURE allows the program to reduce the size of the input file prior to sorting by eliminating records and/or fields that are not needed in the final results. When it is possible to substantially decrease the size of the input file prior to sorting, using an INPUT PROCEDURE to accomplish this is worthwhile.
SOLUTIONS TO DEBUGGING EXERCISES
1. The SORT-FILE is opened automatically and thus should not be included in the OPEN statement.
2. Records must be RETURNed from the SORT-FILE, not the SORT-REC.
3. The RELEASE statement is only permitted within an INPUT PROCEDURE. It should be deleted from this program.
4. A STOP RUN is required at the end of 100-MAIN-MODULE. Without it, after 100-MAIN-MODULE is executed, control falls into the 200-ADD-TAX SECTION. There the RETURN statement will cause a program interrupt because the SORT-FILE is no longer open.