CHAPTER 14

 

SORTING AND MERGING

 

 

CHAPTER OBJECTIVES

 

Upon completion of this chapter, the student should be able to:

 

1.         Explain how files may be sorted within a COBOL program.

 

2.         Explain how to use an INPUT PROCEDURE to process a file before it is actually sorted.

 

3.         Explain how to use an OUTPUT PROCEDURE to process a file after it is sorted.

 

4.         Explain when it is appropriate to use an INPUT and/or OUTPUT PROCEDURE.

 

5.         Explain how to use the MERGE verb for merging files.


LECTURE OUTLINE

 

I.          The SORT Feature: An Overview

 

A.        Format of the SORT Statement

 

1.         Sorting is a common procedure used for arranging records into a specific order so that sequential processing may be performed.

 

2.         Two techniques are used for sorting files processed by COBOL programs:

a.         A utility sort program may be used on the file before executing the COBOL program.

b.         Using COBOL’s SORT verb, a file may be sorted within a COBOL program.

 

3.         Format of the SORT statement:

 

SORT file-name-1
    {ON {ASCENDING/DESCENDING} KEY data-name-1}...
    USING  file-name-2
    GIVING file-name-3

 

B.         ASCENDING or DESCENDING Key

 

1.         The programmer must specify whether the records are to be arranged in ASCENDING or DESCENDING order of the key field.

 

2.         Records may be sorted using either numeric or nonnumeric key fields.

 

3.         Collating sequence refers to the specific order in which characters are sequenced from lowest to highest.

 

4.         The results of a sort on alphanumeric fields containing both letters and digits or special characters will differ depending upon the collating sequence used by the computer:

a.         Letters are "greater than" numbers in ASCII, and letters are "less than" numbers in EBCDIC.

b.         Lowercase letters are "less than" uppercase letters in EBCDIC and "greater than" uppercase letters in ASCII.


5.         Multiple key fields are permitted in a SORT statement.  Key fields are listed in order of importance from the major key field to the minor key field.

 

6.         Records with the same value in the sort key may be kept in the sort file in the same relative order in which they appeared in the original file.  The WITH DUPLICATES IN ORDER clause is used to accomplish this.
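
The collating-sequence differences in item 4 can be observed directly with a small test program.  The following is a minimal sketch, not part of the text: the program name and data names are hypothetical, and it assumes a compiler that accepts free-format source and *> comments (for example, GnuCOBOL invoked as cobc -x -free).  Alphanumeric IF comparisons use the native collating sequence, which is also what a SORT on alphanumeric keys uses by default.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. COLLSEQ.

DATA DIVISION.
WORKING-STORAGE SECTION.
*> Hypothetical one-character test values
01  LETTER-UPPER            PIC X      VALUE 'A'.
01  LETTER-LOWER            PIC X      VALUE 'a'.
01  DIGIT-CHAR              PIC X      VALUE '1'.

PROCEDURE DIVISION.
100-MAIN-MODULE.
*> Compare a letter with a digit in the native sequence
    IF LETTER-UPPER > DIGIT-CHAR
        DISPLAY 'LETTERS SORT AFTER DIGITS (AS IN ASCII)'
    ELSE
        DISPLAY 'LETTERS SORT BEFORE DIGITS (AS IN EBCDIC)'
    END-IF
*> Compare lowercase with uppercase in the native sequence
    IF LETTER-LOWER > LETTER-UPPER
        DISPLAY 'LOWERCASE SORTS AFTER UPPERCASE (AS IN ASCII)'
    ELSE
        DISPLAY 'LOWERCASE SORTS BEFORE UPPERCASE (AS IN EBCDIC)'
    END-IF
    STOP RUN.
```

Running the same program on an ASCII machine and on an EBCDIC machine should select different branches, which is why sorted output on mixed alphanumeric keys can differ between systems.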

 

C.        Coding a Simple SORT Procedure with the USING and GIVING Options

 

1.         Three files are used in a SORT:

a.         An input file of unsorted records.

b.         A work or sort file used to temporarily store records during the sorting process.

c.         An output file of sorted records.

 

2.         All three files are defined in the ENVIRONMENT DIVISION with SELECT statements that use standard ASSIGN clauses.

 

3.         The input and output files are defined and described in the DATA DIVISION with FDs.

 

4.         The work or sort file is defined in the DATA DIVISION with an SD (sort file description) rather than an FD.

 

5.         SD entries must not have a LABEL RECORDS clause.

 

6.         The key field(s) for the SORT must be defined as part of the sort record format.

 

7.         A SORT procedure can precede an update or control break procedure within the same program.
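
The points in this section can be pulled together in one short program.  This is a sketch under stated assumptions rather than a definitive solution: the program name, file names, and external file names are hypothetical, the record layout is borrowed from the review-question solutions later in this chapter, and free-format source with *> comments is assumed (for example, GnuCOBOL with cobc -x -free).  ORGANIZATION IS LINE SEQUENTIAL is a compiler extension, used here as it is in the solutions.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. SORTDEMO.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
*> All three files get a SELECT with an ASSIGN clause
    SELECT UNSORTED-FILE ASSIGN TO 'SALES.IN'
        ORGANIZATION IS LINE SEQUENTIAL.
    SELECT SORT-WORK-FILE ASSIGN TO 'SALES.SRT'.
    SELECT SORTED-FILE ASSIGN TO 'SALES.OUT'
        ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD  UNSORTED-FILE.
01  UNSORTED-RECORD         PIC X(20).

*> The work file uses an SD; its record defines the key fields
SD  SORT-WORK-FILE.
01  SORT-RECORD.
    05  STORE-NO            PIC X(3).
    05  DEPT-NO             PIC X(2).
    05  SALESPERSON         PIC X(10).
    05  AMT-OF-SALES        PIC 9(5).

FD  SORTED-FILE.
01  SORTED-RECORD           PIC X(20).

PROCEDURE DIVISION.
100-MAIN-MODULE.
*> STORE-NO is the major key and DEPT-NO is the minor key.
*> No OPEN or CLOSE is coded for any of the three files.
    SORT SORT-WORK-FILE
        ON ASCENDING KEY STORE-NO
        ON DESCENDING KEY DEPT-NO
        WITH DUPLICATES IN ORDER
        USING UNSORTED-FILE
        GIVING SORTED-FILE
    STOP RUN.
```

With WITH DUPLICATES IN ORDER, records that share the same STORE-NO and DEPT-NO keep the relative order they had in the input file.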

 

II.         Processing Data Before and/or After Sorting

 

A.        When the USING and GIVING clauses are used, the SORT statement performs the following operations:

 

1.         Opens the input file (USING clause) and output file (GIVING clause).

 

2.         Moves the records from the input file to the sort file.

 

3.         Sorts the sort file into the sequence specified by the ASCENDING/DESCENDING clause(s).

 

4.         Moves the sorted records from the sort file to the output file.

 

5.         Closes both files specified in the USING and GIVING clauses.

 

B.         The SORT statement can also be used in conjunction with procedures that process records before they are sorted and/or process records after they are sorted.

 

C.        INPUT PROCEDURE

 

1.         An INPUT PROCEDURE processes data from the unsorted input file prior to sorting.

 

2.         An INPUT PROCEDURE clause is used in place of a USING clause.

 

3.         INPUT PROCEDURE Summary:

a.         The INPUT PROCEDURE of the SORT should refer to a paragraph-name but it could refer to a section-name.

b.         In the paragraph specified in the INPUT PROCEDURE:

i.          OPEN the input file.

ii.          PERFORM a paragraph that will read and process input records until there is no more data.

iii.         After all records have been processed, close the input file.

iv.         After the last sentence in the INPUT PROCEDURE paragraph is executed, control will then return to the SORT, at which time the records in the sort file will be sorted.

c.         At the paragraph that processes input records prior to sorting:

i.          Perform any operations on input that are required.

ii.          MOVE input data to the sort record.

iii.         RELEASE each sort record, which makes it available for sorting.

iv.         Continue to read input until there is no more data.

 

4.         The RELEASE statement is necessary in an INPUT PROCEDURE to make records available for sorting.  It functions like the WRITE statement and has the same format, including an optional FROM clause.  It writes records to the sort or work file.

 

5.         Never OPEN or CLOSE the sort file specified in the SD.  It is always opened and closed automatically, as are files specified with USING or GIVING.
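
The INPUT PROCEDURE steps summarized above can be sketched as follows.  This is an illustration only: the program name, file names, and the rule for discarding records (zero sales) are hypothetical, and free-format source with *> comments is assumed (for example, GnuCOBOL with cobc -x -free).

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. INPROC.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT SALES-IN ASSIGN TO 'SALES.IN'
        ORGANIZATION IS LINE SEQUENTIAL.
    SELECT SORT-WORK-FILE ASSIGN TO 'SALES.SRT'.
    SELECT SALES-OUT ASSIGN TO 'SALES.OUT'
        ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD  SALES-IN.
01  IN-RECORD.
    05  IN-STORE-NO         PIC X(3).
    05  IN-DEPT-NO          PIC X(2).
    05  IN-SALESPERSON      PIC X(10).
    05  IN-AMT-OF-SALES     PIC 9(5).

SD  SORT-WORK-FILE.
01  SORT-RECORD.
    05  STORE-NO            PIC X(3).
    05  DEPT-NO             PIC X(2).
    05  SALESPERSON         PIC X(10).
    05  AMT-OF-SALES        PIC 9(5).

FD  SALES-OUT.
01  OUT-RECORD              PIC X(20).

WORKING-STORAGE SECTION.
01  ARE-THERE-MORE-RECORDS  PIC X(3)   VALUE 'YES'.
    88  NO-MORE-RECORDS                VALUE 'NO '.

PROCEDURE DIVISION.
100-MAIN-MODULE.
*> INPUT PROCEDURE replaces USING; GIVING is still automatic
    SORT SORT-WORK-FILE
        ON ASCENDING KEY STORE-NO
        INPUT PROCEDURE IS 200-SELECT-RECORDS
        GIVING SALES-OUT
    STOP RUN.

200-SELECT-RECORDS.
*> The program opens and closes the input file itself;
*> the SD file is never opened or closed explicitly
    OPEN INPUT SALES-IN
    PERFORM UNTIL NO-MORE-RECORDS
        READ SALES-IN
            AT END
                MOVE 'NO ' TO ARE-THERE-MORE-RECORDS
            NOT AT END
                PERFORM 300-RELEASE-IF-WANTED
        END-READ
    END-PERFORM
    CLOSE SALES-IN.

300-RELEASE-IF-WANTED.
*> Hypothetical rule: drop zero-sales records before sorting
    IF IN-AMT-OF-SALES NOT = ZERO
        MOVE IN-RECORD TO SORT-RECORD
        RELEASE SORT-RECORD
    END-IF.
```

Only the records that are RELEASEd reach the sort file, so the discarded records never consume sort resources.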


 

D.        OUTPUT PROCEDURE

 

1.         An OUTPUT PROCEDURE is used to process records of the sort file after they have been sorted.

 

2.         OUTPUT PROCEDURE is used in place of the GIVING clause.

 

3.         OUTPUT PROCEDURE Summary for COBOL 85:

a.         The OUTPUT PROCEDURE of the SORT should refer to a paragraph-name but it could refer to a section-name.

b.         In the paragraph specified in the OUTPUT PROCEDURE:

i.          OPEN the output file.

ii.          PERFORM a paragraph that will RETURN (which is like a READ) and process records from the sort file until there is no more data.  The records in the sort file will be in sequence at this point.

iii.         After all records have been processed, close the output file.

iv.         When the OUTPUT PROCEDURE paragraph has been fully executed, control will then return to the SORT.

c.         At the paragraph that processes the sort records after they have been sorted but before they are created as output:

i.          Perform any operations on the work or sort records.

ii.          MOVE the work or sort record to the output area.

iii.         WRITE each record to the output file.  (A WRITE ... FROM can be used in place of the MOVE and WRITE.)
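
The OUTPUT PROCEDURE steps summarized above can be sketched as follows.  This is an illustration only: the program name, file names, and report layout are hypothetical, and free-format source with *> comments is assumed (for example, GnuCOBOL with cobc -x -free).  The output record deliberately has a different layout from the sort record, which is one situation in which GIVING cannot be used.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. OUTPROC.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT SALES-IN ASSIGN TO 'SALES.IN'
        ORGANIZATION IS LINE SEQUENTIAL.
    SELECT SORT-WORK-FILE ASSIGN TO 'SALES.SRT'.
    SELECT REPORT-OUT ASSIGN TO 'SALES.RPT'
        ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD  SALES-IN.
01  IN-RECORD               PIC X(20).

SD  SORT-WORK-FILE.
01  SORT-RECORD.
    05  STORE-NO            PIC X(3).
    05  DEPT-NO             PIC X(2).
    05  SALESPERSON         PIC X(10).
    05  AMT-OF-SALES        PIC 9(5).

*> Hypothetical report line: name and edited amount only
FD  REPORT-OUT.
01  REPORT-RECORD.
    05  RPT-SALESPERSON     PIC X(10).
    05  FILLER              PIC X(2).
    05  RPT-AMT-OF-SALES    PIC ZZ,ZZ9.

WORKING-STORAGE SECTION.
01  ARE-THERE-MORE-RECORDS  PIC X(3)   VALUE 'YES'.
    88  NO-MORE-RECORDS                VALUE 'NO '.

PROCEDURE DIVISION.
100-MAIN-MODULE.
*> OUTPUT PROCEDURE replaces GIVING; USING is still automatic
    SORT SORT-WORK-FILE
        ON DESCENDING KEY AMT-OF-SALES
        USING SALES-IN
        OUTPUT PROCEDURE IS 200-WRITE-REPORT
    STOP RUN.

200-WRITE-REPORT.
*> The program opens and closes the output file itself
    OPEN OUTPUT REPORT-OUT
    PERFORM UNTIL NO-MORE-RECORDS
        RETURN SORT-WORK-FILE
            AT END
                MOVE 'NO ' TO ARE-THERE-MORE-RECORDS
            NOT AT END
                PERFORM 300-WRITE-ONE-LINE
        END-RETURN
    END-PERFORM
    CLOSE REPORT-OUT.

300-WRITE-ONE-LINE.
*> Build the output line from the already-sorted record
    MOVE SPACES TO REPORT-RECORD
    MOVE SALESPERSON TO RPT-SALESPERSON
    MOVE AMT-OF-SALES TO RPT-AMT-OF-SALES
    WRITE REPORT-RECORD.
```

Note that RETURN names the sort file (as READ names a file), while WRITE names the output record.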

 

E.         When to Use INPUT and/or OUTPUT PROCEDUREs

 

1.         Sometimes it is more efficient to process data before it is sorted, while other times it is more efficient to process data after it is sorted.

 

2.         If there are a large number of records that will be eliminated during a run, it is more efficient to remove them before sorting.  In this way, computer resources are not wasted sorting records that will eventually be discarded anyway.

 

3.         An INPUT or OUTPUT PROCEDURE must be used if the unsorted and sorted files have different-sized fields or have fields in different order.

 

4.         As an alternative to using an INPUT and/or OUTPUT PROCEDURE, it is possible to code a stand-alone SORT along with separate procedures to process the records before and/or after the SORT is executed.

 

III.       The MERGE Statement

 

A.        MERGE statements combine two or more files into a single file.

 

B.         The format of the MERGE statement is very similar to that of the SORT:

 

1.         Key fields, specified in order of importance, must be defined within the SD.

 

2.         With the USING clause we specify a minimum of two files that are to be merged.

 

3.         An INPUT PROCEDURE may not be specified with the MERGE statement, but an OUTPUT PROCEDURE is permitted.

 

C.        MERGE statements automatically handle the opening, closing, and input/output associated with the merge file and with the files named in the USING and GIVING clauses.

 

D.        Files to be merged must be in sequence by the key field.

 

E.         The new merged file will maintain the original key sequence of the input files.

 

F.         The same rules apply to OUTPUT PROCEDUREs for the MERGE as for the SORT.
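
A MERGE with USING and GIVING can be sketched as follows.  This is an illustration only: the program name and file names are hypothetical, both input files are assumed to be already in STORE-NO and DEPT-NO sequence, and free-format source with *> comments is assumed (for example, GnuCOBOL with cobc -x -free).

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. MRGDEMO.

ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    SELECT EAST-FILE ASSIGN TO 'EAST.IN'
        ORGANIZATION IS LINE SEQUENTIAL.
    SELECT WEST-FILE ASSIGN TO 'WEST.IN'
        ORGANIZATION IS LINE SEQUENTIAL.
    SELECT MERGE-WORK-FILE ASSIGN TO 'SALES.MRG'.
    SELECT COMBINED-FILE ASSIGN TO 'ALL.OUT'
        ORGANIZATION IS LINE SEQUENTIAL.

DATA DIVISION.
FILE SECTION.
FD  EAST-FILE.
01  EAST-RECORD             PIC X(20).

FD  WEST-FILE.
01  WEST-RECORD             PIC X(20).

*> The merge keys must be defined within the SD record
SD  MERGE-WORK-FILE.
01  MERGE-RECORD.
    05  STORE-NO            PIC X(3).
    05  DEPT-NO             PIC X(2).
    05  FILLER              PIC X(15).

FD  COMBINED-FILE.
01  COMBINED-RECORD         PIC X(20).

PROCEDURE DIVISION.
100-MAIN-MODULE.
*> Both input files must already be in key sequence.
*> No OPEN or CLOSE is coded for any of the files.
    MERGE MERGE-WORK-FILE
        ON ASCENDING KEY STORE-NO
                         DEPT-NO
        USING EAST-FILE
              WEST-FILE
        GIVING COMBINED-FILE
    STOP RUN.
```

If either input file is out of sequence, the combined file cannot be relied upon to be in sequence, because MERGE interleaves the files rather than sorting them.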


SOLUTIONS TO REVIEW QUESTIONS

 

I.          True-False Questions

 

1.         F          An OUTPUT PROCEDURE may be used along with the USING clause.

 

2.         T

 

3.         F          The collating sequences are different.  For example, the relative positions of letters and digits are different.

 

4.         T         

 

5.         F          There is no limit to the number of sort fields that may be specified.

 

6.         F          An alternative is to use either a utility or a database management system’s sort program.  A programmer may also write his or her own sort program.

 

7.         F          The sort field need not be numeric.

 

8.         F          While paragraph-names are most common, section names may be used.

 

9.         F          Any file described with an SD must be defined in a SELECT clause.

 

10.       F          While there are lower values in the EBCDIC collating sequence, blank has the lowest value of the printable characters.  However, the SORT verb does distinguish between uppercase and lowercase letters.

 

II.        General Questions

 

1.         Store No    Dept No    Salesperson    Amt of Sales

           001         01         O'CONNOR       05899
           002         01         GONZALES       12500
           002         02         CHANG          06275
           002         02         ADAMS          18733
           003         01         FRANKLIN       12358
           003         02         BROWN          05873
           003         02         ANDREWS        09277

 


2.         Store No    Dept No    Salesperson    Amt of Sales

           003         02         ANDREWS        09277
           003         02         BROWN          05873
           003         01         FRANKLIN       12358
           002         02         ADAMS          18733
           002         02         CHANG          06275
           002         01         GONZALES       12500
           001         01         O'CONNOR       05899

 

3.        

 

IDENTIFICATION DIVISION.

PROGRAM-ID. CH14QST3.

ENVIRONMENT DIVISION.

INPUT-OUTPUT SECTION.

FILE-CONTROL.

    SELECT INPUT-FILE ASSIGN TO 'CH14QST3.IN'

        ORGANIZATION IS LINE SEQUENTIAL.

    SELECT SORT-FILE ASSIGN TO 'CH14QST3.SRT'.

    SELECT OUTPUT-FILE ASSIGN TO 'CH14QST3.OUT'

        ORGANIZATION IS LINE SEQUENTIAL.

 

DATA DIVISION.

FILE SECTION.

 

FD  INPUT-FILE.

01  INPUT-RECORD            PIC X(20).

 

SD  SORT-FILE.

01  SORT-RECORD.

    05  STORE-NO            PIC X(3).

    05  DEPT-NO             PIC X(2).

    05  SALESPERSON         PIC X(10).

    05  AMT-OF-SALES        PIC 9(5).

 

FD  OUTPUT-FILE.

01  OUTPUT-RECORD           PIC X(20).

 

WORKING-STORAGE SECTION.

01  ARE-THERE-MORE-RECORDS  PIC X(3)   VALUE 'YES'.

    88  NO-MORE-RECORDS                VALUE 'NO '.

01  INPUT-RECORD-COUNTER    PIC 9(3)   VALUE ZERO.


 

PROCEDURE DIVISION.

 

100-MAIN-MODULE.

    SORT SORT-FILE

        ASCENDING KEY DEPT-NO

        INPUT PROCEDURE IS 200-COUNT-INPUT-RECORDS

        GIVING OUTPUT-FILE

    STOP RUN.

 

200-COUNT-INPUT-RECORDS.

    OPEN INPUT INPUT-FILE

    PERFORM UNTIL NO-MORE-RECORDS

        READ INPUT-FILE

            AT END

                MOVE 'NO ' TO ARE-THERE-MORE-RECORDS

            NOT AT END

                ADD 1 TO INPUT-RECORD-COUNTER

                RELEASE SORT-RECORD FROM INPUT-RECORD

        END-READ

    END-PERFORM

    CLOSE INPUT-FILE

    DISPLAY 'THE INPUT FILE CONTAINS ' INPUT-RECORD-COUNTER

            ' RECORDS'.

 

4.

 

IDENTIFICATION DIVISION.

PROGRAM-ID. CH14QST4.

ENVIRONMENT DIVISION.

INPUT-OUTPUT SECTION.

FILE-CONTROL.

    SELECT INPUT-FILE ASSIGN TO 'CH14QST4.IN'

        ORGANIZATION IS LINE SEQUENTIAL.

    SELECT SORT-FILE ASSIGN TO 'CH14QST4.SRT'.

    SELECT OUTPUT-FILE ASSIGN TO 'CH14QST4.OUT'

        ORGANIZATION IS LINE SEQUENTIAL.

 


DATA DIVISION.

FILE SECTION.

 

FD  INPUT-FILE.

01  INPUT-RECORD            PIC X(20).

 

SD  SORT-FILE.

01  SORT-RECORD.

    05  STORE-NO            PIC X(3).

    05  DEPT-NO             PIC X(2).

    05  SALESPERSON         PIC X(10).

    05  AMT-OF-SALES        PIC 9(5).

 

FD  OUTPUT-FILE.

01  OUTPUT-RECORD           PIC X(20).

 

WORKING-STORAGE SECTION.

01  ARE-THERE-MORE-RECORDS      PIC X(3)   VALUE 'YES'.

    88  NO-MORE-RECORDS                    VALUE 'NO '.

01  TOTAL-AMT-OF-SALES          PIC 9(7)   VALUE ZERO.

01  TOTAL-AMT-OF-SALES-EDITED   PIC $$,$$$,$$9.

 

PROCEDURE DIVISION.

 

100-MAIN-MODULE.

    SORT SORT-FILE

        ASCENDING KEY DEPT-NO

                      SALESPERSON

        USING INPUT-FILE

        OUTPUT PROCEDURE IS 200-ACCUMULATE-SALES

    STOP RUN.

 

200-ACCUMULATE-SALES.

    OPEN OUTPUT OUTPUT-FILE

    PERFORM UNTIL NO-MORE-RECORDS

        RETURN SORT-FILE

            AT END

                MOVE 'NO ' TO ARE-THERE-MORE-RECORDS

            NOT AT END

                ADD AMT-OF-SALES TO TOTAL-AMT-OF-SALES

                WRITE OUTPUT-RECORD FROM SORT-RECORD

        END-RETURN

    END-PERFORM

    MOVE TOTAL-AMT-OF-SALES TO TOTAL-AMT-OF-SALES-EDITED

    CLOSE OUTPUT-FILE

    DISPLAY 'TOTAL AMOUNT OF SALES = '

             TOTAL-AMT-OF-SALES-EDITED.


 

5.        

 

IDENTIFICATION DIVISION.

PROGRAM-ID. CH14QST5.

ENVIRONMENT DIVISION.

INPUT-OUTPUT SECTION.

FILE-CONTROL.

    SELECT INPUT-FILE ASSIGN TO 'CH14QST5.IN'

        ORGANIZATION IS LINE SEQUENTIAL.

    SELECT SORT-FILE ASSIGN TO 'CH14QST5.SRT'.

 

DATA DIVISION.

FILE SECTION.

 

FD  INPUT-FILE.

01  INPUT-RECORD            PIC X(20).

 

SD  SORT-FILE.

01  SORT-RECORD.

    05  STORE-NO            PIC X(3).

    05  DEPT-NO             PIC X(2).

    05  SALESPERSON         PIC X(10).

    05  AMT-OF-SALES        PIC 9(5).

 

WORKING-STORAGE SECTION.

01  FLAGS.

    05  ARE-THERE-MORE-RECORDS  PIC X(3)    VALUE 'YES'.

        88  NO-MORE-RECORDS                 VALUE 'NO '.

        88  MORE-RECORDS                    VALUE 'YES'.

    05  FIRST-RECORD-FLAG       PIC X(3)    VALUE 'YES'.

        88  FIRST-RECORD                    VALUE 'YES'.

01  CALCULATION-FIELDS.

    05  DEPT-TOTAL-SALES        PIC 9(7)    VALUE ZERO.

    05  DEPT-AVERAGE-SALES      PIC $$$,$$9.99.

    05  SALESPERSON-COUNTER     PIC 9(2)    VALUE ZERO.

01  CONTROL-BREAK-HOLD-FIELDS.

    05  STORE-NO-HOLD           PIC X(3).

    05  DEPT-NO-HOLD            PIC X(2).


 

PROCEDURE DIVISION.

 

100-MAIN-MODULE.

    SORT SORT-FILE

        ASCENDING KEY STORE-NO

                      DEPT-NO

        USING INPUT-FILE

        OUTPUT PROCEDURE IS 200-DISPLAY-DEPT-TOTALS

    STOP RUN.

 

200-DISPLAY-DEPT-TOTALS.

    PERFORM UNTIL NO-MORE-RECORDS

        RETURN SORT-FILE

            AT END

                MOVE 'NO ' TO ARE-THERE-MORE-RECORDS

            NOT AT END

                PERFORM 300-PROCESS-ONE-RECORD

        END-RETURN

    END-PERFORM

    PERFORM 400-DEPT-BREAK

    PERFORM 500-STORE-BREAK.

 

300-PROCESS-ONE-RECORD.

    EVALUATE TRUE

        WHEN FIRST-RECORD

            MOVE STORE-NO TO STORE-NO-HOLD

            MOVE DEPT-NO  TO DEPT-NO-HOLD

            DISPLAY 'SALES AVERAGES FOR STORE '

                     STORE-NO-HOLD

                    ':'

            MOVE 'NO' TO FIRST-RECORD-FLAG

        WHEN STORE-NO NOT = STORE-NO-HOLD

            PERFORM 400-DEPT-BREAK

            PERFORM 500-STORE-BREAK

        WHEN DEPT-NO NOT = DEPT-NO-HOLD

            PERFORM 400-DEPT-BREAK

    END-EVALUATE

    ADD AMT-OF-SALES TO DEPT-TOTAL-SALES

    ADD 1 TO SALESPERSON-COUNTER.


 

400-DEPT-BREAK.

    IF SALESPERSON-COUNTER > ZERO

        DIVIDE SALESPERSON-COUNTER INTO DEPT-TOTAL-SALES

            GIVING DEPT-AVERAGE-SALES ROUNDED

        DISPLAY '    DEPARTMENT '

                 DEPT-NO-HOLD

                ' = '

                 DEPT-AVERAGE-SALES

    END-IF

    MOVE DEPT-NO TO DEPT-NO-HOLD

    MOVE 0 TO DEPT-TOTAL-SALES

              SALESPERSON-COUNTER.

 

500-STORE-BREAK.

    IF MORE-RECORDS

        MOVE STORE-NO TO STORE-NO-HOLD

        DISPLAY 'SALES AVERAGES FOR STORE '

                 STORE-NO-HOLD

                ':'

    END-IF.

 


III.       Validating Data

 

1.         Routines should be added to check that TERR, AREAX, and DEPT are valid numeric fields.

 

2.         A control listing should be produced that includes:

a.         the total number of records processed from the IN-FILE file.

b.         the number of records containing errors from the IN-FILE file.

c.         a detailed description of each error found in the IN-FILE file.

 

IV.       Internet/Critical Thinking Questions

 

1.        

 

Search Engine:  yahoo.com
Keywords:       COBOL +"external sort"
URL:            http://www.nd.edu/~ndora/standard/cobol.htm
Contents:       Brief description of programming standards.  Includes a recommendation regarding sorts.

Search Engine:  yahoo.com
Keywords:       COBOL +"external sort"
URL:            http://cayfer.bilkent.edu.tr/~cayfer/ctp108/sort.htm
Contents:       Discussion and examples of sorting files in COBOL (RM-Cobol)

Search Engine:  altavista.com
Keywords:       "external sort"
URL:            http://www.nist.gov/dads/
Contents:       Dictionary of Algorithms and Data Structures - Look up "sort"

Search Engine:  altavista.com
Keywords:       "external sort"
URL:            http://csc208.csudh.edu/makinde/csc353/ch6.html
Contents:       Chapter 6. External Sort/Merge Algorithms.  Discussion of external sort/merge algorithms.

 

2.         Since sorting creates great demands upon system resources, it is advisable to minimize the size of the file to be sorted when possible.  The INPUT PROCEDURE allows the program to reduce the size of the input file prior to sorting by eliminating records and/or fields that are not needed in the final results.  When it is possible to substantially decrease the size of the input file prior to sorting, then using an INPUT PROCEDURE to accomplish this is worthwhile.

 


SOLUTIONS TO DEBUGGING EXERCISES

 

1.         The SORT-FILE is opened automatically and thus should not be included in the OPEN statement.

 

2.         Records must be RETURNed from the SORT-FILE, not the SORT-REC.

 

3.         The RELEASE statement is only permitted within an INPUT PROCEDURE.  It should be deleted from this program.

 

4.         A STOP RUN is required at the end of 100-MAIN-MODULE.  Without it, after 100-MAIN-MODULE is executed, control falls into the module 200-ADD-TAX SECTION.  There the RETURN statement will cause a program interrupt because the SORT-FILE is no longer open.