CHAPTER 15

 

INDEXED AND RELATIVE FILE PROCESSING

 

 

CHAPTER OBJECTIVES

 

After completion of this chapter, the student should be able to:

 

1.         Describe sequential, indexed, and relative file organizations.

 

2.         Describe the physical features of magnetic disks and disk drives.

 

3.                  Design and code a program that creates an indexed file.

 

4.         Design and code a program that performs a random update of an indexed file.

 

5.         Design and code a program that accesses the records of an indexed file sequentially.

 

6.         Design and code a program that accesses records of an indexed file based on the values of an ALTERNATE RECORD KEY.

 

7.         Describe the function and format of the START statement.

 

8.         Describe situations in which DYNAMIC ACCESS might be used to process an indexed file.

 

9.         Describe the function and use of the FILE STATUS clause.

 

10.       Explain how exception handling can be accomplished with the DECLARATIVES segment of the PROCEDURE DIVISION.

 

11.       Describe how an indexed file can be used as an external table.

 

12.       Explain what a relative file is.

 

13.              Explain how to create and access relative files both sequentially and randomly.

 

14.       Describe several methods used to convert a key field to a relative record number.

 

15.       Explain what a collision is and describe methods for handling collisions.

 

16.       Define the term hashing.


LECTURE OUTLINE

 

I.          Systems Considerations for Organizing Disk Files

 

A.        Sequential File Organization

 

1.         The simplest type of disk file organization is sequential.

 

2.         Records are stored sequentially by a key field.

 

3.         Sequential records are processed in the same manner whether they are stored on tape or disk.

 

4.         In order to access a specific record, the computer must first read past all preceding records.

 

B.         Indexed File Organization

 

1.         An indexed file consists of two files:

a.         A data file that stores the actual physical data.

b.         An index file that contains the value of each key field along with the disk address of the corresponding record.

 

2.         The index enables a user to access a disk file randomly.

 

3.         To access an indexed record randomly, the key field is looked up in the index file to find the disk address of the record; with this address, the record can be accessed directly in the indexed data file.

 

4.         Records of an indexed file may be accessed sequentially or randomly.  They may also be processed both sequentially and randomly in the same program. 

 

C.        Relative File Organization

 

1.         Relative files also permit random access.

 

2.         A relative file does not use an index to access a record randomly.

 

3.         The key field of each record is used to calculate the record’s relative location in the file.

 

4.         Records of a relative file may be accessed either sequentially or randomly.


II.         Features of Magnetic Disks and Disk Drives

 

A.        Magnetic disk is a storage medium that can serve as either input to or output from a program.

 

B.         Most hard disks are fixed disks, which cannot be removed from the disk drive.

 

C.        Each disk surface of a hard disk writes data as magnetized bits in concentric circles called tracks.

 

D.        Individual records on disk can be addressed by:

 

1.         Surface number.

 

2.         Track number.

 

3.         Sector number (floppy disks) or cylinder number (larger units).

 

E.         On microcomputer disks, the concentric tracks are segmented into wedge-shaped sectors.

 

F.         On large computers, cylinders are used for addressing disk records.

 

III.       Processing Indexed Disk Files

 

A.        Creating an Indexed File

 

1.         Indexed files are created in sequence.

 

2.         Once an indexed file is created, it can be accessed either sequentially or randomly.

 

3.         Format of the SELECT statement used when creating an indexed file:

 

SELECT file-name-1 ASSIGN to implementor-name-1

    [ORGANIZATION IS] INDEXED

    [ACCESS MODE IS SEQUENTIAL]

     RECORD KEY IS data-name-1

 

4.         The ORGANIZATION IS INDEXED clause indicates that the file is to be created with an index.

 

5.         The ACCESS clause indicates the method (sequential or random) that will be used to access the file.  It is optional when the file is to be accessed sequentially, since sequential is the default.

 

6.         No two records in an indexed file can have the same key field value.

 

7.         The RECORD KEY clause specifies the key field within the disk record that will be used to form the index.

 

8.         The WRITE statement is modified to include an INVALID KEY clause when writing records to an indexed file.

 

9.         The INVALID KEY tests for two possible conditions:

a.         a key field that is out of sequence.

b.         a key field that is the same as one already on the indexed file.

 

10.       If INVALID KEY is coded and a record has an erroneous key:

a.         the record is not written.

b.         the statements following INVALID KEY are executed.

 

11.       The format for the WRITE statement with an INVALID KEY clause:

 

WRITE record-name-1 [FROM identifier-1]

    [INVALID KEY imperative-statement-1]

 

12.       NOT INVALID KEY and END-WRITE are additional options of the WRITE statement.

 

13.              The FILE STATUS clause can be used with the WRITE statement to identify the specific error that caused the INVALID KEY clause to be executed.

 

B.         Updating an Indexed File Randomly

 

1.         When updating an indexed file, only two files are needed.  They are:

a.         the transaction file.

b.         the master index file, which serves as both an input and output file.

 

2.         Since the indexed disk file may be accessed randomly, there is no need to sort the transaction file for an update procedure.  Sorting the transaction file however, can reduce processing time.

 

3.         The ACCESS clause of the SELECT statement for an indexed file that is to be updated randomly is:

 

ACCESS MODE IS RANDOM

 


4.         When updating an indexed file, the file is opened as I-O.

 

5.         Before executing a random READ, the key value of the desired record must be moved to the key field of the indexed file.

 

6.         When reading an indexed file randomly, the READ statement contains an INVALID KEY clause instead of an AT END clause.

 

7.         The INVALID KEY clause of the READ statement will be executed if the record with the specified key value is not found in the indexed file.

 

8.                  There is no AT END clause when reading randomly from an indexed file.

 

9.         The NOT INVALID KEY clause can also be specified and will be executed if the record is found.

 

10.       After the record is accessed from the indexed file, it is updated using the transaction data.  The updated record is then replaced on the indexed file with a REWRITE statement.

 

11.       Steps to update an indexed master file:

a.         Open the master file as I-O.

b.         Read a transaction record or accept transaction data from the keyboard.

c.         Move the desired value from the transaction record to the RECORD KEY of the master file and READ the corresponding master record.

d.         Change the master record.

e.         REWRITE the master record.

 

12.              The format for the REWRITE statement:

 

REWRITE record-name-1 [FROM identifier-1]

    [INVALID KEY imperative-statement-1]

    [NOT INVALID KEY imperative-statement-2]

[END-REWRITE]

 

13.       An INVALID KEY will occur on a REWRITE if the programmer has changed the key field of the record.

 

14.       Interactive processing is often used when updating records of an indexed file.

 

15.       Types of update transactions are:

a.         Making changes to existing records (READ, change, and REWRITE).

b.         Creating new records (WRITE).

c.         Deleting existing records (READ and DELETE or just DELETE).

 

16.       A coded field in the transaction record is commonly used to specify whether the corresponding transaction is a change, addition, or deletion.

 

17.       Format of the DELETE statement:

 

DELETE indexed-file-name RECORD

    [INVALID KEY imperative-statement-1]

    [NOT INVALID KEY imperative-statement-2]

[END-DELETE]

 

C.        Updating an Indexed File with Multiple Transaction Records for Each Master Record

 

1.                  An indexed file update is exactly the same regardless of whether there is only one transaction record per master or there are multiple transactions per master.

 

2.                  Debugging Tips:

a.                   Indexed files cannot be created as test data using a text editor; a program is required to create them.

b.                  Always precede the test of an indexed file update program by running the indexed file creation program to ensure that the update program is being run against a “clean” copy of the file.

c.                   Always use scope terminators and include periods only for the last sentence in a paragraph.

 

D.        Accessing or Reading from an Indexed File for Reporting Purposes

 

1.         An indexed file may be accessed either sequentially or randomly for reporting purposes.

 

2.         An indexed file may be processed sequentially either by using the key field or by sorting on some other field.

 

3.         An indexed file may be processed randomly by directly accessing the desired record based on the value of the key field.


 

IV.              Additional Options for Indexed File Processing

 

A.                 Using ALTERNATE RECORD KEYs

 

1.                  Indexed files may be created with and accessed by more than one identifying key field.

 

2.                  The additional key fields are declared in the SELECT statement with the ALTERNATE RECORD KEY clause.

 

3.                  If the ALTERNATE RECORD KEY need not be unique, the WITH DUPLICATES clause is used.

 

4.                  An indexed file can be accessed by its RECORD KEY or any of its ALTERNATE RECORD KEYs.

 

5.                  For each RECORD KEY and ALTERNATE RECORD KEY established, an index is created that can be used for accessing records with specific values of that key.

 

6.                  A record can be accessed randomly by any of its alternate keys by specifying the key to be used in the READ statement.

 

7.                  When there is more than one record with the same ALTERNATE RECORD KEY value, the one retrieved by a random READ will be the record that was first placed in the indexed file.

 

B.                 The START Statement

 

1.                  The START statement enables a program to begin processing an indexed file sequentially at a record location other than the first or next physical record in the file.

 

2.         In order to use the START statement, the SELECT clause for the indexed file must specify ACCESS IS SEQUENTIAL or ACCESS IS DYNAMIC.

 


3.         The format for the START statement:

 

START file-name-1

                    [KEY {IS EQUAL TO

                          IS =

                           IS GREATER THAN

                           IS >

                           IS NOT LESS THAN

                           IS NOT <

                           IS GREATER THAN OR EQUAL TO

                           IS >=}

           data-name-1]

    [INVALID KEY imperative-statement-1]

    [NOT INVALID KEY imperative-statement-2] [END-START]

 

4.         If the KEY clause of the START statement is omitted, the relational operator ‘IS EQUAL TO’ is implied and the primary record key is assumed to be the key of reference.

 

5.         The START statement locates the specified record but does not READ it into storage.

 

            C.        Accessing an Indexed File Dynamically

 

1.                  When a file needs to be accessed both sequentially and randomly in the same program, ACCESS IS DYNAMIC is specified in the SELECT statement.

 

2.                  The READ … NEXT RECORD Instruction

a.                   When using ACCESS IS DYNAMIC, it is necessary to distinguish between sequential and random READs.

b.                  The NEXT RECORD clause is used to specify that a READ statement is a sequential READ.

 

D.        A Review of the INVALID KEY Clause

 

1.                  When used with a random READ statement, INVALID KEY means there is no record on the file with the specified key.

 

2.                  When used with a random WRITE statement, INVALID KEY means that the file already contains a record with the specified key or that there is no more room available in the file.


3.                  When used with a REWRITE, DELETE, or START statement, INVALID KEY means that no record with the specified key can be found on the file.  In the case of a REWRITE it can also mean that the programmer has changed the value of the RECORD KEY field since the record was read.

 

E.                  ALTERNATE RECORD KEYs Reduce the Need for Multiple Copies of the File

 

F.         The FILE STATUS Clause

 

1.         The FILE STATUS clause can be included at the end of the SELECT statement.  The FILE STATUS field is used to determine the exact cause of an INVALID KEY condition when accessing the corresponding file.

 

2.         The FILE STATUS clause, if included as part of the SELECT statement, appears as follows:

 

SELECT ...

    FILE STATUS is data-name

 

3.         The data-name specified in the FILE STATUS clause must be defined in WORKING-STORAGE as a two-position alphanumeric field.

 

4.         When an input or output operation is performed on the file associated with the FILE STATUS clause, the operating system will place a value in the FILE STATUS field.

 

5.         The programmer may test the value of the FILE STATUS field in the PROCEDURE DIVISION.

 

6.         Refer to the text for a list of commonly-encountered values of the FILE STATUS field.

 

7.         The FILE STATUS clause may be specified with any type of file.

 

8.                  The FILE STATUS field allows the program to determine the precise cause of an input or output error.

 


G.        Exception Handling With the USE Statement

 

1.                  It is possible to establish a separate section in the program designed to handle all input/output errors.

 

2.                  Such exception handling routines are placed in the DECLARATIVES segment of the PROCEDURE DIVISION.

 

3.                  The DECLARATIVES segment always appears first in the PROCEDURE DIVISION.

 

4.                  Each section in the DECLARATIVES segment contains a USE statement that specifies the corresponding file.

 

5.                  The appropriate section within the DECLARATIVES segment is automatically executed whenever an input or output error occurs for the corresponding file.

 

V.                 Using an Indexed File as an External Table

 

1.                  An alternative to loading an external table into primary memory is to store the table data as an indexed file.

 

2.                  The main reason for using an indexed file as an external table is to save primary storage space.

 

3.                  The key field of the input record can be used to retrieve the corresponding table entry into memory from auxiliary storage.

 

4.                  Processing an indexed file as an external table is somewhat slower than processing a table that has been loaded into main memory.

 

VI.              Proposed COBOL 2002+ Changes

 

1.                  A “less than” condition will be available with the START statement.

 

2.                  The DELETE FILE statement will be added to allow the program to delete an entire file.

 


VII.      Processing Relative Disk Files

 

A.        What is a Relative File?

 

1.         A relative file is a file in which each record is identified by its relative record number.

 

2.         The relative record number is the actual physical record location relative to the beginning of the file.

 

3.         Relative files allow records to be accessed randomly and sequentially.

 

4.         Accessing relative files is very fast because there is no need to look up a disk address from an index.

 

5.         Relative files are best used when each record contains a built-in record number.

 

6.         Key fields of relative records are converted to actual addresses.

 

7.         The field that supplies the key information can also serve as a relative record number or RELATIVE KEY.

 

8.         The statements in the PROCEDURE DIVISION for random or sequential processing of relative files are similar to those for indexed files.

 

9.         ORGANIZATION IS RELATIVE is required in the SELECT statement for a relative file.

 

10.       The RELATIVE KEY clause is required in the SELECT statement when ACCESS is RANDOM or DYNAMIC.

 

11.       The RELATIVE KEY clause is optional in the SELECT statement when ACCESS is SEQUENTIAL.

 

12.       The relative key is defined in WORKING-STORAGE; it is not part of the relative record.

 

13.       A FILE STATUS field may be defined and used with a relative file in the same way as with indexed files.

 


B.         Creating Relative Files

 

1.         Relative files are created sequentially.

 

2.         The computer creates a record for every value of the RELATIVE KEY, automatically inserting blank records for any RELATIVE KEY values that having no corresponding record in the input file.

 

3.         If the key field is not consecutive or if it is too large, it can be converted to a relative key using different types of algorithms.

 

C.        Sequential Reading of Relative Files

 

1.         Records in a relative file may be read sequentially in the order in which they were created.

 

2.         A sequential READ reads the records in ascending relative key order.

 

3.         The RELATIVE KEY need not be specified in the SELECT statement when the file is processed sequentially.

 

D.        Random Reading of Relative Files

 

1.         A relative file may be read randomly by specifying ACCESS IS RANDOM in the SELECT statement.

 

2.         An INVALID KEY clause is executed if the key on the query file does not match a key on the relative file.

 

E.         Random Updating of Relative Files

 

1.         To randomly update a relative file, READ each record to be changed, make the desired changes, and REWRITE the record.

 

2.         The relative file must be opened as I-O.

 

3.         The INVALID KEY clause of a READ statement is executed if the record in the transaction file does not match a corresponding record in the relative file.

 

4.         The INVALID KEY clause of a REWRITE statement is executed if the key field in WORKING-STORAGE has a value outside of the file's range.


5.         The DELETE statement is used to delete records from a relative file.

 

VIII.     Converting A Key Field To A RELATIVE KEY

 

A.        A relative file's key can be its actual disk location, or the key field may be used to compute the disk address.

 

B.         Methods used to convert a key field into a relative record number are called hashing algorithms.

 

C.        The same hashing method that was used to create the relative file must be used whenever accessing the file randomly.

 

D.        A collision occurs when two or more relative keys are transformed by the randomizing algorithm into the same record key address.

 

1.         One solution to the collision problem is to create a file where colliding records are placed in an overflow area.

 

2.         Another method for dealing with collisions is to add 1 to the relative key and attempt to write the record in the next disk location.

 

E.         Commonly-used randomizing or hashing algorithms:

 

1.         Divide the key field by the number of records in the file and use the remainder as the relative address.

 

2.         Split the key, add the parts, and truncate any extra digits (folding).

 

3.         Use selected digits to form a relative record number (digit extraction).

 

4.         Square the key value and truncate to the desired number of digits (square value truncation).

 

F.         It is not efficient to process relative files sequentially when a randomizing algorithm is used to compute the RELATIVE KEY.

 


SOLUTIONS TO REVIEW QUESTIONS

 

I.          True-False Questions

 

1.         F          An indexed file is created with ACCESS IS SEQUENTIAL and read

                        with ACCESS IS RANDOM or ACCESS IS SEQUENTIAL.

 

2.         T

 

3.         T         

 

4.         T

 

5.         T

 

6.         T

 

7.         T

 

8.         F          The relative file's key field is defined in WORKING-STORAGE.

 

9.         T

 

10.       T

 

11.       F          We would use REWRITE EMPLOYEE-REC.

 

12.       F          AT END would be used if the file was accessed sequentially.

 

II.      General Questions

 

1.         SELECT MASTER-INVENTORY-FILE ASSIGN TO device

    ORGANIZATION IS INDEXED

    ACCESS IS SEQUENTIAL

    RECORD KEY IS key-field.

 

2.         SELECT TRANS-FILE ASSIGN TO device

    ORGANIZATION IS INDEXED

    ACCESS IS RANDOM

    RECORD KEY IS TRANSACTION-NO

    ALTERNATE RECORD KEY IS INVOICE-NUMBER.

 

3.         The REWRITE statement is used to write updated records back onto disk.

 


4.         When used with a WRITE statement, the INVALID KEY option will detect different conditions depending on whether the access mode is random or sequential.  With random access, the INVALID KEY clause will detect an attempt to store a record with a key field value that already exists in the file.  With sequential access, the INVALID KEY will detect both a duplicate key or a record that is is out of sequence. 

 

When used with a READ statement the INVALID KEY option will be triggered if the record cannot be found.

 

AT END would be used if the file was accessed sequentially.

 

5.         A disk file is opened as I-O when records are to be read, changed, and rewritten back onto the file.

 

III.       Validating Data

 

1.         The program should include a routine to verify that:

 

a.         TRANS-AMT-IN is a valid numeric value.

b.         TRANS-CODE-IN contains a valid value.

 

2.         A control listing of totals should include:

 

a.         the number of transaction records processed from the TRANSACTION-IN file.

b.         the number of transaction records containing errors.

c.         a detailed explanation of each error found in the transaction file.

 

3.         A batch total should be included for TRANS-AMT-IN.

 


SOLUTIONS TO DEBUGGING EXERCISES

 

In order to correct this program, it is necessary to know what the values of TRANS-CODE represent.  The following solutions assume that there are two possible actions to be taken:  if  TRANS-CODE contains an X, then the program is to delete the corresponding record from the indexed file; otherwise the program should add a record to the indexed file.

 

1.         The hyphen is missing in I-O.  The OPEN statement should read:

 

OPEN INPUT TRANS-FILE

     I-0   INDEX-FILE

 

2.         The READ should have an INVALID KEY clause instead of an AT END clause.

 

3.         a.         The WRITE statement itself is coded correctly, but is in the wrong location.

First, the WRITE and the preceding MOVE should be executed only if there was no corresponding indexed record found by the READ statement.  These statements should be coded within the INVALID KEY segment of the READ statement.  Second, the MOVE and WRITE should be executed only if TRANS-CODE contains something other than X, so they must be preceded by an IF statement that tests the value of TRANS-CODE.

 

b.         The syntax error in the DELETE statement is that it must specify the file name rather than the record name.  There should be an INVALID KEY clause for the DELETE.  There is also a logic error in the location of the DELETE statement.  Since the DELETE should be executed only when the corresponding indexed record was found by the READ statement, it (and the corresponding IF) should be located within the NOT INVALID KEY segment of the READ statement

 

c.         There is a syntax error because there is no INVALID KEY clause.  The REWRITE statement will also cause a run time error because the record originally read has since been deleted by the previous statement in the code and a logic error since it doesn’t make sense to even attempt to REWRITE the record in this situation.

 


4.         When TRANS-CODE = ’X’, the corresponding master record is deleted.  Then after the IF statement, the record is written again. 

 

Instead of doing both of these operations for each transaction record, the program should do only one of the them.  The deletion should take place if (1) the READ statement located the corresponding record on the indexed file and (2) TRANS-CODE contains an X.  A record should be added only when (1) the READ statement did not find a record in the indexed file, and (2) TRANS-CODE does not contain an X.

 

            Module 200-CALC-RTN should be rewritten as shown below.

 

MOVE TRANS-NO TO DISK-TRANS-NO

READ INDEX-FILE

    INVALID KEY

        IF TRANS-CODE = ’X’

       MOVE ’RECORD DOES NOT EXIST; CANNOT DELETE’

           TO MSSGE

            WRITE PRINT-REC FROM ERROR-REC

        ELSE

            MOVE TRANS-AMT TO DISK-AMT

            WRITE DISK-TRANS-REC

        END-IF

    NOT INVALID KEY

        IF TRANS-CODE = ’X’

            DELETE DISK-TRANS-FILE

        ELSE

            MOVE ’RECORD ALREADY EXISTS; CANNOT ADD’

                TO MSSGE

            WRITE PRINT-REC FROM ERROR-REC

        END-IF

END-READ.