CHAPTER 15
INDEXED AND RELATIVE
FILE PROCESSING
CHAPTER OBJECTIVES
After
completion of this chapter, the student should be able to:
1. Describe sequential, indexed, and
relative file organizations.
2. Describe the physical features of
magnetic disks and disk drives.
3.
Design
and code a program that creates an indexed file.
4. Design and code a program that performs
a random update of an indexed file.
5. Design
and code a program that accesses the records of an indexed file sequentially.
6. Design
and code a program that accesses records of an indexed file based on the values
of an ALTERNATE RECORD KEY.
7. Describe
the function and format of the START statement.
8. Describe situations in which DYNAMIC
ACCESS might be used to process an indexed file.
9. Describe
the function and use of the FILE STATUS clause.
10. Explain
how exception handling can be accomplished with the DECLARATIVES segment of the
PROCEDURE DIVISION.
11. Describe
how an indexed file can be used as an external table.
12. Explain
what a relative file is.
13.
Explain how to create and access relative
files both sequentially and randomly.
14. Describe
several methods used to convert a key field to a relative record number.
15. Explain
what a collision is and describe methods for handling collisions.
16. Define
the term hashing.
LECTURE OUTLINE
I. Systems
Considerations for Organizing Disk Files
A. Sequential File Organization
1. The simplest type of disk file
organization is sequential.
2. Records are stored sequentially by a
key field.
3. Sequential records are processed in the
same manner whether they are stored on tape or disk.
4. In order to access a specific record,
the computer must first read past all preceding records.
B. Indexed File Organization
1. An indexed file consists of two files:
a. A data file that stores the actual
physical data.
b. An index file that contains the value
of each key field along with the disk address of the corresponding record.
2. The index enables a user to access a
disk file randomly.
3. To access an indexed record randomly,
the key field is looked up in the index file to find the disk address of the
record; with this address, the record can be accessed directly in the indexed
data file.
4. Records of an indexed file may be
accessed sequentially or randomly. They
may also be processed both sequentially and randomly in the same program.
C. Relative File Organization
1. Relative files also permit random
access.
2. A relative file does not use an index
to access a record randomly.
3. The key field of each record is used to
calculate the record’s relative location in the file.
4. Records of a relative file may be
accessed either sequentially or randomly.
II. Features of Magnetic Disks and Disk
Drives
A. Magnetic disk is a storage medium that
can serve as either input to or output from a program.
B. Most hard disks are fixed disks, which
cannot be removed from the disk drive.
C. Each disk surface of a hard disk writes
data as magnetized bits in concentric circles called tracks.
D. Individual records on disk can be
addressed by:
1. Surface number.
2. Track number.
3. Sector number (floppy disks) or
cylinder number (larger units).
E. On microcomputer disks, the concentric
tracks are segmented into wedge-shaped sectors.
F. On large computers, cylinders are used
for addressing disk records.
III.
Processing Indexed Disk Files
A. Creating an Indexed File
1. Indexed files are created in sequence.
2. Once an indexed file is created, it can
be accessed either sequentially or randomly.
3. Format of the SELECT statement used
when creating an indexed file:
SELECT
file-name-1 ASSIGN to implementor-name-1
[ORGANIZATION IS] INDEXED
[ACCESS MODE IS SEQUENTIAL]
RECORD KEY IS data-name-1
4. The ORGANIZATION IS INDEXED clause indicates
that the file is to be created with an index.
5. The ACCESS clause indicates the method
(sequential or random) that will be used to access the file. It is optional when the file is to be
accessed sequentially, since sequential is the default.
6. No two records in an indexed file can
have the same key field value.
7. The RECORD KEY clause specifies the key
field within the disk record that will be used to form the index.
8. The WRITE statement is modified to
include an INVALID KEY clause when writing records to an indexed file.
9. The INVALID KEY tests for two possible
conditions:
a. a key field that is out of sequence.
b. a key field that is the same as one
already on the indexed file.
10. If INVALID KEY is coded and a record has
an erroneous key:
a. the record is not written.
b. the statements following INVALID KEY
are executed.
11. The format for the WRITE statement with
an INVALID KEY clause:
WRITE
record-name-1 [FROM identifier-1]
[INVALID KEY imperative-statement-1]
12. NOT INVALID KEY and END-WRITE are
additional options of the WRITE statement.
13.
The FILE STATUS clause can be used with
the WRITE statement to identify the specific error that caused the INVALID KEY
clause to be executed.
B. Updating an Indexed File Randomly
1. When updating an indexed file, only two
files are needed. They are:
a. the transaction file.
b. the master index file, which serves as
both an input and output file.
2. Since the indexed disk file may be
accessed randomly, there is no need to sort the transaction file for an update
procedure. Sorting the transaction file
however, can reduce processing time.
3. The ACCESS clause of the SELECT
statement for an indexed file that is to be updated randomly is:
ACCESS MODE IS
RANDOM
4. When updating an indexed file, the file
is opened as I-O.
5. Before executing a random READ, the key
value of the desired record must be moved to the key field of the indexed file.
6. When reading an indexed file randomly,
the READ statement contains an INVALID KEY clause instead of an AT END clause.
7. The INVALID KEY clause of the READ
statement will be executed if the record with the specified key value is not
found in the indexed file.
8.
There is no AT END clause when reading
randomly from an indexed file.
9. The NOT INVALID KEY clause can also be
specified and will be executed if the record is found.
10. After the record is accessed from the
indexed file, it is updated using the transaction data. The updated record is then replaced on the
indexed file with a REWRITE statement.
11. Steps to update an indexed master file:
a. Open the master file as I-O.
b. Read a transaction record or accept
transaction data from the keyboard.
c. Move the desired value from the
transaction record to the RECORD KEY of the master file and READ the
corresponding master record.
d. Change the master record.
e. REWRITE the master record.
12.
The format for the REWRITE statement:
REWRITE
record-name-1 [FROM identifier-1]
[INVALID KEY imperative-statement-1]
[NOT INVALID KEY
imperative-statement-2]
[END-REWRITE]
13. An INVALID KEY will occur on a REWRITE if
the programmer has changed the key field of the record.
14. Interactive processing is often used when
updating records of an indexed file.
15. Types of update transactions are:
a. Making changes to existing records
(READ, change, and REWRITE).
b. Creating new records (WRITE).
c. Deleting existing records (READ and
DELETE or just DELETE).
16. A
coded field in the transaction record is commonly used to specify whether the
corresponding transaction is a change, addition, or deletion.
17. Format of the DELETE statement:
DELETE
indexed-file-name RECORD
[INVALID KEY imperative-statement-1]
[NOT INVALID KEY
imperative-statement-2]
[END-DELETE]
C. Updating an Indexed File with Multiple Transaction
Records for Each Master Record
1.
An indexed file update is exactly the
same regardless of whether there is only one transaction record per master or
there are multiple transactions per master.
2.
Debugging Tips:
a.
Indexed files cannot be created as test
data using a text editor; a program is required to create them.
b.
Always precede the test of an indexed
file update program by running the indexed file creation program to ensure that
the update program is being run against a “clean” copy of the file.
c.
Always use scope terminators and include
periods only for the last sentence in a paragraph.
D. Accessing or Reading from an Indexed
File for Reporting Purposes
1. An indexed file may be accessed either
sequentially or randomly for reporting purposes.
2. An indexed file may be processed
sequentially either by using the key field or by sorting on some other field.
3. An indexed file may be processed
randomly by directly accessing the desired record based on the value of the key
field.
IV.
Additional Options for Indexed File
Processing
A.
Using ALTERNATE RECORD KEYs
1.
Indexed files may be created with and
accessed by more than one identifying key field.
2.
The additional key fields are declared in
the SELECT statement with the ALTERNATE RECORD KEY clause.
3.
If the ALTERNATE RECORD KEY need not be
unique, the WITH DUPLICATES clause is used.
4.
An indexed file can be accessed by its
RECORD KEY or any of its ALTERNATE RECORD KEYs.
5.
For each RECORD KEY and ALTERNATE RECORD
KEY established, an index is created that can be used for accessing records
with specific values of that key.
6.
A record can be accessed randomly by any
of its alternate keys by specifying the key to be used in the READ statement.
7.
When there is more than one record with
the same ALTERNATE RECORD KEY value, the one retrieved by a random READ will be
the record that was first placed in the indexed file.
B.
The START Statement
1.
The START statement enables a program to
begin processing an indexed file sequentially at a record location other than
the first or next physical record in the file.
2. In order to use the START statement,
the SELECT clause for the indexed file must specify ACCESS IS SEQUENTIAL or
ACCESS IS DYNAMIC.
3. The format for the START statement:
START
file-name-1
[KEY {IS EQUAL TO
IS =
IS
GREATER THAN
IS
>
IS
NOT LESS THAN
IS
NOT <
IS
GREATER THAN OR EQUAL TO
IS
>=}
data-name-1]
[INVALID KEY imperative-statement-1]
[NOT INVALID KEY
imperative-statement-2] [END-START]
4. If
the KEY clause of the START statement is omitted, the relational operator ‘IS
EQUAL TO’ is implied and the primary record key is assumed to be the key of
reference.
5. The
START statement locates the specified record but does not READ it into storage.
C. Accessing an Indexed File Dynamically
1.
When
a file needs to be accessed both sequentially and randomly in the same program,
ACCESS IS DYNAMIC is specified in the SELECT statement.
2.
The
READ … NEXT RECORD Instruction
a.
When
using ACCESS IS DYNAMIC, it is necessary to distinguish between sequential and
random READs.
b.
The
NEXT RECORD clause is used to specify that a READ statement is a sequential
READ.
D. A Review of the INVALID KEY Clause
1.
When
used with a random READ statement, INVALID KEY means there is no record on the file
with the specified key.
2.
When
used with a random WRITE statement, INVALID KEY means that the file already
contains a record with the specified key or that there is no more room
available in the file.
3.
When
used with a REWRITE, DELETE, or START statement, INVALID KEY means that no
record with the specified key can be found on the file. In the case of a REWRITE it can also mean
that the programmer has changed the value of the RECORD KEY field since the
record was read.
E.
ALTERNATE RECORD KEYs
Reduce the Need for Multiple Copies of the File
F. The FILE STATUS Clause
1. The FILE STATUS clause can be included
at the end of the SELECT statement. The
FILE STATUS field is used to determine the exact cause of an INVALID KEY
condition when accessing the corresponding file.
2. The FILE STATUS clause, if included as
part of the SELECT statement, appears as follows:
SELECT ...
FILE STATUS is data-name
3. The data-name specified in the FILE
STATUS clause must be defined in WORKING-STORAGE as a two-position alphanumeric
field.
4. When an input or output operation is
performed on the file associated with the FILE STATUS clause, the operating
system will place a value in the FILE STATUS field.
5. The programmer may test the value of
the FILE STATUS field in the PROCEDURE DIVISION.
6. Refer to the text for a list of
commonly-encountered values of the FILE STATUS field.
7. The FILE STATUS clause may be specified
with any type of file.
8.
The FILE STATUS field allows the program
to determine the precise cause of an input or output error.
1.
It
is possible to establish a separate section in the program designed to handle
all input/output errors.
2.
Such
exception handling routines are placed in the DECLARATIVES segment of the
PROCEDURE DIVISION.
3.
The
DECLARATIVES segment always appears first in the PROCEDURE DIVISION.
4.
Each
section in the DECLARATIVES segment contains a USE statement that specifies the
corresponding file.
5.
The
appropriate section within the DECLARATIVES segment is automatically executed
whenever an input or output error occurs for the corresponding file.
V.
Using
an Indexed File as an External Table
1.
An
alternative to loading an external table into primary memory is to store the
table data as an indexed file.
2.
The
main reason for using an indexed file as an external table is to save primary
storage space.
3.
The
key field of the input record can be used to retrieve the corresponding table
entry into memory from auxiliary storage.
4.
Processing
an indexed file as an external table is somewhat slower than processing a table
that has been loaded into main memory.
VI.
Proposed
COBOL 2002+ Changes
1.
A
“less than” condition will be available with the START statement.
2.
The
DELETE FILE statement will be added to allow the program to delete an entire
file.
VII. Processing Relative Disk Files
A. What is a Relative File?
1. A relative file is a file in which each
record is identified by its relative record number.
2. The relative record number is the
actual physical record location relative to the beginning of the file.
3. Relative files allow records to be
accessed randomly and sequentially.
4. Accessing relative files is very fast
because there is no need to look up a disk address from an index.
5. Relative files are best used when each
record contains a built-in record number.
6. Key fields of relative records are
converted to actual addresses.
7. The field that supplies the key
information can also serve as a relative record number or RELATIVE KEY.
8. The statements in the PROCEDURE
DIVISION for random or sequential processing of relative files are similar to
those for indexed files.
9. ORGANIZATION IS RELATIVE is required in
the SELECT statement for a relative file.
10. The RELATIVE KEY clause is required in
the SELECT statement when ACCESS is RANDOM or DYNAMIC.
11. The RELATIVE KEY clause is optional in
the SELECT statement when ACCESS is SEQUENTIAL.
12. The relative key is defined in
WORKING-STORAGE; it is not part of the relative record.
13. A FILE STATUS field may be defined and
used with a relative file in the same way as with indexed files.
B. Creating Relative Files
1. Relative files are created
sequentially.
2. The
computer creates a record for every value of the RELATIVE KEY, automatically
inserting blank records for any RELATIVE KEY values that having no
corresponding record in the input file.
3. If the key field is not consecutive or
if it is too large, it can be converted to a relative key using different types
of algorithms.
C. Sequential Reading of Relative Files
1. Records in a relative file may be read
sequentially in the order in which they were created.
2. A sequential READ reads the records in
ascending relative key order.
3. The RELATIVE KEY need not be specified
in the SELECT statement when the file is processed sequentially.
D. Random Reading of Relative Files
1. A relative file may be read randomly by
specifying ACCESS IS RANDOM in the SELECT statement.
2. An INVALID KEY clause is executed if
the key on the query file does not match a key on the relative file.
E. Random Updating of Relative Files
1. To randomly update a relative file,
READ each record to be changed, make the desired changes, and REWRITE the
record.
2. The relative file must be opened as
I-O.
3. The INVALID KEY clause of a READ
statement is executed if the record in the transaction file does not match a
corresponding record in the relative file.
4. The INVALID KEY clause of a REWRITE
statement is executed if the key field in WORKING-STORAGE has a value outside
of the file's range.
5. The DELETE statement is used to delete
records from a relative file.
VIII. Converting A Key Field To A RELATIVE KEY
A. A relative file's key can be its actual
disk location, or the key field may be used to compute the disk address.
B. Methods used to convert a key field
into a relative record number are called hashing algorithms.
C. The same hashing method that was used to
create the relative file must be used whenever accessing the file randomly.
D. A collision occurs when two or more
relative keys are transformed by the randomizing algorithm into the same record
key address.
1. One solution to the collision problem
is to create a file where colliding records are placed in an overflow area.
2. Another method for dealing with collisions
is to add 1 to the relative key and attempt to write the record in the next
disk location.
E. Commonly-used randomizing or hashing
algorithms:
1. Divide the key field by the number of
records in the file and use the remainder as the relative address.
2. Split the key, add the parts, and
truncate any extra digits (folding).
3. Use selected digits to form a relative
record number (digit extraction).
4. Square the key value and truncate to
the desired number of digits (square value truncation).
F. It is not efficient to process relative
files sequentially when a randomizing algorithm is used to compute the RELATIVE
KEY.
SOLUTIONS
TO REVIEW QUESTIONS
I. True-False Questions
1. F An
indexed file is created with ACCESS IS SEQUENTIAL and read
with ACCESS IS RANDOM or
ACCESS IS SEQUENTIAL.
2. T
3. T
4. T
5. T
6. T
7. T
8. F The
relative file's key field is defined in WORKING-STORAGE.
9. T
10. T
11. F We
would use REWRITE EMPLOYEE-REC.
12. F AT
END would be used if the file was accessed sequentially.
II. General Questions
1. SELECT MASTER-INVENTORY-FILE
ASSIGN TO device
ORGANIZATION IS INDEXED
ACCESS IS SEQUENTIAL
RECORD KEY IS key-field.
2. SELECT TRANS-FILE ASSIGN TO
device
ORGANIZATION IS INDEXED
ACCESS IS RANDOM
RECORD KEY IS TRANSACTION-NO
ALTERNATE RECORD KEY IS INVOICE-NUMBER.
3. The REWRITE statement is used to write
updated records back onto disk.
4. When used with a WRITE statement, the
INVALID KEY option will detect different conditions depending on whether the
access mode is random or sequential.
With random access, the INVALID KEY clause will detect an attempt to
store a record with a key field value that already exists in the file. With sequential access, the INVALID KEY will
detect both a duplicate key or a record that is is
out of sequence.
When used with a READ
statement the INVALID KEY option will be triggered if the record cannot be
found.
AT END would be used
if the file was accessed sequentially.
5. A disk file is opened as I-O when
records are to be read, changed, and rewritten back onto the file.
III. Validating Data
1. The program should include a routine to
verify that:
a. TRANS-AMT-IN is a valid numeric value.
b. TRANS-CODE-IN contains a valid value.
2. A control listing of totals should
include:
a. the number of transaction records
processed from the TRANSACTION-IN file.
b. the number of transaction records
containing errors.
c. a detailed explanation of each error
found in the transaction file.
3. A batch total should be included for
TRANS-AMT-IN.
SOLUTIONS
TO DEBUGGING EXERCISES
In order to correct
this program, it is necessary to know what the values of TRANS-CODE
represent. The following solutions
assume that there are two possible actions to be taken: if
TRANS-CODE contains an X, then the program is to delete the
corresponding record from the indexed file; otherwise the program should add a
record to the indexed file.
1. The hyphen is missing in I-O. The OPEN statement should read:
OPEN INPUT TRANS-FILE
I-0
INDEX-FILE
2. The READ should have an INVALID KEY
clause instead of an AT END clause.
3. a. The
WRITE statement itself is coded correctly, but is in the wrong location.
First, the WRITE and the preceding MOVE
should be executed only if there was no corresponding indexed record found by
the READ statement. These statements
should be coded within the INVALID KEY segment of the READ statement. Second, the MOVE and WRITE should be executed
only if TRANS-CODE contains something other than X, so they must be preceded by
an IF statement that tests the value of TRANS-CODE.
b. The syntax error in the DELETE
statement is that it must specify the file name rather than the record
name. There should be an INVALID KEY
clause for the DELETE. There is also a logic
error in the location of the DELETE statement.
Since the DELETE should be executed only when the corresponding indexed
record was found by the READ statement, it (and the corresponding IF) should be
located within the NOT INVALID KEY segment of the READ statement
c. There is a syntax error because there
is no INVALID KEY clause. The REWRITE
statement will also cause a run time error because the record originally read
has since been deleted by the previous statement in the code and a logic error
since it doesn’t make sense to even attempt to REWRITE the record in this
situation.
4. When TRANS-CODE = ’X’, the
corresponding master record is deleted.
Then after the IF statement, the record is written again.
Instead of doing both
of these operations for each transaction record, the program should do only one
of the them. The deletion should take
place if (1) the READ statement located the corresponding record on the indexed
file and (2) TRANS-CODE contains an X. A
record should be added only when (1) the READ statement did not find a record
in the indexed file, and (2) TRANS-CODE does not contain an X.
Module 200-CALC-RTN should be
rewritten as shown below.
MOVE
TRANS-NO TO DISK-TRANS-NO
READ
INDEX-FILE
INVALID KEY
IF TRANS-CODE = ’X’
MOVE ’RECORD DOES NOT EXIST; CANNOT
DELETE’
TO MSSGE
WRITE PRINT-REC FROM ERROR-REC
ELSE
MOVE TRANS-AMT TO DISK-AMT
WRITE DISK-TRANS-REC
END-IF
NOT INVALID KEY
IF TRANS-CODE = ’X’
DELETE DISK-TRANS-FILE
ELSE
MOVE ’RECORD ALREADY EXISTS; CANNOT
ADD’
TO MSSGE
WRITE PRINT-REC FROM ERROR-REC
END-IF
END-READ.