Handling indexed-sequential files in COBOL.
Most programming languages natively support two types of file organization:
- Sequential file organization stores the records one after another. The data can only be read in sequence, and new records can only be added at the end of the file.
- Relative file organization stores the records on the basis of their relative address. Each record is identified by its relative record number, which corresponds to the position of the record counted from the beginning of the file. The records can be accessed sequentially as well as randomly; in the latter case, the user must specify the relative record number as key.
Sequential files, in turn, come in two kinds:
- A line sequential file (text file; ASCII file) is a simple text file, where the records are separated from each other by a delimiter added at their end. On DOS/Windows, a carriage return (0Dh) and a line feed (0Ah) are added, whereas on UNIX only the line feed is added at the end of the record.
- A record sequential file is a sequence of data structures (records) defined by the programmer. In most programming languages these records must all have the same length; in this case, the file is said to be a fixed-length record sequential file.
COBOL is intended for business applications, and in this domain, dealing with huge data files and quickly accessing given records is essential. That's why, from the beginning, COBOL has supported file organizations that we don't normally find in other programming languages (except for PL/I). First, COBOL includes all we need to handle variable-length record sequential files. Second, COBOL natively supports indexed-sequential (ISAM) file organization.
In fact, the usefulness of relative files for random access (i.e. directly accessing a given record without having to read other ones) is limited. First, there is no direct relationship between the key that we want to use to get a given record (e.g. an account number) and the key of the record in the file (the relative record number). Second, relative files can only have one key, which in most real-world cases is not sufficient for easy and quick file processing. That's where the indexed-sequential file organization comes in:
- The records are not stored by their relative address, as in relative files, but by a record key, which is part of the data structure and thus has a well-defined meaning (e.g. an account number).
- The records are kept in the file in sequential order of their key values (the values of the record key). Thus, in an indexed-sequential file with bank account data and the account number as record key, the records are sorted by account number, and reading the file sequentially gives us the full sorted list of all accounts.
- Instead of reading the file sequentially, we can also read it randomly, retrieving the record with a given key (account number). This is slower than random access on a relative file, because the position of the record must first be looked up in the file index, but much faster than iterating through a sequential file.
- Finally, unlike relative files, indexed-sequential files allow the creation and usage of alternate keys, which make it possible to quickly retrieve a given record based on the value of some other field of the data structure (e.g. the customer's name).
This tutorial shows (in three small sample programs) how to create an indexed-sequential file, how to access this file sequentially (in order to display all records), and how to access the file randomly (in order to retrieve a given record based on its primary or alternate key). The programs have been developed with Visual COBOL for Visual Studio Personal Edition on Windows 11. I'm not sure to what extent the code is valid without adaptations in another COBOL development environment. The source code of the 3 sample programs is available for download (with Visual COBOL, you'll have to create a project, and add this source code to the project's .cbl file).
The reader is assumed to have a basic to intermediate knowledge of the COBOL programming language: the divisions and sections and their usage, variable declaration, basic COBOL statements, declaration of files and instructions to access them. Some details about files in COBOL:
- COBOL supports several file organizations: SEQUENTIAL, LINE SEQUENTIAL, INDEXED, and RELATIVE. The file organization has to be specified in the "organization is" part of the select clause, in the file-control paragraph of the input-output section of the environment division.
- COBOL supports three file access modes: SEQUENTIAL, RANDOM, and DYNAMIC (the latter allowing both sequential and random access). The file access mode has to be specified in the "access is" part of the select clause. Obviously, this choice exists for indexed and relative files only; sequential files may only be accessed sequentially.
- Relative files have one key, which must be specified in the "relative key is" part of the select clause. Indexed files have one or more keys. The record key (a sort of primary key) must be specified in the "record key is" part, any other keys in the "alternate key is" part(s) of the select clause.
- COBOL supports four file open modes: INPUT (read operations), OUTPUT (write operations), EXTEND (to append records), and I-O (allowing read and write operations).
- COBOL supports several I/O operations: OPEN, CLOSE, READ, WRITE, REWRITE, DELETE, START. Whether a given operation may be used depends on the organization, access mode and open mode of the file.
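To see how these select clause parts and open modes fit together, here is a minimal sketch using a relative file. The file name demo.rel, the program name and all data names are made up for illustration. The sketch follows the same syntax as the sample programs of this tutorial and may need adaptations in another COBOL environment. It writes three records in OUTPUT mode, then reopens the file in INPUT mode and reads record number 2 directly.

```cobol
       identification division.
       program-id. relative-demo.

       environment division.
       input-output section.
       file-control.
      *> demo.rel and all names below are illustrative assumptions.
           select RELFILE assign to "demo.rel"
               organization is relative
               access is dynamic
               relative key is Rel-key
               status is Rel-status.

       data division.
       file section.
       fd RELFILE.
       01 Rel-record.
           05 Rel-text          pic X(10).
       working-storage section.
      *> The relative key must not be part of the file's record.
       77 Rel-key               pic 9(4).
       77 Rel-status            pic XX.

       procedure division.
       main section.
      *> OUTPUT mode creates the file; writes go to slot Rel-key.
           open output RELFILE.
           move 1 to Rel-key.
           move "first" to Rel-text.
           perform Write-record.
           move 2 to Rel-key.
           move "second" to Rel-text.
           perform Write-record.
           move 3 to Rel-key.
           move "third" to Rel-text.
           perform Write-record.
           close RELFILE.
      *> INPUT mode, random read: fetch record number 2 directly.
           open input RELFILE.
           move 2 to Rel-key.
           read RELFILE
               invalid key
                   display "No record at position " Rel-key
               not invalid key
                   display "Record " Rel-key ": " Rel-text
           end-read.
           close RELFILE.
           stop run.

       Write-record.
           write Rel-record
               invalid key
                   display "Write failed, status " Rel-status
           end-write.
```

With "access is dynamic", a plain read uses the relative key for a random read, whereas "read ... next" would continue sequentially from the current position.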
Program 1: Creating an "amino acids" ISAM file.
The file used in our examples contains data about the 20 proteinogenic amino acids: their name, 1-letter and 3-letter IUB/IUPAC codes, and their simple and extended molecular formulas. Our first program (isam-create.cbl) reads the amino acids data from a text file and creates an ISAM file. In this file, the 1-letter code will be used as record key, the 3-letter code as alternate key.
Here are the first lines of the amino acids text file, used as input to the program isam-create.cbl.
Alanine       Ala A C3H7NO2    CH3-CH(NH2)-COOH
Arginine      Arg R C6H14N4O2  HN=C(NH2)-NH-(CH2)3-CH(NH2)-COOH
Asparagine    Asn N C4H8N2O3   H2N-CO-CH2-CH(NH2)-COOH
Aspartic acid Asp D C4H7NO4    HOOC-CH2-CH(NH2)-COOH
Cysteine      Cys C C3H7NO2S   HS-CH2-CH(NH2)-COOH
The program logic is quite simple: read the text file line by line, and use the data of each line to write a record to the ISAM file. Note that error handling is coded in the declaratives of the procedure division, using the "use after error" statement.
Here is the code of the sample program isam-create.cbl:
       identification division.
       program-id. isam-create.
       author. allu.
       date-written. March 2024.

       environment division.
       input-output section.
       file-control.
      *> Input: plain text file, one amino acid per line.
           select INFILE assign to "aa.txt"
               organization is line sequential
               status is Infile-status.
      *> Output: ISAM file with a primary and an alternate key.
           select OUTFILE assign to "aa.dat"
               organization is indexed
               access is random
               record key is Outfile-code1
               alternate key is Outfile-code3
               status is Outfile-status.

       data division.
       file section.
       fd INFILE.
       01 Infile-record.
           05 Infile-name       pic X(13).
           05 filler            pic X.
           05 Infile-code3      pic XXX.
           05 filler            pic X.
           05 Infile-code1      pic X.
           05 filler            pic X.
           05 Infile-formula1   pic X(10).
           05 filler            pic X.
           05 Infile-formula2   pic X(32).
       fd OUTFILE.
       01 Outfile-record.
           05 Outfile-code1     pic X.
           05 Outfile-code3     pic XXX.
           05 Outfile-name      pic X(13).
           05 Outfile-formula1  pic X(10).
           05 Outfile-formula2  pic X(32).
       working-storage section.
       77 Record-Count          pic 99.
       77 Infile-eof            pic 9.
       77 Infile-status         pic XX.
       77 Outfile-status        pic XX.

       procedure division.
       declaratives.
      *> Called automatically when an I/O error occurs on INFILE.
       Infile-errors section.
           use after error procedure
               on INFILE.
       Infile-error.
           display "Error when reading 'aa.txt': " no advancing.
           display Infile-status.
           stop run.
       Infile-error-out.
           exit.
      *> Called automatically when an I/O error occurs on OUTFILE.
       Outfile-errors section.
           use after error procedure
               on OUTFILE.
       Outfile-error.
           display "Error when writing 'aa.dat': " no advancing.
           display Outfile-status.
           stop run.
       Outfile-error-out.
           exit.
       end declaratives.

       main section.
           display "Copying 'aa.txt' to 'aa.dat'".
           perform Open-files.
           move 0 to Infile-eof.
           move 0 to Record-Count.
           perform Copy-files until Infile-eof = 1.
           perform Display-count.
           perform Close-files.
           stop run.

       Open-files.
           open input INFILE.
           open output OUTFILE.

      *> Read one text line; write it as an ISAM record unless at EOF.
       Copy-files.
           read INFILE
               at end
                   move 1 to Infile-eof
               not at end
                   perform Write-file.

       Write-file.
           move Infile-code1 to Outfile-code1.
           move Infile-code3 to Outfile-code3.
           move Infile-name to Outfile-name.
           move Infile-formula1 to Outfile-formula1.
           move Infile-formula2 to Outfile-formula2.
           write Outfile-record.
           add 1 to Record-Count.

       Display-count.
           display "Number of records copied: " no advancing.
           display Record-Count.

       Close-files.
           close INFILE.
           close OUTFILE.
The screenshot below shows the execution of the program and a directory listing showing the input text file aa.txt and the output ISAM file aa.dat. Note that the ISAM file is roughly five times the size of the original text file. The space needed by the index(es) is one of the disadvantages of indexed-sequential files.
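The error handlers in the declaratives only display the raw two-character file status. As a small sketch of how such a code could be turned into a readable message, the following stand-alone program maps a few of the standard status values to text. The program name and the data names Io-status and Status-text are made up for illustration, and only a handful of the possible status values are covered.

```cobol
       identification division.
       program-id. status-demo.

       data division.
       working-storage section.
      *> Io-status stands in for a real file status item.
       77 Io-status             pic XX.
       77 Status-text           pic X(30).

       procedure division.
       main section.
      *> "23" is hard-coded here only to exercise the lookup.
           move "23" to Io-status.
           perform Explain-status.
           display "Status " Io-status ": " Status-text.
           stop run.

       Explain-status.
           evaluate Io-status
               when "00"
                   move "successful completion" to Status-text
               when "10"
                   move "end of file reached" to Status-text
               when "22"
                   move "duplicate key on write" to Status-text
               when "23"
                   move "record not found" to Status-text
               when "35"
                   move "file not found on open" to Status-text
               when other
                   move "other I/O condition" to Status-text
           end-evaluate.
```

In a real program, the Explain-status paragraph could be performed from within a declarative section, using the status item named in the file's select clause instead of Io-status.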
Program 2: Displaying the amino acids list.
Our second sample program (isam-list.cbl) lists the 20 amino acids by reading the ISAM file sequentially from beginning to end. Note that to do this, we have to specify "access is sequential" in the file's select clause. Also note that the code to sequentially read an ISAM file is exactly the same as the code to read an ordinary sequential file!
Here is the code of the sample program isam-list.cbl:
       identification division.
       program-id. isam-list.
       author. allu.
       date-written. March 2024.

       environment division.
       input-output section.
       file-control.
           select INFILE assign to "aa.dat"
               organization is indexed
               access is sequential
               record key is Infile-code1
               alternate key is Infile-code3
               status is Infile-status.

       data division.
       file section.
       fd INFILE.
       01 Infile-record.
           05 Infile-code1      pic X.
           05 Infile-code3      pic XXX.
           05 Infile-name       pic X(13).
           05 Infile-formula1   pic X(10).
           05 Infile-formula2   pic X(32).
       working-storage section.
       77 Infile-eof            pic 9.
       77 Infile-status         pic XX.
      *> Output line: the record fields separated by two spaces.
       01 Display-line.
           05 Display-code1     pic X.
           05 filler            pic XX value spaces.
           05 Display-code3     pic XXX.
           05 filler            pic XX value spaces.
           05 Display-name      pic X(13).
           05 filler            pic XX value spaces.
           05 Display-formula1  pic X(10).
           05 filler            pic XX value spaces.
           05 Display-formula2  pic X(32).

       procedure division.
       declaratives.
       Infile-errors section.
           use after error procedure
               on INFILE.
       Infile-error.
           display "Error when reading 'aa.dat': " no advancing.
           display Infile-status.
           stop run.
       Infile-error-out.
           exit.
       end declaratives.

       main section.
           display "Displaying the content of 'aa.dat'".
           open input INFILE.
           move 0 to Infile-eof.
           perform Read-and-Display until Infile-eof = 1.
           close INFILE.
           stop run.

      *> Sequential read: records arrive in primary key order.
       Read-and-Display.
           read INFILE
               at end
                   move 1 to Infile-eof
               not at end
                   perform Display-record.

       Display-record.
           move Infile-code1 to Display-code1.
           move Infile-code3 to Display-code3.
           move Infile-name to Display-name.
           move Infile-formula1 to Display-formula1.
           move Infile-formula2 to Display-formula2.
           display Display-line.
Program output:
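The START statement and the DYNAMIC access mode mentioned earlier are not used by the sample programs. As a hedged sketch, the following variant lists the amino acids in the order of their 3-letter codes: START makes the alternate key the key of reference, and the following "read ... next" statements then walk through the file in that order. The program name isam-list-alt is made up, and the sketch assumes that aa.dat has been created by isam-create.cbl in the same directory.

```cobol
       identification division.
       program-id. isam-list-alt.

       environment division.
       input-output section.
       file-control.
           select INFILE assign to "aa.dat"
               organization is indexed
               access is dynamic
               record key is Infile-code1
               alternate key is Infile-code3
               status is Infile-status.

       data division.
       file section.
       fd INFILE.
       01 Infile-record.
           05 Infile-code1      pic X.
           05 Infile-code3      pic XXX.
           05 Infile-name       pic X(13).
           05 Infile-formula1   pic X(10).
           05 Infile-formula2   pic X(32).
       working-storage section.
       77 Infile-eof            pic 9.
       77 Infile-status         pic XX.

       procedure division.
       main section.
           open input INFILE.
           if Infile-status not = "00"
               display "Cannot open 'aa.dat', status " Infile-status
               stop run
           end-if.
           move 0 to Infile-eof.
      *> Make the alternate key the key of reference, starting
      *> at the lowest possible key value.
           move low-values to Infile-code3.
           start INFILE key is not less than Infile-code3
               invalid key
                   move 1 to Infile-eof
           end-start.
           perform Read-next until Infile-eof = 1.
           close INFILE.
           stop run.

      *> Sequential read in the order set up by START.
       Read-next.
           read INFILE next record
               at end
                   move 1 to Infile-eof
               not at end
                   display Infile-code3 " " Infile-code1 " "
                       Infile-name
           end-read.
```

Moving some other value to Infile-code3 before the START (for example "Gly") would begin the listing at the first record whose 3-letter code is not less than that value.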
Program 3: Searching for a given amino acid.
The third sample program (isam-search.cbl) may be used to search for a given amino acid, using either the 1-letter or the 3-letter code. Here, we access the ISAM file randomly, so we have to specify "access is random" in the file's select clause. The read statement, instead of testing the end-of-file condition as a sequential read does, has to test for the case where the key is invalid, i.e. where no record with the given key exists. If no key is specified, the primary key is used for the search. If, instead, we want to search using an alternate key, this key has to be named in the "key is" part of the read statement.
Here is the code of the sample program isam-search.cbl:
       identification division.
       program-id. isam-search.
       author. allu.
       date-written. March 2024.

       environment division.
       input-output section.
       file-control.
           select INFILE assign to "aa.dat"
               organization is indexed
               access is random
               record key is Infile-code1
               alternate key is Infile-code3
               status is Infile-status.

       data division.
       file section.
       fd INFILE.
       01 Infile-record.
           05 Infile-code1      pic X.
           05 Infile-code3      pic XXX.
           05 Infile-name       pic X(13).
           05 Infile-formula1   pic X(10).
           05 Infile-formula2   pic X(32).
       working-storage section.
       77 Program-option        pic X.
       77 Infile-status         pic XX.
       01 Display-line.
           05 Display-code1     pic X.
           05 filler            pic XX value spaces.
           05 Display-code3     pic XXX.
           05 filler            pic XX value spaces.
           05 Display-name      pic X(13).
           05 filler            pic XX value spaces.
           05 Display-formula1  pic X(10).
           05 filler            pic XX value spaces.
           05 Display-formula2  pic X(32).

       procedure division.
       declaratives.
       Infile-errors section.
           use after error procedure
               on INFILE.
       Infile-error.
           display "Error when reading 'aa.dat': " no advancing.
           display Infile-status.
           stop run.
       Infile-error-out.
           exit.
       end declaratives.

       main section.
           open input INFILE.
           perform Ask-for-option
               until Program-option = "0".
           close INFILE.
           stop run.

      *> Any input other than 1 or 3 ends the program.
       Ask-for-option.
           display "Options: 1=1-letter code, 3=3-letter code ? "
               no advancing.
           accept Program-option.
           if Program-option = "1"
               perform Search-with-1-letter-code
           else
               if Program-option = "3"
                   perform Search-with-3-letter-code
               else
                   move "0" to Program-option.

      *> Random read using the primary key (the default).
       Search-with-1-letter-code.
           display "1-letter AA code ? " no advancing.
           accept Infile-code1.
           read INFILE
               invalid key
                   perform Key-Not-Found
               not invalid key
                   perform Display-record.

      *> Random read using the alternate key named in "key is".
       Search-with-3-letter-code.
           display "3-letter AA code ? " no advancing.
           accept Infile-code3.
           read INFILE
               key is Infile-code3
               invalid key
                   perform Key-Not-Found
               not invalid key
                   perform Display-record.

       Display-record.
           move Infile-code1 to Display-code1.
           move Infile-code3 to Display-code3.
           move Infile-name to Display-name.
           move Infile-formula1 to Display-formula1.
           move Infile-formula2 to Display-formula2.
           display Display-line.

       Key-Not-Found.
           display "Invalid or unknown AA code".
Program output:
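The three sample programs only create and read the ISAM file. The I-O open mode and the REWRITE and DELETE operations listed in the introduction could be used as in the following sketch. It adds a placeholder record with the key "X" (a code not used by any of the 20 amino acids), replaces it, and deletes it again, so that the file should end up with its original content. The program name isam-update, the AAFILE and Aa- names, and the placeholder values are made up for illustration; the sketch assumes that aa.dat has been created by isam-create.cbl.

```cobol
       identification division.
       program-id. isam-update.

       environment division.
       input-output section.
       file-control.
           select AAFILE assign to "aa.dat"
               organization is indexed
               access is random
               record key is Aa-code1
               alternate key is Aa-code3
               status is Aa-status.

       data division.
       file section.
       fd AAFILE.
       01 Aa-record.
           05 Aa-code1          pic X.
           05 Aa-code3          pic XXX.
           05 Aa-name           pic X(13).
           05 Aa-formula1       pic X(10).
           05 Aa-formula2       pic X(32).
       working-storage section.
       77 Aa-status             pic XX.

       procedure division.
       main section.
      *> I-O mode allows READ, WRITE, REWRITE and DELETE.
           open i-o AAFILE.
           if Aa-status not = "00"
               display "Cannot open 'aa.dat', status " Aa-status
               stop run
           end-if.
      *> Add a placeholder record (key "X" is hypothetical).
           perform Fill-placeholder.
           write Aa-record
               invalid key
                   display "Write failed, status " Aa-status
           end-write.
      *> Replace the record that has primary key "X".
           perform Fill-placeholder.
           move "Changed" to Aa-name.
           rewrite Aa-record
               invalid key
                   display "Rewrite failed, status " Aa-status
           end-rewrite.
      *> Remove it again; only the primary key is needed here.
           move "X" to Aa-code1.
           delete AAFILE record
               invalid key
                   display "Delete failed, status " Aa-status
           end-delete.
           close AAFILE.
           stop run.

      *> Refill the record area before each operation.
       Fill-placeholder.
           move spaces to Aa-record.
           move "X" to Aa-code1.
           move "Xaa" to Aa-code3.
           move "Placeholder" to Aa-name.
```

The record area is refilled before each operation because its content should not be relied upon after a successful WRITE or REWRITE.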