Balance Line Algorithm Página 1 de 4
1
Chapter 6 - Balance Line Algorithm
Coordinated Processing
One of the motivations for ali of the effort and research spent on sorting was the need to coordinate
processing in two or more sequential files at the same time. Because early processing was mostly
batch processing, it was necessary to devise a way that the contents of two files could be processed
reliably and without intervention of a human operator. The typical scenario was for a file of
transactions to be collected throughout the day and either punched onto cards or keyed to a magnetic
file. At the end of the day, the transaction file would be input to a program that would also use a
master file as input. One output of the program, among other things, was to be an updated master file
that reflected the results of the transactions.
Clearly, taking the transactions in the random order that they arrived and searching sequentially for
the corresponding record in the master file would be unacceptable. Even if the m' aster file were
sorted and the information in key subblocks were used, it would still take multiple time-consuming
passes over the data. However, if both files are sorted on the same key field in the same sequence, it
may be possible to devise logic to process both files completely with only a single pass over each
file.
Balance Line Algorithm
The origin of the catchy name seems to be undocumented but probably derives from the 'balancing'
of key values against each other that forms the heart of the algorithm. Johnson and Cooper (1981, p.
59) point out that this is more than an algorithm; it is a design approach and many customized
routines may be based on it. This discussion will concem itself with the two-file problem; however,
it may be extended to as many files as necessary.
The fundamental assumption of the balance line is that the files be sorted on the same key
fields and in the same sequence. For ali of the following discussion, the assumption is that the files
are sorted in ascending sequence. Obviously, the key fields must be comparable, containing data in
like formats and values from the same domain of values. Continuing with the transaction file-master
file scenario established above, let File 1 with Key 1 be the transaction file and File 2 with Key 2 be
the master file. Further, it is assumed that the transactions are only updates; additions and deletions
are not used in this process (This is not very realistic, but it does provide a starting point for
discussion). Consider the following: -
if (key' < key2)
/* THIS IS AN ERROR CONDITION
REPORT THE ERROR AND READ
ANOTHER TRANSACTION RECORD */
else
if(key1 > key2)
/* THERE IS NOTHING FOR THIS MASTER RECORD
WRITE THE RECORD TO THE NEW MASTER AND
READ ANOTHER MASTER RECORD */ •
else /* keys must be equal */
[Link] 11/05/2004
Balance Line Algorithm Página 2 de 4
/* APPLY TRANSACTION TO THE MASTER RECORD
AND READ ANOTHER TRANSACTION RECORD */
•
The above logic is nested in a loop that continues until both input files are at end of file. Most
implementations of the balance une insure the process runs to proper completion by moving a high
value (some value larger than possible with any proper key) to the key field of a file that has reached
end of file. The other control elements follow.
fread(&recordl,sizeof(record1),1,filel);
if(feof(file1)) keyl = OxFFFFU;
fread(&record2,sizeof(record2),1,file2);
if(feof(file2)) key2 = OxFFFFU;
while( !(key1 == OxFFFFU && key2 == OxFFFFU))
/* THE IF LOGIC ABOVE GOES HERE */
The key fields above are implemented as unsigned integers. The OxFFFFU represents the highest
value that can be represented in an unsigned integer. The initializing reads set up the conditions for
the loop, but other actions may be desired should the first access to either file be an end of file,
perhaps a branch to an abrupt exit from the program. Other than initial end of file conditions, the
code above, coupled with the nested if statements and appropviate actions on each condition, to
include moving a high value to the key field when end of file is reached, will run to compIetion
and not miss any records. It is suggested that the student simulate this by hand with selected key
values to understand how the logic perfonms in all cases.
To process multiple types of transactions, there must be codes in the transaction file to indicate the
type of transaction. The specific transaction types that need to be addressed are those that deal with
adding, updating , or deleting records from the master file. The codes to represent these three
transactions must be chosen so that the transaction file may be sorted on the transaction code as a
minor sort key and ordered so that the add code will appear ahead of updates and that updates will
appear before deletes. 1, 2, and 3 have been suggested as codes; A(dd) C(hange), and D(elete)
would work as well. In implementing the balance line, the logic must be expanded to check the
transaction code when KEY1 < KEY2 and when KEY1 == KEY2 and the appropriate actions taken
for each combination of conditions.
Merging
The balance line is also the basis for merging files, but the logic must be changed. The fundamental
assumption remains as before, both files must be sorted on the same keys in the same sequence.
Normally, both files might be expected to have the same record format, as would be the case when
merging runs from the same original file; however, it is not an absolute requirement.
Consider the simplest of the problems, merging two files with the same record format sorted on the
same key in the same sequence. First, it must be decided what to do with matches. For the purpose of
this example, it will be assumed that neither of the matching records will be output to the result file;
rather, both of the matching records will be output to a separate file for matches that will be
examined to determine if they are duplicates or if one is to be retained over the other. Second, it must
be decided whether to doublecheck for sorting errors and for duplicates within each of the files; to do
so will require additional logic and memory to keep duplicate records for each file and to compare
the record just read with the previous record from the same file. To keep this simple, this "sequence
[Link] aihaworth/isq a3300/[Link] 11/05/2004
Balance Line Algorithm Página 3 de 4-
checking" will not be implemented, although the student must understand that sequence checking
must be implemented in any production code. With these considerations in mind, examine the
following example which is built on merging two lists of city names.
fgets(keyl, filei);
if(feof(file1)) strcpy(keyl, "-");
fgets(key2,file2);
if(feof(file2)) strcpy(key2, "-");
while( 1(key1[0] == 1 - 1 && key2[0] == 1 - 1 ))
result = strcmp(key1, key2);
if (result < O)
fputs(keyl, file3);
fgets(keyl, filei);
if(feof(file1)) strcpy(key1, "-");
else
if (result > O)
fputs(key2, file3);
fgets(key2, file2);
if(fec5f(file2)) strcpy(key2, "-H);
else /* keys must be equal */
fputs(keyl, file4);
fputs(key2, file4);,
fgets(key1, filei);
if(feof(file1)) strcpy(keyl, "-");
fgets(key2, file2);
if(feof(file2)) strcpy(key2, "-H);
Some commentary on the code is in order. First, when either input file reaches end of file, a tilde is
moved to the first character of the key field. Tilde is larger than any other printable character and in
the strcmp0 function will cause the field containing the tilde to be evaluated as larger than the other
field. The strcmp() function is needed to evaluate the string keys. And finally and obviously, the
above code is appropriate for cases where the key field is a string.
Three Files
Students are often uncertain of how to implement the logic for handling three files simultaneously.
Remember, the fundamental assumption still applies. The logic is based on determining which key is
smaller among the three at hand. The only tricky part is the case in which two equal keys are smaller
than the other, it must be decided how to pick among the ties. In the case of merging three files, an
arbitrary choice is appropriate. Consider the following control structure based on integer keys.
if(key1 <= key2 && keyl <= key3)
/* whatever is appropriate when key' is smallest */
[Link] 11/05/2004
Balance Line Algorithm Página 4 de 4
é
/* this would also cover the case of ali three keys equal */
/* if one set of conditions will not cover the case of ali three keys equal,
then a separate test for ali three keys equal should be placed Èirst
else
if (key2 <= keyl && key2 <= key3)
/* whatever is appropriate when key2 is smallest */
else
if (key3 <= keyl && key3 <= key2)
/* whatever is appropriate when key3 is smallest */
Summary
The usefulness of the balance une logic is that it can be adapted to implement many business
processes. In addition to posting transactions to a master file, it may be used to clean duplicates from
two or more lists and to merge two or more files into one. All that needs to be changed is the detailed
processing logic for each set of conditions. Balance une control logic remains a mainstay in the
toolkit of the business programmer.
[Link] 11/05/2004