364: File and Data Structures

Advanced data organization and persistent storage techniques.

Topics

  • Sequential, indexed, and hashed file organization

  • B-trees and indexed sequential access method (ISAM)

  • External sorting algorithms

  • Record-level locking and concurrency basics

Projects

Merge Sort (364/merge.c)

Merges two pre-sorted input files into a single sorted output file, illustrating external merge-sort — the classical technique for sorting data too large to fit in memory.

fscanf(infile1, "%d", &i);
fscanf(infile2, "%d", &j);

while (!feof(infile1) && !feof(infile2)) {
  if (i <= j) {
    fprintf(outfile, "%d\n", i);
    fscanf(infile1, "%d", &i);
  } else {
    fprintf(outfile, "%d\n", j);
    fscanf(infile2, "%d", &j);
  }
}
/* flush remaining elements from whichever file is not exhausted */
while (fscanf(infile1, "%d", &i) != EOF) fprintf(outfile, "%d\n", i);
while (fscanf(infile2, "%d", &j) != EOF) fprintf(outfile, "%d\n", j);

File Splitter (364/split.c)

Reads a stream of integers and splits it into separate files whenever the sequence decreases, producing pre-sorted runs suitable for external merge sort.

outfile = fopen(tempnam(OUTFILE_DIR, s), "w");
while (fscanf(infile, "%d", &j) != EOF) {
    if (i >= j) {
        /* sequence decreased — start a new sorted run */
        fclose(outfile);
        outfile = fopen(tempnam(OUTFILE_DIR, s), "w");
    }
    fprintf(outfile, "%d\n", j);
    i = j;
}

Random Number Generator (364/rng.c)

Generates N pseudo-random integers to stdout, used to produce test data for the sort and merge programs.

int x = atoi(argv[1]);
for (i = 0; i < x; i++)
    printf("%ld\n", random());

Temporary File Manager (364/tmpnam.c)

Creates and manages a series of temporary files, each holding a segment of monotonically increasing integers — demonstrating POSIX temporary file creation and file-based data staging.

printf("L_tmpnam = %d\n", L_tmpnam);
printf("TMP_MAX  = %d\n", TMP_MAX);

for (i = 0; i < j; i++)
    printf("%s\n", tmpnam(s));  /* generate a unique temp filename */

Buffer Formatter (364/buffer.c)

Processes binary format strings through a buffer, demonstrating low-level data formatting and buffer management techniques used in record-oriented file I/O.

#define BUF_SIZE 700

/* Dispatch on format specifier and write typed data into output buffer */
void do_string(int length, char ***buf, int fd, char out[BUF_SIZE],
               int *data_idx, int *s_idx, int *which, int *data_which);
void do_int   (int length, char ***buf, int fd, char out[BUF_SIZE],
               int *data_idx, int *d_idx, int *which, int *data_which);
void do_float (int len1, int len2, char ***buf, int fd, char out[BUF_SIZE],
               int *data_idx, int *f_idx, int *which, int *data_which);

Knockout Tournament (364/ko.c)

Implements a knockout (single-elimination) tournament bracket algorithm, exploring indexed record access and structured file organization.

struct node {
    int winner, loser;
    int index, lindex;
    int winner_index, loser_index;
};

/* Allocate bracket array: max_row rounds × max_col*2 match slots */
temp_array = (int **)malloc((max_row + 1) * sizeof(int *));
for (i = 0; i < max_row + 1; i++)
    temp_array[i] = (int *)malloc(max_col * 2 * sizeof(int));