Assignment 1: Command Line Arguments and Pointers


Over the course of the four assignments in this course, you will implement different components of a simple file synchronization system. For examples of such systems, take a look at rsync or consider the services provided by OneDrive, Dropbox, Google Drive, or Box. The components you’ll be asked to build could also form the basis for a version control system like svn or git. In this assignment, you’ll write a program to compute hash values for files.

Background: Why Hashes?

To function, a file synchronization system must be able to compare a set of files in one location with a set of files in a different location. As an example, let’s consider rsync. In its default mode, rsync simply checks the modification time and size of the files at the source and destination. If either the modification time or size differs for a particular file, then the file is copied from the source to the destination. However, this risks missing cases where the size of the file is unmodified by an edit. To solve this problem, rsync could simply transmit all of the files and compare each byte, but this would be a tremendous waste of bandwidth (and time) when a file is unchanged. Instead, rsync can compare small hashes that represent the contents of each file.

A hash is the output of a hash function. A hash function takes, as input, a piece of data of arbitrary size (in our case, the contents of a file) and generates a fixed-size value that is typically much smaller than the original data. The key characteristic of a good hash function is that two pieces of data that differ even slightly (e.g., a file before and after an edit) will, except in rare cases, have different hash values. (If you are taking CSC263/B63, you will be studying hashes soon.)

So, instead of comparing every byte of two files, rsync can compute a small hash for each file and compare the hashes. For example, with a 160-bit hash value, the probability of failing to notice that two files differ is only about 1 in 2^160 (assuming the hash values behave like uniformly random 160-bit values). The actual file is transmitted only if the hashes do not match.


In this assignment you will implement a C program that computes a simple hash function over input data read from standard input (STDIN). The hash function we will use is based on the boolean exclusive or (xor) operation.

Except for Part 1, which explicitly asks you to check for specific types of invalid input, you do not need to perform error checking. In particular, you may assume the arguments provided to your functions are correct.

Part 1: Command-line Arguments

We are providing two files as your starter code: compute_hash.c and hash_functions.c.

The file compute_hash.c contains the main function for your program. At the moment, the function does nothing. Your first task is to update it to handle two command line arguments. The first argument is required and should be an integer representing the “block size”: the size of the computed hash, in bytes. The second argument is optional and is a string of hexadecimal digits representing a hash value.

Add code to check the number of arguments and the value provided as the first argument. If the user provides too few or too many arguments, the following usage message should be printed:

  Usage: compute_hash BLOCK_SIZE [ COMPARISON_HASH ]

If the user provides an illegal block size, a different message should be printed:

  The block size should be a positive integer less than or equal to MAX_BLOCK_SIZE.

The value of the macro MAX_BLOCK_SIZE should replace the string MAX_BLOCK_SIZE in the message above. A block size is illegal if it is negative, 0, or larger than the defined MAX_BLOCK_SIZE.

In either case, if an error message is printed, the program should terminate without computing or checking a hash.
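A minimal sketch of the Part 1 checks follows, assuming the starter code defines MAX_BLOCK_SIZE (the value 1024 below is only a placeholder, not the real one). The names valid_arg_count and parse_block_size are hypothetical. Using strtol with an end pointer, rather than atoi, lets the code reject arguments such as "12x" instead of silently treating them as 12.

```c
#include <errno.h>
#include <stdlib.h>

/* Placeholder value for illustration only: the starter code defines
   its own MAX_BLOCK_SIZE, which takes precedence if already defined. */
#ifndef MAX_BLOCK_SIZE
#define MAX_BLOCK_SIZE 1024
#endif

/* argc counts the program name, so one required argument plus one
   optional argument means argc must be 2 or 3. */
int valid_arg_count(int argc) {
    return argc == 2 || argc == 3;
}

/* Returns the block size parsed from arg, or -1 if arg is not a whole
   integer in the range 1..MAX_BLOCK_SIZE. */
long parse_block_size(const char *arg) {
    char *end;
    errno = 0;
    long value = strtol(arg, &end, 10);
    /* Reject empty input, trailing junk such as "12x", and overflow. */
    if (end == arg || *end != '\0' || errno == ERANGE) {
        return -1;
    }
    /* Reject negative numbers, 0, and values above the maximum. */
    if (value <= 0 || value > MAX_BLOCK_SIZE) {
        return -1;
    }
    return value;
}
```

In main, one might call valid_arg_count(argc) first and print the usage message if it fails, then call parse_block_size(argv[1]) and print the block-size message (substituting MAX_BLOCK_SIZE with printf's %d, if it is an integer constant) when the result is -1, returning from main in either case.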

This is a good place to pause. Make sure that your code handles command line arguments appropriately and generates no output if the arguments are correct. Once you have things working, commit a version as a backup and then continue.

Part 2: Computing a Hash

Next, write the function to compute the hash of data provided via STDIN. The function to implement is hash, and it is in the file hash_functions.c.

The hash you are to implement is based on xor. First, initialize all block_size bytes of the hash_val to ‘\0’ (the value 0). Then, read the input one character at a time. The first byte read should be xor’d with the first byte in hash_val. The second byte read should be xor’d with the second byte. This repeats until you have read block_size bytes. At this point, the computation wraps around, and the next byte is xor’d with the first byte in the hash_val again. This process repeats until all of the bytes in the input are read (i.e., until EOF is reached).
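The loop described above might look like the following sketch. To make it testable without typing at the keyboard, this hypothetical hash_stream takes a FILE * instead of reading STDIN directly; fscanf(in, "%c", ...) behaves like the scanf call suggested below when in is stdin. The actual hash function in hash_functions.c may have a different signature.

```c
#include <stdio.h>

/* Hypothetical variant of hash(): reads from any stream instead of
   only STDIN. Assumes block_size > 0 and hash_val has block_size bytes. */
void hash_stream(char *hash_val, long block_size, FILE *in) {
    /* Initialize every byte of the hash to '\0'. */
    for (long i = 0; i < block_size; i++) {
        hash_val[i] = '\0';
    }

    char c;
    long pos = 0;
    /* "%c" reads exactly one character, including spaces and newlines;
       fscanf returns 1 on success and EOF once the input is exhausted. */
    while (fscanf(in, "%c", &c) == 1) {
        hash_val[pos] ^= c;            /* xor this byte into the hash */
        pos = (pos + 1) % block_size;  /* wrap around after block_size bytes */
    }
}
```

Calling hash_stream(hash_val, block_size, stdin) gives the STDIN-based behaviour the assignment asks for.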

We have provided a helper function, show_hash, in compute_hash.c, which you will find useful. It simply prints the hexadecimal values of a hash value to standard output. Note that two hexadecimal digits are printed for each char element of the hash value, since it takes two hexadecimal digits to represent 8 bits. (We could have opted to print the hash as characters rather than hex digits, but some bytes of the hash will not be printable ASCII values, which would make it hard to see whether the function is operating correctly.)
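To see why each byte becomes exactly two hex digits, here is a hypothetical helper (not the provided show_hash, whose implementation may differ) that formats a hash into a string buffer instead of printing it. The cast to unsigned char matters: on platforms where char is signed, a byte such as 0xff would otherwise be promoted to a negative int and printed as ffffffff.

```c
#include <stdio.h>

/* Hypothetical helper: writes the hash as 2 * block_size lowercase hex
   digits into out, followed by '\0'. out must have room for
   2 * block_size + 1 chars. */
void hash_to_hex(const char *hash_val, long block_size, char *out) {
    for (long i = 0; i < block_size; i++) {
        /* Cast to unsigned char so bytes >= 0x80 print as two digits
           (e.g. "ff") instead of a sign-extended value like "ffffffff". */
        sprintf(out + 2 * i, "%02x", (unsigned char) hash_val[i]);
    }
    out[2 * block_size] = '\0';
}
```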

Here are a few things to remember as you implement and test this function:

  1. The xor operator is ^.
  2. The hash is being computed on STDIN, so you should use scanf to read input.
  3. You can provide input either by typing at the command line or using a redirection operator (<). If you provide input by typing at the keyboard, you terminate the input (provide EOF) by typing ctrl-D.
  4. Test with small block sizes. If you have a block size of N characters, then an input of K characters, where K is less than N, should result in a hash whose first K bytes are the same as the input and whose remaining N-K bytes have the value 0. If you provide an input of length 2 * N where the first N characters match the second N characters, then the hash value should be N ‘\0’ characters. (Recall: 1 xor 1 is 0, 0 xor 0 is 0, and 1 xor 0 and 0 xor 1 are both 1.)
  5. Start with simple input and make sure that you understand what output to expect. For example, the character ‘\0’ corresponds to a byte with all bits equal to zero.

Call the function you have written from the main function. The result of the hash should be placed in the variable hash_val. You should print the hash by printing each character of the array as a hexadecimal value. For printing, the show_hash helper function will be useful. Do not print any other output.

Test this function carefully before moving on, and, as before, commit a version once you are done.

Part 3: Comparing Hashes

Part 3 deals with the case where two command-line arguments are provided. In this case, the program should compute a hash value for the input it reads from STDIN (just as in Part 2, when there is only one argument) and then compare the resulting hash value to the hash value provided as the second argument. The hash value provided as the second argument will be a string of hex digits.

You will need to implement check_hash, the function that compares two hashes. This function should take two hash values and return the first index at which the two hashes do not match. If they match completely, the function should return block_size.

You are given a helper function in compute_hash.c, called xstr_to_hash, which you might find useful. It converts a string of hexadecimal digits into a hash value array. You are guaranteed that the length of the string is two times the block size of the hash value. This is necessary because it takes two hexadecimal digits to represent 8 bits or one char.
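The conversion that xstr_to_hash performs can be sketched as follows; hex_to_hash and hex_value are hypothetical names, and the provided helper may differ in signature and details. Each pair of hex digits (high nibble first) becomes one byte, which is why the string must be twice as long as the block size. In keeping with the no-error-checking rule, the sketch assumes every character is a valid hex digit.

```c
/* Value of a single hex digit; assumes c is one of 0-9, a-f, A-F
   (returns -1 otherwise, which this sketch does not check). */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical stand-in for xstr_to_hash: converts 2 * block_size hex
   digits into block_size bytes, high nibble first. */
void hex_to_hash(char *hash_val, const char *hex, long block_size) {
    for (long i = 0; i < block_size; i++) {
        int high = hex_value(hex[2 * i]);
        int low = hex_value(hex[2 * i + 1]);
        hash_val[i] = (char) (high * 16 + low);
    }
}
```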

  1. You may be tempted to use string functions here. That is a bad idea. While we are storing the data in arrays of characters, we are not really dealing with strings. (Why not?)
  2. Due to the way that command line arguments are processed, you may have difficulty testing if you provide a hash value (on the command line) that is smaller than the block size. We will not test with hash values that are shorter than the block size.
  3. This function can be tested without STDIN or command line arguments: simply provide two different character arrays and check if they match.
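A sketch of the comparison, assuming both arrays hold block_size bytes (check_hash_sketch and its parameter order are hypothetical; use the signature given in hash_functions.c). It compares byte by byte with an explicit length rather than relying on a terminating ‘\0’, since a hash byte can legitimately be 0. As the third point above suggests, it can be exercised with plain arrays.

```c
/* Hypothetical sketch of check_hash: returns the first index at which
   the two hashes differ, or block_size if all block_size bytes match.
   Compares raw bytes, so a '\0' byte does not end the comparison. */
long check_hash_sketch(const char *hash1, const char *hash2, long block_size) {
    for (long i = 0; i < block_size; i++) {
        if (hash1[i] != hash2[i]) {
            return i;
        }
    }
    return block_size;
}
```

For instance, for the arrays {'x', '\0', 'y', 'z'} and {'x', '\0', 'k', 'z'} it returns 2, whereas strcmp would stop at index 1 and report the two as equal.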

Call this function from main if and only if a second command-line argument is provided. When a second command-line argument is provided, the only output your program should produce is the return value of the check_hash function.

Test thoroughly before submitting a final version.

Submission and Marking

We are using automated grading tools to provide functional feedback, so it’s important that your submission be fully committed and compile cleanly.

Your program must compile on the lab machines or mathlab, so please test it there before submission. We will compile your program using gcc with the flags -Wall and -std=c99:

    gcc -Wall -std=c99 -o compute_hash compute_hash.c hash_functions.c

Your program should not produce any error or warning messages when compiled. Programs that do not compile will receive a 0. Programs that produce warning messages will be penalized.

For this assignment you will be working individually (rather than in teams). You will be submitting via svn to Markus. The instructions below will set up your svn work environment. If you are not already familiar with version control, make sure to review the lecture material from Week 2.

Your first step should be to log into mathlab or any of the lab machines in BV 473 using your UtorID and password. Create a directory in your home directory called cscb09 (directories are created using mkdir). Change into that directory (using cd). Verify that you are actually in the right directory (running pwd should return /cmshome/your_utor_id/cscb09). The following commands should accomplish the above:

mkdir ~/cscb09
cd ~/cscb09

Next you need to check out your SVN repo. To find its URL, log in to Markus and click on A1. On the right side you will see a field labeled “URL to your group’s repository”. The URL will probably have the form […]. Your repository will only be generated by the Markus server once you log in and click on A1, and this might take a little while, so wait a bit before you try to check out your repo into the cscb09 directory you created earlier. You check out your repo from the shell command line using svn checkout with your repo’s URL as the argument, so this should look something like:

svn co

You will be asked for a password, which is your usual utorid password. You will find the repository in your current working directory (the directory from which you ran the svn checkout command). It will be a directory named after your utorid, and you will see that it contains a subdirectory called A1. (Use ls to check that the directory is there.) This directory is a working copy of your svn repository.

For this assignment you will need to submit two files: compute_hash.c and hash_functions.c. You will need to create these two files inside the A1 directory in your repository. After you first create these files in your A1 directory you need to add them to your repository by running svn add inside your A1 directory.

 svn add compute_hash.c hash_functions.c

As you work on these two files, you should periodically upload your modifications to the svn server by running svn commit. To commit, run the following in your A1 directory (with a message of your choice):

 svn ci -m "Committing a new version of A1"

You can log into Markus to view the files it has received.

When doing the marking, we will use the latest version of your A1 that was committed before the deadline.