Example: Buffer Overflows

To give a brief overview of the capabilities of CBMC we start with a simple example. The issue of buffer overflows has obtained wide public attention. A buffer is a contiguously allocated chunk of memory, represented by an array or a pointer in C. Programs written in C do not provide automatic bounds checking on the buffer, which means a program can – accidentally or deliberately – write beyond a buffer. The following example is a perfectly valid C program (in the sense that a compiler compiles it without any errors):

int main()
{
  int buffer[10];
  buffer[20] = 10;
}

However, the write access to an address outside the allocated memory region can lead to unexpected behavior. In particular, such bugs can be exploited to overwrite the return address of a function, thus enabling the execution of arbitrary user-induced code. CBMC is able to detect this problem and reports that the “upper bound property” of the buffer has been violated. CBMC is capable of checking these lower and upper bounds, even for arrays with dynamic size. A detailed discussion of the properties that CBMC can check automatically is here.

First Steps

We assume you have already installed CBMC and the necessary support files on your system. If not so, please follow the instructions here.

Like a compiler, CBMC takes the names of .c files as command line arguments. CBMC then translates the program and merges the function definitions from the various .c files, just like a linker. But instead of producing a binary for execution, CBMC performs symbolic simulation on the program.

As an example, consider the following simple program, named file1.c:

int puts(const char *s)
{
}
 
int main(int argc, char **argv)
{
  puts(argv[2]);
  return 0;
}

Of course, this program is faulty, as the argv array might have fewer than three elements, and then the array access argv[2] is out of bounds. Now, run CBMC:

cbmc file1.c --show-properties --bounds-check --pointer-check

The two options --bounds-check and --pointer-check instruct CBMC to look for errors related to pointers and array bounds. CBMC will print the list of properties it checks. Note that it lists, among others, a property labelled with “pointer outside object bounds in argv” together with the location of the faulty array access. As you can see, CBMC largely determines the property it needs to check itself. This is realized by means of a preliminary static analysis, which relies on computing a fixed point on various abstract domains. More detail on automatically generated properties is provided here.

Note that these automatically generated properties need not necessarily correspond to bugs – these are just potential flaws, as abstract interpretation might be imprecise. Whether these properties hold or correspond to actual bugs needs to be determined by further analysis.

CBMC performs this analysis using symbolic simulation, which corresponds to a translation of the program into a formula. The formula is then combined with the property. Let’s look at the formula that is generated by CBMC’s symbolic simulation:

cbmc file1.c --show-vcc --bounds-check --pointer-check

With this option, CBMC performs the symbolic simulation and prints the verification conditions on the screen. A verification condition needs to be proven to be valid by a decision procedure in order to assert that the corresponding property holds. Let’s run the decision procedure:

cbmc file1.c --bounds-check --pointer-check

CBMC transforms the equation you have seen before into CNF and passes it to a SAT solver (more background on this step is in the book on Decision Procedures). It then determines which of the properties that it has generated for the program hold and which do not. Using the SAT solver, CBMC detects that the property for the object bounds of argv does not hold, and will display:

[main.pointer_dereference.6] line 7 dereference failure: pointer outside object bounds in argv[(signed long int)2]: FAILURE

Counterexample Traces

Let us have a closer look at this property and why it fails. To aid the understanding of the problem, CBMC can generate a counterexample trace for failed properties. To obtain this trace, run:

cbmc file1.c --bounds-check --pointer-check --trace

CBMC then prints a counterexample trace, that is, a program trace that begins with main and ends in a state which violates the property. In our example, the program trace ends in the faulty array access. It also gives the values the input variables must have for the bug to occur. In this example, argc must be one to trigger the out-of-bounds array access. If you add a branch to the example that requires that argc>=3, the bug is fixed and CBMC will report that the proofs of all properties have been successful.

To simplify further processing of counterexample traces, CBMC supports XML as a possible output format.

    cbmc file1.c --bounds-check --pointer-check --trace --xml-ui

Verifying Modules

In the example above, we used a program that starts with a main function. However, CBMC is aimed at embedded software, and these kinds of programs usually have different entry points. Furthermore, CBMC is also useful for verifying program modules. Consider the following example, called file2.c:

int array[10];
 
int sum()
{
  unsigned i, sum;
 
  sum = 0;
  for(i = 0; i < 10; i++)
    sum += array[i];
 
  return sum;
}

To set the entry point to the sum function, run:

cbmc file2.c --function sum --bounds-check

It is often necessary to build a suitable harness for the function in order to set up the environment appropriately.

Loop Unwinding

When running the previous example, you will have noted that CBMC unwinds the for loop in the program. As CBMC performs Bounded Model Checking, all loops have to have a finite upper run-time bound in order to guarantee that all bugs are found. CBMC can optionally check that enough unwinding is performed. As an example, consider the program binsearch.c:

int binsearch(int x)
{
  int a[16];
  signed low = 0, high = 16;
 
  while(low < high)
  {
    signed middle = low + ((high - low) >> 1);
 
    if(a[middle] < x)
      high = middle;
    else if(a[middle] > x)
      low = middle + 1;
    else // a[middle]==x
      return middle;
  }
 
  return -1;
}

If you run CBMC on this function, you will notice that the unwinding does not stop on its own. The built-in simplifier is not able to determine a run time bound for this loop. The unwinding bound has to be given as a command line argument:

cbmc binsearch.c --function binsearch --unwind 6 --bounds-check --unwinding-assertions

CBMC verifies that the array accesses are within the bounds; note that this actually depends on the result of the right shift. In addition, as CBMC is given the option --unwinding-assertions, it also checks that enough unwinding is done, i.e., it proves a run-time bound. For any lower unwinding bound, there are traces that require more loop iterations. Thus, CBMC will report that the unwinding assertion has failed. As usual, a counterexample trace that documents this can be obtained with the option --property.

Unbounded Loops

CBMC can also be used for programs with unbounded loops. In this case, CBMC is used for bug hunting only; CBMC does not attempt to find all bugs. The following program (lock-example.c) is an example of a program with a user-specified property:

_Bool nondet_bool();
unsigned int nondet_unsigned_int();
_Bool LOCK = 0;
 
_Bool lock()
{
  if(nondet_bool())
  {
    assert(!LOCK);
    LOCK = 1;
    return 1;
  }
 
  return 0;
}
 
void unlock()
{
  assert(LOCK);
  LOCK = 0;
}
 
int main()
{
  unsigned got_lock = 0;
  unsigned times = nondet_unsigned_int();
 
  while(times > 0)
  {
    if(lock())
    {
      got_lock++;
      /* critical section */
    }
 
    if(got_lock != 0)
      unlock();
 
    got_lock--;
    times--;
  }
}

The while loop in the main function has no (useful) run-time bound. Thus, a bound has to be set on the amount of unwinding that CBMC performs. There are two ways to do so:

The --unwind command-line parameter can to be used to limit the number of times loops are unwound.
The --depth command-line parameter can be used to limit the number of program steps to be processed.

Given the option --unwinding-assertions, CBMC checks whether the argument to --unwind is large enough to cover all program paths. If the argument is too small, CBMC will detect that not enough unwinding is done reports that an unwinding assertion has failed.

Reconsider the example. For a loop unwinding bound of one, no bug is found. But for a bound of two, CBMC detects a trace that violates an assertion. Without unwinding assertions, or when using the --depth command line switch, CBMC does not prove the program correct, but it can be helpful to find program bugs. The various command line options that CBMC offers for loop unwinding are described in the section on understanding loop unwinding.

A Note About Compilers and the ANSI-C Library

Most C programs make use of functions provided by a library; instances are functions from the standard ANSI-C library such as malloc or printf. The verification of programs that use such functions has two requirements:

Appropriate header files have to be provided. These header files contain declarations of the functions that are to be used.
Appropriate definitions have to be provided.

Most C compilers come with header files for the ANSI C library functions. We briefly discuss how to obtain/install these library files.

Linux

Linux systems that are able to compile software are usually equipped with the appropriate header files. Consult the documentation of your distribution on how to install the compiler and the header files. First try to compile some significant program before attempting to verify it.

Windows

On Microsoft Windows, CBMC is pre-configured to use the compiler that is part of Microsoft’s Visual Studio. Microsoft’s Visual Studio Community is fully featured and available for download for free from the Microsoft webpage. Visual Studio installs the usual set of header files together with the compiler. However, the Visual Studio compiler requires a large set of environment variables to function correctly. It is therefore required to run CBMC from the Visual Studio Command Prompt, which can be found in the menu Visual Studio Tools.

Note that in both cases, only header files are available. CBMC only comes with a small set of definitions, which includes functions such as malloc. Detailed information about the built-in definitions is here.