Goto Harness

This is a tool for generating harnesses, that is, functions that instrument another function under analysis, by setting up an appropriate environment before calling them.

This is useful when trying to analyse an isolated unit of a program without having to manually construct an appropriate environment.

Quick Start Guide

For a given C program - program.c - to generate a harness for a function test_function, we have to do the following:

# Compile the program
$ goto-cc program.c -o program.gb
# Run goto-harness to produce harness file
$ goto-harness --harness-type call-function --harness-function-name generated_harness_test_function --function test_function program.gb harness.c
# Run the checks targetting the generated harness
$ cbmc --pointer-check harness.c program.c --function generated_harness_test_function

Detailed Usage

We have two types of harnesses we can generate for now:

The function-call harness, which automatically synthesises an environment without having any information about the program state.
The memory-snapshot harness, which loads an existing program state (in form of a JSON file) and selectively havoc-ing some variables.

The harness generator can either produce the harness (i.e., the function environment) as C code, or a full GOTO binary. C code is generated when the output file name ends in “.c”.

The function call harness generator

For the most basic use of the function-call harness generator, imagine that we have the following C program:

// main.c
 
#include <assert.h>
 
int function_to_test(int some_argument)
{
  assert(some_argument == 0);
  return some_argument;
}

We first need to generate a GOTO binary.

$ goto-cc -o main.gb main.c

Then we can call goto-harness on the generated GOTO binary to get a new C file that contains the harness function:

$ goto-harness \
--harness-function-name harness \
--harness-type call-function \
--function function_to_test \
main.gb \
harness.c

The options we pass to goto-harness are:

harness-function-name is the name of the function generated by the harness generator (this needs to be specified for all harness generators).
harness-type is the type of harness to generate (call-function is the function-call harness generator)
function is the name of the function that gets instrumented
then we pass the input GOTO-binary and a name for the output C file.

What comes out of this is a C file that looks like this:

// function_to_test
// file main.c line 3
signed int function_to_test(signed int some_argument);
// harness
// 
void harness(void);
// type_constructor_int
// 
void type_constructor_int(signed int depth_int, signed int *result_int);
 
// __GOTO_HARNESS::max_depth
// file __GOTO_HARNESSharness.c
signed int max_depth=2;
// __GOTO_HARNESS::min_depth
// file __GOTO_HARNESSharness.c
signed int min_depth=1;
 
// harness
// 
void harness(void)
{
  signed int some_argument;
  type_constructor_int(0, &some_argument);
  function_to_test(some_argument);
}
 
// type_constructor_int
// 
void type_constructor_int(signed int depth_int, signed int *result_int)
{
  signed int nondet;
  *result_int = nondet;
}

After you have generated the harness file, you have two choices. The first one is to pass it along with the original file containing the function to be harnessed to CBMC for analysis, something that would look like this:

$ cbmc --function harness harness.c main.c
[...]
 
[function_to_test.assertion.1] line 5 assertion some_argument == 0: FAILURE
 
** 1 of 1 failed (2 iterations)
VERIFICATION FAILED

Or, you could choose to turn it into another goto binary with goto-cc, to be served as input to another instrumentation tool (like goto-instrument).

The example above demonstrates the simplest case, which is roughly the same as the entry point cbmc automatically generates for functions. However, the function-call harness can also non-deterministically initialise global variables, array and struct elements. Consider this more complicated example:

// list_example.c
#include <assert.h>
#include <stdlib.h>
 
typedef struct linked_list linked_list;
struct linked_list {
    int value;
    linked_list *next;
};
 
linked_list *global_linked_list;
 
void initialize_global(void)
{
  for(linked_list *ll = global_linked_list;
    ll != NULL;
    ll = ll->next)
  {
    ll->value = 0;
  }
}
 
void initialize_other(linked_list *ll)
{
    do {
        ll->value = 0;
        ll = ll->next;
    } while(ll->next != NULL);
}
 
void check_list(linked_list *list_parameter)
{
    assert(list_parameter != global_linked_list);
    initialize_global();
    initialize_other(list_parameter);
    linked_list *global_cursor = global_linked_list;
    linked_list *parameter_cursor = list_parameter;
 
    // global list should be a prefix of the parameter now,
    // or the other way round
    while(global_cursor != NULL && parameter_cursor != NULL)
    {
        // this assertion should fail since we didn't
        // initialize the last element of of the
        // list parameter correctly
        assert(global_cursor->value
          == parameter_cursor->value);
        global_cursor = global_cursor->next;
        parameter_cursor = parameter_cursor->next;
    }
}

Now we’ll try to find the bug in check_list by generating a harness for it.

$ goto-cc -o list_example.gb list_example.c
$ goto-harness \
   --harness-function-name harness \
   --harness-type call-function \
   --function check_list \
   --max-nondet-tree-depth 4 \
   --min-null-tree-depth 1 \
   --nondet-globals \
   list_example.gb \
   list_example_harness.c
$ cbmc list_example.c list_example_harness.c --function harness --unwind 20 --unwinding-assertions

We have 3 new options here:

max-nondet-tree-depth is the maximum extent to which we will generate and initialize non-null pointers - in this case, this means generating lists up to length 4
min-null-tree-depth this is the minimum depth at which pointers can be nullpointers for generated values - in this case, this sets the minimum length for our linked lists to one.
nondet-globals is non-deterministically initialising global variables.

CBMC version 5.11 (cbmc-5.11-1523-g419a958-dirty) 64-bit x86_64 linux
[...]

** Results:
example.c function initialize_global
[initialize_global.unwind.0] line 17 unwinding assertion loop 0: SUCCESS

example.c function initialize_other
[initialize_other.unwind.0] line 32 unwinding assertion loop 0: SUCCESS

example.c function check_list
[check_list.assertion.1] line 42 assertion list_parameter != global_linked_list: SUCCESS
[check_list.unwind.0] line 50 unwinding assertion loop 0: SUCCESS
[check_list.assertion.2] line 55 assertion global_cursor->value == parameter_cursor->value: FAILURE

** 1 of 5 failed (2 iterations)
VERIFICATION FAILED

We also have support for arrays (currently only for array function parameters, globals and struct members are considered for the future).

Take this example of an implementation of an is_prefix_of function that should return true if the first string parameter prefix is a prefix of the second one string.

// array_example.c
 
int is_prefix_of(
  const char *prefix,
  int prefix_length,
  const char *string,
  int string_length
)
{
  if(prefix_length > string_length) { return 0; }
  // oops, we should have used prefix_length here
  for(int i = 0; i < string_length; ++i)
  {
    // we'll get an out of bounds error here!
    if(prefix[i] != string[i]) { return 0; }
  }
  return 1;
}

We can compile and run it like this:

$ goto-cc -o array_example.gb array_example.c
$ goto-harness \
  --harness-function-name harness \
  --harness-type call-function \
  --function is_prefix_of \
  --associated-array-size string:string_length \
  --associated-array-size prefix:prefix_length \
  array_example.gb array_example_harness.c
$ cbmc --function harness --unwind 10 --pointer-check array_example_harness.c array_example.c

We have the additional option associated-array-size here. This is in the format <array-parameter-name>:<array-size-parameter-name> and will cause the array parameter with namearray-parameter-name to be initialised as an array, with the parameter array-size-parameter-name holding its size (it should have some integral type like int or size_t).

Running the example shows the bug highlighted in the comments:

[...]
[is_prefix_of.pointer_dereference.6] line 14 dereference failure: pointer outside object bounds in prefix[(signed long int)i]: FAILURE

By default, arrays are created with a minimum size of 1 and a maximum size of 2. These limits can be set with the --min-array-size and --max-array-size options.

If you have a function that takes an array parameter, but without an associated size parameter, you can also use the --treat-pointer-as-array <parameter-name> option.

If you want to non-deterministically initialise a pointer as a C string (character array with last element ‘’\0'`) then you can do that like this:

// nondet_string.c
 
#include <assert.h>
 
void function(char *pointer, int size)
{
  assert(pointer[size-1] == '\0');
}

Then call the following:

$ goto-cc -o nondet_string.gb nondet_string.c
$ goto-harness \
  --harness-function-name harness \
  --harness-type call-function \
  --function function \
  --treat-pointer-as-cstring pointer \
  --associated-array-size pointer:size \
  nondet_string.gb nondet_string_harness.c
$ cbmc --function harness nondet_string_harness.c nondet_string.c --unwind 10

Note that C strings are supported by the same mechanism behind the non-deterministic initialisation of pointers and arrays, so the same command line arguments apply, in particular --associated-array-size.

This will result in:

[...]

** Results:
main.c function function
[function.assertion.1] line 5 assertion pointer[size-1] == '\0': SUCCESS

** 0 of 1 failed (1 iterations)
VERIFICATION SUCCESSFUL

The memory snapshot harness

The function-call harness is used in situations in which we want to analyze a function in an arbitrary environment. If we want to analyze a function starting from a real program state, we can use the memory-snapshot harness instead.

The snapshot of the program state of interest may be taken at a particular program location within a function (using the memory-analyzer tool). In that case we want to generate a harness that behaves as if execution starts at a particular program location. The initial program location can be specified via the options --initial-goto-location <function>[:<location-number>] or --initial-source-location <file>:<line-number>.

Say we want to check the assertion in the following code:

// main.c
#include <assert.h>
#include <stdlib.h>
 
int x;
int y;
int z;
 
// complex function which returns 1
int get_one()
{
  int i;
 
  for(i = 0; i < 100001; i++)
  {
    if(rand() && ((i & 1) == 1))
      break;
  }
 
  return i & 1;
}
 
// return a random value (!= 0)
int get_random_value()
{
  int r;
  while((r = rand()) == 0) {}
  return r;
}
 
int clip(int i)
{
  if(i > 99)
  {
    i = 99;
  }
 
  return i;
}
 
int main()
{
  x = get_random_value();
  y = get_one();
 
  // snapshot taken here (line 46)
 
  z = clip(x);
 
  assert(y + z <= 100);
 
  return 0;
}

Assume we are interested in the code represented by the clip() function and its effect on the assertion below. To that end, we want to take a memory snapshot after the calls to get_random_value() and get_one().

In order to take the snapshot with memory-analyzer, we need to first compile the program which goto-gcc, which produces a binary containing both native machine code and the corresponding goto program:

$ goto-gcc -g -o main.gb main.c

Then we can execute the program with memory-analyzer and take a snapshot at the specified breakpoint. The variables to be included in the snapshot need to be specified via the --symbols option.

$ memory-analyzer \
  --breakpoint 46 \
  --symbols 'x, y, z' \
  --symtab-snapshot \
  --json-ui \
  main.gb \
  > snapshot.json

We then generate a harness with goto-harness that behaves as if execution started from the state given by the memory snapshot at the specified program location. We further overapproximate the value returned by get_random_value() by havocking the variable x.

$ goto-cc -o main.gb main.c
 
$ goto-harness \
  --harness-function-name harness \
  --harness-type initialize-with-memory-snapshot \
  --memory-snapshot snapshot.json \
  --initial-source-location main.c:46 \
  --havoc-variables x \
  main.gb main-mod.gb

We can now verify the resulting goto program with cbmc:

$ cbmc --function harness main-mod.gb

This will result in:

[...]

** Results:
main.c function main
[main.assertion.1] line 50 assertion y + z <= 100: SUCCESS

** 0 of 1 failed (1 iterations)
VERIFICATION SUCCESSFUL

Last modified: 2024-12-16 13:58:47 -0800