NAME

goto-instrument - Perform analysis or instrumentation of goto binaries

SYNOPSIS

goto-instrument [-?] [-h] [--help]

show help

goto-instrument --version

show version and exit

goto-instrument [options] in [out]

perform analysis or instrumentation

DESCRIPTION

goto-instrument reads a GOTO binary, performs a given program transformation, and then writes the resulting program as GOTO binary on disk.

OPTIONS

Dump Source:

--dump-c

generate C source

--dump-c-type-header m

generate a C header for types local in m

--dump-cpp

generate C++ source

--no-system-headers

generate C source expanding libc includes

--use-all-headers

generate C source with all includes

--harness

include input generator in output

--horn

print program as constrained horn clauses

Diagnosis:

--show-properties

show the properties, but don't run analysis

--document-properties-html

generate HTML property documentation

--document-properties-latex

generate Latex property documentation

--show-symbol-table

show loaded symbol table

--list-symbols

list symbols with type information

--show-goto-functions

show loaded goto program

--list-goto-functions

list loaded goto functions

--count-eloc

count effective lines of code

--list-eloc

list full path names of lines containing code

--print-global-state-size

count the total number of bits of global objects

--print-path-lengths

print statistics about control-flow graph paths

--show-locations

show all source locations

--dot

generate CFG graph in DOT format

--print-internal-representation

show verbose internal representation of the program

--list-undefined-functions

list functions without body

--list-calls-args

list all function calls with their arguments

--call-graph

show graph of function calls

--reachable-call-graph

show graph of function calls potentially reachable from main function

--show-class-hierarchy

show the class hierarchy

--validate-goto-model

enable additional well-formedness checks on the goto program

--validate-ssa-equation

enable additional well-formedness checks on the SSA representation

--validate-goto-binary

check the well-formedness of the passed in GOTO binary and then exit

--interpreter

do concrete execution

Data-flow analyses:

--show-struct-alignment

show struct members that might be concurrently accessed

--show-threaded

show instructions that may be executed by more than one thread

--show-local-safe-pointers

show pointer expressions that are trivially dominated by a not-null check

--show-safe-dereferences

show pointer expressions that are trivially dominated by a not-null check *and* used as a dereference operand

--show-value-sets

show points-to information (using value sets)

--show-global-may-alias

show may-alias information over globals

--show-local-bitvector-analysis

show procedure-local pointer analysis

--escape-analysis

perform escape analysis

--show-escape-analysis

show results of escape analysis

--custom-bitvector-analysis

perform configurable bitvector analysis

--show-custom-bitvector-analysis

show results of configurable bitvector analysis

--interval-analysis

perform interval analysis

--show-intervals

show results of interval analysis

--show-uninitialized

show maybe-uninitialized variables

--show-points-to

show points-to information

--show-rw-set

show read-write sets

--show-call-sequences

show function call sequences

--show-reaching-definitions

show reaching definitions

--show-dependence-graph

show program-dependence graph

--show-sese-regions

show single-entry-single-exit regions

Safety checks:

--no-assertions

ignore user assertions

--bounds-check

enable array bounds checks

--pointer-check

enable pointer checks

--memory-leak-check

enable memory leak checks

--memory-cleanup-check

Enable memory cleanup checks: assert that all dynamically allocated memory is explicitly freed before terminating the program.

--div-by-zero-check

enable division by zero checks for integer division

--float-div-by-zero-check

enable division by zero checks for floating-point division

--signed-overflow-check

enable signed arithmetic over- and underflow checks

--unsigned-overflow-check

enable arithmetic over- and underflow checks

--pointer-overflow-check

enable pointer arithmetic over- and underflow checks

--conversion-check

check whether values can be represented after type cast

--undefined-shift-check

check shift greater than bit-width

--float-overflow-check

check floating-point for +/-Inf

--nan-check

check floating-point for NaN

--enum-range-check

checks that all enum type expressions have values in the enum range

--pointer-primitive-check

checks that all pointers in pointer primitives are valid or null

--retain-trivial-checks

include checks that are trivially true

--error-label label

check that label is unreachable

--no-built-in-assertions

ignore assertions in built-in library

--no-assertions

ignore user assertions

--no-assumptions

ignore user assumptions

--assert-to-assume

convert user assertions to assumptions

--uninitialized-check

add checks for uninitialized locals (experimental)

--stack-depth n

add check that call stack size of non-inlined functions never exceeds n

--race-check

add floating-point data race checks

Semantic transformations:

--nondet-volatile
--nondet-volatile-variable variable

By default, cbmc(1) treats volatile variables the same as non-volatile variables. That is, it assumes that a volatile variable does not change between subsequent reads, unless it was written to by the program. With the above options, goto-instrument can be instructed to instrument the given goto program such as to (1) make reads from all volatile expressions non-deterministic (--nondet-volatile), (2) make reads from specific variables non-deterministic (--nondet-volatile-variable), or (3) model reads from specific variables by given models (--nondet-volatile-model).

Below we give two usage examples for the options. Consider the following test, for function get_celsius and with harness test_get_celsius:

#include <assert.h>
#include <limits.h>
#include <stdint.h>

// hardware sensor for temperature in kelvin
extern volatile uint16_t temperature;

int get_celsius() {
  if (temperature > (1000 + 273)) {
    return INT_MIN; // value indicating error
  }
  return temperature - 273;
}

void test_get_celsius() {
  int t = get_celsius();
  assert(t == INT_MIN || t <= 1000);
  assert(t == INT_MIN || t >= -273);
}

Here the variable temperature corresponds to a hardware sensor. It returns the current temperature on each read. The get_celsius function converts the value in Kelvin to degrees Celsius, given the value is in the expected range. However, it has a bug where it reads temperature a second time after the check, which may yield a value for which the check would not succeed. Verifying this program as is with cbmc(1) would yield a verification success. We can use goto-instrument to make reads from temperature non-deterministic:

goto-cc -o get_celsius_test.gb get_celsius_test.c
goto-instrument --nondet-volatile-variable temperature \
  get_celsius_test.gb get_celsius_test-mod.gb
cbmc --function test_get_celsius get_celsius_test-mod.gb

Here the final invocation of cbmc(1) correctly reports a verification failure.

--nondet-volatile-model variable:model

Simply treating volatile variables as non-deterministic may for some use cases be too inaccurate. Consider the following test, for function get_message and with harness test_get_message:

#include <assert.h>
#include <stdint.h>

extern volatile uint32_t clock;

typedef struct message {
  uint32_t timestamp;
  void *data;
} message_t;

void *read_data();

message_t get_message() {
  message_t msg;
  msg.timestamp = clock;
  msg.data = read_data();
  return msg;
}

void test_get_message() {
  message_t msg1 = get_message();
  message_t msg2 = get_message();
  assert(msg1.timestamp <= msg2.timestamp);
}

The harness verifies that get_message assigns non-decreasing time stamps to the returned messages. However, simply treating clock as non-deterministic would not suffice to prove this property. Thus, we can supply a model for reads from clock:

// model for reads of the variable clock
uint32_t clock_read_model() {
  static uint32_t clock_value = 0;
  uint32_t increment;
  __CPROVER_assume(increment <= 100);
  clock_value += increment;
  return clock_value;
}

The model is stateful in that it keeps the current clock value between invocations in the variable clock_value. On each invocation, it increments the clock by a non-deterministic value in the range 0 to 100. We can tell goto-instrument to use the model clock_read_model for reads from the variable clock as follows:

goto-cc -o get_message_test.gb get_message_test.c
goto-instrument --nondet-volatile-model clock:clock_read_model \
  get_message_test.gb get_message_test-mod.gb
cbmc --function get_message_test get_message_test-mod.gb

Now the final invocation of cbmc(1) reports verification success.

--isr function

instruments an interrupt service routine

--mmio

instruments memory-mapped I/O

--nondet-static

add nondeterministic initialization of variables with static lifetime

--nondet-static-exclude e

same as nondet-static except for the variable e (use multiple times if required)

--nondet-static-matching r

add nondeterministic initialization of variables with static lifetime matching regex r

--function-enter f
--function-exit f
--branch f

instruments a call to f at the beginning, the exit, or a branch point, respectively

--splice-call caller,callee

prepends a call to callee in the body of caller

--check-call-sequence seq

instruments checks to assert that all call sequences match seq

--undefined-function-is-assume-false

convert each call to an undefined function to assume(false)

--insert-final-assert-false function

generate assert(false) at end of function

--generate-function-body regex

This transformation inserts implementations of functions without definition, i.e., a body. The behavior of the generated function is chosen via --generate-function-body-options option:

--generate-function-body-options option

One of assert-false, assume-false, nondet-return, assert-false-assume-false and havoc[,params:regex][,globals:regex][,params:p_n1;p_n2;..] (default: nondet-return)

assert-false: The body consists of a single command: assert(0).

assume-false: The body consists of a single command: assume(0).

assert-false-assume-false: Two commands as above.

nondet-return: The generated function returns a non-deterministic value of its return type.

havoc[,params:p-regex][,globals:g-regex]: Assign non-deterministic values to the targets of pointer-to-non-constant parameters matching the regular expression p-regex, and non-constant globals matching g-regex, and then (in case of non-void function) returning as with nondet-return. The following example demonstrates the use:

// main.c
int global;
const int c_global;
int function(int *param, const int *c_param);

Often we want to avoid overwriting internal symbols, i.e., those with an __ prefix, which is done using the pattern (?!__).

goto-cc main.c -o main.gb
goto-instrument main.gb main-out.gb \
  --generate-function-body-options 'havoc,params:(?!__).*,globals:(?!__).*' \
  --generate-funtion-body function

This leads to a GOTO binary equivalent to the following C code:

// main-mod.c
int function(int *param, const int *c_param) {
  *param = nondet_int();
  global = nondet_int();
  return nondet_int();
}

The parameters should that should be non-deterministically updated can be specified either by a regular expression (as above) or by a semicolon-separated list of their numbers. For example havoc,params:0;3;4 will assign non-deterministic values to the first, fourth, and fifth parameter.

Note that only parameters of pointer type can be havoced and goto-instrument will produce an error report if given a parameter number associated with a non-pointer parameter. Requesting to havoc a parameter with a number higher than the number of parameters a given function takes will also results in an error report.

--generate-havocing-body option fun_name,params:p_n1;p_n2;..
--generate-havocing-body option fun_name[,call-site-id,params:p_n1;p_n2;..>]+

Request a different implementation for a number of call-sites of a single function. The option --generate-havocing-body inserts new functions for selected call-sites and replaces the calls to the origin function with calls to the respective new functions.

// main.c
int function(int *first, int *second, int *third);

int main() {
  int a = 10;
  int b = 10;
  int c = 10;
  function(&a, &b, &c);
  function(&a, &b, &c);
}

The user can specify different behavior for each call-site as follows:

goto-cc main.c -o main.gb
goto-instrument main.gb  main-mod.gb \
  --generate-havocing-body 'function,1,params:0;2,2,params:1'

This results in a GOTO binary equivalent to:

// main-mod.c
int function_1(int *first, int *second, int *third) {
  *first = nondet_int();
  *third = nondet_int();
}

int function_2(int *first, int *second, int *third) { *second = nondet_int(); }

int main() {
  int a = 10;
  int b = 10;
  int c = 10;
  function_1(&a, &b, &c);
  function_2(&a, &b, &c);
}
--restrict-function-pointer

pointer_name/target[,targets]* Replace function pointers by a user-defined set of targets. This may be required when --remove-function-pointers creates to large a set of direct calls. Consider the example presented for --remove-function-pointers. Assume that call will always receive pointers to either f or g during actual executions of the program, and symbolic execution for h is too expensive to simply ignore the cost of its branch.

To facilitate the controlled replace, we will label the places in each function where function pointers are being called, to this pattern:

function-name.function_pointer_call.N

where N is refers to the N-th function call via a function pointer in function-name, i.e., the first call to a function pointer in a function will have N=1, the fifth N=5 etc. Alternatively, if the calls carry labels in the source code, we can also refer to a function pointer as

function-name.label

To implement this assumption that the first call to a function pointer in function call an only be a call to f or g, use

goto-instrument --restrict-function-pointer \
  call.function_pointer_call.1/f,g in.gb out.gb

The resulting output (written to GOTO binary out.gb) looks similar to the original example, except now there will not be a call to h:

void call(fptr_t fptr) {
  int r;
  if (fptr == &f) {
    r = f(10);
  } else if (fptr == &g) {
    r = g(10);
  } else {
    // sanity check
    assert(false);
    assume(false);
  }
  return r;
}

As another example imagine we have a simple virtual filesystem API and implementation like this:

typedef struct filesystem_t filesystem_t;
struct filesystem_t {
  int (*open)(filesystem_t *filesystem, const char *file_name);
};

int fs_open(filesystem_t *filesystem, const char *file_name) {
  filesystem->open(filesystem, file_name);
}

int nullfs_open(filesystem_t *filesystem, const char *file_name) { return -1; }

filesystem_t nullfs_val = {.open = nullfs_open};
filesystem *const nullfs = &nullfs_val;

filesystem_t *get_fs_impl() {
  // some fancy logic to determine
  // which filesystem we're getting -
  // in-memory, backed by a database, OS file system
  // - but in our case, we know that
  // it always ends up being nullfs
  // for the cases we care about
  return nullfs;
}
int main(void) {
  filesystem_t *fs = get_fs_impl();
  assert(fs_open(fs, "hello.txt") != -1);
}

In this case, the assumption is that in function main, fs can be nothing other than nullfs. But perhaps due to the logic being too complicated, symbolic execution ends up being unable to figure this out, so in the call to fs_open we end up branching on all functions matching the signature of filesystem_t::open, which could be quite a few functions within the program. Worst of all, if its address is ever taken in the program, as far as function pointer removal via --remove-function-pointers is concerned it could be fs_open itself due to it having a matching signature, leading to symbolic execution being forced to follow a potentially infinite recursion until its unwind limit.

In this case we can again restrict the function pointer to the value which we know it will have:

goto-instrument --restrict-function-pointer \
  fs_open.function_pointer_call.1/nullfs_open in.gb out.gb
--function-pointer-restrictions-file file_name

If you have many places where you want to restrict function pointers, it'd be a nuisance to have to specify them all on the command line. In these cases, you can specify a file to load the restrictions from instead, which you can give the name of a JSON file with this format:

{
  "function_call_site_name": ["function1", "function2", ...],
   ...
}

If you pass in multiple files, or a mix of files and command line restrictions, the final restrictions will be a set union of all specified restrictions.

Note that if something goes wrong during type checking (i.e., making sure that all function pointer replacements refer to functions in the symbol table that have the correct type), the error message will refer to the command line option --restrict-function-pointer regardless of whether the restriction in question came from the command line or a file.

--restrict-function-pointer-by-name symbol_name/target[,targets]*

Restrict a function pointer where symbol_name is the unmangled name, before labeling function pointers.

--remove-calls-no-body

remove calls to functions without a body

--add-library

add models of C library functions

--malloc-may-fail

allow malloc calls to return a null pointer

--malloc-fail-assert

set malloc failure mode to assert-then-assume

--malloc-fail-null

set malloc failure mode to return null

--no-malloc-may-fail

do not allow malloc calls to fail by default

--string-abstraction

track C string lengths and zero-termination

--model-argc-argv n

Create up to n non-deterministic C strings as entries to argv and set argc accordingly. In absence of such modelling, argv is left uninitialized except for a terminating NULL pointer. Consider the following example:

// needs_argv.c
#include <assert.h>

int main(int argc, char *argv[]) {
  if (argc >= 2)
    assert(argv[1] != 0);

  return 0;
}

If cbmc(1) is run directly on this example, it will report a failing assertion for the lack of modeling of argv. To make the assertion succeed, as expected, use:

goto-cc needs_argv.c
goto-instrument --model-argc-argv 2 a.out a.out
cbmc a.out
--remove-function-body f

remove the implementation of function f (may be repeated)

--replace-calls f:g

replace calls to f with calls to g

--max-nondet-tree-depth N

limit size of nondet (e.g. input) object tree; at level N pointers are set to null

--min-null-tree-depth N

minimum level at which a pointer can first be NULL in a recursively nondet initialized struct

Semantics-preserving transformations:

--ensure-one-backedge-per-target

transform loop bodies such that there is a single edge back to the loop head

--drop-unused-functions

drop functions trivially unreachable from main function

--remove-pointers

converts pointer arithmetic to base+offset expressions

--constant-propagator

propagate constants and simplify expressions

--inline

perform full inlining

--partial-inline

perform partial inlining

--function-inline function

transitively inline all calls function makes

--no-caching

disable caching of intermediate results during transitive function inlining

--log file

log in JSON format which code segments were inlined, use with --function-inline

--remove-function-pointers

Resolve calls via function pointers to direct function calls. Candidate functions are chosen based on their signature and whether or not they have their address taken somewhere in the program The following example illustrates the approach taken. Given that there are functions with these signatures available in the program:

int f(int x);
int g(int x);
int h(int x);

And we have a call site like this:

typedef int (*fptr_t)(int x);
void call(fptr_t fptr) {
  int r = fptr(10);
  assert(r > 0);
}

Function pointer removal will turn this into code similar to this:

void call(fptr_t fptr) {
  int r;
  if (fptr == &f) {
    r = f(10);
  } else if (fptr == &g) {
    r = g(10);
  } else if (fptr == &h) {
    r = h(10);
  } else {
    // sanity check
    assert(false);
    assume(false);
  }
  return r;
}

Beware that there may be many functions matching a particular signature, and some of them may be costly to a subsequently run analysis. Consider using --restrict-function-pointer to manually specify this set of functions, or --value-set-fi-fp-removal.

--remove-const-function-pointers

remove function pointers that are constant or constant part of an array

--value-set-fi-fp-removal

Build a flow-insensitive value set and replace function pointers by a case statement over the possible assignments. If the set of possible assignments is empty the function pointer is removed using the standard --remove-function-pointers pass.

Loop information and transformations:

--show-loops

show the loops in the program

--unwind nr

unwind all loops nr times

--unwindset [T:]L:B,...

unwind loop L with a bound of B (optionally restricted to thread T) (use --show-loops to get the loop IDs)

--unwindset-file file

read unwindset from file

--partial-loops

permit paths that execute loops only partially (up to the given unwinding bound) and then continue beyond the loop even when the loop condition would still hold (such paths may be spurious, resulting in false alarms)

--no-unwinding-assertions

do not generate unwinding assertions

--unwinding-assertions

generate unwinding assertions (which are enabled by default; overrides --no-unwinding-assertions when both of these are given)

--continue-as-loops

add loop for remaining iterations after unwound part

--k-induction k

check loops with k-induction

--step-case

k-induction: do step-case

--base-case

k-induction: do base-case

--havoc-loops

over-approximate all loops

--accelerate

add loop accelerators

--z3

use Z3 when computing loop accelerators

--skip-loops loop-ids

add gotos to skip selected loops during execution

--show-lexical-loops

Show lexical loops. A lexical loop is a block of goto program instructions with a single entry edge at the top and a single backedge leading from bottom to top, where "top" and "bottom" refer to program order. The loop may have holes: instructions which sit in between the top and bottom in program order, but which can't reach the loop backedge. Lexical loops are a subset of the natural loops, which are cheaper to compute and include most natural loops generated from typical C code.

--show-natural-loops

Show natural loop heads. A natural loop is when the nodes and edges of a graph make one self-encapsulating circle with no incoming edges from external nodes. For example A -> B -> C -> D -> A is a natural loop, but if B has an incoming edge from X, then it isn't a natural loop, because X is an external node. Outgoing edges don't affect the natural-ness of a loop.

Memory model instrumentations:

--mm [tso|pso|rmo|power]

Instruments the program so that it can be verified for different weak memory models with a model-checker verifying sequentially consistent programs.

--scc

detects critical cycles per SCC (one thread per SCC)

--one-event-per-cycle

only instruments one event per cycle

--minimum-interference

instruments an optimal number of events

--my-events

only instruments events whose ids appear in inst.evt

--read-first, --write-first

only instrument cycles where a read or write occurs as first event, respectively

--max-var N

limit cycles to N variables read/written

--max-po-trans N

limit cycles to N program-order edges

--ignore-arrays

instrument arrays as a single object

--cav11

always instrument shared variables, even when they are not part of any cycle

--force-loop-duplication, --no-loop-duplication

optional program transformation to construct cycles in program loops

--cfg-kill

enables symbolic execution used to reduce spurious cycles

--no-dependencies

no dependency analysis

--no-po-rendering

no representation of the threads in the dot

--hide-internals

do not include thread-internal (Rfi) events in dot output

--render-cluster-file

clusterises the dot by files

--render-cluster-function

clusterises the dot by functions

Slicing:

--fp-reachability-slice f

Remove instructions that cannot appear on a trace that visits all given functions. The list of functions has to be given as a comma separated list f.

--reachability-slice

remove instructions that cannot appear on a trace from entry point to a property

--reachability-slice-fb

remove instructions that cannot appear on a trace from entry point through a property

--full-slice

slice away instructions that don't affect assertions

--property id

slice with respect to specific property id only

--slice-global-inits

slice away initializations of unused global variables

--aggressive-slice

remove bodies of any functions not on the shortest path between the start function and the function containing the property(s)

--aggressive-slice-call-depth n

used with --aggressive-slice, preserves all functions within n function calls of the functions on the shortest path

--aggressive-slice-preserve-function f

force the aggressive slicer to preserve function f

--aggressive-slice-preserve-functions-containing f

force the aggressive slicer to preserve all functions with names containing f

--aggressive-slice-preserve-all-direct-paths

force aggressive slicer to preserve all direct paths

Code contracts:

--apply-loop-contracts
-disable-loop-contracts-side-effect-check

UNSOUND OPTION. Disable checking the absence of side effects in loop contract clauses. In absence of such checking, loop contracts clauses will accept more expressions, such as pure functions and statement expressions. But user have to make sure the loop contracts are side-effect free by them self to get a sound result.

-loop-contracts-no-unwind

do not unwind transformed loops

-loop-contracts-file file

annotate loop contracts from the file to the goto program

--replace-call-with-contract fun

replace calls to fun with fun's contract

--enforce-contract fun

wrap fun with an assertion of its contract

--enforce-contract-rec fun

wrap fun with an assertion of its contract that can handle recursive calls

--dfcc fun

instrument dynamic frame condition checks method using fun as entry point

User-interface options:

--flush

flush every line of output

--xml

output files in XML where supported

--xml-ui

use XML-formatted output

--json-ui

use JSON-formatted output

--verbosity n

verbosity level

--timestamp [monotonic|wall]

Print microsecond-precision timestamps. monotonic: stamps increase monotonically. wall: ISO-8601 wall clock timestamps.

ENVIRONMENT

All tools honor the TMPDIR environment variable when generating temporary files and directories.

BUGS

If you encounter a problem please create an issue at https://github.com/diffblue/cbmc/issues

SEE ALSO

cbmc(1), goto-cc(1)

COPYRIGHT

2008-2013, Daniel Kroening