CBMC
|
In doc/central-data-structures we talked about the relationship between various central data structures of CBMC.
We identified the goto_programt
as being the core data structure representing GOTO-IR, which we defined as a list of GOTO program instructions.
As a reminder, instructions are composed of three subcomponents:
true
before the instruction is to be executed - if one is attached to the instruction).In this document, we are going to take a closer look at the first subcomponent, the instruction types, along with the interplay between the instructions and a central CBMC module, the symbolic execution engine (from now on, just symex).
Symex is, at its core, a GOTO-program interpreter that uses symbolic values instead of actual ones. This produces a formula which describes all possible outputs rather than a single output value. While Symex is interpreting the program, it also builds a list of Static Single Assignment (SSA) steps that form part of the equation that is to be sent to the solver. For more information see Symbolic Execution.
You can see the main instruction dispatcher (what corresponds to the main interpreter loop) at goto_symext::execute_next_instruction
.
Symex's source code is available under goto-symex.
Every GOTO-program instruction has a specific type. You can see the instruction type at goto_program_instruction_typet but we will also list some of them here for illustration purposes:
(NOTE: The above is for illustration purposes only - the list is not complete, neither is it expected to be kept up-to-date. Please refer to the type goto_program_instruction_typet for a list of what the instructions look like at this point in time.)
You can observe these instruction types in the user-friendly print of the goto-program that various CPROVER programs produce with the flag --show-goto-functions
. For a live example, consider the following C file:
If we ask CBMC to print the GOTO-program instructions with --show-goto-functions
, then part of the output looks like this:
In the above excerpt, the capitalised words at the beginning each instruction are the correspondent instruction types (DECL
, ASSIGN
, etc).
Symex (as mentioned above) is going to pick a designated entry point and then start going through each instruction. This happens at goto_symext::execute_next_instruction
. While doing so, it will inspect the instruction's type, and then dispatch to a designated handling function (which usually go by the name symex_<instruction-type>
) to handle that particular instruction type and its symbolic execution. In pseudocode, it looks like this:
The way the goto-symex subfolder is structured, the different dispatching functions are usually in their own file, designated by the instruction's name. As an example, you can find the code for the function goto_symext::symex_goto in symex_goto.cpp
The output of symex is an symex_target_equationt which consists of equalities between different expressions in the program. You can visualise it as a data structure that serialises to this: (a = 5 ∨ a = 3) ∧ (b = 3) ∧ (c = 4) ∧ ...
that describe assignments and conditions for a range of possible executions of a program that cover a range of potential paths within it.
This is a high-level overview of symex and goto-program instructions. For more information (and a more robust introduction), please have a look at Symbolic Execution.
Last modified: 2024-12-16 13:58:47 -0800