CBMC
|
The following is some light technical documentation of the major data structures used in the input transformation to Intermediate Representation (IR) inside CBMC and the assorted CProver tools.
The goto_modelt
class is the top-level data structure that CBMC (and the other tools) use for holding the GOTO intermediate representation. The following diagram is a simplified view of how the data structures relate to each other -
A goto_modelt
is effectively a pair, consisting of:
In pseudocode, the type looks this:
There is an abstract interface for goto models provided by the abstract_goto_modelt
class. This is defined and documented in the file src/goto-programs/abstract_goto_model.h
. Ideally code which takes a goto model as input should accept any implementation of the abstract interface rather than accepting the concrete goto_modelt
exclusively. This helps with compatibility with jbmc
which uses lazy loading. See the lazy_goto_modelt
class which is defined and documented in jbmc/src/java_bytecode/lazy_goto_model.h
for details of lazy loading.
For further information about symbols see the symbolt
class which is defined and documented in src/util/symbol.h
and the symbol_exprt
(symbol expression) class which is defined and documented in src/util/std_expr.h
.
A goto_functiont
is also defined as a pair. It's designed to represent a function at the goto level, and effectively it's the following data type (in pseudocode):
The goto_programt
denoting the body
of the function will be the subject of a more elaborate explanation in the next section. The in-memory structure of a goto function allows the body
to be optional, but only functions with bodies are included in the serialised goto binaries.
The parameters
subcomponent is a list of identifiers that are to be looked-up in the symbol-table for their definitions.
A goto program is a sequence of GOTO instructions (goto_instructiont
). For details of the goto_programt
class itself see the documentation in goto_program.h
. For further details of the life cycle of goto programs see goto programs
An instruction (goto_instructiont
) is a triple (an element with three subcomponents), describing a GOTO level instruction with the following 3 component subtypes, again in pseudocode:
The pseudocode subtypes above require some more elaboration, so we will provide some extra detail to illuminate some details at the code level:
instruction_type
above corresponds to the goto_program_instruction_typet
type listed in goto_program.h
assign
, function_call
, return
, etc.instruction
is a statement represented by a goto_instruction_codet
.goto_instruction_codet
is an alias of codet
(a data structure broadly representing a statement inside CBMC) that contains the actual code to be executed.get_statement
member function of the codet
class.guard
is an exprt
(a data structure broadly representing an expression inside CBMC) that is expected to have type bool
.guard
associated with it.Another important data structure that needs to be discussed at this point is source_locationt
.
source_locationt
are attached into various exprt
s (the data structure representing various expressions, usually the result of some early processing, e.g. the result of the frontend parsing a file).
This means that codet
s, representing GOTO instructions also have a source_locationt
attached to them, by virtue of inheriting from exprt
.
For the most part, source_locationt
is something that is only being used when we print various nodes (for property listing, counterexample/trace printing, etc).
It might be possible that a specific source location might point to a CBMC instrumentation primitive (which might be reported as a built-in-addition
) or there might even be no-source location (because it might be part of harnesses generated as an example, that have no presence in the user code).
irep
s are the underlying data structure used to implement many of the data structures in CBMC and the assorted tools. These include the exprt
, typet
, codet
and source_locationt
classes. This is a tree data structure where each node is expected to contain a string/ID and may have child nodes stored in both a sequence of child nodes and a map of strings/IDs to child nodes. This enables the singular irept
data structure to be used to model graphs such as ASTs, CFGs, etc.
The classes extending irept
define how the higher level concepts are mapped onto the underlying tree data structure. For this reason it is usually advised that the member functions of the sub-classes should be used to access the data held, rather than the member functions of the base irept
class. This aids potential future restructuring and associates accesses of the data with the member functions which have the doxygen explaining what the data is.
The strings/IDs held within irept
are of type irep_idt
. These can be converted to std::string
using the id2string
function. There is a mechanism provided for casting expressions and types in src/util/expr_cast.h
. In depth documentation of the irept
class itself can be found in src/util/irep.h
.
Last modified: 2024-11-20 06:00:32 -0800