Academia.edu no longer supports Internet Explorer.
To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser.
2021, Cornell University - arXiv
This is an evolving document describing the meta-theory, the implementation, and the instantiations of Gillian, a multi-language symbolic analysis platform. 1 INTRODUCTION Gillian was introduced in [4] as a multi-language platform for whole-program symbolic execution, parametric on the concrete and symbolic memory models of the target language (TL), and underpinned by a core symbolic execution engine with strong mathematical foundations. Gillian analysis is done on GIL, an intermediate goto language parametric on a set of memory actions, which describe the fundamental ways in which TL programs interact with their memories. To instantiate Gillian to a new TL, a tool developer must: (1) identify the set of the TL memory actions and implement the TL memory models using these actions; and (2) provide a trusted compiler from the TL to GIL, which preserves the TL memory models and the semantics. In [4], Gillian was instantiated to JavaScript (JS) and C, and these instantiations, called Gillian-JS and Gillian-C, were used to find bugs in two real-world data-structure libraries. In [1], Gillian was extended with support for compositional memory models and verification based on separation logic. The compositional memory models of Gillian work with partial memories and are formulated in terms of core predicates and associated consumer and producer actions for the TL memory models, which need to be provided by the tool developer. The core predicates describe the fundamental units of TL memories: e.g., a JS object-property pair and a C block cell. The consumers and producers, respectively, frame off and frame on the TL memory resource described by the core predicate. The partial memories also need to track negative resource: that is, the resource known to be absent from the partial memory. Gillian verification is built on top of compositional memory models. In particular, the core predicates induce an assertion language for writing function specifications in separation logic and the consumers and producers allow for the creation of a fully parametric spatial entailment engine, enabling re-use of function specifications in symbolic execution. Gillian also allows tool developers to extend assertions with user-defined predicates so as to identify the TL language interface familiar to code developers, and code developers to provide additional predicates and lemmas to verify the particular data structures in their programs. In [1], Gillian-JS and Gillian-C were extended to support verification, and used to provide verified specifications of the JS and C implementations of the deserialisation module of the AWS Encryption SDK, discovering two bugs in the former and three in the latter. Outline. This document currently contains the following content: (§2) an account of Gillian's whole-program execution, including: (§2.1) the syntax of Gillian's intermediate language, GIL (§2.2) memory models for whole-program execution and their properties, which require the memory actions of the target language (§2.3) allocators, which relieve Gillian users of the need to reason explicitly about allocation (§2.4) state models, which are built on top of memory models, and their properties (§2.5) the single-trace and the collecting GIL semantics, defined in terms of state models
Artifact Digital Object Group, 2020
We introduce Gillian, a platform for developing symbolic analysis tools for programming languages. Here, we focus on the symbolic execution engine at the heart of Gillian, which is parametric on the memory model of the target language. We give a formal description of the symbolic analysis and a modular implementation that closely follows this description. We prove a parametric soundness result, introducing restriction on abstract states, which generalises path conditions used in classical symbolic execution. We instantiate Gillian to obtain trusted symbolic testing tools for JavaScript and C, and use these tools to find bugs in real-world code, thus demonstrating the viability of our parametric approach. CCS Concepts: • Theory of computation → Program analysis; Program semantics; • Software and its engineering → Formal language definitions.
Computer Aided Verification, 2021
We introduce verification based on separation logic to Gillian, a multi-language platform for the development of symbolic analysis tools which is parametric on the memory model of the target language. Our work develops a methodology for constructing compositional memory models for Gillian, leading to a unified presentation of the JavaScript and C memory models. We verify the JavaScript and C implementations of the AWS Encryption SDK message header deserialisation module, specifically designing common abstractions used for both verification tasks, and find two bugs in the JavaScript and three bugs in the C implementation.
This paper presents a formal verification with the Coq proof assistant of a memory model for C-like imperative languages. This model defines the memory layout and the operations that manage the memory. The model has been specified at two levels of abstraction and implemented as part of an ongoing certification in Coq of a moderately-optimising C compiler. Many properties of the memory have been verified in the specification. They facilitate the definition of precise formal semantics of C pointers. A certified OCaml code implementing the memory model has been automatically extracted from the specifications.
Proceedings of the ACM on Programming Languages, 2019
We propose a novel, unified approach to the development of compositional symbolic execution tools, bridging the gap between classical symbolic execution and compositional program reasoning based on separation logic. Using this approach, we build JaVerT 2.0, a symbolic analysis tool for JavaScript that follows the language semantics without simplifications. JaVerT 2.0 supports whole-program symbolic testing, verification, and, for the first time, automatic compositional testing based on bi-abduction. The meta-theory underpinning JaVerT 2.0 is developed modularly, streamlining the proofs and informing the implementation. Our explicit treatment of symbolic execution errors allows us to give meaningful feedback to the developer during whole-program symbolic testing and guides the inference of resource of the bi-abductive execution. We evaluate the performance of JaVerT 2.0 on a number of JavaScript data-structure libraries, demonstrating: the scalability of our whole-program symbolic te...
This article presents the formal verification, using the Coq proof assistant, of a memory model for low-level imperative languages such as C and compiler intermediate languages. Beyond giving semantics to pointer-based programs, this model supports reasoning over transformations of such programs. We show how the properties of the memory model are used to prove semantic preservation for three passes of the Compcert verified compiler.
Lecture Notes in Computer Science, 2013
We propose a language-independent symbolic execution framework for languages endowed with a formal operational semantics based on term rewriting. Starting from a given definition of a language, a new language definition is automatically generated, which has the same syntax as the original one but whose semantics extends data domains with symbolic values and adapts semantical rules to deal with these values. Then, the symbolic execution of concrete programs is the execution of programs with the new symbolic semantics, on symbolic input data. We prove that the symbolic execution thus defined has the properties naturally expected from it. A prototype implementation of our approach was developed in the K Framework. We demonstrate the genericity of our tool by instantiating it on several languages, and show how it can be used for the symbolic execution and model checking of several programs. Id ::= domain of identifiers Int ::= domain of integer numbers (including operations) Bool ::= domain of boolean constants (including operations) AExp :: = Int | AExp / AExp [strict] | Id | AExp * AExp [strict] | (AExp) | AExp + AExp [strict] BExp :: = Bool | (BExp) | AExp <= AExp [strict] | not BExp [strict] | BExp and BExp [strict(1)] Stmt :: = skip | { Stmt } | Stmt ; Stmt | Id := AExp | while BExp do Stmt | if BExp then Stmt else Stmt [strict(1)] Code ::= Id | Int | Bool | AExp | BExp | Stmt | Code Code
TVLA (Three-Valued-Logic Analyzer) is a "YACC"-like framework for automatically constructing abstract interpreters from an operational semantics. The operational semantics is specified as a generic transition system based on first-order logic. TVLA was implemented in Java and successfully used to prove interesting properties of (concurrent) Java programs manipulating dynamically allocated linked data structures.
Software Testing, Verification and Reliability, 2019
Symbolic execution is a popular program analysis technique that allows seeking for bugs by reasoning over multiple alternative execution states at once. As the number of states to explore may grow exponentially, a symbolic executor may quickly run out of space. For instance, a memory access to a symbolic address may potentially reference the entire address space, leading to a combinatorial explosion of the possible resulting execution states. To cope with this issue, state-of-the-art executors either concretize symbolic addresses that span memory intervals larger than some threshold or rely on advanced capabilities of modern SMT solvers. Unfortunately, concretization may result in missing interesting execution states, e.g., where a bug arises, while offloading the entire problem to constraint solvers can lead to very large query times. In this article, we first contribute to systematizing knowledge about memory models for symbolic execution, discussing how four mainstream symbolic executors deal with symbolic addresses. We then introduce MEMSIGHT, a new approach to symbolic memory that reduces the need for concretization: rather than mapping address instances to data as previous approaches do, our technique maps symbolic address expressions to data, maintaining the possible alternative states resulting from the memory referenced by a symbolic address in a compact, implicit form. Experiments on prominent programs show that MEMSIGHT, which we implemented in both ANGR and KLEE, enables the exploration of states that are unreachable for memory models that perform concretization, and provides a performance level comparable to memory models relying on advanced solver theories.
Formal Modeling and Analysis of Timed Systems, 2017
TESL addresses the specification of the temporal aspects of an architectural composition language that allows the composition of timed subsystems. TESL specifies the synchronization points between events and time scales. Methodologically, subsystems having potentially different models of execution are abstracted to their interfaces expressed in terms of timed events. In this paper, we present an operational semantics of TESL for constructing symbolic traces that can be used in an online-test scenario: the symbolic trace containing a set of constraints over time-stamps and occurrences of events is matched against concrete runs of the system. We present the operational rules for building symbolic traces and illustrate them with examples. Finally, we show a prototype implementation that generates symbolic traces, and its use for testing.
2008
Bytecode languages are at a very desirable degree of abstraction for performing formal analysis of programs, but at the same time pose new challenges when compared with traditional languages. This paper proposes a methodology for bytecode analysis which harmonizes two well-known formal verification techniques, model checking and symbolic execution. Model checking is a property-guided exploration of the system state space until the property is proved or disproved, producing in the latter case a counterexample execution trace. Symbolic execution emulates program execution by replacing concrete variable values with symbolic ones, so that the symbolic execution along a path represents the potentially infinite numeric executions that may occur along that path. We propose an approach where symbolic execution is used for building a possibly partial model of the program state space, and on-the-fly model checking is exploited for verifying temporal properties on it. The synergy of the two techniques yields considerable potential advantages: symbolic execution allows for modeling the state space of infinite-state software systems, limits the state explosion, and fosters modular verification; model checking provides fully automated verification of reachability properties of a program. To assess these potential advantages, we report our preliminary experience with the analysis of a safety-critical software system.
Logic-Based Program Synthesis and Transformation, 2018
Analysis tools like abstract interpreters, symbolic execution tools and testing tools usually require a proper context to give useful results when analyzing a particular function. Such a context initializes the function parameters and global variables to comply with function requirements. However it may be error-prone to write it by hand: the handwritten context might contain bugs or not match the intended specification. A more robust approach is to specify the context in a dedicated specification language, and hold the analysis tools to support it properly. This may mean to put significant development efforts for enhancing the tools, something that is often not feasible if ever possible. This paper presents a way to systematically generate such a context from a formal specification of a C function. This is applied to a subset of the ACSL specification language in order to generate suitable contexts for the abstract interpretationbased value analysis plug-ins of Frama-C, a framework for analysis of code written in C. The idea here presented has been implemented in a new Frama-C plugin which is currently in use in an operational industrial setting.
2000
This paper presents the formal Isabelle/HOL framework we use to prove refinement between an executable, monadic specification and the C implementation of the seL4 microkernel. We describe the refinement framework itself, the automated tactics it supports, and the connection to our previous C verification framework. We also report on our experience in applying the framework to seL4. The characteristics of
2011
This paper describes an executable formal semantics of C expressed using a formalism based on term rewriting. Being executable, the semantics has been thoroughly tested against the GCC torture test suite and successfully passes over 96% of 715 test programs. It is the most complete and thoroughly tested formal definition of C to date.
Proceedings of the ACM on Programming Languages
The development of constraint solvers simplified automated reasoning about programs and shifted the engineering burden to implementing symbolic compilation tools that translate programs into efficiently solvable constraints. We describe Grisette, a reusable symbolic evaluation framework for implementing domain-specific symbolic compilers. Grisette evaluates all execution paths and merges their states into a normal form that avoids making guards mutually exclusive. This ordered-guards representation reduces the constraint size 5-fold and the solving time more than 2-fold. Grisette is designed entirely as a library, which sidesteps the complications of lifting the host language into the symbolic domain. Grisette is purely functional, enabling memoization of symbolic compilation as well as monadic integration with host libraries. Grisette is statically typed, which allows catching programming errors at compile time rather than delaying their detection to the constraint solver. We imple...
Formal Aspects of Computing
Runtime Assertion Checking (RAC) for expressive specification languages is a non-trivial verification task, that becomes even more complex for memory-related properties of imperative languages with dynamic memory allocation. It is important to ensure the soundness of RAC verdicts, in particular when RAC reports the absence of failures for execution traces. This paper presents a formalization of a program transformation technique for RAC of memory properties for a representative language with pointers and memory operations, including dynamic allocation and deallocation. The generated program instrumentation relies on an axiomatized observation memory model, which is essential to record and monitor memory-related properties. We prove the soundness of RAC verdicts with regard to the semantics of this language.
ACM SIGPLAN Notices, 2012
This paper describes an executable formal semantics of C expressed using a formalism based on term rewriting. Being executable, the semantics has been thoroughly tested against the GCC torture test suite and successfully passes over 96% of 715 test programs. It is the most complete and thoroughly tested formal definition of C to date.
Journal of Automated Reasoning, 2010
Boogie is a verification condition generator for an imperative core language. It has front-ends for the programming languages C# and C enriched by annotations in first-order logic, i. e. pre-and postconditions, assertions, and loop invariants. Moreover, concepts like ghost fields, ghost variables, ghost code and specification functions have been introduced to support a specific modeling methodology. Boogie's verification conditions-constructed via a wp calculus from annotated programs-are usually transferred to automated theorem provers such as Simplify or Z3. This also comprises the expansion of language-specific modeling constructs in terms of a theory describing memory and elementary operations on it; this theory is called a machine/memory model. In this paper, we present a proof environment, HOL-Boogie, that combines Boogie with the interactive theorem prover Isabelle/HOL, for a specific C front-end and a machine/memory model. In particular, we present specific techniques combining automated and interactive proof methods for code verification. The main goal of our environment is to help program verification engineers in their task to "debug" annotations and to find combined proofs where purely automatic proof attempts fail.
Artificial Intelligence, 1999
We present a framework for the application of abstract interpretation as an aid during program development, rather than in the more traditional application of program optimization. Program validation and detection of errors is rst performed statically by comparing (partial) speci cations written in terms of assertions against information obtained from static analysis of the program. The results of this process are expressed in the user assertion language. Assertions (or parts of assertions) which cannot be veri ed statically are translated into run-time tests. The framework allows the use of assertions to be optional. It also allows using very general properties in assertions, beyond the prede ned set understandable by the static analyzer and including properties de ned by means of user programs. We also report brie y on an implementation of the framework. The resulting tool generates and checks assertions for Prolog, CLP(R), and CHIP/CLP(fd) programs, and integrates compile-time and run-time checking in a uniform way. The tool allows using properties such as types, modes, non-failure, determinacy, and computational cost, and can treat modules separately, performing incremental analysis. In practice, this modularity allows detecting statically bugs in user programs even if they do not contain any assertions.
Lecture Notes in Computer Science, 2006
We present a generic symbolic analysis framework for imperative programming languages. Our framework is capable of computing all valid variable bindings of a program at given program points. This information is invaluable for domain-specific static program analyses such as memory leak detection, program parallelisation, and the detection of superfluous bound checks, variable aliases and task deadlocks. We employ path expression algebra to model the control flow information of programs. A homomorphism maps path expressions into the symbolic domain. At the center of the symbolic domain is a compact algebraic structure called supercontext. A supercontext contains the complete control and data flow analysis information valid at a given program point. Our approach to compute supercontexts is based purely on algebra and is fully automated. This novel representation of program semantics closes the gap between program analysis and computer algebra systems, which makes supercontexts an ideal intermediate representation for all domainspecific static program analyses. Our approach is more general than existing methods because it can derive solutions for arbitrary (even intra-loop) nodes of reducible and irreducible control flow graphs. We prove the correctness of our symbolic analysis method. Our experimental results show that the problem sizes arising from real-world applications such as the SPEC95 benchmark suite are tractable for our symbolic analysis framework.
Automated Software Engineering, 2011
Programs that manipulate dynamic heap objects are difficult to analyze due to issues like aliasing. Lazy initialization algorithm enables the classical symbolic execution to handle such programs. Despite its successes, there are two unresolved issues: (1) inefficiency; (2) lack of formal study. For the inefficiency issue, we have proposed two improved algorithms that give significant analysis time reduction over the original lazy initialization algorithm. In this article, we formalize the lazy initialization algorithm and the improved algorithms as operational semantics of a core subset of the Java Virtual Machine (JVM) instructions, and prove that all algorithms are relatively sound and complete with respect to the JVM concrete semantics. Finally, we conduct a set of extensive experiments that compare the three algorithms and demonstrate the efficiency of the improved algorithms.
Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.