A common practice for rapid prototyping of an object-oriented program analysis is to define a lightweight fragment of Java that is sufficiently small to facilitate a rigorous analysis of key properties. Due to a lack of important Java features, an experimental ...
Towards Compiling Region Types Into RTSJ-Compliant Java Code
2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2018
In the last decade, multiple Real-Time Specification for Java (RTSJ) compliant Java Virtual Machines have been developed and used in safety-critical applications. Region-based memory management is a core feature of RTSJ. In this paper, we automatically generate RTSJ region-based memory management code. We start from a Java program annotated with region types and apply three type-based analyses. The region types are either provided by our previous region type inference, or written by the programmers and verified by our previous region type checker. The first two analyses simplify the region type annotations, while the last analysis generates the code according to the RTSJ API.
Ensuring the correctness of software for communication-centric programs is important but challenging. Previous approaches, based on session types, have been intensively investigated over the past decade. They provide a concise way to express protocol specifications and a lightweight approach for checking their implementation. Current solutions rely only on implicit synchronization, and are based on less precise types rather than logical formulae. In this paper, we propose a more expressive session logic to capture multi-party protocols. By using two kinds of ordering constraints, namely “happens-before” ≺HB and “communicates-before” ≺CB, we show how to ensure, from first principles, race-freedom over common channels. Our approach refines each specification with both assumptions and proof obligations to ensure compliance with some global protocol. Each specification is then projected for each party and then each channel, to allow cooperative proving through localized automated verif...
Ensuring software correctness and safety for communication-centric programs is important but challenging. In this paper we introduce a solution for writing communication protocols, for checking protocol conformance and for verifying implementation safety. This work draws on ideas from both multiparty session types, which provide a concise way to express communication protocols, and separation-style logics for shared-memory concurrency, which provide strong safety guarantees for resource sharing. On the one hand, our proposal improves the expressiveness and precision of session types without sacrificing their conciseness. On the other hand, it increases the applicability of software verification, as well as its precision, by making it protocol aware. We also show how to perform the verification of such programs in a modular and automatic fashion.
one of the communicating entities requires the re-validation of the entire system. Since validation might be expensive, or difficult to achieve if the source code of certain components is not available, it is desirable for the developer to be able to validate her changes only locally, rather than at the global level. Over the last decades, behavioral types [35,26] have been studied as specifications of the interactions in communicating systems. In particular, multiparty session types [23], or MPSTs, provide a user-friendly syntax for writing choreographic specifications of distributed systems, and a lightweight mechanism for enforcing communication safety. Communication is considered correct when the system's constituent processes are statically type-checked against the end-point projections of the MPST.
This formalism and its numerous extensions are attractive for checking whether the implementation follows the intended communication pattern, but they lack the strong safety and correctness guarantees normally provided by resource-aware verification systems. Specifically, the MPST approach checks whether a transmission's exchanged type is the expected one. However, in their most common form, MPSTs are unable to assert anything about the message's numerical properties, and even less so about its carried resources in the case of tightly coupled systems. Yet numerical properties and resource sharing constitute the pièce de résistance of separation logic [38], a logic for reasoning about resource sharing. In this work, we attach a communication logic, in the user-friendly style of MPST, to a separation logic for program verification. Even though we draw on ideas from MPST, the proposed logic differs from MPST in a number of features which yield a more expressive communication specification without compromising its friendly syntax. The current proposal ultimately leads to stronger guarantees w.r.t. the safety and correctness of distributed systems. We shall next highlight these differences.
Writing Multiparty Communication Protocols. The language we propose for writing communication protocols is described in Fig. 1a. Similar to MPST, the language contains the terminal notation S → R : c v•∆ to describe a transmission from sender S to receiver R, over channel c. Unlike type-based approaches, where a message abstracts a type, the exchanged message v is expressed in the logical form ∆ (defined in Fig. 1b). Note that v•∆ is in fact a shorthand for the lambda function (λv. ∆). This language uses G1 * G2 for the concurrency of global protocols G1 and G2, G1 ∨ G2 for disjunctive choice between either G1 or G2, and finally G1 ; G2 for the implicit sequentialization of G1 before G2 for either the same party or the same channel.
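The four protocol constructors described above can be pictured as a small abstract syntax tree. The following is only a minimal sketch, not the grammar of Fig. 1a: the class names, the `parties` and `channels` helpers, and the use of a Python predicate to stand in for the logical formula (λv. ∆) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union


# All names below are hypothetical; the paper's Fig. 1a defines the real grammar.
@dataclass(frozen=True)
class Transmission:
    """S -> R : c v.Delta, a message sent from sender to receiver over a channel."""
    sender: str
    receiver: str
    channel: str
    message: Callable[[object], bool]  # assumption: a predicate stands in for (lambda v. Delta)


@dataclass(frozen=True)
class Concurrent:
    """G1 * G2, two global protocols running concurrently."""
    left: "GlobalProtocol"
    right: "GlobalProtocol"


@dataclass(frozen=True)
class Choice:
    """G1 v G2, a disjunctive choice between two global protocols."""
    left: "GlobalProtocol"
    right: "GlobalProtocol"


@dataclass(frozen=True)
class Sequence:
    """G1 ; G2, with G1 sequenced before G2."""
    first: "GlobalProtocol"
    second: "GlobalProtocol"


GlobalProtocol = Union[Transmission, Concurrent, Choice, Sequence]


def parties(g: GlobalProtocol) -> set:
    """Collect every party that appears as a sender or receiver in g."""
    if isinstance(g, Transmission):
        return {g.sender, g.receiver}
    if isinstance(g, Sequence):
        return parties(g.first) | parties(g.second)
    return parties(g.left) | parties(g.right)


def channels(g: GlobalProtocol) -> set:
    """Collect every channel used by some transmission in g."""
    if isinstance(g, Transmission):
        return {g.channel}
    if isinstance(g, Sequence):
        return channels(g.first) | channels(g.second)
    return channels(g.left) | channels(g.right)


def is_int(v: object) -> bool:
    """Illustrative message formula: the payload is an integer."""
    return isinstance(v, int)


# A sends to B over c, then B either replies to A over c or forwards to C over d.
example = Sequence(
    Transmission("A", "B", "c", is_int),
    Choice(
        Transmission("B", "A", "c", is_int),
        Transmission("B", "C", "d", is_int),
    ),
)
```

Keeping the message as a predicate rather than a type tag mirrors the point made in the text: the specification can constrain the value being exchanged, not just its type. A real verifier would use symbolic formulae instead of executable predicates.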
Let us next consider a series of examples to introduce this language and to highlight the benefits over MPST. Example 1: We consider a cloud service for video editing, where a client sends to the cloud a file of some video format, and expects back an enhanced version of the original file, see Fig. 2a. A client-server protocol to describe this simple interaction is written as follows: CS a C → S : c v•v : file ; S → C : c v•v : file. The CS a lightweight protocol suffices to describe the order of communication and the exchanged message type. A rigorous specification, though, also emphasizes that the server applies some filter on the original file:
2018 23rd International Conference on Engineering of Complex Computer Systems (ICECCS), 2018
Region-based memory management has been shown to be an effective alternative that can co-exist with garbage collectors in memory-managed languages, especially for Real-Time and Big Data applications. In this paper we propose a novel variant region type system that extends our previous Java region types to Generic Java. The main difficulties arise from the type variables used by Generic Java. Our proposal is based on a modular flow analysis that captures region lifetime relations via subtyping constraints at the method boundary. Our variant region type system guarantees that well-typed Generic Java programs use lexically-scoped regions and never create dangling references in the store or on the program stack.
Android malware has become a serious threat in our daily digital life, and thus there is a pressing need to effectively detect or defend against it. Recent techniques have relied on the extraction of lightweight syntactic features that are suitable for machine learning classification, but despite their promising results, the features they extract are often too simple to characterise Android applications, and thus may be insufficient when used to detect Android malware. In this paper, we propose CDGDroid, an effective approach for Android malware detection based on deep learning. We use semantic graph representations, that is, the control flow graph, the data flow graph, and their possible combinations, as the features to characterise Android applications. We encode the graphs into matrices, and use them to train the classification model via a Convolutional Neural Network (CNN). We have conducted experiments on the Marvin, Drebin, VirusShare and ContagioDump datasets to evaluate our approach, and have found that the classification model taking the horizontal combination of CFG and DFG as features offers the best accuracy among all combinations. We have also conducted experiments to compare our approach against Yeganeh Safaei et al.'s approach, Allix et al.'s approach, Drebin and many antivirus tools gathered in VirusTotal, and the experimental results have confirmed that our classification model performs better than the others.
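As a rough illustration of what encoding graphs into matrices and combining them horizontally could look like, the sketch below builds 0/1 adjacency matrices and concatenates them row by row. The abstract does not state how CDGDroid actually encodes its graphs, so the adjacency-matrix encoding, the function names and the toy node labels are all assumptions.

```python
def adjacency_matrix(nodes, edges):
    """Encode a directed graph as a square 0/1 matrix (assumed encoding, not CDGDroid's)."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


def horizontal_combination(cfg, dfg):
    """Place the DFG matrix to the right of the CFG matrix, row by row."""
    if len(cfg) != len(dfg):
        raise ValueError("both matrices must have the same number of rows")
    return [cfg_row + dfg_row for cfg_row, dfg_row in zip(cfg, dfg)]


# Toy program with four hypothetical nodes shared by both graphs.
nodes = ["entry", "load", "call", "exit"]
cfg = adjacency_matrix(nodes, [("entry", "load"), ("load", "call"), ("call", "exit")])
dfg = adjacency_matrix(nodes, [("load", "call")])

# 4 rows by 8 columns: a single matrix that could be fed to a CNN.
combined = horizontal_combination(cfg, dfg)
```

A vertical combination would instead stack the rows of one matrix below the other (`cfg + dfg` for lists of rows), giving an 8 by 4 input.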
Discovering program specifications automatically for heap-manipulating programs is a challenging task due to the complexity of aliasing and mutability of data structures. This task is further complicated by an expressive domain that combines shape, numerical and bag information. In this paper, we propose a compositional analysis framework that derives the summary for each method in the expressive abstract domain, independently from its callers. We propose a novel abstraction method with a bi-abduction technique in the combined domain to discover pre-/post-conditions that could not be automatically inferred before. The analysis not only infers memory safety properties, but also finds relationships between pure and shape domains towards full functional correctness of programs. A prototype of the framework has been implemented and initial experiments have shown that our approach can discover interesting properties for non-trivial programs.
Companion to the 21st ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications - OOPSLA '06, 2006
A common practice for rapid prototyping of an object-oriented program analysis is to define a lightweight fragment of Java that is sufficiently small to facilitate a rigorous analysis of key properties. Such a lightweight fragment lacks important Java features, so experimental evaluation on real-world code is not easy. The solution is either to extend the prototype to the whole of Java or to rewrite the real-world code in the lightweight language. We propose an intermediate solution through Core-Java, an expression-oriented core calculus of Java, and a comprehensive set of translation rules from Java to Core-Java. The translation can be guided by the specific requirements of each program analysis. We have built an implementation of our framework and have used it for two different analyses on Java programs.
2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, 2013
Constructing software automatically from high-level models is one of the challenges in software engineering nowadays. There is an urgent need for adequate methods to ensure high quality of models. The Executable Foundational UML (fUML) has been proposed as a computationally complete and compact subset of UML. A fUML model is supposed to be executed and tested in the early stage of the software development process. The complete static and operational semantics of fUML is still in its early stages, and although several proposals to execute and verify fUML models have been issued, this problem is still open. Our project aims to develop a complete virtual machine for fUML models using the K-framework, which is a rewrite-based executable semantic framework. Our novel model execution will enable efficient testing and verification of fUML models.
We study the automated verification of pointer safety for heap-manipulating imperative programs with unknown procedure calls. Given a Hoare-style partial correctness specification S = {Pre} C {Post} in separation logic, where the program C contains calls to some unknown procedure U, we infer a specification SU for the unknown procedure U from the calling contexts. We show that the problem of verifying the program C against the specification S can be safely reduced to the problem of proving that the procedure U (once its code is available) meets the derived specification SU. The expected specification SU for the unknown procedure U is automatically calculated using an abduction-based shape analysis adapted from the bottom-up shape analysis by Calcagno et al. We have also implemented a prototype system to validate the viability of our approach.
Region-based memory management can offer increased time performance, providing support for real-time constraints in program execution. We have implemented region-based memory support in the SSCLI 2.0 platform and also devised a region inference system for CIL programs, with the aid of newly introduced instructions. Results seem promising, as the programs running with regions have considerably smaller interrupting delays compared to those running with a garbage collector.
Region-based memory management can offer improved time performance, relatively good memory locality and reuse, and better adherence to real-time constraints during execution, when compared against traditional garbage collection. We have implemented a region-memory subsystem in the SSCLI 2.0 platform and also adapted an inference system to region-enable CIL programs, with the aid of newly introduced instructions. Results seem promising, as the programs running with regions have considerably smaller interrupting delays compared to those running with garbage collection. Regions can bring a runtime speed improvement of up to 50%, depending on how complicated the data structures used in execution are. The present work can also be viewed as an experience report based on our initial attempt at integrating a new region memory subsystem into a commercially developed shared source platform.
Papers by Florin Craciun