Academia.edu no longer supports Internet Explorer.
To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser.
2023, arXiv (Cornell University)
…
23 pages
1 file
Static analysis tools are traditionally used to detect and flag programs that violate properties. We show that static analysis tools can also be used to perturb programs that satisfy a property to construct variants that violate the property. Using this insight we can construct paired data sets of unsafe-safe program pairs, and learn strategies to automatically repair property violations. We present a system called StaticFixer, which automatically repairs information flow vulnerabilities using this approach. Since information flow properties are non-local (both to check and repair), StaticFixer also introduces a novel domain specific language (DSL) and strategy learning algorithms for synthesizing non-local repairs. We use StaticFixer to synthesize strategies for repairing two types of information flow vulnerabilities, unvalidated dynamic calls and cross-site scripting, and show that StaticFixer successfully repairs several hundred vulnerabilities from open source JavaScript repositories, outperforming neural baselines built using CodeT5 and Codex. Our datasets can be downloaded from http://aka.ms/StaticFixer.
Proceedings of the 25th International Symposium on Software Testing and Analysis, 2016
The state of web security remains troubling as web applications continue to be favorite targets of hackers. Static analysis tools are important mechanisms for programmers to deal with this problem as they search for vulnerabilities automatically in the application source code, allowing programmers to remove them. However, developing these tools requires explicitly coding knowledge about how to discover each kind of vulnerability. This paper presents a new approach in which static analysis tools learn to detect vulnerabilities automatically using machine learning. The approach uses a sequence model to learn to characterize vulnerabilities based on a set of annotated source code slices. This model takes into consideration the order in which the code elements appear and are executed in the slices. The model created can then be used as a static analysis tool to discover and identify vulnerabilities in source code. The approach was implemented in the DEKANT tool and evaluated experimentally with a set of open source PHP applications and WordPress plugins, finding 16 zero-day vulnerabilities. CCS Concepts •Software and its engineering → Software verification and validation; •Security and privacy → Vulnerability management; Web application security; •Computing methodologies → Machine learning;
Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Traditional automatic program repair (APR) tools rely on a testsuite as a repair specification. But test suites even when available are not of specification quality, limiting the performance and hence viability of test-suite based repair. On the other hand, static analysisbased bug finding tools are seeing increasing adoption in industry but still face challenges since the reported violations are viewed as not easily actionable. We propose a novel solution that solves both these challenges through a technique for automatically generating high-quality patches for static analysis violations by learning from examples. Our approach uses the static analyzer as an oracle and does not require a test suite. We realize our solution in a system, Phoenix, that implements a fully-automated pipeline that mines and cleans patches for static analysis violations from the wild, learns generalized executable repair strategies as programs in a novel Domain Specific Language (DSL), and then instantiates concrete repairs from them on new unseen violations. Using Phoenix we mine a corpus of 5,389 unique violations and patches from 517 Github projects. In a cross-validation study on this corpus Phoenix successfully produced 4,596 bug-fixes, with a recall of 85% and a precision of 54%. When applied to the latest revisions of a further 5 Github projects, Phoenix produced 94 correct patches to previously unknown bugs, 19 of which have already been accepted and merged by the development teams. To the best of our knowledge this constitutes, by far the largest application of any automatic patch generation technology to large-scale real-world systems. CCS CONCEPTS • Software and its engineering → Automated static analysis; Domain specific languages; Programming by example; Software testing and debugging.
2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
Automatically locating vulnerable statements in source code is crucial to assure software security and alleviate developers' debugging efforts. This becomes even more important in today's software ecosystem, where vulnerable code can flow easily and unwittingly within and across software repositories like GitHub. Across such millions of lines of code, traditional static and dynamic approaches struggle to scale. Although existing machine-learning-based approaches look promising in such a setting, most work detects vulnerable code at a higher granularity-at the method or file level. Thus, developers still need to inspect a significant amount of code to locate the vulnerable statement(s) that need to be fixed. This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph and effectively understand code semantics and vulnerable patterns. To study VELVET's effectiveness, we use an off-the-shelf synthetic dataset and a recently published real-world dataset. In the static analysis setting, where vulnerable functions are not detected in advance, VELVET achieves 4.5× better performance than the baseline static analyzers on the real-world data. For the isolated vulnerability localization task, where we assume the vulnerability of a function is known while the specific vulnerable statement is unknown, we compare VELVET with several neural networks that also attend to local and global context of code. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively, outperforming the baseline deep learning models by 5.3-29.0%.
Theoretical and Applied Cybersecurity, 2020
We present the DeeDP system for automatic vulnerabilities detection and patch providing. DeeDP allows to detect vulnerabilities in C/C++ source code and generate patch for fixing detected issue. This system uses deep learning methods to organize rules for deciding whether a code fragment is vulnerable. Patch generation processes can be performed based on neural network and rule-based approaches. The system uses the abstract syntax tree (AST) representations of the source code fragments. We have tested effectiveness of our approach on different open source projects. For example, Microsoft/Terminal (https://github.com/microsoft/Terminal) was analyzed with DeeDP: our system detected security issue and generated patch which was successfully approved and applied by Microsoft maintainers.
Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
As software vulnerabilities grow in volume and complexity, researchers proposed various Artificial Intelligence (AI)-based approaches to help under-resourced security analysts to find, detect, and localize vulnerabilities. However, security analysts still have to spend a huge amount of effort to manually fix or repair such vulnerable functions. Recent work proposed an NMT-based Automated Vulnerability Repair, but it is still far from perfect due to various limitations. In this paper, we propose VulRepair, a T5-based automated software vulnerability repair approach that leverages the pre-training and BPE components to address various technical limitations of prior work. Through an extensive experiment with over 8,482 vulnerability fixes from real-world software projects, we find that our VulRepair achieves a Perfect Prediction of 44%, which is 13%-21% more accurate than competitive baseline approaches. These results lead us to conclude that our VulRepair is considerably more accurate than two baseline approaches, highlighting the substantial advancement of NMT-based Automated Vulnerability Repairs. Our additional investigation also shows that our VulRepair can accurately repair as many as 745 out of 1,706 real-world well-known vulnerabilities (e.g., Use After Free, Improper Input Validation, OS Command Injection), demonstrating the practicality and significance of our VulRepair for generating vulnerability repairs, helping under-resourced security analysts on fixing vulnerabilities.
2020
When developers fix a defect, they may change multiple files. The number of files changed for resolving the defect depends on how strongly the files are coupled with each other. In earlier works, researchers leveraged this coupling for better understanding and analyzing software as well as for guiding developers to quickly find all probable code areas to complete fixing a defect. In some studies, researchers generated association rules reflecting the coupling among files and built tools to automate the discovery of the related changes in the files. Such tools, however, do not consider the type of defects resolved earlier for generating the rules as a result of which many unrelated files may come up while changing a file in later releases for resolving a specific type of defect. Therefore, in our study, we consider only security defects or vulnerabilities to generate the rules and then automate the finding process of other related files while fixing a vulnerability. Our tool “SecureC...
2019 IEEE Symposium on Security and Privacy (SP)
Security vulnerabilities are among the most critical software defects in existence. When identified, programmers aim to produce patches that prevent the vulnerability as quickly as possible, motivating the need for automatic program repair (APR) methods to generate patches automatically. Unfortunately, most current APR methods fall short because they approximate the properties necessary to prevent the vulnerability using examples. Approximations result in patches that either do not fix the vulnerability comprehensively, or may even introduce new bugs. Instead, we propose property-based APR, which uses human-specified, program-independent and vulnerability-specific safety properties to derive source code patches for security vulnerabilities. Unlike properties that are approximated by observing the execution of test cases, such safety properties are precise and complete. The primary challenge lies in mapping such safety properties into source code patches that can be instantiated into an existing program. To address these challenges, we propose Senx, which, given a set of safety properties and a single input that triggers the vulnerability, detects the safety property violated by the vulnerability input and generates a corresponding patch that enforces the safety property and thus, removes the vulnerability. Senx solves several challenges with property-based APR: it identifies the program expressions and variables that must be evaluated to check safety properties and identifies the program scopes where they can be evaluated, it generates new code to selectively compute the values it needs if calling existing program code would cause unwanted side effects, and it uses a novel access range analysis technique to avoid placing patches inside loops where it could incur performance overhead. Our evaluation shows that the patches generated by Senx successfully fix 32 of 42 real-world vulnerabilities from 11 applications including various tools or libraries for manipulating graphics/media files, a programming language interpreter, a relational database engine, a collection of programming tools for creating and managing binary programs, and a collection of basic file, shell, and text manipulation tools.
2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
Static analysis has established itself as a weapon of choice for detecting security vulnerabilities. Taint analysis in particular is a very general and powerful technique, where security policies are expressed in terms of forbidden flows, either from untrusted input sources to sensitive sinks (in integrity policies) or from sensitive sources to untrusted sinks (in confidentiality policies). The appeal of this approach is that the taint-tracking mechanism has to be implemented only once, and can then be parameterized with different taint specifications (that is, sets of sources and sinks, as well as any sanitizers that render otherwise problematic flows innocuous) to detect many different kinds of vulnerabilities. But while techniques for implementing scalable inter-procedural static taint tracking are fairly well established, crafting taint specifications is still more of an art than a science, and in practice tends to involve a lot of manual effort. Past work has focussed on automated techniques for inferring taint specifications for libraries either from their implementation or from the way they tend to be used in client code. Among the latter, machine learning-based approaches have shown great promise. In this work we present our experience combining an existing machine-learning approach to mining sink specifications for JavaScript libraries with manual taint modelling in the context of GitHub's CodeQL analysis framework. We show that the machinelearning component can successfully infer many new taint sinks that either are not part of the manual modelling or are not detected due to analysis incompleteness. Moreover, we present techniques for organizing sink predictions using automated ranking and codesimilarity metrics that allow an analysis engineer to efficiently sift through large numbers of predictions to identify true positives. CCS CONCEPTS • Software and its engineering → Automated static analysis.
2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2019
Fix pattern-based patch generation is a promising direction in Automated Program Repair (APR). Notably, it has been demonstrated to produce more acceptable and correct patches than the patches obtained with mutation operators through genetic programming. The performance of pattern-based APR systems, however, depends on the fix ingredients mined from fix changes in development histories. Unfortunately, collecting a reliable set of bug fixes in repositories can be challenging. In this paper, we propose to investigate the possibility in an APR scenario of leveraging code changes that address violations by static bug detection tools. To that end, we build the AVATAR APR system, which exploits fix patterns of static analysis violations as ingredients for patch generation. Evaluated on the Defects4J benchmark, we show that, assuming a perfect localization of faults, AVATAR can generate correct patches to fix 34/39 bugs. We further find that AVATAR yields performance metrics that are comparable to that of the closely-related approaches in the literature. While AVATAR outperforms many of the state-of-theart pattern-based APR systems, it is mostly complementary to current approaches. Overall, our study highlights the relevance of static bug finding tools as indirect contributors of fix ingredients for addressing code defects identified with functional test cases.
Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.
NeurIPS 2020 Workshop on Computer-Assisted Programming, 2021
Proceedings of the AAAI Conference on Artificial Intelligence
Source Code Analysis and …, 2008
IEEE Transactions on Reliability, 2015
Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), 2007
arXiv (Cornell University), 2023
2009 13th European Conference on Software Maintenance and Reengineering, 2009
arXiv (Cornell University), 2022
arXiv (Cornell University), 2023
ijetrm journal, 2023
Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2021
2020 IEEE Symposium on Security and Privacy (SP), 2020
2017 13th European Dependable Computing Conference (EDCC)