Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
As software vulnerabilities grow in volume and complexity, researchers have proposed various Artificial Intelligence (AI)-based approaches to help under-resourced security analysts find, detect, and localize vulnerabilities. However, security analysts still spend considerable effort manually fixing or repairing such vulnerable functions. Recent work proposed an NMT-based Automated Vulnerability Repair approach, but it is still far from perfect due to various limitations. In this paper, we propose VulRepair, a T5-based automated software vulnerability repair approach that leverages pre-training and BPE components to address various technical limitations of prior work. Through an extensive experiment with over 8,482 vulnerability fixes from real-world software projects, we find that VulRepair achieves a Perfect Prediction rate of 44%, which is 13%-21% more accurate than competitive baseline approaches. These results lead us to conclude that VulRepair is considerably more accurate than the two baseline approaches, highlighting the substantial advancement of NMT-based Automated Vulnerability Repair. Our additional investigation also shows that VulRepair can accurately repair as many as 745 out of 1,706 real-world well-known vulnerabilities (e.g., Use After Free, Improper Input Validation, OS Command Injection), demonstrating the practicality and significance of VulRepair for generating vulnerability repairs and helping under-resourced security analysts fix vulnerabilities.
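To make the NMT-style pipeline concrete, the sketch below shows the encode-generate-decode step for producing a candidate repair from a vulnerable function. It is a minimal sketch only: it assumes a generic CodeT5-style checkpoint ("Salesforce/codet5-base" as a stand-in) and does not reproduce VulRepair's actual weights, prompt format, or vulnerability tagging.

```python
# Minimal sketch of T5-style repair generation.
# Assumption: "Salesforce/codet5-base" is a stand-in checkpoint, not the
# actual VulRepair model or its input format.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "Salesforce/codet5-base"  # assumed placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

vulnerable_function = """
int copy_input(char *dst, const char *src) {
    strcpy(dst, src);  /* potential buffer overflow */
    return 0;
}
"""

# Subword (BPE) tokenization of the vulnerable function, then beam-search
# decoding of a candidate fix: encode -> generate -> detokenize.
inputs = tokenizer(vulnerable_function, return_tensors="pt",
                   truncation=True, max_length=512)
outputs = model.generate(**inputs, max_length=256, num_beams=5,
                         num_return_sequences=1)
candidate_fix = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(candidate_fix)
```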
Theoretical and Applied Cybersecurity, 2020
We present the DeeDP system for automatic vulnerability detection and patch provision. DeeDP detects vulnerabilities in C/C++ source code and generates patches for fixing the detected issues. The system uses deep learning methods to organize rules for deciding whether a code fragment is vulnerable. Patch generation can be performed using neural network and rule-based approaches. The system uses abstract syntax tree (AST) representations of the source code fragments. We have tested the effectiveness of our approach on different open source projects. For example, Microsoft/Terminal (https://github.com/microsoft/Terminal) was analyzed with DeeDP: our system detected a security issue and generated a patch that was successfully approved and applied by the Microsoft maintainers.
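As a rough illustration of the AST-based input representation such a system consumes, the sketch below parses a small C fragment and serializes its AST node types. pycparser is an assumption here, not the paper's actual toolchain, and the traversal is far simpler than a real feature encoder.

```python
# Sketch: an AST-based representation of a C fragment, as a stand-in for the
# kind of input a DeeDP-style model consumes. pycparser is an assumption;
# the paper's parser and feature encoding are not described in this detail.
from pycparser import c_parser, c_ast

code = """
int get_item(int *buf, int idx) {
    return buf[idx];
}
"""

parser = c_parser.CParser()
ast = parser.parse(code)

class NodeCollector(c_ast.NodeVisitor):
    """Collects node type names in pre-order, a simple serialized AST view."""
    def __init__(self):
        self.tokens = []
    def generic_visit(self, node):
        self.tokens.append(type(node).__name__)
        for _, child in node.children():
            self.visit(child)

collector = NodeCollector()
collector.visit(ast)
print(collector.tokens)  # e.g. ['FileAST', 'FuncDef', 'Decl', 'FuncDecl', ...]
```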
2020
When developers fix a defect, they may change multiple files. The number of files changed to resolve the defect depends on how strongly the files are coupled with each other. In earlier work, researchers leveraged this coupling to better understand and analyze software, as well as to guide developers to quickly find all probable code areas that must change to complete a defect fix. In some studies, researchers generated association rules reflecting the coupling among files and built tools to automate the discovery of related file changes. Such tools, however, do not consider the type of defect resolved earlier when generating the rules, so many unrelated files may be suggested when a file is changed in a later release to resolve a specific type of defect. Therefore, in our study, we consider only security defects, or vulnerabilities, to generate the rules and then automate the process of finding other related files while fixing a vulnerability. Our tool “SecureC...
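The core idea of restricting rule mining to vulnerability-fix commits can be sketched with plain support/confidence counting over co-changed file sets, as below. The commit data and thresholds are hypothetical; the tool's real miner and defect-type filtering are more involved.

```python
# Sketch: co-change association rules mined only from vulnerability-fix commits.
# The commits and thresholds below are invented for illustration.
from collections import defaultdict
from itertools import combinations

# Each entry: the set of files changed together in one vulnerability-fixing commit.
vuln_fix_commits = [
    {"auth.c", "session.c"},
    {"auth.c", "session.c", "crypto.c"},
    {"parser.c", "input_check.c"},
    {"auth.c", "session.c"},
]

MIN_SUPPORT = 2       # rule must be observed in at least 2 fixes
MIN_CONFIDENCE = 0.6  # P(consequent changed | antecedent changed)

pair_count = defaultdict(int)
file_count = defaultdict(int)
for files in vuln_fix_commits:
    for f in files:
        file_count[f] += 1
    for a, b in combinations(sorted(files), 2):
        pair_count[(a, b)] += 1

rules = []
for (a, b), support in pair_count.items():
    if support < MIN_SUPPORT:
        continue
    for antecedent, consequent in ((a, b), (b, a)):
        confidence = support / file_count[antecedent]
        if confidence >= MIN_CONFIDENCE:
            rules.append((antecedent, consequent, support, confidence))

for antecedent, consequent, support, confidence in rules:
    print(f"fixing a vulnerability in {antecedent} -> also check {consequent} "
          f"(support={support}, confidence={confidence:.2f})")
```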
2019
The article provides a view of modern technologies used for automatic software vulnerability testing in critically important systems. Features of fuzzing (which is based on feeding the program many inputs containing mutated data) are also studied. As a result, the testing algorithm picks input data that is more likely to cause a failure or incorrect behavior of the software product. Deep learning algorithms are used to decrease the computational complexity of the testing process. The combination of a simple fuzzer with a Deep Reinforcement Learning algorithm shows that the number of mutations necessary to find vulnerabilities decreases by 30%.
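The mutate-and-observe loop that such a fuzzer drives can be sketched as below. In a real setup the reinforcement-learning agent would choose the mutation operator; here random choice stands in for the policy, and target() is a toy stand-in for the program under test.

```python
# Sketch of a mutation-based fuzzing loop. Random operator choice stands in
# for an RL policy; target() simulates the program under test.
import random

def target(data: bytes) -> None:
    """Toy target: 'crashes' (raises) on a specific byte pattern."""
    if len(data) > 3 and data[0] == 0xDE and data[1] == 0xAD:
        raise RuntimeError("simulated crash")

def mutate(data: bytes) -> bytes:
    buf = bytearray(data) or bytearray(b"\x00")
    op = random.choice(["flip_byte", "insert_byte", "drop_byte"])  # policy stand-in
    pos = random.randrange(len(buf))
    if op == "flip_byte":
        buf[pos] ^= random.randrange(1, 256)
    elif op == "insert_byte":
        buf.insert(pos, random.randrange(256))
    elif op == "drop_byte" and len(buf) > 1:
        del buf[pos]
    return bytes(buf)

seed = b"hello"
mutations = 0
while True:
    seed = mutate(seed)
    mutations += 1
    try:
        target(seed)
    except RuntimeError:
        print(f"crash-triggering input found after {mutations} mutations: {seed!r}")
        break
    if mutations >= 1_000_000:  # budget cap so the sketch always terminates
        print("no crash found within the mutation budget")
        break
```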
2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)
Preventing vulnerability exploits is a critical software maintenance task, and software engineers often rely on Common Vulnerabilities and Exposures (CVE) reports for information about vulnerable systems and libraries. These reports include descriptions, disclosure sources, and manually populated vulnerability characteristics, such as root cause, from the NIST Vulnerability Description Ontology (VDO). This information needs to be complete and accurate so stakeholders of affected products can prevent and react to exploits of the reported vulnerabilities. However, characterizing each report requires significant time and expertise, which can lead to inaccurate or incomplete reports. This directly impacts stakeholders' ability to quickly and correctly maintain their affected systems. In this study, we demonstrate that VDO characteristics can be automatically detected from the textual descriptions included in CVE reports. We evaluated the performance of 6 classification algorithms with a dataset of 365 vulnerability descriptions, each mapped to 1 of 19 characteristics from the VDO. This work demonstrates that it is feasible to train classification techniques to accurately characterize vulnerabilities from their descriptions. All 6 classifiers evaluated produced accurate results, and the Support Vector Machine classifier was the best-performing individual classifier. Automating the vulnerability characterization process is a step towards ensuring stakeholders have the necessary data to effectively maintain their systems.
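A minimal sketch of this kind of text classification is shown below, using a TF-IDF plus linear SVM pipeline. The three example descriptions and labels are invented; the study's real dataset has 365 descriptions mapped to 19 VDO characteristics, and its best classifier happened to be an SVM.

```python
# Sketch: classifying CVE description text into (hypothetical) VDO-style
# characteristic labels with a TF-IDF + linear SVM pipeline.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

descriptions = [
    "Buffer overflow in the image parser allows remote code execution.",
    "Improper neutralization of input allows SQL injection via the id parameter.",
    "Use-after-free in the renderer allows a crafted page to crash the process.",
]
labels = ["Memory Safety", "Input Validation", "Memory Safety"]  # hypothetical labels

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(descriptions, labels)

print(clf.predict(["Heap overflow when parsing a malformed font file."]))
```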
Technix International Journal for Engineering Research, 2024
Legacy vulnerability management paradigms are demonstrably insufficient in addressing the burgeoning volume and intricacy of software vulnerabilities. This discourse delves into the transformative potential of Artificial Intelligence (AI) to fundamentally reshape vulnerability management. We investigate the application of AI techniques, encompassing machine learning and deep learning, across the vulnerability management lifecycle, spanning identification, prioritization, remediation, and predictive modeling. Furthermore, this paper elucidates the inherent challenges and prospective trajectories of AI-driven vulnerability management, underscoring its capacity to substantially augment cybersecurity posture in the face of the dynamic threat landscape.
2010
The security demands on modern system administration are enormous and getting worse. Chief among these demands, administrators must monitor the continual ongoing disclosure of software vulnerabilities that have the potential to compromise their systems in some way. Such vulnerabilities include buffer overflow errors, improperly validated inputs, and other unanticipated attack modalities. In 2008, over 7,400 new vulnerabilities were disclosed, well over 100 per week. While no enterprise is affected by all of these disclosures, administrators commonly face many outstanding vulnerabilities across the software systems they manage. Vulnerabilities can be addressed by patches, reconfigurations, and other workarounds; however, these actions may incur down-time or unforeseen side-effects. Thus, a key question for systems administrators is which vulnerabilities to prioritize. From publicly available databases that document past vulnerabilities, we show how to train classifiers that predict whether and how soon a vulnerability is likely to be exploited. As input, our classifiers operate on high dimensional feature vectors that we extract from the text fields, time stamps, cross-references, and other entries in existing vulnerability disclosure reports. Compared to current industry-standard heuristics based on expert knowledge and static formulas, our classifiers predict much more accurately whether and how soon individual vulnerabilities are likely to be exploited.
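A rough sketch of combining report text with simple numeric fields into one classifier is given below. The rows are invented and the features are tiny; the study's real feature vectors are high dimensional and drawn from text fields, timestamps, and cross-references.

```python
# Sketch: predicting exploit likelihood from disclosure-report features by
# combining a TF-IDF view of the text with a numeric field. Data is invented.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

reports = pd.DataFrame({
    "description": [
        "Remote code execution via crafted packet, public PoC available.",
        "Local denial of service under unusual configuration.",
        "SQL injection in login form, actively discussed on forums.",
        "Information disclosure requiring physical access.",
    ],
    "num_references": [12, 2, 8, 1],
    "exploited": [1, 0, 1, 0],   # hypothetical ground truth
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "description"),
    ("refs", "passthrough", ["num_references"]),
])
model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(reports, reports["exploited"])

print(model.predict_proba(reports)[:, 1])  # predicted exploit likelihood
```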
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), 2021
Weaknesses in computer systems such as faults, bugs and errors in the architecture, design or implementation of software provide vulnerabilities that can be exploited by attackers to compromise the security of a system. Common Weakness Enumerations (CWE) are a hierarchically designed dictionary of software weaknesses that provide a means to understand software flaws, the potential impact of their exploitation, and means to mitigate these flaws. Common Vulnerabilities and Exposures (CVE) are brief low-level descriptions that uniquely identify vulnerabilities in a specific product or protocol. Classifying or mapping CVEs to CWEs provides a means to understand the impact of and mitigate the vulnerabilities. Since manual mapping of CVEs is not a viable option, automated approaches are desirable but challenging. We present a novel Transformer-based learning framework (V2W-BERT) in this paper. By using ideas from natural language processing, link prediction and transfer learning, our method outperforms previous approaches not only for CWE instances with abundant training data, but also for rare CWE classes with little or no training data. Our approach also shows significant improvements in using historical data to predict links for future instances of CVEs, and therefore provides a viable approach for practical applications. Using data from MITRE and the National Vulnerability Database, we achieve up to 97% prediction accuracy for randomly partitioned data and up to 94% prediction accuracy for temporally partitioned data. We believe that our work will influence the design of better methods and training models, as well as applications to solve increasingly harder problems in cybersecurity.
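The link-prediction framing can be sketched as scoring a (CVE description, CWE description) pair with a sequence-pair Transformer classifier, as below. "bert-base-uncased" is a generic stand-in; the paper's V2W-BERT weights, training regime, and transfer-learning setup are not reproduced, so the score shown is only meaningful after fine-tuning.

```python
# Sketch: scoring a (CVE text, CWE text) pair with a sequence-pair classifier,
# in the spirit of link prediction. The checkpoint is a generic stand-in and
# the classification head is untrained here.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # label 1 = "CVE links to this CWE"

cve_text = ("Heap-based buffer overflow in the TIFF parser allows remote "
            "attackers to execute arbitrary code.")
cwe_text = ("CWE-787: Out-of-bounds Write. The software writes data past the "
            "end of the intended buffer.")

inputs = tokenizer(cve_text, cwe_text, return_tensors="pt",
                   truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
link_probability = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"pair score: {link_probability:.2f} (meaningful only after fine-tuning)")
```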
2013
Web application security is an important problem in today’s internet. A major cause of this status is that many programmers do not have adequate knowledge about secure coding, so they leave applications with vulnerabilities. An approach to solve this problem is to use source code static analysis to find these bugs, but such tools are known to report many false positives, which makes the task of correcting the application harder. This paper explores the use of a hybrid of methods to detect vulnerabilities with fewer false positives. After an initial step that uses taint analysis to flag candidate vulnerabilities, our approach uses data mining to predict the existence of false positives. This approach reaches a trade-off between two apparently opposite approaches: humans coding the knowledge about vulnerabilities (for taint analysis) versus automatically obtaining that knowledge (with machine learning, for data mining). Given this more precise form of detection, we do automatic code co...
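The two-stage idea can be sketched as below: a crude lexical taint rule flags candidates, then a classifier over per-candidate features predicts which flags are false positives. The taint rule, feature encoding, and training rows are all hypothetical; the paper's taint analysis and mined features are far richer.

```python
# Sketch of the two-stage hybrid: (1) a toy taint rule flags candidates,
# (2) a classifier over hypothetical features predicts false positives.
from sklearn.tree import DecisionTreeClassifier

def taint_flag(line: str) -> bool:
    """Toy taint rule: an untrusted source and a sink appear on one line."""
    sources, sinks = ("$_GET", "$_POST"), ("mysql_query", "echo")
    return any(s in line for s in sources) and any(k in line for k in sinks)

candidates = [line for line in [
    'echo $_GET["name"];',
    'mysql_query("SELECT * FROM t WHERE id=" . $_GET["id"]);',
    'echo htmlspecialchars($_GET["name"]);',
] if taint_flag(line)]

# Stage 2: features of previously reviewed candidates -> false-positive label.
# Hypothetical features: [sanitizer present, string concatenation, sink is echo]
X_train = [[1, 0, 1], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
y_train = [1, 0, 0, 1]   # 1 = reviewed as a false positive
fp_model = DecisionTreeClassifier().fit(X_train, y_train)

for line in candidates:
    feats = [[int("htmlspecialchars" in line), int(" . " in line), int("echo" in line)]]
    verdict = "likely false positive" if fp_model.predict(feats)[0] else "likely real"
    print(f"{verdict}: {line}")
```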
Proceedings of the 17th International Conference on Mining Software Repositories
Software composition analysis depends on databases of open-source library vulnerabilities, curated by security researchers using various sources, such as bug tracking systems, commits, and mailing lists. We report the design and implementation of a machine learning system to help the curation by automatically predicting the vulnerability-relatedness of each data item. It supports a complete pipeline from data collection, model training and prediction, to the validation of new models before deployment. It is executed iteratively to generate better models as new input data become available. We use self-training to significantly and automatically increase the size of the training dataset, opportunistically maximizing the improvement in the models' quality at each iteration. We devised a new deployment stability metric to evaluate the quality of new models before deployment into production, which helped to discover an error. We experimentally evaluate the improvement in the performance of the models in one iteration, with a maximum PR AUC improvement of 27.59%. Ours is the first such study across a variety of data sources. We discover that adding the features of the corresponding commits to the features of issues/pull requests improves the precision for the recall values that matter. We demonstrate the effectiveness of self-training alone, with a 10.50% PR AUC improvement, and we discover that there is no uniform ordering of word2vec parameter sensitivity across data sources. CCS CONCEPTS • Security and privacy → Software security engineering; • Software and its engineering → Software maintenance tools.
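A single self-training iteration of the kind described can be sketched as below: fit on labelled items, pseudo-label unlabelled items whose predicted probability clears a confidence threshold, then retrain on the enlarged set. The texts, labels, and the 0.9 threshold are hypothetical stand-ins for the system's real data sources and tuning.

```python
# Sketch of one self-training iteration: train, pseudo-label confident
# unlabelled items, retrain. All data below is invented for illustration.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labelled = [
    ("Fix heap overflow in parser (CVE referenced in issue)", 1),
    ("Bump copyright year in README", 0),
    ("Sanitize user input before SQL query to stop injection", 1),
    ("Refactor logging module naming", 0),
]
unlabelled = [
    "Patch out-of-bounds read reported by fuzzer",
    "Update contributor list",
]

vec = TfidfVectorizer()
X = vec.fit_transform([t for t, _ in labelled] + unlabelled)
X_lab, X_unlab = X[:len(labelled)], X[len(labelled):]
y_lab = np.array([y for _, y in labelled])

clf = LogisticRegression().fit(X_lab, y_lab)

# Pseudo-label unlabelled items above the (hypothetical) confidence threshold,
# then retrain on the enlarged training set.
proba = clf.predict_proba(X_unlab)
confident = proba.max(axis=1) >= 0.9
if confident.any():
    X_aug = vstack([X_lab, X_unlab[confident]])
    y_aug = np.concatenate([y_lab, proba.argmax(axis=1)[confident]])
    clf = LogisticRegression().fit(X_aug, y_aug)

print(f"pseudo-labelled {int(confident.sum())} of {len(unlabelled)} unlabelled items")
```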
IEEE Access, 2022
Automatic software vulnerability detection has caught the eyes of researchers because software vulnerabilities are heavily exploited, causing major cyber-attacks. Thus, designing a vulnerability detector is a necessary approach to eliminating vulnerabilities. With the advances of natural language processing applied to interpreting source code as text, AI approaches based on Machine Learning, Deep Learning, and Graph Neural Networks have produced impactful research. The key requirements for developing an AI-based vulnerability detection model from a developer's perspective are identifying which AI model to adopt, the availability of a labelled dataset, how to represent essential features and tokenize the extracted feature vectors, and the specification of vulnerability coverage and detection granularity. Most literature reviews explore AI approaches based on either Machine Learning or Deep Learning models, and existing work highlights either the feature representation technique or the granularity level and dataset. A qualitative comparative analysis of ML-, DL-, and GNN-based models is presented in this work to give a complete picture of vulnerability detection models (VDMs), thus addressing the challenge a researcher faces in choosing a suitable architecture, feature representation, and processing for designing a VDM. This work focuses on putting together all the essential pieces required for designing an automated software vulnerability detection model using the various AI approaches.