2018
Computer malware is one of the greatest dangers to modern society, allowing attackers to uncover restricted data and to control a wide range of critical infrastructure. Furthermore, computer malware evolves rapidly, forcing anti-malware vendors to put most of their effort into developing techniques for detecting new, previously unknown malware. We present Split Malware, a method for splitting malware into small pieces. No individual piece is detected by anti-malware tools, yet together they perform a malicious task.
Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop - AISec '14, 2014
Clustering algorithms have become a popular tool in computer security to analyze the behavior of malware variants, identify novel malware families, and generate signatures for antivirus systems. However, the suitability of clustering algorithms for security-sensitive settings has been recently questioned by showing that they can be significantly compromised if an attacker can exercise some control over the input data. In this paper, we revisit this problem by focusing on behavioral malware clustering approaches, and investigate whether and to what extent an attacker may be able to subvert these approaches through a careful injection of samples with poisoning behavior. To this end, we present a case study on Malheur, an open-source tool for behavioral malware clustering. Our experiments not only demonstrate that this tool is vulnerable to poisoning attacks, but also that it can be significantly compromised even if the attacker can only inject a very small percentage of attacks into the input data. As a remedy, we discuss possible countermeasures and highlight the need for more secure clustering algorithms.
Lecture Notes in Computer Science, 2014
A critical challenge when combating malware threats is how to efficiently and effectively identify the targeted victim's environment, given an unknown malware sample. Unfortunately, existing malware analysis techniques either use a limited, fixed set of analysis environments (not effective) or employ expensive, time-consuming multi-path exploration (not efficient), making them poorly suited to this challenge. As such, this paper proposes a new dynamic analysis scheme to deal with this problem by applying the concept of speculative execution in this new context. Specifically, by providing multiple dynamically created, parallel, virtual environment spaces, we speculatively execute a malware sample and adaptively switch to the right environment during the analysis. Interestingly, while our approach appears to trade space for speed, we show that it can actually use less memory space and achieve much higher speed than existing schemes. We have implemented a prototype system, GOLDENEYE, and evaluated it with a large real-world malware dataset. The experimental results show that GOLDENEYE outperforms existing solutions and can effectively and efficiently expose malware's targeted environment, thereby speeding up the analysis in the critical battle against the emerging targeted malware threat.
2014 Recent Advances in Engineering and Computational Sciences (RAECS), 2014
With the rise of the underground Internet economy, automated malicious programs, popularly known as malware, have become a major threat to computers and information systems connected to the Internet. Properties such as self-healing, self-hiding, and the ability to deceive security devices make this software hard to detect and mitigate. Therefore, the detection and mitigation of such malicious software is a major challenge for researchers and security personnel. The conventional systems for detecting and mitigating such threats are mostly signature-based. A major drawback of such systems is their inability to detect malware samples for which no signature is available in their signature database. Such malware is known as zero-day malware. Moreover, more and more malware writers use obfuscation techniques such as polymorphism, metamorphism, packing, and encryption to avoid detection by antivirus software. Therefore, the traditional signature-based detection system is neither effective nor efficient for the detection of zero-day malware. To improve the effectiveness and efficiency of malware detection, we use a classification method based on structural information and behavioral specifications. In this paper we use both static and dynamic analysis approaches. In static analysis, we extract the features of an executable file and then classify it. In dynamic analysis, we collect traces of executable files using NtTrace within a controlled environment. Experimental results indicate that our proposed algorithm is effective in extracting the malicious behavior of executables. It can also be used to detect malware variants.
Lecture Notes in Computer Science, 2012
Over the past years, we have experienced an increase in the quantity and complexity of malware binaries. This change has been fueled by the introduction of malware generation tools and the reuse of different malcode modules. Recent malware appears to be highly modular and less functionally typified. As a side effect of this "composition" of components across different malware types, a growing number of new malware samples cannot be explicitly assigned to the traditional classes defined by Anti-Virus (AV) vendors. Indeed, by nature, clustering techniques capture dominant behavior that could be a manifestation of only one of the malware's components, failing to reveal malware similarities that depend on other, less dominant components and other evolutionary traits. In this paper, we introduce a novel malware behavioral commonality analysis scheme that takes component-wise grouping into consideration, called behavioral mapping. Our effort attempts to shed light on malware behavioral relationships and go beyond simply clustering the malware into a family. To this end, we implemented a method for identifying soft clusters and revealing shared malware components and traits. Using our method, we demonstrate that a malware sample can belong to several groups (clusters), implying that it shares its respective components with other samples from those groups. We performed experiments with a large corpus of real-world malware datasets and found that we can successfully highlight malware component relationships across the existing AV malware families and variants.
2019
Malware detection plays a vital role in computer security. Modern machine learning approaches have been centered around domain knowledge for extracting malicious features. However, many potential features can be used, and it is time-consuming and difficult to manually identify the best features, especially given the diverse nature of malware. In this paper, we propose Neurlux, a neural network for malware detection. Neurlux does not rely on any feature engineering; rather, it learns automatically from dynamic analysis reports that detail behavioral information. Our model borrows ideas from the field of document classification, using word sequences present in the reports to predict whether a report is from a malicious binary. We investigate the learned features of our model and show which components of the reports it tends to give the highest importance. Then, we evaluate our approach on two different datasets and report formats, showing that Neurlux improves on the state of the art and can effectively learn from the dynamic analysis reports. Furthermore, we show that our approach is portable to other malware analysis environments and generalizes to different datasets.
CCS Concepts: • Security and privacy → Software and application security; • Computing methodologies → Neural networks.
2007
Automatic analysis of malicious binaries is necessary in order to scale with the rapid development and recovery of malware found in the wild. The results of automatic analysis are useful for creating defense systems and understanding the current capabilities of attackers.
Anubis is a dynamic malware analysis platform that executes submitted binaries in a controlled environment. To perform the analysis, the system monitors the invocation of important Windows API calls and system services, records the network traffic, and tracks data flows. For each submission, reports are generated that provide comprehensive information about the activities of the binary under analysis. Anubis receives malware samples through a public web interface and a number of feeds from security organizations and anti-malware companies. Because the samples are collected from a wide range of users, they represent a comprehensive and diverse mix of malware found in the wild. In this paper, we aim to shed light on common malware behaviors. To this end, we evaluate the Anubis analysis results for almost one million malware samples, study trends and the evolution of malicious behaviors over a period of almost two years, and examine the influence of code polymorphism on malware statistics.
IEEE Security & Privacy, 2011
2021
Over the past two decades, packed malware has been a veritable challenge to security analysts. Not only is determining the end of unpacking increasingly difficult, but advanced packers also embed a variety of anti-analysis tricks to impede reverse engineering. As malware's APIs provide rich information about malicious behavior, one common anti-analysis strategy is API obfuscation, which removes the metadata of imported APIs from the malware's PE header and complicates API name resolution from API callsites. In this way, even when security analysts obtain the unpacked code, a disassembler still fails to recognize imported API names, and the unpacked code cannot be successfully executed. Recently, generic binary unpacking has made breakthrough progress with noticeable performance improvement. However, reconstructing unpacked code's import tables, which is vital for further malware static/dynamic analyses, has largely been overlooked. Existing approaches are far from matur...
arXiv (Cornell University), 2018
The cyber world is plagued with ever-evolving malware that readily infiltrates all defense mechanisms, operates viciously unbeknownst to the user, and surreptitiously exfiltrates sensitive data. Understanding the inner workings of such malware provides leverage to combat it effectively. This understanding is pursued through dynamic analysis, which is conducted manually or automatically. Malware authors, accordingly, have devised and advanced evasion techniques to thwart or evade these analyses. In this paper, we present a comprehensive survey of malware dynamic analysis evasion techniques. In addition, we propose a detailed classification of these techniques and further demonstrate how their efficacy holds against different types of detection and analysis approaches. Our observations attest that evasive behavior is mostly concerned with detecting and evading sandboxes. The primary tactic of such malware, we argue, is fingerprinting, followed by new trends in reverse Turing test tactics, which aim at detecting human interaction. Furthermore, we posit that current defensive strategies, ranging from reactive methods to endeavors for more transparent analysis systems, are readily foiled by zero-day fingerprinting techniques or other evasion tactics such as stalling. Accordingly, we recommend the pursuit of more generic defensive strategies, with emphasis on path exploration techniques that have the potential to thwart all the evasive tactics.
2010
This paper proposes a scalable approach for distinguishing malicious files from clean files by investigating the behavioural features using logs of various API calls. We also propose, as an alternative to the traditional method of manually identifying malware files, an automated classification system using runtime features of malware files. For both projects, we use an automated tool running in a virtual environment to extract API call features from executables and apply pattern recognition algorithms and statistical methods to differentiate between files. Our experimental results, based on a dataset of 1368 malware and 456 cleanware files, provide an accuracy of over 97% in distinguishing malware from cleanware. Our techniques provide a similar accuracy for classifying malware into families. In both cases, our results outperform comparable previously published techniques.
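As a rough illustration of what "API call features" can look like, the following minimal sketch turns one execution log into n-gram counts and then into a fixed-length vector that a classifier could consume. The log contents, the choice of bigrams, and the helper names `api_ngrams` and `to_vector` are assumptions made for this example; the abstract does not specify the paper's actual feature set.

```python
from collections import Counter


def api_ngrams(calls, n=2):
    """Count overlapping n-grams of API names in one execution log."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def to_vector(counts, vocabulary):
    """Map n-gram counts onto a fixed, ordered vocabulary (missing grams count as 0)."""
    return [counts.get(gram, 0) for gram in vocabulary]


# Hypothetical log of API calls recorded while running one executable.
log = ["CreateFileW", "WriteFile", "CloseHandle", "CreateFileW", "WriteFile"]

counts = api_ngrams(log)
vocabulary = sorted(counts)  # in practice, built once from the whole training set
vector = to_vector(counts, vocabulary)
```

Vectors built this way for many malware and cleanware samples could then be passed to any standard classifier.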
The volume and sophistication of malware are continuously increasing and evolving. Automated dynamic malware analysis is a widely adopted approach for detecting malicious software. However, many recent malware samples try to evade detection by identifying the presence of the analysis environment itself and refraining from performing malicious actions. Because of the sophistication of the techniques used by malware authors, the analysis and detection of evasive malware has so far been largely a manual process. One approach to the automatic detection of these evasive malware samples is to execute the same sample in multiple analysis environments and then compare its behaviors, on the assumption that a deviation in behavior is evidence of an attempt to evade one or more analysis systems. For this reason, it is important to provide a reference system (often called bare-metal) in which the malware is analyzed without the use of any detectable component.
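The comparison step described above can be sketched as follows, assuming each run is summarized as a set of coarse behavior labels and that deviation is measured with Jaccard distance. The event names, the 0.5 threshold, and the function names are hypothetical choices for this illustration, not the method of the paper.

```python
def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def flag_deviating_environments(reference, observed, threshold=0.5):
    """List environments whose behavior differs from the reference run by more than threshold."""
    return sorted(
        name
        for name, events in observed.items()
        if jaccard_distance(reference, events) > threshold
    )


# Hypothetical behavior summaries of one sample in three environments.
bare_metal = {"file_write", "net_connect", "registry_set"}
sandboxes = {
    "sandbox_a": {"file_write", "net_connect", "registry_set"},
    "sandbox_b": {"file_write"},  # most activity is missing here
}

suspicious = flag_deviating_environments(bare_metal, sandboxes)
```

A real system would also have to separate evasion from ordinary run-to-run variation, for example by repeating runs in each environment before comparing.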
Leveraging Applications of Formal Methods, Verification and Validation. Modeling, 2018
This tutorial presents and motivates various malware detection tools and illustrates their usage on a clear example. We demonstrate how statically-extracted syntactic signatures can be used for quickly detecting simple variants of malware. Since such signatures can easily be obfuscated, we also present dynamically-extracted behavioral signatures which are obtained by running the malware in an isolated environment known as a sandbox. However, some malware can use sandbox detection to detect that they run in such an environment and so avoid exhibiting their malicious behavior. To counteract sandbox detection, we present concolic execution that can explore several paths of a binary. We conclude by showing how opaque predicates and JIT can be used to hinder concolic execution.
ACM Computing Surveys
The cyber world is plagued with ever-evolving malware that readily infiltrates all defense mechanisms, operates viciously unbeknownst to the user, and surreptitiously exfiltrates sensitive data. Understanding the inner workings of such malware provides leverage to combat it effectively. This understanding is often pursued through dynamic analysis, which is conducted manually or automatically. Malware authors, accordingly, have devised and advanced evasion techniques to thwart or evade these analyses. In this article, we present a comprehensive survey of malware dynamic analysis evasion techniques. In addition, we propose a detailed classification of these techniques and further demonstrate how their efficacy holds against different types of detection and analysis approaches. Our observations attest that evasive behavior is mostly concerned with detecting and evading sandboxes. The primary tactic of such malware, we argue, is fingerprinting followed by new trends for reverse Turing test...
Computers & Security, 2021
As the malware research field became more established over the last two decades, new research questions arose, such as how to make malware research reproducible, how to bring scientific rigor to attack papers, and what constitutes an appropriate malware dataset for relevant experimental results. The challenges these questions pose also bring pitfalls that affect the multiple malware research stakeholders. To help answer those questions and to highlight potential research pitfalls to be avoided, in this paper we present a systematic literature review of 491 papers on malware research published in major security conferences between 2000 and 2018. We identified the most common pitfalls present in past literature and propose a method for assessing current (and future) malware research. Our goal is to integrate science and engineering best practices to develop further, improved research by learning from issues present in the published body of work. As far as we know, this is the largest literature review of its kind and the first to summarize research pitfalls in a research methodology that avoids them. In total, we discovered 20 pitfalls that limit current research impact and reproducibility. The identified pitfalls range from (i) the lack of a proper threat model, which complicates a paper's evaluation, to (ii) the use of closed-source solutions and private datasets, which limits reproducibility. We also report yet-to-be-overcome challenges that are inherent to the nature of malware, such as non-deterministic analysis results.
Based on our findings, we propose a set of actions to be taken by the malware research and development community for future work: (i) consolidation of malware research as a field constituted of diverse research approaches (e.g., engineering solutions, offensive research, landscapes/observational studies, and network traffic/system traces analysis); (ii) design of engineering solutions with clearer, direct assumptions (e.g., positioning solutions as proofs-of-concept vs. deployable); (iii) design of experiments that reflect (and emphasize) the target scenario for the proposed solution (e.g., corporation, user, country-wide); and (iv) clearer exposition and discussion of the limitations of the technologies used and the norms/standards exercised for research (e.g., the use of closed-source antiviruses as ground truth).
[Figure: flowchart of the research workflow, from objective and hypothesis definition through solution design, experiment design, and evaluation to communicating results.]
2006 22nd Annual Computer Security Applications Conference (ACSAC'06), 2006
Modern malware often hides the malicious portion of its program code by making it appear as data at compile time and transforming it back into executable code at runtime. This obfuscation technique poses obstacles to researchers who want to understand the malicious behavior of new or unknown malware and to practitioners who want to create models of detection and methods of recovery. In this paper we propose a technique for automating the process of extracting the hidden-code bodies of this class of malware. Our approach is based on the observation that sequences of packed or hidden code in a malware instance can be made self-identifying when its runtime execution is checked against its static code model. In deriving our technique, we formally define the unpack-executing behavior that such malware exhibits and devise an algorithm for identifying and extracting its hidden code. We also provide details of the implementation and evaluation of our extraction technique; the results from our experiments on several thousand malware binaries show that our approach can be used to significantly reduce the time required to analyze such malware and to improve the performance of malware detection tools.
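The core observation, checking runtime execution against a static code model, can be illustrated with a deliberately simplified sketch. Here the static model is assumed to be a mapping from address to instruction bytes, and the trace a list of executed (address, bytes) pairs; the function name, both data structures, and the byte values are hypothetical and much coarser than what a real implementation would use.

```python
def find_hidden_code(static_model, trace):
    """Return executed (address, bytes) pairs that the static model does not contain.

    static_model: dict mapping address -> instruction bytes from static disassembly.
    trace: iterable of (address, bytes) pairs observed at runtime.
    """
    hidden = []
    seen = set()
    for address, code in trace:
        # Code that appears only at runtime is a candidate for unpacked, hidden code.
        if static_model.get(address) != code and (address, code) not in seen:
            seen.add((address, code))
            hidden.append((address, code))
    return hidden


# Illustrative values: 0x2000 looks like data statically but holds code at runtime.
static_model = {0x1000: b"\x55", 0x1001: b"\x8b\xec", 0x2000: b"\x00\x00"}
trace = [
    (0x1000, b"\x55"),
    (0x1001, b"\x8b\xec"),
    (0x2000, b"\x31\xc0"),
    (0x2000, b"\x31\xc0"),  # repeated execution is reported only once
]

hidden = find_hidden_code(static_model, trace)
```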
Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), 2007
Malicious software (or malware) has become a growing threat as malware writers have learned that signature-based detectors can be easily evaded by "packing" the malicious payload in layers of compression or encryption. State-of-the-art malware detectors have adopted both static and dynamic techniques to recover the payload of packed malware, but unfortunately such techniques are highly ineffective. In this paper we propose a new technique, called OmniUnpack, to monitor the execution of a program in real time and to detect when the program has removed the various layers of packing. OmniUnpack aids malware detection by directly providing the unpacked malicious payload to the detector. Experimental results demonstrate the effectiveness of our approach. OmniUnpack is able to deal with both known and unknown packing algorithms and introduces a low overhead (at most 11% for packed benign programs).
IJEAST, 2021
Recent antivirus software uses machine learning to make detection more sophisticated. Machine learning, reinforcement learning, and deep learning, together with data analysis, have made it possible to implement dynamic analysis procedures for detecting malware. In this paper, we introduce an algorithm that aims to bypass both signature-based detection, by 'rephrasing the code' using CLP, and behavior-based analysis, the two most prominent detection methods. It also attempts to get around real-time monitoring and to remain undetected during forensic investigation by clearing its code footprint.
International Journal of Secure Software Engineering, 2011
This paper describes a research effort to use executable slicing as a pre-processing aid to improve the prediction performance of rogue software detection. The prediction technique used here is an information retrieval classifier known as cosine similarity, which can be used to detect previously unknown, known, or variants of known rogue software by applying the feature extraction technique of randomized projection. This paper provides direction in answering the question of whether it is possible to use only portions or subsets of an application, known as slices, to predict whether or not the software contents are rogue. This research extracts sections or slices from potentially rogue applications and uses these slices instead of the entire application to make a prediction. Results show promise when applying randomized projections to cosine similarity for the predictions, with as much as a 4% increase in prediction performance and a five-fold decrease in processing time when compared to using the entire application.
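To make the combination of randomized projection and cosine similarity concrete, here is a minimal sketch using only the Python standard library. It assumes each slice is represented by a 256-bin byte histogram and is projected with a Gaussian random matrix; the histogram features, the dimensions, the seed, and all function names are assumptions for illustration, since the abstract does not describe the paper's feature representation.

```python
import math
import random


def byte_histogram(data):
    """Count how often each byte value (0-255) occurs in a slice."""
    counts = [0] * 256
    for value in data:
        counts[value] += 1
    return counts


def random_projection_matrix(n_features, n_components, seed=0):
    """Gaussian random matrix scaled by 1/sqrt(n_components)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_components)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
        for _ in range(n_components)
    ]


def project(vector, matrix):
    """Reduce a feature vector to len(matrix) dimensions."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical slices: one from a known rogue sample, one from a candidate file.
known_slice = byte_histogram(b"\x90\x90\xeb\xfe" * 8)
candidate_slice = byte_histogram(b"\x90\x90\xeb\xfe" * 7 + b"\x00\x00\x00\x00")

matrix = random_projection_matrix(n_features=256, n_components=16, seed=1)
score = cosine_similarity(project(known_slice, matrix), project(candidate_slice, matrix))
```

A higher `score` means the candidate slice points in a direction closer to the known slice in the reduced space. Any decision threshold would have to be chosen empirically.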
arXiv (Cornell University), 2017
In this work we introduce malware detection from raw byte sequences as a fruitful research area to the larger machine learning community. Building a neural network for such a problem presents a number of interesting challenges that have not occurred in tasks such as image processing or NLP. In particular, we note that detection from raw bytes presents a sequence problem with over two million time steps and a problem where batch normalization appears to hinder the learning process. We present our initial work in building a solution to tackle this problem, which has linear complexity dependence on the sequence length and allows for interpretable sub-regions of the binary to be identified. In doing so we discuss the many challenges in building a neural network to process data at this scale, and the methods we used to work around them.