2019
12 pages
Malware detection plays a vital role in computer security. Modern machine learning approaches have centered on domain knowledge for extracting malicious features. However, many potential features can be used, and it is time-consuming and difficult to manually identify the best ones, especially given the diverse nature of malware. In this paper, we propose Neurlux, a neural network for malware detection. Neurlux does not rely on any feature engineering; rather, it learns automatically from dynamic analysis reports that detail behavioral information. Our model borrows ideas from the field of document classification, using the word sequences present in the reports to predict whether a report is from a malicious binary. We investigate the learned features of our model and show which components of the reports it tends to give the highest importance. Then, we evaluate our approach on two different datasets and report formats, showing that Neurlux improves on the state of the art and can effectively learn from the dynamic analysis reports. Furthermore, we show that our approach is portable to other malware analysis environments and generalizes to different datasets.
CCS Concepts: • Security and privacy → Software and application security; • Computing methodologies → Neural networks.
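To make the idea of treating a dynamic analysis report as a document more concrete, here is a minimal sketch of turning a report into an ordered word sequence and then into fixed-length integer ids, the kind of input a text classifier typically consumes. The report layout, the function names (`report_to_tokens`, `build_vocab`, `encode`), and the id conventions (0 for padding, 1 for unknown words) are assumptions made for illustration; the abstract does not describe Neurlux's actual preprocessing.

```python
import json
import re


def split_words(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def report_to_tokens(report):
    """Flatten a nested report (dicts, lists, scalars) into an ordered token list."""
    tokens = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                tokens.extend(split_words(key))
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        else:
            tokens.extend(split_words(str(node)))

    walk(report)
    return tokens


def build_vocab(tokens):
    """Assign ids from 2 upward in order of first appearance (0 = pad, 1 = unknown)."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab) + 2
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to ids, truncate to max_len, and right-pad with zeros."""
    ids = [vocab.get(token, 1) for token in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))


# Hypothetical report fragment; real sandbox reports are far larger.
raw = '{"api_calls": [{"name": "CreateFileW", "path": "C:/Temp/a.exe"}, {"name": "RegSetValueExW"}]}'
tokens = report_to_tokens(json.loads(raw))
vocab = build_vocab(tokens)
encoded = encode(tokens, vocab, max_len=12)
```

Because the token order is preserved, a sequence model can in principle pick up on which behaviors follow which, something a plain word count would discard.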
Android is vulnerable to malware attacks because of its open architecture, large user base, and openly accessible code. Security investigation depends on dynamic analysis for malware detection: the binary samples (that is, their system calls) are analysed, and a runtime behavioural profile of the malicious app is generated. The resulting report is then used to detect malware and attribute threat types using manually chosen features. However, because of the diversity of malware and execution environments, this approach does not scale: for every new execution environment, new features need to be engineered manually. MalDy (mal die) is a portable malware detection and family threat attribution framework that uses supervised machine learning techniques. MalDy's portability comes from modelling behavioural reports as sequences of words, together with advanced NLP and ML techniques that automatically engineer relevant security features to detect and attribute malware without the intervention of the investigator. More precisely, a bag-of-words (BOW) NLP model is used to represent the behavioural report, and an ML ensemble is constructed on top of it. MalDy is then evaluated on various datasets from different platforms and execution environments.
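The bag-of-words representation and the ensemble on top of it can be illustrated with a small sketch. It uses the hashing trick to map words to a fixed number of buckets and a simple majority vote to combine classifier outputs. Both choices, along with the names `bag_of_words` and `majority_vote` and the bucket count, are assumptions for illustration; the abstract does not say which vectorizer or ensemble rule MalDy uses.

```python
import re
import zlib


def bag_of_words(report_text, n_buckets=16):
    """Count words per hash bucket; word order is deliberately discarded."""
    vector = [0] * n_buckets
    for word in re.findall(r"[a-z0-9_]+", report_text.lower()):
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        bucket = zlib.crc32(word.encode("utf-8")) % n_buckets
        vector[bucket] += 1
    return vector


def majority_vote(predictions):
    """Combine 0/1 labels from several classifiers; ties count as benign (0)."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0


# Hypothetical report text and hypothetical per-classifier verdicts.
features = bag_of_words("open read open write connect")
verdict = majority_vote([1, 1, 0])
```

Since the vector length is fixed by `n_buckets` rather than by the vocabulary, the same code works on reports from a new execution environment without any manual feature list, which is the portability argument the abstract makes.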
IRJET, 2020
2021
Malware and malware incidents are increasing daily, despite the various antivirus systems and malware detection and classification methodologies. Many static, dynamic, and hybrid techniques have been presented to detect malware and classify it into malware families. Dynamic and hybrid malware classification methods have the advantage of being highly efficient compared with static methods. Since it is more difficult to mask malware behavior during execution than to disguise the underlying code examined in static classification, machine learning techniques have been the main focus of security experts for detecting malware and determining its family dynamically. The rapid increase in malware also creates a need for recent, updated datasets of malicious software. We introduce two new, updated datasets in this work: one with 9,795 samples obtained and compiled from VirusSamples and one with 14,616 samples from VirusShare. This paper also analyzes multi-class malware cla...
IRJET, 2021
Machine learning is among the most celebrated research avenues today and is increasingly a harbinger of advances in every field. It is receiving growing attention in the area of privacy and security for building robust systems. Malware ascription is a relatively unexplored area, and it is rather difficult to attribute malware and detect authorship. Our work focuses on leveraging machine learning models for malware detection by determining the relation between the training dataset and the output achieved. To this end, we develop three different datasets that include pure malware data, non-malware data, and obscure malware data. We present three different scenarios to train the model and test its effectiveness, ranging from a more simulated scenario to a more realistic one. In our model, we apply temporal-based methodologies to train and validate the classifier. Further, we study how much we can reduce the training dataset without compromising the optimal results. Upon applying a multi-layer approach, we improved our base model by 20%. Our reports are extremely useful in malware ascription.
Proceedings of the 35th Annual Computer Security Applications Conference, 2019
Proceedings of the 7th International Conference on Information Systems Security and Privacy, 2021
Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on features such as opcode sequences, API calls, and byte n-grams, among many others. In this research, we consider opcode features. We implement hybrid machine learning techniques, where we engineer feature vectors by training hidden Markov models (a technique that we refer to as HMM2Vec) and Word2Vec embeddings on these opcode sequences. The resulting HMM2Vec and Word2Vec embedding vectors are then used as features for classification algorithms. Specifically, we consider support vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and convolutional neural network (CNN) classifiers. We conduct substantial experiments over a variety of malware families. Our experiments extend well beyond any previous work in this field.
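As a rough illustration of how an opcode sequence becomes training data for a Word2Vec-style embedding, the sketch below generates skip-gram (center, context) pairs from a sliding window. It covers only the pair-generation step, not the embedding training itself, and the function name `skipgram_pairs`, the window size, and the sample opcodes are assumptions for illustration rather than details from the paper.

```python
def skipgram_pairs(opcodes, window=2):
    """Return (center, context) pairs for every opcode and its neighbors within the window."""
    pairs = []
    for i, center in enumerate(opcodes):
        start = max(0, i - window)
        stop = min(len(opcodes), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, opcodes[j]))
    return pairs


# Hypothetical opcode sequence extracted from a disassembled sample.
opcodes = ["push", "mov", "call", "pop"]
pairs = skipgram_pairs(opcodes, window=1)
```

An embedding trained on such pairs places opcodes that appear in similar contexts close together, and the resulting vectors can then be fed to classifiers such as the SVM, k-NN, RF, and CNN models named in the abstract.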
Digital Investigation, 2018
Current malware detection and classification approaches generally rely on time-consuming and knowledge-intensive processes to extract patterns (signatures) and behaviors from malware, which are then used for identification. Moreover, these signatures are often limited to local, contiguous sequences within the data, ignoring their context in relation to each other and throughout the malware file as a whole. We present a deep learning based malware classification approach that requires no expert domain knowledge and relies on a purely data-driven approach for complex pattern and feature identification.
Sensors
Every day, hundreds of thousands of malicious files are created to exploit zero-day vulnerabilities. Existing pattern-based antivirus solutions have difficulty coping with such a large number of new malicious files. To solve this problem, artificial intelligence (AI)-based malicious file detection methods have been proposed. However, even if we can detect malicious files with high accuracy using deep learning, it is difficult to identify why files are malicious. In this study, we propose a malicious file feature extraction method based on an attention mechanism. First, by adapting the attention mechanism, we can identify application program interface (API) system calls that are more important than others for determining whether a file is malicious. Second, we confirm that this approach yields an accuracy that is approximately 12% and 5% higher than a conventional AI-based detection model using convolutional neural networks and skip-connected long short-term memory-based detection ...
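To show how an attention mechanism can point to the API calls that matter most, here is a minimal dot-product attention sketch: each call's vector is scored against a query vector, the scores are normalized with a softmax, and the resulting weights rank the calls. The embeddings, the query vector, and the choice of API calls are made-up values for illustration; the abstract does not specify the model's actual attention formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Score each vector against the query, then return weights and the weighted sum."""
    scores = [sum(v_d * q_d for v_d, q_d in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    context = [
        sum(w * v[d] for w, v in zip(weights, vectors))
        for d in range(len(query))
    ]
    return weights, context


# Illustrative values only: in a trained model these would be learned parameters.
api_calls = ["NtCreateFile", "NtWriteFile", "NtClose"]
embeddings = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]

weights, context = attention_pool(embeddings, query)
ranked = sorted(zip(api_calls, weights), key=lambda pair: pair[1], reverse=True)
```

The weights sum to one, so they can be read as a relative importance for each call, which is what makes this kind of model easier to interpret than a plain classifier.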
2021
Malware is a constant threat for the security of devices and users. Successful and automatic malware detection is a critical necessity [1]. Existing malware detection solutions cannot accurately characterize the behavior of a malware and, thereby, they rely on other indicators, e.g., digital signatures. Nevertheless, behavior-based detection is an active field of research since it can deal with zero-day malware. Although many proposals leveraging machine learning (ML) classifiers have been put forward, finding proper behavioral features is still an open problem. Existing solutions typically consider either static or dynamic software features. Static refers to the program syntax while dynamic refers to features observed at runtime. However, both of them suffer from limitations which impact on the effectiveness of the ML classification. Here we follow a different approach. We used symbolic execution to model features that denote the malware behavior in a more precise way. To this aim,...
Advances in Science, Technology and Engineering Systems Journal
Malware has always been a big problem for companies, government agencies, and individuals because it is still used as a primary tool for compromising networks, applications, and computer operating systems for unilateral gain. Heuristic and signature-based malware detection methods still struggle to keep up with the evolution of malware. Machine learning is known to be able to automate the work needed to detect families of existing and newly discovered malware. Unfortunately, machine learning methods that use a Support Vector Machine (SVM) to detect malware reach only a low level of accuracy. In this work, we propose a dynamic analysis method that uses system call sequences to monitor malware behavior. It uses the word2vec technique for word embedding and implements two deep learning models, Long Short-Term Memory (LSTM) and Nested LSTM, as classifiers. For comparison with an existing machine learning approach, we also apply an SVM as a benchmark method. The Nested LSTM achieves an accuracy of 93.11%, while the LSTM achieves the best accuracy, 98.61%. The LSTM also achieves the best performance in terms of average precision (97.57%), average recall (97.29%), and average F1 score (97.43%). We have found that our model is lightweight but powerful for detecting malware with significant accuracy.
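As a quick plausibility check on the reported LSTM figures, the F1 score can be recomputed as the harmonic mean of the stated precision and recall. This assumes the averaged values combine in this way; a macro-averaged F1 is computed per class and need not equal the harmonic mean of the averaged precision and recall, so the agreement below is indicative only.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9757 \times 0.9729}{0.9757 + 0.9729}
    \approx 0.9743
```

The result, about 97.43%, matches the average F1 score given in the abstract.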